This is Fine! A podcast about resilience engineering and software

Colette Alexander and Clint Byrum

34 episodes

  • Paper Club: Two Years Before the Mast w/special guest Eric Dobbs

    04/05/2026 | 44 mins.
    Mitchell Hashimoto’s post on leaving GitHub: https://mitchellh.com/writing/ghostty-leaving-github

    The Reddit post on GitHub’s historical availability (which we find questionable): https://www.reddit.com/r/github/comments/1rnvhs9/githubs_historic_downtime_scraped_and_plotted/

    As a reminder, the Messy 9 are: congestion, cascade, conflict, lag, saturation, friction, tempo, surprise, tangles

    We have sometimes loved his stuff, but Gergely is annoying us with these posts: https://newsletter.pragmaticengineer.com/p/the-pulse-is-github-still-best-for?r=78c7k&utm_medium=email

    https://x.com/GergelyOrosz/status/2048017382036082706

    You can find the RISF store with Hindsight Bias merch here: https://www.bonfire.com/store/risf/

    You can find a copy of Richard Cook’s Two Years Before the Mast at Lorin’s Blog: https://surfingcomplexity.blog/wp-content/uploads/2026/03/twoyearsbeforethemast.pdf

    As a reminder, Richard Cook’s How Complex Systems Fail can be found at http://how.complexsystems.fail

    Some writing on the 1996 Annenberg conference: https://www.researchgate.net/publication/351953417_Coming_Together_The

    The folk models paper (by Dekker and Hollnagel, not Woods), which specifically targets Situational Awareness as a folk model: https://link.springer.com/article/10.1007/s10111-003-0136-9

    Some stuff about SNAFU Catchers: https://www.snafucatchers.com/
    And https://snafucatchers.github.io/

    Eric referenced our conversation with Beth Long about Building and Revising Adaptive Capacity Sharing for Technical Incident Response, the paper she co-wrote with Richard Cook on New Relic’s real-life example of resilience engineering. The episode is at https://youtu.be/A_rU4-M61Hk and the paper is at https://www.sciencedirect.com/science/article/abs/pii/S0003687020301903?via%3Dihub

    Erik Hollnagel’s RAG gets referenced: https://erikhollnagel.com/onewebmedia/RAG%20Outline%20V2.pdf

    Once again, we link you to Lorin’s Law: https://surfingcomplexity.blog/2017/06/24/a-conjecture-on-why-reliable-systems-fail/

    Eric is referencing Lund University’s Human Factors and Systems Safety program: https://www.humanfactors.lth.se/

    Check out Crisis Engineering! https://crisisengineering.layeraleph.com/crisis-engineering-the-book/

    The upcoming RISF event on Practice of Practice Gamelan: https://resilienceinsoftware.org/events/245030
  • SRECon Americas 2026 recap

    14/04/2026 | 55 mins.
    Colette’s talk at SRECon intro: https://www.usenix.org/conference/srecon26americas/presentation/alexander

    Clint’s talk at SRECon intro: https://www.usenix.org/conference/srecon26americas/presentation/byrum

    Dan Slimmon is an excellent engineer (per Clint’s shoutout) and ALSO an excellent podcast creator/host: https://techblows.net/

    Michelle Brush’s Keynote summary is here: https://www.usenix.org/conference/srecon26americas/presentation/brush

    Jevons Paradox: https://en.wikipedia.org/wiki/Jevons_paradox

    Dr. Nicole Forsgren’s talk summary: https://www.usenix.org/conference/srecon26americas/presentation/forsgren

    DORA is always worth a deep dive if you haven’t looked into it yet: https://dora.dev/

    The blog post Colette mentioned comparing the AI gold rush to Mao’s Great Leap Forward: https://leehanchung.github.io/blogs/2026/04/05/the-ai-great-leap-forward/

    Many people have written about why MTTR is a bad metric to track; you can read a write-up from Adrian Hornsby here: https://newsletter.resiliumlabs.com/p/mttr-problems-better-incident-metrics

    And watch the OG, Courtney Nash, speak about it here: https://www.youtube.com/watch?v=uhCgBOHo8EY

    Beth Long’s SRE Soundbath: https://www.usenix.org/conference/srecon26americas/presentation/long

    Vanessa Huerta-Granda’s talk is summarized here: https://www.usenix.org/conference/srecon26americas/presentation/huerta-granda

    Martin Smith and Abe Hoffman’s talk is summarized here: https://www.usenix.org/conference/srecon26americas/presentation/hoffman

    Some information about Metrist: https://vault42consulting.com/about/portfolio/metrist

    AI Agents Good Bad and Ugly talk: https://www.usenix.org/conference/srecon26americas/presentation/budichenko

    The CAST talk: https://www.usenix.org/conference/srecon26americas/presentation/barroso

    Engineering a Safer World by Nancy Leveson is worth a look: https://bookshop.org/p/books/engineering-a-safer-world-systems-thinking-applied-to-safety-nancy-g-leveson/57b01ef464f9f81b?ean=9780262533690&next=t

    Erik Hollnagel wrote the book on FRAM, and it has a lot of support in the safety world across industries. https://functionalresonance.com/ and https://etn-peter.eu/2021/02/11/fram-in-a-nutshell/ are good resources.

    Daria Barteneva’s closing keynote on game theory and SRE was great: https://www.usenix.org/conference/srecon26americas/presentation/barteneva

    Some good stuff on Above the Line/Below the Line, if you’re curious:
    https://queue.acm.org/detail.cfm?id=3380777

    https://www.youtube.com/watch?v=xA5U85LSk0M

    Lorin Hochstein’s closing keynote on storytelling was rad: https://www.usenix.org/conference/srecon26americas/presentation/hochstein

    SRECon EMEA 2026 (in Dublin) has its CFP up: https://www.usenix.org/conference/srecon26emea/call-for-participation

    As always, you can check out the Resilience in Software Foundation at resilienceinsoftware.org
  • The 2025 DORA Report w/special guest Fred Hebert

    12/03/2026 | 59 mins.
    You can find the 2025 DORA Report here: https://dora.dev/research/2025/dora-report/

    Read more of Fred’s work/opinions here: https://ferd.ca/

    If you want to know more about Lund’s Human Factors and Systems Safety program, you can read here: https://www.humanfactors.lth.se/

    DORA has some good writeups of generative leadership and Westrum’s model here: https://dora.dev/capabilities/generative-organizational-culture/

    We can reset the counter, it’s been 0 episodes since we mentioned Lorin’s Law: https://surfingcomplexity.blog/2017/06/24/a-conjecture-on-why-reliable-systems-fail/

    Fred writes well about the Law of Stretched Systems: https://ferd.ca/the-law-of-stretched-cognitive-systems.html

    We’re still trying to schedule a DORA event with our friends who make the report, but keep an eye out on https://resilienceinsoftware.org/events - it will pop up there when we do!
  • Building and Revising Adaptive Capacity Sharing for Technical Incident Response with Beth Adele Long

    26/02/2026 | 1h 9 mins.
    The Keweenaw snow gauge that Colette mentioned is a tourist attraction. If you want to see where this season’s measurements stand, you can find them here: https://www.pasty.com/snow/

    The paper we’re talking about today can be found here: https://www.sciencedirect.com/science/article/abs/pii/S0003687020301903

    If you want to know more about SNAFU Catchers, you can see their website here: https://www.snafucatchers.com/

    They produced the STELLA report: https://snafucatchers.github.io/

    Richard Cook’s Bone Talk is kind of famous - here’s a version from REDeploy: https://www.youtube.com/watch?v=8LbePBiOvZ4

    Some writing from New Relic about NERFs: https://newrelic.com/blog/observability/best-practices-incident-commander-training

    We failed to mention it in the podcast itself, but Michael Wettick did a great thesis at Lund on asking for help in software operations incidents: https://lup.lub.lu.se/luur/download?func=downloadFile&recordOId=9150096&fileOId=9150099

    Speaking of Hitchhiker’s Guide, Etsy has some cool merch: https://www.etsy.com/listing/1071043200/dont-panic-hitchhikers-guide-to-the

    You can find David Woods’ paper on Graceful Extensibility here: https://link.springer.com/article/10.1007/s10669-018-9708-3

    You can sign up for our Paper Club event on this paper (March 17th) here: https://resilienceinsoftware.org/events/164680
  • Outsourcing and Resilience

    12/02/2026 | 41 mins.
    Colette mentioned Menlo Innovations https://menloinnovations.com/ and Atomic Object https://atomicobject.com/ who both build custom software for folks. The CEO of Menlo is Richard Sheridan, who wrote Joy, Inc.: https://bookshop.org/p/books/joy-inc-how-we-built-a-workplace-people-love-richard-sheridan/7677689?ean=9781591847120&next=t

    Chad Todd’s thesis on Handovers in Software Operations is worth a read: https://lup.lub.lu.se/luur/download?func=downloadFile&recordOId=9076274&fileOId=9076276

    Clint refers to Zingerman’s and their servant leadership model, one of Colette’s favorite places to learn about leadership from. If you want to know more, go to https://www.zingtrain.com/ and in particular, read https://shop.zingtrain.com/products/a-lapsed-anarchists-approach-to-being-a-better-leader

About This is Fine! A podcast about resilience engineering and software

A podcast about resilience engineering and software. Ever wondered why things on the internet break? Do you work in software and wish you could have a Dear-Abby-like call-in show that could answer your deepest questions about how to make your workplace suck less? We're here to help! Write to us anonymously via our open question form, email us at [email protected], or call us and leave a voicemail (or text us) at (401) 592-7574.