

Incident Status: On Hold w/special guest Will Gallego
28/11/2025 | 42 mins.
Mentioned multiple times, Em Ruppe’s amazing talk on incident severity: https://www.usenix.org/conference/srecon24americas/presentation/ruppe
We talk about the RIS Slack sometimes - you can join us in the Slack by joining the Foundation here: https://resilienceinsoftware.org/
Please ask us a question at thisisfinepod.com

Complex Systems and the Messy Nine w/special guests Dave Woods and John Allspaw
13/11/2025 | 1h 8 mins.
The writeup on the AWS outage from AWS themselves, if you haven’t seen it: https://aws.amazon.com/message/101925/
Dave’s department at OSU, Cognitive Systems Engineering: https://ise.osu.edu/human-systems-integration/cognitive-systems-engineering is a part of the larger Integrated Systems Engineering school: https://ise.osu.edu/human-systems-integration
Dave was talking early on about the discussion on the war on expertise; it was this webinar through the NDM association: https://vimeo.com/1129606494?fl=pl&fe=sh&mc_cid=c807a504fb
Dave was a part of the
Paul Feltovich got a shout out - he wrote a lot, but one of the best is with Gary Klein on Common Ground and Coordination in Joint Activity: https://www.academia.edu/download/31764257/Common_Ground_Single.pdf
And Studies of Expertise from Psychological Perspectives: https://www.researchgate.net/profile/Paul-J-Feltovich/publication/200772882_Studies_of_expertise_from_psychological_perspectives/links/58bd18b2aca27261e528de07/Studies-of-Expertise-from-Psychological-Perspectives.pdf
Dave mentions his “Command-Adapt Paradox chapter” - you can find that here: https://library.oapen.org/bitstream/handle/20.500.12657/88327/1/978-3-031-45055-6.pdf#page=77
Shout out to Norbert Wiener, the godfather of cybernetics: https://www.jstor.org/stable/24945913
For just two studies on how private equity in hospitals causes worse outcomes for patients you can see: https://hsph.harvard.edu/news/private-equitys-appetite-for-hospitals-may-put-patients-at-risk/ and https://www.sciencedirect.com/science/article/pii/S0304405X25001151
Dave talks a bit about saturation and crossing boundaries towards failure - it’s worth familiarizing yourself with Rasmussen’s boundary model - Lorin Hochstein writes a good summary over at his blog: https://surfingcomplexity.blog/2021/05/31/transgressing-the-boundaries-rasmussen-and-woods/
Dave also mentions graceful extensibility - this is a concept he’s written quite a bit about, you can start here: https://link.springer.com/article/10.1007/s10669-018-9708-3
Shout out to Slight Reliability: https://slightreliability.com/
One of the great Woods/Cook write ups on anticipation in anesthesiology: https://www.sciencedirect.com/science/article/pii/S0952818096900094
In case you’re unfamiliar with the Chicago Seven: https://en.wikipedia.org/wiki/Chicago_Seven
The Messy 9 are: congestion, cascade, conflict, lag, saturation, friction, tempo, surprise, tangles
Keep an eye on the merch store over at https://www.bonfire.com/store/risf/ if you want the t-shirt.

All the things about Incident Command
30/10/2025 | 37 mins.
It’s Spamton G (not J) Spamton, Clint! Get hip to the game characters! https://deltarune.fandom.com/wiki/Spamton
There are a couple of incident command trainers out there who tend to get recommended in the tech world (that we know of): https://www.blackrock3.com/ and Great Circle: https://greatcircle.com/im/

Root Cause Analysis vs. Resilience Engineering w/special guest Lorin Hochstein
16/10/2025 | 59 mins.
A history of the 5 whys and root cause analysis from papers
Some critiques of the 5 whys:
From John Allspaw: https://www.oreilly.com/radar/the-infinite-hows/
From Alan J Card: https://qualitysafety.bmj.com/content/26/8/671
James Reason and the Swiss Cheese Model: https://pmc.ncbi.nlm.nih.gov/articles/PMC8514562/
James Reason’s book Human Error: https://bookshop.org/p/books/human-error/9e06d8a100a07537?ean=9780521314190&next=t
And a classic from Sidney Dekker (et al.) on the implication of complexity within safety investigations: https://www.sciencedirect.com/science/article/abs/pii/S0925753511000105?via%3Dihub
We always recommend the Howie Guide: https://howie-guide.pagerduty.com/
STAMP is starting to get popular: https://functionalsafetyengineer.com/introduction-to-stamp/
Google’s STAMP paper: https://www.usenix.org/publications/loginonline/evolution-sre-google
Google’s STAMP discussion on ProdCast: https://sre.google/prodcast/#season4-episode7
And presentation at SRECon: https://www.usenix.org/conference/srecon25americas/presentation/klein
Nancy Leveson’s Google Scholar is always worth browsing: https://scholar.google.com/citations?user=78y4sEcAAAAJ&hl=en
Allspaw’s LinkedIn post that we quoted: https://www.linkedin.com/posts/jallspaw_important-reminders-about-learning-effectively-activity-7378775591447183360-c_eD
Lorin’s Law: https://surfingcomplexity.blog/2017/06/24/a-conjecture-on-why-reliable-systems-fail/
Want to talk more about this subject? We’re doing a live event co-sponsored by RISF and you can sign up for it here: https://resilienceinsoftware.org/networks/events/146485

First Stories/Second Stories
02/10/2025 | 52 mins.
More robustness than resilience, but worth repeating that you should always check your earthquake go-bag: https://www.earthquakeauthority.com/blog/2019/how-to-make-an-earthquake-emergency-kit
Clint did ASA 103: https://americansailing.com/learn-to-sail/certifications/asa-103-coastal-cruising/
Since this is a science podcast, there is a scientific reason people get emotional on airplanes: https://www.cntraveler.com/story/why-do-we-always-cry-on-planes
52 Hertz Whale documentary: https://en.wikipedia.org/wiki/The_Loneliest_Whale:_The_Search_for_52
And Leslie Jamison wrote 52 Blue as a chapter in one of her essay collections (you can read it excerpted here: https://slate.com/technology/2014/08/52-blue-the-loneliest-whale-in-the-world.html )
Colette was wrong: Jamison referenced a famous Kathryn Schulz piece in one of her own essays, which was the source of confusion - The Really Big One: https://www.newyorker.com/magazine/2015/07/20/the-really-big-one about a cataclysmic earthquake on the west coast. In case you’re curious, Colette uses scholar.google.com and paperpile.com shamelessly live.
We reference A Tale of Two Stories: Contrasting Views of Patient Safety by Richard Cook and Dave Woods: https://www.researchgate.net/publication/245102691_A_Tale_of_Two_Stories_Contrasting_Views_of_Patient_Safety?enrichId=rgreq-a699511fb5bc518bf1584a0a6613d8d0-XXX&enrichSource=Y292ZXJQYWdlOzI0NTEwMjY5MTtBUzoyMDYyMjM2NjExMTMzNDdAMTQyNjE3ODk2MDQ4NA%3D%3D&el=1_x_2&_esc=publicationCoverPdf
The Beaumaiden report (that dives into a deeper, second story) is here: https://dmaib.com/reports/2021/beaumaiden-grounding-on-18-october-2021
We will continue to point to DORA’s organizational model page: https://dora.dev/capabilities/generative-organizational-culture/
Some Wikipedia on double loop learning: https://en.wikipedia.org/wiki/Double-loop_learning
Colette mentioned Mads Møller’s Lund HFSS thesis on deaths and accountability: https://lup.lub.lu.se/student-papers/search/publication/9106422
And Bram Couteaux’s Lund HFSS thesis on the drunk flight attendants/pilots court: https://lup.lub.lu.se/student-papers/search/publication/9111661
J Paul Reed wrote about being ‘Blame Aware’ - https://medium.com/@jpaulreed/why-blameless-postmortems-might-feel-wrong-cbeee00d51b2



This is Fine! A podcast about resilience engineering and software