This is Fine! A podcast about resilience engineering and software

Available Episodes

Lund University - Academic Theory and Practice
A huge thanks to our panelists: John Allspaw, Jed Needle, and Chad Todd.

RISF and TiF will host a live follow-up to this episode on July 31st! You can sign up here: https://resilienceinsoftware.org/networks/events/133948

If you're interested in Lund's Master of Science program in Human Factors and Systems Safety, or any of their learning labs, you can find more info here: https://www.humanfactors.lth.se/

Adaptive Capacity Labs is how Jed was introduced to some of the concepts of LFI and Resilience Engineering, which eventually landed him at Lund.

John mentioned SciShow Tangents, a podcast by Hank Green and Ceri Riley: https://www.youtube.com/c/scishowtangents
As well as Conway's Law: https://en.wikipedia.org/wiki/Conway%27s_law
And Dunbar's Number: https://en.wikipedia.org/wiki/Dunbar%27s_number
And the Theory of Graceful Extensibility, which you can read about here: https://infoscience.epfl.ch/server/api/core/bitstreams/87cfe245-c138-43cb-87c9-4062dc1a0519/content

Lund theses list: https://www.humanfactors.lth.se/ny-sajt/msc-programme/msc-theses/

Our panel's select theses that they love:
Colette's pick: https://lup.lub.lu.se/student-papers/search/publication/9106422
Chad's pick: https://lup.lub.lu.se/student-papers/search/publication/9009930

John's picks were all of the software theses. I'm probably missing some, but this is my attempt:
John's (the first): https://lup.lub.lu.se/student-papers/search/publication/8084520
J. Paul Reed: https://lup.lub.lu.se/student-papers/search/publication/8966930
Chad's thesis on handovers in software: https://lup.lub.lu.se/student-papers/search/publication/9076274
Michael Wettick: https://lup.lub.lu.se/student-papers/search/publication/9150096
Colette's thesis on QRA: https://lup.lub.lu.se/student-papers/search/publication/9148570
Jessica De Vita: https://lup.lub.lu.se/student-papers/search/publication/9149521
Dr. Raymer's "I want to Treat the Patient and Not the Alarm": https://lup.lub.lu.se/student-papers/search/publication/2861164
--------
1:04:32
--------
What’s the ROI on Reliability and Resilience work?
Dave Woods's talk at SREcon25 was on Complexification and SRE: https://www.youtube.com/watch?v=lmBvUJnGUX4

Jens Rasmussen's model is really well explained by Richard Cook's talk at Velocity: https://www.youtube.com/watch?v=PGLYEDpNu60&t=3s
Lorin's blog also has a good summary: https://surfingcomplexity.blog/2021/05/31/transgressing-the-boundaries-rasmussen-and-woods/
And finally, Jens Rasmussen's original paper on the subject, Risk Management in a Dynamic Society: https://linkinghub.elsevier.com/retrieve/pii/S0925753597000520

The SREcon25 talk on Incident Metrics that Matter was awesome: https://www.youtube.com/watch?v=QrR2SvpWvdg

Want to read about how things are getting a bit fash-y in tech these days?
https://www.newyorker.com/culture/infinite-scroll/techno-fascism-comes-to-america-elon-musk
https://www.theguardian.com/technology/ng-interactive/2025/jan/29/silicon-valley-rightwing-technofascism

Perrow/Normal Accidents: https://bookshop.org/p/books/normal-accidents-living-with-high-risk-technologies-updated-edition-revised-charles-perrow/10369279?ean=9780691004129

High Reliability Organizations (HROs):
Started (ish) with "A Rejoinder to Perrow": https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1468-5973.1994.tb00047.x
And you can find Rochlin and La Porte behind a lot of the early writing on HROs, including https://www.jstor.org/stable/44637690?seq=1 and https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1468-5973.1996.tb00078.x
As well as Weick and Sutcliffe: https://bookshop.org/p/books/managing-the-unexpected-sustained-performance-in-a-complex-world-kathleen-m-sutcliffe/11267666?ean=9781118862414 and https://journals.sagepub.com/doi/10.2307/41165243
--------
58:03
--------
Runbooks: the Good, Bad and Ugly w/special guest Andrew Hatch
You can register for the After-the-Episode chat with Andrew at https://resilienceinsoftware.org/networks/events/129997
Tickets are free for members and $10 for non-members. You can join the Foundation at https://resilienceinsoftware.org/signup

Zuul is what Volvo uses for their CI. It's part of the OpenInfra Foundation, and it's rad.

You can find Andrew on LinkedIn here.
--------
53:37
--------
What is an incident? How come no one declares them?
Michael Wettick's Lund thesis is great, and Laura Maguire's paper on the Costs of Coordination, a shortened version of her dissertation, is worth a read!

Clint's SREcon talk that he mentioned a couple of times: https://www.youtube.com/watch?v=k4UaDDkLOhw

Lorin wrote a great article on incidents and improvisation: https://surfingcomplexity.blog/2023/06/11/when-theres-no-plan-for-this-scenario-youve-got-to-improvise/

Incident.io and the people who work there have hilarious LinkedIn posts about how people use incidents in their orgs.

We talked about BlackRock3, who do incident command training: https://www.blackrock3.com/

Brent Chapman has also done great incident command training and has given some talks on what IT incident management can learn from fire/emergency response management processes.

We have a LinkedIn! https://www.linkedin.com/company/this-is-fine-a-podcast-about-software-and-resilience-engineering/
And you can ask us questions here: https://forms.gle/rggrbGG6aFVrgZsv9
--------
55:08
--------
Chaos Engineering w/special guest Casey Rosenthal
The O'Reilly book on Chaos Engineering by Casey and Nora Jones is here: https://www.oreilly.com/library/view/chaos-engineering/9781492043850/

Some of the Netflix posts introducing Chaos Monkey and Simian Army are here and here.

You can see Lorin Hochstein talking about Chaos Engineering at Netflix here.

The Void is an awesome collection of information on incidents throughout tech, and you can find it here.

Casey mentioned Rasmussen's model. Lorin has a great summary of that on his blog, but you can read the original paper by Rasmussen introducing this model here.

A report on the Netflix outage during Christmas of 2012.

A reminder: you can ask us questions for the podcast at www.thisisfinepod.com
--------
48:13
--------

About This is Fine! A podcast about resilience engineering and software

A podcast about resilience engineering and software. Ever wondered why things on the internet break? Do you work in software and wish that you could have a Dear Abby-like call-in show that could answer your deepest questions about how to make your workplace suck less? We're here to help! Write us anonymously at our open question form. Email us at: [email protected]. Call us and leave a voicemail, or text us at: (401) 592-7574
