AXRP - the AI X-risk Research Podcast

Daniel Filan
60 episodes

Latest episodes

  • 47 - David Rein on METR Time Horizons

    02/1/2026 | 1h 47 mins.
    When METR says something like "Claude Opus 4.5 has a 50% time horizon of 4 hours and 50 minutes", what does that mean? In this episode, David Rein, METR researcher and co-author of the paper "Measuring AI Ability to Complete Long Tasks", talks about METR's work on measuring time horizons, the methodology behind those numbers, and what work remains to be done in this domain.
    Patreon: https://www.patreon.com/axrpodcast
    Ko-fi: https://ko-fi.com/axrpodcast
    Transcript: https://axrp.net/episode/2026/01/03/episode-47-david-rein-metr-time-horizons.html
     
    Topics we discuss, and timestamps:
    0:00:32 Measuring AI Ability to Complete Long Tasks
    0:10:54 The meaning of "task length"
    0:19:27 Examples of intermediate and hard tasks
    0:25:12 Why the software engineering focus
    0:32:17 Why task length as difficulty measure
    0:46:32 Is AI progress going superexponential?
    0:50:58 Is AI progress due to increased cost to run models?
    0:54:45 Why METR measures model capabilities
    1:04:10 How time horizons relate to recursive self-improvement
    1:12:58 Cost of estimating time horizons
    1:16:23 Task realism vs mimicking important task features
    1:19:50 Excursus on "Inventing Temperature"
    1:25:46 Return to task realism discussion
    1:33:53 Open questions on time horizons
     
    Links for METR:
    Main website: https://metr.org/
    X/Twitter account: https://x.com/METR_Evals/
     
    Research we discuss:
    Measuring AI Ability to Complete Long Tasks: https://arxiv.org/abs/2503.14499
    RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts: https://arxiv.org/abs/2411.15114
    HCAST: Human-Calibrated Autonomy Software Tasks: https://arxiv.org/abs/2503.17354
    Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity: https://arxiv.org/abs/2507.09089
    Anthropic Economic Index: Tracking AI's role in the US and global economy: https://www.anthropic.com/research/anthropic-economic-index-september-2025-report
    Bridging RL Theory and Practice with the Effective Horizon (i.e. the Cassidy Laidlaw paper): https://arxiv.org/abs/2304.09853
    How Does Time Horizon Vary Across Domains?: https://metr.org/blog/2025-07-14-how-does-time-horizon-vary-across-domains/
    Inventing Temperature: https://global.oup.com/academic/product/inventing-temperature-9780195337389
    Is there a Half-Life for the Success Rates of AI Agents? (by Toby Ord): https://www.tobyord.com/writing/half-life
    Lawrence Chan's response to the above: https://nitter.net/justanotherlaw/status/1920254586771710009
    AI Task Length Horizons in Offensive Cybersecurity: https://sean-peters-au.github.io/2025/07/02/ai-task-length-horizons-in-offensive-cybersecurity.html
     
    Episode art by Hamish Doodles: hamishdoodles.com
  • 46 - Tom Davidson on AI-enabled Coups

    07/8/2025 | 2h 5 mins.
    Could AI enable a small group to gain power over a large country, and lock in their power permanently? People worried about catastrophic risks from AI have often focused on misalignment. In this episode, Tom Davidson talks about a risk that could be comparably important: that of AI-enabled coups.
    Patreon: https://www.patreon.com/axrpodcast
    Ko-fi: https://ko-fi.com/axrpodcast
    Transcript: https://axrp.net/episode/2025/08/07/episode-46-tom-davidson-ai-enabled-coups.html
     
    Topics we discuss, and timestamps:
    0:00:35 How to stage a coup without AI
    0:16:17 Why AI might enable coups
    0:33:29 How bad AI-enabled coups are
    0:37:28 Executive coups with singularly loyal AIs
    0:48:35 Executive coups with exclusive access to AI
    0:54:41 Corporate AI-enabled coups
    0:57:56 Secret loyalty and misalignment in corporate coups
    1:11:39 Likelihood of different types of AI-enabled coups
    1:25:52 How to prevent AI-enabled coups
    1:33:43 Downsides of AIs loyal to the law
    1:41:06 Cultural shifts vs individual action
    1:45:53 Technical research to prevent AI-enabled coups
    1:51:40 Non-technical research to prevent AI-enabled coups
    1:58:17 Forethought
    2:03:03 Following Tom's and Forethought's research
     
    Links for Tom and Forethought:
    Tom on X / Twitter: https://x.com/tomdavidsonx
    Tom on LessWrong: https://www.lesswrong.com/users/tom-davidson-1
    Forethought Substack: https://newsletter.forethought.org/
    Will MacAskill on X / Twitter: https://x.com/willmacaskill
    Will MacAskill on LessWrong: https://www.lesswrong.com/users/wdmacaskill
     
    Research we discuss:
    AI-Enabled Coups: How a Small Group Could Use AI to Seize Power: https://www.forethought.org/research/ai-enabled-coups-how-a-small-group-could-use-ai-to-seize-power
    Seizing Power: The Strategic Logic of Military Coups, by Naunihal Singh: https://muse.jhu.edu/book/31450
    Experiment using AI-generated posts on Reddit draws fire for ethics concerns: https://retractionwatch.com/2025/04/28/experiment-using-ai-generated-posts-on-reddit-draws-fire-for-ethics-concerns/
     
    Episode art by Hamish Doodles: hamishdoodles.com
  • 45 - Samuel Albanie on DeepMind's AGI Safety Approach

    06/7/2025 | 1h 15 mins.
    In this episode, I chat with Samuel Albanie about the Google DeepMind paper he co-authored called "An Approach to Technical AGI Safety and Security". It covers the assumptions made by the approach, as well as the types of mitigations it outlines.
    Patreon: https://www.patreon.com/axrpodcast
    Ko-fi: https://ko-fi.com/axrpodcast
    Transcript: https://axrp.net/episode/2025/07/06/episode-45-samuel-albanie-deepminds-agi-safety-approach.html
     
    Topics we discuss, and timestamps:
    0:00:37 DeepMind's Approach to Technical AGI Safety and Security
    0:04:29 Current paradigm continuation
    0:19:13 No human ceiling
    0:21:22 Uncertain timelines
    0:23:36 Approximate continuity and the potential for accelerating capability improvement
    0:34:29 Misuse and misalignment
    0:39:34 Societal readiness
    0:43:58 Misuse mitigations
    0:52:57 Misalignment mitigations
    1:05:20 Samuel's thinking about technical AGI safety
    1:14:02 Following Samuel's work
     
    Samuel on Twitter/X: https://x.com/samuelalbanie
     
    Research we discuss:
    An Approach to Technical AGI Safety and Security: https://arxiv.org/abs/2504.01849
    Levels of AGI for Operationalizing Progress on the Path to AGI: https://arxiv.org/abs/2311.02462
    The Checklist: What Succeeding at AI Safety Will Involve: https://sleepinyourhat.github.io/checklist/
    Measuring AI Ability to Complete Long Tasks: https://arxiv.org/abs/2503.14499
     
    Episode art by Hamish Doodles: hamishdoodles.com
  • 44 - Peter Salib on AI Rights for Human Safety

    28/6/2025 | 3h 21 mins.
    In this episode, I talk with Peter Salib about his paper "AI Rights for Human Safety", arguing that giving AIs the right to contract, hold property, and sue people will reduce the risk of their trying to attack humanity and take over. He also tells me how law reviews work, in the face of my incredulity.
    Patreon: https://www.patreon.com/axrpodcast
    Ko-fi: https://ko-fi.com/axrpodcast
    Transcript: https://axrp.net/episode/2025/06/28/episode-44-peter-salib-ai-rights-human-safety.html
     
    Topics we discuss, and timestamps:
    0:00:40 Why AI rights
    0:18:34 Why not reputation
    0:27:10 Do AI rights lead to AI war?
    0:36:42 Scope for human-AI trade
    0:44:25 Concerns with comparative advantage
    0:53:42 Proxy AI wars
    0:57:56 Can companies profitably make AIs with rights?
    1:09:43 Can we have AI rights and AI safety measures?
    1:24:31 Liability for AIs with rights
    1:38:29 Which AIs get rights?
    1:43:36 AI rights and stochastic gradient descent
    1:54:54 Individuating "AIs"
    2:03:28 Social institutions for AI safety
    2:08:20 Outer misalignment and trading with AIs
    2:15:27 Why statutes of limitations should exist
    2:18:39 Starting AI x-risk research in legal academia
    2:24:18 How law reviews and AI conferences work
    2:41:49 More on Peter moving to AI x-risk research
    2:45:37 Reception of the paper
    2:53:24 What publishing in law reviews does
    3:04:48 Which parts of legal academia focus on AI
    3:18:03 Following Peter's research
     
    Links for Peter:
    Personal website: https://www.peternsalib.com/
    Writings at Lawfare: https://www.lawfaremedia.org/contributors/psalib
    CLAIR: https://clair-ai.org/
     
    Research we discuss:
    AI Rights for Human Safety: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4913167
    Will humans and AIs go to war?: https://philpapers.org/rec/GOLWAA
    Infrastructure for AI agents: https://arxiv.org/abs/2501.10114
    Governing AI Agents: https://arxiv.org/abs/2501.07913
     
    Episode art by Hamish Doodles: hamishdoodles.com
  • 43 - David Lindner on Myopic Optimization with Non-myopic Approval

    15/6/2025 | 1h 40 mins.
    In this episode, I talk with David Lindner about Myopic Optimization with Non-myopic Approval, or MONA, which attempts to address (multi-step) reward hacking by myopically optimizing actions against a human's sense of whether those actions are generally good. Does this work? Can we get smarter-than-human AI this way? How does this compare to approaches like conservatism? Listen to find out.
    Patreon: https://www.patreon.com/axrpodcast
    Ko-fi: https://ko-fi.com/axrpodcast
    Transcript: https://axrp.net/episode/2025/06/15/episode-43-david-lindner-mona.html
     
    Topics we discuss, and timestamps:
    0:00:29 What MONA is
    0:06:33 How MONA deals with reward hacking
    0:23:15 Failure cases for MONA
    0:36:25 MONA's capability
    0:55:40 MONA vs other approaches
    1:05:03 Follow-up work
    1:10:17 Other MONA test cases
    1:33:47 When increasing time horizon doesn't increase capability
    1:39:04 Following David's research
     
    Links for David:
    Website: https://www.davidlindner.me
    Twitter / X: https://x.com/davlindner
    DeepMind Medium: https://deepmindsafetyresearch.medium.com
    David on the Alignment Forum: https://www.alignmentforum.org/users/david-lindner
     
    Research we discuss:
    MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking: https://arxiv.org/abs/2501.13011
    Arguments Against Myopic Training: https://www.alignmentforum.org/posts/GqxuDtZvfgL2bEQ5v/arguments-against-myopic-training
     
    Episode art by Hamish Doodles: hamishdoodles.com

About AXRP - the AI X-risk Research Podcast

AXRP (pronounced axe-urp) is the AI X-risk Research Podcast where I, Daniel Filan, have conversations with researchers about their papers. We discuss the paper, and hopefully get a sense of why it's been written and how it might reduce the risk of AI causing an existential catastrophe: that is, permanently and drastically curtailing humanity's future potential. You can visit the website and read transcripts at axrp.net.