LessWrong posts by zvi

481 episodes

  • LessWrong posts by zvi

    “Claude Code, Codex and Agentic Coding #7: Auto Mode” by Zvi

    15/04/2026 | 26 mins.
    As we all try to figure out what Mythos means for us down the line, the world of practical agentic coding continues, with the latest array of upgrades.

    The biggest change, which I’m finally covering, is Auto Mode. Auto Mode is the famously requested kinda-dangerously-skip-some-permissions, where the system keeps an eye on all the commands to ensure human approval for anything too dangerous. It is not entirely safe, but it is a lot safer than --dangerously-skip-permissions, and previously a lot of people were just clicking yes to requests mostly without thinking, which isn’t safe either.

    Table of Contents


    Huh, Upgrades.

    On Your Marks.

    Lazy Cheaters.

    It's All Routine.

    Declawing.

    Free Claw.

    Take It To The Limit.

    Turn On Auto The Pilot.

    I’ll Allow It.

    Threat Model.

    The Classifier Is The Hard Part.

    Acceptable Risks.

    Manage The Agents.

    Introducing.

    Skilling Up.

    What Happened To My Tokens?

    Coding Agents Offer Mundane Utility.

    Huh, Upgrades

    Claude Code Desktop gets a redesign for parallel agents, with a new sidebar for managing multiple sessions, a drag-and-drop layout for arranging your [...]
    ---
    Outline:
    (00:48) Huh, Upgrades
    (02:46) On Your Marks
    (04:21) Lazy Cheaters
    (06:11) It's All Routine
    (06:52) Declawing
    (09:03) Free Claw
    (09:31) Take It To The Limit
    (13:54) Turn On Auto The Pilot
    (15:55) I'll Allow It
    (16:26) Threat Model
    (17:10) The Classifier Is The Hard Part
    (18:34) Acceptable Risks
    (19:54) Manage The Agents
    (22:34) Introducing
    (22:44) Skilling Up
    (25:27) What Happened To My Tokens?
    (25:43) Coding Agents Offer Mundane Utility
    ---

    First published:

    April 15th, 2026


    Source:

    https://www.lesswrong.com/posts/w8misLX7KCmLxJM2K/claude-code-codex-and-agentic-coding-7-auto-mode

    ---

    Narrated by TYPE III AUDIO.

  • LessWrong posts by zvi

    “Claude Mythos #3: Capabilities and Additions” by Zvi

    14/04/2026 | 32 mins.
    To round out coverage of Mythos, today covers capabilities other than cyber, and anything else additional not covered by the first two posts, including new reactions and details.

    Post one covered the model card, post two covered cybersecurity.

    There really is a lot to get through.

    Understanding AI had an additional writeup of Project Glasswing I missed last time. I liked the metaphor of Opus as a butter knife and Mythos as a steak knife. Yes, technically you can do it all with the butter knife, but you won’t.

    As Dan Schwarz reminds us, not only does AI 2027 roughly have the timeline right and a bunch of the numbers lining up, the details so far are remarkably close.

    JPM's Michael Cembalest was not based on JPMorgan's participation, only on public information.

    The White House is racing to deal with the situation, head off potential threats and pretend it has everything under control. They were warned, but refused to believe. The good news is that key people believe it now, and it seems all the major players are cooperating on this.

    My overall take is that Mythos is not a trend break [...]
    ---
    Outline:
    (01:52) Epoch Capabilities Index (ECI) (Model Card 2.3.6)
    (04:29) What Do You Mean Verbalized Evaluation Awareness Is Going Down
    (05:19) Capabilities (Model Card Section 6)
    (07:33) Agentic Safety Benchmarks (8.3)
    (09:00) Is Mythos AGI?
    (10:09) Are AI Companies Using Warnings As Hype?
    (11:04) Impressions (Model Card Section 7)
    (14:11) Blatant Denials Are The Best Kind
    (15:12) Prompt Injection Robustness
    (16:07) Does Mythos Cross The New Knowledge Threshold?
    (17:01) Is Mythos Surprising or Discontinuous?
    (20:57) UK AISI Tests Claude Mythos On Cybersecurity
    (22:08) Everything Reinforces My Existing Predictions And Policy Preferences
    (27:24) Solve For The Equilibrium
    (28:46) Does Not Compute
    (29:47) Conclusion: How To Think About Mythos
    ---

    First published:

    April 14th, 2026


    Source:

    https://www.lesswrong.com/posts/2ziYGFK7QmbbLgBoP/claude-mythos-3-capabilities-and-additions

    ---

    Narrated by TYPE III AUDIO.

  • LessWrong posts by zvi

    “Political Violence Is Never Acceptable” by Zvi

    13/04/2026 | 36 mins.
    Nor is the threat or implication of violence. Period. Ever. No exceptions.

    It is completely unacceptable. I condemn it in the strongest possible terms.

    It is immoral, and also it is ineffective. It would be immoral even if it were effective. Nothing hurts your cause more.

    Do not do this, and do not tolerate anyone who does.

    The reason I need to say this now is that there has been at least one attempt at violence, and potentially two in quick succession, against OpenAI CEO Sam Altman.

    My sympathies go out to him and I hope he is doing as okay as one could hope for.

    Awful Events Amid Scary Times

    Max Zeff: NEW: A suspect was arrested on Friday morning for allegedly throwing a Molotov cocktail at OpenAI CEO Sam Altman's home. A person matching the suspect's description was later seen making threats outside of OpenAI's corporate HQ.

    Nathan Calvin: This is beyond disturbing and awful. Whatever disagreements you have with Sam or OpenAI, this cannot be normalized or justified in any way. Everyone deserves to be able to be safe with their families at home. I feel ill and [...]
    ---
    Outline:
    (00:51) Awful Events Amid Scary Times
    (04:51) Most Of Those Worried About AI Do As Well As One Can On This
    (06:54) Some Who Are Worried About AI Need To Address Their Rhetoric
    (11:49) Speak The Truth Even If Your Voice Trembles
    (14:02) False Accusations And False Attacks Are Also Unacceptable
    (15:35) Some Examples Of Attempts To Create Broad Censorship
    (24:53) The Most Irresponsible Reaction Was From The Press
    (25:50) Sam Altman Reacts
    (28:21) Sam Altman Reflects
    (33:40) Violence Is Never The Answer
    ---

    First published:

    April 13th, 2026


    Source:

    https://www.lesswrong.com/posts/dsaEB4u2dxp9BdhdS/political-violence-is-never-acceptable

    ---

    Narrated by TYPE III AUDIO.
  • LessWrong posts by zvi

    “Claude Mythos #2: Cybersecurity and Project Glasswing” by Zvi

    10/04/2026 | 1h 9 mins.
    Anthropic is not going to release its new most capable model, Claude Mythos, to the public any time soon. Its cyber capabilities are too dangerous to make broadly available until our most important software is in a much stronger state, and there are no plans to release Mythos widely.

    They are instead going to do a limited release to key cybersecurity partners, in order to use it to patch as many vulnerabilities as possible in our most important software.

    Yes, this is really happening. Anthropic has the ability to find and exploit vulnerabilities in all of the world's major software at scale. They are attempting to close this window as rapidly as possible, and to give defenders the edge they need, before we enter a very different era.

    Yes, this was necessary, and I am very happy that, given the capabilities involved exist, things are playing out the way that they are. All alternatives were vastly worse.

    We are entering a new era. It will start with a scramble to secure our key systems.

    Yesterday I covered the model card for Mythos. Today is about cybersecurity.

    The New York Times reported on this [...]
    ---
    Outline:
    (02:08) Introducing Project Glasswing
    (03:31) Don't Worry About the Government
    (05:02) Cybersecurity Capabilities In The Model Card (Section 3)
    (06:41) Cyber Capability Tests In The Model Card
    (08:11) The Proof Is In The Patching
    (10:28) Go For Read Team
    (14:04) Is This New?
    (16:38) Thanks For The Memories
    (21:21) How Good Is Mythos At This?
    (24:24) What Might Have Been
    (27:09) The Chaos Option
    (30:15) The Can't Happen That Happened
    (31:23) When You Go Looking For Specific, And You Are Told Exactly Where and How To Look For It, Your Chances Of Finding It Are Very Good
    (36:55) Blatant Denials Are The Best Kind
    (40:48) Anything You Can Do I Can Do Cheaper
    (43:14) Theft Of Mythos Would Be A Big Deal
    (43:43) No One Could Have Predicted This
    (44:34) The Revolution Will Not Be Televised
    (45:33) The Intelligence Will Not Be Televised
    (47:43) Will We Be Doing This For A While?
    (49:53) What If OpenAI Gets a Similar Model?
    (51:17) Use It Or Lose It
    (51:59) Solve For The Equilibrium
    (55:09) Patriots and Tyrants
    (57:26) Trust The Mythos
    (59:03) Wide Scale Ability To Exploit Software Favors Strongest Projects
    (01:03:58) Looking Back at GPT-2
    (01:05:18) Limitless Demand For Compute
    (01:07:07) Oh, Also, If Anyone Builds It, Everyone Dies
    ---

    First published:

    April 10th, 2026


    Source:

    https://www.lesswrong.com/posts/GEgNYn5myreQRHggQ/claude-mythos-2-cybersecurity-and-project-glasswing

    ---

    Narrated by TYPE III AUDIO.

  • LessWrong posts by zvi

    “Claude Mythos: The System Card” by Zvi

    09/04/2026 | 1h 46 mins.
    Claude Mythos is different.

    This is the first model other than GPT-2 that is at first not being released for public use at all.

    With GPT-2 the delay was due to a general precautionary principle. OpenAI did not know what they had, or what effect on-demand text would have on various systems. It sounds funny now, since GPT-2 was harmless, but at the time the concern was highly reasonable.

    The decision not to release Claude Mythos is not about an amorphous fear. If given to anyone with a credit card, Claude Mythos would give attackers a cornucopia of zero-day exploits for essentially all the software on Earth, including every major operating system and browser. It would be chaos.

    Or, in theory, if Anthropic had chosen to do so, it could have used those exploits. Great power was on offer, and that power was refused. This does not happen often.

    Instead Anthropic has created Project Glasswing. Mythos is being given only to cybersecurity firms, so they can patch the world's most important software. Based on how that goes, we can then decide if and when it will become reasonable to give access to a broader [...]
    ---
    Outline:
    (03:24) Mundane Alignment Is Excellent
    (05:01) Would This Process Be Sufficient To Find A Dangerous Model?
    (06:27) Introductory Warning About Superficial Mundane Alignment
    (15:12) Model Training (1.1)
    (15:25) Release Decision Process (1.2)
    (17:50) RSP Evaluations (2.1 and 2.2)
    (22:17) Autonomy Evaluations (2.3)
    (25:56) The Alignment Risk Update Document
    (26:39) The Threat Model
    (29:18) Misalignment As Failure Mode
    (31:35) Wouldn't You Know?
    (33:40) Don't Encourage Your Model
    (35:14) Beware Goodhart's Law
    (37:18) Beware The Most Forbidden Technique (5.2.3)
    (41:44) Asking The Right Questions
    (43:11) Model Organism Tests
    (45:01) Model Weight Security (Risk Report 5.5.2.1)
    (45:31) Reward Hacking (Back to The Model Card)
    (45:56) Remote Drop-In Worker Coming Soon
    (49:01) External Testing (2.3.7)
    (49:37) Cyber Insecurity General Principle Interlude
    (50:46) Alignment (4)
    (56:38) Risk In The Room
    (57:56) Mythos Meant Well
    (01:00:20) Risk Not In The Room
    (01:02:05) Alignment Testing Overview
    (01:05:20) Internal Deployment Testing Process
    (01:07:55) Reports From Pilot Use (4.2.1)
    (01:08:30) Reports From Automated Testing (4.2)
    (01:10:13) Other External Testing
    (01:10:56) Just The Facts, Sir
    (01:13:05) Refusing Safety Research
    (01:14:12) Claude Favoritism
    (01:15:19) Ruling Out Encoded Thinking (4.4.1)
    (01:18:41) Sandbagging (4.4.2)
    (01:21:27) Capability for Evasion of Safeguards (4.4.3)
    (01:23:04) Pick A Random Number (4.4.3.4)
    (01:25:49) White Box Analysis (4.5)
    (01:30:30) Model Welfare (5)
    (01:31:32) Key Model Welfare Findings (5.1.2)
    (01:41:17) Is Mythos Okay?
    (01:43:52) Self-Play
    (01:45:30) A Few Fun Facts
    ---

    First published:

    April 9th, 2026


    Source:

    https://www.lesswrong.com/posts/EDQhwLTyTnNmaxRGq/claude-mythos-the-system-card

    ---

    Narrated by TYPE III AUDIO.



About LessWrong posts by zvi

Audio narrations of LessWrong posts by zvi