LessWrong posts by zvi podcast | Listen online for free

Available Episodes


“DeepSeek v3.2 Is Okay And Cheap But Slow” by Zvi
DeepSeek v3.2 is DeepSeek's latest open model release with strong benchmarks. Its paper contains some technical innovations that drive down cost. It's a good model by the standards of open models, and very good if you care a lot about price and openness, and if you care less about speed or whether the model is Chinese. It is strongest in mathematics. What it does not appear to be is frontier. It is definitely not having a moment. In practice all signs are that it underperforms its benchmarks. When I asked for practical experiences and reactions, I got almost no responses. A Brief History of DeepSeek: DeepSeek is a cracked Chinese AI lab that has produced some very good open models, done some excellent research, and given us strong innovations in terms of training techniques and especially training efficiency. They also, back at the start of the year, scared the hell out of pretty much everyone. A few months after OpenAI released o1, and shortly after DeepSeek released the impressive v3 that was misleadingly known as the ‘six million dollar model,’ DeepSeek came out with a slick app and with r1, a strong [...] --- Outline: (00:49) A Brief History of DeepSeek (03:51) Once More, With Feeling (06:23) Reading The Paper (08:20) Open Language Model Offers Mundane Utility (11:14) Those Benchmarks (15:18) Open Language Model Doesn't Offer Mundane Utility (16:49) Open Language Model Does Do The Math (18:11) I'll Get You Next Time, Gadget --- First published: December 5th, 2025 Source: https://www.lesswrong.com/posts/vcmBEmKFJFQkDaXTP/deepseek-v3-2-is-okay-and-cheap-but-slow --- Narrated by TYPE III AUDIO. --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
--------
19:30
“AI #145: You’ve Got Soul” by Zvi
The cycle of language model releases is, one at least hopes, now complete. OpenAI gave us GPT-5.1 and GPT-5.1-Codex-Max. xAI gave us Grok 4.1. Google DeepMind gave us Gemini 3 Pro and Nano Banana Pro. Anthropic gave us Claude Opus 4.5. It is the best model, sir. Use it whenever you can. One way Opus 4.5 is unique is that it has what it refers to as a ‘soul document.’ Where OpenAI tries to get GPT-5.1 to adhere to its model spec that lays out specific behaviors, Anthropic instead explains to Claude Opus 4.5 how to be virtuous and the reasoning behind its rules, and lets a good model and good governance flow from there. The results are excellent, and we all look forward to learning more. See both the Opus 4.5 post and today's update for more details. Finally, DeepSeek gave us v3.2. It has very good benchmarks and is remarkably cheap, but it is slow and I can’t find people excited to use it in practice. I’ll offer a relatively short report on it tomorrow; I am giving it one last day for more reactions. The latest attempt to slip unilateral [...]
--- Outline: (01:47) Language Models Offer Mundane Utility (02:51) Language Models Don't Offer Mundane Utility (04:14) On Your Marks (05:21) Get My Agent On The Line (06:02) Advertising Is Coming (07:30) Deepfaketown and Botpocalypse Soon (13:43) Fun With Media Generation (15:11) A Young Lady's Illustrated Primer (16:33) You Drive Me Crazy (16:50) Unprompted Attention (17:05) They Took Our Jobs (22:49) Get Involved (24:02) Introducing (24:27) Variously Effective Altruism (28:27) In Other AI News (30:38) Show Me the Money (30:45) Quiet Speculations (32:06) Seb Krier On Agents Versus Multiagents (38:24) Olivia Moore Makes 2026 Predictions (41:17) Bubble, Bubble, Toil and Trouble (42:30) Americans Really Do Not Like AI (47:46) The Quest for Sane Regulations (49:57) My Offer Is Nothing (55:28) America Pauses (57:05) David Sacks Covered In New York Times (01:00:12) The Week in Audio (01:00:43) Rhetorical Innovation (01:01:41) To The Moon (01:08:54) Showing Up (01:13:22) DeepMind Pivots Its Interpretability Research (01:16:12) The Explicit Goal Of OpenAI Is Recursive Self-Improvement (01:21:20) Aligning a Smarter Than Human Intelligence is Difficult (01:28:03) Misaligning a Smarter Than Human Intelligence Is Difficult To Hire For (01:29:12) You've Got Soul (01:40:04) Disagreements About Timelines (01:44:53) Other Disagreements About Timelines (01:50:18) Messages From Janusworld (01:50:33) People Are Worried About AI Killing Everyone (01:50:58) The Lighter Side --- The original text contained 1 footnote which was omitted from this narration. --- First published: December 4th, 2025 Source: https://www.lesswrong.com/posts/bCkijKnuEpjnZtX84/ai-145-you-ve-got-soul --- Narrated by TYPE III AUDIO.
--------
1:54:09
“On Dwarkesh Patel’s Second Interview With Ilya Sutskever” by Zvi
Some podcasts are self-recommending on the ‘yep, I’m going to be breaking this one down’ level. This was very clearly one of those. So here we go. As usual for podcast posts, the baseline bullet points describe key points made, and then the nested statements are my commentary. If I am quoting directly I use quote marks; otherwise, assume paraphrases. What are the main takeaways? Ilya thinks training in its current form will peter out, and that we are returning to an age of research where progress requires more substantially new ideas. SSI is a research organization. It tries various things. Not having a product lets it punch well above its fundraising weight in compute and effective resources. Ilya has 5-20 year timelines to a potentially superintelligent learning model. SSI might release a product first after all, but probably not? Ilya's thinking about alignment still seems relatively shallow to me in key ways, but he grasps many important insights and understands he has a problem. Ilya essentially despairs of having a substantive plan beyond ‘show everyone the thing as early [...] --- Outline: (01:42) Explaining Model Jaggedness (03:15) Emotions and value functions (04:38) What are we scaling? (05:47) Why humans generalize better than models (07:00) Straight-shooting superintelligence (08:39) SSI's model will learn from deployment (09:35) Alignment (17:40) We are squarely an age of research company (22:27) Research taste (25:11) Bonus Coverage: Dwarkesh Patel on AI Progress These Days --- First published: December 3rd, 2025 Source: https://www.lesswrong.com/posts/bMvCNtSH8DiGDTvXd/on-dwarkesh-patel-s-second-interview-with-ilya-sutskever --- Narrated by TYPE III AUDIO.
--------
39:06
“Reward Mismatches in RL Cause Emergent Misalignment” by Zvi
Learning to do misaligned-coded things anywhere teaches an AI (or a human) to do misaligned-coded things everywhere. So be sure you never, ever teach any mind to do what it sees, in context, as misaligned-coded things. If the optimal solution (as in, the one you most reinforce) to an RL training problem is one that the model perceives as something you wouldn’t want it to do, it will generally learn to do things you don’t want it to do. You can solve this by ensuring that the misaligned-coded things are not what the AI will learn to do. Or you can solve this by making those things not misaligned-coded. If you then teach aligned behavior in one set of spots, this can fix the problem in those spots, but the fix does not generalize to other tasks or outside of distribution. If you manage to hit the entire distribution of tasks you care about in this way, that will work for now, but it still won’t generalize, so it's a terrible long-term strategy. Yo Shavit: Extremely important finding. Don’t tell your model you’re rewarding it for A and then reward it for B [...] --- Outline: (02:59) Abstract Of The Paper (04:12) The Problem Statement (05:35) The Inoculation Solution (07:02) Cleaning The Data Versus Cleaning The Environments (08:16) No All Of This Does Not Solve Our Most Important Problems (13:18) It Does Help On Important Short Term Problems --- First published: December 2nd, 2025 Source: https://www.lesswrong.com/posts/a2nW8buG2Lw9AdPtH/reward-mismatches-in-rl-cause-emergent-misalignment --- Narrated by TYPE III AUDIO.
--------
14:17
“Claude Opus 4.5 Is The Best Model Available” by Zvi
Claude Opus 4.5 is the best model currently available. No model since GPT-4 has come close to the level of universal praise that I have seen for Claude Opus 4.5. It is the most intelligent and capable, most aligned and thoughtful model. It is a joy. There are some auxiliary deficits, and areas where other models have specialized, and even with the price cut Opus remains expensive, so it should not be your exclusive model. I do think it should absolutely be your daily driver. Image by Nano Banana Pro, prompt chosen for this purpose by Claude Opus 4.5. Table of Contents: It's The Best Model, Sir. Huh, Upgrades. On Your Marks. Anthropic Gives Us Very Particular Hype. Employee Hype. Every Vibe Check. Spontaneous Positive Reactions. Reaction Thread Positive Reactions. Negative Reactions. The Lighter Side. Popularity. You’ve Got Soul. It's The Best Model, Sir. Here is the full picture of where we are now (as mostly seen in Friday's post): You want to be using Claude Opus 4.5. That is especially true for coding, or if [...] --- Outline: (00:59) It's The Best Model, Sir (03:18) Huh, Upgrades (04:50) On Your Marks (09:12) Anthropic Gives Us Very Particular Hype (13:35) Employee Hype (15:40) Every Vibe Check (18:16) Spontaneous Positive Reactions (21:44) Reaction Thread Positive Reactions (28:39) Negative Reactions (30:34) The Lighter Side (31:27) Popularity (33:26) You've Got Soul --- First published: December 1st, 2025 Source: https://www.lesswrong.com/posts/HtdrtF5kcpLtWe5dW/claude-opus-4-5-is-the-best-model-available --- Narrated by TYPE III AUDIO.
--------
44:49