
Machine Learning Street Talk (MLST)


255 episodes

  • Machine Learning Street Talk (MLST)

    The Benchmark With No Instructions — ARC-AGI-3 (winning team!)

    01/07/2026 | 1h 24 mins.
    Tim Scarfe travels to Zurich to sit down with the Tufa Labs ARC-AGI-3 team (founder Benjamin Crouzier, with Jeroen Cottaar, Dries Smit, Stefano Viel and Michal Tesnar) to work out what their leaderboard-topping system does and what the benchmark is really testing.

    The cut opens on the games, with a walkthrough of the Locksmith game, where you read the rules of an unfamiliar world straight from raw frames. ARC-AGI-3 makes ARC interactive and agentic, so the model has to *discover* the goal rather than transduce a static grid. It stays easy for humans and breaks LLMs, and that gap runs through everything that follows. Dries traces his StochasticGoose preview win, a brute-force search that only tried actions which changed the frame, and explains why it collapsed once the organisers added action-efficiency scoring and unseen games.

    Induction and transduction run through the middle of the conversation: how much of an answer is really priors leaking back the moment a model recognises a maze. Then comes the abstraction mountain, and Tim's case that LLMs reach the right answer through fractured, entangled representations, which is performance, not competence. Do transformers plan at all, or just fake it well enough? Why does the score really measure action efficiency rather than games solved, and why do agents lock onto the wrong goal and fail to climb back out?

    Crouzier closes on the Tufa Labs thesis (a small lab against the giants, the bitter lesson against hand-built harnesses, and safety), and Tim ties it back to Kenneth Stanley, deep constraints, and creativity as competence.

    Disclosure: Tufa Labs sponsors MLST.

    ---
    TIMESTAMPS:
    00:00:00 Meet the Tufa team and what makes ARC-AGI-3 hard
    00:02:11 Locksmith game: reading the rules from raw frames
    00:03:10 Why build an independent research lab
    00:04:11 StochasticGoose: a preview win, then the hardened games
    00:07:58 Induction, transduction, and priors inside LLMs
    00:10:31 Curiosity, world models, and exploring by frame change
    00:14:32 Understanding debt and losing sight of your own code
    00:15:53 Requirements-based agents and human-AI co-creativity
    00:19:22 Why auto-research misses the big picture
    00:21:54 The abstraction mountain and fractured representations
    00:27:36 Constraints and making LLMs act as if they understand
    00:34:51 Human difficulty calibration, esports priors, and emergence
    00:41:35 Agency, goal acquisition, and two kinds of planning
    00:47:31 Harnesses, the 36% number, and wrong-goal loops
    00:52:33 Rewards, goals, and why ARC-AGI-3 resists brute force
    01:00:46 Would solving ARC-AGI-3 prove AGI?
    01:07:53 Stripping language away, then priors leak back
    01:14:06 Representation and whether language is necessary
    01:18:04 The bitter lesson versus specialised harnesses
    01:22:20 Capability research, safety, and the software singularity

    ---
    REFERENCES:
    organization:
    [00:02:11] ARC-AGI-3
    https://arcprize.org/arc-agi/3
    [00:03:10] Tufa Labs
    https://tufalabs.ai/team/
    [00:04:20] ARC-AGI-3 Preview Agent Competition
    https://arcprize.org/competitions/arc-agi-3-preview-agents
    tool:
    [00:04:55] StochasticGoose ARC-AGI-3 solution
    https://github.com/DriesSmit/ARC3-solution
    [00:07:42] ArcGentica
    https://github.com/symbolica-ai/arcgentica
    [00:07:49] RGB-Agent
    https://github.com/alexisfox7/RGB-Agent
    [00:14:38] Claude Code
    https://www.anthropic.com/claude-code
    [01:03:42] Qwen 3.6 27B
    https://huggingface.co/Qwen/Qwen3.6-27B
    paper:
    [00:13:03] On the Measure of Intelligence
    https://arxiv.org/abs/1911.01547
    [00:27:42] DreamCoder
    https://arxiv.org/abs/2006.08381
    [00:43:55] On the Biology of a Large Language Model
    https://transformer-circuits.pub/2025/attribution-graphs/biology.html
    [01:18:46] ImageNet Classification with Deep CNNs (AlexNet)
    https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf
    other:
    [01:18:16] The Bitter Lesson
    http://www.incompleteideas.net/IncIdeas/BitterLesson.html

    ---
    ReScript: https://app.rescript.info/share/463d7f031349b4b9db428553eed88230
  • Machine Learning Street Talk (MLST)

    The Thermodynamic AI Computing Chip - Thomas Ahle

    28/06/2026 | 1h 2 mins.
    Thomas Ahle wants Normal Computing to be the Lovable for chip design: type your intent, and a swarm of agents carries it from design through optimisation, formalisation and verification to tape-out. To get there, his team wrote their own open-source Verilog simulator, 580,000 lines in 43 days, because commercial EDA verifiers run about $10,000 per core and there are no decent open-source compilers to build on.

    That sets up the question Tim keeps pressing: if an agent can produce a chip design, a proof, or a working program, how do you actually know it is correct? Passing 70% of tests is not the same as being right, and a single fabricated bug can cost a company a fortune. They dig into ProgramBench (rebuild a program from its tests, roughly 0% success), the difference between structure and competence, and the "understanding debt" you take on when nobody reads the code.

    From there: auto-formalisation in Lean and the AlphaProof trick of training on prove-or-disprove; why there is no single true representation of a spec (Petri nets, TLA+, Erik Curiel's "math does not represent"); and thermodynamic computing, where Normal Computing's CN101 chip is built so that its physical noise *is* the computation, settling a stochastic differential equation in hardware to invert a matrix. Plus Bayesian uncertainty, specialisation, the Chomsky hierarchy, AI slop, and whether performance is all that matters.

    Recorded in Zurich.

    Disclosure: Normal Computing paid our production and travel costs for this show. We retained full editorial control. They did not see the video before publication, and we did not show it to them or discuss it with them beforehand.

    ---
    TIMESTAMPS:
    00:00:00 Meet Thomas Ahle: the Lovable for chip design
    00:03:41 Why hardware needs formal verification
    00:06:36 Ten thousand dollars per core and a six-month agent run
    00:07:40 Rebuilding programs from tests: ProgramBench and zero percent
    00:12:15 Structure vs competence: can you learn a program from behavior?
    00:15:27 Continual learning, abstraction, and Claude as an ecosystem
    00:23:17 Autoformalization and the AlphaProof trick
    00:29:31 No single true representation: specs, Petri nets and TLA+
    00:34:43 Thermodynamic computing: when noise is the computation
    00:37:32 Bayesian uncertainty in the age of token streams
    00:41:12 Hybrid compute: vibe-coding loops, binaries and Stockfish
    00:44:44 Co-design, central-AI apps and API pricing
    00:49:45 Chain of thoughtlessness and the Chomsky hierarchy
    00:53:40 AI psychosis, slop and the broken social contract
    00:57:34 Typing it yourself, teamwork and performance vs competence

    ---
    REFERENCES:
    person:
    [00:00:10] Thomas Ahle
    https://thomasahle.com
    organization:
    [00:00:27] Normal Computing
    https://normalcomputing.com/
    paper:
    [00:11:21] ProgramBench: Can Language Models Rebuild Programs From Scratch?
    https://arxiv.org/abs/2605.03546
    [00:31:55] Autoformalizing Memory Device Specifications with Agents
    https://arxiv.org/abs/2605.00058
    [00:35:20] Thermo AI and the Fluctuation Frontier
    https://arxiv.org/abs/2302.06584
    [00:36:40] Thermo Comp System for AI Applications
    https://arxiv.org/abs/2312.04836
    [00:37:05] Thermodynamic Linear Algebra
    https://arxiv.org/abs/2308.05660
    [00:44:50] An efficient probabilistic hardware architecture for diffusion-like models
    https://arxiv.org/abs/2510.23972
    other:
    [00:01:00] Building an Open-Source Verilog Simulator with AI: 580K Lines in 43 Days
    https://normalcomputing.com/blog/building-an-open-source-verilog-simulator-with-ai-580k-lines-in-43-days
    [00:02:55] Normal Computing Announces Tape-Out of the World's First Thermodynamic Computing Chip (CN101)
    https://www.normalcomputing.com/blog/normal-computing-announces-tape-out-of-worlds-first-thermodynamic-computing-chip
    [00:32:02] DRAMBench: Autoformalizing DRAM Specifications with Timed Petri Nets
    https://www.iese.fraunhofer.de/blog/drambench-autoformalizing-dram-specifications/

    ---
    ReScript: https://app.rescript.info/share/ff9684a112ab37744096adaeb097a263
  • Machine Learning Street Talk (MLST)

    He won a Nobel here for AlphaFold. Then he left. - John Jumper

    22/06/2026 | 53 mins.
    This episode is sponsored by Notion. Learn more about Notion's Developer Platform today at https://notion.com/mlst

    Protein folding stalled biology for fifty years. A sequence of amino acids dictates a three-dimensional shape, but determining that shape meant a year and roughly $100,000 of crystallography per structure. Then AlphaFold 2 won CASP14 so decisively that the organizers called the problem essentially solved.

    In this documentary cut, John Jumper, who shared the 2024 Nobel Prize in Chemistry and has since left DeepMind for Anthropic, walks Tim Scarfe through what the system did and, more interestingly, what it did not. The architecture gets a proper dissection: MSAs, the Evoformer, invariant point attention, the FAPE loss, and Jumper's correction of the equivariance story, which ablations valued at roughly 2.5 of 30 GDT points rather than the whole win. He is blunt about the limits. AlphaFold predicts one experiment extraordinarily well; it is not a model of the cell, it does not capture dynamics, and on a given drug target it is "wrong nine times out of ten."

    From there: the AlphaFold Database of 200M+ predicted structures, AlphaFold 3 and ligands, Isomorphic Labs, and Jumper's quarrel with the bitter lesson, where finite data and human hypotheses still matter. Emmanuel Nji of BioStruct Africa closes the film on what changes when work that took years now takes months, and on training the next thousand structural biologists across Africa.

    ---
    TIMESTAMPS:
    00:00:00 Cold open: predicting nature with a button press
    00:01:03 The protein folding bottleneck and CASP
    00:04:39 The Nobel, the database, and the move to Anthropic
    00:05:50 Sponsor (Notion) and framing: what AlphaFold does not claim
    00:07:39 Proteins as self-assembling nanomachines
    00:12:24 From structures to biology: drug discovery and Midnolin
    00:17:37 The humility of AlphaFold: a narrow predictor
    00:22:18 Inside the architecture: Evoformer, IPA and FAPE
    00:30:20 Ruthless empiricism: ablations and 100x in data
    00:35:20 Predict, control, understand
    00:40:00 Against the bitter lesson; AlphaFold 3 as diffusion
    00:45:07 Intelligence, representations and AGI
    00:49:23 Epilogue: AlphaFold in Africa
    00:52:16 Closing: the case for hybrid science models

    ---
    REFERENCES:
    organization:
    [00:01:55] Critical Assessment of Structure Prediction (CASP)
    https://predictioncenter.org/
    [00:04:39] The Nobel Prize in Chemistry 2024
    https://www.nobelprize.org/prizes/chemistry/2024/summary/
    [00:05:18] BioStruct Africa
    https://www.biostructafrica.org/
    [00:18:03] Isomorphic Labs
    https://www.isomorphiclabs.com/
    paper:
    [00:03:09] AlphaFold Protein Structure Database
    https://doi.org/10.1093/nar/gkab1061
    [00:17:25] Accurate structure prediction of biomolecular interactions with AlphaFold 3
    https://www.nature.com/articles/s41586-024-07487-w
    [00:22:18] Highly accurate protein structure prediction with AlphaFold
    https://www.nature.com/articles/s41586-021-03819-2
    [00:23:10] Midnolin promotes degradation of substrates independent of ubiquitination
    https://doi.org/10.1126/science.adh5021
    [00:27:00] Improved protein structure prediction using potentials from deep learning
    https://www.nature.com/articles/s41586-019-1923-7
    tool:
    [00:03:09] AlphaFold Protein Structure Database (EBI)
    https://alphafold.ebi.ac.uk/
    [00:45:55] AlphaEvolve: a coding agent for designing advanced algorithms
    https://deepmind.google/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/
    other:
    [00:39:40] The Bitter Lesson
    http://www.incompleteideas.net/IncIdeas/BitterLesson.html

    ---
    ReScript: https://app.rescript.info/share/d8cde5c221fb71e2c0f5aafe94f90daf

    Disclaimer: not sponsored, editorial control stayed with us. We filmed it at GDM, London.
  • Machine Learning Street Talk (MLST)

    When AI Decides You're a Threat — Brad Carson

    31/05/2026 | 1h 20 mins.
    Brad Carson was the Army's General Counsel, served two terms in Congress and was Acting Under Secretary of Defense for Personnel and Readiness. He now heads Americans for Responsible Innovation, the AI-policy advocacy group he co-founded. Keith Duggar spends roughly eighty minutes pushing back.

    SPONSOR:
    ---
    Cyber Fund built the Monastery to help founders ship products that were impossible a year ago. Applications for Batch 1 are now open.
    Apply now: https://cyber.fund
    ---

    Carson's whole case rests on one line: the genie is not out of the bottle. We have pulled dangerous technology back before. Scientists paused recombinant DNA research in 1974 and agreed safety rules for it at Asilomar in 1975, and the West still controls the chips AI runs on. Calling AI unstoppable, he says, is the most dangerous idea in the room.

    Then Keith drags him somewhere darker. A Palantir heat map scores you 0.73 on whether you are a combatant, and a strike follows. The model is wrong some accepted share of the time, and when it is, nobody answers for it. You cannot court-martial a model, and not even the interpretability researchers can say why it picked you.


    Note: after recording, we learned that Americans for Responsible Innovation is backed by EA-aligned philanthropy. ARI did not sponsor this episode.

    ---
    TIMESTAMPS:
    00:00:00 From the Pentagon to AI governance
    00:04:52 Regulatory capture vs Silicon Valley networks
    00:07:56 Transparency and the Claude tier changes
    00:09:40 Tort liability when AI tools cause harm
    00:13:40 AI is a product, not a person
    00:16:01 Children, suicide, and the suicide business
    00:19:59 Opaque neural nets and the law of war
    00:25:54 Probabilistic targeting and the death of accountability
    00:28:47 The arms race fallacy: Asilomar and restraint
    00:34:02 Talking to China: track 2 talks and chip leverage
    00:39:45 Air power never wins: capital for labour
    00:43:29 Anthropic vs the Department of War
    00:51:29 Concentration, open source, and brain drain
    01:00:18 DeepSeek, Chinese culture, and AI as diplomacy
    01:12:25 Upskilling Congress and why public trust matters

    ---
    REFERENCES:
    organization:
    [00:02:45] ICRC position on autonomous weapons
    https://www.icrc.org/en/law-and-policy/autonomous-weapons
    [00:05:22] Americans for Responsible Innovation (ARI)
    https://ari.us
    [00:07:20] Andreessen Horowitz (a16z)
    https://a16z.com/
    [01:16:05] Office of Technology Assessment
    https://en.wikipedia.org/wiki/Office_of_Technology_Assessment
    other:
    [00:03:35] Beneficial AGI 2019 Conference (Future of Life Institute, Puerto Rico)
    https://futureoflife.org/event/beneficial-agi-2019/
    [00:18:30] Section 230 of the Communications Decency Act
    https://en.wikipedia.org/wiki/Section_230
    [00:19:59] Lethal Autonomous Weapons (LAWS)
    https://en.wikipedia.org/wiki/Lethal_autonomous_weapon
    [00:31:35] Strategic Arms Limitation Talks (SALT)
    https://en.wikipedia.org/wiki/Strategic_Arms_Limitation_Talks
    [00:32:28] Asilomar Conference on Recombinant DNA (1975)
    https://en.wikipedia.org/wiki/Asilomar_Conference_on_Recombinant_DNA
    [00:39:45] The New Iron Triangle (ARI policy byte)
    https://ari.us/policy-bytes/the-new-iron-triangle/
    [00:48:05] Defense Production Act
    https://en.wikipedia.org/wiki/Defense_Production_Act
    person:
    [00:03:35] Anthony Aguirre
    https://en.wikipedia.org/wiki/Anthony_Aguirre
    [00:06:48] Dean Ball — Hyperdimensional
    https://www.hyperdimensional.co/
    [00:23:13] Neel Nanda — mechanistic interpretability
    https://www.neelnanda.io/
    [00:36:02] Jack Clark (Anthropic) on Conversations with Tyler
    https://conversationswithtyler.com/episodes/jack-clark/
    [00:39:15] Robert Trager — Centre for the Governance of AI
    https://www.governance.ai/team/robert-trager
    [00:41:55] Giulio Douhet
    https://en.wikipedia.org/wiki/Giulio_Douhet
    [01:15:05] Don Beyer (US Congress)
    https://en.wikipedia.org/wiki/Don_Beyer
    tool:
    [00:22:19] Phalanx CIWS
    https://en.wikipedia.org/wiki/Phalanx_CIWS

    ---
    ReScript:
    https://app.rescript.info/public/share/9405ff35c0215b7cdae6402d41284171
    https://app.rescript.info/api/public/sessions/0a6c081b8e5fe413/pdf
  • Machine Learning Street Talk (MLST)

    Intelligence is collective, not artificial — Prof. Michael I. Jordan (UC Berkeley / Inria)

    21/05/2026 | 1h 17 mins.
    Michael I. Jordan, described by Science magazine as the most influential computer scientist alive, has never thought of himself as an AI researcher. In this conversation he explains why that distinction matters.

    SPONSOR:
    ---
    Cyber Fund built the Monastery to help founders ship products that were impossible a year ago. Applications for Batch 1 are now open.
    Apply now: https://cyber.fund
    ---

    Jordan trained as a statistician and cognitive scientist, and his career has been spent building machine learning systems that work in the real world: supply chains, commerce, healthcare, and large economic systems. When the field rebranded itself as AI and then AGI, he did not follow. Instead he argues that the framing is wrong. AI is better understood as a collective economic system than as a race to build a disembodied superintelligence.

    We talk about why AGI is mostly a PR term, what machine learning achieved before the LLM hype cycle, and why the assistant-on-your-shoulder vision may be less compelling than it sounds. Jordan explains why explanations need to be actionable, not merely mechanistic; why AlphaFold's missing error bars matter; how prediction-powered inference changes the picture; and why drug discovery is an incentive-design problem rather than a pure pattern-matching problem.

    ERRATA: Science magazine ranked him the most influential computer scientist, not Nature

    ---
    TIMESTAMPS:
    00:00:00 Cold open: A demoralizing message to young builders
    00:02:04 Cyber Fund sponsor read
    00:02:50 From symbolic AI to machine learning systems
    00:05:42 Why AGI is mostly a PR term
    00:08:48 A collectivist, economic perspective on AI
    00:11:33 Why LLMs need system design, not hype
    00:14:50 Predictability beats faux understanding
    00:17:55 AlphaFold, bias, and prediction-powered inference
    00:21:48 Stop anthropomorphizing intelligence
    00:27:44 Drug discovery as an incentive problem
    00:32:29 The three-layer data market
    00:38:07 Social knowledge, markets, and culture
    00:45:39 Creator economics beyond Spotify
    00:48:30 How science-fiction AI narratives mislead young builders
    00:51:45 AI should improve humans, not replace them
    00:56:42 Safety is a property of the whole system
    00:58:12 Silicon Valley gurus and the cream off the top
    01:00:47 Game theory, mechanism design, and contracts
    01:04:39 Conformal prediction, e-values, and anytime inference
    01:08:11 A new liberal arts triangle for the AI era
    01:11:30 The Bayesian duck and markets as uncertainty reduction

    ReScript (transcript, PDF, refs etc) - https://app.rescript.info/public/share/fb68f94af29d3745c6cf6125e01328b5
    ---
    REFERENCES:
    person:
    [00:02:50] Michael I. Jordan (homepage)
    https://people.eecs.berkeley.edu/~jordan/
    paper:
    [00:06:01] A Collectivist, Economic Perspective on AI
    https://arxiv.org/abs/2507.06268
    [00:18:09] AlphaFold
    https://www.nature.com/articles/s41586-021-03819-2
    [00:20:36] Prediction-Powered Inference
    https://arxiv.org/abs/2301.09633
    [00:33:47] On Three-Layer Data Markets
    https://arxiv.org/abs/2402.09697
    [01:04:39] Conformal Prediction with Conditional Guarantees
    https://arxiv.org/abs/2107.07511
    [01:04:51] A Tutorial on Conformal Prediction
    https://www.jmlr.org/papers/v9/shafer08a.html
    [01:06:00] E-Values Expand the Scope of Conformal Prediction
    https://arxiv.org/abs/2503.13050
    [01:08:23] Computational Thinking
    https://www.cs.cmu.edu/~CompThink/papers/Wing06.pdf
    other:
    [00:28:20] How Should the FDA Test?
    https://rdi.berkeley.edu/events/sbc-assets/pdfs/Summit%20session%20speaker%20slides%20submission%20form-s1-5%20%28File%20responses%29/Slides%20in%20PDF%20%28Please%20name%20the%20submitted%20file%20as%20_firstname_-_lastname_-slides.pdf%29.%20%28File%20responses%29/27-Michael%20Jordan-Session%20V.pdf#page=15
    [00:28:40] Michael I. Jordan Session V Slides
    <truncated; see the ReScript link or the YouTube video description>
About Machine Learning Street Talk (MLST)
Welcome! We engage in fascinating discussions with pre-eminent figures in the AI field. Our flagship show covers current affairs in AI, cognitive science, neuroscience and philosophy of mind with in-depth analysis. Our approach is unrivalled in terms of scope and rigour – we believe in intellectual diversity in AI, and we touch on all of the main ideas in the field with the hype surgically removed. MLST is run by Tim Scarfe, Ph.D (https://www.linkedin.com/in/ecsquizor/) and features regular appearances from MIT Doctor of Philosophy Keith Duggar (https://www.linkedin.com/in/dr-keith-duggar/).