
Latent Space: The AI Engineer Podcast

swyx + Alessio

Available Episodes

5 of 140
  • Personalized AI Language Education — with Andrew Hsu, Speak
    Speak (https://speak.com) may not be very well known to native English speakers, but it has come from a slow start in 2016 to emerge as one of the favorite partners of OpenAI and one of the new AI-native unicorns, with the OpenAI Startup Fund leading and joining its Series B and C and noting that “Speak has the potential to revolutionize not just language learning, but education broadly”. Today we speak with Speak’s CTO, Andrew Hsu, on the journey of building the “3rd generation” of language learning software (with Rosetta Stone being Gen 1, and Duolingo being Gen 2). Speak’s premise is that speech and language models can now do what was previously only possible with human tutors (provide fluent, responsive, and adaptive instruction), and this belief has shaped its product and company strategy since its early days.
    https://www.linkedin.com/in/adhsu/
    https://speak.com
    One of the most interesting strategic decisions discussed in the episode is Speak’s early focus on South Korea. While counterintuitive for a San Francisco-based startup, the decision was influenced by a combination of market opportunity and founder proximity via a Korean first employee. South Korea’s intense demand for English fluency and a highly competitive education market made it a proving ground for a deeply AI-native product. By succeeding in a market saturated with human-based education solutions, Speak validated its model and built strong product-market fit before expanding to other Asian markets and eventually, globally.
    The arrival of Whisper and GPT-based LLMs in 2022 marked a turning point for Speak. Suddenly, capabilities that were once theoretical (real-time feedback, semantic understanding, conversational memory) became technically feasible. Speak didn’t pivot, but rather evolved into its second phase: from a supplemental practice tool to a full-featured language tutor. This transition required significant engineering work, including building custom ASR models, managing latency, and integrating real-time APIs for interactive lessons. It also unlocked the possibility of developing voice-first, immersive roleplay experiences and a roadmap to real-time conversational fluency.
    To scale globally and support many languages, Speak is investing heavily in AI-generated curriculum and content. Instead of manually scripting all lessons, they are building agents and pipelines that can scaffold curriculum, generate lesson content, and adapt pedagogically to the learner. This ties into one of Speak’s most ambitious goals: creating a knowledge graph that captures what a learner knows and can do in a target language, and then adapting the course path accordingly. This level-adjusting tutor model aims to personalize learning at scale and could eventually be applied beyond language learning to any educational domain.
    Finally, the conversation touches on the broader implications of AI-powered education and the slow real-world adoption of transformative AI technologies. Despite the capabilities of GPT-4 and others, most people’s daily lives haven’t changed dramatically. Speak sees itself as part of the generation of startups that will translate AI’s raw power into tangible consumer value. The company is also a testament to long-term conviction: founded in 2016, it weathered years of slow growth before AI caught up to its vision. Now, with over $50M ARR, a growing B2B arm, and plans to expand across languages and learning domains, Speak represents what AI-native education could look like in the next decade.
    Chapters
    00:00:00 Introductions & Thiel Fellowship Origins
    00:02:13 Genesis of Speak: Early Vision & Market Focus
    00:03:44 Building the Product: Iterations and Lessons Learned
    00:10:59 AI’s Role in Language Learning
    00:13:49 Scaling Globally & B2B Expansion
    00:16:30 Why Korea? Localizing for Success
    00:19:08 Content Creation, The Speak Method, and Engineering Culture
    00:23:31 The Impact of Whisper and LLM Advances
    00:29:08 AI-Generated Content & Measuring Fluency
    00:35:30 Personalization, Dialects, and Pronunciation
    00:39:38 Immersive Learning, Multimodality, and Real-Time Voice
    00:50:02 Engineering Challenges & Company Culture
    00:53:20 Beyond Languages: B2B, Knowledge Graphs, and Broader Learning
    00:57:32 Fun Stories, Lessons, and Reflections
    01:02:03 Final Thoughts: The Future of AI Learning & Slow Takeoff
    --------  
    1:04:09
  • AI Video Is Eating The World — Olivia and Justine Moore, a16z
    When the first video diffusion models started emerging, they were little more than just “moving pictures” - still frames extended a few seconds in either direction in time. There was a ton of excitement about OpenAI’s Sora on release through 2024, but so far only Sora-lite has been widely released. Meanwhile, other good videogen models like Genmo Mochi, Pika, MiniMax T2V, Tencent Hunyuan Video, and Kuaishou’s Kling have emerged, but the reigning king this year seems to be Google’s Veo 3, which for the first time has added native audio generation into its model capabilities, eliminating the need for a whole class of lip-syncing tooling and SFX editing.
    The rise of Veo 3 unlocks a whole new category of AI Video creators that many of our audience may not have been exposed to, but that is undeniably effective and important, particularly in the “kids” and “brainrot” segments of global consumer internet platforms like TikTok, YouTube and Instagram. By far the best documentarians of these trends for laypeople are Olivia and Justine Moore, both partners at a16z, who not only collate the best examples from all over the web, but dabble in video creation themselves to put theory into practice.
    We’ve been thinking of dabbling in AI brainrot on a secondary channel for Latent Space, so we wanted to get the braindump from the Moore twins on how to make a Latent Space Brainrot channel. Jump on in!
    Chapters
    00:00:00 Introductions & Guest Welcome
    00:00:49 The Rise of Generative Media
    00:02:24 AI Video Trends: Italian Brain Rot & Viral Characters
    00:05:00 Following Trends & Creating AI Content
    00:07:17 Hands-On with AI Video Creation
    00:18:36 Monetization & Business of AI Content
    00:23:34 Platforms, Models, and the Creator Stack
    00:37:22 Native Content vs. Clipping & Going Viral
    00:41:52 Prompt Theory & Meta-Trends in AI Creativity
    00:47:42 Professional, Commercial, and Platform-Specific AI Video
    00:48:57 Wrap-Up & Final Thoughts
    --------  
    49:27
  • Information Theory for Language Models: Jack Morris
    Our last AI PhD grad student feature was Shunyu Yao, who happened to focus on Language Agents for his thesis and immediately went to work on them for OpenAI. Our pick this year is Jack Morris, who bucks the “hot” trends by -not- working on agents, benchmarks, or VS Code forks, but is rather known for his work on the information theoretic understanding of LLMs, starting from embedding models and latent space representations (always close to our heart). Jack is an unusual combination of doing underrated research but somehow still being able to explain it well to a mass audience, so we felt this was a good opportunity to do a different kind of episode going through the greatest hits of a high profile AI PhD, and relate them to questions from AI Engineering.
    Papers and References
    made AI grad school: https://x.com/jxmnop/status/1933884519557353716
    A new type of information theory: https://x.com/jxmnop/status/1904238408899101014
    Embeddings
    Text Embeddings Reveal (Almost) As Much As Text: https://arxiv.org/abs/2310.06816
    Contextual document embeddings: https://arxiv.org/abs/2410.02525
    Harnessing the Universal Geometry of Embeddings: https://arxiv.org/abs/2505.12540
    Language models
    GPT-style language models memorize 3.6 bits per param: https://x.com/jxmnop/status/1929903028372459909
    Approximating Language Model Training Data from Weights: https://arxiv.org/abs/2506.15553
    https://x.com/jxmnop/status/1936044666371146076
    LLM Inversion
    "There Are No New Ideas In AI.... Only New Datasets"
    https://x.com/jxmnop/status/1910087098570338756
    https://blog.jxmo.io/p/there-are-no-new-ideas-in-ai-only
    misc reference: https://junyanz.github.io/CycleGAN/
    For others hiring AI PhDs, Jack also wanted to shout out Zach Nussbaum, his coauthor on Nomic Embed: Training a Reproducible Long Context Text Embedder.
    --------  
    1:18:13
  • Scaling Test Time Compute to Multi-Agent Civilizations — Noam Brown, OpenAI
    Solving Poker and Diplomacy, Debating RL+Reasoning with Ilya, what's *wrong* with the System 1/2 analogy, and where Test-Time Compute hits a wall
    Timestamps
    00:00 Intro – Diplomacy, Cicero & World Championship
    02:00 Reverse Centaur: How AI Improved Noam’s Human Play
    05:00 Turing Test Failures in Chat: Hallucinations & Steerability
    07:30 Reasoning Models & Fast vs. Slow Thinking Paradigm
    11:00 System 1 vs. System 2 in Visual Tasks (GeoGuessr, Tic-Tac-Toe)
    14:00 The Deep Research Existence Proof for Unverifiable Domains
    17:30 Harnesses, Tool Use, and Fragility in AI Agents
    21:00 The Case Against Over-Reliance on Scaffolds and Routers
    24:00 Reinforcement Fine-Tuning and Long-Term Model Adaptability
    28:00 Ilya’s Bet on Reasoning and the O-Series Breakthrough
    34:00 Noam’s Dev Stack: Codex, Windsurf & AGI Moments
    38:00 Building Better AI Developers: Memory, Reuse, and PR Reviews
    41:00 Multi-Agent Intelligence and the “AI Civilization” Hypothesis
    44:30 Implicit World Models and Theory of Mind Through Scaling
    48:00 Why Self-Play Breaks Down Beyond Go and Chess
    54:00 Designing Better Benchmarks for Fuzzy Tasks
    57:30 The Real Limits of Test-Time Compute: Cost vs. Time
    1:00:30 Data Efficiency Gaps Between Humans and LLMs
    1:03:00 Training Pipeline: Pretraining, Midtraining, Posttraining
    1:05:00 Games as Research Proving Grounds: Poker, MTG, Stratego
    1:10:00 Closing Thoughts – Five-Year View and Open Research Directions
    Chapters
    00:00:00 Intro & Guest Welcome
    00:00:33 Diplomacy AI & Cicero Insights
    00:03:49 AI Safety, Language Models, and Steerability
    00:05:23 O Series Models: Progress and Benchmarks
    00:08:53 Reasoning Paradigm: Thinking Fast and Slow in AI
    00:14:02 Design Questions: Harnesses, Tools, and Test Time Compute
    00:20:32 Reinforcement Fine-tuning & Model Specialization
    00:21:52 The Rise of Reasoning Models at OpenAI
    00:29:33 Data Efficiency in Machine Learning
    00:33:21 Coding & AI: Codex, Workflows, and Developer Experience
    00:41:38 Multi-Agent AI: Collaboration, Competition, and Civilization
    00:45:14 Poker, Diplomacy & Exploitative vs. Optimal AI Strategy
    00:52:11 World Models, Multi-Agent Learning, and Self-Play
    00:58:50 Generative Media: Image & Video Models
    01:00:44 Robotics: Humanoids, Iteration Speed, and Embodiment
    01:04:25 Rapid Fire: Research Practices, Benchmarks, and AI Progress
    01:14:19 Games, Imperfect Information, and AI Research Directions
    --------  
  • The Shape of Compute (Chris Lattner of Modular)
    Chris Lattner of Modular (https://modular.com) joined us (again!) to talk about how they are breaking the CUDA monopoly, what it took to match NVIDIA performance with AMD, and how they are building a company of "elite nerds".
    X: https://x.com/latentspacepod
    Substack: https://latent.space
    00:00:00 Introductions
    00:00:12 Overview of Modular and the Shape of Compute
    00:02:27 Modular’s R&D Phase
    00:06:55 From CPU Optimization to GPU Support
    00:11:14 MAX: Modular’s Inference Framework
    00:12:52 Mojo Programming Language
    00:18:25 MAX Architecture: From Mojo to Cluster-Scale Inference
    00:29:16 Open Source Contributions and Community Involvement
    00:32:25 Modular's Differentiation from VLLM and SGLang
    00:41:37 Modular’s Business Model and Monetization Strategy
    00:53:17 DeepSeek’s Impact and Low-Level GPU Programming
    01:00:00 Inference Time Compute and Reasoning Models
    01:02:31 Personal Reflections on Leading Modular
    01:08:27 Daily Routine and Time Management as a Founder
    01:13:24 Using AI Coding Tools and Staying Current with Research
    01:14:47 Personal Projects and Work-Life Balance
    01:17:05 Hiring, Open Source, and Community Engagement
    --------  


About Latent Space: The AI Engineer Podcast

The podcast by and for AI Engineers! In 2024, over 2 million readers and listeners came to Latent Space to hear about news, papers and interviews in Software 3.0. We cover Foundation Models changing every domain in Code Generation, Multimodality, AI Agents, GPU Infra and more, directly from the founders, builders, and thinkers involved in pushing the cutting edge. We strive to give you everything from the definitive take on the Current Thing to the first introduction to the tech you'll be using in the next 3 months! We break news and exclusive interviews from OpenAI, Anthropic, Gemini, Meta (Soumith Chintala), Sierra (Bret Taylor), tiny (George Hotz), Databricks/MosaicML (Jon Frankle), Modular (Chris Lattner), Answer.ai (Jeremy Howard), et al. Full show notes always on https://latent.space

Generated: 7/13/2025 - 12:10:13 PM