Ani Baddepudi, Gemini Model Behavior Product Lead, joins host Logan Kilpatrick for a deep dive into Gemini's multimodal capabilities. Their conversation explores why Gemini was built as a natively multimodal model from day one, the future of proactive AI assistants, and how we are moving towards a world where "everything is vision." Learn about the differences between video and image understanding, token representations, higher-FPS video sampling, and more.

Chapters:
0:00 - Intro
1:12 - Why Gemini is natively multimodal
2:23 - The technology behind multimodal models
5:15 - Video understanding with Gemini 2.5
9:25 - Deciding what to build next
13:23 - Building new product experiences with multimodal AI
17:15 - The vision for proactive assistants
24:13 - Improving video usability with variable FPS and frame tokenization
27:35 - What’s next for Gemini’s multimodal development
31:47 - Deep dive on Gemini’s document understanding capabilities
37:56 - The teamwork and collaboration behind Gemini
40:56 - What’s next with model behavior

Watch on YouTube: https://www.youtube.com/watch?v=K4vXvaRV0dw
--------
44:17
--------
Building Gemini's Coding Capabilities
Connie Fan, Product Lead for Gemini's coding capabilities, and Danny Tarlow, Research Lead for Gemini's coding capabilities, join host Logan Kilpatrick for an in-depth discussion on how the team built one of the world's leading AI coding models. Learn more about the early goals that shaped Gemini's approach to code, the rise of 'vibe coding' and its impact on development, strategies for tackling large codebases with long context and agents, and the future of programming languages in the age of AI.

Watch on YouTube: https://www.youtube.com/watch?v=jwbG_m-X-gE

Chapters:
0:00 - Intro
1:10 - Defining Early Coding Goals
6:23 - Ingredients of a Great Coding Model
9:28 - Adapting to Developer Workflows
11:40 - The Rise of Vibe Coding
14:43 - Code as a Reasoning Tool
17:20 - Code as a Universal Solver
20:47 - Evaluating Coding Models
24:30 - Leveraging Internal Googler Feedback
26:52 - Winning Over AI Skeptics
28:04 - Performance Across Programming Languages
33:05 - The Future of Programming Languages
36:16 - Strategies for Large Codebases
41:06 - Hill Climbing New Benchmarks
42:46 - Short-Term Improvements
44:42 - Model Style and Taste
47:43 - 2.5 Pro’s Breakthrough
51:06 - Early AI Coding Experiences
56:19 - Specialist vs. Generalist Models
--------
1:00:27
--------
Sergey Brin on the Future of AI & Gemini
A conversation with Sergey Brin, co-founder of Google and computer scientist working on Gemini, in reaction to a year of progress with Gemini.

Watch on YouTube: https://www.youtube.com/watch?v=o7U4DV9Fkc0

Chapters:
0:20 - Initial reactions to I/O
2:00 - Focus on Gemini’s core text model
4:29 - Native audio in Gemini and Veo 3
8:34 - Insights from model training runs
10:07 - Surprises in current AI developments vs. past expectations
14:20 - Evolution of model training
16:40 - The future of reasoning and Deep Think
20:19 - Google’s startup culture and accelerating AI innovation
24:51 - Closing
--------
27:19
--------
Google I/O 2025 Recap with Josh Woodward and Tulsee Doshi
Learn more:
AI Studio: https://aistudio.google.com/
Gemini Canvas: https://gemini.google.com/canvas
Mariner: https://labs.google.com/mariner/
Gemini Ultra: https://one.google.com/about/google-a...
Jules: https://jules.google/
Gemini Diffusion: https://deepmind.google/models/gemini...
Flow: https://labs.google/flow/about
Notebook LM: https://notebooklm.google.com/
Stitch: https://stitch.withgoogle.com/

Chapters:
0:59 - I/O Day 1 Recap
2:48 - Envisioning I/O 2030
8:11 - AI for Scientific Breakthroughs
9:20 - Veo 3 & Flow
17:35 - Gemini Live & the Future of Proactive Assistants
20:30 - Gemini in Chrome & Future Apps
22:28 - New Gemini Models: DeepThink, Diffusion & 2.5 Flash/Pro Updates
27:19 - Developer Momentum & Feedback Loop
31:50 - New Developer Products: Jules, Stitch & CodeGen in AI Studio
37:44 - Evolving Product Development Process with AI
39:23 - Closing
--------
40:15
--------
Deep Dive into Long Context
Explore the synergy between long context models and Retrieval Augmented Generation (RAG) in this episode of Release Notes. Join Google DeepMind's Nikolay Savinov as he discusses the importance of large context windows, how they enable AI agents, and what's next in the field.

Chapters:
0:52 Introduction & defining tokens
5:27 Context window importance
9:53 RAG vs. Long Context
14:19 Scaling beyond 2 million tokens
18:41 Long context improvements since 1.5 Pro release
23:26 Difficulty of attending to the whole context
28:37 Evaluating long context: beyond needle-in-a-haystack
33:41 Integrating long context research
34:57 Reasoning and long outputs
40:54 Tips for using long context
48:51 The future of long context: near-perfect recall and cost reduction
54:42 The role of infrastructure
56:15 Long-context and agents
Ever wondered what it's really like to build the future of AI? Join host Logan Kilpatrick for a deep dive into the world of Google AI, straight from the minds of the builders. We're pulling back the curtain on the latest breakthroughs, sharing the unfiltered stories behind the tech, and answering the questions you've been dying to ask.
Whether you're a seasoned developer or an AI enthusiast, this podcast is your backstage pass to the cutting edge of AI technology. Tune in for:
- Exclusive interviews with AI pioneers and industry leaders.
- In-depth discussions on the latest AI trends and developments.
- Behind-the-scenes stories and anecdotes from the world of AI.
- Unfiltered insights and opinions from the people shaping the future.
So, if you're ready to go beyond the headlines and get the real scoop on AI, join Logan Kilpatrick on Google AI: Release Notes.