DevDay 2025: Apps SDK, AgentKit, MCP, Codex and why Prompting is More Important than Ever
At OpenAI DevDay, we sit down with Sherwin Wu and Christina Huang from the OpenAI Platform Team to discuss the launch of AgentKit, a comprehensive suite of tools for building, deploying, and optimizing AI agents. Christina walks us through the live demo she performed on stage, building a customer support agent in just 8 minutes using the visual Agent Builder, while Sherwin shares insights on how OpenAI is inverting the traditional website-chatbot paradigm by embedding apps directly within ChatGPT through the new Apps SDK.
The conversation explores how OpenAI is tackling the challenges developers face when taking agents to production, from writing and optimizing prompts to building evaluation pipelines. They discuss the decision to adopt Anthropic's Model Context Protocol (MCP) for tool connectivity, the importance of visual workflows for complex agent systems, and how features like human-in-the-loop approvals and automated prompt optimization are making agent development more accessible to a broader range of developers.
Sherwin and Christina also reveal how OpenAI is dogfooding these tools internally, with their own customer support at openai.com already powered by AgentKit, and share candid insights about the evolution from plugins to GPTs to this new agent platform. They discuss the surprising persistence of prompting as a critical skill (contrary to predictions from two years ago), the challenges of serving custom fine-tuned models at scale, and why they believe visual agent builders are essential as workflows grow to span dozens of nodes.
Guests:
Sherwin Wu: Head of Engineering, OpenAI Platform https://www.linkedin.com/in/sherwinwu1/ https://x.com/sherwinwu?lang=en
Christina Huang: Platform Experience, OpenAI https://x.com/christinaahuang https://www.linkedin.com/in/christinaahuang/
Thanks very much to Lindsay and Shaokyi for helping us set up this great deep dive into the new DevDay launches!
Key Topics:
• AgentKit launch: Agent SDK, Builder, Evals, and deployment tools
• Apps SDK and the inversion of the app-chatbot paradigm
• Adopting MCP for universal tool connectivity
• Visual agent building vs code-first approaches
• Human-in-the-loop workflows and approval systems
• Automated prompt optimization and "zero-gradient fine-tuning"
• Service Health Dashboard and achieving five nines reliability
• ChatKit as an embeddable, evergreen chat interface
• The evolution from plugins to GPTs to agent platforms
• Internal dogfooding with Codex and agent-powered support
--------
Taste is your Moat (Dylan Field of Figma)
Dylan Field (CEO Figma) on how they are letting designers build with Figma Make, how Figma can be the context repository for aesthetic in the age of vibe coding, and why design is your only differentiator now.
Full show notes: https://www.latent.space/p/figma
00:00 Figma’s Mission: Bridging Imagination and Reality
00:56 Becoming AI-Pilled
07:44 Figma Make
08:57 Language as the Interface for Design
13:37 Source of truth between design and code
18:15 Figma as a Context Repository
21:30 Understanding and Representing Design Diffs through AI
24:20 Figma’s Role in Shaping Visual Aesthetics
31:56 Fast Fashion in Software
36:04 Limitations of Prompt-Based Software Creation
39:43 Interfaces Beyond Chat
42:12 Lessons from the Thiel Fellowship
44:58 Using X for Product Feedback
48:10 Early-Stage Recruiting at Figma
53:11 Positioning Figma Make in the Prompt-to-App Landscape
55:19 Digital Scarcity & AI
--------
Amp: The Emperor Has No Clothes
Quinn Slack (CEO) and Thorsten Ball (Amp Dictator) from Sourcegraph join the show to talk about Amp Code, how they ship 15x/day with no code reviews, and why subagents and prompt optimizers aren’t a promising direction for coding agents.
Amp Code: https://ampcode.com/
Latent Space: https://latent.space/
00:00 Introduction
00:41 Transition from Cody to Amp
03:18 The Importance of Building the Best Coding Agent
06:43 Adapting to a Rapidly Evolving AI Tooling Landscape
09:36 Dogfooding at Sourcegraph
12:35 CLI vs. VS Code Extension
21:08 Positioning Amp in Coding Agent Market
24:10 The Diminishing Importance of Model Selectors
32:39 Tooling vs. Harness
37:19 Common Failure Modes of Coding Agents
47:33 Agent-Friendly Logging and Tooling
52:31 Are Subagents Real?
56:52 New Frameworks and Agent-Integrated Developer Tools
1:00:25 How Agents Are Encouraging Codebase and Workflow Changes
1:03:13 Evolving Outer Loop Tasks
1:07:09 Version Control and Merge Conflicts in an AI-First World
1:10:36 Rise of User-Generated Enterprise Software
1:14:39 Empowering Technical Leaders with AI
1:17:11 Evaluating Product Without Traditional Evals
1:20:58 Hiring
--------
Context Engineering for Agents - Lance Martin, LangChain
Lance: https://www.linkedin.com/in/lance-martin-64a33b5/
How Context Fails: https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html
How New Buzzwords Get Created: https://www.dbreunig.com/2025/07/24/why-the-term-context-engineering-matters.html
Context Engineering: https://x.com/RLanceMartin/status/1948441848978309358 https://rlancemartin.github.io/2025/06/23/context_engineering/ https://docs.google.com/presentation/d/16aaXLu40GugY-kOpqDU4e-S0hD1FmHcNyF0rRRnb1OU/edit?usp=sharing
Manus Post: https://manus.im/blog/Context-Engineering-for-AI-Agents-Lessons-from-Building-Manus
Cognition Post: https://cognition.ai/blog/dont-build-multi-agents
Multi-Agent Researcher: https://www.anthropic.com/engineering/multi-agent-research-system
Human-in-the-loop + Memory: https://github.com/langchain-ai/agents-from-scratch
- Bitter Lesson in AI Engineering -
Hyung Won Chung on the Bitter Lesson in AI Research: https://www.youtube.com/watch?v=orDKvo8h71o
Bitter Lesson w/ Claude Code: https://www.youtube.com/watch?v=Lue8K2jqfKk&t=1s
Learning the Bitter Lesson in AI Engineering: https://rlancemartin.github.io/2025/07/30/bitter_lesson/
Open Deep Research: https://github.com/langchain-ai/open_deep_research https://academy.langchain.com/courses/deep-research-with-langgraph
Scaling and building things that "don't yet work": https://www.youtube.com/watch?v=p8Jx4qvDoSo
- Frameworks -
Roast framework at Shopify / standardization of orchestration tools: https://www.youtube.com/watch?v=0NHCyq8bBcM
MCP adoption within Anthropic / standardization of protocols: https://www.youtube.com/watch?v=xlEQ6Y3WNNI
How to think about frameworks: https://blog.langchain.com/how-to-think-about-agent-frameworks/
RAG benchmarking: https://rlancemartin.github.io/2025/04/03/vibe-code/
Simon's talk with memory-gone-wrong: https://simonwillison.net/2025/Jun/6/six-months-in-llms/
--------
A Technical History of Generative Media
Today we are joined by Gorkem and Batuhan from Fal.ai, the fastest growing generative media inference provider. They recently raised a $125M Series C and crossed $100M ARR. We covered how they pivoted from dbt pipelines to diffusion model inference, which models really changed the trajectory of image generation, and the future of AI video. Enjoy!
00:00 - Introductions
04:58 - History of Major AI Models and Their Impact on Fal.ai
07:06 - Pivoting to Generative Media and Strategic Business Decisions
10:46 - Technical discussion on CUDA optimization and kernel development
12:42 - Inference Engine Architecture and Kernel Reusability
14:59 - Performance Gains and Latency Trade-offs
15:50 - Discussion of model latency importance and performance optimization
17:56 - Importance of Latency and User Engagement
18:46 - Impact of Open Source Model Releases and Competitive Advantage
19:00 - Partnerships with closed source model developers
20:06 - Collaborations with Closed-Source Model Providers
21:28 - Serving Audio Models and Infrastructure Scalability
22:29 - Serverless GPU infrastructure and technical stack
23:52 - GPU Prioritization: H100s and Blackwell Optimization
25:00 - Discussion on ASICs vs. General Purpose GPUs
26:10 - Architectural Trends: MMDiTs and Model Innovation
27:35 - Rise and Decline of Distillation and Consistency Models
28:15 - Draft Mode and Streaming in Image Generation Workflows
29:46 - Generative Video Models and the Role of Latency
30:14 - Auto-Regressive Image Models and Industry Reactions
31:35 - Discussion of OpenAI's Sora and competition in video generation
34:44 - World Models and Creative Applications in Games and Movies
35:27 - Video Models’ Revenue Share and Open-Source Contributions
36:40 - Rise of Chinese Labs and Partnerships
38:03 - Top Trending Models on Hugging Face and ByteDance's Role
39:29 - Monetization Strategies for Open Models
40:48 - Usage Distribution and Model Turnover on FAL
42:11 - Revenue Share vs. Open Model Usage Optimization
42:47 - Moderation and NSFW Content on the Platform
44:03 - Advertising as a key use case for generative media
45:37 - Generative Video in Startup Marketing and Virality
46:56 - LoRA Usage and Fine-Tuning Popularity
47:17 - LoRA ecosystem and fine-tuning discussion
49:25 - Post-Training of Video Models and Future of Fine-Tuning
50:21 - ComfyUI Pipelines and Workflow Complexity
52:31 - Requests for startups and future opportunities in the space
53:33 - Data Collection and RedPajama-Style Initiatives for Media Models
53:46 - RL for Image and Video Models: Unknown Potential
55:11 - Requests for Models: Editing and Conversational Video Models
57:12 - Veo 3 Capabilities: Lip Sync, TTS, and Timing
58:23 - Bitter Lesson and the Future of Model Workflows
58:44 - FAL's hiring approach and team structure
59:29 - Team Structure and Scaling Applied ML and Performance Teams
1:01:41 - Developer Experience Tools and Low-Code/No-Code Integration
1:03:04 - Improving Hiring Process with Public Challenges and Benchmarks
1:04:02 - Closing Remarks and Culture at FAL
The podcast by and for AI Engineers! In 2024, over 2 million readers and listeners came to Latent Space to hear about news, papers and interviews in Software 3.0.
We cover Foundation Models changing every domain in Code Generation, Multimodality, AI Agents, GPU Infra and more, directly from the founders, builders, and thinkers involved in pushing the cutting edge. We strive to give you everything from the definitive take on the Current Thing to the first introduction to the tech you'll be using in the next 3 months! We break news and exclusive interviews from OpenAI, Anthropic, Gemini, Meta (Soumith Chintala), Sierra (Bret Taylor), tiny (George Hotz), Databricks/MosaicML (Jon Frankle), Modular (Chris Lattner), Answer.ai (Jeremy Howard), et al.
Full show notes always on https://latent.space