This episode is sponsored by tastytrade.
Trade stocks, options, futures, and crypto on one platform with low commissions, including zero commission on stocks and crypto. Built for traders who think in probabilities, tastytrade offers advanced analytics, risk tools, and an AI-powered Search feature.
Learn more at https://tastytrade.com/
Voice AI is moving far beyond transcription.
In this episode, Carter Huffman, CTO and co-founder of Modulate, explains how real-time voice intelligence is unlocking something much bigger than speech-to-text. His team built AI that understands emotion, intent, deception, harassment, and fraud directly from live conversations. Not after the fact. Instantly.
Carter shares how their technology powers ToxMod, which moderates toxic behavior in online games at massive scale, how it analyzes millions of audio streams with ultra-low latency, and how its ensemble architecture beats foundation models on speed, cost, and accuracy. We also explore voice deepfake detection, scam prevention, sentiment analysis for finance, and why voice might become the most important signal layer in AI.
If you're building voice agents, working on AI safety, or curious where conversational AI is heading next, this conversation breaks down the technical and practical future of voice understanding.
Stay Updated:
Craig Smith on X: https://x.com/craigss
Eye on A.I. on X: https://x.com/EyeOn_AI
(00:00) Real-Time Voice AI: Detecting Emotion, Intent & Lies
(03:07) From MIT & NASA to Building Modulate
(04:45) Why Voice AI Is More Than Just Transcription
(06:14) The Toxic Gaming Problem That Sparked ToxMod
(12:37) Inside the Tech: How "Ensemble Models" Beat Foundation Models
(21:09) Achieving Ultra-Low Latency & Real-Time Performance
(26:16) From Voice Skins to Fighting Harassment at Scale
(37:31) Beyond Gaming: Fraud, Deepfakes & Voice Security
(46:14) Privacy, Ethics & Voice Fingerprinting Risks
(52:10) Lie Detection, Sentiment & Finance Use Cases
(54:57) Opening the API: The Future of Voice Intelligence