Data Engineering Podcast

Tobias Macey

Education Technology

Latest episode

505 episodes

Orion at Gravity: Trustworthy AI Analysts for the Enterprise
08/03/2026 | 1h 5 mins.
Summary
In this episode of the Data Engineering Podcast, Lucas Thelosen and Drew Gilson, co-founders of Gravity, discuss their vision for agentic analytics in the enterprise, enabled by semantic layers and broader context engineering. They share their journey from Looker and Google to building Orion, an AI analyst that combines data semantics with rich business context to deliver trustworthy and actionable insights. Lucas and Drew explain how Orion uses governed, role-specific "custom agents" to drive analysis, recommendations, and proactive preparation for meetings, while maintaining accuracy, lineage transparency, and human-in-the-loop feedback. The conversation covers evolving views on semantic layers, agent memory, retrieval, and operating across messy data, multiple warehouses, and external context like documents and weather. They emphasize the importance of trust, governance, and the path to AI coworkers that act as reliable colleagues. Lucas and Drew also share field stories from public companies where Orion has surfaced board-level issues, accelerated executive prep with last-minute research, and revealed how BI investments are actually used, highlighting a shift from static dashboards to dynamic, dialog-driven decisions. They stress the need for accessible (non-proprietary) models, managing context and technical debt over time, and focusing on business actions, not just metrics, to unlock real ROI.
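
To make the relationship between a semantic layer and the physical schema a little more concrete, here is a minimal sketch, assuming a toy layer in which governed metric and dimension names map to SQL expressions over a single warehouse table. The class, table, and column names (`SemanticLayer`, `analytics.orders`, `amount_usd`) are hypothetical; this is not Orion's or LookML's actual format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticLayer:
    """Hypothetical semantic layer: governed business names mapped onto one physical table."""

    table: str
    dimensions: dict[str, str]  # dimension name -> physical column expression
    metrics: dict[str, str]  # metric name -> aggregate SQL expression

    def compile(self, metrics: list[str], dimensions: list[str]) -> str:
        """Translate a request phrased in business terms into SQL against the warehouse."""
        unknown = [m for m in metrics if m not in self.metrics]
        unknown += [d for d in dimensions if d not in self.dimensions]
        if unknown:
            # Refuse undefined terms rather than guessing at a column or formula.
            raise KeyError(f"not defined in the semantic layer: {unknown}")
        select = [f"{self.dimensions[d]} AS {d}" for d in dimensions]
        select += [f"{self.metrics[m]} AS {m}" for m in metrics]
        sql = f"SELECT {', '.join(select)} FROM {self.table}"
        if dimensions:
            sql += " GROUP BY " + ", ".join(self.dimensions[d] for d in dimensions)
        return sql


layer = SemanticLayer(
    table="analytics.orders",
    dimensions={"region": "ship_region"},
    metrics={"revenue": "SUM(amount_usd)", "order_count": "COUNT(*)"},
)
print(layer.compile(["revenue"], ["region"]))
# SELECT ship_region AS region, SUM(amount_usd) AS revenue FROM analytics.orders GROUP BY ship_region
```

Because an agent asks for `revenue` by name instead of writing SQL against `amount_usd` directly, the definition lives in one governed place, and an undefined term fails loudly instead of being improvised.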

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management
If you lead a data team, you know this pain: Every department needs dashboards, reports, custom views, and they all come to you. So you're either the bottleneck slowing everyone down, or you're spending all your time building one-off tools instead of doing actual data work. Retool gives you a way to break that cycle. Their platform lets people build custom apps on your company data while keeping it all secure. Type a prompt like 'Build me a self-service reporting tool that lets teams query customer metrics from Databricks' and they get a production-ready app with the permissions and governance built in. They can self-serve, and you get your time back. It's data democratization without the chaos. Check out Retool at dataengineeringpodcast.com/retool today and see how other data teams are scaling self-service. Because let's be honest, we all need to Retool how we handle data requests.
Your host is Tobias Macey and today I'm interviewing Lucas Thelosen and Drew Gilson about the application of semantic layers to context engineering for agentic analytics.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by digging into the practical elements of what is involved in the creation and maintenance of a "semantic layer"?
How does the semantic layer relate to and differ from the physical schema of a data warehouse?
In generative AI and agentic systems, the latest term of art is "context engineering". How does a semantic layer factor into the context management for an agentic analyst?
What are some of the ways that LLMs/agents can help to populate the semantic layer?
What are the cases where you want to guard against hallucinations by keeping a human in the loop?
Beyond a physical semantic layer, what are the other elements of context that you rely on for guiding the activities of your agents?
What are some utilities that you have found helpful for bootstrapping the structural guidelines for an existing warehouse environment?
What are the most interesting, innovative, or unexpected ways that you have seen Orion used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Orion?
When is Orion the wrong choice?
What do you have planned for the future of Orion?

Contact Info

Lucas
LinkedIn

Drew
LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links

Gravity
Orion
Looker
Semantic Layer
dbt
LookML
Tableau
OpenClaw
Pareto Distribution

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
From Models to Momentum: Uniting Architects and Engineers with ER/Studio
02/03/2026 | 45 mins.
Summary
In this episode of the Data Engineering Podcast, Jamie Knowles (Product Director) and Ryan Hirsch (Product Marketing Manager) discuss the importance of enterprise data modeling with ER/Studio. They highlight how clear, shared semantic models are a foundational discipline for modern data engineering, preventing semantic drift, speeding up delivery, and reducing rework. Jamie explains that ER/Studio helps teams define logical models that translate into physical designs and code across warehouses and analytics platforms, while maintaining traceability and governance. The conversation also touches on how AI increases tolerance for ambiguity but doesn't fix unclear definitions; it amplifies them. Jamie and Ryan describe ER/Studio's integrations with governance tools, collaboration features like TeamServer, reverse engineering, and metadata bridges, as well as new AI-assisted modeling capabilities. They emphasize that most data problems are meaning problems, and that investing in architecture and a semantic backbone can make engineering faster, governance simpler, and analytics more reliable.
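
As a rough illustration of a logical model that translates into a physical design, here is a minimal sketch, assuming a toy logical entity whose business-level attribute types are mapped onto one SQL dialect's column types. The `Entity` and `Attribute` classes and the `PHYSICAL_TYPES` mapping are hypothetical; this is not how ER/Studio represents models or generates DDL.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    name: str
    logical_type: str  # business-level type, e.g. "identifier", "money", "text", "date"
    required: bool = True


@dataclass
class Entity:
    name: str
    attributes: list[Attribute]
    key: str  # attribute that identifies the entity (used as the primary key here)


# Hypothetical mapping from logical types to one physical dialect's column types.
PHYSICAL_TYPES = {
    "identifier": "BIGINT",
    "money": "DECIMAL(18, 2)",
    "text": "VARCHAR(255)",
    "date": "DATE",
}


def to_ddl(entity: Entity) -> str:
    """Derive physical DDL from the logical entity so the model stays the single source of meaning."""
    if entity.key not in {a.name for a in entity.attributes}:
        raise ValueError(f"key {entity.key!r} is not an attribute of {entity.name}")
    cols = []
    for attr in entity.attributes:
        null = " NOT NULL" if attr.required else ""
        cols.append(f"  {attr.name} {PHYSICAL_TYPES[attr.logical_type]}{null}")
    cols.append(f"  PRIMARY KEY ({entity.key})")
    return f"CREATE TABLE {entity.name} (\n" + ",\n".join(cols) + "\n);"


customer = Entity(
    name="customer",
    attributes=[
        Attribute("customer_id", "identifier"),
        Attribute("full_name", "text"),
        Attribute("lifetime_value", "money", required=False),
    ],
    key="customer_id",
)
print(to_ddl(customer))
```

Swapping `PHYSICAL_TYPES` for another dialect's mapping would produce a different physical design from the same logical entity, which is the separation the episode argues for.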

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management
Your host is Tobias Macey and today I'm interviewing Jamie Knowles and Ryan Hirsch about ER/Studio and the foundational role of enterprise data modeling in modern data engineering.

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what ER/Studio is and the story behind it?
How has it evolved to handle the shift from traditional on-prem databases to modern, complex, and highly regulated enterprise environments?
How do you define "Enterprise Data Architecture" today, and how does it differ from just managing a collection of pipelines in a modern data stack?
In your view, what are the distinct responsibilities of a Data Architect versus a Data Engineer, and where is the critical overlap where they typically succeed or fail together?
From what you see in the field, how often are the technical struggles of data engineering teams—like tool sprawl or "broken" pipelines—actually just "data meaning" problems in disguise?
What is a logical data model, and why do you advocate for framing these as "knowledge models" rather than just technical diagrams?
What are the long-term consequences, such as "semantic drift" or the erosion of trust, when organizations skip logical modeling to go straight to physical implementation and pipelines?
What is the intersection of data modeling and data governance?
What are the elements of integration between ER/Studio and governance platforms that reduce friction and time to delivery?
For the engineers who worry that architecture and modeling slow down development, how does having a central design authority actually help teams scale and reduce downstream rework?
What does a typical workflow look like across data architecture and data engineering for individuals and teams who are using ER/Studio as a core part of their modeling?
What are the most interesting, innovative, or unexpected ways that you have seen ER/Studio used? (Specifically regarding grounding AI initiatives or defining enterprise ontologies.)
What are the most interesting, unexpected, or challenging lessons that you have learned while working on ER/Studio?
When is ER/Studio the wrong choice for a data team or a specific project?
What do you have planned for the future of ER/Studio, particularly regarding AI and the "design-time" foundation of the data stack?

Contact Info

Jamie
LinkedIn
Ryan
LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Idera
Wherescape
ER/Studio
Entity-Relationship Diagram (ERD)
Business Keys
Medallion Architecture
RDF == Resource Description Framework
Collibra
Martin Fowler
DB2

From Data Models to Mind Models: Designing AI Memory at Scale
22/02/2026 | 57 mins.
Summary
In this episode of the Data Engineering Podcast, Vasilije "Vas" Markovic, founder of Cognee, discusses building agentic memory, a crucial aspect of artificial intelligence that enables systems to learn, adapt, and retain knowledge over time. He explains the concept of agentic memory, highlighting the importance of distinguishing between permanent and session memory, graph+vector layers, latency trade-offs, and multi-tenant isolation to ensure safe knowledge sharing or protection. The conversation covers practical considerations such as storage choices (Redis, Qdrant, LanceDB, Neo4j), metadata design, temporal relevance and decay, and emerging research areas like trace-based scoring and reinforcement learning for improving retrieval. Vas shares real-world examples of agentic memory in action, including applications in pharma hypothesis discovery, logistics control towers, and cybersecurity feeds, as well as scenarios where simpler approaches may suffice. He also offers guidance on when to add memory, pitfalls to avoid (naive summarization, uncontrolled fine-tuning), human-in-the-loop realities, and Cognee's future plans: revamped session/long-term stores, decision-trace research, and richer time and transformation mechanisms. Additionally, Vas touches on policy guardrails for agent actions and the potential for more efficient "pseudo-languages" for multi-agent collaboration.
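
Temporal relevance and decay, combined with the split between session and permanent memory, can be illustrated with a minimal sketch, assuming a plain exponential half-life applied to a similarity score and a much shorter half-life for session memories. The `Memory` record, the half-life values, and the scoring rule are hypothetical choices for illustration, not Cognee's API or ranking method.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    similarity: float  # relevance to the current query, e.g. cosine similarity in [0, 1]
    age_hours: float  # hours since the memory was written or last reinforced
    scope: str  # "session" or "permanent"


# Hypothetical half-lives: session memories fade within hours, permanent ones over weeks.
HALF_LIVES_HOURS = {"session": 1.0, "permanent": 24.0 * 30}


def decayed_score(mem: Memory, half_life_hours: float) -> float:
    """Exponential decay: the memory's weight halves every `half_life_hours`."""
    return mem.similarity * math.pow(0.5, mem.age_hours / half_life_hours)


def retrieve(memories: list[Memory], k: int = 2) -> list[str]:
    """Rank memories by decayed relevance and return the top-k texts."""
    ranked = sorted(
        memories,
        key=lambda m: decayed_score(m, HALF_LIVES_HOURS[m.scope]),
        reverse=True,
    )
    return [m.text for m in ranked[:k]]


memories = [
    Memory("user prefers metric units", 0.70, age_hours=24 * 60, scope="permanent"),
    Memory("user asked about shipment 42", 0.90, age_hours=3.0, scope="session"),
    Memory("warehouse B is the default origin", 0.60, age_hours=2.0, scope="permanent"),
]
print(retrieve(memories))
# ['warehouse B is the default origin', 'user prefers metric units']
```

The session memory with the highest raw similarity drops to last after three hours, which may or may not be what an application wants; decay rates like these would need tuning per use case.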

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management
Your host is Tobias Macey and today I'm interviewing Vasilije Markovic about agentic memory architectures and applications.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by giving an overview of the different elements of "memory" in an agentic context? (storage and retrieval mechanisms; how to model memories; how that changes as you go from short-term to long-term; managing scope and retrieval triggers)
What are some of the useful triggers in an agent architecture to identify whether/when/what to create a new memory?
How do things change as you try to build a shared corpus of memory across agents?
What are the most interesting, innovative, or unexpected ways that you have seen agentic memory used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Cognee?
When is a dedicated memory layer the wrong choice?
What do you have planned for the future of Cognee?

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Cognee
AI Engineering Podcast Episode
Kimball Memory
Cognitive Science
Context Window
RAG == Retrieval Augmented Generation
Memory Types
Redis Vector Store
Qdrant
Vector on Edge
Milvus
LanceDB
KuzuDB
Neo4j
Mem0
Zep Graphiti
A2A (Agent-to-Agent) Protocol
Snowplow
Reinforcement Learning
Model Finetuning
OpenClaw

Prompt Management, Tracing, and Evals: The New Table Stakes for GenAI Ops
15/02/2026 | 50 mins.
Summary
In this episode of the Data Engineering Podcast, Aman Agarwal, creator of OpenLit, discusses the operational groundwork required to run LLM-powered applications reliably and cost-effectively. He highlights common blind spots that teams face, including opaque model behavior, runaway token costs, and brittle prompt management, and explains how OpenTelemetry-native observability can turn these black-box interactions into stepwise, debuggable traces across models, tools, and data stores. Aman showcases OpenLit's approach to open standards, vendor-neutral integrations, and practical features such as fleet-managed OTEL collectors, zero-code Kubernetes instrumentation, prompt and secret management, and evaluation workflows. They also explore experimentation patterns, routing across models, and closing the loop from evals to prompt/dataset improvements, demonstrating how better visibility reshapes design choices from prototype to production. Aman shares lessons learned building in the open, where OpenLit fits and doesn't, and what's next in context management, security, and ecosystem integrations, providing resources and examples of multi-database observability deployments for listeners.
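
To show why stepwise traces help with runaway token costs, here is a minimal sketch that walks a simplified span tree and totals token spend per trace. The `Span` class, model names, and prices are hypothetical; the attribute keys are modeled on the OpenTelemetry GenAI semantic conventions (which are still evolving, so check the current spec), and this is not OpenLit's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Span:
    """A simplified trace span; real OpenTelemetry spans also carry IDs, timing, and status."""

    name: str
    attributes: dict[str, int | str] = field(default_factory=dict)
    children: list[Span] = field(default_factory=list)


# Hypothetical USD prices per 1K tokens: (input, output), keyed by model name.
PRICES = {"model-small": (0.0005, 0.0015), "model-large": (0.005, 0.015)}


def trace_cost(span: Span) -> float:
    """Recursively sum token cost over every span in the tree that records model usage."""
    cost = 0.0
    model = span.attributes.get("gen_ai.request.model")
    if model in PRICES:
        in_price, out_price = PRICES[model]
        cost += span.attributes.get("gen_ai.usage.input_tokens", 0) / 1000 * in_price
        cost += span.attributes.get("gen_ai.usage.output_tokens", 0) / 1000 * out_price
    return cost + sum(trace_cost(child) for child in span.children)


trace = Span("agent.run", children=[
    Span("retrieve.vector_search"),
    Span("llm.plan", {"gen_ai.request.model": "model-large",
                      "gen_ai.usage.input_tokens": 2000,
                      "gen_ai.usage.output_tokens": 500}),
    Span("llm.summarize", {"gen_ai.request.model": "model-small",
                           "gen_ai.usage.input_tokens": 8000,
                           "gen_ai.usage.output_tokens": 1000}),
])
print(f"${trace_cost(trace):.4f}")
# $0.0230
```

Calling `trace_cost` on each child span separately shows which step dominates spend (here the large-model planning call), which is the kind of visibility a single per-request bill cannot give.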

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management
Your host is Tobias Macey and today I'm interviewing Aman Agarwal about the operational investments that are necessary to ensure you get the most out of your AI models.

Interview

Introduction
How did you get involved in the area of AI/data management?
Can you start by giving your assessment of the main blind spots that are common in the existing AI application patterns?
As teams adopt agentic architectures, how common is it to fall prey to those same blind spots?
There are numerous tools/services available now focused on various elements of "LLMOps". What are the major components necessary for a minimum viable operational platform for LLMs?
There are several areas of overlap, as well as disjoint features, in the ecosystem of tools (both open source and commercial). How do you advise teams to navigate the selection process? (point solutions vs. integrated tools, and handling frameworks with only partial overlap)
Can you describe what OpenLit is and the story behind it?
How would you characterize the feature set and focus of OpenLit compared to what you view as the "major players"?
Once you have invested in a platform like OpenLit, how does that change the overall development workflow for the lifecycle of AI/agentic applications?
What are the most complex/challenging elements of change management for LLM-powered systems? (e.g. prompt tuning, model changes, data changes, etc.)
How can the information collected in OpenLit be used to develop a self-improvement flywheel for agentic systems?
Can you describe the architecture and implementation of OpenLit?
How have the scope and goals of the project changed since you started working on it?
Given the foundational aspects of the project that you have built, what are some of the adjacent capabilities that OpenLit is situated to expand into?
What are the sharp edges and blind spots that are still challenging even when you have OpenLit or similar integrated?
What are the most interesting, innovative, or unexpected ways that you have seen OpenLit used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on OpenLit?
When is OpenLit the wrong choice?
What do you have planned for the future of OpenLit?

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data/AI management today?

Links

OpenLit
Fleet Hub
OpenTelemetry
Langfuse
LangSmith
TensorZero
AI Engineering Podcast Episode
Traceloop
Helicone
ClickHouse

From Legacy to AI-Ready: How MongoDB AMP Accelerates Modernization
08/02/2026 | 46 mins.
Summary
In this episode, Shilpa Kolhar, SVP of Product and Engineering at MongoDB, discusses using MongoDB as a unified foundation for AI-driven and agentic applications. She explains how the Application Modernization Platform (AMP) accelerates the transition from legacy relational systems to a document-first architecture, driven by the need for AI-readiness and speed of change. Shilpa highlights MongoDB's features, such as its native JSON document model, Atlas Vector Search, auto-embeddings, and integrated search, which help eliminate drift and latency across operational data, indexing, and vectors, emphasizing the importance of keeping context, transactions, and embeddings together for real-time AI use cases. She shares best practices for re-architecting legacy systems, including schema validation and versioning patterns to tame schema drift, aggregation pipelines for consistent reads, and pragmatic standardization across services, while also detailing AMP's approach to scoping large estates and the balance of LLM-powered automation with human-in-the-loop governance.
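
The schema versioning pattern mentioned here can be sketched as follows, assuming each document carries a `schema_version` field and that application code upgrades older documents one step at a time when it reads them. The field names and migrations are hypothetical, and this is plain Python over dictionaries, not a MongoDB driver or AMP API.

```python
from typing import Any, Callable

CURRENT_VERSION = 3


def _v1_to_v2(doc: dict[str, Any]) -> dict[str, Any]:
    # Hypothetical v2 change: split a single "name" field into first and last name.
    first, _, last = doc.pop("name", "").partition(" ")
    return {**doc, "first_name": first, "last_name": last, "schema_version": 2}


def _v2_to_v3(doc: dict[str, Any]) -> dict[str, Any]:
    # Hypothetical v3 change: move a scalar phone number into a list of contacts.
    phone = doc.pop("phone", None)
    contacts = [{"type": "phone", "value": phone}] if phone else []
    return {**doc, "contacts": contacts, "schema_version": 3}


# Maps each old version to the function that upgrades it by exactly one step.
UPGRADES: dict[int, Callable[[dict[str, Any]], dict[str, Any]]] = {
    1: _v1_to_v2,
    2: _v2_to_v3,
}


def upgrade(doc: dict[str, Any]) -> dict[str, Any]:
    """Apply one-step migrations in order until the document reaches CURRENT_VERSION."""
    doc = dict(doc)  # shallow copy, so the caller's document is not mutated
    version = doc.get("schema_version", 1)  # documents written before versioning count as v1
    while version < CURRENT_VERSION:
        doc = UPGRADES[version](doc)
        version = doc["schema_version"]
    return doc


legacy = {"_id": 7, "name": "Ada Lovelace", "phone": "555-0100"}
print(upgrade(legacy))
```

Upgrading on read lets old and new documents coexist while the rest of the collection is migrated; MongoDB's `$jsonSchema` validator can then constrain the shape of newly written documents.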

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management
Your host is Tobias Macey and today I'm interviewing Shilpa Kolhar about using MongoDB as the foundation for AI-driven applications.

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what MongoDB is and the core primitives that it offers?
The MongoDB engine has gone through substantial evolution since it was first released in 2009. What are some of the most notable features that have been added in recent years?
You recently launched the MongoDB Application Modernization Platform (AMP). What are the key elements of modernization that it is focused on?
How do the core primitives of the MongoDB engine align with modernization objectives?
There is a lot of attention being paid now to AI applications where data is the most critical element for success. What are the features of MongoDB that lend themselves to being the context store for generative AI services?
Besides the data used for context and grounding, AI applications also want to track user interactions and form short- and long-term memory to improve the system over time. How can MongoDB assist in that work as well?
While the lack of schema enforcement on write can be beneficial to rapid evolution of software, it can also be a detriment if not managed well. How can MongoDB help in avoiding schema drift over time that leads to old data being incompatible with current code?
What are the most interesting, innovative, or unexpected ways that you have seen MongoDB used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on MongoDB and application modernization?
When is MongoDB/AMP the wrong choice?
What do you have planned for the future of AMP?

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

MongoDB
MongoDB AMP
Google Gemini
Voyage AI
Qdrant
ChromaDB
Weaviate
Pinecone
MongoDB Autoembedding
Retool
ODM == Object Document Mapper
RAG == Retrieval Augmented Generation
Agentic Memory

About Data Engineering Podcast

This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.