We have never had more data about our planet: petabytes of satellite imagery, aerial photos, and sensor readings collected daily. Yet, turning that massive volume of "noise" into a clear signal remains the fundamental challenge of the geospatial industry.
In this episode of the Spatial Stack, I sit down with the engineering and product minds from Wherobots (Ryan, Phil, and Len) to tear down the architecture required to handle Earth Observation data at planetary scale. We move beyond the buzzwords to discuss the engineering "war stories" of building resilient inference pipelines.
We dive deep into why the industry is moving away from simple computer vision toward "Large Earth Models" that function like LLMs for the physical world. We also get into the weeds of the tech stack: the battle between Dask and Ray for distributed compute, why Cloud-Optimized GeoTIFFs (COGs) aren't always the answer for inference, and how formats like Zarr are unlocking multidimensional analysis.
In this episode, we cover:
The Data Bottleneck: Why "garbage in, garbage out" is still the biggest hurdle in monitoring a changing planet.
Infrastructure Realities: The specific limitations of Google Earth Engine and why we needed a cloud-agnostic approach.
Engineering Pivot: Why Wherobots migrated from Dask to Ray to solve "crashing cluster" syndromes and memory management issues.
The Future of GeoAI: How embeddings and foundation models are compressing petabytes of data into searchable, semantic insights.
✅ Sign Up for Wherobots: https://wherobots.com/
✅ Learn more about Apache Sedona: https://wherobots.com/apache-sedona/
✅ Learn more about RasterFlow: https://wherobots.com/blog/rasterflow-earth-observation-inference-engine/
✅ Sign Up for the RasterFlow Private Preview: https://wherobots.com/rasterflow-preview/
00:00:00 – Teaser: The "Garbage In, Garbage Out" problem in GeoAI
00:01:51 – Introductions & Icebreakers (The controversial ice cream opinions)
00:03:08 – The Challenge: Monitoring a changing Earth at scale
00:10:30 – Data Engineering: The hidden complexity of NAIP, clouds, and tiling artifacts
00:14:19 – Modeling Reality: Why Computer Vision models fail on geospatial data
00:21:51 – The Google Earth Engine Debate: Walled gardens vs. bringing compute to the data
00:27:53 – Introducing RasterFlow: A new architecture for scalable inference
00:36:51 – The Engineering Story: Why we switched from Dask to Ray
00:43:40 – File Formats: Why Zarr is superior to COGs for multidimensional inference
00:47:40 – Workflow Walkthrough: Running the "Fields of the World" model
00:51:40 – Embeddings, Foundation Models, and Large Earth Models
00:57:40 – How to get started with RasterFlow
📰 Modern GIS insights: https://forrest.nyc
CONNECT WITH ME
📸 Instagram: https://www.instagram.com/matt_forrest/
💼 LinkedIn: https://www.linkedin.com/in/mbforr/
🌐 Website: https://forrest.nyc