We’re working on new technology for answering geographic questions on a global scale. To determine OpenStreetMap’s total coverage, identify bicycle paths that are in RunKeeper but not in OpenStreetMap, or identify incorrectly tagged sidewalks, we need to run an analysis on a whole world’s worth of data. And once we’ve got the answer, we want to automate the analysis so we can track progress over time. It’s a lot of data to crunch, so we need fast tools.

TileReduce is our scalable open source framework for these challenging tasks. It uses the built-in segmentation of vector tiles to run analyses across multiple CPU cores. From the outset, TileReduce changed the kinds of problems we could solve, and did so very quickly. Today we’re releasing a new version that increases its performance even further.
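
To make the idea concrete, here is a minimal sketch of the map/reduce-over-tiles pattern. It is written in Python for illustration and is not TileReduce's API: the `TILES` dictionary, `count_highways`, and `run_job` are hypothetical names, and the "tiles" are tiny in-memory lists rather than real vector tiles. Each tile is processed independently by a worker, and the small per-tile results are folded into one answer.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical stand-in for vector tiles: (x, y, zoom) -> list of feature tags.
# A real job would read tiles from an MBTiles file or a tile server instead.
TILES = {
    (0, 0, 1): [{"highway": "residential"}, {"highway": "footway"}],
    (1, 0, 1): [{"highway": "cycleway"}],
    (0, 1, 1): [],
    (1, 1, 1): [{"building": "yes"}, {"highway": "primary"}],
}


def count_highways(tile):
    """Map step: look at one tile in isolation and return a small result."""
    return sum(1 for feature in TILES[tile] if "highway" in feature)


def run_job(tiles, map_tile, combine, initial, workers=4):
    """Apply map_tile to every tile on a worker pool, then fold the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(map_tile, tiles)
        return reduce(combine, results, initial)


total = run_job(sorted(TILES), count_highways, lambda a, b: a + b, 0)
print(total)
```

Threads keep this sketch portable. For CPU-bound work, `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface and spreads the tiles across multiple cores, which is closer in spirit to what the post describes.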

A visualization of OpenStreetMap coverage analysis with TileReduce 3. Made using this guide.

Introducing TileReduce 3

TileReduce 3 includes a range of optimizations that reduce the overhead of individual tasks, make better use of multiple CPU cores, and dramatically increase overall performance. Highlights:

  • Fetching and parsing vector tiles and converting them into GeoJSON are now parallelized across many CPU cores. This makes the work much faster and also reduces the overhead of sending data between processes.
  • There’s an option to parse vector tiles lazily, avoiding the slow process of unpacking geometries if they aren’t needed.
  • The working area is streamed into a job tile by tile. This avoids storing the whole scope of work in memory.
  • A simpler, streamlined API and extensive test coverage.
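
The tile-by-tile streaming mentioned above can be sketched with a generator. This is an illustration in Python under stated assumptions, not TileReduce's implementation: `lonlat_to_tile` and `stream_tiles` are hypothetical helper names, and the tile arithmetic is the standard Web Mercator "slippy map" formula. The point is that tiles are produced on demand, so a job covering the whole world never holds the full tile list in memory.

```python
import math
from itertools import islice


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) tile containing a point, using slippy-map tiling."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the east or south edge stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def stream_tiles(bbox, zoom):
    """Yield (x, y, zoom) tiles covering bbox = (west, south, east, north).

    Tiles are generated one at a time; nothing is stored between yields.
    """
    west, south, east, north = bbox
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # top-left corner
    x_max, y_max = lonlat_to_tile(east, south, zoom)  # bottom-right corner
    for x in range(x_min, x_max + 1):
        for y in range(y_min, y_max + 1):
            yield (x, y, zoom)


# Approximate Web Mercator extent of the whole world.
WORLD = (-180.0, -85.0511, 180.0, 85.0511)

# Zoom 12 covers the world with about 16.8 million tiles, but taking the
# first three only computes three of them.
first_three = list(islice(stream_tiles(WORLD, 12), 3))
print(first_three)
```

A consumer can iterate over `stream_tiles(...)` directly and hand each tile to a worker as it arrives, so memory use depends on how many tiles are in flight rather than on the size of the area.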

With TileReduce 3 and OSM QA Tiles, calculating the difference between US Census roads and OpenStreetMap takes 11 minutes on a MacBook Pro, down from up to a day on a high-powered EC2 instance. The OpenStreetMap Coverage and Sidewalk scripts each take under 25 minutes to run for the whole world.

Let’s talk

The new performance opens up lots of new possibilities for geospatial data analysis, and we can’t wait to see more cool uses of TileReduce from the community. Let’s talk about it!

Hit up Morgan, Tim and me on Twitter if you have any questions or ideas, check out the code, docs and examples for the new release, and stay tuned for more posts about analysis at Mapbox this week.