On Sunday morning at State of the Map USA in Portland I presented an overview of the ways we process OpenStreetMap data to facilitate better cartography for MapBox Streets, our global basemap. This blog post is a recap of that presentation.
OpenStreetMap data is a collection of geographically associated facts. Mappers are encouraged to record what actually exists, so the data is focused much more on accurately representing reality than on suitability for any single purpose. Everyone who’s spent any time in the OpenStreetMap community knows that you “don’t tag for the renderer” (for various definitions of ‘tag’ and ‘renderer’).
Good data is fact(-ish)
This is important because it gives everyone more freedom and creativity when it comes to actually using the data. No one use case is prioritized from the data point of view. This is fantastic, though it presents a challenge to achieving effective cartography.
Cartographic design decisions need to be made about how to generalize, combine, and simplify facts so that they are readable and understandable. Some facts need to be hidden and others need to be exaggerated.
Design must be opinionated
In order to try to produce the best cartographic results, I can’t just deal with the raw facts straight from the OpenStreetMap database. There are various stages of transformations and editorial decisions before we get to the final render.
OSM XML transformation with Python
Some of our processing transforms OpenStreetMap data before we even try to import it. One example is an improved representation of administrative boundaries.
As you can see, MapBox Streets has differentiated maritime boundaries and no overlapping lines, which commonly occur with multipolygons that share a boundary. This is important because it prevents dashed lines from appearing incorrectly. Additionally, it makes it much easier for us to generate simplified lines for low and medium zoom levels, improving the visual quality.
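The idea of drawing each shared boundary only once can be sketched in a few lines of Python. This is an illustration rather than the actual transformation script: it assumes the boundary relations have already been parsed into plain dictionaries, and the function name `unique_boundary_ways` and the data layout are hypothetical.

```python
def unique_boundary_ways(relations):
    """Return each boundary way exactly once.

    Every way is tagged with the lowest admin_level of any relation that
    uses it (in OpenStreetMap a lower number is a more important boundary),
    so a line shared by a country and a state is drawn once, as a country line.
    """
    best = {}
    for rel in relations:
        level = rel["admin_level"]
        for way_id in rel["way_ids"]:
            if way_id not in best or level < best[way_id]:
                best[way_id] = level
    return best


# Hypothetical input: way 3 is shared by two countries,
# way 2 is shared by a country and one of its states.
relations = [
    {"name": "Country A", "admin_level": 2, "way_ids": [1, 2, 3]},
    {"name": "Country B", "admin_level": 2, "way_ids": [3, 4]},
    {"name": "State A1", "admin_level": 4, "way_ids": [2, 5]},
]
ways = unique_boundary_ways(relations)
```

With one line per way, a dashed style is only ever stroked once along any stretch of border, and each line can be simplified independently for lower zoom levels.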
Joining other datasets
OpenStreetMap data provides a limited number of place classifications (such as city, town, village). For the best cartographic results we need classes that are a little more opinionated about how they rank cities. “Which of these labels should be visible?” and “how much should this label be emphasized?” are important decisions that need to be made in cartographic design.
To do this we’re joining additional information from Natural Earth.
The code for this is on GitHub. It’s extremely specific to the joining of these particular datasets, but work could be done to encapsulate the core ideas and make it useful for more general situations.
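As a rough illustration of what such a join involves (this is not the code from the repository), the sketch below matches places by name and then by distance, and copies a rank across. The dictionary layout, the `join_scalerank` helper, the 50 km threshold, and the rank values are all assumptions made for the example; `scalerank` is modeled on the ranking attribute in Natural Earth's populated places data.

```python
import math


def distance_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two points (haversine formula)."""
    radius = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def join_scalerank(osm_places, ne_places, max_km=50.0):
    """Attach the scalerank of the nearest same-named Natural Earth place."""
    by_name = {}
    for ne in ne_places:
        by_name.setdefault(ne["name"].casefold(), []).append(ne)

    joined = []
    for place in osm_places:
        candidates = by_name.get(place["name"].casefold(), [])
        near = [(distance_km(place["lon"], place["lat"], ne["lon"], ne["lat"]), ne)
                for ne in candidates]
        near = [pair for pair in near if pair[0] <= max_km]
        # Names alone are ambiguous, so the closest candidate wins.
        rank = min(near, key=lambda pair: pair[0])[1]["scalerank"] if near else None
        joined.append({**place, "scalerank": rank})
    return joined


# Illustrative data only; the ranks are made up for the example.
ne_places = [
    {"name": "Portland", "lon": -122.68, "lat": 45.52, "scalerank": 1},
    {"name": "Portland", "lon": -70.26, "lat": 43.66, "scalerank": 4},
]
osm_places = [
    {"name": "Portland", "lon": -122.676, "lat": 45.523},
    {"name": "Portland", "lon": -70.255, "lat": 43.661},
    {"name": "Springfield", "lon": -123.02, "lat": 44.05},
]
joined = join_scalerank(osm_places, ne_places)
```

Places without a match keep a rank of `None`, so the stylesheet can fall back to the plain OpenStreetMap place class.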
Imposm (more Python!)
Imposm is an alternative to osm2pgsql for importing OpenStreetMap data into a PostGIS database for rendering. Each tool has its pros and cons. osm2pgsql is great because it is mature, popular, and can be used for differential updates where your database is synchronized with the main OpenStreetMap database at an interval of minutes, hours, or days. Imposm is great because it’s fast, extremely extensible, and is able to provide a Postgres database that is exactly suited to our needs for rendering with Mapnik and TileMill. Its major flaw currently is that it cannot do differential updates, but its import times combined with its other advantages make Imposm the choice for us.
Apart from its impressive speed, the killer advantage of Imposm for us is that its mapping style file is pure Python. With osm2pgsql you map OpenStreetMap tags to PostgreSQL columns via a limited tabular syntax. With Imposm this mapping is much more extensible - you can extend the existing functions and add your own.
One way we take advantage of this is to normalize various types of separator characters into just one. OpenStreetMap has no standard convention in this area and many mappers have many different approaches. This is an aesthetic decision to present a more consistent final design.
foo; bar
foo / bar
foo - bar
⬇
foo — bar
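The separator normalization shown above could be written along these lines. This is a minimal sketch using only the standard library, not the actual mapping code; the function name `normalize_separators` is hypothetical, and the rule of only treating `/` and `-` as separators when they are surrounded by spaces is an assumption made to leave hyphenated names alone.

```python
import re

# ";" with any surrounding whitespace, or "/" and "-" set off by spaces.
SEPARATOR = re.compile(r"\s*;\s*|\s+[/-]\s+")
DASH = " \u2014 "  # the single separator used in the final design


def normalize_separators(name):
    """Replace the common multi-name separators with one consistent dash."""
    return SEPARATOR.sub(DASH, name)


examples = ["foo; bar", "foo / bar", "foo - bar", "Winston-Salem"]
normalized = [normalize_separators(n) for n in examples]
```

The first three inputs all come out as the same string, while `Winston-Salem` is left untouched because its hyphen has no spaces around it.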
We’ve also customized the mapping to handle a selection of abbreviations, particularly for street name suffixes.
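A simple version of suffix abbreviation might look like the following. The table of suffixes and the `abbreviate` function are illustrative assumptions, not the list used in the actual mapping, and the sketch only touches the last word of a name.

```python
# Hypothetical sample of suffixes; a real table would be much longer.
SUFFIXES = {
    "Street": "St",
    "Avenue": "Ave",
    "Boulevard": "Blvd",
    "Road": "Rd",
    "Drive": "Dr",
}


def abbreviate(name):
    """Abbreviate a trailing street suffix, leaving one-word names intact."""
    words = name.split()
    if len(words) > 1 and words[-1] in SUFFIXES:
        words[-1] = SUFFIXES[words[-1]]
    return " ".join(words)


labels = [abbreviate(n) for n in ["Main Street", "Avenue", "Street Road"]]
```

Restricting the change to the final word keeps names such as "Street Road" readable, since only "Road" is shortened.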
Quite a bit of the processing we do happens in PostgreSQL after the data has been imported but before we are actually rendering. A lot of it is preprocessing that we would rather do just once, like calculating label positions for polygons or associating turning circles with the correct class of road. Because these calculations would have to be made multiple times if we did them on the fly (due to different zoom levels and overlapping meta-tiles), it’s faster to pre-calculate them once.
Another thing we add as a pre-processing step is tint bands for certain types of area geometries. These are for areas that tend to be larger but that we don’t want to render a solid fill for.
Finally, there is a PostgreSQL data transformation that we do on the fly at render time. That is to account for situations where a single conceptual element may be represented by multiple objects in the data. There could be many ways this happens - our main example is rail stations along a multi-track rail line.
Without any processing, our renderer would try to put four icons and four labels here.
We merge these by name, but this is something we can’t do on a global scale since many train stations across the world share names. Instead we do this at render time so that the merge is limited by the size of the metatile.
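The real merge happens in PostgreSQL at render time, but the logic can be illustrated in Python. The sketch below assumes its input is already limited to the points that fall inside one metatile, and it places the merged point at the average position of the group; both the `merge_by_name` helper and the averaging are assumptions made for the example.

```python
from collections import defaultdict


def merge_by_name(points):
    """Collapse same-named points into one point at their mean position.

    `points` should contain only the features inside a single metatile,
    so stations that merely share a name elsewhere are never combined.
    """
    groups = defaultdict(list)
    for point in points:
        groups[point["name"]].append(point)

    merged = []
    for name, members in groups.items():
        x = sum(m["x"] for m in members) / len(members)
        y = sum(m["y"] for m in members) / len(members)
        merged.append({"name": name, "x": x, "y": y})
    return merged


# Four platform nodes of one station plus an unrelated stop.
metatile_points = [
    {"name": "Union Station", "x": 0, "y": 10},
    {"name": "Union Station", "x": 1, "y": 10},
    {"name": "Union Station", "x": 2, "y": 10},
    {"name": "Union Station", "x": 3, "y": 10},
    {"name": "Elm Street", "x": 8, "y": 4},
]
stations = merge_by_name(metatile_points)
```

Here the four "Union Station" points become a single icon and label, and "Elm Street" passes through unchanged.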
Doing more of this
This is by no means an exhaustive list of the possibilities for processing OpenStreetMap data for better cartography. We’re always working on improvements to MapBox Streets, and there are many areas we could look at next to improve how they are presented.
Doing less of this
At the same time, the OpenStreetMap database is evolving and certain practices are becoming common that might make some of these processing tasks unnecessary. For example, the problem of multiple rail station icons can be avoided with relations, and many OpenStreetMap contributors have already started mapping in this way.