This is a transcript of my Rendering the World session for anyone who was not able to attend the FOSS4G conference. This transcript is based on my memory and is not a recording so there may be differences from the actual presentation.
We work at MapBox here in Washington, DC. MapBox is a mapping company that focuses on open source and open data solutions for web and mobile mapping. Our best known product is probably the open source map design studio TileMill.
We recently launched a new product called MapBox Streets. MapBox Streets is a base map with global coverage - we’re in New York, Paris, Tokyo, and everywhere in between. It includes deep zoom levels so you can see individual city blocks, tunnels and subway stations.
Great, we’ve got a cool map. So let’s render the world! Not so fast. It turns out the world is a really big place. Big can mean a lot of different things, even in the context of rendering the world. For the purposes of this talk, big is going to mean two things. One, taking a long time to render. And two, taking a lot of space to store.
Suppose you were to start up TileMill and ask it to render the world down to zoom level 15, 16, maybe 17. And say you put aside all kinds of other performance factors – not worry about PostGIS tuning, or optimizing styles, or even Mapnik performance. Say we wanted to render a very simple or blank map down to zoom level 17. TileMill would show you a screen that looks something like this: 443 days remaining.
And if you were to do a little googling around the web for how much space it would take to store all these tiles you would probably run across this OpenStreetMap wiki page. It would take some 54 terabytes of space to store OpenStreetMap tiles up to z17 or 18 and OpenStreetMap’s tile images are really small.
If you’re new to tile-based mapping or even if you’re familiar you might think: WTF. It’s pretty hard to wrap your head around how or why these numbers are so big.
Let’s do a quick review of tile-based mapping. By tile based mapping I mean maps that load progressively through 256x256px image tiles. Usually these images are square, but they can really be any size and dimension. These tiles are rendered at multiple zoom levels. So using the ubiquitous web mercator projection, zoom level 0 has 1 tile and contains the entire world. At the next zoom level, we double the amount of detail so our 1x1 grid becomes a 2x2 grid and contains 4 tiles total. The next zoom level is a 4x4 grid and contains 16 tiles. The next 8x8 and 64 tiles. We can calculate the number of tiles at a given zoom level using this equation: 4^z is the number of tiles at a given zoom level z.
Then suppose we have a super fast map renderer and really tiny tiles. Say we can render 1000 tiles/sec and each of our tiles is about 400 bytes. These numbers are pretty arbitrary but you’ll see in a second that tweaking these numbers is not going to make much of a difference.
At zoom level 15 there are 4 to 15 tiles, or 1.1 billion tiles. That would be 12 days of rendering time and take 644 GB of storage. z16 will be four times that amount, so about 4.3 billion tiles, 50 days and 2.6 terabytes of storage. And z17 will be 17.2 billion tiles, 200 days, 10.3 terabytes of storage.
With these numbers you can see that even if we worked really hard to optimize PostGIS or tune our renderer – say we could double our speed and get to 2000 tiles/sec – we’d be looking at a render time around 100 days. If I kicked this off now it would probably finish in time for the next FOSS4G.
Common sense says: don’t even try this. It’s not reasonable. Instead there’s some accepted wisdom to how you serve a global baselayer. Users are only going to request a small fraction of these tiles at any given time. So you could render each tile only when it’s requested by a user.
This scenario looks like this: on the right is a big beefy render server and on the left a computer or mobile device that wants your map. When the device requests the map the server renders it and sends it over.
Of course in reality there may be dozens or hundreds of devices that are viewing your map at any given time and your beefy render server is going to start looking a lot less beefy. So the solution is quite straightforward - you just add a lot more render servers, right?
Not so fast. It turns out the internet is actually a really big place. You’re going to need to scale to not hundreds but thousands of requests per second. Your maps are going to start to get slow and your render servers may actually go down.
Render servers are slow, costly and stressful. We have a small team at MapBox and any time we’re spending babysitting servers is time we could be spending on our open source projects or improving the experience for our users.
On the other hand, serving pre-rendered tiles is fast, cheap and reliable. We don’t have to worry about spikes in traffic or whether our servers can keep up – we know all of the tiles are there and are cheap and fast to serve.
Rendering the world
These are the reasons why we set out to try to solve this very hard problem. So how do you render the world?
We had two tools in our belt when we started work on this problem that I didn’t know would play an important role, but they did. Just keep these in the back of your mind as we go through the problem.
Tool #1: MBTiles is a tile storage format we’ve been using for a while. It’s an sqlite-based format that can store an image once and reference it multiple times for different tile coordinates.
Tool #2: One of the features of MapBox Hosting is fast image compositing. We take multiple pre-rendered tiles and blend them together into a single image and we’ve found that this can be quite fast and efficient.
So back to our problem. What’s actually on these 17.2 billion tiles at z17? If you were to pluck a tile out of the pile at random you would sometimes get this - some roads, buildings, parks, lakes. But mostly you’ll get this: a patch of blue somewhere in the middle of the Pacific. In fact, mostly is about 60% of the time. You may remember from gradeschool that the world’s surface is some 70-80% ocean but in the web mercator projection that percentage is more like 60. Antarctica and Greenland are really huge.
This solid blue tile is an example of redundancy. Once I’ve seen this blue tile once, I actually know what 60% of the map looks like - it’s exactly the same. It’s a lot like this example from information theory: “if u cn rd ths msg u cn gt a gd jb w hi pa” or “If you can read this message you can get a good job with high pay.” It turns out some 50% of the letters in English are redundant. You can get your message across without these letters.
How much of the world is actually redundant? This next slide has a clue where we’ve split the world up into 4 different layers. For now let’s focus on the bottom two, the land and water layers. If we take just these two layers our map looks like this. Now we know that of all these tiles, about 60% of them are solid blue areas of water. Pretty good.
But wait, there’s another type of redundant tile on this map. If you were to pan to the middle of Kansas you’re going to see a lot of this tile here: the solid land tile. This tile is also redundant. Once you’ve seen this sandy color you know what all the other solid areas of land look like.
The only type of tile on this map that contains information is a tile containing coastline. Tiles that show where water and land meet, that show a distinction. Under this analysis then, some 38% of the world is solid land leaving just 2% of the tiles containing coastline information.
That’s great but a map with just water and land is pretty boring. What about roads, parks, buildings… human stuff? It turns out that humans really haven’t touched much of the earth at all in this respect – about 1%.
Thinking of tiles in terms of information and redundancy changes our equation from the beginning: 4^z. Consider a map with just a single point. At z0 it would take 1 tile to capture this point. But at z1, if we throw out the redundant tiles, it still takes just 1 tile to represent this point. And at z2, z3, z4 it still takes just 1 tile to represent the point. Instead of quadrupling at each next zoom level, the number of tiles it takes to represent a point stays constant.
If you play this thought experiment out you can see that for points the equation is going to be a lot more like 1^z, and for lines somewhere between 2 to 3^z. For polygons interestingly you only need to represent the perimeter in tiles like in our analysis of the land and water layer where only the coastline contained actual information.
Getting to 2%
Theoretically then we only need to store some 2% of those 17.2 billion tiles. But to do this we’ll need a storage format that actually represents redundancy. MBTiles does this very well. It can store an image once and then reference it many times for different zxy coordinates using an id. So we only need to store the blue ocean tile once for billions of different coordinates.
So 10 terabytes times 2% is 200GB. Pretty good.
What about render time though? Remember those 400 days it would take to render the world? We still have to render a tile before we know that it’s a solid blue ocean tile and can be thrown out or one that contains actual information. Can you know a tile is redundant without rendering, querying or even iterating? If we take Mapnik and PostGIS out of the equation, even just iterating over 17.2 billion zxy coordinates in node.js, python, whatever language you choose, takes a long time.
First, let’s look at how our latest release TileMill 0.9.0 renders an export. It does a scanline. If you’ve ever pre-seeded or rendered tiles you’ll be familiar with this scheme: you start at the first zoom level and do one row, one tile at a time. Once you’ve finished all the tiles at that zoom level you move on to the next, and so on.
In TileMill master we’ve been working on a new rendering scheme. The pyramid scheme takes a tile and looks at its child tiles on the next zoom level. And then it takes each of those and looks at their children. And so on, recursively. If it finds a tile that is redundant at z5 it assumes that all of its child tiles are redundant as well at z6, z7, z8 all the way down to z17. With this method we can skip millions of tiles per second.
It works. Mostly. There are some cases where this scheme will skip tiles too aggressively and I worked hard with our cartographer AJ so that there’s usually an indication at lower zoom levels of features coming up on higher zoom levels.
So 2% of 200 days and we’re down to 4 days of rendering. Not bad. With tiles pre-rendered all our servers need to do is composite them together and serve them. We’ve found that it’s fast, affordable and reliable.
Of course, the truth is a bit more complicated. We also had to tune PostGIS, optimize our OpenStreetMap import, materialize imposm tables, find a high i/o cloud service, parallelize our renders - we do 32 of them at a time - and optimize our sqlite storage. If you’re curious about these things, please ask us. Thank you.