How Interactivity Works with UTFGrid
Adding interactivity to millions of objects on a map requires some tricks. In his Visible Map, Tom stunningly visualized the inner workings of UTFGrid, an encoding scheme for efficiently packing interactivity data into map tiles that has made its way into all of our projects, including TileMill, TileStream, and Wax. The Visible Map also lets you explore the stages Wax goes through to retrieve the interactivity information associated with a pixel.
While this is a good introduction to how UTFGrid brings interactivity to maps, in this post I'll give a behind-the-scenes look at some of the design decisions we made to make the best use of interactivity in our map design and hosting services.
Once your map has millions of objects to interact with, it's no longer viable to have users download all of the data and cache it in the browser; it's simply too big. Loading that much data also means that lookup time grows linearly with the number of objects. This rules out sending raw vectors and polygons to the browser, as Tom discussed in his Where 2.0 talk.
Luckily, when you have millions of objects to show on a map, you can't interact with them all at once anyway. Users might zoom in, moving most of them out of view, or there might be more objects than physical pixels on the map. Taking inspiration from Google Maps, we decided to send the data in tiles. This makes it viable to add interactivity to millions of objects on the map while limiting what the user has to download to the region that is actually being viewed.
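The post doesn't spell out the tile math, but for illustration here's a minimal sketch of the widely used Web Mercator "XYZ" tiling scheme popularized by Google Maps: it finds the tile containing a longitude/latitude at a given zoom level and lists the tiles covering a viewport, which are the only ones whose data needs to be fetched. The names lonLatToTile and tilesInView are hypothetical, and the sketch ignores viewports that cross the antimeridian.

```javascript
// Hypothetical helper: the XYZ tile (Web Mercator) that contains a point.
// At zoom level z the world is divided into 2^z by 2^z square tiles.
function lonLatToTile(lon, lat, zoom) {
  const n = 2 ** zoom;
  const latRad = (lat * Math.PI) / 180;
  const x = Math.floor(((lon + 180) / 360) * n);
  const y = Math.floor(
    ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n
  );
  // Clamp so points on or just past the map edge (e.g. lon = 180) stay in 0..n-1.
  const clamp = (v) => Math.min(n - 1, Math.max(0, v));
  return { x: clamp(x), y: clamp(y), z: zoom };
}

// Hypothetical helper: all tiles covering a bounding box at one zoom level.
// Only these tiles (and their interactivity data) need to be downloaded.
function tilesInView(west, south, east, north, zoom) {
  const topLeft = lonLatToTile(west, north, zoom);
  const bottomRight = lonLatToTile(east, south, zoom);
  const tiles = [];
  for (let x = topLeft.x; x <= bottomRight.x; x++) {
    for (let y = topLeft.y; y <= bottomRight.y; y++) {
      tiles.push({ x, y, z: zoom });
    }
  }
  return tiles;
}

console.log(lonLatToTile(0, 0, 1)); // { x: 1, y: 1, z: 1 }
console.log(tilesInView(-180, -85, 180, 85, 1).length); // 4
```

Zooming in shrinks the geographic area each tile covers, so the amount of data per visible tile stays bounded no matter how many objects exist worldwide.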
From polygons to grids
Even with tiled data requests, there are still a lot of points and polygons, and point-in-polygon queries are expensive, yet they would have to run every time the user moves the mouse. Precomputing a lookup table with an entry for each pixel solves that problem and makes each lookup constant time. But with this approach it's necessary to either calculate the lookup table in the browser, which carries a significant initialization cost, or generate that table on the server and send it to the browser.
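To illustrate the trade-off, here's a minimal sketch of the in-browser variant: a ray-casting (even-odd) point-in-polygon test, whose cost grows with the number of vertices on every query, and a precomputed per-cell table that turns each later lookup into a single array read. It assumes features are simple polygons in tile pixel coordinates; pointInPolygon and buildLookupTable are hypothetical names, not part of any of our libraries.

```javascript
// Ray-casting point-in-polygon test. `polygon` is an array of [x, y] vertices
// in pixel coordinates; each call costs O(number of vertices).
function pointInPolygon([px, py], polygon) {
  let inside = false;
  for (let i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
    const [xi, yi] = polygon[i];
    const [xj, yj] = polygon[j];
    // Does a horizontal ray from (px, py) cross the edge (j -> i)?
    const crosses =
      yi > py !== yj > py && px < ((xj - xi) * (py - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

// Precompute a size x size table holding the index of the first feature that
// covers each cell's center (-1 for none). This is the expensive step:
// cells x features x vertices.
function buildLookupTable(features, size) {
  const table = Array.from({ length: size }, () => new Array(size).fill(-1));
  for (let y = 0; y < size; y++) {
    for (let x = 0; x < size; x++) {
      const center = [x + 0.5, y + 0.5];
      table[y][x] = features.findIndex((poly) => pointInPolygon(center, poly));
    }
  }
  return table;
}

// Once the table exists, each mouse move is a constant-time array read.
const square = [[0, 0], [4, 0], [4, 4], [0, 4]];
const table = buildLookupTable([square], 8);
console.log(table[1][1], table[6][6]); // 0 -1
```

Building this table for every tile on the client is exactly the initialization cost the paragraph above warns about, which is why generating it on the server is attractive.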
Bringing the grid data to the client is still an issue. A 256x256 grid has 65,536 data points, requiring an efficient way to encode and decode it. As a first measure, we reduce the 256x256 grid to just 64x64, exploiting the fact that interacting with a single pixel is pretty hard anyway, especially on mobile devices. That still leaves 4,096 data points to encode. Encoding the grid as an actual PNG image would leverage PNG's excellent zlib compression; however, decoding the data in the browser is non-trivial because we'd have to paint the image to a canvas to access its pixels (or decode the PNG/zlib compression manually).
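The post doesn't say how the 64x64 grid is derived from full-resolution data. One simple possibility, assumed here purely for illustration, is to sample one pixel near the center of each 4x4 block; downsample is a hypothetical helper, and other strategies (such as a majority vote per block) would work too.

```javascript
// Reduce a full-resolution table (one entry per pixel) to a coarser grid by
// sampling one pixel per block. Sampling is an assumed strategy for
// illustration, not necessarily what the real pipeline does.
function downsample(table, gridSize) {
  const factor = table.length / gridSize; // 256 / 64 = 4 pixels per cell
  const grid = [];
  for (let row = 0; row < gridSize; row++) {
    const cells = [];
    for (let col = 0; col < gridSize; col++) {
      // Sample near the block's center by offsetting half a block.
      const y = row * factor + Math.floor(factor / 2);
      const x = col * factor + Math.floor(factor / 2);
      cells.push(table[y][x]);
    }
    grid.push(cells);
  }
  return grid;
}

// A made-up 256x256 table: feature 1 on the left half, 0 on the right.
const full = Array.from({ length: 256 }, () =>
  Array.from({ length: 256 }, (_, x) => (x < 128 ? 1 : 0))
);
const coarse = downsample(full, 64);
console.log(256 * 256, 64 * 64); // 65536 4096
console.log(coarse.length, coarse[0].length); // 64 64
```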
The easiest way to get plain data to a browser is to encode it as JSON, and that's what we did, with a couple of tricks. Each row of the grid is a string containing one character per column. Using JSON also lets us easily associate complex data structures with every pixel (or group of pixels, when using a 64x64 grid).
To associate data with each pixel, we assign each feature in a grid tile a number, starting with 0, and include a mapping from that number to the actual data. Storing those numbers as characters in plain Unicode strings lets us essentially encode this kind of binary data in JSON. With UTF-8 as the transfer encoding, we get some additional benefits:
- Only one byte is needed for each of the first 94 features, numbered 0 through 93 (see below for why the cutoff is exactly 93)
- Larger numbers automatically use 2 or 3 bytes per code point as required
- Every browser supports UTF-8 decoding
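The byte counts above follow from UTF-8's fixed ranges: code points below 0x80 take one byte, those below 0x800 take two, and the rest of the Basic Multilingual Plane takes three. Here's a small sketch of that rule, cross-checked against the TextEncoder built into modern browsers and Node.js; utf8Length is a hypothetical name.

```javascript
// UTF-8 byte length of a single code point, from the encoding's ranges.
function utf8Length(cp) {
  if (cp < 0x80) return 1; // U+0000..U+007F
  if (cp < 0x800) return 2; // U+0080..U+07FF
  if (cp < 0x10000) return 3; // U+0800..U+FFFF
  return 4; // beyond the Basic Multilingual Plane
}

// Cross-check the rule against the runtime's own UTF-8 encoder.
const encoder = new TextEncoder();
for (const cp of [0x20, 0x7f, 0x80, 0x7ff, 0x800, 0xffff]) {
  const actual = encoder.encode(String.fromCodePoint(cp)).length;
  console.log(cp.toString(16), utf8Length(cp), actual);
}
```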
JSON strings need to escape the non-printable control characters (code points 0 through 31) as well as the double quote (") and the backslash (\), so we simply skip those characters. Otherwise we'd end up with a bunch of literal ASCII escape sequences such as \u0000 (6 bytes instead of 1). Let's look at how we encode some characters:
- 0: This number is guaranteed to occur in every grid tile. Following the scheme, we add 32, which results in code point 32 (0x20). Displayed in ASCII, this is a space character, and its binary representation is 0010 0000.
- 93: Adding 32 gives 125, which is past both 34 and 92, so we have to add another 2 to skip the double quote (") and the backslash (\), resulting in code point 127 (0x7f). In binary, this is 0111 1111.
- 94: This number is above UTF-8's 1-byte threshold. Adding 32 plus the 2 skipped characters yields code point 128 (0x80), which has to be encoded as 2 bytes: 1100 0010 1000 0000.
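The three cases above can be reproduced with a short sketch of the encoding rule: add 32, then step over the double quote (34) and the backslash (92). encodeId is a hypothetical name; the final loop checks that JSON.stringify never has to escape an encoded character.

```javascript
// Map a feature number (0, 1, 2, ...) to the code point stored in the grid:
// offset by 32 to skip control characters 0-31, then step over the
// double quote (34) and the backslash (92), which JSON would escape.
function encodeId(id) {
  let cp = id + 32;
  if (cp >= 34) cp++; // skip the double quote
  if (cp >= 92) cp++; // skip the backslash
  return cp;
}

// Show the code point and UTF-8 bytes for the three examples above.
const encoder = new TextEncoder();
for (const id of [0, 93, 94]) {
  const cp = encodeId(id);
  const bits = [...encoder.encode(String.fromCodePoint(cp))]
    .map((byte) => byte.toString(2).padStart(8, "0"))
    .join(" ");
  console.log(id, "0x" + cp.toString(16), bits);
}
// 0 0x20 00100000
// 93 0x7f 01111111
// 94 0x80 11000010 10000000

// Sanity check: no encoded character needs escaping inside a JSON string.
for (let id = 0; id < 1000; id++) {
  const json = JSON.stringify(String.fromCodePoint(encodeId(id)));
  if (json.length !== 3) throw new Error(`id ${id} needs escaping`);
}
```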
When we do that with every pixel in the grid, we end up with something that looks like this:
Interestingly, our end result basically looks like ASCII art. This makes visual validation and sanity checks very easy.
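To see where the ASCII-art look comes from, here's a small sketch that encodes a made-up 4x8 grid of feature numbers (real tiles use 64x64) with the same rule and serializes it alongside a number-to-data mapping. The grid/data field names, the sample data, and the use of 0 for "no feature" are assumptions for this example, chosen as a simplified layout rather than the exact format we ship; encodeGrid is a hypothetical name.

```javascript
// Same rule as before: offset by 32, skip the double quote (34) and backslash (92).
function encodeId(id) {
  let cp = id + 32;
  if (cp >= 34) cp++;
  if (cp >= 92) cp++;
  return cp;
}

// Turn a 2D array of feature numbers into one string per row.
function encodeGrid(ids) {
  return ids.map((row) =>
    row.map((id) => String.fromCodePoint(encodeId(id))).join("")
  );
}

// Hypothetical 4x8 tile: 0 = no feature, 1 and 2 = two made-up features.
const ids = [
  [0, 0, 0, 1, 1, 0, 0, 0],
  [0, 0, 1, 1, 1, 1, 0, 0],
  [0, 2, 2, 1, 1, 0, 0, 0],
  [2, 2, 2, 0, 0, 0, 0, 0],
];
const rows = encodeGrid(ids);
rows.forEach((row) => console.log("|" + row + "|"));
// |   !!   |
// |  !!!!  |
// | ##!!   |
// |###     |

// Serialized tile: grid rows plus a simplified number-to-data mapping.
const tile = { grid: rows, data: { 1: { name: "Park" }, 2: { name: "Lake" } } };
console.log(JSON.stringify(tile));
```

Because the quote and backslash are never produced, the rows go into the JSON verbatim, with no escape sequences.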
We also observe that most of that data is very uniform, and gzipping the resulting JSON file typically shrinks a grid to just 1-3 KB. Decoding the grid in the browser is easy as well: we essentially just call grid[row].charCodeAt(col) to retrieve the Unicode code point and reverse the transformation to obtain the key for the associated data.
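Decoding runs the same steps in reverse. Here's a minimal sketch, assuming a square grid that covers a square tile; decodeId and featureAt are hypothetical names, and the grid in the usage example is made up.

```javascript
// Reverse of the encoding: undo the skip over the backslash (92), then the
// skip over the double quote (34), then the offset of 32.
function decodeId(cp) {
  if (cp >= 93) cp--;
  if (cp >= 35) cp--;
  return cp - 32;
}

// Hypothetical helper: feature number under pixel (px, py) of a square tile.
// charCodeAt returns a UTF-16 code unit, which equals the code point as long
// as encoded values stay below the surrogate range (0xD800).
function featureAt(grid, px, py, tileSize = 256) {
  const factor = tileSize / grid.length; // 256 / 64 = 4 pixels per cell
  const row = Math.floor(py / factor);
  const col = Math.floor(px / factor);
  return decodeId(grid[row].charCodeAt(col));
}

// A tiny 4x4 grid covering a 16x16-pixel "tile" (4 pixels per cell).
const grid = [" !! ", "!!!!", "##  ", "##  "];
console.log(featureAt(grid, 5, 0, 16)); // 1
console.log(featureAt(grid, 0, 15, 16)); // 2
console.log(featureAt(grid, 15, 15, 16)); // 0
```

The returned number is then used as the key into the tile's data mapping, so a mouse move costs one division, one string index, and a couple of comparisons.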