Mapping extremely dense point data with vector tiles

Mapbox
maps for developers
Mar 10, 2015


By Eric Fischer

One of the most challenging geographic data sets I have ever worked with is the set of
New York taxi trip origins and destinations
that Chris Whong obtained through the New York Freedom of Information Law.
It is huge (375 million points), tightly concentrated (nearly all within a few
miles of Manhattan), and varied (some streets were visited only once while
others were visited more than a million times).

My previous map of this data was a set of static bitmap tiles rendered with
Datamaps, where I could handle
the wide range of densities by drawing all the dots, accumulating brightness
linearly, and then taking the cube root of the accumulation to compress that
range into the available range of pixel values.
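With a cube root, a pixel that a million trips passed through ends up only 100 times
as bright as a pixel that a single trip passed through, rather than a million times as bright.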
That’s hard to do with vector tiles because
tiles that contain more than about 200,000 points are too big and too slow
to render on demand. So high-density data has meant giving up the
resolution-independence and scalability of vector tiles.

You no longer need to make that tradeoff, because Tippecanoe
can now selectively drop points in the densest areas to moderate tile size and visual impact
while retaining full detail in sparser areas. The vector tiles for
the map below
were made with

tippecanoe -z19 -d11 -g3 -o nyctaxi.mbtiles

for a base zoom level of 19, a tile resolution of 2048 (2¹¹) at that zoom, and a gamma
reduction of 3, which, at each zoom level, reduces the density in areas where points are
less than one pixel apart to the cube root of their original density.
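With a gamma of 3, for example, a spot that would otherwise have 1,000 points per pixel
in a tile is thinned to roughly the cube root of that, about 10 points per pixel.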

How it works

Reducing density by counting points within bins doesn’t look very good, and
k-means clusters or Voronoi diagrams are too slow to construct for such a large number of points.
What worked well was to compute the z-order
index of each point, the same index that is used to divide maps into tiles,
and manipulate the density along that linearization of the point locations
instead of trying to do it in space.
The index interleaves the bits of two 32-bit coordinates, and at zoom level n
(with 256-pixel tiles) a pixel spans about 2^(32-(n+8)) units of each coordinate,
so the indices of points that are about a pixel apart on screen will
tend to be about 2^(64-2*(n+8)) apart. The places where this is not true,
where indices jump around at tile boundaries, will average out over a large number of points.
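
As a rough illustration, here is a Python sketch of that index computation, projecting a point
to 32-bit Web Mercator world coordinates and interleaving their bits. The function names,
the projection details, and the bit ordering are illustrative assumptions, not Tippecanoe’s
actual C++ code:

import math

def world_coordinates(lon, lat):
    # Project longitude/latitude to 32-bit integer Web Mercator world
    # coordinates, spanning 0 .. 2^32 - 1 across the whole world.
    x = int((lon + 180.0) / 360.0 * (1 << 32))
    s = math.sin(math.radians(lat))
    y = int((0.5 - math.log((1.0 + s) / (1.0 - s)) / (4.0 * math.pi)) * (1 << 32))
    return x, y

def zindex(x, y):
    # Interleave the bits of the two 32-bit coordinates into a single
    # 64-bit z-order (Morton) index.
    index = 0
    for bit in range(32):
        index |= ((x >> bit) & 1) << (2 * bit + 1)
        index |= ((y >> bit) & 1) << (2 * bit)
    return index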

Even along a line you don’t have a continuous density function, only a series of
discrete points; but instead of thinking about the density of the points, you can think
about the sparseness of the gaps between successive points.
The algorithm goes through the points in index order, and at each point looks at the
size of the gap between its index and the previous point’s index. If the difference is larger
than “a pixel,” as defined above, the gap is sparse enough, so it copies that point
into the tile and goes on to the next one.
If the gap is too small, it needs to create a larger one: a gap whose size (relative to the
pixel size) is the cube root of the gap that it found, so that the local density drops to the
cube root of its original value. So it skips over the current point and over as many more
as it needs to until it gets to one that is
far enough from the previous point that was added to the tile.
With a sufficiently large gap created, it copies that point into the tile and
goes on to check the size of the next gap.
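
In outline, that thinning pass might look like the Python sketch below, run over the points
after they have been sorted by z-order index. The thin function, its parameters, and the
bookkeeping are simplifications for illustration, not Tippecanoe’s real implementation:

def thin(indices, zoom, gamma=3.0):
    # indices: 64-bit z-order indices of the points, sorted ascending.
    # Returns the indices of the points to copy into the tile.
    pixel = float(1 << (64 - 2 * (zoom + 8)))  # index difference of about one pixel
    kept = []
    prev = None   # z-order index of the last point copied into the tile
    need = 0.0    # pending gap requirement, in pixels, after finding a too-small gap
    for index in indices:
        if prev is None:
            kept.append(index)
            prev = index
            continue
        gap = (index - prev) / pixel  # gap back to the last kept point, in pixels
        if need > 0.0:
            # Keep skipping points until the gap back to the last kept point
            # reaches the required fraction of a pixel.
            if gap >= need:
                kept.append(index)
                prev = index
                need = 0.0
        elif gap >= 1.0:
            # Sparse enough already: copy the point into the tile.
            kept.append(index)
            prev = index
        else:
            # Too dense: skip this point and require a gap that is the
            # gamma-th root (cube root for gamma = 3) of the one just found.
            need = gap ** (1.0 / gamma)
    return kept

Because “a pixel” covers a smaller range of indices at each higher zoom level, the same
data is thinned less and less as you zoom in.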

The cube root gamma for the taxi map has nothing to do with the eye’s
cube root perception of lightness,
which monitor gamma compensates for. It just looked about right and reduced the
tile density enough that it would work, cutting the million points in the zoom level 19 tile
at Penn Station to 150,000.
With data whose densities are less extreme, something closer to a square root
(a gamma of 2, specified with -g2) seems more appropriate.
Automatically choosing the optimal gamma for a given data set is
a research project for the future.
