Making the most detailed tweet map ever

Mapbox
maps for developers
Dec 3, 2014


By Eric Fischer

I’ve been tracking geotagged tweets from Twitter’s public API for the last three and a half years. There are about 10 million public geotagged tweets every day, or about 120 per second, up from about 3 million a day when I first started watching. The accumulated history adds up to nearly three terabytes of compressed JSON and is growing by four gigabytes a day.
And here is what those 6,341,973,478 tweets look like on a map, at any scale you want.

I’ve open sourced the tools I used to manipulate the data and did all the design work in Mapbox Studio Classic. Here’s how you can make one like it yourself.

You can follow Twitter’s stream of geotagged public tweets using the “statuses/filter” API to request tweets from a particular bounding box or the whole world. Before you can connect, you have to register a Twitter API key and authenticate using it. I couldn’t find a simple library last year to generate the OAuth header for Twitter authentication, so I wrote this one. Once you have authenticated and connected to the filter API, you receive a steady stream of tweets in JSON format. They include a lot more metadata than you necessarily need to make a dot map, so I’ve been using this program to parse the JSON and pull out just each tweet’s username, date, time, location, client, and text.
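As a rough illustration of that extraction step (not the program linked above), here is a minimal Python sketch. It assumes each input line holds one tweet object in Twitter’s v1.1 JSON format, where `coordinates` is a GeoJSON point in [longitude, latitude] order and `created_at` carries both the date and the time. The function name `extract_fields` is hypothetical.

```python
import json

def extract_fields(line):
    """Pull the fields a dot map needs out of one tweet's JSON.

    Assumes Twitter API v1.1 tweet objects, where `coordinates` is a
    GeoJSON Point in [longitude, latitude] order. Returns None for
    tweets that carry no exact point location.
    """
    tweet = json.loads(line)
    point = tweet.get("coordinates")
    if not point or point.get("type") != "Point":
        return None
    lon, lat = point["coordinates"]
    return {
        "user": tweet["user"]["screen_name"],
        # created_at holds date and time, like "Wed Dec 03 18:00:00 +0000 2014"
        "created_at": tweet["created_at"],
        "lat": lat,
        "lon": lon,
        "source": tweet.get("source", ""),  # the posting client
        "text": tweet.get("text", ""),
    }
```

Dropping everything else at this stage keeps the later filtering and tiling steps working on a much smaller stream.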

Filtering the data

Even though there are six billion tweets to map, only 9% of them are ultimately visible as unique dots. The others are filtered out as duplicate or near-duplicate locations. For instance, every Foursquare check-in to a particular venue is tagged with the same location, and it doesn’t help the map to draw that same dot over and over. Showing the same person tweeting many times within a few hundred feet also makes the map very splotchy, so I filter out those near-duplicates too.
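The two kinds of duplicate filtering described above could be sketched in Python like this, assuming each tweet has already been reduced to a hypothetical `(user, lat, lon)` tuple. The 100 m radius stands in for “a few hundred feet” and, like the use of haversine distance, is an assumption rather than the author’s actual code.

```python
import math

EARTH_RADIUS_M = 6_371_000.0

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def drop_duplicates(tweets, radius_m=100.0):
    """Keep tweets that neither repeat an already drawn location exactly
    nor fall within radius_m of an earlier kept tweet by the same user.

    `tweets` is an iterable of (user, lat, lon) tuples; radius_m is a
    hypothetical stand-in for "a few hundred feet".
    """
    drawn_locations = set()
    kept_by_user = {}
    kept = []
    for user, lat, lon in tweets:
        if (lat, lon) in drawn_locations:
            continue  # e.g. another check-in at the same venue
        previous = kept_by_user.setdefault(user, [])
        if any(haversine_m(lat, lon, plat, plon) < radius_m
               for plat, plon in previous):
            continue  # the same person tweeting again close by
        drawn_locations.add((lat, lon))
        previous.append((lat, lon))
        kept.append((user, lat, lon))
    return kept
```

The per-user check compares against every earlier kept point for that user, which is fine for a sketch; a spatial index or grid bucketing would be needed at the scale of billions of tweets.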

In addition, if you plot all the tweet locations without any filtering, tweets from iPhones show severe banding from either the latitude or the longitude being snapped to a grid. The bands must be the result of fuzzed location data to avoid revealing people’s exact locations, but they are very visually obtrusive. I eliminate most of the banding by letting each unique latitude and longitude only appear once on the map, and dropping any additional tweets that try to reuse one. Here is the code that I use to deband and deduplicate tweets.

Banding and splotching in unfiltered tweets
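One reading of the debanding rule above is that a tweet is dropped if its exact latitude value or its exact longitude value has already been used by an earlier kept tweet. A minimal Python sketch under that assumption follows; the `deband` function is hypothetical, compares float values exactly, and may differ from the author’s linked code (which could, for example, compare the coordinate strings as they appear in the JSON).

```python
def deband(points):
    """Drop any point whose latitude or longitude value was already used.

    `points` is an iterable of (lat, lon) pairs. Snapped coordinates show
    up as many tweets sharing one exact latitude (or longitude) value, so
    keeping only the first tweet per value thins those stripes out.
    """
    seen_lats, seen_lons = set(), set()
    kept = []
    for lat, lon in points:
        if lat in seen_lats or lon in seen_lons:
            continue  # this value is already on the map
        seen_lats.add(lat)
        seen_lons.add(lon)
        kept.append((lat, lon))
    return kept
```

Only kept points register their values, so a dropped tweet never blocks a later one.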

I thought there must be a bug in my debanding code, because if you zoom in on London, there is a very visible blank stripe at the Prime Meridian where almost no tweets appear. But the same stripe also shows up in the unfiltered tweets, so it must be Twitter that is filtering them out.

Missing data at the Prime Meridian

Making vector tiles

The challenge of making dot maps is how to include all the detail when zoomed in deeply while unobtrusively dropping dots as you zoom out so that the low zoom levels are not overwhelmingly dense. I’ve been working on a new tool called Tippecanoe for making vector tiles from large data sets whose features don’t have any inherent scale ranking. You give it a file or stream of GeoJSON input, and it gives you back a vector mbtiles file to show what your data looks like at any scale.
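To get filtered tweets into a GeoJSON stream, each location can be written as a Point feature, one per line. A minimal Python sketch, assuming the records are hypothetical `(lat, lon)` pairs; note that GeoJSON orders coordinates as [longitude, latitude].

```python
import json

def geojson_lines(records):
    """Yield one GeoJSON Point feature per (lat, lon) record, one per line.

    GeoJSON orders coordinates as [longitude, latitude]. Properties are
    left empty because a plain dot map only needs the geometry.
    """
    for lat, lon in records:
        feature = {
            "type": "Feature",
            "properties": {},
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
        }
        yield json.dumps(feature) + "\n"
```

The resulting lines can be written to a file or piped to Tippecanoe; check its documentation for the exact command-line options of the version you use.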

In the case of point features, it drops exponentially more dots at each lower zoom level, randomly chosen but consistent from one zoom to the next, so that by the time you get to zoom level 0, where the whole world is a single map tile, there are only 1586 dots remaining from the 590 million that are spread across the 4.5 million tiles at zoom level 14.
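Tippecanoe’s own implementation may differ, but one simple way to get thinning that is both random and consistent across zooms is to give every point a single random number u in [0, 1) and draw it at zoom z only when u < 0.4^(14 - z). Because that threshold grows with z, a point shown at one zoom is shown at every higher zoom. A Python sketch of that idea, with hypothetical function names:

```python
import random

def min_zoom(u, max_zoom=14, keep=0.4):
    """Lowest zoom at which a point with random value u in [0, 1) is drawn.

    A point is visible at zoom z <= max_zoom when u < keep ** (max_zoom - z).
    The threshold grows with z, so once a point appears it stays visible at
    every higher zoom: the thinning is random but consistent.
    """
    z = max_zoom
    # Step down while the point would still be visible one level lower.
    while z > 0 and u < keep ** (max_zoom - z + 1):
        z -= 1
    return z

def assign_min_zooms(n, seed=0, **kwargs):
    """Assign a minimum zoom to n points using a seeded random generator."""
    rng = random.Random(seed)
    return [min_zoom(rng.random(), **kwargs) for _ in range(n)]

def expected_count(n_points, zoom, max_zoom=14, keep=0.4):
    """Expected number of points that survive at `zoom`."""
    return n_points * keep ** (max_zoom - zoom)
```

With 590 million points at zoom 14, `expected_count(590_000_000, 0)` is about 1,584, close to the 1586 dots that actually remain at zoom 0.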

Styling in Mapbox Studio Classic

Once you have your vector mbtiles file, you can upload it to mapbox.com and open it in Mapbox Studio Classic to style it. I previously wrote about styling techniques for dot maps, and it is possible to get close to the same effects here. The two main aspects of the style are the sizes of the dots and the color ramp that is applied to them.

At zoom level 14 and below, the dot sizes are all the same, because Tippecanoe has taken care of decreasing the density at each zoom level to 40% of the dots in the level above it. The style is responsible for making the dots larger at each level you zoom in beyond 14:

marker-allow-overlap: true;
marker-ignore-placement: true;
marker-width: 1;
[zoom >= 15] { marker-width: 1.58; }
[zoom >= 16] { marker-width: 2.50; }
[zoom >= 17] { marker-width: 3.95; }
[zoom >= 18] { marker-width: 6.25; }
[zoom >= 19] { marker-width: 9.88; }
[zoom >= 20] { marker-width: 15.63; }
[zoom >= 21] { marker-width: 24.71; }
[zoom >= 22] { marker-width: 39.06; }

The mysterious multiplier of 1.58, by which the dot diameter increases with each level, is the square root of 2.5, which is the inverse of the 40% of dots that survive at each lower zoom level. Because a dot’s area scales with the square of its diameter, the area of each dot grows by a factor of 2.5 with each zoom level.
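The width rules in the style follow directly from that relationship: the diameter at a zoom z beyond 14 is √2.5 raised to the power z - 14. The Python sketch below (hypothetical helper names) regenerates the values. It uses exact decimal arithmetic and half-up rounding, because 2.5³ is exactly 15.625 and ordinary round-half-even formatting would print 15.62 rather than the 15.63 in the style.

```python
from decimal import Decimal, ROUND_HALF_UP

def marker_width(zoom, base_zoom=14, area_ratio=Decimal("2.5")):
    """Dot diameter at `zoom`; area grows by area_ratio per level past base_zoom.

    The diameter is sqrt(area_ratio) ** steps. Even powers are computed
    exactly (2.5 ** 3 == 15.625) so that half-up rounding gives 15.63.
    """
    steps = max(0, zoom - base_zoom)
    width = area_ratio ** (steps // 2)
    if steps % 2:
        width *= area_ratio.sqrt()
    return width.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

def cartocss_rules(first=15, last=22):
    """Emit the per-zoom marker-width rules for zooms first..last."""
    return [
        f"[zoom >= {z}] {{ marker-width: {marker_width(z)}; }}"
        for z in range(first, last + 1)
    ]
```

Changing `area_ratio` regenerates the rules for a different falloff rate, as long as the tile density is thinned by the matching factor.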

I still don’t know why 2.5 is the appropriate rate, but many data sets, including population density, seem to fall off at about this same rate. You can use a different number if something else looks better for your data.

The color of the dots is applied indirectly by letting their alpha channel accumulate as dots overlap, and then using the colorize-alpha image filter to apply colors to halves of the alpha range:

marker-opacity: .2;
image-filters: colorize-alpha(#00FF00, #00FF00, #FFFFFF);

Alpha blending only gives limited control over the brightness ramp, but an opacity of 0.2 reaches 50% brightness with 3 overlapping dots and 97% brightness with 16 dots, which works out pretty well for the density of tweets.
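Assuming standard “over” compositing, each dot lets (1 - 0.2) of whatever is beneath it show through, so n overlapping dots leave 0.8ⁿ uncovered and reach an alpha of 1 - 0.8ⁿ. A small Python sketch (hypothetical function names) makes the numbers above concrete: three dots give 48.8%, which rounds to the 50% quoted, and sixteen dots give about 97.2%.

```python
def accumulated_alpha(opacity, n):
    """Alpha after n overlapping dots, each composited "over" the others.

    Each dot lets (1 - opacity) of what is beneath show through, so the
    uncovered fraction after n dots is (1 - opacity) ** n.
    """
    return 1 - (1 - opacity) ** n

def dots_needed(opacity, target):
    """Smallest number of overlapping dots whose alpha reaches `target`."""
    n = 0
    while accumulated_alpha(opacity, n) < target:
        n += 1
    return n
```

Trying other opacities this way is a quick check of how dense an area has to be before it saturates.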

The image-filter assigns the bottom half of the alpha range to go from transparent to green and the top half from green to white, so that the densest areas get an extra glow. The green in the middle is only half-opaque, but RGB green is inherently bright enough that it still looks reasonably clear. It would be hard to see if it were blue instead. Finally, underneath the data layer for context are a desaturated satellite image from Mapbox Satellite and street names from Mapbox Streets. All the layers are rendered together from a single style sheet.
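The alpha-to-color mapping described above can be modeled as a piecewise-linear gradient over evenly spaced color stops. The Python sketch below is a model of the behavior in the text, not Mapnik’s implementation; it assumes the stops are spaced evenly across the alpha range and that the original alpha passes through as the output opacity. The function names are hypothetical.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t

def colorize_alpha(alpha, stops=((0, 255, 0), (0, 255, 0), (255, 255, 255))):
    """Map an alpha value in [0, 1] to an RGBA color along evenly spaced stops.

    With the stops green, green, white, alpha from 0 to 0.5 stays green
    (fading in from transparent to half-opaque), while alpha from 0.5 to 1
    blends from green to white. The input alpha is kept as the opacity.
    """
    alpha = min(max(alpha, 0.0), 1.0)
    segments = len(stops) - 1
    position = alpha * segments
    i = min(int(position), segments - 1)  # which pair of stops to blend
    t = position - i
    rgb = tuple(round(lerp(c0, c1, t)) for c0, c1 in zip(stops[i], stops[i + 1]))
    return rgb + (alpha,)
```

Under this model, a pixel covered by three dots (alpha about 0.49) is still pure green at roughly half opacity, while only the densest areas approach white.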

Upload the style to Mapbox and it’s ready to use. Here’s the link to the full map, or you can download Mapbox Studio Classic and start making your own.
