CNN maps the most diverse USA ever

John Keefe

Sep 8, 2021

John is a senior data and visuals editor at CNN specializing in climate. He has a long history in data journalism, and also dedicates himself to teaching people new to the craft. We love the dot-density map he and his colleagues produced on race and ethnicity in the US, and asked John to share how they produced it within just a few hours of a major data release from the 2020 Census. 

The CNN Visual News team had been keeping a close watch on the census data release plans, so we knew what kinds of data we’d get and made some guesses about what we’d like to build and what other journalists at the network might need or want. 

I’ve been inspired by dot-density race and ethnicity maps since the 2010 census release. The New York Times put together one I admired and explored shortly after that release (it’s broken now because it used Flash, but they built a new one in 2015). The Cooper Center for Public Service made a beautiful version, as did the Washington Post.

To prepare for the dot-density map, I spent about a week using test data and 2010 census data to determine whether I could build the national base layer quickly. The release was on a Thursday at 1 p.m., and our idea was to turn it around as quickly as possible, ideally before the weekend.

Could it be done?

The Census Bureau provided fake data for some counties in Rhode Island, in the same format it planned to use for the real numbers. That let us build Python scripts ahead of time to ingest and process the data quickly and break out states, counties, places and census tracts for analysis.

I first used 2010 data to generate a dot map for Manhattan. It’s an area I know well and have mapped a bunch. So from that data, I tried all of New York City, and then New York State, and then California, too. Finally, I used the fake Rhode Island data. I was convinced we could quickly make a national map once the data dropped.

How we built national dot density maps fast

Making national, zoomable maps of data at the level of census tracts, or even ZIP codes, used to be quite a lift. I’d have to process the data, generate image tiles for the entire country down to a useful zoom level and then host all those tiles … somehow. This time around, there were three components that made it quick.

First is Mapshaper. I cannot stress enough how much of a gift this tool from Matthew Bloch at the New York Times is to the rest of the data-mapping world. It’s incredibly powerful. I recently noticed that Bloch had included a -dots command to “[f]ill polygons with random points, for making dot density maps.” It’s also fast. By stringing its commands into a script (I do so in a Makefile), I could repeat my steps quickly. I downloaded each state’s census-tract shapefile and used Mapshaper to merge the shapes with the 2020 population data by race and ethnicity. Then I used the -dots command to plot the appropriate number and color of dots in each tract, generating a GeoJSON file full of points for each state.

I knew Mapbox could display vector tiles built from data, as opposed to the bulky image tiles of yore, and found this previous how-I-made-it post by Ryan McCullough. He did exactly what I wanted to do, but with census housing data. Tippecanoe turned my folder of state GeoJSON files into a single .mbtiles file I could upload for fast hosting on the CNN Mapbox account. Mapbox Tiling Service can also do this, without your having to install Tippecanoe and process the files on your own computer.

Finally, we had the CNN Wildfire Map. Led by my colleague Daniel Wolfe, the Visual News team already had a popular Mapbox-based map, which meant we didn’t have to build the front end from scratch. That was key to the quick turnaround.
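To make the dot-placement step above more concrete, here is a minimal Python sketch of what a dot-density fill does conceptually. It is not Mapshaper's implementation: the rejection-sampling approach, the rounding rule, and the names `random_dots`, `tract_to_features` and the `group` property are all assumptions for illustration. Only the `fill` property name comes from the post, which says the colors were encoded in the GeoJSON under that name.

```python
import random


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray going right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_dots(ring, count, rng):
    """Rejection sampling: draw points in the bounding box, keep those inside.

    Assumes the ring encloses a non-zero area; otherwise this would never finish.
    """
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    dots = []
    while len(dots) < count:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, ring):
            dots.append((x, y))
    return dots


def tract_to_features(ring, population_by_group, colors, people_per_dot, rng):
    """Build GeoJSON point features: one dot per `people_per_dot` people in each group."""
    features = []
    for group, population in population_by_group.items():
        # Rounding rule is an assumption; Mapshaper may round differently.
        dot_total = round(population / people_per_dot)
        for x, y in random_dots(ring, dot_total, rng):
            features.append({
                "type": "Feature",
                "properties": {"group": group, "fill": colors[group]},
                "geometry": {"type": "Point", "coordinates": [x, y]},
            })
    return features
```

A tract polygon with 450 people in one group and 160 in another, at 150 people per dot, would come out as three dots of the first color and one of the second. Real tracts are multipolygons with holes, which this sketch ignores.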

Data Challenges

There were two technical challenges I also needed to solve. One is that census tracts often extend into water. So placing dots randomly into tracts can lead to people living in lakes and rivers. That’s another reason why New York City was a good test case: If you don’t do it right, you end up with people in the Hudson River.

The key was to subtract water areas from each state’s census tracts before placing the dots. For this, I downloaded the “areawater” shapefiles from the Census’s FTP site for every county in the country. (I found it easier to use an actual FTP client pointed at ftp2.census.gov/geo/tiger/TIGER2020/AREAWATER than to pull the files off the web.) Then I used Mapshaper’s -input command to read each county water file for a state and merged them into a single state water GeoJSON file using the “combine-files” flag. 

Here’s the makefile code for that:


# usage: make water STATE=06
water:
	# note that I've previously downloaded all county water files from ftp2.census.gov/geo/tiger/TIGER2020/AREAWATER
	# to /Volumes/jkeefe-data/2020_Census/AREAWATER on my computer
	mkdir -p ./tmp/water
	# -f keeps make from stopping when the directory is already empty
	rm -f ./tmp/water/*.*
	cd /Volumes/jkeefe-data/2020_Census/AREAWATER; unzip -o "tl_2020_$(STATE)*.zip" -d /Users/keefe/cnnvis-census2020-dot-maps/tmp/water
	npx mapshaper -i './tmp/water/*.shp' combine-files \
	-simplify 40% \
	-merge-layers \
	-proj EPSG:4326 \
	-o datawork/water/water_$(STATE).geojson

(I have since discovered this national “areawater” file, which could have helped here.)

Then, I used Mapshaper’s -erase command to subtract each state’s water areas from its tracts GeoJSON file.

The other major hurdle stems from the fact that on the US East Coast, the population density is so high that as you zoom out, the number of data points (even at 150 people per dot) is too high to fit into some of the vector tiles. Tippecanoe warns you when this happens and provides settings to drop features from dense areas, but I didn’t want to lose any data in the visualization.

My initial fix was to limit the zoom level to 7, so you couldn’t zoom out farther than about the size of New Jersey. When my colleague Priya Krishnakumar said she really wanted to see the whole country, I had to agree — and had an idea.

Since I had scripted all of my Mapshaper commands, it was pretty easy to adjust the -dots parameters to create two new sets of GeoJSON files (and then tilesets) where the scales were 300 people per dot and 900 people per dot. I uploaded those tilesets to Mapbox and then used Mapbox Studio to limit the zoom levels at which each tileset is displayed, making sure the legend changed depending on the zoom level, too.
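The arithmetic behind the three scales can be sketched in a few lines of Python. The 150, 300 and 900 people-per-dot values and the zoom-7 limit for the densest set come from the text above; the zoom cutoff of 5 between the other two sets and the function names are hypothetical, since the post does not give the exact zoom ranges.

```python
# (minimum zoom, people per dot), coarsest first.
# Only the zoom-7 cutoff for 150 is from the post; the cutoff of 5 is a guess.
SCALES = [
    (0, 900),
    (5, 300),
    (7, 150),
]


def people_per_dot(zoom):
    """Return the scale of the most detailed tileset visible at this zoom."""
    chosen = SCALES[0][1]
    for min_zoom, per_dot in SCALES:
        if zoom >= min_zoom:
            chosen = per_dot
    return chosen


def dot_count(population, zoom):
    """Number of dots a tract of `population` people gets at this zoom."""
    return round(population / people_per_dot(zoom))


def legend_label(zoom):
    """Legend text that changes with the zoom level, as the map's legend did."""
    return f"1 dot = {people_per_dot(zoom)} people"


for zoom in (3, 6, 8):
    print(zoom, dot_count(4500, zoom), legend_label(zoom))
```

For a tract of 4,500 people, this gives 5 dots at the widest view, 15 in the middle range and 30 when zoomed in, which is why the coarser tilesets fit within the tile limits where the 150-per-dot set did not.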

When the data release finally happened, we were thrown one fun curveball. The Census Bureau didn’t produce a “national” data file as we had expected; all of the population data was grouped by state. So to build CNN’s great national analysis and maps, we had to aggregate the state data. 
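Rolling state files up into national totals is a simple sum. A minimal sketch follows, assuming each state has already been reduced to one row of counts. The field names and the numbers are invented for illustration and are not census figures or the actual column names in the release.

```python
from collections import Counter


def national_totals(state_rows):
    """Sum every numeric column across state rows, skipping the state identifier."""
    totals = Counter()
    for row in state_rows:
        for key, value in row.items():
            if key != "state":
                totals[key] += value
    return dict(totals)


# Toy rows with made-up values, keyed by state FIPS code.
rows = [
    {"state": "06", "total": 100, "group_a": 60, "group_b": 40},
    {"state": "44", "total": 50, "group_a": 20, "group_b": 30},
]
print(national_totals(rows))
```

A useful check in practice is that the summed group columns still add up to the summed total, which catches a state file that was skipped or loaded twice.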

I turned my attention to the dot-density map at 4:13 p.m., according to our chat log, running the scripts I’d prepared, tweaking the user interface (including fantastic improvements by colleague Sergio Hernandez) and incorporating team feedback. We published the dot-density map six hours later.

Styling the Map

I used Mapbox Studio to design the overall look and feel of the dot-density basemap. I really love the control I have over the map features such as land, water and roads in Mapbox Studio. Not just the colors, but also the density of the labels, the halos and, importantly, which things are on top of other things. I like the labels on top, followed by the data, followed by the base layer. And I could tinker with all of this live, long before working on the front-end display, incorporating feedback from the rest of the team.

That said, Mapbox Studio often takes a fair amount of trial and error to get what we need. I’d like to see clearer documentation and more examples for how to use its features. For example, after publication, several people reached out to say that two of the colors we used for dots looked almost identical. So we set out to change them.

I had encoded the colors into the GeoJSON files as the variable “fill” for production reasons, and correctly assumed I could alter them “on the fly” in Mapbox Studio. Figuring out how to do that, though, was challenging. I started building an expression based on the documentation, but could not get it to work. After more tinkering, I figured out how to use data conditions to get what we needed.
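For readers who prefer to see the idea as a style expression, here is one way a color override keyed on the “fill” property could be written, built in Python and printed as JSON. This is a sketch of a Mapbox style-spec `match` expression, not the configuration actually used on the CNN map or what Mapbox Studio generates from data conditions. The hex values and the `recolor_expression` helper are hypothetical.

```python
import json


def recolor_expression(replacements, property_name="fill"):
    """Build a `match` expression that swaps selected encoded colors.

    Features whose color is not listed keep the color stored in the data.
    """
    encoded = ["get", property_name]
    expression = ["match", encoded]
    for old_color, new_color in replacements.items():
        expression.extend([old_color, new_color])
    # Fallback: use the encoded value itself, converted to a color.
    expression.append(["to-color", encoded])
    return expression


# Hypothetical hex values standing in for the two look-alike dot colors.
expression = recolor_expression({"#aaaaaa": "#bbbbbb"})
print(json.dumps(expression))
```

The printed array could be used as the value of a circle layer's `circle-color` paint property, so the tiles themselves never need to be regenerated to change a color.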

Keep building with Census data

After we published the map with the 2020 Census data, we were met with lots of very kind Twitter love. Our audience also spent a lot of time with the map, and it seemed to really resonate even days and weeks after the census release. Moving forward, there are more census stories in the works. And we’ll have a lot to dig into when detailed census data about households, heritage and more comes out later this year.

Thank you, John, for sharing the behind-the-scenes story of this project. We look forward to seeing what the CNN team builds next!

Are you building maps with Census data? We’d love to see them. Share them with us on Twitter @Mapbox.
