Sometimes you need more data processing power. We recommend housing your larger data sets in a PostgreSQL database with a PostGIS extension. This allows you to upload massive data sets and will provide much more flexibility for querying that data with SQL in Mapbox Studio Classic.
This manual will explain the process of using PostGIS and SQL in Mapbox Studio Classic to build vector tiles.
For some of the examples in this guide, you will need to add our vector tile helper utility, postgis-vt-util into your database. The utility provides a set of custom PostgreSQL functions that aid in managing your vector tile sources into Mapbox Studio Classic.
To add postgis-vt-util to your PostgreSQL database:
If your data isn’t already managed in a PostGIS database, you’ll need to import it. It’s a good idea to script this process so that you can repeat it later or collaborate on the project without sharing access to the same database.
ogr2ogr is the most versatile utility to get any kind of vector geodata into PostGIS. If you have a lot of data, you may want to look into shp2pgsql; it can import data much more quickly.
Basic ogr2ogr example
The following command will create a PostgreSQL table with all the same columns as your Shapefile, plus a wkb_geometry column for the geometries. The geometry column will be automatically indexed for efficient spatial queries.
The -t_srs EPSG:3857 part makes sure your data is projected to Web Mercator which is the projection that the final vector tiles will be.
Once these Shapefiles are loaded into a database, you can write SQL queries to bring them down to a desired zoom level and filter keep each zoom level at 500 MB. Refer to the SQL queries in your data.yml file for the specific code.
Here’s an overview of the Mapnik and postgis-vt-util functions you need to know.
This example includes all geometries from a table at all zoom levels.
Using spatial indexes effectively
The SQL queries in a Mapbox Studio Classic source project are run once for each tile you export, so making sure that these queries run quickly is they key to efficient exports. One of the best ways to speed up a query is to make sure it’s only looking at features that will actually be visible in the given tile using spatial indexes generated when you imported your data. By default, an attempt at this is made behind-the-scenes, but for more complex queries you’ll need to ensure this manually.
Assuming wkb_geometry is your geometry column, this query will only select features whose geometries have bounding boxes that intersect with the tile the query is being run for. Behind the scenes, !bbox! is replaced with a polygon representing the area covered by each tile.
Limiting data by zoom level
If you have a layer that’s not needed at every zoom level, you can craft your query to limit the selection based on the zoom level. This will require the z() function provided by the postgis-vt-util package. This also makes for smaller, more efficient tiles.
It’s good practice to include the avoid spatially-irrelevant clause mentioned above after your WHERE clause. To do this, add an AND statement to tack on that filter.
Controlling label density
Point data needs to be controlled because in some locations you may have hundreds of points geographically close to each other or even on top of each other. These dense areas mean that certain vector tiles will be very heavy, but Mapbox Studio Classic will render them just the same. Now imagine what happens when you zoom out to low zoom levels, again these dense areas become problematic when they’re combined with a CartoCSS style and used to generate a raster tile.
If you have ever uploaded Mapbox Studio Classic style to Mapbox.com and are met with an error like Drawtime avg. exceeded limit of 300ms or Drawtime max exceeded limit of 1000ms it’s likely that the style is using vector tiles that are too dense and are requiring Mapnik to do too much work to generate PNG tiles.
labelgrid lets you control the density of label points by dividing each tile into an imaginary grid and only allowing one label per grid cell. You can adjust the density by changing the dimensions of the grid.
There are a number of things going on in this query.
The labelgrid function only returns a unique string for each grid cell on the map - two geometries in the same cell will get the same result from this function. This string is used in the DISTINCT ON portion of the query to produce only one result per cell.
The ORDER BY sections ensure that the correct geometries are being prioritized - in this case places with the highest populations.
The id included in the ORDER BY is there to ensure consistent ordering every time the query is run. If two nearby places have the same population, PostgreSQL will not always return things in a consistent order unless you force it with a final column in the ORDER BY that has a unique value for every object.
Multiple tables in one layer
Sometimes your source data may be broken up more than necessary and you want to include multiple tables in one layer for easier styling. As long as you can make the columns for all the tables consistent, you can do this with UNION ALL.
We’re also using the z() function and !bbox! token explained in previous examples.