Letter-Spacing in a Multilingual World
On Wednesday at the FOSS4G North America conference here in Washington, DC, I spoke about some of the technical and design details that went into the cartography of MapBox Streets. One of the subjects I touched on was typography and the methods we are using to make it work well in many different languages at once.
The main typographic tool I discussed was letter-spacing, because of the various challenges we ran into with it. Letter-spacing is useful for indicating that a label covers a relatively large area, and we also apply it to street names to keep the letters from bunching up around curves.
In the past, letter-spacing in Mapnik only worked for labels positioned on points, not labels positioned along lines (as road names often are). For a long time we took advantage of Mapnik’s replace() feature to fake letter-spacing in these situations by replacing every character in a label with itself plus a space (or multiple spaces). This is no longer necessary with bleeding-edge versions of Mapnik.
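To make the old trick concrete, here is a minimal sketch of the same idea in Python rather than in Mapnik's expression syntax. The function name `fake_letter_spacing` and its `spaces` parameter are made up for illustration; the only thing taken from the post is the technique of replacing every character with itself plus one or more spaces.

```python
import re


def fake_letter_spacing(label: str, spaces: int = 1) -> str:
    """Replace every character with itself followed by `spaces` spaces.

    Hypothetical helper for illustration only. Like the replacement
    described above, it also pads existing spaces and leaves trailing
    padding after the last character.
    """
    return re.sub(r"(.)", lambda m: m.group(1) + " " * spaces, label)


if __name__ == "__main__":
    # repr() makes the padding visible, including the trailing space.
    print(repr(fake_letter_spacing("Main St")))
    print(repr(fake_letter_spacing("Main St", spaces=2)))
```

Because the padding is made of real characters, the amount of spacing can only be adjusted in whole-space steps, which is the lack of control the hack is known for.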
There is still a problem though. Our map of the world contains text in many different languages and writing systems, including scripts like Arabic and Bengali. The characters in these scripts are meant to flow together and remain attached, not spread apart at all. Applying letter-spacing effectively breaks such text.
To get around this we come back to the same text replacement hack we were using before, but with a more specific expression to match.
The expression replaces most characters with themselves plus a space, but it deliberately does not match characters falling within the Unicode blocks for Arabic and Bengali. (Note that as of TileMill 0.9.0 the replace feature does not support Unicode text in the Windows or Mac versions, so this is a Linux-only technique for now. The next update should address this issue.)
Doing these replacements at the character level is important because labels are not necessarily restricted to a single language.
Something important to note is that the replacement space is a non-breaking space (U+00A0). This is necessary for labels placed on points, where there will potentially be a line break. Using a normal space in that situation could cause the label to break in the middle of a word.
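As a rough illustration of the character-level replacement, here is a Python sketch (not the Mapnik expression itself). The names `SPACEABLE` and `space_label` are hypothetical, and the character class assumes that only the primary Arabic (U+0600 to U+06FF) and Bengali (U+0980 to U+09FF) blocks need protecting; supplementary Arabic blocks and other connected scripts would need their own ranges.

```python
import re

NBSP = "\u00a0"  # NO-BREAK SPACE, so padded labels never wrap mid-word

# Assumption: match any single character OUTSIDE the primary Arabic
# (U+0600-U+06FF) and Bengali (U+0980-U+09FF) Unicode blocks.
SPACEABLE = re.compile(r"([^\u0600-\u06FF\u0980-\u09FF])")


def space_label(label: str, spacer: str = NBSP) -> str:
    """Append `spacer` to each matched character; leave the rest untouched."""
    return SPACEABLE.sub(lambda m: m.group(1) + spacer, label)


if __name__ == "__main__":
    arabic = "\u0627\u0644\u0642\u0627\u0647\u0631\u0629"  # "Cairo" in Arabic
    # A label mixing scripts: only the Latin part (and the plain space) is padded.
    print(repr(space_label("Cairo " + arabic)))
```

Working per character means a label that mixes Latin and Arabic text gets spacing only where it is safe, without having to classify the whole label as one language.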
Another thing about spaces: there are lots of different kinds. Wikipedia lists about two dozen in Unicode. What’s cool about this is that you can use some of them to alleviate the lack of control you have with the replacement hack versus traditional letter-spacing methods. Only want to space out your text a little bit? Try a thin space (U+2009) or a hair space (U+200A). There’s also a narrow no-break space (U+202F), useful for slight spacing in multi-line situations.
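For a quick way to compare these spacing characters, the sketch below collects them in a small table and inserts the chosen one between the characters of a label. The `SPACERS` dictionary and the `space_out` helper are hypothetical names, and `str.join` stands in for the regex replacement as a simplification (it adds no trailing spacer and does not skip any scripts).

```python
import unicodedata

# Code points named in the text, plus the regular no-break space.
SPACERS = {
    "thin": "\u2009",         # THIN SPACE
    "hair": "\u200a",         # HAIR SPACE
    "narrow_nbsp": "\u202f",  # NARROW NO-BREAK SPACE
    "nbsp": "\u00a0",         # NO-BREAK SPACE
}


def space_out(label: str, spacer: str) -> str:
    """Insert `spacer` between every pair of adjacent characters."""
    return spacer.join(label)


if __name__ == "__main__":
    for key, char in SPACERS.items():
        # Show each spacer's code point and official Unicode name,
        # then the padded label with escapes so the spacer is visible.
        padded = space_out("Park", char)
        print(f"{key}: U+{ord(char):04X} {unicodedata.name(char)} -> {padded!r}")
```

How wide each of these actually renders depends on the font, so the visual difference is worth checking in the rendered map rather than assumed.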
Ideally we can figure out ways to improve the flexibility of typography tools in Mapnik. Hermann Kraus has proposed a Google Summer of Code project to address issues relating to non-Latin text, which will be an important aspect. But for the short and medium term we’ll continue to look for hacks like this to make sure our maps look great in all languages.
I’ll blog about several other ideas from my Cartography with TileMill, PostGIS, and OpenStreetMap session, and in the meantime here are my slides.