Given a GPS trace, how do you find the matching streets in a given road network? For instance, think of the GPS trace of a taxi making its way through a city. How do you identify the actual streets the taxi used? Solving this problem unlocks important applications like improving street network data or computing traffic statistics.

The challenge is that GPS is prone to disturbances and inaccuracies. Any given position in a GPS trace may be off by a few feet or a lot more. Consider two one-way streets going in opposite directions among tall buildings. The buildings introduce a fair amount of noise.

The naive approach would be to choose the nearest street segment for each GPS point, resulting in the red line. Clearly, this is not desirable, especially since the street could not even be traversed in that direction! Luckily, there’s a solution to this problem, introduced by Paul Newson and John Krumm in their 2009 paper on map matching.

A key component of Newson and Krumm’s approach is the so-called Hidden Markov Model, which is based on Markov Chains. Markov Chains are popular because many random processes can be modeled with them. A Markov Chain consists of a set of states and a function that gives the probability of transitioning from one state to another; together they yield a mathematical model of the process. A typical question a Markov Chain helps answer is: *Given state A, how probable is state B after 2 steps?*
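To make the two-step question concrete, here is a minimal sketch of a three-state chain. The states and the transition probabilities are made up for illustration and have nothing to do with real street data; the function name `n_step_probability` is our own.

```python
# Hypothetical three-state chain; the numbers are invented for illustration.
STATES = ["A", "B", "C"]

# P[i][j] = probability of moving from state i to state j in one step.
# Each row sums to 1.
P = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]


def n_step_probability(P, start, end, steps):
    """Probability of being in `end` after `steps` steps when starting in `start`."""
    n = len(P)
    # Start with all probability mass on the start state.
    dist = [1.0 if i == start else 0.0 for i in range(n)]
    for _ in range(steps):
        # One step: push the current distribution through the transition function.
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return dist[end]


p_a_to_b = n_step_probability(P, STATES.index("A"), STATES.index("B"), 2)
print(p_a_to_b)  # 0.5
```

Starting in A, the chain reaches B after two steps either via A (0.5 × 0.5) or via B (0.5 × 0.5), which adds up to 0.5.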

Let's say the states of our Markov Chain are street segments. How would you measure the probability of transitioning from one street segment to any other street segment? It is fair to assume that a taxi can only transition to streets that are nearby, so basing the probability on driving distance seems like a good idea. However, the taxi might drive faster, which would change the distance between the sample points and ultimately the transition probabilities. Newson and Krumm found that a large difference between the driving distance between two street segments and the straight-line distance between the corresponding GPS samples points to an unlikely transition. More precisely, the absolute difference of the two values follows an exponential distribution. That distribution can be used as a model to estimate the transition probability.
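As a rough sketch, the transition model could look like the following. The function name, the argument names, and the default value of the scale parameter `beta` are assumptions made for illustration; in practice `beta` would have to be estimated from data.

```python
import math


def transition_probability(route_distance_m, straight_line_distance_m, beta=2.0):
    """Exponential model for moving between two candidate street positions.

    route_distance_m: driving distance between the two candidate positions.
    straight_line_distance_m: straight-line distance between the two GPS samples.
    beta: scale of the exponential distribution (assumed value, not tuned).
    """
    d = abs(straight_line_distance_m - route_distance_m)
    # Density of an exponential distribution with mean beta, evaluated at d.
    return math.exp(-d / beta) / beta


# A candidate pair whose driving distance is close to the straight-line
# distance scores much higher than one that requires a long detour.
plausible = transition_probability(route_distance_m=105.0, straight_line_distance_m=100.0)
detour = transition_probability(route_distance_m=400.0, straight_line_distance_m=100.0)
```

Strictly speaking this returns a density rather than a probability, but only the relative size of the values matters when comparing candidate transitions.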

This equips us with the tools to measure the probability of a given sequence of street segments. However, we want to *find* the most probable sequence of street segments for a given trace of GPS points. This is where *Hidden Markov Models* come in. They were popularized and applied by Lawrence Rabiner in his 1989 paper on speech recognition. Hidden Markov Models enable you to find the most probable state sequence for a given sequence of observations.

For map matching, the observations are the GPS points. For each street segment, the probability that a given GPS point matches the street decreases with the distance between them. More precisely, Newson and Krumm model this using a Gaussian distribution over the distance between the GPS point and the street segment.
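A minimal sketch of such an observation model is shown below. The function name and the default standard deviation `sigma` (a stand-in for the GPS noise) are assumptions for illustration, not values taken from the paper.

```python
import math


def emission_probability(distance_m, sigma=5.0):
    """Zero-mean Gaussian density of the distance between GPS point and street.

    distance_m: distance from the GPS point to the candidate street segment.
    sigma: assumed standard deviation of the GPS noise in meters.
    """
    return math.exp(-0.5 * (distance_m / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)


# A street 2 m away from the GPS point is a far better match than one 20 m away.
near = emission_probability(2.0)
far = emission_probability(20.0)
```

A larger `sigma` makes the model more tolerant of noisy positions, at the cost of distinguishing less sharply between nearby streets.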

Once we are able to compute the observation and transition probabilities, we need an algorithm to identify the most probable sequence of states. This problem can be solved with the Viterbi algorithm. The basic idea is to record, for each state at each step of the sequence, the most probable previous state as its predecessor. The algorithm then extracts the entire state sequence by following these predecessors back from the most probable last state.
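Here is a small, self-contained sketch of the Viterbi algorithm on a toy version of the one-way street example. Everything in it is hypothetical: the two states, the distances, `sigma`, and especially the fixed 0.9/0.1 transition probabilities, which stand in for the distance-based transition model described earlier.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the given observations."""
    # best[t][s]: probability of the best path that ends in state s at step t.
    best = [{s: start_p[s] * emit_p(s, observations[0]) for s in states}]
    # back[t][s]: predecessor of s on that best path.
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, p = max(
                ((r, best[t - 1][r] * trans_p(r, s)) for r in states),
                key=lambda item: item[1],
            )
            best[t][s] = p * emit_p(s, observations[t])
            back[t][s] = prev
    # Walk back from the most probable last state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Toy data: distance in meters from each GPS point to each of two one-way streets.
states = ["northbound", "southbound"]
observations = [
    {"northbound": 2.0, "southbound": 8.0},
    {"northbound": 7.0, "southbound": 4.0},  # noisy point, closer to the wrong street
    {"northbound": 1.0, "southbound": 9.0},
]
start_p = {"northbound": 0.5, "southbound": 0.5}


def emit_p(state, obs, sigma=5.0):
    # Gaussian observation model with an assumed sigma.
    return math.exp(-0.5 * (obs[state] / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)


def trans_p(prev, cur):
    # Made-up probabilities: staying on the same street is much more likely.
    return 0.9 if prev == cur else 0.1


nearest = [min(obs, key=obs.get) for obs in observations]
matched = viterbi(observations, states, start_p, trans_p, emit_p)
print(nearest)  # ['northbound', 'southbound', 'northbound']
print(matched)  # ['northbound', 'northbound', 'northbound']
```

The nearest-segment approach jumps to the southbound street for the noisy middle point, while the Viterbi result stays on the northbound street. For long traces the products of probabilities become very small, so a practical implementation would likely add log-probabilities instead of multiplying.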

This map matching approach is now available 100% open source in the form of a plugin for the Open Source Routing Machine, which we just merged back into the mainline branch. To learn more, join us tomorrow in our OSRM routing session at FOSS4GNA, led by my colleague Dennis.