Find your way with the power of PostGIS & pgRouting
Remember the last time we helped Santa analyse statistical data using basic features of PostGIS? If not, take a time to read it first, cause now it’s time to move on to some serious stuff 😈 We’ll take a look at how to speed up queries execution and try to find the shortest path to our desired destination.
In the first tutorial we ended up writing simple queries with geospatial data using PostgreSQL and PostGIS. This powerful extension allowed us to add geospatial context to our data and analyse it from the totally different perspective, like calculating the distance between elements, joining tables based on objects’ position and much more. The problem we encountered was the time of queries execution even on the small dataset e.g. the Problem 2 with geospatial join of two tables (3500 and 250 rows) from the last post took about 3.5s to finish. A quick look at the EXPLAIN ANALYZE
function result shows the problem.
Sort (cost=229281.12..229281.74 rows=246 width=23) (actual time=3233.233..3233.254 rows=246 loops=1)
Sort Key: (count(children.id)) DESC
Sort Method: quicksort Memory: 44kB
-> GroupAggregate (cost=229266.01..229271.35 rows=246 width=23) (actual time=3232.137..3233.140 rows=246 loops=1)
Group Key: countries.id, countries.name
-> Sort (cost=229266.01..229266.73 rows=288 width=19) (actual time=3232.125..3232.444 rows=3619 loops=1)
Sort Key: countries.id, countries.name
Sort Method: quicksort Memory: 309kB
-> Nested Loop Left Join (cost=0.00..229254.25 rows=288 width=19) (actual time=25.742..3230.079 rows=3619 loops=1)
Join Filter: ((countries.geom ~ children.geom) AND _st_contains(countries.geom, children.geom))
Rows Removed by Join Filter: 861175
-> Seq Scan on countries (cost=0.00..32.46 rows=246 width=1996) (actual time=0.020..0.445 rows=246 loops=1)
-> Materialize (cost=0.00..87.73 rows=3515 width=36) (actual time=0.000..0.288 rows=3515 loops=246)
-> Seq Scan on children (cost=0.00..70.15 rows=3515 width=36) (actual time=0.023..1.331 rows=3515 loops=1)
Planning time: 1.274 ms
Execution time: 3433.486 ms
For each country, the database is doing a sequential scan of all children objects performing complex operation of checking whetherchildrens.geom
is within countries.geom
or not, which gives 864'690 checks. Can we make it faster? Yes we can, let’s add indexes!
Spatial indexes
But how..? The normal way of adding index fails because we are trying to create it on a binary column which exceeds the max size.
=# CREATE INDEX children_gix ON chidren (geom);
ERROR: index row requires 11340 bytes, maximum size is 8191
The answer is to use special GIST index which instead of treating geom as binary data, uses its bounding box. The detailed explanation on how it works can be found here. To use this feature simply add USING GIST
clause to your command.
=# CREATE INDEX children_gix ON chidren USING GIST (geom);
Now it’s time for some benchmarks. The Problem 2 shown before now takes only 400ms to finish so it’s almost 10x faster!
Pathfinding
Now as everything works fast, we can move to more advanced problems like finding the shortest and most optimal path between set of points. However, PostGIS doesn’t have built-in functions for performing such calculations, there is another extension named pgRouting which integrates with PostGIS and adds geospatial routing functionality to your database. That’s exactly what we were looking for!
Setup
pgRouting setup is easy on both OSX and Ubuntu operating systems. For detailed instructions for any OS visit official manual.
OSX
Use Homebrew package manager and run:
brew install pgrouting
Ubuntu
# Create /etc/apt/sources.list.d/pgdg.list.
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
# Import the repository key, update the package lists
sudo apt-get install wget ca-certificateswget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -sudo apt-get update
# Install pgrouting based on your postgres Installation: for this example is 9.3
sudo apt-get install postgresql-9.3-pgrouting
Features
Routing functions
pgRouting has support for wide range of routing algorithms eg.:
pgr_astar
Shortest path A*pgr_dijkstra
Dijkstra’s shortest path algorithm
Each of these functions takes as an input an edge_sql
(SQL query, which should return a set of rows with the following columns: id, source, target, cost), start_vid
(id of the starting vertex), end_vid
(id of the ending vertex)
Database
Like PostGIS, pgRouting is also an optional extension, so before using its features you have to enable it in your database.
$ createdb pgr_tutorial
$ psql pgr_tutorial
pgr_tutorial=# CREATE EXTENSION postgis;
CREATE EXTENSION
pgr_tutorial=# CREATE EXTENSION pgrouting;
CREATE EXTENSION
Now, when everything is ready we can load some data. To explain the basics of pgRouting I’ll be using the public roads network of Masovian Voivodeship in Poland, downloaded from Geofabrik and converted using os2po tool. You can download it here [12MB].
gunzip -c roads.sql.gz | psql pgr_tutorial
Roads network
The only table in the dataset is roads
which represents road network as LineStrings.
CREATE TABLE roads (
id integer NOT NULL PRIMARY KEY,
geom geometry(LineString, 4326)
);
Network topology
As we have seen before, routing functions take as an argument network of connections in the form of a graph (start node id, end node id), so the last thing to do is to convert our actual road network to a graph. Fortunately, pgRouting helps us a lot with this task.
Start with adding two additional columns to our roads
table.
ALTER TABLE roads ADD COLUMN source integer;
ALTER TABLE roads ADD COLUMN target integer;
CREATE INDEX roads_source_idx ON roads (source);
CREATE INDEX roads_target_idx ON roads (target);
And then use pgr_createTopology
function which builds a network topology based on the geometry information (it analyses roads geometry and automatically assigns node ids to the source
and target
columns).
SELECT pgr_createTopology('roads', 0.0001, 'geom', 'id');
After about 5 minutes of calculations the dataset will be ready.
Cost
Since most of the algorithms need cost parameter assigned to each edge (road), we will use dynamically calculated length of each road using PostGIS st_length
function (in the real application we should consider a lot of additional parameters such as type of roads, traffic, velocity limits etc.).
Start navigating! 🚗💨
After this short introduction let’s move to some real-world examples and find the shortest path between nodes #1 and #5000 using Dijkstra algorithm.
SELECT seq, node, edge, cost as cost, agg_cost, geom
FROM pgr_dijkstra(
'SELECT id, source, target, st_length(geom, true) as cost FROM roads',
1,
5000
) as pt
JOIN roads rd ON pt.edge = rd.id;
After about 1.5s the results appear (The total path length is about 57km):
You may ask what are the nodes #1 and #5000? The answer is simple: just random numbers to verify if everything is working well. 😄 Of course the most common usage of routing applications is to find path between points on the map (lat, lon), not random nodes. To achieve that we have to find the nearest node to our desired locations, and then use the same query as before. So let’s find the way from our office (52.231909, 20.983455) to the Warsaw Chopin Airport (52.169647, 20.973400).
Query to find nearest node (PostGIS uses reversed order for lat lon):
SELECT source
FROM roads
ORDER BY ST_Distance(
ST_StartPoint(geom),
ST_SetSRID(ST_MakePoint(20.983455, 52.231909), 4326),
true
) ASC
LIMIT 1;
Final query:
SELECT seq, node, edge, cost as cost, agg_cost, geom
FROM pgr_dijkstra(
'SELECT id, source, target, st_length(geom, true) as cost FROM roads',
(SELECT source FROM roads
ORDER BY ST_Distance(
ST_StartPoint(geom),
ST_SetSRID(ST_MakePoint(20.983455, 52.231909), 4326),
true
) ASC
LIMIT 1),
(SELECT source FROM roads
ORDER BY ST_Distance(
ST_StartPoint(geom),
ST_SetSRID(ST_MakePoint(20.973400, 52.169647), 4326),
true
) ASC
LIMIT 1)
) as pt
JOIN roads rd ON pt.edge = rd.id;
The result obtained from Google Maps is almost the same as ours!
As we can see in this short tutorial PostGIS & pgRouting is simple, yet powerful set of tools only waiting for you to use it in your new application. It can work with roads, public transport, flights network and much more. Why not try building Google Maps competitor in a week or less? Or integrate navigation in your new “Tinder for …” app? Possibilities are almost infinite!
If you enjoyed this post, please don’t forget to tap ❤! You can also follow us on Facebook, LinkedIn and Twitter.