- Spring 2020
As someone who occasionally works as a cycle courier in London, I have something of a fascination with London geography. Quite a while ago I had an idea for a map-based game built around competitive street knowledge, and briefly tried to obtain an authoritative list of central London streets and their associated postcodes. After scraping a couple of dodgy “directory”-type websites and realising how poor the resulting dataset was, I gave up.
For some reason I was also convinced that the UI could be easily built on top of Google Maps, but this turned out to be a bit optimistic, as the API doesn’t really allow any access to the underlying data or support the kinds of interactions I wanted.
Fast-forward a couple of years and a chance encounter with OpenStreetMap reminded me of my idea and rekindled my interest. Surely OSM would offer the kind of data I needed. I refined the idea: an interactive map of central London with all the labels turned off, where players guess the location of a given street and are awarded points for accuracy. It was mostly calibrated to be something I would personally enjoy fiddling with, with an eye on it maybe being of interest to couriers and other itinerant London folks.
I realised I was going to need to gather a bunch of data before I could start building the app itself. I began researching OSM’s Overpass API (via the amazing Overpass Turbo interactive client) and got to grips with the underlying OSM data. I would need a dataset of what OSM calls ways (streets, but also paths, railways, trails, and many other types) and then some way of filtering by postcode area, as I had decided that “central London” should mean the central postcodes of W1, WC1-2, EC1-4, SW1 and SE1.
I ran into two issues early on. The first was that what we might consider a complete street can be made up of an arbitrary number of segments in OSM, sharing the same name but not necessarily being contiguous. There’s a relatively recent addition to the API in the form of the complete statement, which is supposed to recurse into a result set in order to produce complete ways, but I couldn’t get it to work. So I decided instead to post-process the data in order to group all those disparate ways into logical streets myself. More on that shortly.
The second issue was how to obtain data that would allow me to identify the geographical area of postcodes, or more specifically of postal districts such as “W1”. Postcode data in the UK is the property of Royal Mail and (legitimate) full access to it is only possible via payment of a licence fee. Fortunately I found Open Door Logistics, who write vehicle routing software and have done a lot of this work already. They used their access to the postcode data to produce a derived dataset of postcode boundaries across the UK, and released it in the TopoJSON format.
Working in a Jupyter notebook, I converted this TopoJSON source into GeoJSON before filtering it to include only the central postcodes listed above, and then combining the various sub-districts (e.g. “W1A”, “W1U”) into single geometries. A couple more cleanup steps and these geometries were ready for use.
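As a rough illustration of the filter-and-combine step, the sketch below buckets GeoJSON features by their parent district. It is not the notebook code from the project: the regular expression, the `name` property key and the helper names are all assumptions, and the real dataset may label its sub-districts differently.

```python
import re
from collections import defaultdict

# Central districts named in the post. The pattern is an assumption about how
# sub-district names such as "W1A" or "EC1A" are formed in the dataset.
DISTRICT_RE = re.compile(r"^(W1|WC[12]|EC[1-4]|SW1|SE1)[A-Z]?$")


def central_district(name):
    """Return the parent district ("W1A" -> "W1"), or None if not central."""
    match = DISTRICT_RE.match(name.strip().upper())
    return match.group(1) if match else None


def group_by_district(features, name_key="name"):
    """Bucket GeoJSON features by central postal district.

    `name_key` is a hypothetical property name; check the real dataset.
    """
    groups = defaultdict(list)
    for feature in features:
        district = central_district(feature["properties"][name_key])
        if district is not None:
            groups[district].append(feature)
    return dict(groups)
```

Each bucket's geometries could then be merged into a single shape, for instance with Shapely's `unary_union`.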
From there I moved back to streets. My work on the postcode geometries had produced an overall polygon which I could simplify and use as an input parameter to the Overpass API. Once I’d figured out the way types I was interested in, I could run a query to retrieve everything inside that polygon. This produced about 10MB of data which I would then process further.
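A polygon-bounded query of this kind might be assembled roughly as follows. This is a sketch, not the query the project used: the list of `highway` values and the bounding ring are made up for illustration. Overpass's `poly` filter expects space-separated "lat lon" pairs, whereas GeoJSON stores coordinates as [lon, lat], so the helper swaps the order.

```python
def overpass_poly(ring):
    """Convert a GeoJSON ring ([lon, lat] pairs) to Overpass "lat lon ..." text."""
    return " ".join(f"{lat:.5f} {lon:.5f}" for lon, lat in ring)


def build_way_query(ring, highway_types, timeout=60):
    """Build an Overpass QL query for named ways inside a polygon."""
    type_pattern = "|".join(highway_types)
    return (
        f"[out:json][timeout:{timeout}];\n"
        f'way["highway"~"^({type_pattern})$"]["name"]'
        f'(poly:"{overpass_poly(ring)}");\n'
        "out geom;"
    )


# Hypothetical inputs: a rough box around central London and a guessed
# selection of highway values, neither taken from the original project.
ring = [[-0.20, 51.48], [-0.05, 51.48], [-0.05, 51.54], [-0.20, 51.54]]
query = build_way_query(ring, ["primary", "secondary", "tertiary", "residential"])
print(query)
```

The resulting text can be pasted into Overpass Turbo or sent to an Overpass endpoint. Simplifying the polygon first keeps the query string short.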
The final processing involved two main tasks. The first was to obtain a complete list of “logical streets” by grouping together way geometries by name, but taking care to correctly handle duplicate names (e.g. the three “King Street” instances in central London). I made a pass over the raw OSM way data and created a list of “approximate streets” based on names. A second pass then reviewed the geometries of each approximate street and measured their distance from one another in order to create distinct geographical groupings sharing the same name.
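The two passes could look something like the sketch below. It assumes each way has already been reduced to a dict with a `name` and a list of (lon, lat) `coords`. The 150 m threshold and the vertex-to-vertex distance measure are illustrative guesses, not the values or method used in the project.

```python
import math
from collections import defaultdict


def haversine_m(a, b):
    """Great-circle distance in metres between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(h))


def min_gap_m(way_a, way_b):
    """Smallest vertex-to-vertex distance between two ways."""
    return min(haversine_m(p, q) for p in way_a["coords"] for q in way_b["coords"])


def logical_streets(ways, max_gap_m=150):
    """Group ways by name, then split each name into nearby clusters."""
    by_name = defaultdict(list)              # pass 1: "approximate streets"
    for way in ways:
        by_name[way["name"]].append(way)

    streets = []
    for name, segments in by_name.items():   # pass 2: split by distance
        clusters = []
        for seg in segments:
            near, far = [], []
            for cluster in clusters:
                close = any(min_gap_m(seg, w) <= max_gap_m for w in cluster)
                (near if close else far).append(cluster)
            # Merge the new segment with every cluster it is close to.
            clusters = far + [[seg] + [w for c in near for w in c]]
        streets.extend({"name": name, "ways": c} for c in clusters)
    return streets
```

Because a new segment is merged with every cluster it sits close to, two groups that start out separate are joined as soon as a connecting segment turns up.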
Having obtained a list of logical streets, the second task was to determine which postcode districts covered each one. This involved another iteration of the streets list, creating Shapely objects from each and comparing these to the polygons of each postal district to check for intersection. Matching districts were then appended to each street object’s properties. Transforming the final data back into GeoJSON yielded a slimmer dataset of around 1.2MB.
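The project used Shapely for this comparison. To stay dependency-free, the sketch below swaps in a plain ray-casting point-in-polygon test applied to each street vertex. That is cruder than a true intersection check, since it misses a segment that crosses a district without placing a vertex inside it. The data shapes and function names are hypothetical.

```python
def point_in_ring(point, ring):
    """Ray-casting test: is (x, y) inside the ring of (x, y) vertices?"""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):                      # edge straddles the ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def tag_districts(street, districts):
    """Attach the names of districts that contain any vertex of the street.

    `districts` maps a district name to a single outer ring (a list of
    vertices); holes and multi-part geometries are ignored in this sketch.
    """
    street["properties"]["districts"] = sorted(
        name for name, ring in districts.items()
        if any(point_in_ring(pt, ring) for pt in street["coords"])
    )
    return street
```

A street that straddles a boundary ends up listed under both districts, which matches the idea of appending every matching district to the street's properties.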
At last! I was ready to work on the actual app. I wanted to get some React practice, so I built it out using a collection of components inside a React 16 app started from create-react-app. I began in September last year, though, so Hooks weren’t yet a thing, and by the time I’d finished I had to restrain myself from immediately rewriting it using Hooks. I used react-leaflet to provide a bridge between the Leaflet maps library and React, and Mapbox as the tile provider to serve up a nice custom tileset with all the labels removed. I used react-router’s HashRouter to provide in-browser routing.
Despite setting out to get some useful React practice out of this project, implementing the game mechanics and figuring out the sharp edges of react-leaflet became a bit of a distraction, and the React implementation I ended up with is not quite what I would aim for if I did the front-end development again today. As mentioned, I would want to integrate Hooks, and possibly use a lightweight Redux integration to manage some of the state. But it’s taken me a while to find the time to get this finished, so those improvements will have to wait. It’s certainly been a lot of fun getting this over the line, and I have indeed learned a lot (about React and much more). Happy clicking!