Bulk loading data into OpenStreetMap

Those of you who were at WhereCampUK in Nottingham in November 2010 may recognise the discussion in this blog post, as it is based on my talk there.

Since the early days of OpenStreetMap, people have been bulk importing data into the database. There have been various issues with many of the previous import methods, which I’ll go through before explaining the latest methods being used.

First up was the public domain TIGER data in the US, which is produced by the US Census Bureau. It varies hugely in quality from county to county, and was seen as the only way to start the mapping in the US. The prospect of the data being imported delayed many people from starting to create the map in the US. However, because the data is not great, it is harder for a newcomer to improve it than to work with clean OSM data.

There are ongoing projects to try to fix up the TIGER data. These were kick-started by the CloudMade London dev team when it was realised that you could not route from one side of the US to the other, as you could in Europe. Tools were built to help highlight connectivity problems and make the road network routable.
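To give a feel for what highlighting connectivity problems can mean, here is a minimal sketch. It is not the CloudMade tooling, just an illustration under my own assumptions: each road is a list of node IDs, roads that share a node ID are joined, and one-way restrictions are ignored. The function name `connected_components` and the sample data are hypothetical.

```python
from collections import defaultdict


def connected_components(ways):
    """Group node IDs into connected components, largest first.

    `ways` is a list of roads, each a list of node IDs (an assumed,
    simplified stand-in for OSM ways). Roads sharing a node are connected.
    """
    adjacency = defaultdict(set)
    for nodes in ways:
        # Link each consecutive pair of nodes along the road.
        for a, b in zip(nodes, nodes[1:]):
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    components = []
    for start in list(adjacency):
        if start in seen:
            continue
        # Depth-first walk to collect everything reachable from `start`.
        stack = [start]
        seen.add(start)
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Two roads that join at node 3, plus one isolated road.
ways = [[1, 2, 3], [3, 4], [10, 11]]
components = connected_components(ways)
main_network, islands = components[0], components[1:]
print(len(main_network), "nodes in the main network;", len(islands), "island(s)")
```

Anything outside the largest component is an "island" that a router cannot reach from the main network, so those are the places worth a mapper's attention.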

After a Freedom of Information request, the Royal Mail released the rough location of all of their post boxes (they didn’t have the exact locations). Matthew Somerville built an app to make it easier to locate all the post boxes and tick them off the list. The post box data from OpenStreetMap is regularly imported and checked against the list of post boxes supplied by the Royal Mail, matching on their reference numbers, and the lists are updated to show which post boxes we still don’t know the exact locations of. Some mappers have made sure that whole postcode districts have all their post boxes in OpenStreetMap. Over time the novelty of playing hunt the post box has worn off, or the remaining post boxes have become too far away to go and find.
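The ticking off by reference number can be sketched roughly as follows. This is not Matthew's actual code. The function name `unmatched_refs`, the dictionary layout of the OSM post boxes and the sample reference numbers are all assumptions made up for illustration.

```python
def unmatched_refs(royal_mail_refs, osm_post_boxes):
    """Return the Royal Mail references that have no post box in OSM yet.

    `royal_mail_refs` is a list of reference strings; `osm_post_boxes` is a
    list of tag dictionaries, where a mapped box may carry a "ref" tag.
    Both shapes are assumptions for this sketch.
    """
    def norm(ref):
        # Normalise case and spacing so "ng1 23" and "NG1  23" compare equal.
        return " ".join(ref.upper().split())

    # Boxes without a ref tag cannot be ticked off, so they are skipped.
    found = {norm(box["ref"]) for box in osm_post_boxes if box.get("ref")}
    return sorted(norm(r) for r in royal_mail_refs if norm(r) not in found)


# Made-up sample data.
royal_mail = ["NG1 23", "NG1 45", "NG2 7"]
osm = [
    {"amenity": "post_box", "ref": "ng1 23"},
    {"amenity": "post_box"},  # mapped, but nobody noted the reference
]
print(unmatched_refs(royal_mail, osm))
```

Because the match is on the reference alone, a post box mapped without its reference number still shows up as missing, which is a nudge to go back and read the plate.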

NaPTAN is the database of public transport access points in the UK. It has been generated by local councils using GPS on the ground. This means that the data doesn’t carry any copyright issues from the Ordnance Survey, which it would if the points had been positioned using Ordnance Survey maps. The NaPTAN data has been imported region by region into OpenStreetMap. A few communities have taken the checking of the NaPTAN data seriously, to ensure that OSM has every bus stop in the correct position. They have often been in contact with local councils or the appropriate Traveline department to improve the original NaPTAN data where there are errors.

Novam is a tool that was built to make it easier to see which imported bus stops had been edited and what data had been added. A few different styles were generated as mappers settled on a consistent way for the data to be tagged. It proved a very useful tool when working with local authorities, as they could see why OSM was valuable, and it highlighted the appropriate data. The tool is primarily used by mappers to see where they need to improve the OSM data.

When the Ordnance Survey Open Data was released on April 1st 2010, there was some discussion about how the data should be used. The consensus is that the data should not be blindly imported; rather, local knowledge should be used to update the OpenStreetMap data. There are cases where the Ordnance Survey data that has been released is already out of date, due to changes on the ground since the data was last collected or released. There have also been cases where people have replaced correct OpenStreetMap data with out-of-date or incorrect Ordnance Survey data. This is one of the reasons why you need local knowledge to get good OSM data. Some tools have been implemented so that OSMers can improve the completeness of the OSM dataset by comparing it against the Ordnance Survey data. These tools are being actively used to increase OSM coverage and accuracy.

The Bike Shop Locator was built by Andy Allan and myself as a tool to show a different method of importing data into OpenStreetMap. It is for cases where a significant portion of the data is already in OSM, where some of the data being imported may be inaccurate because the shop on the ground either doesn’t exist or has since changed name, and where all of the data has only an approximate location, to within a post code. Automating the ticking off of this list is somewhat hard because of the fuzzy matching required for the names and other attributes. Also, when visiting an area again to check exactly where the bike shop is, you’ll spot other things in the OSM data that need to be edited in the area.
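To show why the matching is fuzzy rather than exact, here is a minimal sketch using Python's standard `difflib`. It is not how the Bike Shop Locator itself works. The `best_match` function, the `name` and `postcode` fields and the 0.8 threshold are all assumptions picked for illustration.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Return a 0..1 similarity score for two shop names, ignoring case."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def best_match(shop, osm_shops, threshold=0.8):
    """Find the OSM shop in the same post code with the most similar name.

    Returns None when nothing scores at or above `threshold` (an assumed
    cut-off that would need tuning against real data).
    """
    candidates = [s for s in osm_shops if s["postcode"] == shop["postcode"]]
    if not candidates:
        return None
    scored = [(name_similarity(shop["name"], s["name"]), s) for s in candidates]
    score, match = max(scored, key=lambda pair: pair[0])
    return match if score >= threshold else None


# Made-up sample data: the names differ only by an apostrophe.
osm_shops = [
    {"name": "Bobs Bikes", "postcode": "NG1 1AA"},
    {"name": "Cycle Heaven", "postcode": "NG1 1AA"},
]
imported = {"name": "Bob's Bikes", "postcode": "NG1 1AA"}
print(best_match(imported, osm_shops))
```

Even with a sketch like this, a shop that has been renamed scores as a non-match, so someone still has to visit and check, which is rather the point of the tool.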

WhereCampUK, November 2010 location:

For the presentation, I flipped between the following web pages in different tabs in my web browser:
