OpenStreetMap: the data behind the maps
In my last article on OpenStreetMap I looked at the recent mass imports of public data — everything from British oil wells to the entire road network for the United States. But for those interested in more than an alternative to Google Maps, the ability to extract or add data to the project is what really makes OpenStreetMap shine. Whether you want to get an SVG of a campus map or import a local government's database of every building in the city, Linux users will find plenty of tools that cater to their needs.
The export tab on the web site provides the simplest way to access data. Users can draw an area on the main map view and then grab an image (in PNG, JPEG, PDF or PS formats), some HTML to embed the map into a web site, or the raw XML data. To modify the data further, either in the OpenStreetMap database or in a local copy (stored as an XML .osm file on your disk), download the data using an editor like JOSM (the 'Java OpenStreetMap editor'). To make life easier when selecting the area to download, open up the preferences dialog and install the namefinder and slippy_map_chooser plugins.
Grabbing larger amounts of data would be difficult, slow and clumsy with these methods. More advanced users can get data directly through the API. Check the latitude and longitude coordinates for the area you want — an easy method for this is to use the export tab to draw an area, then note down the coordinates it records — then fire up wget or curl and download the data:
wget "http://api.openstreetmap.org/api/0.5/trackpoints?bbox=left,bottom,right,top"
The main API only lets you grab 5,000 points per request; you have to page the request to get the additional data. To pull out a really large chunk of data, or to filter it (for example, to download only the pubs in a city), use the extended OSM API (XAPI, or 'zappy'). Really enormous amounts of data, such as the entire planet or a whole country, can be found in the frequently updated dumps listed on the Planet.osm wiki page.
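As a rough illustration of paging, the following Python sketch builds one request URL per page for the same bounding box. It assumes that the trackpoints call accepts a `page` query parameter counted from zero; the function names and the sample coordinates are invented for this example, so check the API documentation before relying on it.

```python
from urllib.parse import urlencode

# Base URL taken from the wget command shown earlier.
API_BASE = "http://api.openstreetmap.org/api/0.5"


def trackpoints_url(left, bottom, right, top, page=0):
    """Build a trackpoints request URL for one page of a bounding box."""
    if not (left < right and bottom < top):
        raise ValueError("bbox must satisfy left < right and bottom < top")
    bbox = f"{left},{bottom},{right},{top}"
    # safe="," keeps the commas in the bbox literal instead of %2C.
    # Assumption: "page" selects which batch of 5,000 points to return.
    query = urlencode({"bbox": bbox, "page": page}, safe=",")
    return f"{API_BASE}/trackpoints?{query}"


def paged_urls(left, bottom, right, top, pages):
    """Return the URLs for the first `pages` pages of one bounding box."""
    return [trackpoints_url(left, bottom, right, top, p) for p in range(pages)]


if __name__ == "__main__":
    # Hypothetical bounding box, roughly around Reading.
    for url in paged_urls(-1.0, 51.4, -0.9, 51.5, pages=3):
        print(url)
```

Each printed URL can be handed to wget or curl in turn; in practice you would keep requesting pages until one comes back with fewer than 5,000 points.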
Once you have the data, there are all manner of uses: loading it onto your GPS navigation device, rendering your own maps for the web or print, or converting the data into another standard GIS format with tools like the Ruby osmlib. The documentation for each tool varies enormously, but the toolchains tend to be relatively straightforward.
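The .osm format itself is simple enough to process with a few lines of script. As a minimal sketch, the Python below reads nodes from OSM XML and keeps those carrying a given tag, such as amenity=pub. The sample document and the function name are made up for illustration, and only nodes are handled, not ways.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written sample in the OSM XML layout (not real data).
SAMPLE_OSM = """<osm version="0.5" generator="example">
  <node id="1" lat="51.4545" lon="-0.9781">
    <tag k="amenity" v="pub"/>
    <tag k="name" v="The Example Arms"/>
  </node>
  <node id="2" lat="51.4550" lon="-0.9790">
    <tag k="amenity" v="post_box"/>
  </node>
  <node id="3" lat="51.4560" lon="-0.9800"/>
</osm>
"""


def nodes_with_tag(osm_xml, key, value):
    """Return the nodes in an OSM XML string that carry the tag key=value."""
    matches = []
    for node in ET.fromstring(osm_xml).iter("node"):
        # Collect the node's <tag k="..." v="..."/> children into a dict.
        tags = {tag.get("k"): tag.get("v") for tag in node.findall("tag")}
        if tags.get(key) == value:
            matches.append({
                "id": node.get("id"),
                "lat": float(node.get("lat")),
                "lon": float(node.get("lon")),
                "tags": tags,
            })
    return matches


if __name__ == "__main__":
    for pub in nodes_with_tag(SAMPLE_OSM, "amenity", "pub"):
        print(pub["tags"].get("name", "(unnamed)"), pub["lat"], pub["lon"])
```

For a file on disk, `ET.parse(path).getroot()` could replace `ET.fromstring`; for country-sized or planet-sized dumps a streaming parser would be a better fit than loading the whole tree.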
Of course, extracting data is only half the story. Not only should all good open source citizens be contributing back, but you will get the most value from the data if you collaborate with others in developing a rich data set that will lead to tools and use cases you can later replicate.
OpenStreetMap abounds with methods and tools for entering data. You might like the "old school" method of tracing a breadcrumb GPS trail, which was much more fun in the early days when I mapped much of Reading with some friends from a completely blank slate. Many mappers have traced basic road layouts and buildings from aerial imagery donated by Yahoo! so that others can go in and identify street names and points of interest. The main editing tools are Potlatch, a Flash interface on the main web site (just click on the 'Edit' tab once you're zoomed into your local area), and the previously mentioned JOSM. The wiki has plenty of guidance.
When importing large sets of existing data, things get a little more complicated. The first step is to step back and have a good think. Imports can cause two kinds of headaches for other contributors if done wrong: you might put a load of new data over the top of somebody else's efforts and make a complete mess in the process; or worse, you might import data without proper permission, causing legal difficulties for the project and technical difficulties in taking the data back out again.
It's always best to begin by asking a few questions on the relevant mailing list; there are localized lists for many areas, a general (high traffic) "talk" list, and a "legal-talk" list for legal issues such as licensing for imports. It's especially important to avoid convenient interpretations of web site notices regarding copyright and database rights when deciding if you can import the data. You need to get written confirmation so that the OpenStreetMap project is immune from legal attacks. There are some nice general guidelines on the wiki, which are worth a read.
If you have data and written permission to use it, you can begin the import process. The first, and most laborious, step is to map out the data against standard OSM tags, as in this UK public transport example or this really comprehensive exercise for CanVec data. You'll notice that source-specific data (like unique IDs for features and really niche data) is often retained under a namespaced key such as "CanVec:FID" or "naptan:StopAreaCode". This can also be useful where you don't want the data to appear until volunteers have checked it against existing data in the database, for example to merge two bus stops (one crowdsourced, the other from the import).
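The mapping step can be sketched in a few lines of Python. In the example below the source column names, the mapping table, and the choice of fixed tags are all hypothetical; only the "naptan:" prefix and the StopAreaCode key come from the examples mentioned above. A real import would take its mapping from the table agreed on the wiki.

```python
# Prefix for source-specific fields, as in "naptan:StopAreaCode".
NAMESPACE = "naptan"

# Hypothetical source columns that map onto standard OSM tag keys.
STANDARD_TAGS = {"CommonName": "name"}

# Tags added to every imported feature (an assumed choice for bus stops).
FIXED_TAGS = {"highway": "bus_stop"}


def to_osm_tags(record):
    """Translate one source record (a dict) into a dict of OSM tags."""
    tags = dict(FIXED_TAGS)
    for field, value in record.items():
        if value in (None, ""):
            continue  # skip empty source fields rather than write empty tags
        if field in STANDARD_TAGS:
            tags[STANDARD_TAGS[field]] = str(value)
        else:
            # Anything without a standard equivalent keeps its source name
            # under the namespace, so it can be traced back later.
            tags[f"{NAMESPACE}:{field}"] = str(value)
    return tags


if __name__ == "__main__":
    # Invented record for illustration only.
    print(to_osm_tags({"CommonName": "Station Road", "StopAreaCode": "039G1"}))
```

Keeping the untranslated fields under a namespace means nothing from the source is lost, and reviewers can later search for the prefix to find every imported feature.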
For large chunks of data, importers have tended to write custom scripts to bring the data in. If the data is already in the OpenStreetMap format, and it is in a state suitable to go straight into the database, this bulk import script makes the process quick and painless. The Canvec2osm code shows how to pull in more complicated data; it converts 11 different shapefiles into themed .osm files with correct tagging, which can then be worked into a suitable state for importing.
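To give a feel for what such a conversion script produces, here is a minimal Python sketch that writes point features out as OSM XML. It is not the Canvec2osm or bulk import code. The feature layout, the function name, and the generator string are assumptions for this example; the negative IDs follow the editor convention for objects that have not yet been uploaded, and the version attribute is set to match the 0.5 API used earlier in the article.

```python
import xml.etree.ElementTree as ET


def build_osm(features, api_version="0.5"):
    """Serialise point features as an OSM XML document string.

    Each feature is a dict with "lat", "lon" and a "tags" dict
    (a hypothetical layout chosen for this sketch).
    """
    root = ET.Element("osm", version=api_version, generator="import-sketch")
    for index, feature in enumerate(features, start=1):
        # Negative IDs mark objects that do not yet exist in the database.
        node = ET.SubElement(
            root, "node",
            id=str(-index),
            lat=str(feature["lat"]),
            lon=str(feature["lon"]),
        )
        # Sort the tags so the output is stable between runs.
        for key, value in sorted(feature["tags"].items()):
            ET.SubElement(node, "tag", k=key, v=value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = [{"lat": 51.4545, "lon": -0.9781,
               "tags": {"highway": "bus_stop", "name": "Station Road"}}]
    print(build_osm(sample))
```

The resulting file can be opened in JOSM for review before anything is uploaded, which is usually a safer first step than sending it straight to the database.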
A more cautious approach can be appropriate in areas with a lot of existing data. One quite technically challenging route is to set up your own Web Map Service (WMS) using a tool like MapServer, and then set up the JOSM WMS plugin to pull those maps in as a layer underneath your map data so it can be traced. This Map Warper tool is in beta and tries to make this process easier. If the data is quite simple you could just put the source and editor side-by-side on your screen and use your judgement to copy over points of interest.
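In areas with a lot of existing data, a rough first pass can also be scripted: flag each candidate point that lies close to something already mapped, so a person can decide whether to merge or skip it. The Python sketch below uses the haversine great-circle distance; the 30 metre threshold, the tuple layout, and the function names are arbitrary choices for illustration, not project policy.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, in metres


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def possible_duplicates(candidates, existing, threshold_m=30.0):
    """Pair each candidate with any existing point within threshold_m.

    Both arguments are lists of (identifier, lat, lon) tuples. This is a
    simple all-pairs comparison, fine for a neighbourhood but too slow
    for a whole country.
    """
    pairs = []
    for new_id, new_lat, new_lon in candidates:
        for old_id, old_lat, old_lon in existing:
            if distance_m(new_lat, new_lon, old_lat, old_lon) <= threshold_m:
                pairs.append((new_id, old_id))
    return pairs


if __name__ == "__main__":
    # Invented coordinates for illustration only.
    imported = [("new-1", 51.0, 0.0), ("new-2", 52.0, 0.0)]
    mapped = [("old-1", 51.0001, 0.0)]
    print(possible_duplicates(imported, mapped))
```

The output is only a list of pairs to review by hand; deciding which of two bus stops is in the right place still needs local knowledge.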
However you want to proceed, you're probably best off getting in touch with some local or more experienced community members. Interested people could even just lobby local government officers and public institutions to get the data, then pass it along to somebody with more of an appetite for the technical stage. Given 6 months to study, process, and import the data, you should find richly detailed maps and underlying data available under a Creative Commons BY-SA license; the license, incidentally, may soon change to one more suitable for databases. Whatever you do, just remember to have fun.
| Index entries for this article | |
|---|---|
| GuestArticles | Chance, Tom |