Using OpenStreetMap Tiles for Machine Learning

Extract features automatically using convolutional networks

Robert Kyle · Jan 31

[Figure: performance of the network when predicting the population of a given tile]

OpenStreetMap is an incredible data source.
The collective effort of thousands of volunteers has created a rich set of information covering almost every location on the planet.
There are a large number of problems where information from the map could be helpful:

- city planning, such as characterizing the features of a neighborhood
- researching land usage and public transit infrastructure
- identifying suitable locations for marketing campaigns
- identifying crime and traffic hotspots

However, for each individual problem, significant thought needs to go into deciding how to transform the data used to make the map into features that are useful for the task at hand.
For each task, one needs to understand the features available, and write code to extract those features from the OpenStreetMap database.
An alternative to this manual feature engineering approach would be to use convolutional networks on the rendered map tiles.
How could convolutional networks be used?

If there is a strong enough relationship between the map tile images and the response variable, a convolutional network may be able to learn the visual components of the map tiles that are helpful for each problem.
The designers of the OpenStreetMap style have done a great job of making sure the map rendering exposes as much information as our visual system can comprehend.
And convolutional networks have proven very capable of mimicking the performance of the visual system, so it's feasible that a convolutional network could learn which features to extract from the images, something that would be time-consuming to program for each specific problem domain.
Testing the hypothesis

To test whether convolutional networks can learn useful features from map tiles, I've chosen a simple test problem: estimate the population for a given map tile.
The USA census provides data on population numbers at the census tract level, and we can use the populations of the tracts to approximate the populations of map tiles.
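Approximating a tile's population from overlapping tracts can be sketched with axis-aligned bounding boxes. This is a simplification for illustration (real census tracts are arbitrary polygons, which would call for a geometry library such as Shapely), and the function names are hypothetical, not from the original code:

```python
def overlap_fraction(tract, tile):
    """Fraction of the tract's box that falls inside the tile's box.

    Both arguments are (min_x, min_y, max_x, max_y) rectangles. Real
    census tracts are polygons, so this rectangle-only version is a
    deliberate simplification of the area-weighting idea.
    """
    ix = max(0.0, min(tract[2], tile[2]) - max(tract[0], tile[0]))
    iy = max(0.0, min(tract[3], tile[3]) - max(tract[1], tile[1]))
    tract_area = (tract[2] - tract[0]) * (tract[3] - tract[1])
    return (ix * iy) / tract_area if tract_area else 0.0


def tile_population(tile, tracts):
    """Sum each tract's population, weighted by its overlap with the tile.

    `tracts` is a list of (bounding_box, population) pairs.
    """
    return sum(pop * overlap_fraction(box, tile) for box, pop in tracts)
```

For example, a tract of 400 people that is half inside the tile contributes 200 to the tile's estimate.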
The steps involved:

1. Download population data at the census tract level from the Census Bureau.
2. For a given zoom level, identify the OpenStreetMap tiles which intersect with one or more census tracts.
3. Download the tiles from a local instance of OpenMapTiles from MapTiler.
4. Sum the population of the tracts inside each tile, adding the fractional populations of tracts that only partially intersect with the tile.

[Figure: visualizing the census tracts which overlap with the three example tiles]

This gives us:

- Input X: an RGB bitmap representation of the OpenStreetMap tile
- Target Y: an estimated population of the tile

To reiterate, the only information used by the network to predict the population is the RGB values of the OpenStreetMap tiles.
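Step 2 relies on the standard "slippy map" tile-numbering scheme that OpenStreetMap uses, which maps a longitude/latitude pair to integer tile indices at a given zoom level. A minimal sketch of that conversion:

```python
import math


def deg2num(lat_deg, lon_deg, zoom):
    """Convert a lat/lon pair to OpenStreetMap slippy-map tile indices (x, y)."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    # Web Mercator: y runs from 0 at the top (north) to n - 1 at the bottom
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y
```

At zoom 0 the whole world is a single tile, so every coordinate maps to (0, 0); finding the tiles that intersect a tract then amounts to running this conversion over the corners of the tract's bounding box.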
For this experiment I generated a dataset for California tiles and tracts, but the same process can be done for every US state.
Model training and performance

By using a simplified DenseNet architecture and minimizing the mean-squared error on the log scale, the network achieves the following cross-validation performance after a few epochs: a squared error of 0.45, an improvement on the 0.85 you would get by always guessing the mean population. This equates to a mean absolute error of 0.51 on the log scale for each tile. So the prediction tends to be of the right order of magnitude, but off by about a factor of 3 (we haven't done anything to optimize performance, so this isn't a bad start).
Summary

In the example of estimating population, there is enough information in OpenStreetMap tiles to significantly outperform a naive estimator of population.
For problems with a strong enough signal, OpenStreetMap tiles can be used as a data source without the need for manual feature engineering.

Credits:

- Many thanks to all the volunteers behind OpenStreetMap
- The US government, which makes the census data freely available
- OpenMapTiles, for providing a map rendering service for research purposes

Originally published at shuggiefisher.