Mapping messy addresses part 1: getting latitude and longitude

pip install -U googlemaps

Associate your API key with the Geocoding API client and invoke the Geocoding API for a sample address from the cleaned-up dataset: “lake shore blvd. and superior st.” Note that the address passed to the Geocoding API includes both the location from the dataset and the city (which is “Toronto” for all locations in the dataset).
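A sketch of that call (the helper names and the placeholder API key are mine, not from the article):

```python
def make_address(location, city="Toronto"):
    """Combine a cleaned-up location value with the city, since the
    Geocoding API needs both to resolve the address."""
    return f"{location}, {city}"

def geocode_location(client, location, city="Toronto"):
    """Invoke the Geocoding API for one location value."""
    return client.geocode(make_address(location, city))

if __name__ == "__main__":
    import googlemaps  # Python Client for Google Maps Services
    client = googlemaps.Client(key="YOUR_API_KEY")  # substitute your own key
    result = geocode_location(client, "lake shore blvd and superior st")
    print(result[0]["geometry"]["location"])
```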


Check the latitude and longitude returned to confirm that they match the input address.

[Screenshot: latitude and longitude returned by the Geocoding API match the input location]

Now that we have validated the round trip from location value to latitude/longitude and back to address, there are a couple of snags to be overcome before we can convert the whole batch of location values.
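A sketch of extracting the coordinates and checking the round trip (the helper name is mine; the response structure is the Geocoding API's standard JSON):

```python
def extract_lat_lng(geocode_result):
    """Pull latitude and longitude out of the list of results
    returned by the Geocoding API."""
    location = geocode_result[0]["geometry"]["location"]
    return location["lat"], location["lng"]

if __name__ == "__main__":
    import googlemaps
    gmaps = googlemaps.Client(key="YOUR_API_KEY")  # substitute your own key
    result = gmaps.geocode("lake shore blvd and superior st, Toronto")
    lat, lng = extract_lat_lng(result)
    # Reverse-geocode to confirm the round trip back to the input address.
    print(gmaps.reverse_geocode((lat, lng))[0]["formatted_address"])
```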

Snag #1: the Geocoding API chokes on innocent-looking location values

The Geocoding API choked on some location values, but not the ones I expected.

I naively tried testing the API by sending a junk address (“asdfasdfjjjj”) and got non-empty JSON back. However, when I tried to convert a batch of locations, it failed on a location value that looked fine: “roncesvalles to neville park”. To get a batch of locations converted reliably, I had to wrap the Geocoding API call in a function that checks whether the returned list is empty and, if so, returns placeholder values.

Snag #2: the default daily API limit is too small for the batch of locations I needed to convert

With the get_geocode_result function defined to call the Geocoding API reliably, I was ready to do a batch run to convert the location values.
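The article doesn't show the wrapper's code, so here is a hedged sketch of what get_geocode_result might look like; the (0.0, 0.0) placeholder values and the client parameter are my assumptions:

```python
def get_geocode_result(client, location, city="Toronto"):
    """Geocode one location value, returning placeholder coordinates
    instead of raising an IndexError when the API returns an empty list."""
    result = client.geocode(f"{location}, {city}")
    if not result:  # e.g. "roncesvalles to neville park" came back empty
        return 0.0, 0.0  # assumed placeholder latitude / longitude
    loc = result[0]["geometry"]["location"]
    return loc["lat"], loc["lng"]
```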

To minimize calls to the API, I defined a new dataframe, df_unique, containing only the unique location values. However, when I called the get_geocode_result function to add latitude and longitude values to the df_unique dataframe, I got an error message. Checking the quota page for my project in the Google Cloud console, I could see that my daily limit for invocations of the Geocoding API was only 1,400.
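The df_unique step might look like this sketch (the "location" column name is an assumption, and the toy data stands in for the real ~10k-row dataset):

```python
import pandas as pd

# Toy stand-in for the cleaned-up dataset; repeated location values are common.
df = pd.DataFrame({
    "location": ["lake shore blvd and superior st",
                 "lake shore blvd and superior st",
                 "roncesvalles to neville park"],
})

# Keep one row per distinct location, so each address is geocoded only once.
df_unique = df[["location"]].drop_duplicates().reset_index(drop=True)
print(len(df_unique))  # 2 distinct locations instead of 3 rows
```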

That’s why I was getting the OVER_QUERY_LIMIT error when I tried to invoke the API for the df_unique dataframe with over 10k values.

To increase my daily limit for Geocoding API calls for this project, I had to open a ticket with Google Cloud support asking for my daily limit for the Geocoding API to be raised.

Doing a batch run to convert locations to longitude and latitude

With my daily Geocoding API limit raised, I was able to invoke the API on the df_unique dataframe without errors.

After 1.5 hours (about 110 API calls per minute) I had a dataframe including the latitude and longitude values for all the distinct locations. Next, I created distinct longitude and latitude columns in df_unique and then joined the original dataframe with df_unique. Finally, I had a dataframe containing all the original data plus latitude and longitude values corresponding to the location values.

Summary

Here’s a summary of the steps required to get latitude and longitude values corresponding to all the messy locations in the original dataset:

Clean up the original dataset to remove redundant locations and reduce the number of unique locations.

Set up Python access to the Google Geocoding API by creating a project in Google Cloud, getting an API key, and setting up the Python Client for Google Maps Services.

Call the Geocoding API with the address (location and city) and parse the returned JSON to get the latitude and longitude.

Check for the API returning an empty list.

Open a ticket with Google Cloud support to get an increased daily API limit if the number of distinct locations that you want to convert is bigger than the default daily limit.
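The batch-run-and-join part of these steps can be sketched as follows; all column names and the toy coordinate values are my assumptions (a real run would fill df_unique's coordinates via the Geocoding API rather than hard-coding them):

```python
import pandas as pd

# Original data: every row keeps its (possibly repeated) location value.
df = pd.DataFrame({"location": ["a st and b ave", "a st and b ave", "c st and d ave"],
                   "min_delay": [5, 7, 3]})

# df_unique after the batch run: one coordinate pair per distinct location.
df_unique = pd.DataFrame({"location": ["a st and b ave", "c st and d ave"],
                          "latitude": [43.65, 43.64],
                          "longitude": [-79.38, -79.40]})

# A left join carries the coordinates back onto every original row.
df_full = df.merge(df_unique, on="location", how="left")
print(df_full.shape)  # (3, 4): original columns plus latitude and longitude
```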

In the next article in this series I’ll describe how I used these latitude and longitude values to generate maps to visualize the delay patterns from the original dataset.

If you want to try out the code described in this article yourself:

The primary notebook for converting locations to latitude and longitude is here.

You will need to get your own API key to run it.

An example input dataframe that you can use for this notebook is here.

Note that the location values in this dataframe have already been cleaned up (lowercased, street names in consistent order) as described in the “The solution part 1: clean up the location values to reduce redundancy” section above.


