Reverse Geocoding in R

The only caveat is that detailed locations like address names are not always available.

Below is a sample of how to use the revgeo package:

```r
library(revgeo)
revgeo(longitude = -77.0229529, latitude = 38.89283435, provider = 'photon', output = 'frame')
```

So where is the problem? Well, as stated on Photon's webpage:

"You can use the API for your project, but please be fair - extensive usage will be throttled. We do not guarantee for the availability and usage might be subject of change in the future."

I am not certain how many queries it takes before the Photon API slows down, but it is important to be mindful of how many requests we send to their server.
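One cheap way to send fewer requests is to geocode each distinct latitude/longitude pair only once, which is what the unique() call in the full script also does. The sketch below uses made-up toy data (the second location is hypothetical) to show how removing duplicates shrinks the request count. Note that unique() only collapses exact duplicates; coordinates that differ in the last decimal place still count as separate requests.

```r
# Hypothetical toy data: four rows, but only two distinct locations.
coords <- data.frame(
  latitude  = c(38.89283435, 38.89283435, 40.7128, 40.7128),
  longitude = c(-77.0229529, -77.0229529, -74.0060, -74.0060)
)

# Keep one row per distinct latitude/longitude pair.
unique_coords <- unique(coords[, c("latitude", "longitude")])

# Each remaining row would be one request to the Photon server.
n_requests <- nrow(unique_coords)
cat("Rows:", nrow(coords), "| requests needed:", n_requests, "\n")
```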

I decided to start with 500,000 coordinates to reverse geocode, but this didn't work well.

I ran the code and walked away for a while; when I came back, I saw that throttling had begun, so I needed to tweak the code.

In addition, R was throwing the error "cannot allocate vector of size x.x Gb", which means that my available RAM had been exhausted.

At this point I had two issues: 1) Throttling and 2) Memory Allocation.

For issue 1, I needed to incorporate sleep times in the code and work with smaller subsets of my already subsetted dataframe.
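Here is a minimal sketch of that idea written as a reusable function instead of the while loop used in the full script. geocode_batch is a hypothetical placeholder: in practice it could wrap the revgeo() call shown earlier. The batch size of 200 and the 10-second pause mirror the values the script uses; the fake geocoder in the usage lines exists only so the sketch runs without contacting any server.

```r
# Process a dataframe in fixed-size batches, pausing between batches so the
# remote server is not flooded with requests.
# `geocode_batch` is a hypothetical callback that takes one batch (a dataframe)
# and returns a dataframe of results, for example:
#   function(b) revgeo(b$longitude, b$latitude, provider = 'photon', output = 'frame')
process_in_batches <- function(df, geocode_batch, batch_size = 200, pause = 10) {
  n_batches <- ceiling(nrow(df) / batch_size)
  # Row i belongs to batch ceiling(i / batch_size): rows 1-200 -> 1, 201-400 -> 2, ...
  batch_id <- ceiling(seq_len(nrow(df)) / batch_size)
  results <- vector("list", n_batches)
  for (b in seq_len(n_batches)) {
    results[[b]] <- geocode_batch(df[batch_id == b, , drop = FALSE])
    # Sleep between batches, but not after the final one.
    if (b < n_batches) Sys.sleep(pause)
  }
  # Combine all batch results once at the end instead of growing a dataframe.
  do.call(rbind, results)
}

# Usage with a stand-in geocoder that only reports each batch's size.
fake_geocoder <- function(b) data.frame(rows = nrow(b))
toy <- data.frame(latitude = seq_len(450), longitude = seq_len(450))
out <- process_in_batches(toy, fake_geocoder, batch_size = 200, pause = 0)
print(out$rows)
```

Collecting batch results in a list and calling rbind() once avoids copying an ever-growing results dataframe on every iteration. This is an alternative design, not what the original script does.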

For issue 2, I found a Stack Overflow thread with useful advice: Memory Allocation "Error: cannot allocate vector of size 75.1 Mb" (stackoverflow.com).

A solution that helped me was running memory.limit(size = _ _ _ _ _ _). Note that memory.limit() only ever applied to R on Windows, and since R 4.2.0 it is a stub that no longer changes the limit.

In addition, I used rm() to remove any dataframes I no longer needed and gc() to run garbage collection.
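A small illustration of the rm()/gc() pattern. The object big is a hypothetical stand-in for an intermediate dataframe you no longer need; object.size() reports how much memory it occupies before it is removed. R also collects garbage automatically, so calling gc() explicitly mainly forces the collection to happen right away.

```r
# Hypothetical large object standing in for an intermediate dataframe.
big <- numeric(1e7)  # ten million doubles, roughly 80 MB
print(format(object.size(big), units = "Mb"))

# Remove the binding so R can reclaim the memory...
rm(big)
# ...and run garbage collection now rather than waiting for R to do it.
gc_stats <- gc()
print(exists("big"))
```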

Below, I load the dataframe of ~1 million coordinates, called main, and subset it to 100,000 rows. As you will see later, I subset the data even further inside a while loop to avoid memory allocation issues.

```r
library(revgeo)

# The dataframe called 'main' is where the 1 million coordinate points reside.
main <- readRDS("main.rds")
main_sub <- main[1:100000, ]  # Working with a smaller initial subset
rm(main)
gc()
```

Below is the full code.

The script includes some steps unrelated to this post's subject, but I wanted to publish it in full so that you can see the whole picture and hopefully take away some helpful tips on reverse geocoding.

```r
library(revgeo)
library(dplyr)  # for %>%, select(), mutate(), filter(), left_join(), anti_join()

# Step 1: Create a blank dataframe to store results.
data_all <- data.frame()
start <- Sys.time()

# Step 2: Create a while loop to have the function running until the
# dataframe with 100,000 rows is empty.
while (nrow(main_sub) > 0) {

  # Step 3: Subset the data even further so that you are sending only
  # a small portion of requests to the Photon server.
  # min() avoids NA rows when fewer than 200 rows remain.
  main_sub_t <- main_sub[seq_len(min(200, nrow(main_sub))), ]

  # Step 4: Extract the lat/longs from the subsetted data from
  # the previous step (Step 3).
  latlong <- main_sub_t %>%
    select(latitude, longitude) %>%
    unique() %>%
    mutate(index = row_number())

  # Step 5: Incorporate the revgeo package here. I left_joined the
  # output with the latlong dataframe from the previous step to add
  # the latitude/longitude information to the reverse geocoded data.
  cities <- revgeo(latlong$longitude, latlong$latitude,
                   provider = 'photon', output = 'frame') %>%
    mutate(index = row_number(),
           country = as.character(country)) %>%
    filter(country == 'United States of America') %>%
    mutate(location = paste(city, state, sep = ", ")) %>%
    select(index, location) %>%
    left_join(latlong, by = "index") %>%
    select(-index)

  # Removing the latlong dataframe because I no longer need it. This
  # helps with reducing memory in my global environment.
  rm(latlong)

  # Step 6: Add the information from the cities dataframe to the
  # main_sub_t dataframe (from Step 3).
  data_new <- main_sub_t %>%
    left_join(cities, by = c("latitude", "longitude")) %>%
    select(X, text, location, latitude, longitude)

  # Step 7: Append data_new to data_all, where all subsetted
  # reverse geocoded data will be combined.
  data_all <- rbind(data_all, data_new) %>% na.omit()

  # Step 8: Remove the rows that were used in this iteration from the
  # main_sub frame so the next 200 rows can be read into the while loop.
  main_sub <- anti_join(main_sub, main_sub_t, by = c("X"))
  print(nrow(main_sub))

  # Remove dataframes that are not needed before the while loop closes
  # to free up space.
  rm(main_sub_t, data_new, cities)

  print('Sleeping for 10 seconds')
  Sys.sleep(10)
}
end <- Sys.time()
```

After implementing this code, it took about 4 hours to reverse geocode 100,000 coordinates.

In my opinion, that’s not a viable option if I have 1 million coordinates to convert.
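A back-of-the-envelope check of those numbers, assuming every loop iteration processes a full batch of 200 rows and that runtime scales linearly with the number of rows (both are simplifications):

```r
rows_done    <- 100000  # rows geocoded in the run described above
batch_size   <- 200     # rows per while-loop iteration
pause_sec    <- 10      # Sys.sleep() per iteration
observed_hrs <- 4       # reported wall-clock time

batches   <- rows_done / batch_size        # number of loop iterations
sleep_hrs <- batches * pause_sec / 3600    # hours spent sleeping alone

# Linear projection to 1 million rows at the observed rate.
projected_hrs <- observed_hrs * (1e6 / rows_done)

cat("Iterations:", batches, "\n")
cat("Hours spent sleeping:", round(sleep_hrs, 2), "\n")
cat("Projected hours for 1 million rows:", projected_hrs, "\n")
```

Under these assumptions, roughly a third of the 4 hours is deliberate sleeping, and 1 million rows would take on the order of 40 hours.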

I may have to find another method to achieve my goal, but I figured this would be helpful to those of you who have smaller datasets.

Thanks for reading and happy coding!
