We know we want just data related to reported pot holes.
According to the README doc posted by 311 and the open data catalogue, we also know the following:
- API record limit: 1000 records at a time
- Pot hole fixing requests have the service code CSROWR-12
- jurisdiction_id will always be ‘toronto.ca’
- agency_responsible: always set to 311 Toronto
- service_notice: not returned
- zipcode: not returned
What do we need?
We need to decide on a start date and an end date to construct our query.
Picking a Date Range and Working with API Limit
When deciding on a date range, it is hard to pick a period that shows us meaningful results.
Since our analysis is on pot holes some prior knowledge about the cause of pot holes can be helpful.
Periods of deep freezing followed by thawing crack the road surface, and those cracks are what cause pot holes.
As I write this article, Toronto is going through periods of freezing and thawing, and the city is receiving more pot hole reports than usual.
Based on this knowledge, it will be interesting to view the data for the last 4 months.
Given that we want 4 months of data, it is important to keep in mind the maximum record limit that is set in place for API response.
The 311 Toronto API has a limit of 1000 records in its response object.
Having a size limit or a rate limit when working with APIs is fairly common.
This is a way to ensure that 311 Toronto’s servers are not overloaded trying to fulfill a bunch of requests and can provide a good quality of service.
To comply with the 1000 record limit, I first spot checked the total number of records for various months.
I found that, on average, one month of data is below the 1000 record limit.
Given that the city has recently had more reports than usual, our data may get cut off by the limit, but we should still have enough data points for our visualization.
We are going to take our date range (4 months) and partition it into 29 day periods and make synchronous requests for each of the 29 day chunks.
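The chunking function itself is not reproduced in this text, so here is a minimal sketch of what it could look like. The name chunk_date_range and its parameters are my own assumptions, not the original code. It returns a flat list of alternating start and end dates, where each chunk ends 29 days after it starts and the next chunk begins the following day.

```python
from datetime import datetime, timedelta


def chunk_date_range(start, end, chunk_days=29):
    """Return a flat list [start_0, end_0, start_1, end_1, ...] covering start..end.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    dates = []
    chunk_start = start
    while chunk_start <= end:
        # A chunk ends chunk_days after it starts, but never past the overall end date.
        chunk_end = min(chunk_start + timedelta(days=chunk_days), end)
        dates.extend([chunk_start, chunk_end])
        # The next chunk starts the day after this one ends, so chunks do not overlap.
        chunk_start = chunk_end + timedelta(days=1)
    return dates


# Example range: November 1, 2018 to March 4, 2019.
date_chunks = chunk_date_range(datetime(2018, 11, 1), datetime(2019, 3, 4))
for d in date_chunks:
    print(d)
```

Keeping every chunk at 29 days keeps each request roughly within the one-month volume that was spot checked against the record limit.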
Now that we have a function that chunks the days, we will go ahead and pick the start and end dates of our range.
We know that pot holes are problematic during the season of freezing and thawing around late winter and spring.
Knowing this, it would be interesting to look at data from this winter because we have had some alternating cold days and warm days.
We have defined our date parameters below.
If you grab the notebook from my GitHub repository or create your own version based on this tutorial, feel free to change the dates around for more insight.
Using the function written above, we have a list of dates reflecting the chunks of date ranges.
From this list, I am going to take every item at an even index as a start date and every item at an odd index as an end date.
[datetime.datetime(2018, 11, 1, 0, 0),
 datetime.datetime(2018, 11, 30, 0, 0),
 datetime.datetime(2018, 12, 1, 0, 0),
 datetime.datetime(2018, 12, 30, 0, 0),
 datetime.datetime(2018, 12, 31, 0, 0),
 datetime.datetime(2019, 1, 29, 0, 0),
 datetime.datetime(2019, 1, 30, 0, 0),
 datetime.datetime(2019, 2, 28, 0, 0),
 datetime.datetime(2019, 3, 1, 0, 0),
 datetime.datetime(2019, 3, 4, 0, 0)]
In the output above, the first range is 2018-11-01 to 2018-11-30.
The second range is from 2018-12-01 to 2018-12-30, and so on.
The items at even positions (0, 2, 4, etc.) in this list are the start dates, and the items at odd positions are the end dates.
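Splitting the list by position is a one-liner with Python's slice step. The sketch below hard-codes the list of dates so that it runs on its own; the variable names date_chunks, start_dates, and end_dates are my own choices for illustration.

```python
from datetime import datetime

# The flat list of alternating start and end dates.
date_chunks = [
    datetime(2018, 11, 1), datetime(2018, 11, 30),
    datetime(2018, 12, 1), datetime(2018, 12, 30),
    datetime(2018, 12, 31), datetime(2019, 1, 29),
    datetime(2019, 1, 30), datetime(2019, 2, 28),
    datetime(2019, 3, 1), datetime(2019, 3, 4),
]

start_dates = date_chunks[0::2]  # items at even positions: 0, 2, 4, ...
end_dates = date_chunks[1::2]    # items at odd positions: 1, 3, 5, ...

# Pair each start date with its end date.
for start, end in zip(start_dates, end_dates):
    print(start.date(), "to", end.date())
```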
Make an API Request
Base URL: https://secure. json?
Using the parameters we already know from the 311 README doc, we can add parameters like service_code, jurisdiction_id, start_date, and end_date.
Our API request will take each of the start and end date pairs from above.
We will make 5 requests in total, one per chunk.
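Here is a rough sketch of how those requests could be made one after another. Several things in it are assumptions: BASE_URL is a placeholder (the real base URL is cut off in this text), the ISO 8601 timestamp format for start_date and end_date should be checked against the 311 README, and the function names are hypothetical. It uses only the standard library, and the fetch step is passed in as an argument so you can swap in another HTTP client.

```python
import json
from datetime import datetime
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder only: replace with the real base URL from the 311 README.
BASE_URL = "https://example.invalid/requests.json"


def build_query(start, end):
    """Build the query string for one date chunk (timestamp format is an assumption)."""
    params = {
        "service_code": "CSROWR-12",
        "jurisdiction_id": "toronto.ca",
        "start_date": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "end_date": end.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    return urlencode(params)


def fetch_json(url):
    """Send a GET request and parse the JSON body."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def fetch_all(start_dates, end_dates, fetch=fetch_json, base_url=BASE_URL):
    """Make one synchronous request per (start, end) pair and collect the responses."""
    data_clob = []
    for start, end in zip(start_dates, end_dates):
        url = f"{base_url}?{build_query(start, end)}"
        data_clob.append(fetch(url))
    return data_clob
```

Calling fetch_all(start_dates, end_dates) with five date pairs sends five requests in sequence, which keeps the load on the server low.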
Now that we have a giant list (data_clob) containing nested JSON of the returned results, we see that every item starts with the key ‘service_requests’.
We are interested in the value of each ‘service_requests’.
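Pulling those values out could look like the sketch below. It assumes each response is a dictionary whose ‘service_requests’ value is a list of records; the sample records and the function name flatten_service_requests are made up for illustration.

```python
def flatten_service_requests(data_clob):
    """Combine the 'service_requests' lists from every response into one flat list."""
    records = []
    for response in data_clob:
        # Use an empty list if a response is missing the key.
        records.extend(response.get("service_requests", []))
    return records


# Made-up sample with the assumed shape of two API responses.
sample_clob = [
    {"service_requests": [{"service_request_id": "A1"}, {"service_request_id": "A2"}]},
    {"service_requests": [{"service_request_id": "B1"}]},
]

records = flatten_service_requests(sample_clob)
print(len(records))
```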
View data in Pandas DataFrame
Pandas can read your data from a bunch of formats, such as CSV files, dictionaries, lists, and SQL queries, and put it into a data frame for you.
If you explore this dataframe, you can see that we have some important columns like long (longitude) and lat (latitude).
More Cleaning
Based on the information from the README doc, we can drop the following columns: ‘agency_responsible’, ‘service_notice’, and ‘zipcode’.
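A small sketch of both steps, loading the records into a dataframe and dropping the columns we do not need, is shown below. The sample records are invented for illustration. Passing errors="ignore" is my own choice so that the drop does not fail if a column is missing from the response.

```python
import pandas as pd

# Invented sample records; real ones come from the API responses.
records = [
    {"service_request_id": "A1", "lat": 43.65, "long": -79.38,
     "agency_responsible": "311 Toronto", "service_notice": None, "zipcode": None},
    {"service_request_id": "A2", "lat": 43.70, "long": -79.40,
     "agency_responsible": "311 Toronto", "service_notice": None, "zipcode": None},
]

# A list of dictionaries becomes one row per record.
df = pd.DataFrame(records)

# Drop the columns that are constant or never returned.
df = df.drop(columns=["agency_responsible", "service_notice", "zipcode"], errors="ignore")
print(df.columns.tolist())
```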
Calculating Investigation days and repair days needed
We can calculate the estimated number of days for investigating a report based on the time difference between ‘requested_datetime’ and ‘updated_datetime’.
The ‘expected_datetime’ seems to indicate the expected date to fix the pot hole but it commonly gets populated with a constant date value.
I am unsure as to what the reasoning behind this auto-population is for some reports.
In the gist below, I take the string date values for requested_datetime, updated_datetime, and expected_datetime and convert them to datetime objects with pandas’ to_datetime method (lines 4, 5, and 6).
Once we have our dataframe, we calculate the mean days required for investigations and repairs.
Using these mean values, we set up a threshold that decides if a service request was slow or quick.
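The sketch below walks through the conversion, the day counts, and the thresholds on a tiny invented dataset. Two things are assumptions on my part: repair days are computed as the gap between ‘updated_datetime’ and ‘expected_datetime’, and a request counts as slow when it takes longer than the mean. The new column names are hypothetical too.

```python
import pandas as pd

# Invented sample data with dates stored as strings.
df = pd.DataFrame({
    "requested_datetime": ["2019-01-01T08:00:00", "2019-01-02T08:00:00", "2019-01-03T08:00:00"],
    "updated_datetime": ["2019-01-02T08:00:00", "2019-01-05T08:00:00", "2019-01-08T08:00:00"],
    "expected_datetime": ["2019-01-06T08:00:00", "2019-01-07T08:00:00", "2019-01-11T08:00:00"],
})

# Convert the string columns to datetime objects.
for column in ["requested_datetime", "updated_datetime", "expected_datetime"]:
    df[column] = pd.to_datetime(df[column])

# Whole days between request and update (investigation),
# and between update and expected date (repair, an assumption).
df["investigation_days"] = (df["updated_datetime"] - df["requested_datetime"]).dt.days
df["repair_days"] = (df["expected_datetime"] - df["updated_datetime"]).dt.days

# The means act as the thresholds between quick and slow.
mean_investigation = df["investigation_days"].mean()
mean_repair = df["repair_days"].mean()
df["slow_investigation"] = df["investigation_days"] > mean_investigation
df["slow_repair"] = df["repair_days"] > mean_repair

print(df[["investigation_days", "repair_days", "slow_investigation", "slow_repair"]])
```

Because ‘expected_datetime’ is often filled with a constant date, the repair numbers deserve extra caution when you read them.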
Using the breakdown above, we took the corresponding longitude and latitude to map the location of the pot holes using mplleaflet.
- Slower than average response/investigations (blue dots on map)
- Quicker than average response/investigations (black dots on map)
- Slower than average repair (red squares on map)
- Quicker than average repair (green squares on map)
The file above is saved as an HTML file, which contains our final visualization.
You can view the interactive result here: http://www. html
Conclusion
So far we have learned about working with structured data like JSON objects, making API calls through GET requests, cleaning data with pandas, and visualizing the cleaned data through matplotlib.
Now that you have a clean dataframe, feel free to investigate problems or questions that you may be wondering about in your city.
If your city has an open data catalogue like 311 Toronto's, try mimicking this tutorial and maybe you will find some interesting insights!
I hope this tutorial was helpful. I am willing to answer any questions to the best of my knowledge, so feel free to comment.
GitHub Repository
mmonzoor/introductory_pot_hole_viz