# Avoiding Parking Tickets in San Francisco Using Data Analytics

This is a short(er) write-up that summarizes and expands upon findings from a recent project. The hypothesis is that streets with higher traffic volume are less susceptible to residential overtime tickets, a product of traffic enforcement officers fearing the very traffic they patrol.

The data collection included about 2 1/2 years of parking ticket data, address locations, planning neighborhood zones, and traffic volumes. I made heat maps, volume maps, ticket maps, and even videos that show the street sweepers' routes over time, or tickets given out over a day.

With streets grouped into populations by volume, the lowest-volume streets (population 1) would be the ones I would avoid, and the highest-volume streets (population 10) would be the ones I tried to park on. The resulting graph showed a fair correlation, suggesting that this would work. On average, I could reduce the number of tickets by 35% by choosing the highest-volume streets vs. …

So I created an Ordinary Least Squares regression model that included all volume features, and I found that I could correctly (to a certain degree) identify which streets were likely to have more tickets. I also had a new feature to play with in parking supply, so I converted this into parking density. We could also reduce the number of tickets by over 50% if we chose the best population compared to the worst. The values for best and worst are ratios in comparison to the average.

Note: You could easily make a case that this experiment is also biased, considering that the metric itself is a product of the parking supply feature (and in turn density). However, that just boils down to questioning the data, and in my opinion the added feature was normally distributed enough and mostly valid.

The worst fitted population resulted in twice as many tickets, and its distinguishing features included lower parking supply and lower parking density. The best population had higher volumes and higher densities.

This time around I also decided to fit the log of the tickets, as looking at the probability plot showed it would normalize my predicted variable. Upon searching for the best parameters using cross-validation, I ended up with a tree in which parking per mile was again the most important feature. Still not stellar, but double all the others.

In conclusion, if we take a couple million records of ticket data and pair it with some street volumes and features, we can identify features of streets that are less likely to get you a residential overtime ticket.
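To make the population comparison concrete, here is a minimal sketch of how streets could be ranked by volume, split into ten populations, and compared against the overall average. The function name `ticket_ratio_by_population`, the `(volume, tickets)` tuple layout, and the sample numbers are all assumptions made up for illustration; they are not the project's actual code or data.

```python
from statistics import mean


def ticket_ratio_by_population(streets, n_populations=10):
    """Rank streets by volume, split them into equal-sized populations,
    and return each population's mean ticket count as a ratio of the
    overall mean (1.0 means "average").

    streets: list of (volume, tickets) tuples (hypothetical layout).
    Population 1 holds the lowest-volume streets, population
    n_populations the highest-volume ones.
    """
    ranked = sorted(streets, key=lambda s: s[0])
    overall = mean(tickets for _, tickets in ranked)
    groups = {}
    for i, (_, tickets) in enumerate(ranked):
        population = i * n_populations // len(ranked) + 1
        groups.setdefault(population, []).append(tickets)
    return {pop: mean(vals) / overall for pop, vals in sorted(groups.items())}


# Made-up data: 20 streets where tickets fall as volume rises.
streets = [(100 * (i + 1), 40 - i) for i in range(20)]
ratios = ticket_ratio_by_population(streets)

# Relative reduction from parking in the best population instead of the worst.
best, worst = min(ratios.values()), max(ratios.values())
reduction = 1 - best / worst
print(f"best={best:.2f}x average, worst={worst:.2f}x average, "
      f"reduction={reduction:.0%}")
```

Expressing each population as a ratio to the average is what lets "best" and "worst" be compared directly, as in the write-up.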
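The idea of fitting the log of the ticket counts can also be sketched in a few lines. This is a simplified, single-feature least-squares fit, not the multi-feature model described in the write-up. The use of `log1p` (so that zero-ticket streets are handled), the function names, and the sample numbers are assumptions for illustration only.

```python
import math


def fit_log_ols(x, tickets):
    """Least-squares fit of log1p(tickets) = intercept + slope * x.

    x: one numeric feature per street (e.g. parking per mile).
    tickets: ticket count per street.
    Returns (intercept, slope) on the log scale.
    """
    y = [math.log1p(t) for t in tickets]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict_tickets(intercept, slope, x_new):
    """Map a log-scale prediction back to a ticket count."""
    return math.expm1(intercept + slope * x_new)


# Made-up data: tickets drop as parking density rises.
density = [50, 100, 150, 200, 250]
tickets = [45, 30, 19, 12, 8]

intercept, slope = fit_log_ols(density, tickets)
print(f"slope on log scale: {slope:.4f}")
print(f"predicted tickets at density 175: "
      f"{predict_tickets(intercept, slope, 175):.1f}")
```

A negative slope here would correspond to the pattern reported above, where denser parking goes with fewer tickets. In practice, a library such as statsmodels or scikit-learn would handle multiple features and cross-validation.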