moment: I could create subsets of the data and relationships that I wanted to visualize using SQL queries, and then convert the SQL query results into CSV files that most visualization libraries could process.
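As a rough illustration of that workflow, here is a minimal sketch using `sqlite3` and Pandas. The table name, column names, and sample rows are assumptions made up for the example, not my actual schema.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory table standing in for the full trips database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trips (
        bikeid INTEGER, usertype TEXT, birth_year INTEGER, tripduration INTEGER
    );
    INSERT INTO trips VALUES
        (30657, 'Subscriber', 1989, 780),
        (30657, 'Customer',   1975, 1500),
        (15021, 'Subscriber', 1991, 600);
""")

# Pull only the subset needed for one visualization.
query = """
    SELECT birth_year, COUNT(*) AS trips
    FROM trips
    WHERE usertype = 'Subscriber'
    GROUP BY birth_year
    ORDER BY birth_year
"""
subset = pd.read_sql_query(query, conn)
conn.close()

# Convert the result to CSV text; pass a file path to to_csv to write a file instead.
csv_text = subset.to_csv(index=False)
print(csv_text)
```

Keeping each query small means the visualization library only ever sees a few rows of aggregated data rather than millions of raw trips.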

The Data

Now that I’ve shared some of my pain with you, I should also share some of my results.
As previously mentioned, from January 2018 to February 2019 there were 19,459,370 trips registered.
After some cleaning, slicing, and wrangling, my final working data set was reduced to 17,437,855 trips.
These are trips by subscribers only, as I decided to drop casual riders and one-day customers.
Total annual membership of Citi Bike now stands at 150,929 subscribers, as per Citi Bike’s monthly report.
Let’s take a look at who they are.

Who is riding Citi Bike?

There isn’t much information available on each individual subscriber, but from the data, we were able to get age and gender based on aggregate ridership.
These aggregations didn’t give the exact number of subscribers but rather the underlying distribution of the sample.
This was a great learning opportunity to make some interactive plots using Plotly.
I found that understanding the figure hierarchy can be a little cumbersome at first.
Plotly is built on top of the D3.js library, so after all, I did get to use D3.js a little.
The top birth year categories for subscribers are from 1987 to 1991.
Let me repeat the disclaimer: Citi Bike has 150,929 subscribers today. To get the distribution of those subscribers, I used aggregate functions on the ridership data, as you’ll see in the code snippets below.
Pandas DataFrame Capturing Birth Year SQL Query
Subscribers by Year of Birth
Code for Interactive Plotly Bar Chart

Genderwise, the majority of the riders were male.
Pandas DataFrame from SQL query to identify Gender distribution
An interactive bar chart showing subscribers by gender. Male (1), Female (2)

How long are they riding?

The average trip duration is 13 minutes, meaning that subscribers are not riding long distances (remember, I dropped casual riders and one-time customers).
We can also take a look at the average number of trips per day of the week, and as expected, the numbers were slightly higher during weekdays than during the weekend.
Showing the difference between weekend rides and weekday commutes.
An interactive plot where circle size represents the average trip duration.
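The weekday averages behind plots like these can be sketched with a two-step Pandas aggregation: count trips per calendar date first, then average those daily counts per weekday. The `starttime` and `tripduration` column names and the sample rows are assumptions for the example.

```python
import pandas as pd

# Hypothetical sample of trips; tripduration is assumed to be in seconds.
trips = pd.DataFrame({
    "starttime": pd.to_datetime([
        "2018-01-01 08:05", "2018-01-01 08:40", "2018-01-01 17:30",  # a Monday
        "2018-01-06 11:15",                                          # a Saturday
        "2018-01-08 09:00",                                          # a Monday
    ]),
    "tripduration": [600, 720, 900, 1200, 780],
})

trips["date"] = trips["starttime"].dt.date
trips["weekday"] = trips["starttime"].dt.day_name()

# Step 1: number of trips on each calendar date.
daily = trips.groupby(["date", "weekday"]).size().reset_index(name="trips")

# Step 2: average of those daily counts for each day of the week.
avg_trips = daily.groupby("weekday")["trips"].mean()

# Average trip duration per weekday, converted to minutes.
avg_minutes = trips.groupby("weekday")["tripduration"].mean() / 60

print(avg_trips)
print(avg_minutes)
```

Averaging the daily counts, rather than simply counting all trips per weekday, keeps the comparison fair when the data contains a different number of Mondays than Saturdays.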
Finally, I was interested in the relationship between the length of the ride and the day of the year.
An interactive plot showing the number of trips per day of the year.
This plot gives a self-explanatory picture of how the demand for Citi Bikes fluctuates throughout the year: demand is higher from April to October, when the weather is warmer.
This got me curious about the relationship between the weather and the number of trips in a day, so I created a new Pandas DataFrame with the weather summary for each of the days in my original DataFrame using data from the National Oceanic and Atmospheric Administration.
I then ran a Multiple Regression algorithm using the Scikit Learn library.
Multiple Regression using Scikit Learn

As it turns out, 62% of the variance in the number of trips per day can be explained by the weather.
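For anyone curious what such a regression looks like in code, here is a minimal sketch with scikit-learn's `LinearRegression`. The weather features and all numbers are synthetic and invented for the example, so it will not reproduce the figure above. The "variance explained" is the R² value returned by `score`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic daily weather summary; feature names and values are assumptions.
rng = np.random.default_rng(0)
n_days = 200
weather = pd.DataFrame({
    "avg_temp_f": rng.uniform(20, 90, n_days),
    "precip_in": rng.exponential(0.1, n_days),
    "wind_mph": rng.uniform(0, 20, n_days),
})

# Synthetic target: more trips when warm, fewer when wet or windy, plus noise.
trips_per_day = (
    20000
    + 500 * weather["avg_temp_f"]
    - 30000 * weather["precip_in"]
    - 200 * weather["wind_mph"]
    + rng.normal(0, 5000, n_days)
)

# Fit a multiple linear regression: several weather features, one target.
model = LinearRegression().fit(weather, trips_per_day)

# R^2 is the share of variance in daily trips explained by the features.
r_squared = model.score(weather, trips_per_day)
print(f"R^2 = {r_squared:.2f}")
print(dict(zip(weather.columns, model.coef_)))
```

Note that this scores the model on the same data it was fitted on; holding out a test set would give a more honest estimate.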

Where are they going?

Some Citi Bike docking stations were certainly more popular than others.
To map the most popular ones, both by start and end of the bike ride, I used the Folium library.
But before I could actually make an interactive map, I ran a SQL query to get the top Citi Bike dock stations by ride volume.
I tried mapping all of them the first time but, eventually, the program crashed.
So I decided to settle for 100.
A video that captures the functionality of a Folium interactive map
Code to create an interactive map using the Folium library

The top 5 stations by volume of riders are:
1. Pershing Square North with 1,576,381 trips
2. W 21 St & 6 Ave with 1,148,192 trips
3. E 17 St & Broadway with 1,121,953 trips
4. Broadway & E 22 St with 1,097,314 trips
5. Broadway & E 14 St with 96,901 trips

To put them in context, these are dock stations near Grand Central Station, Madison Square Park, Union Square Park, and the Flatiron Building.

What about Bike 30657?

Finally, I would like to honor the bike with the most trips this last year, Bike 30657. You’re now wondering if you’ve ever ridden it, aren’t you?
With 2,776 rides and a total of 36,448 minutes of traveling, this bike probably knows New York City better than I do.
And so, as a parting gift, I’ll leave you with Bike 30657 so you can see it in action.

Sources

The Angel Who Keeps Citi Bike Working for New York
Citi Bike Blog
Creative Interactive Crime Maps Using Folium
Citi Bike Monthly Report, March 2019
Large Data Files with Pandas and SQLite
The Hitchhiker’s Guide to D3.js
gl Arcs
The Book of Shaders