It is summer here in Chicago, which means tourists abound and Divvy bikes are everywhere. A while ago, and a whole site ago, I posted a little how-to on making calendar heatmaps using the publicly available Divvy data. While that site is gone, there are still some links to it out on the internet, one being the awesome Revolution Analytics blog, so instead of leaving people with a 404 I decided to revisit it.
The original post had not been out for long before I was made aware of the awesome bikedata package, available via the rOpenSci GitHub. After some initial loading we can have all the Divvy data right at our fingertips.
# devtools::install_github('ropensci/bikedata')
library(bikedata)
store_bikedata(city = 'chicago', bikedb = 'bikedb') # this can take a little bit
## reading file 1/19: /var/folders/6t/k2153h8n1z7b6_wxg1plct8r0000gn/T//RtmpeBWVfI/Divvy_Trips_2013.csv
## reading file 2/19: /var/folders/6t/k2153h8n1z7b6_wxg1plct8r0000gn/T//RtmpeBWVfI/Divvy_Trips_2014_Q1Q2.csv
## reading file 3/19: /var/folders/6t/k2153h8n1z7b6_wxg1plct8r0000gn/T//RtmpeBWVfI/Divvy_Trips_2014-Q3-07.csv
## reading file 4/19: /var/folders/6t/k2153h8n1z7b6_wxg1plct8r0000gn/T//RtmpeBWVfI/Divvy_Trips_2014-Q3-0809.csv
## reading file 5/19: /var/folders/6t/k2153h8n1z7b6_wxg1plct8r0000gn/T//RtmpeBWVfI/Divvy_Trips_2014-Q4.csv
## reading file 6/19: /var/folders/6t/k2153h8n1z7b6_wxg1plct8r0000gn/T//RtmpeBWVfI/Divvy_Trips_2015_Q4.csv
## reading file 7/19: /var/folders/6t/k2153h8n1z7b6_wxg1plct8r0000gn/T//RtmpeBWVfI/Divvy_Trips_2015_09.csv
## reading file 8/19: /var/folders/6t/k2153h8n1z7b6_wxg1plct8r0000gn/T//RtmpeBWVfI/Divvy_Trips_2015_08.csv
## reading file 9/19: /var/folders/6t/k2153h8n1z7b6_wxg1plct8r0000gn/T//RtmpeBWVfI/Divvy_Trips_2015_07.csv
## reading file 10/19: /var/folders/6t/k2153h8n1z7b6_wxg1plct8r0000gn/T//RtmpeBWVfI/Divvy_Trips_2015-Q1.csv
## reading file 11/19: /var/folders/6t/k2153h8n1z7b6_wxg1plct8r0000gn/T//RtmpeBWVfI/Divvy_Trips_2015-Q2.csv
## reading file 12/19: /var/folders/6t/k2153h8n1z7b6_wxg1plct8r0000gn/T//RtmpeBWVfI/Divvy_Trips_2016_04.csv
## reading file 13/19: /var/folders/6t/k2153h8n1z7b6_wxg1plct8r0000gn/T//RtmpeBWVfI/Divvy_Trips_2016_05.csv
## reading file 14/19: /var/folders/6t/k2153h8n1z7b6_wxg1plct8r0000gn/T//RtmpeBWVfI/Divvy_Trips_2016_06.csv
## reading file 15/19: /var/folders/6t/k2153h8n1z7b6_wxg1plct8r0000gn/T//RtmpeBWVfI/Divvy_Trips_2016_Q1.csv
## reading file 16/19: /var/folders/6t/k2153h8n1z7b6_wxg1plct8r0000gn/T//RtmpeBWVfI/Divvy_Trips_2016_Q3.csv
## reading file 17/19: /var/folders/6t/k2153h8n1z7b6_wxg1plct8r0000gn/T//RtmpeBWVfI/Divvy_Trips_2016_Q4.csv
## reading file 18/19: /var/folders/6t/k2153h8n1z7b6_wxg1plct8r0000gn/T//RtmpeBWVfI/Divvy_Trips_2017_Q1.csv
## reading file 19/19: /var/folders/6t/k2153h8n1z7b6_wxg1plct8r0000gn/T//RtmpeBWVfI/Divvy_Trips_2017_Q2.csv
##  11544749
Once we have all the rides loaded we can start making the variables we are interested in. To get the daily ride counts we use bike_daily_trips from bikedata.
library(tidyverse)
library(hrbrthemes)
library(formattable)

# daily trip counts from the local database
divvy.rides <- bike_daily_trips(bikedb = 'bikedb')

# add abbreviated weekday, year, and week of the year
divvy.rides <- divvy.rides %>%
  mutate(weekday = factor(weekdays(date, TRUE),
                          levels = rev(c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")))) %>%
  mutate(year = format(date, '%Y')) %>%
  mutate(week = as.numeric(format(date, "%W")))

# drop 2013, which is not a full year
daily.rides <- subset(divvy.rides, year != '2013')
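To make the derived variables concrete, here is a small base-R sketch on a handful of hand-picked dates (not Divvy data). It shows what `%W` returns and, as a suggestion of mine rather than something from the original code, builds the weekday factor from `%u` so the labels do not depend on the system locale the way `weekdays()` does.

```r
# A few hand-picked 2017 dates (illustrative only, not Divvy data)
dates <- as.Date(c("2017-01-01", "2017-01-02", "2017-07-04"))

# %W: week of the year with Monday as the first day of the week;
# days before the year's first Monday fall in week 0
week <- as.numeric(format(dates, "%W"))

# %u: weekday number, 1 = Monday ... 7 = Sunday (locale independent)
day_labels <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
weekday <- factor(day_labels[as.integer(format(dates, "%u"))],
                  levels = rev(day_labels))

print(data.frame(date = dates, week = week, weekday = weekday))
```

Note that January 1, 2017 lands in week 0 because it falls before the first Monday of the year, so a year can span up to 54 distinct week values on the x-axis.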
After computing the variables we need for the graph, we use the handy dplyr::summarise to get our aggregated data for each day. Then we can plot using an updated version of the old example.
# heatmap: one tile per day, one facet per year
ggplot(daily.rides, aes(x = week, y = weekday, fill = numtrips)) +
  viridis::scale_fill_viridis(name = "Divvy Rides", option = 'C', direction = 1, na.value = "grey93") +
  geom_tile(color = 'white', size = 0.1) +
  facet_wrap('year', ncol = 1) +
  scale_x_continuous(
    expand = c(0, 0),
    # twelve evenly spaced breaks approximate the start of each month
    breaks = seq(1, 52, length.out = 12),
    labels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")) +
  theme_ipsum_rc(plot_title_family = 'Slabo 27px')
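The evenly spaced `seq(1, 52, ...)` breaks in the heatmap code place the month labels only approximately. If closer alignment matters, one option (my suggestion, not part of the original post) is to derive the breaks from the week number in which each month starts. A minimal sketch for 2016:

```r
# First day of each month in 2016 (any single year works the same way)
month_starts <- seq(as.Date("2016-01-01"), by = "month", length.out = 12)

# Week number (%W, Monday-based) in which each month begins
month_breaks <- as.numeric(format(month_starts, "%W"))

# Evenly spaced breaks, as used in the plot, for comparison
even_breaks <- seq(1, 52, length.out = 12)

# month.abb is R's built-in vector of English month abbreviations
print(data.frame(month = month.abb,
                 week_of_first_day = month_breaks,
                 even_break = round(even_breaks, 1)))
```

These values could be passed as `breaks = month_breaks` in `scale_x_continuous()`. Week numbers shift slightly from year to year, so in a plot faceted by year the labels would still be only approximately aligned.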
How has Divvy Grown?
As I was working through the data one question kept popping up: how has Divvy grown over the years? Though we do not have the 2013 data included in this, we do have all the data from 2014-2017, so let’s compare those. Ideally we would want to see steady growth year over year, with more and more users coming to the service every year.
# weekly totals per year, using daily.rides so that 2013 is excluded as described above
weekly.rides <- daily.rides %>%
  group_by(year, week) %>%
  summarise(n = sum(numtrips))

ggplot(weekly.rides, aes(x = week, y = n, color = year)) +
  geom_line(alpha = .25) +
  geom_smooth(se = FALSE, method = 'loess', alpha = .35) +
  theme_ipsum_rc(plot_title_family = 'Slabo 27px') +
  labs(
    title = 'Growth of Divvy',
    subtitle = 'Count of Divvy rides by week from 2014-2017',
    caption = 'Publicly available data from Divvy\nhttps://www.divvybikes.com/system-data') +
  scale_y_continuous(labels = scales::comma)
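Because the 2017 data only runs through the second quarter, comparing raw yearly totals would understate it. One way to make the years comparable is to sum only the weeks that every year has in common and then compute the percent change from the previous year. The sketch below uses made-up numbers in the same `year`, `week`, `n` shape as `weekly.rides`; the figures are purely illustrative and are not Divvy counts.

```r
# Toy data in the same shape as weekly.rides (all values are invented)
toy <- data.frame(
  year = rep(c("2015", "2016", "2017"), times = c(4, 4, 2)),
  week = c(1:4, 1:4, 1:2),
  n    = c(100, 120, 150, 200,
           110, 140, 160, 210,
           105, 125)
)

# Last week that is present in every year (limited by the partial year)
last_common_week <- min(sapply(split(toy$week, toy$year), max))

# Year-to-date totals over the common weeks only
common <- toy[toy$week <= last_common_week, ]
ytd <- sapply(split(common$n, common$year), sum)

# Percent change versus the previous year
pct_change <- round(100 * diff(ytd) / head(ytd, -1), 1)

print(ytd)
print(pct_change)
```

With the real data, `toy` would be replaced by `weekly.rides`; the same-weeks restriction is what keeps a partial year from looking like a collapse in ridership.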
It seems that there was substantial growth after 2014, the first full year, and that the user base has since stayed fairly stagnant. The actual ride counts are interesting: although the 2014 counts are lower than those of the other years, there are much higher peaks around the summer months. Through the second quarter, the 2017 totals look more like 2015 than 2016. How does Divvy reverse that stagnation? It seems that other large bike-share services, like Hubway in Boston or Capital Bikeshare in Washington, D.C., are affected by the same slump. According to data from the National Association of City Transportation Officials (cool organization), the one system that accounts for most of the growth in the bike-share community is New York’s Citi Bike program.
Looking at this data raises some questions. Why is Citi Bike still growing? Is it purely driven by tourism? It seems likely. New York City has seen tourism growth over the last couple of years, with reported visitor numbers surpassing 60 million in 2016. Chicago fits the same pattern: along with its highest year for Divvy rides, the city rounded out 2016 by setting its own tourism record. I guess instead of asking whether rides will keep growing, we should be asking what is being done to boost tourism.