Bike Sharing Basics

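library(dplyr)   # %>%, group_by(), summarise(), arrange()
library(tidyr)   # spread(), used in question 5

Stations <- mosaic::read.file("http://tiny.cc/dcf/DC-Stations.csv")
data_site <- "http://tiny.cc/dcf/2014-Q4-Trips-History-Data-Small.rds"
Trips <- readRDS(gzcon(url(data_site)))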

1. Which station has the most outgoing rentals?

Trips %>%
  group_by(sstation) %>%
  summarise(rental_volume = n()) %>%
  arrange(desc(rental_volume)) %>%
  head(1)

## Source: local data frame [1 x 2]
## 
##                          sstation rental_volume
## 1 Columbus Circle / Union Station           241
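
The group, count, and sort steps above can also be written with dplyr's count() shortcut. The following is only a sketch on made-up data: toy_trips and its station labels are hypothetical stand-ins for Trips, not values from the real file.

```r
library(dplyr)

# Hypothetical toy data standing in for Trips; only sstation is needed here
toy_trips <- data.frame(sstation = c("A", "B", "A", "C", "A", "B"))

top_station <- toy_trips %>%
  # count() groups by sstation, tallies the rows, and sorts in descending order
  count(sstation, sort = TRUE, name = "rental_volume") %>%
  head(1)

top_station
```

On the toy data, station "A" appears three times, so it is the row that head(1) keeps.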

2. Which station has the most outgoing rentals on Saturday?

Trips %>%
  mutate(day_of_week = lubridate::wday(sdate)) %>%
  filter(day_of_week == 7) %>%   # 7 = Saturday when weeks start on Sunday (the lubridate default)
  group_by(sstation) %>%
  summarise(rental_volume = n()) %>%
  arrange(desc(rental_volume)) %>%
  head(1)

## Source: local data frame [1 x 2]
## 
##           sstation rental_volume
## 1 Lincoln Memorial            30
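
The Saturday filter relies on wday() numbering the week from Sunday = 1 to Saturday = 7. A quick check, sketched here with two dates picked for illustration (4 and 5 October 2014, a Saturday and a Sunday), shows how the numbering shifts if the week start is changed. The week_start argument is passed explicitly so the result does not depend on local options.

```r
library(lubridate)

saturday <- as.Date("2014-10-04")
sunday   <- as.Date("2014-10-05")

# Week starting on Sunday: the convention the filter day_of_week == 7 assumes
sat_sunday_start <- wday(saturday, week_start = 7)
sun_sunday_start <- wday(sunday, week_start = 7)

# Week starting on Monday: Saturday moves to position 6
sat_monday_start <- wday(saturday, week_start = 1)

c(sat_sunday_start, sun_sunday_start, sat_monday_start)
```

If the lubridate.week.start option is set to something other than 7, the filter in the answer above would need a different day number, or an explicit week_start argument.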

3. Which origin/destination pair of stations has the most traffic?

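Trips %>%
  group_by(sstation, estation) %>%
  summarise(rental_volume = n()) %>%
  ungroup() %>%   # drop the leftover sstation grouping so the sort is global
  arrange(desc(rental_volume)) %>%
  head(1)

Without ungroup(), the result is still grouped by sstation. Older dplyr versions, such as the one that printed the output below, sort within groups, so head(1) returns only the busiest destination for the alphabetically first station:

## Source: local data frame [1 x 3]
## Groups: sstation
## 
##         sstation                      estation rental_volume
## 1 10th & E St NW 10th St & Constitution Ave NW             3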

4. Is mean daily outbound rental volume different on weekdays and weekends, taking all stations together in aggregate?

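Trips %>%
  mutate(day_of_week = lubridate::wday(sdate),
         weekend = day_of_week %in% c(1,7)) %>%   # 1 = Sunday, 7 = Saturday
  group_by(weekend, day_of_week) %>%
  summarise(rental_volume = n()) %>%   # total trips for each day of the week
  summarise(daily_mean = mean(rental_volume, na.rm=TRUE))   # average over the days in each weekend/weekday group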

## Source: local data frame [2 x 2]
## 
##   weekend daily_mean
## 1   FALSE     1477.2
## 2    TRUE     1307.0
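
The answer above counts trips per day of the week, so each rental_volume is a total over every such day in the data. If "daily" is read as "per calendar date" instead, one possible variant groups by the date itself. This is a sketch under that assumption, run on made-up start times (toy_trips is hypothetical, not part of the original data):

```r
library(dplyr)

# Hypothetical start times standing in for Trips$sdate
toy_trips <- data.frame(
  sdate = as.POSIXct(c(
    "2014-10-03 08:00", "2014-10-03 09:00", "2014-10-03 17:30",  # Friday: 3 trips
    "2014-10-06 08:15",                                          # Monday: 1 trip
    "2014-10-04 11:00", "2014-10-04 12:00",                      # Saturday: 2 trips
    "2014-10-05 10:00", "2014-10-05 13:00",
    "2014-10-05 15:00", "2014-10-05 16:00"                       # Sunday: 4 trips
  ), tz = "UTC")
)

daily_means <- toy_trips %>%
  mutate(day = as.Date(sdate),
         weekend = lubridate::wday(sdate, week_start = 7) %in% c(1, 7)) %>%
  group_by(weekend, day) %>%
  summarise(rental_volume = n()) %>%           # trips on each calendar date
  summarise(daily_mean = mean(rental_volume))  # mean over dates, by weekend flag

daily_means
```

Dates with no trips at all never appear as rows, so they are not counted as zeros in either version.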

5. (Optional) Same as (4), but give a separate answer for each station.

Trips %>% 
  mutate(day_of_week = lubridate::wday(sdate),
         weekend = day_of_week %in% c(1,7)) %>%
  group_by(sstation, weekend, day_of_week) %>%
  summarise(rental_volume = n()) %>%
  summarise(daily_volume = mean(rental_volume, na.rm=TRUE)) %>%
  spread(key = weekend, value = daily_volume) %>%
  head() # just to avoid printing all the results here.

## Source: local data frame [6 x 3]
## 
##                        sstation    FALSE TRUE
## 1                10th & E St NW 7.400000 12.0
## 2         10th & Florida Ave NW 3.800000  7.5
## 3           10th & Monroe St NE 1.666667  2.0
## 4                10th & U St NW 7.000000 12.5
## 5 10th St & Constitution Ave NW 6.400000  4.5
## 6                11th & F St NW 6.200000  4.0

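A subtlety: each summarise() call drops the last grouping variable and leaves the others in place. That is why two summarise() calls in a row first count within (weekend, day_of_week) and then average within weekend. You can check which grouping variables remain at any point with the groups() function.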
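
The peeling-off of grouping variables can be watched step by step on a small table. This is a minimal sketch: the toy data frame and its values are made up for illustration, and recent dplyr versions may print an informational message about the remaining grouping after each summarise().

```r
library(dplyr)

# Hypothetical toy data with the same grouping columns used in question 5
toy <- data.frame(
  sstation    = c("A", "A", "A", "B", "B"),
  weekend     = c(FALSE, FALSE, TRUE, FALSE, TRUE),
  day_of_week = c(2, 3, 7, 2, 1)
)

by_three  <- toy %>% group_by(sstation, weekend, day_of_week)
after_one <- by_three %>% summarise(rental_volume = n())
after_two <- after_one %>% summarise(daily_volume = mean(rental_volume))

groups(by_three)   # sstation, weekend, day_of_week
groups(after_one)  # day_of_week has been dropped
groups(after_two)  # only sstation is left
```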

Data Computing

USCOTS 2015
