Add up the count over all the names and years.
BabyNames %>%
summarise(total = sum(count))
## total
## 1 333417770
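Note that summarise() clobbers all the variables in the input data table other than those used for grouping: the output contains only the grouping variables and the newly computed summaries. (No variables were used for grouping here, so only total remains.)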
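A minimal sketch of this behavior, assuming dplyr is installed. The table Toy and its values are hypothetical, made up to have the same columns as BabyNames; they are not real counts. After an ungrouped summarise(), only the new summary column survives.

```r
library(dplyr)

# Hypothetical toy table with the same columns as BabyNames (not real data)
Toy <- data.frame(
  name  = c("Mary", "Mary", "Jane", "Jane"),
  sex   = c("F", "F", "F", "F"),
  year  = c(1880, 1881, 1880, 1881),
  count = c(10, 20, 3, 4),
  stringsAsFactors = FALSE
)

# No grouping: the whole table collapses to one row, and
# name, sex, year, and count are all dropped from the result
Totals <- Toy %>%
  summarise(total = sum(count))

print(Totals)
names(Totals)  # only "total" remains
```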
BabyNames %>%
group_by(year) %>%
summarise(total = sum(count))
## Source: local data frame [134 x 2]
##
## year total
## 1 1880 201484
## 2 1881 192700
## 3 1882 221537
## 4 1883 216952
## 5 1884 243468
## 6 1885 240856
## 7 1886 255320
## 8 1887 247396
## 9 1888 299481
## 10 1889 288952
## .. ... ...
With year made a grouping variable, a separate calculation is done for each year, and year appears in the output.
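The same idea on a small scale, as a sketch assuming dplyr is installed (Toy is a hypothetical table, not BabyNames): grouping by year yields one output row per distinct year, and year is kept because it is the grouping variable.

```r
library(dplyr)

# Hypothetical toy data (not real BabyNames values)
Toy <- data.frame(
  name  = c("Mary", "Jane", "Mary", "Jane", "John"),
  sex   = c("F", "F", "F", "F", "M"),
  year  = c(1880, 1880, 1881, 1881, 1881),
  count = c(10, 3, 20, 4, 5),
  stringsAsFactors = FALSE
)

# One row per year; name and sex are dropped, year is kept
ByYear <- Toy %>%
  group_by(year) %>%
  summarise(total = sum(count))

print(ByYear)
```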
BabyNames %>%
group_by(year) %>%
summarise(name_count = n_distinct(name))
## Source: local data frame [134 x 2]
##
## year name_count
## 1 1880 1889
## 2 1881 1830
## 3 1882 2012
## 4 1883 1962
## 5 1884 2158
## 6 1885 2139
## 7 1886 2225
## 8 1887 2215
## 9 1888 2454
## 10 1889 2390
## .. ... ...
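Why n_distinct(name) rather than n()? The two differ whenever a name appears in more than one row of a group, for instance once for girls and once for boys in the same year. A sketch assuming dplyr is installed; Toy and the name "Jessie" appearing under both sexes are hypothetical illustrations, not BabyNames data.

```r
library(dplyr)

# Hypothetical toy data: "Jessie" has two rows in 1880, one per sex
Toy <- data.frame(
  name  = c("Mary", "Jessie", "Jessie", "John"),
  sex   = c("F", "F", "M", "M"),
  year  = 1880,
  count = c(10, 4, 2, 5),
  stringsAsFactors = FALSE
)

# n() counts rows in each group; n_distinct(name) counts unique names
Counts <- Toy %>%
  group_by(year) %>%
  summarise(rows = n(), name_count = n_distinct(name))

print(Counts)
```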
BabyNames %>%
group_by(year, sex) %>%
summarise(name_count = n_distinct(name))
## Source: local data frame [268 x 3]
## Groups: year
##
## year sex name_count
## 1 1880 F 942
## 2 1880 M 1058
## 3 1881 F 938
## 4 1881 M 997
## 5 1882 F 1028
## 6 1882 M 1099
## 7 1883 F 1054
## 8 1883 M 1030
## 9 1884 F 1172
## 10 1884 M 1125
## .. ... ... ...
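The "Groups: year" line in the output above reflects that summarise() removes only the last grouping variable (here sex), so the result is still grouped by year. A sketch of this, assuming dplyr is installed; Toy is a hypothetical table. Recent dplyr versions also print a message about the grouped output, which is informational only.

```r
library(dplyr)

# Hypothetical toy data (not real BabyNames values)
Toy <- data.frame(
  name  = c("Mary", "Jane", "John", "Mary", "John", "Paul"),
  sex   = c("F", "F", "M", "F", "M", "M"),
  year  = c(1880, 1880, 1880, 1881, 1881, 1881),
  count = c(10, 3, 5, 20, 6, 2),
  stringsAsFactors = FALSE
)

# One row per (year, sex) combination; summarise() then peels off
# the last grouping variable, sex, leaving the result grouped by year
BySex <- Toy %>%
  group_by(year, sex) %>%
  summarise(name_count = n_distinct(name))

group_vars(BySex)  # "year"

# ungroup() removes the remaining grouping when it is no longer wanted
Flat <- ungroup(BySex)
group_vars(Flat)   # character(0)
```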
Result <-
BabyNames %>%
filter(name %in% c("Jane", "Mary")) %>%
group_by(name, year) %>% # for each name in each year
summarise(count = sum(count)) # add the girls' and boys' counts together
Put year on the x-axis and the count of each name on the y-axis.
ggplot(data=Result, aes(x = year, y = count)) +
geom_point()
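Map name to color so that each name gets its own color; mapping a variable to an aesthetic goes inside the aes() function. Draw lines with geom_line() instead of points, and label the y-axis with + ylab("Yearly Births").
Set the line width with size=2. Remember that “setting” refers to adjusting the value of an aesthetic to a constant. Thus, it’s outside the aes() function. (In ggplot2 3.4.0 and later, line width is set with linewidth=2; size still works for lines but is deprecated.)
ggplot(data=Result, aes(x = year, y = count)) +
geom_line(aes(color = name), size=2) +
ylab("Yearly Births")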
Result2 <-
BabyNames %>%
group_by(year) %>%
mutate(total = sum(count)) %>%
filter(name %in% c("Mary", "Jane")) %>%
mutate(proportion = count / total)
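Unlike summarise(), mutate() keeps every input row and column and attaches the group's total to each row, which is why the other variables are still present in Result2. A sketch contrasting the two, assuming dplyr is installed; Toy is a hypothetical table, not BabyNames.

```r
library(dplyr)

# Hypothetical toy data (not real BabyNames values)
Toy <- data.frame(
  name  = c("Mary", "John", "Mary", "John"),
  sex   = c("F", "M", "F", "M"),
  year  = c(1880, 1880, 1881, 1881),
  count = c(10, 5, 20, 6),
  stringsAsFactors = FALSE
)

# summarise(): one row per year; name and sex are dropped
Collapsed <- Toy %>%
  group_by(year) %>%
  summarise(total = sum(count))

# mutate(): all four rows are kept, each with its year's total
Expanded <- Toy %>%
  group_by(year) %>%
  mutate(total = sum(count))

nrow(Collapsed)  # 2
nrow(Expanded)   # 4
```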
Why is sex a variable in Result2? Eliminate it, keeping just the girls.
Result2 <-
BabyNames %>%
filter(sex == "F") %>%
group_by(year) %>%
mutate(total = sum(count)) %>%
filter(name %in% c("Mary", "Jane")) %>%
mutate(proportion = count / total)
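What happens if the filter() step that selects Mary and Jane is put before the mutate() step that computes total?
The total is then just for Mary and Jane, ignoring all the other babies born that year, so the two proportions in each year add up to 1.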
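A side-by-side sketch of the two orderings, assuming dplyr is installed. Toy and its counts are hypothetical; "Anna" stands in for all the other names that year.

```r
library(dplyr)

# Hypothetical toy data: girls only, with "Anna" representing other names
Toy <- data.frame(
  name  = c("Mary", "Jane", "Anna", "Mary", "Jane", "Anna"),
  sex   = "F",
  year  = c(1880, 1880, 1880, 1881, 1881, 1881),
  count = c(10, 3, 7, 20, 4, 16),
  stringsAsFactors = FALSE
)

# Order used in the text: total over ALL names in the year, then filter
MutateFirst <- Toy %>%
  group_by(year) %>%
  mutate(total = sum(count)) %>%
  filter(name %in% c("Mary", "Jane")) %>%
  mutate(proportion = count / total)

# Reversed order: total now covers only Mary and Jane
FilterFirst <- Toy %>%
  group_by(year) %>%
  filter(name %in% c("Mary", "Jane")) %>%
  mutate(total = sum(count)) %>%
  mutate(proportion = count / total)

# Per-year sum of proportions: below 1 when every name counts toward
# total, exactly 1 when the filter comes first
MutateFirst %>% summarise(sum_prop = sum(proportion))
FilterFirst %>% summarise(sum_prop = sum(proportion))
```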
ggplot(data=Result2, aes(x = year, y = proportion)) +
geom_line(aes(color = name), size=2) +
ylab("Proportion of yearly births")
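Mark the year 1962 with a vertical line using geom_vline(); its position on the x-axis is given by the xintercept argument.
ggplot(data=Result2, aes(x = year, y = proportion)) +
geom_line(aes(color = name), size=2) +
ylab("Proportion of yearly births") +
geom_vline(xintercept = 1962)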