A set of 100,000 ratings of movies by individuals was collected in the late 1990s by the GroupLens research team at the University of Minnesota. The GroupLens team provides the data directly at http://grouplens.org/datasets/movielens/100k/. These data were reformatted for the Data Computing book. Download them to your own computer with this statement:
download.file("http://tiny.cc/dcf/MovieLens.rda",
              destfile = "MovieLens.rda")
You only need to download the data once. But each time you start a new R session, you will need to load() the data into that session. (Every time you knit a document, you are starting a new session just for the purpose of compiling that document.)
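As a minimal sketch of the loading step, assuming the file was saved to the working directory under the destfile name used above, the following checks for the file before calling load(). The variable names rda_path and loaded_names are illustrative only.

```r
# Path assumed to match the destfile argument of the download step.
rda_path <- "MovieLens.rda"

if (file.exists(rda_path)) {
  # load() restores the saved objects into the session and
  # invisibly returns a character vector of their names.
  loaded_names <- load(rda_path)
  print(loaded_names)
} else {
  message("MovieLens.rda not found; run the download.file() step first.")
}
```

Putting a chunk like this near the top of an R Markdown document means the data are reloaded every time the document is knit.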
MovieLens.rda contains three data tables:
Ratings has the individual movie ratings and the time at which they were entered. It also includes an ID variable for both the user and the movie.
Movies provides the name of the movie and information about genres.
Users gives basic information about the person who made the rating.
Your task: Construct each of these graphics.
Users %>%
  ggplot(aes(x = age)) +
  geom_density(aes(fill = occupation),
               color = NA, alpha = .7, position = "fill") +
  facet_wrap(~ sex)

Users %>%
  ggplot(aes(x = age)) +
  geom_density(aes(fill = sex),
               color = NA, alpha = .4, position = "fill")

Users %>%
  group_by(occupation) %>%
  tally() %>%
  arrange(desc(n))
## # A tibble: 21 × 2
##    occupation        n
##    <chr>         <int>
##  1 student         196
##  2 other           105
##  3 educator         95
##  4 administrator    79
##  5 engineer         67
##  6 programmer       66
##  7 librarian        51
##  8 writer           45
##  9 executive        32
## 10 scientist        31
## # ... with 11 more rows
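The remaining graphics use a table named All, which is not constructed anywhere in this text. One plausible way to build it, sketched here on toy data, is to join Ratings to Users and to Movies on their ID variables. The column names user_id and movie_id, and the one-genre-per-movie layout of the toy Movies table, are assumptions for illustration; check names() on the real tables before adapting this.

```r
library(dplyr)

# Toy stand-ins for the three tables. Column names and values are
# hypothetical and are NOT taken from MovieLens.rda.
Users <- data.frame(user_id = c(1, 2, 3),
                    age     = c(24, 53, 33),
                    sex     = c("M", "F", "F"))
Movies <- data.frame(movie_id = c(10, 20),
                     genre    = c("Comedy", "unknown"))
Ratings <- data.frame(user_id  = c(1, 2, 3, 1),
                      movie_id = c(10, 10, 20, 20),
                      rating   = c(4, 5, 3, 2))

# One row per rating, carrying the rater's attributes and the movie's genre.
All <- Ratings %>%
  inner_join(Users,  by = "user_id") %>%
  inner_join(Movies, by = "movie_id")

print(All)
```

With the real data, the same pattern would give every rating access to age, sex, and genre, which are the variables the plotting code maps to aesthetics and facets.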
All %>%
  filter(genre != "unknown") %>%
  ggplot(aes(x = age, color = sex, y = rating)) +
  geom_smooth() +
  facet_wrap(~ genre, scales = "free")
## `geom_smooth()` using method = 'gam'
All %>%
  ggplot(aes(x = age, color = sex, y = rating)) +
  geom_smooth()
## `geom_smooth()` using method = 'gam'