Identify each of these functions as a Data Verb, a Transformation, a Summary Function, a Quick Presentation, or a Comparison Expression.
str()
group_by()
rank()
mean()
filter()
summary()
summarise()
anti_join()
merge()
glimpse()
These questions refer to the diamonds data table. Take a look at the codebook (using help()) so that you'll understand the meaning of the tasks. [1] Write, using paper and pen, an expression that will answer these questions.
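As a hedged sketch of that first step, the codebook can be opened as shown below. This assumes the diamonds table is the one distributed with the ggplot2 package, which the text does not state explicitly.

```r
# Assumption: `diamonds` is the data table shipped with the ggplot2 package.
library(ggplot2)

# Open the codebook (help page) that describes each variable.
help("diamonds", package = "ggplot2")

# A quick look at the variables and their types before writing expressions.
str(diamonds)
```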
Imagine a data table, Patients, with categorical variables name, diagnosis, and sex, and quantitative variable age.
You have a statement of the form:
Patients %>%
group_by( some variables ) %>%
summarise( count=n(), meanAge=mean(age) )
Replacing some variables with each of the following, say … meanAge will contain any new information.
sex
diagnosis
sex, diagnosis
age, diagnosis
age
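To see what such a statement produces, here is a minimal sketch on a made-up Patients table. The names, diagnoses, and ages are invented for illustration; only the variable names come from the description above. It shows a single grouping choice and leaves the others to try.

```r
library(dplyr)

# Hypothetical data: every value below is invented for illustration.
Patients <- data.frame(
  name      = c("Ana", "Ben", "Cal", "Dee", "Eli", "Fay"),
  diagnosis = c("flu", "flu", "asthma", "asthma", "flu", "asthma"),
  sex       = c("F", "M", "M", "F", "M", "F"),
  age       = c(30, 51, 29, 60, 46, 39),
  stringsAsFactors = FALSE
)

# One row per level of the grouping variable, with the group size
# and the mean age within that group.
by_sex <- Patients %>%
  group_by(sex) %>%
  summarise(count = n(), meanAge = mean(age))

print(by_sex)
```

The result has one column for each grouping variable plus the columns created in summarise(); variables that are neither grouped on nor summarised, such as name, do not appear.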
Here are three data tables with the same information:
[Tables Version One, Version Two, and Version Three: contents not preserved.]
There are no NAs in Version One, but there are in Versions Two and Three. Why?
Suppose you want to create the following table, giving the most popular name of each sex in each year:
Source: local data frame [4 x 4]
Groups: year, sex
name sex year nbabies
1 Roderick M 1912 46
2 Terry F 1912 17
3 Harrison F 2012 15
4 Roderick M 2012 202
What should the chain of commands look like to make this from Version One?
Suppose you want to calculate the ratio of male to female babies for each name in each year, like this:
Source: local data frame [6 x 3]
name year ratio
1 Harrison 1912 NA
2 Harrison 2012 0.007075
3 Roderick 1912 NA
4 Roderick 2012 NA
5 Terry 1912 0.346939
6 Terry 2012 0.035491
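The two tasks above can be approached along the following lines. This is a sketch on a small invented narrow table, Babies, with the same variables as the output shown (name, sex, year, nbabies). It is not the data behind Version One, and it is one possible chain rather than the intended answer.

```r
library(dplyr)

# Hypothetical narrow table: all names and counts are invented.
Babies <- data.frame(
  name    = c("Alex", "Alex", "Sam", "Robin", "Alex", "Alex", "Sam", "Robin"),
  sex     = c("M", "F", "M", "F", "M", "F", "F", "M"),
  year    = c(1912, 1912, 1912, 1912, 2012, 2012, 2012, 2012),
  nbabies = c(40, 10, 25, 30, 100, 50, 80, 20),
  stringsAsFactors = FALSE
)

# Most popular name for each sex in each year:
# within each (year, sex) group, keep the row with the largest count.
top_names <- Babies %>%
  group_by(year, sex) %>%
  filter(nbabies == max(nbabies)) %>%
  ungroup()

# Ratio of male to female counts for each name in each year:
# put the two sexes side by side with a join, then divide.
males <- Babies %>%
  filter(sex == "M") %>%
  select(name, year, male = nbabies)

females <- Babies %>%
  filter(sex == "F") %>%
  select(name, year, female = nbabies)

ratios <- full_join(males, females, by = c("name", "year")) %>%
  mutate(ratio = male / female) %>%
  select(name, year, ratio) %>%
  arrange(name, year)

print(top_names)
print(ratios)
```

In this sketch the ratio is NA whenever a name has a count for only one sex in a given year, because full_join() fills the missing side with NA. Name and year combinations that have no row for either sex do not appear in the joined result.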
Written by Daniel Kaplan for the Data & Computing Fundamentals Course. Development was supported by grants from the National Science Foundation for Project Mosaic (NSF DUE-0920350) and from the Howard Hughes Medical Institute.
[1] Motivated by this problem set based on drills by Garrett Grolemund.