Week 2 Drill

Problem 1

The graphic below presents forecasts for the US Senate elections in Nov. 2014. The numbers or words give the forecast probability of one party’s candidate (Democrat or Republican) winning. The forecasts are based on polls up through the end of August 2014. Individual results from several different polling organizations are shown. The graphic is an excerpt from the full graphic, which shows predictions for all 36 Senate seats up for election in 2014. Source: New York Times

Forecast probabilities of the outcomes of US Senate elections for 2014.

1.

What variables define the frame in this graphic?

  • Probability and State.
  • State and Polling Organization.
  • Democrats and Republicans.
  • Just State.
  • Just Probability.

2.

What is the glyph, and what are its graphical attributes?

  • Glyph: names of the states. Graphical attribute: font.
  • Glyph: names of the polling organizations. Graphical attribute: the organization’s logo.
  • Glyph: rectangle. Graphical attribute: color.
  • Glyph: rectangle. Graphical attribute: color and text.

3.

Which of these is a guide for the indicated graphical attribute? (Select all that apply.)

  • Vertical scale: Name of state.
  • Vertical scale: Name of candidate.
  • Vertical scale: Name of polling organization.
  • Vertical scale: color band.
  • Color: color band.

4.

What sets the order of the levels of the categorical variable on the vertical scale?

Problem 2

For each of these computations, say which R function is most appropriate:

  1. Count the number of cases in a data table.
  2. List the names of the variables in a data table.
  3. For data tables in an R package, display the documentation (“codebook”) for the data table.
  4. Load a package into your R session.
  5. Mark a data table as grouped by one or more categorical variables.
  6. Add up, group-by-group, a quantitative variable in a data table.

Problem 3

The data verb functions all take a data table as their first argument and return a data table as their output. (The “output” is often called the “value” of the function.) The chaining syntax, %>%, lets the output of one function become the input to the next function, so you don’t have to repeat the name of the data frame. An alternative syntax is to assign the output of one function to a named object, then use that object as the first argument to the next function in the computation.
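To make the two syntaxes concrete, here is a minimal sketch, assuming the dplyr package is installed. The small table `Toy` is hypothetical: it only borrows the `year`, `sex`, and `count` column names from `BabyNames` so that the example runs on its own.

```r
library(dplyr)  # provides group_by(), summarise(), and re-exports %>%

# Hypothetical stand-in for BabyNames, with the same column names
Toy <- data.frame(
  year  = c(1880, 1880, 1880, 1881, 1881),
  sex   = c("F", "F", "M", "F", "M"),
  count = c(10, 20, 5, 7, 3)
)

# Chaining syntax: each verb's output flows into the next verb
chained <- Toy %>%
  group_by(year, sex) %>%
  summarise(totalBirths = sum(count))

# Assignment syntax: store the intermediate table, then pass it explicitly
Tmp <- group_by(Toy, year, sex)
assigned <- summarise(Tmp, totalBirths = sum(count))

print(chained)
print(assigned)
```

Both forms produce the same table: one row per year-sex group, with the counts added up within each group.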

All but one of these statements accomplish the same calculation. Identify the statement that does not match the others.

a.
BabyNames %>% 
  group_by( year, sex ) %>% 
  summarise( totalBirths=sum(count))
b.
group_by( BabyNames, year, sex) %>% 
  summarise( totalBirths=sum(count) )
c.
group_by( BabyNames, year, sex ) %>%
  summarise( totalBirths=mean(count) )
Source: local data frame [268 x 3]
Groups: year

   year sex totalBirths
1  1880   F       96.60
2  1880   M      104.43
3  1881   F       98.03
4  1881   M      101.05
5  1882   F      104.91
6  1882   M      103.45
7  1883   F      106.57
8  1883   M      101.58
9  1884   F      110.09
10 1884   M      101.73
..  ... ...         ...
d.
Tmp <- group_by(BabyNames, year, sex) 
summarise( Tmp, totalBirths=sum(count) )

Problem 4

Spot the error(s) in these expressions:

a.
BabyNames %>% 
  group_by(BabyNames, year, sex) %>%
  summarize(BabyNames, total=sum(count) )
b.
ZipGeography <- 
  group_by( State ) %>% 
  summarize( pop=sum(Population) )
c.
Minneapolis2013 %>%
  group_by( First ) ->
  summarize( voteReceived=n() )
d.
summarize( votesReceived=n() ) %<% 
  group_by( First ) <- 
  Minneapolis2013
e.
BabyNames %>% 
  group_by( "First" ) %>%
  summarise( votesReceived=n() )
f.
Tmp <- group_by(BabyNames, year, sex ) %>% 
  summarise( Tmp, totalBirths=sum(count))
g.
Tmp <- group_by(BabyNames, year, sex) 
summarise( BabyNames, totalBirths=sum(count) )
