Project: Popular names

Over the decades, names rise and fall in popularity. The BabyNames data frame records this in detail, but not in glyph-ready form.

Goal:

Wrangle the BabyNames data and then create a graph (like Figure 17.1) showing the ups and downs in the popularity of names of interest to you. The raw material you have is the BabyNames data frame in the DataComputing package.

Figure 17.1: A sketch of the popularity over time of a few names.

Step 1:

Completing a project is more than just figuring out some computer commands. You have to plan ahead.

Examine the data at hand (for this project, the single table BabyNames) to find out what variables are available and what a case represents.
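
A quick way to do this examination in R is sketched below. It assumes BabyNames has the four variables name, sex, count, and year. So that it runs without the DataComputing package, it uses a tiny hypothetical stand-in, BabyNamesToy, with invented counts. The last step tests a guess about the meaning of a case: if each case is one name for one sex in one year, then no (name, sex, year) combination should repeat.

```r
library(dplyr)

# Hypothetical stand-in for BabyNames: assumed columns name, sex, count, year;
# the counts are invented. With the real data, run library(DataComputing)
# and use BabyNames instead.
BabyNamesToy <- data.frame(
  name  = c("Mary", "Mary", "Mary", "Anna", "Anna", "John"),
  sex   = c("F",    "M",    "F",    "F",    "F",    "M"),
  count = c(700,    3,      650,    200,    220,    900),
  year  = c(1880,   1880,   1881,   1880,   1881,   1880)
)

str(BabyNamesToy)   # which variables are available, and their types
head(BabyNamesToy)  # a first look at a few cases

# Guess: a case is one name, for one sex, in one year.
# If so, no (name, sex, year) combination appears more than once.
repeated <- BabyNamesToy %>%
  count(name, sex, year) %>%
  filter(n > 1)
nrow(repeated)  # 0 supports the guess
```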

Step 2:

Imagine what your end report will look like and sketch out your idea. Here, Figure 17.1 serves as the sketch of the goal.

Step 3:

Analyze the graphic to figure out what a glyph-ready data table should look like. Mostly, this involves figuring out what variables are represented in the graph. Write down a small example of a glyph-ready data frame that you think could be used to make a graphic of this form.

  • What variable(s) from the raw data table do not appear at all in the graph?
  • What variable(s) in the graph are similar to corresponding variables in the raw data table, but might have been transformed in some way?
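
As an illustration, here is one possible glyph-ready table written as a small R data frame. The names and totals are placeholders, and the variable names (name, year, total) are an assumption chosen to match the ggplot() template at the end of this project. Comparing its variables with the (assumed) raw ones answers the two questions above mechanically.

```r
# Hypothetical glyph-ready table for a graph like Figure 17.1:
# one row per name per year. Names and totals are placeholders.
GlyphReadyExample <- data.frame(
  name  = c("Mary", "Mary", "Anna", "Anna"),
  year  = c(1880, 1881, 1880, 1881),
  total = c(703, 650, 200, 220)
)

# Assumed variables of the raw BabyNames table.
raw_vars <- c("name", "sex", "count", "year")

setdiff(raw_vars, names(GlyphReadyExample))  # "sex" "count": not in the graph as-is
setdiff(names(GlyphReadyExample), raw_vars)  # "total": a transformation of count
```

Here sex has been dropped and total stands in for count, on the assumption that the graph shows one line per name regardless of sex.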

Step 4:

Consider how the cases differ between the raw input and the glyph-ready table.

  • Have cases been filtered out?
  • Have cases been grouped and summarized within groups in any way?
  • Have any new variables been introduced? If so, what’s the relationship between the new variables and existing variables?
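
One concrete way to answer these questions is to count how many raw cases feed each point on the graph. The sketch below again uses a hypothetical stand-in for BabyNames with invented counts. Any (name, year) pair that matches more than one raw case (here, one case per sex) signals that cases must be grouped and summarized.

```r
library(dplyr)

# Hypothetical stand-in for BabyNames (assumed columns, invented counts).
BabyNamesToy <- data.frame(
  name  = c("Mary", "Mary", "Mary", "Anna", "Anna", "John"),
  sex   = c("F",    "M",    "F",    "F",    "F",    "M"),
  count = c(700,    3,      650,    200,    220,    900),
  year  = c(1880,   1880,   1881,   1880,   1881,   1880)
)

# How many raw cases feed each (name, year) point on the graph?
cases_per_point <- BabyNamesToy %>%
  count(name, year)
cases_per_point

# Points built from more than one raw case need grouping and summarizing.
cases_per_point %>% filter(n > 1)
```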

Step 5:

Using English, write down a sequence of steps that will accomplish the wrangling from the raw data table to your hypothesized glyph-ready data table.

Step 6:

Using paper and pen, translate your design, step by step, into R.
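
As one example of such a translation, each English step below appears as a comment above the dplyr verb that carries it out. The data are a hypothetical stand-in for BabyNames, and names_of_interest is a placeholder to replace with your own choices. Combining the two sexes for each name is a design choice; to keep them separate, add sex to group_by().

```r
library(dplyr)

# Hypothetical stand-in for BabyNames (assumed columns, invented counts).
BabyNamesToy <- data.frame(
  name  = c("Mary", "Mary", "Mary", "Anna", "Anna", "John"),
  sex   = c("F",    "M",    "F",    "F",    "F",    "M"),
  count = c(700,    3,      650,    200,    220,    900),
  year  = c(1880,   1880,   1881,   1880,   1881,   1880)
)

# Placeholder: the names you want to plot.
names_of_interest <- c("Mary", "Anna")

GlyphReadyForm <- BabyNamesToy %>%
  # 1. Keep only the names of interest.
  filter(name %in% names_of_interest) %>%
  # 2. Group the cases by name and year (this merges the two sexes).
  group_by(name, year) %>%
  # 3. Add up the counts in each group to get a total for that name and year.
  summarise(total = sum(count), .groups = "drop")

GlyphReadyForm
```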

Step 7:

Implement, test, and revise. Place your commands from Step 6 into a suitable document and run them in R.

For beginners and experts alike, a wrangling sequence as first written often won't work exactly as anticipated. Common reasons include syntax errors in your R commands, mismatched variable names, or a data-wrangling step you unintentionally left out.

Expect a cycle of testing, diagnosing problems, revising, and testing again. One effective strategy is divide and conquer: examine the output from the early wrangling steps before adding the steps that follow.
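
Divide and conquer can look like this in practice: save the output of each stage, check it, and only then add the next stage. The data and names are hypothetical placeholders, and the stopifnot() checks are examples of the kind of expectation you might test, not a fixed recipe.

```r
library(dplyr)

# Hypothetical stand-in for BabyNames (assumed columns, invented counts).
BabyNamesToy <- data.frame(
  name  = c("Mary", "Mary", "Mary", "Anna", "Anna", "John"),
  sex   = c("F",    "M",    "F",    "F",    "F",    "M"),
  count = c(700,    3,      650,    200,    220,    900),
  year  = c(1880,   1880,   1881,   1880,   1881,   1880)
)
names_of_interest <- c("Mary", "Anna")  # placeholder

# Stage 1 only: run it and look at the result before going further.
Step1 <- BabyNamesToy %>% filter(name %in% names_of_interest)
Step1
stopifnot(all(Step1$name %in% names_of_interest))  # did the filter do what we meant?

# Stage 2 builds on the checked output of stage 1.
Step2 <- Step1 %>%
  group_by(name, year) %>%
  summarise(total = sum(count), .groups = "drop")
Step2
# Summing within groups should preserve the grand total of counts.
stopifnot(sum(Step2$total) == sum(Step1$count))
```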

Once you have your glyph-ready data, make your graphic. You may find this template ggplot() command useful:

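library(dplyr)    # provides the %>% pipe
library(ggplot2)

GlyphReadyForm %>%    # your glyph-ready table, with variables name, year, total
  ggplot(aes(x = year, y = total, group = name)) +
  geom_line(linewidth = 1, alpha = 0.5, aes(color = name)) +  # use size = 1 before ggplot2 3.4.0
  ylab("Popularity") + xlab("Year")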
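
To check the template before your own wrangling is finished, you can run it on a tiny hand-made glyph-ready table, as sketched here. The names and totals are placeholders. This version passes the data as the first argument of ggplot() so that no pipe (and no dplyr) is needed, and it uses linewidth, which assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)

# Hypothetical glyph-ready table: placeholder names, invented totals.
GlyphReadyForm <- data.frame(
  name  = c("Mary", "Mary", "Anna", "Anna"),
  year  = c(1880, 1881, 1880, 1881),
  total = c(703, 650, 200, 220)
)

p <- ggplot(GlyphReadyForm, aes(x = year, y = total, group = name)) +
  geom_line(linewidth = 1, alpha = 0.5, aes(color = name)) +  # linewidth: ggplot2 >= 3.4.0
  ylab("Popularity") + xlab("Year")

print(p)  # draws the graph on the current graphics device
```

Because group = name, ggplot() draws one line per name, and color = name adds a legend identifying each line.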