Concept 1. Data can be usefully organized into tables with “cases” and “variables.” In “tidy data,” every case is the same sort of thing, e.g. a person, a car, a year, a country in a year.
We talked about data tables, cases, variables, etc. in Week 1.
Concept 2. Data graphics can be constructed easily when each case corresponds to a “glyph” (mark) on the graph, and each variable to a graphical attribute of that glyph such as x- or y-position, color, size, length, shape, etc. Such data is called “glyph-ready.” (The same is true for more technical presentations of data, e.g., models, predictions, etc. — once the data are set up with appropriate cases and variables, the presentation is straightforward.)
Concept 3. When data are not yet in glyph-ready form, you can transfigure them into glyph-ready form. Such transfigurations are accomplished by performing one or more of a small set of basic operations on data tables: the so-called data “verbs.”
Introduce some software and commands that …
data(), help(), names(), nrow(), str(), summary(), head()
mScatter(), mBar(), mWorldMap(), mUSMap()
group_by(), summarise()
See what makes data tables glyph-ready or not, and how the data verbs can be used to transfigure data tables into glyph-ready data.
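As a warm-up, the inspection commands listed above can be tried on any data table. The sketch below uses mtcars, a table that ships with base R, as a stand-in; that choice is an assumption made only so the example runs without extra packages, and it is not one of the course data sets.

```r
# Inspect a data table with base R commands.
# mtcars is a stand-in here; substitute any course data table.
data(mtcars)        # load the table into the workspace
names(mtcars)       # the variable names
nrow(mtcars)        # the number of cases
str(mtcars)         # the type of each variable
summary(mtcars)     # a summary of each variable
head(mtcars, 3)     # the first three cases
# help(mtcars)      # opens the documentation page (for interactive use)
```

The same calls should work unchanged on NHANES or CountryData once the package that provides them is loaded.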
InClass-2-XXX.Rmd. Work through the topics under “Software and Commands,” putting your answers into the Rmd file.
Three (unrelated) examples:
NHANES
Minneapolis2013
CountryData
You’re going to make some simple graphics.
To speed things up, make a subset of just 2000 cases from NHANES:
Small <- sample_n( NHANES, size=2000 )
mScatter( Small )
Notice that the argument to mScatter() is a data table.
Make a graph of height against age, height against weight, etc. Use one or more other graphical attributes such as color, size, etc. Find a relationship that interests you.
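To see the mapping from variables to graphical attributes written out in code, here is a minimal sketch that uses ggplot2 rather than the interactive mScatter(). The People table and its variable names (age, height, sex) are hypothetical, made up for illustration; they are not the NHANES variables.

```r
library(ggplot2)

# Hypothetical toy table: each row is one person (one case).
People <- data.frame(
  age    = c(5, 12, 19, 33, 47, 60),
  height = c(110, 150, 172, 165, 180, 168),
  sex    = c("F", "M", "F", "F", "M", "M")
)

# Each case becomes one point (glyph).
# age sets the x-position, height the y-position, sex the color.
p <- ggplot(People, aes(x = age, y = height, color = sex)) +
  geom_point(size = 3)
print(p)
```

Swapping a different variable into aes() changes which attribute of the glyph it controls, which is the same choice mScatter() offers through its menus.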
The mWorldMap() function makes it easy to construct country-by-country maps. It takes three arguments:
For example, here’s a map of the number of deaths in each country (per 1000 inhabitants per year):
mWorldMap( CountryData, key="country", fill="death" )
Make that map and comment on the pattern it shows.
Make a map of some other variable of interest to you and comment on what it shows.
A bar chart is a simple and limited form of graph. It represents a number as the length of a bar.
Consider the Minneapolis 2013 election data. Here’s a bar chart that might be used to show the election results:
This graph reflects the following data table (only part of which is shown):
First | votes |
---|---|
BETSY HODGES | 28935 |
MARK ANDREW | 19584 |
DON SAMUELS | 8335 |
CAM WINTON | 7511 |
JACKIE CHERRYHOMES | 3524 |
BOB FINE | 2094 |
Compare the Minneapolis2013 data table and the data table printed above.
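One way to get from a table of individual ballots to a table like the one printed above is to count cases within each group. The sketch below assumes a table with one case per ballot and a variable First holding the first-choice candidate. The Ballots table and its candidate names are hypothetical, standing in for the real data.

```r
library(dplyr)

# Hypothetical ballot-level table: one case per ballot.
Ballots <- data.frame(
  First = c("A", "B", "A", "C", "A", "B")
)

# Transfigure into one case per candidate, which is glyph-ready for a bar chart.
VoteCounts <- Ballots %>%
  group_by(First) %>%          # one group per candidate
  summarise(votes = n()) %>%   # count the ballots in each group
  arrange(desc(votes))         # largest count first
VoteCounts
```

In the result, each case is a candidate rather than a ballot, so each row can be drawn as one bar.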
summarise(): Find an expression involving summarise() and NHANES that will produce the following.
NHANES
NHANES
(silly)NHANES
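As a hint at the general shape of such an expression, here is a minimal sketch of summarise() on a small made-up table. The People table and its variables (age, height, sex) are hypothetical and are not the NHANES variable names.

```r
library(dplyr)

# Hypothetical toy table: one case per person.
People <- data.frame(
  age    = c(5, 12, 19, 33, 47, 60),
  height = c(110, 150, 172, 165, 180, 168),
  sex    = c("F", "M", "F", "F", "M", "M")
)

# summarise() collapses all the cases into a single case of summary values.
Overall <- People %>%
  summarise(
    count       = n(),           # number of cases
    mean_height = mean(height),  # average of one variable
    max_age     = max(age)       # largest value of another
  )
Overall
```

The output is itself a data table, with one case and one variable per requested summary.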
group_by(): Repeat the above, but calculating the results group-by-group for:
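The group-by-group pattern can be sketched as follows, again on a small made-up table. The People table and its variables are hypothetical; with NHANES you would substitute its own grouping variables.

```r
library(dplyr)

# Hypothetical toy table: one case per person.
People <- data.frame(
  age    = c(5, 12, 19, 33, 47, 60),
  height = c(110, 150, 172, 165, 180, 168),
  sex    = c("F", "M", "F", "F", "M", "M")
)

# group_by() marks the groups; summarise() then produces one case per group.
ByGroup <- People %>%
  group_by(sex) %>%
  summarise(
    count       = n(),
    mean_height = mean(height)
  )
ByGroup
```

Only the group_by() step is new here: the summarise() call is written the same way as for the whole table, but it is evaluated separately within each group.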
Make a scatter plot of the ZipGeography data. Use latitude, longitude and time zone to set position and color. By choosing the right combination, you should be able to construct a plot whose meaning is immediately obvious to anyone familiar with US geography.