Danny Kaplan & Libby Shoop
June 12, 2013
Provide meaningful data processing, display, and analysis skills
Fits in early, with minimal alterations to a student's schedule
Time budget: 10 class hours, 20 out-of-class hours
No prerequisites
Fun, not a slog
Graphical Displays
Workflow
Relational Operations
Transformations
Modeling
Class organized around case studies (data settings).
By a data setting, we mean:
Your task: Think about a data setting that you would like to have in your course(s). Meanwhile, we'll show some examples …
We chose some data sets for the prototype course to illustrate various aspects of data organization and display.
Criteria:
Hans Rosling's well-known project
514 country-by-country, year-by-year “indicators”. Examples:
Fertility, Alcohol consumption, Liver cancer incidence, paved roads, …
Data Organization/Processing
Visualization
Modeling
Birds captured and released at the Katharine Ordway Natural History Study Area, a 278-acre preserve in Inver Grove Heights, Minnesota, owned and managed by Macalester College. Source: Jerald Dosch, the manager of the Study Area. Currently 15829 cases in 23 variables.
Age, sex, height, wgt, etc. along with
31126 people.
500,000 registered voters from Wake County, North Carolina.
We chose what we think are the major categories of accessible visualization modalities:
Your task: Think about what modalities are important to you.
Age, BMI, and Diabetes from the NHANES data
What's different about younger voters?
Which regions have the highest alcohol consumption?
Alcohol consumption indicated by color.
Vehicle-related deaths (per 100,000 per year) indicated by size.
Is alcohol an evident factor in the differing rate of vehicle-related deaths across countries?
Does this mean that alcohol isn't a factor in vehicle-related deaths?
A choropleth map assigns a color to a region. What the region refers to is up to you.
In addition to “world map”, we could produce “seed map” or “campus map” or “brain map”.
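As a rough illustration of the idea that a choropleth is just a value-to-color assignment per region, here is a minimal sketch in Python using only the standard library. The function names, the white-to-red ramp, and the region names and numbers are all made up for illustration; they are not taken from the course materials or from any of the data sets above.

```python
def value_to_color(value, lo, hi):
    """Map a value in [lo, hi] to a hex color on a white-to-red ramp."""
    # Guard against a zero-width range (all regions share one value).
    t = 0.0 if hi == lo else (value - lo) / (hi - lo)
    t = max(0.0, min(1.0, t))
    # White (#ffffff) at the low end, pure red (#ff0000) at the high end.
    level = round(255 * (1 - t))
    return "#ff{0:02x}{0:02x}".format(level)


def choropleth_colors(values_by_region):
    """Assign one color to each region, scaled to the observed range."""
    lo = min(values_by_region.values())
    hi = max(values_by_region.values())
    return {
        region: value_to_color(value, lo, hi)
        for region, value in values_by_region.items()
    }


# Hypothetical values; a "region" could be a country, a building, a brain area.
example = {"Region A": 2.0, "Region B": 6.0, "Region C": 10.0}
colors = choropleth_colors(example)
for region, color in colors.items():
    print(region, color)
```

Nothing in the sketch depends on the regions being countries: any set of named areas with one number each would do, which is the point of the "seed map" and "campus map" examples.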
Grouping cars based on similarity in performance/design measures.
Height vs. age.
We'll talk about this under modeling.
Case: Individual bird captures
Variables: Month, species, etc.
Case: ???
Variables: ???
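The case/variable description above can be made concrete with a small sketch: each row of a table is one case, and each column is one variable. The Python example below uses only the standard library. The four rows are invented for illustration and are not records from the Ordway data; the column names `month` and `species` simply echo the variables named above.

```python
import csv
import io
from collections import Counter

# Hypothetical capture records: one row per case (an individual bird capture).
raw = """month,species
May,Black-capped Chickadee
May,American Robin
June,Black-capped Chickadee
June,Gray Catbird
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# The variables are the column names; the cases are the rows.
variables = list(rows[0].keys())
n_cases = len(rows)

# A simple summary over cases: number of captures per species.
captures_by_species = Counter(row["species"] for row in rows)

print("Variables:", variables)
print("Number of cases:", n_cases)
print(captures_by_species.most_common())
```

Changing what counts as a case (for example, one row per species per month instead of one row per capture) changes which variables make sense, which is what the "???" exercise asks about.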
We all have preferred ways of working:
Sophisticated use of the computer is difficult.