Introductory statistics courses conventionally feature many types of graphics, e.g. histograms, stem-and-leaf plots, theoretical probability densities (often with tail probabilities annotated), bar charts, scatter plots, cross-tabulations, …
The graphics in the following table come from a nice, open-source textbook: OpenIntro Stats (4/e). This book is used successfully in many colleges, ranging from two-year colleges to elite, private, four-year schools. I paged through the book, capturing each new mode as I encountered it.
As you look through the collection, note how often …
[Image table: graphical modes from OpenIntro Stats, plus two images not from OpenIntro]
Let’s reduce the number of graphical modes, arranging things so …
library(ggformula)  # gf_*() plotting functions
library(NHANES)     # NHANES data frame
gf_blank(Height ~ Age, data = NHANES)
gf_blank(Poverty ~ HomeOwn, data = NHANES)
gf_blank(Depressed ~ Work, data = NHANES)
gf_point(Height ~ Age, data = NHANES,
         alpha = 0.3)
gf_jitter(HomeOwn ~ Poverty, data = NHANES,
          alpha = 0.3, height = 0.2)
gf_jitter(Depressed ~ Work, data = NHANES,
          alpha = 0.1, width = 0.2, height = 0.2)
Adding in color and faceting, up to four variables can be shown, but relationships become progressively more difficult to read.
The glyph is a more-or-less horizontal line or curve showing the model output at each value of the explanatory variables.
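library(splines)      # ns()
library(mosaicData)   # Galton data
library(mosaicModel)  # mod_eval()
mod1 <- lm(height ~ ns(mother, 2) * sex,
           data = Galton)
# Model output on a grid of mother's heights, with prediction intervals
mod_shape <- mod_eval(mod1,
                      mother = seq(55, 72, length.out = 100),
                      interval = "prediction")
# Same grid, with confidence intervals
mod_shape2 <- mod_eval(mod1,
                       mother = seq(55, 72, length.out = 100),
                       interval = "confidence")
gf_point(height ~ mother | sex, data = Galton,
         alpha = 0.2) %>%
  gf_line(model_output ~ mother | sex, data = mod_shape, size = 2)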
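What the model-output glyph amounts to can be sketched in base R: evaluate the model on a grid of explanatory-variable values, then draw one curve per group. This is only a rough analogue of the mod_eval() step, not its implementation. As an assumption for illustration, the built-in mtcars data stand in for Galton, with wt playing the role of mother and am the role of sex; the names cars and grid are made up for this sketch.

```r
library(splines)  # ships with R; provides ns()

# Assumption: mtcars stands in for Galton (wt ~ mother, am ~ sex)
cars <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))
mod <- lm(mpg ~ ns(wt, 2) * am, data = cars)

# One row for every combination of explanatory-variable values
grid <- expand.grid(wt = seq(1.5, 5.5, length.out = 100),
                    am = levels(cars$am))
grid$model_output <- predict(mod, newdata = grid)

# Data layer first, then the model-output glyph: one curve per group
plot(mpg ~ wt, data = cars, col = as.integer(am), pch = 16)
for (g in levels(cars$am)) {
  lines(model_output ~ wt, data = grid[grid$am == g, ],
        col = match(g, levels(cars$am)), lwd = 2)
}
```

The curve is defined at every grid value, including values where there happen to be no data points, which is what makes it a glyph for the model rather than for the data.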
The glyph is either a ribbon or an I-bar.
gf_point(height ~ mother | sex, data = Galton,
         alpha = 0.2) %>%
  gf_ribbon(lower + upper ~ mother | sex,
            data = mod_shape, alpha = 0.2,
            inherit = FALSE)
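Note: In the context provided by the data, it’s always clear whether a band is a prediction interval or a confidence interval: a prediction band covers most of the data points, while a confidence band hugs the model curve and covers only a small share of them.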
gf_point(height ~ mother | sex, data = Galton,
         alpha = 0.1) %>%
  gf_ribbon(lower + upper ~ mother | sex,
            data = mod_shape2, alpha = 0.4,
            inherit = FALSE)
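A quick numerical check of why the data give the band type away, sketched in base R under stated assumptions: the data are simulated stand-ins shaped loosely like Galton (not the real data), and inside() is a helper name invented for this sketch. It counts the share of points that fall within each kind of band.

```r
library(splines)  # ships with R; provides ns()

set.seed(101)
# Assumption: simulated stand-in for Galton, not the real data
n <- 400
mother <- runif(n, 55, 72)
sex <- factor(sample(c("F", "M"), n, replace = TRUE))
height <- 40 + 0.4 * mother + 5 * (sex == "M") + rnorm(n, sd = 2.5)
sim <- data.frame(height, mother, sex)

mod <- lm(height ~ ns(mother, 2) * sex, data = sim)

# Both bands evaluated at the observed values of the explanatory variables
pred_band <- predict(mod, newdata = sim, interval = "prediction")
conf_band <- predict(mod, newdata = sim, interval = "confidence")

# Share of data points lying inside a band (hypothetical helper)
inside <- function(band) {
  mean(sim$height >= band[, "lwr"] & sim$height <= band[, "upr"])
}
coverage <- c(prediction = inside(pred_band), confidence = inside(conf_band))
coverage
```

The prediction band should take in nearly all of the points and the confidence band only a minority, so a glance at the points against the ribbon is enough to tell which one is being shown.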
mod <- lm(Height ~ Depressed, data = NHANES)
mod_shape <- mod_eval(mod, interval = "prediction")
Two distinct questions:
The data points themselves indicate the joint probability density.
Use another glyph, a violin, to show the conditional density:
gf_jitter(Poverty ~ Depressed, data = NHANES,
          alpha = 0.03, width = 0.2) %>%
  gf_violin(alpha = 0.1, fill = "black")
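The joint/conditional distinction can also be seen numerically. The sketch below uses simulated data as a hypothetical stand-in for NHANES (three groups of unequal size and one quantitative variable); the names group, value, and area() are invented for illustration. Each group's density estimate is normalized within the group, like a violin, while the joint picture weights each group by its share of the points, like the jittered dots.

```r
set.seed(42)
# Assumption: simulated stand-in for NHANES, with unequal group sizes
group <- factor(rep(c("None", "Several", "Most"), times = c(700, 200, 100)),
                levels = c("None", "Several", "Most"))
value <- rnorm(length(group), mean = 3 - 0.5 * (as.integer(group) - 1))

# Conditional density: one kernel density estimate per group
cond <- lapply(split(value, group), density)

# Riemann-sum area under a density estimate (hypothetical helper)
area <- function(d) sum(d$y) * (d$x[2] - d$x[1])

cond_area <- sapply(cond, area)                   # each close to 1
share <- as.numeric(prop.table(table(group)))     # 0.7, 0.2, 0.1
joint_area <- cond_area * share                   # close to the group shares
round(rbind(conditional = cond_area, joint = joint_area), 2)
```

Every conditional density has about the same total area regardless of how many cases are in the group, whereas the joint areas track the group sizes. That is the information the violin deliberately sets aside and the cloud of points retains.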
Each of the settings on the handout shows n = 100 data points. The variables are shown on the axes and, sometimes, as the name of a color legend. For each of the settings, your task is to:
Again, I’m using a proposed replacement for the traditional “significant.”