What do we want to get from teaching hypothesis testing?
Summary
Putting hypothesis testing into a broader, more useful framework.
Objectives for core statistics
Develop critical thinking and understanding of decision-making.
Empower students with:
technical mastery
confidence that they can think about data and evidence
Critical thinking: Ability to evaluate an argument (among other things)
Argument: Reasoning to demonstrate the truth/falsity/credibility of a hypothesis.
Hypothesis: A statement that might or might not be true. Examples:
We should treat you for … CHD or cancer or …
The object I’m tracking is a threat, so we should fire.
Conclusion from an argument: A statement about the truth/falsity or credibility of a hypothesis.
Two forms of reasoning
I. Deductive reasoning: Reasoning from premises/assumptions to a conclusion (hypothesis truth/falsity/credibility), using accepted mechanics for creating new statements from existing statements.
Deductive reasoning and critical thinking: Know how to “stress test” an argument. Two approaches: i. Check the mechanics for correctness. ii. Challenge the premises.
II. Inductive reasoning: Reasoning from observations/evidence to conclusion. This is often called statistical inference.
Classical definition of statistical inference: “The process of using a sample to infer the properties of a population.” But this limited context has little or nothing to do with decision-making.
Methods of calculation:
Data from an observational study or experiment
Two settings
Confidence intervals: bootstrapping or probability algebra.
Hypothesis testing: enforce the null hypothesis by permutation or probability algebra.
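The two simulation-based methods named above can be sketched in a few lines of base R. This is a minimal illustration, not the course's own code: the data are synthetic, and the names `y`, `group`, and `diff_means` are made up for the example.

```r
set.seed(101)

# Hypothetical data (not from the notes): a numeric response in two groups.
group <- rep(c("A", "B"), each = 30)
y <- c(rnorm(30, mean = 10, sd = 2), rnorm(30, mean = 11, sd = 2))

# Sample statistic: difference in group means (B minus A).
diff_means <- function(y, group) {
  mean(y[group == "B"]) - mean(y[group == "A"])
}
observed <- diff_means(y, group)

# Bootstrapping: resample the rows with replacement and recompute the
# statistic; the middle 95% of the results is a percentile interval.
boot_stats <- replicate(2000, {
  idx <- sample(seq_along(y), replace = TRUE)
  diff_means(y[idx], group[idx])
})
conf_int <- quantile(boot_stats, c(0.025, 0.975))

# Permutation: shuffling the group labels enforces the null hypothesis
# that group membership is unrelated to y.
perm_stats <- replicate(2000, diff_means(y, sample(group)))

# Two-sided p-value: share of shuffled statistics at least as extreme
# as the observed one.
p_value <- mean(abs(perm_stats) >= abs(observed))

conf_int
p_value
```

The "probability algebra" route would replace both `replicate()` loops with a formula-based calculation such as `t.test(y ~ group)`.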
Classically strained definition of confidence intervals: What does the interval have to do with the population? It’s a statement about future, potential samples from the population: “In 95% of these future trials, the confidence interval computed from the sample will contain the population parameter.”
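The "future trials" reading can be made concrete by simulation. The sketch below assumes a normal population with a known mean (values chosen arbitrarily for illustration), draws many samples, and counts how often the usual 95% t-interval from `t.test()` contains that mean.

```r
set.seed(202)

# Hypothetical population (an assumption for this sketch).
true_mean <- 50
true_sd <- 10
n_trials <- 1000
sample_size <- 25

# Each trial: draw a fresh sample, compute its 95% interval, and record
# whether the interval contains the population mean.
covered <- replicate(n_trials, {
  s <- rnorm(sample_size, mean = true_mean, sd = true_sd)
  ci <- t.test(s, conf.level = 0.95)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]
})

# Fraction of trials whose interval covered the population mean.
coverage <- mean(covered)
coverage
```

The coverage fraction should land close to 0.95. Note that it describes the procedure across many samples, not any single interval.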
Stat 101 Stress tests: check mechanics of calculation, check random sampling/random assignment. Awareness of sources of bias: survival, non-response, placebo, Hawthorne, …
Broader framework for inductive reasoning
Based on an observation, compare competing hypotheses and the decisions that follow from each.
Simplification for Math 300: based on an observation, compare two hypotheses and the resulting decision.
Initial context: Screening tests.
Examples of hypothesis pairs:
threat vs not a threat
disease vs no disease
We observe a test result and, based on it, prefer one of the hypotheses and the corresponding action.
Background: We collected lots of data (“training data”) for which we know both which hypothesis was correct and the output from the test.
Code
cat("Training data")
Training data
Code
head(Framingham)
# A tibble: 6 × 16
age education curre…¹ cigsP…² BPMeds preva…³ preva…⁴ diabe…⁵ totChol sysBP
<dbl> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 39 college_gr… nonsmo… 0 no none none not 195 106
2 46 HS grad nonsmo… 0 no none none not 250 121
3 48 some HS smoker 20 no none none not 245 128.
4 61 some colle… smoker 30 no none high BP not 225 150
5 46 some colle… smoker 23 no none none not 285 130
6 43 HS grad nonsmo… 0 no none high BP not 228 180
# … with 6 more variables: diaBP <dbl>, BMI <dbl>, heartRate <dbl>,
# glucose <dbl>, TenYearCHD <dbl>, sex <chr>, and abbreviated variable names
# ¹currentSmoker, ²cigsPerDay, ³prevalentStroke, ⁴prevalentHyp, ⁵diabetes
Code
cat("Model")
Model
Code
mod <- glm(TenYearCHD ~ age + diabetes + totChol, data = Framingham, family = binomial)
Scores <- model_eval(mod, interval = "none") |>
  mutate(disease = ifelse(.response == 1, "D", "H"))
Using training data as input to model_eval().
Code
cat("Scores from training data.\n")
Scores from training data.
Code
head(Scores)
.response age diabetes totChol .output .resid disease
1 0 39 not 195 0.06008126 -0.06008126 H
2 0 46 not 250 0.10538177 -0.10538177 H
3 0 48 not 245 0.11906918 -0.11906918 H
4 1 61 not 225 0.25257802 0.74742198 D
5 0 46 not 285 0.11144294 -0.11144294 H
6 0 43 not 228 0.08332748 -0.08332748 H
# A tibble: 4 × 3
# Groups: disease [2]
disease test n
<chr> <chr> <int>
1 D Neg 226
2 D Pos 409
3 H Neg 2247
4 H Pos 1306
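The notes do not show how the `test` column in the table above was produced. One plausible route is to compare each model score to a cutoff and then count the combinations. The sketch below uses base R on six toy scores (rounded from the first rows of `Scores`); the cutoff `threshold` is a hypothetical value picked for illustration, not the one used in the notes.

```r
# Toy stand-in for Scores: rounded model outputs and true condition.
toy <- data.frame(
  .output = c(0.06, 0.11, 0.12, 0.25, 0.11, 0.08),
  disease = c("H", "H", "H", "D", "H", "H")
)

# Hypothetical cutoff: scores at or above it count as a positive test.
threshold <- 0.115
toy$test <- ifelse(toy$.output >= threshold, "Pos", "Neg")

# Count every disease-by-test combination, in "long" form with a column n.
counts <- as.data.frame(
  table(disease = toy$disease, test = toy$test),
  responseName = "n"
)
counts
```

Moving `threshold` up or down trades positives in the healthy group against negatives in the disease group.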
Code
# in "wide" data frame
Counts |> tidyr::pivot_wider(names_from = test, values_from = n)
# A tibble: 2 × 3
# Groups: disease [2]
disease Neg Pos
<chr> <int> <int>
1 D 226 409
2 H 2247 1306
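The row proportions of this table summarize how the test behaves within each hypothesis group. The sketch below recomputes them in base R from the four counts shown above. The names `sensitivity`, `specificity`, and `frac_pos_with_disease` are introduced here for illustration and do not come from the notes.

```r
# Counts copied from the table above (rows: true condition, columns: test).
counts <- matrix(c(226, 409,
                   2247, 1306),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(disease = c("D", "H"),
                                 test = c("Neg", "Pos")))

# Among people with the disease, the fraction who test positive.
sensitivity <- counts["D", "Pos"] / sum(counts["D", ])

# Among healthy people, the fraction who test negative.
specificity <- counts["H", "Neg"] / sum(counts["H", ])

# Among positive tests in the training data, the fraction with the disease.
frac_pos_with_disease <- counts["D", "Pos"] / sum(counts[, "Pos"])

round(c(sensitivity = sensitivity,
        specificity = specificity,
        frac_pos_with_disease = frac_pos_with_disease), 3)
```

With these counts the first two fractions are about 0.64 and 0.63, while only about 0.24 of the positive tests in the training data belong to the disease group.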
Code
# as the canonical table
Iconic_graph <- Scores |>
  ggplot() +
  geom_mosaic(aes(x = product(disease), fill = test))
Iconic_graph
Warning: `unite_()` was deprecated in tidyr 1.2.0.
ℹ Please use `unite()` instead.
ℹ The deprecated feature was likely used in the ggmosaic package.
Please report the issue at <https://github.com/haleyjeppson/ggmosaic>.