What do we want to get from teaching hypothesis testing?

Summary

Putting hypothesis testing into a broader, more useful framework.

Objectives for core statistics

Develop critical thinking and an understanding of decision-making.
Empower students with
1. technical mastery
2. confidence that they can think about data and evidence

Critical thinking: The ability to evaluate an argument (among other things).

Argument: Reasoning to demonstrate the truth/falsity/credibility of a hypothesis.

Hypothesis: A statement that might or might not be true. Examples:

We should treat you for … CHD or cancer or …
The object I’m tracking is a threat, so we should fire.

Conclusion from an argument: A statement about the truth/falsity or credibility of a hypothesis.

Two forms of reasoning

I. Deductive reasoning: Reasoning from premises/assumptions to a conclusion (the truth/falsity/credibility of a hypothesis), using accepted mechanics for creating new statements from existing statements.

Deductive reasoning and critical thinking: Know how to “stress test” an argument. Two approaches: i. Check the mechanics for correctness. ii. Challenge the premises.

II. Inductive reasoning: Reasoning from observations/evidence to conclusion. This is often called statistical inference.

Classical definition of statistical inference: “The process of using a sample to infer the properties of a population.” But this limited context has little or nothing to do with decision-making.

Methods of calculation:

Data from an observational study or experiment, in two settings:
1. Confidence intervals: bootstrapping or probability algebra.
2. Hypothesis testing: enforce the null hypothesis by permutation or by probability algebra.
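A minimal base-R sketch of both settings, on simulated (hypothetical) data rather than anything from these notes: a percentile bootstrap interval for a difference in means, and a permutation test that enforces the null hypothesis by shuffling the group labels. The sample sizes, the 2000 resamples, and the variable names are illustrative choices.

```r
# Hypothetical data: two simulated groups of 40 measurements each
set.seed(300)
group_a <- rnorm(40, mean = 10, sd = 2)
group_b <- rnorm(40, mean = 11, sd = 2)
observed_diff <- mean(group_b) - mean(group_a)

# 1. Confidence interval by bootstrapping:
#    resample each group with replacement and recompute the difference in means
boot_diffs <- replicate(2000, {
  mean(sample(group_b, replace = TRUE)) - mean(sample(group_a, replace = TRUE))
})
ci_95 <- quantile(boot_diffs, probs = c(0.025, 0.975))  # percentile interval

# 2. Hypothesis test by permutation:
#    enforce the null hypothesis (group labels don't matter) by shuffling the labels
values <- c(group_a, group_b)
labels <- rep(c("a", "b"), each = 40)
perm_diffs <- replicate(2000, {
  shuffled <- sample(labels)
  mean(values[shuffled == "b"]) - mean(values[shuffled == "a"])
})
# Two-sided p-value: fraction of shuffles at least as extreme as the observed difference
p_value <- mean(abs(perm_diffs) >= abs(observed_diff))

ci_95
p_value
```

The two procedures mirror each other: the bootstrap resamples the data as it is, while the permutation test resamples under a world in which the null hypothesis is true.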

The classical definition of a confidence interval is strained: what does the interval have to do with the population? It’s really a statement about future, potential samples from the population: “In 95% of these future samples, the interval calculated from the sample will contain the population parameter.”
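The “future samples” reading can be checked by simulation. This sketch assumes a hypothetical normal population with known mean 50 and standard deviation 10, draws 1000 samples of size 25, builds a 95% interval from each with base R’s `t.test()`, and records how often the interval contains the true mean.

```r
set.seed(101)
true_mean <- 50            # hypothetical population mean, known only because we simulate
n_trials  <- 1000
covered   <- logical(n_trials)

for (i in seq_len(n_trials)) {
  x  <- rnorm(25, mean = true_mean, sd = 10)   # one potential future sample
  ci <- t.test(x)$conf.int                     # 95% t-interval computed from this sample
  covered[i] <- ci[1] <= true_mean && true_mean <= ci[2]
}

# Fraction of the intervals that contain the population mean (should be near 0.95)
mean(covered)
```

The parameter never moves; it is the interval that changes from sample to sample.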

Stat 101 stress tests: check the mechanics of the calculation and check for random sampling/random assignment. Be aware of sources of bias: survival, non-response, placebo, Hawthorne, …

Broader framework for inductive reasoning

Based on an observation, compare competing hypotheses and the decisions that would result from each.

Simplification for Math 300: based on an observation, compare two hypotheses and the resulting decisions.

Initial context: Screening tests.

Examples of hypothesis pairs:

threat vs not a threat
disease vs no disease

We observe a test result and, based on it, prefer one of the hypotheses and its corresponding action.

Background: We have collected lots of data (“training data”) in which, for each case, we know both which hypothesis was correct and the output of the test.

cat("Training data")

Training data

head(Framingham)

# A tibble: 6 × 16
    age education   curre…¹ cigsP…² BPMeds preva…³ preva…⁴ diabe…⁵ totChol sysBP
  <dbl> <chr>       <chr>     <dbl> <chr>  <chr>   <chr>   <chr>     <dbl> <dbl>
1    39 college_gr… nonsmo…       0 no     none    none    not         195  106 
2    46 HS grad     nonsmo…       0 no     none    none    not         250  121 
3    48 some HS     smoker       20 no     none    none    not         245  128.
4    61 some colle… smoker       30 no     none    high BP not         225  150 
5    46 some colle… smoker       23 no     none    none    not         285  130 
6    43 HS grad     nonsmo…       0 no     none    high BP not         228  180 
# … with 6 more variables: diaBP <dbl>, BMI <dbl>, heartRate <dbl>,
#   glucose <dbl>, TenYearCHD <dbl>, sex <chr>, and abbreviated variable names
#   ¹currentSmoker, ²cigsPerDay, ³prevalentStroke, ⁴prevalentHyp, ⁵diabetes

cat("Model")

Model

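# Logistic regression for 10-year CHD (0/1) from age, diabetes status, and total cholesterol
mod <- glm(TenYearCHD ~ age + diabetes + totChol,
           data=Framingham, family=binomial)
# Evaluate the model on the training data; label known outcomes D (disease) or H (no disease)
Scores <- model_eval(mod, interval="none") |>
  mutate(disease=ifelse(.response == 1, "D", "H"))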

Using training data as input to model_eval().
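`model_eval()` is not part of base R. As a base-R sketch, assuming (as the values between 0 and 1 suggest) that its `.output` column is the fitted probability from the logistic model, the same kind of score comes from `predict(..., type = "response")`. The data below are simulated and hypothetical, not the Framingham data, and the coefficients used to generate them are invented.

```r
set.seed(1948)
n <- 500
# Hypothetical training data with the same predictor names as above
sim <- data.frame(
  age      = round(runif(n, 32, 70)),
  diabetes = sample(c("not", "diabetic"), n, replace = TRUE, prob = c(0.9, 0.1)),
  totChol  = round(rnorm(n, 237, 44))
)
# Invented "true" risk, used only to generate 0/1 outcomes
true_logit <- -6 + 0.07 * sim$age + 0.7 * (sim$diabetes == "diabetic") + 0.002 * sim$totChol
sim$TenYearCHD <- rbinom(n, size = 1, prob = plogis(true_logit))

mod_sim <- glm(TenYearCHD ~ age + diabetes + totChol, data = sim, family = binomial)

# The score: fitted probability of CHD, on the 0-1 scale
sim$score <- predict(mod_sim, type = "response")
# Same thing by hand: the logistic function applied to the linear predictor
by_hand <- plogis(predict(mod_sim, type = "link"))
all.equal(unname(sim$score), unname(by_hand))
head(sim)
```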

cat("Scores from training data.\n")

Scores from training data.

head(Scores)

  .response age diabetes totChol    .output      .resid disease
1         0  39      not     195 0.06008126 -0.06008126       H
2         0  46      not     250 0.10538177 -0.10538177       H
3         0  48      not     245 0.11906918 -0.11906918       H
4         1  61      not     225 0.25257802  0.74742198       D
5         0  46      not     285 0.11144294 -0.11144294       H
6         0  43      not     228 0.08332748 -0.08332748       H

cat("Picking a threshold and counting\n")

Picking a threshold and counting

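my_threshold <- 0.15
# A score (.output) above the threshold counts as a positive test
Scores <- Scores |>
  mutate(test = ifelse(.output > my_threshold, "Pos", "Neg"))
# Number of people in each combination of disease status and test result
Counts <- Scores |>
  group_by(disease, test) |>
  tally()

# Distribution of scores within each disease group
ggplot(Scores, aes(x=disease, y=.output)) +
  geom_jitter(alpha=.02, width=.1, height=0) +
  geom_violin(alpha=.2, color=NA, fill="blue")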
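The thresholding step does not depend on dplyr. As a base-R sketch, here it is applied to the six scores printed above (rounded to four places) with the same cut-off of 0.15; base R’s `table()` plays the role of `group_by()` plus `tally()`.

```r
# The six scores and known statuses from head(Scores) above
scores  <- c(0.0601, 0.1054, 0.1191, 0.2526, 0.1114, 0.0833)
disease <- c("H", "H", "H", "D", "H", "H")

my_threshold <- 0.15
# A score above the threshold is a positive test result
test <- ifelse(scores > my_threshold, "Pos", "Neg")
# Cross-tabulate disease status against test result
counts <- table(disease, test)
counts
```

Only person 4, the one with CHD, scores above 0.15, so this small table has one true positive and five true negatives.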

cat("Apply threshold and count")

Apply threshold and count

# Counts was computed above with my_threshold = 0.15
Counts

# A tibble: 4 × 3
# Groups:   disease [2]
  disease test      n
  <chr>   <chr> <int>
1 D       Neg     226
2 D       Pos     409
3 H       Neg    2247
4 H       Pos    1306

# in "wide" data frame
Counts |>
  tidyr::pivot_wider(names_from=test, values_from=n)

# A tibble: 2 × 3
# Groups:   disease [2]
  disease   Neg   Pos
  <chr>   <int> <int>
1 D         226   409
2 H        2247  1306
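Adding row and column totals to the wide table makes explicit the denominators used by the proportions that follow. A base-R sketch, with the counts typed in from the table above:

```r
# Counts at threshold 0.15: rows = disease status, columns = test result
counts <- as.table(matrix(c(226,  409,
                            2247, 1306),
                          nrow = 2, byrow = TRUE,
                          dimnames = list(disease = c("D", "H"),
                                          test    = c("Neg", "Pos"))))
# Add row sums, column sums, and the grand total (labelled "Sum")
with_totals <- addmargins(counts)
with_totals
```

The row totals (635 with CHD, 3553 without) and the grand total (4188) are the denominators behind sensitivity, specificity, and prevalence.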

# as the canonical ("iconic") graph: a mosaic plot, via geom_mosaic() from ggmosaic
Iconic_graph <- Scores |>
  ggplot() +
  geom_mosaic(aes(x=product(disease), fill=test))
Iconic_graph

Warning: `unite_()` was deprecated in tidyr 1.2.0.
ℹ Please use `unite()` instead.
ℹ The deprecated feature was likely used in the ggmosaic package.
  Please report the issue at <https://github.com/haleyjeppson/ggmosaic>.

Apply loss function to evaluate threshold

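# Costs follow the row order of Counts: (D, Neg), (D, Pos), (H, Neg), (H, Pos).
# A false negative costs 10, a false positive costs 1; correct results cost 0.
Counts$cost <- c(10, 0, 0, 1)
Counts |> ungroup() |> summarize(total_cost = sum(n * cost))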

# A tibble: 1 × 1
  total_cost
       <dbl>
1       3566
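The loss function can also choose the threshold: compute the total cost for each candidate threshold and keep the cheapest. This sketch assumes the same costs as above (10 per false negative, 1 per false positive, 0 for correct results). The helper `total_cost()` is hypothetical, and the scores are simulated stand-ins for the model output, so the best threshold it finds says nothing about the Framingham model.

```r
# Hypothetical helper: total cost of classifying `scores` with a given threshold
total_cost <- function(scores, disease, threshold, cost_fn = 10, cost_fp = 1) {
  test <- ifelse(scores > threshold, "Pos", "Neg")
  n_fn <- sum(disease == "D" & test == "Neg")   # false negatives
  n_fp <- sum(disease == "H" & test == "Pos")   # false positives
  cost_fn * n_fn + cost_fp * n_fp
}

# Sanity check with the counts above: 10 * 226 false negatives + 1 * 1306 false positives
10 * 226 + 1 * 1306   # 3566, matching the total above

# Simulated scores: people with the disease tend to score higher
set.seed(2023)
disease <- rep(c("D", "H"), times = c(150, 850))
scores  <- c(rbeta(150, 3, 8), rbeta(850, 2, 12))

# Evaluate a grid of thresholds and keep the one with the lowest total cost
thresholds <- seq(0.02, 0.50, by = 0.01)
costs <- sapply(thresholds, function(t) total_cost(scores, disease, t))
best  <- thresholds[which.min(costs)]
best
```

Changing `cost_fn` and `cost_fp` changes which threshold wins; that is the point of stating the loss function explicitly.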

False-positives, false-negatives, etc.

Sensitivity, specificity, and prevalence

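# Within each disease group, the proportion of each test result:
# P(Pos | D) is the sensitivity and P(Neg | H) is the specificity
Counts |> group_by(disease) |>
  mutate(prob = n/sum(n))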

# A tibble: 4 × 5
# Groups:   disease [2]
  disease test      n  cost  prob
  <chr>   <chr> <int> <dbl> <dbl>
1 D       Neg     226    10 0.356
2 D       Pos     409     0 0.644
3 H       Neg    2247     0 0.632
4 H       Pos    1306     1 0.368
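The heading also mentions prevalence, which the grouped proportions above do not show. A base-R sketch computing it from the same counts (threshold 0.15), along with the predictive values, and checking that Bayes’ rule recovers the positive predictive value from sensitivity, specificity, and prevalence:

```r
# Cell counts from the tables above (threshold 0.15)
TP <- 409    # D, Pos: true positives
FN <- 226    # D, Neg: false negatives
TN <- 2247   # H, Neg: true negatives
FP <- 1306   # H, Pos: false positives

sensitivity <- TP / (TP + FN)                   # P(Pos | D)
specificity <- TN / (TN + FP)                   # P(Neg | H)
prevalence  <- (TP + FN) / (TP + FN + TN + FP)  # P(D) in the training data
ppv <- TP / (TP + FP)                           # P(D | Pos)
npv <- TN / (TN + FN)                           # P(H | Neg)

# The same PPV via Bayes' rule from sensitivity, specificity, and prevalence
ppv_bayes <- sensitivity * prevalence /
  (sensitivity * prevalence + (1 - specificity) * (1 - prevalence))

round(c(sensitivity = sensitivity, specificity = specificity,
        prevalence = prevalence, ppv = ppv, npv = npv), 3)
```

With these counts the prevalence is about 0.15 and the positive predictive value only about 0.24: most positive results come from people without CHD, because the group without CHD is more than five times larger.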
