Confidence in Confidence Intervals

… or, Why Confidence Intervals should come at the start of the course.

Why do we make confidence intervals share space with hypothesis testing?

Confidence intervals were invented by Jerzy Neyman in the 1930s. Ronald Fisher disdained them, considering them too close to Bayesian reasoning.

From the beginning, seeking to distinguish the logic of confidence intervals from “inverse probability,” Neyman framed the interpretation of CIs using the same frequentist formulation that is still taught today and that remains incomprehensible to the large majority of students, except perhaps in the form of a statistics catechism that is soon forgotten.

This formulation adds nothing to the understanding of CIs except to make them compatible with the frequentist dogma that a population parameter is not subject to a probabilistic interpretation.

True or False?

We should teach students to interpret sample statistics in the context of a confidence interval.

A conventional Stats 101 covers certain sample statistics early on: mean, median, IQR, standard deviation, quantiles, … It’s unconventional (but not incorrect) to put confidence intervals on the median, the sd, the IQR, quantiles, p-values, and so on.

Insofar as confidence intervals are defined in terms of repeated trials and comparison to a population parameter, there’s some sense in leaving them to emerge in tandem with hypothesis testing, particularly since (in many settings) the formulas and concepts involved—e.g. t*, standard errors—are cognates.

What if …?

Non-statisticians generally misinterpret confidence intervals, failing to hew to the frequentist dogma about population parameters and instead asking a Bayes-like question: how much do I know about this estimate?

Let’s treat this misinterpretation with respect: evidently it helps science get done, and it’s hard to imagine what the frequentist dogma would add to their work.

WHAT IF we introduced confidence intervals simply as a measure of precision, without reference to a population parameter? What steps would make sense for such an introduction?

Classrooms are an excellent setting for demonstrating sampling variation. For instance, each student can flip a coin 10 times and the results can be collected across the class.
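
A minimal sketch in base R of that classroom exercise, assuming a class of 30 students (an arbitrary number) flipping fair coins:

```r
# Each of 30 students flips a fair coin 10 times; collect the head counts.
set.seed(101)                              # any seed will do
heads <- rbinom(30, size = 10, prob = 0.5) # one head-count per student
heads                                      # the class's results
table(heads)                               # a quick tally
```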

What’s missing from the M&M and coin-flipping simulations is the recording of data and any non-trivial analysis of that data. (Count the heads!) Some may find this a good feature. I prefer to use DAGs.
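
Packages specify DAG-style simulations in different ways, so here is a package-free sketch of the idea in base R: a two-node generative model whose output is a recorded data frame, giving students something to analyze beyond counting heads. The variable names and the coefficient are invented for illustration.

```r
# A tiny DAG-style simulation: x is an exogenous node, y is a child of x.
run_sim <- function(n) {
  x <- rnorm(n)             # exogenous node
  y <- 0.7 * x + rnorm(n)   # child node: signal plus noise
  data.frame(x, y)          # the recorded data
}
head(run_sim(5))            # five simulated rows
```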

  1. Demonstrate sampling variation. Using simulation to introduce sampling variation is well regarded, even in Stats 101. M&Ms are tastier than DAGs, but the two are essentially the same kind of simulation.

  2. Quantify the amount of sampling variation. The variance of the students’ trials is all you need.

The practical advantage of constructing a sampling distribution rather than calculating the variance is that no arithmetic is required: just make a dot plot of the students’ results.
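
Continuing the coin-flip sketch above, the variance takes one line and the no-arithmetic alternative is a dot plot; stripchart() appears here only because it is built into base R.

```r
var(heads)                                  # step (2): the amount of sampling variation
stripchart(heads, method = "stack", pch = 16,
           xlab = "Heads in 10 flips")      # the dot-plot, no-arithmetic version
```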

But, given the ubiquity of cell phones, it’s entirely feasible to have students enter their individual results into an online spreadsheet. The instructor can then read the spreadsheet into her computer and perform the calculation.

What fraction of instructors know how to set this up? Shouldn’t we provide sufficient training for instructors so that they can?
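
One possible way to set it up, assuming the class enters results into a publicly shared Google Sheet and the instructor uses the googlesheets4 package; the URL and the column name `heads` are placeholders, not a prescription.

```r
library(googlesheets4)
gs4_deauth()                     # a public sheet, so no login is required
class_results <- read_sheet("https://docs.google.com/spreadsheets/d/PLACEHOLDER")
var(class_results$heads)         # assumes the sheet has a column named "heads"
```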

  3. Repeat (2) for several different sample sizes and establish the pattern: the variance of the sampling variation as a function of \(n\). Using the variance simplifies this, since it is proportional to \(1/n\), with no square roots.

We want to use a wide range of \(n\). DAG-like simulations make this easier, avoiding the tedium of flipping a coin 100 times.
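
A sketch of the pattern, with a unit-variance normal variable standing in for whatever the simulation produces; the sample sizes and the 2000 repetitions are arbitrary choices.

```r
# For each n, estimate the variance of the sample mean by simulation
# and compare it with the 1/n theory.
set.seed(202)
sizes <- c(5, 20, 80, 320)
var_of_mean <- sapply(sizes, function(n) {
  var(replicate(2000, mean(rnorm(n))))      # 2000 repeated samples of size n
})
round(cbind(n = sizes, simulated = var_of_mean, theory = 1 / sizes), 4)
```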

  4. Make it trivial to construct confidence intervals, without the distraction of t*, standard errors, and multiplication. In Lessons*, this is done by providing a model as input to conf_interval(). Demonstrate that the confidence intervals get smaller with increasing \(n\) in the same manner as in the simulations.

You don’t need the DAGs to do this. Just use a large dataset, like mosaicData::TenMileRace and sub-sample.
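
A sketch of that sub-sampling demonstration. Since not every reader will have the Lessons software at hand, confint(lm(...)) stands in for conf_interval(); the variable net (the net running time, in seconds) and the particular sample sizes are the only other assumptions.

```r
library(mosaicData)
set.seed(303)
for (n in c(25, 100, 400, 1600)) {
  samp <- TenMileRace[sample(nrow(TenMileRace), n), ]  # sub-sample n runners
  ci <- confint(lm(net ~ 1, data = samp))              # CI on the mean net time
  cat(sprintf("n = %4d   CI width = %.1f seconds\n", n, diff(as.numeric(ci))))
}
```

Quadrupling \(n\) should roughly halve the width, mirroring the \(1/n\) pattern in the variance.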

Need some theory

First, it’s more important to establish the habit of referring to confidence intervals than to establish the constant of proportionality for the one-over-n relationship.

If that constant is important to you, demonstrate it in the context of the mean of a sample of moderate size (to separate out the t* issue); the steps are listed below.

If you absolutely must do the small-sample theory, do it BACKWARDS. Show that the confidence interval, for very small \(n\), is bigger than the simple one-over-n theory implies. Then demonstrate that the sample variance itself has sampling variation and attribute the “anomaly” to that.

  1. Consider the mean of a sample of size \(n=1\). How much sampling variation is there? [Answer: across many such samples, the mean is just the single value drawn, so its sampling variance is exactly the variance of the variable itself.]

  2. Use the one-over-n theory to calculate the sampling variance of the mean for a sample of size \(n=2\): half the variance of the variable.

  3. Work that up to general \(n\).

  4. Do a demonstration that the calculated confidence interval is consistent with the \(s^2/n\) formula. But use \(n\) large enough that t* is indistinguishable from 2.
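
To make step 4 concrete, here is a sketch that reuses the TenMileRace sub-sample idea from above; the choice of \(n = 400\) is arbitrary but large enough that t* is essentially 2.

```r
library(mosaicData)
set.seed(404)
samp <- TenMileRace[sample(nrow(TenMileRace), 400), ]
m  <- mean(samp$net)
hw <- 2 * sqrt(var(samp$net) / 400)     # half-width from the s^2/n theory
c(lower = m - hw, upper = m + hw)       # the hand-built interval
confint(lm(net ~ 1, data = samp))       # the software interval, for comparison
# With very small n the two diverge: at n = 3, t* is about 4.3 and s is
# itself a noisy estimate; that is the "backwards" point made earlier.
```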