Intro Stats Rethought

A USCOTS 2023 Workshop with Danny Kaplan

In 2015, USCOTS pioneer George Cobb wrote that “never in my professional lifetime has there been such a need to rethink our curriculum from the ground up, starting necessarily with alternatives to the former consensus introductory course.” This workshop presents my own rethought introductory course and all the classroom-tested materials needed to teach it. The course emphasizes statistical methods that can guide action in the world; it engages causality, prediction, and the genuine comparison of hypotheses. The course includes an introduction to data wrangling and visualization. Regression modeling is the primary descriptive tool, streamlining and unifying settings for inference while highlighting the “multivariable thinking” emphasized in the GAISE report. The controversy over p-values is dealt with constructively, avoiding student misconceptions by putting p-values in the role of a screening test.

Quotes from George Cobb’s renovation paper

In short, few statisticians now think of the computer as merely bringing us a faster way to do the same old things. I suggest that something similar to the invention of the computer has happened only once before in the last thousand years of our history: the invention of the printing press. Initially, it would have been easy to think of the printing press as “merely bringing us a faster way to do the same old things,” in this instance a faster way to make copies of manuscripts. In hindsight, of course, we recognize that Gutenberg’s way of “doing things faster” not only led to wider distribution of the Latin Bible, but also inspired multiple translations into the vernacular, which led in turn to diminishing the role of priests as guardians of orthodoxy, and eventually to the emergence of Protestant sects. I see the same sort of thing as once happened with the printing press now happening with computing, not just in statistics, but in communication generally via the web. Much as learning Latin was once a challenging prerequisite to reading the Bible, in statistics facility with mathematics has been a prerequisite to understanding and using methods of data analysis. The select few who knew enough mathematics were a kind of priesthood. Just as movable type inspired translations that bypassed the barrier of Latin, computer software and computer-intensive methods have made statistical methods broadly available to those who are not mathematically facile, and unfamiliar with probability. Big data, bioinformatics, and analytics – varieties of computer-aided thinking – are our heresies. They rely on computers to circumvent the need for mathematics.

Making decisions

Do these texts – and do we as teachers – keep the goals of learning about the world and of making sound decisions based on data central to their discussions throughout the course? There is an inexorable tendency to focus on the definitions and methods and give shorter shrift to the larger goal. This can be especially difficult and dangerous as we discuss inference. The methods are technical, the reasoning follows many steps – too many for most of our students to remember at first – and the conclusions about the world are almost scripted. (Paul Velleman, 2008)

More from the above article:

Our best hope for changing the view of Statistics as damn lies is education. If we teach Statistics as a mechanistic muddle of magical methods, our students will conclude for themselves that it is a pack of damn lies. But we can do better:

  1. We must tell our students that to use statistics they must make judgments, and that there may be no method guaranteed to arrive at the truth. This will distress those who were hoping to just plug new numbers from the exercises and exam questions into the algorithms and formulas found in the little boxes of the textbook.

  2. We should advise students to know the motivating reason for the analysis because this will guide them in making these judgments. They should know who (or what) the cases are in the data, what has been measured or recorded about them (and in what units), and when that was done. Even definitions that sound reasonable should be questioned.

  3. We should teach that the guiding principle in making statistics judgments is a search for truth about the world. Faced with judgment calls, we make the choice that best supports our efforts to model or understand the world as it is. Where that choice isn’t clear, we make an honest attempt to make the best choice. It is fine to entertain alternative or contradictory models for as long as there are no data that allow us to choose among them.

  4. We should teach students to resist jumping to conclusions, extrapolating, and proposing explanations for associations that assume causation. And we should teach them to be skeptical of reports of Statistics that don’t meet these high standards.

  5. Most important, we must present the entire subject as a search for understanding about the world when we have data so that the other principles have a foundation to stand on.

Definitions of statistics

Statistics is a collection of procedures and principles for gathering data and analyzing information in order to help people make decisions when faced with uncertainty. (Utts and Heckard)