Intro Stats … Rethought

May 30-31, 2023 at USCOTS

Remembering George Cobb (1947-2020)

Photo from the AMSTAT News remembrance.

I met George at the very first USCOTS meeting, in 2005.

As part of learning statistics myself in the years after I was assigned to teach it, I had perused his Design and Analysis of Experiments, which started me thinking about whether students could learn statistics more efficiently by learning about orthogonality and subspaces, topics that can be taught at the high-school level. This became an important theme of my work through 2010.

He had been an anonymous reviewer both for my promotion to professor at Macalester and for a “data fluency” grant proposal I wrote to the Keck Foundation. So he knew me, but I didn’t know anybody.

George gave the keynote at the 2005 banquet, “The Introductory Statistics Course: A Ptolemaic Curriculum?” This talk legitimized for statistics educators the resampling/bootstrapping/permutation approach to statistics, whence flowed books like Lock-5 and the Tintle group’s. The after-dinner talk was transformed into a paper: the first article in the first volume of Rob Gould’s new journal Technology Innovations in Statistics Education. He starts that paper this way:

“The founding of this journal recognizes the likelihood that our profession stands at the threshold of a fundamental reshaping of how we do what we do, how we think about what we do, and how we present what we do to students who want to learn about the science of data. … Our generation is the first ever to have the computing power to rely on the most direct approach [brute force], leaving the hard work of implementation to obliging little chunks of silicon.”

[Consequently,] “we can and should rethink the way we present the core ideas of inference to beginning students. … I argue that what we teach has always been shaped by what we can compute. … Computers have freed us to put our emphasis on things that matter more …. the tyranny of the computable has shaped the development of the logic of inference, and forced us to teach a curriculum in which the most important ideas are often made to seem secondary. … [C]omputers have freed us to simplify our curriculum.”

In 2015, George wrote a paper that I quoted in the abstract for this workshop, “Mere Renovation is Too Little Too Late: We Need to Rethink our Undergraduate Curriculum from the Ground Up.” I like the title especially.

In my 2005 address at USCOTS, I argued that the standard introductory course, which puts the normal distribution at its center, had outlived the usefulness of its centrality. … [T]he argument I presented at USCOTS was considered by some to be outside the mainstream, even radical. In the decade since then, I have come to regard my position in 2005 not as radical, but as far too conservative. Modern statistical practice is much broader than is recognized by our traditional curricular emphasis on probability-based inference.

Fortunately, Nicholas Horton and his colleagues have given us a well-researched and comprehensive kick-start in the form of a new set of Guidelines (ASA 2014). These curricular guidelines (hereafter Horton report) recognize the seismic shift taking place beneath our feet: ‘The additional need to think with data—in the context of answering a statistical question—represents the most salient change since the prior guidelines were approved in 2000. Adding these data science topics to the curriculum necessitates developing … capacities that complement more traditional mathematically oriented skills.’ These guidelines were appropriately constrained by a sense of what might realistically be expected in the near future. Realistic thinking has its virtues, but my premise is that long term there is also value to be found in more ambitious speculation.

[O]ur thinking about the undergraduate curriculum has become a tear-down, an aging structure that fails to take good advantage of the valuable territory on which it sits, and so imposes a steep opportunity cost on our profession and on our students.

Goals for the workshop

Setting: Per George’s metaphor, the Stat 101 property consists of an “aging structure” and the grounds on which that house stands.

  • We’re all familiar with the rooms in the house.
  • We don’t usually think about the grounds. These grounds are the established consensus that college students in diverse fields should be required to take a statistics course.

Goal 1: Introduce you to a model house with modern features and a more open plan.

Goal 2: Establish that the house can be built within the bounds of the property lines (1-semester course), within the budget (what students can reasonably be expected to assimilate), and within the capabilities of the general contractor (you!).

Your turn. What features would you like to see?

  1. Individually, write down 5-10 words or phrases that you think should be central to a statistics course. If some of your words seem highly likely to appear on others’ lists as well, write those down separately, so that we get a good diversity of words.

We’ll collect these on the blackboard and discuss collectively.

  2. Divide into groups of three or four people. Ideally, you shouldn’t know the others in your group.

    1. Introduce yourselves. Prepare to introduce one of the other people in your group to the entire assembly. After we’ve made these introductions for everybody, proceed to …

    2. In collaboration with your group colleagues, assemble two written lists:

      1. 10-15 topic names that you would be proud to feature in an intro stats course.
      2. 5-10 topics that are commonly taught but which you find somewhat embarrassing.

Foundations for my house

  1. Variation is the central topic for statistics.
  2. Data, its proper organization, basic wrangling (“manipulation”).
  3. Data graphics, with the point/jitter plot (“scatterplot”) as the central framework.
  4. Precision vs accuracy, with both being important and demonstrable.
  5. Adjustment, confounding and causality.
  6. Placing hypotheses in competition.
  7. Contemporary relevant settings, e.g. risk factors, medical screening, decision-making.

How do the collected topics from the groups map to these foundations?
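As one illustration of foundation 4, precision and accuracy really are demonstrable by simulation alone. The sketch below is mine, not from the workshop materials: it draws many simulated samples (in Python, using NumPy, with a made-up true mean and spread) and shows that the sample means center on the truth (accuracy) while their spread shrinks as the sample size grows (precision).

```python
# A sketch (not from the workshop) of demonstrating precision vs accuracy
# by simulation: draw many samples, look at where the sample means land.
import numpy as np

rng = np.random.default_rng(0)
TRUE_MEAN = 10.0   # hypothetical "true" value, known because we simulate

def sample_means(n, reps=1000):
    """Means of `reps` simulated samples of size n."""
    return rng.normal(TRUE_MEAN, 5, size=(reps, n)).mean(axis=1)

small = sample_means(25)    # small samples
large = sample_means(400)   # large samples

# Accuracy: both collections of means center near TRUE_MEAN.
# Precision: the large-sample means are far less spread out.
print("n=25:  center", small.mean(), " spread", small.std())
print("n=400: center", large.mean(), " spread", large.std())
```

No theory is invoked: students can see both properties directly in the printed centers and spreads, which is the point of justifying methods by demonstration.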

Design principles

How to keep the design light and affordable:

  1. Small software footprint: few enough operations that students can memorize the whole list. Consistent software notation.
  2. Small graphics footprint: a single graphics frame, point plot, functions, violins, pred./conf. bands.
  3. Small analysis footprint and consistent methodological framework (regression).
    • y ~ a + b, where a and b can be either categorical or quantitative.
    • z ~ a + b, where z is a zero-one variable
    • graphs of model functions, coefficients, and confidence intervals/bands.
  4. Avoid math-stats theory. Justify methods by demonstration/simulation, remove fictitious entities (e.g. “population”). Provide a simulation engine.
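To make the small analysis footprint concrete, here is a sketch of mine (not part of the workshop materials) using Python's statsmodels formula interface, which mirrors the `y ~ a + b` notation above. The data frame, variable names, and simulated mechanism are all invented for illustration; the point is that one formula pattern covers a quantitative response, a zero-one response, and the coefficient/confidence-interval outputs.

```python
# A sketch of the "small analysis footprint": the same formula notation
# handles y ~ a + b (quantitative response) and z ~ a + b (zero-one response).
# Data are simulated from a known mechanism -- the "simulation engine" idea.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "a": rng.normal(size=n),                      # quantitative explanatory variable
    "b": rng.choice(["ctrl", "treat"], size=n),   # categorical explanatory variable
})
# Known mechanism: slope 2 on a, offset 1.5 for the "treat" group.
df["y"] = 2 * df["a"] + np.where(df["b"] == "treat", 1.5, 0.0) + rng.normal(size=n)
df["z"] = (df["y"] > 0).astype(int)               # zero-one version of the response

lm = smf.ols("y ~ a + b", data=df).fit()          # y ~ a + b
glm = smf.logit("z ~ a + b", data=df).fit(disp=0) # z ~ a + b, z zero-one

print(lm.params)      # coefficients recover the simulated mechanism
print(lm.conf_int())  # confidence intervals, ready to graph as bands
print(glm.params)
```

Because the mechanism is known, the fitted coefficients can be checked against it directly, which is the demonstration-over-theory justification that design principle 4 calls for.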