A First-day Challenge

Randall Pruim at Calvin College (Michigan) has a compelling first-day challenge for students: to uncover and understand a surprising pattern in day-by-day births. In this post, we’ll re-use Randy’s challenge to organize a first-day class on the question “What is statistics?”

Daniel Kaplan https://dtkaplan.github.io
01-18-2020

Welcome to Stats 101

The main questions in stats are:

Let’s start with some data on the number of births each day in the US.1

Here’s what the data look like, shown in standard format for storing data, much like a spreadsheet.

  1. Scroll through the table and try to make some meaning of the numbers. (Hint: It’s hard.)

In statistics you’ll learn about different ways of displaying and summarizing data that makes it easier to see information. Here’s one simple display of just the second column from the data, the number of births each day.

It’s easy to see some interesting things from the graphic than from the table, but hard to see others. For instance, the table makes it clear that our data cover only 100 days. But you would be hard pressed to figure this out from the graph above.

  1. There are some clear patterns shown by the graphic about day-to-day variation in the number of births. Give two examples of such patterns.

As I said before, in this statistics class you’re going to learn some techniques for making sense of data. Using a graph like the above is one such technique. But we can improve on it by using other techniques.

One is called jittering, which randomly moves the dots a small amount horizontally so that you can more easily see how many there are. Another technique is to make the dots somewhat transparent.

  1. What does this new graph show that you couldn’t see clearly in the previous graph?

Making sense of data requires that we connect it in some what to what we are interested in finding out.

Perhaps an obvious question raised by these data is: Why are there lots of days with about 12,000 births, and quite a lot with about 8,000 births, but none around 10,000 births?

  1. Think of a real-world explanation for the bi-modality of the data.

[I’ll skip the discussion intended to lead students to suspect that there’s another factor involved. Perhaps there is a reduction of births in Fall to prepare for winter.]

It’s often the case that we seek to explain the variation in one variable by using another variable. A first step is to put that other variable, the explanatory variable, on the graph. We’ll put it on the X axis, leaving the Y axis to represent the quantity we’re trying to explain, which we call the response variable.

Time of year is an obvious explanatory variable, so let’s introduce that into our graph.

A tool for exploring data

Making sense of data often involves a cycle:

  1. Look at the data in some way that you think might be informative.
  2. Consider the pattern shown by the data and figure out what makes sense and what remains mysterious to you.
  3. Think about what might explain or illuminate the mystery.
  4. Figure out how you might display the data in order to produce a better explanation. Then, back to (a).

The description of the cycle uses only everyday words. But there will be a lot to learn. For instance:

You’re going to be working with a tool for displaying data and pattern s that will let you explore data as you think best. The tool is oriented around learning statistics so it includes features to explore some basic questions:

The tool is called a “Little App.” You run it in an ordinary web browser, just by going to this site.

I’m going to show you how it works and then you’ll be all set up to use it.

  1. Starts with a splash page of instructions and explanations. You should read these. I’m going to be summarizing it now, so we’ll just dismiss the splash page by clicking outside it.
  2. We start on a tab called “Data.” This is where you choose a source of data and the variables you want to look at. For fun, the data set and the variables are selected at random at first, but you choose the ones you want.
  3. Since the amount of data is an important consideration in statistics, there’s a control to let you look at just a little data or a lot of data.
  4. The graphic always shows both the data and a pattern. You have some control over the type of pattern that’s being displayed. We’ll learn more about this later.

  1. The Centers for Disease Control publishes data on each of the roughly 4,000,000 births that happen in the US each year. We’ll look at that data later in the course. It’s very detailed, and for privacy reasons they only tell you the month of birth, not the day of the month. But before they implemented that privacy policy, you could find the actual day for each birth. The data we have are from 2015.↩︎