Resources for the Workshop and Teaching
Background: The core stats course at USAFA is Math 300. The course had been in poor shape for many years, as department priorities focused on developing a data science program.
- In April 2022, the department invited me to redesign the course from scratch, subject to the constraint that all course materials be available by the start of August.
Textbooks
As a provisional step, we used the textbook by Chester Ismay & Albert Kim, Statistical Inference via Data Science: A ModernDive into R and the Tidyverse. It is available free online.
The Fall 2022 course (starting in August) covered all three “Blocks” of ModernDive:
I. Data Science with tidyverse (ModernDive, 11/40 sessions)
   1. Getting started with data in R
   2. Data Visualization
   3. Data Wrangling
   4. Data Importing and “Tidy” Data
II. Data Modeling with moderndive (8/40 sessions)
   5. Basic Regression
   6. Multiple Regression
III. Statistical Inference with {infer} (ModernDive, 21/40 sessions)
   7. Sampling
   8. Bootstrapping and Confidence Intervals
   9. Hypothesis Testing
   10. Inference for Regression
In Spring 2023, we advanced one step further, replacing Block III entirely with my Lessons in Statistical Thinking.
- The online site <dtkaplan.github.io/Lessons-in-statistical-thinking/> contains the Spring 2023 book.
- Day-by-day instructor notes (for Block III) are available at <dtkaplan.github.io/Math-300Z>.
- Day-by-day summaries are at <dtkaplan.github.io/Math300-blog>
For Fall 2023, we will take one more step, replacing Block II with an expanded version of Lessons. This will be ready by July 2023.
- We will refer to some materials from the draft of the expanded Lessons.
For Spring 2024, Block I of ModernDive will also be replaced. Tentative ready date: start of November 2023, though this depends on interest.
Supporting materials
- Software available through the {math300} package. This will likely become the replacement/update for the {mosaicModel} package available on CRAN.
- Day-by-day detailed instructor notes for the Spring 2023 materials.
- Exercises and daily “worksheets.”
- Day-by-day “take-aways” on the Math300 blog site: dtkaplan.github.io/Math300blog
Computing
We won’t be using the computer much in an organized way, but you might enjoy trying out the commands used in the course.
The easiest way, particularly for those new to R, is to follow this link to open an R session in your browser: https://posit.cloud/content/6045079
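If you already have R/RStudio installed on your laptop, you need to install the {math300} package. These two commands will do it:
# Install the remotes package from CRAN (needed only once)
install.packages("remotes")
# Install {math300} from its GitHub repository
remotes::install_github("dtkaplan/math300")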
Contributed by participants
Jeff Witmer (2023). “What should we do differently in Stat 101?” Journal of Statistics and Data Science Education.
Classroom session videos (these are very rough, especially in the early classes, when I was figuring out how to record the class):
- Variation and variance (See the Notes.)
- DAGs and simulation (See the Notes.)
- Signal and noise
- Sampling variation
- Confidence intervals
- Effect size
- Prediction mechanics
- Prediction intervals
- Covariates
- Adjustment for covariates
- Confounding
- Simple causal paths
- Spurious correlation
- Experiment and random assignment
- Measuring & accumulating risk (NA)
- Constructing a classifier (NA)
- Accounting for prevalence
- Hypothesis testing (NA)
- Calculating a p-value
- False discovery (See the Notes.)