Preface (to the printed 2nd edition)

The purpose of this book is to provide an introduction to statistics that gives readers a sufficient mastery of statistical concepts, methods, and computations to apply them to authentic systems. By “authentic,” I mean the sort of multivariable systems often encountered when working in the natural or social sciences, commerce, government, law, or any of the many contexts in which data are collected with an eye to understanding how things work or to making predictions about what will happen.

The world is uncertain and complex. We deal with the complexity and uncertainty with a variety of strategies including the scientific method and the discipline of statistics.

Statistics deals with uncertainty, quantifying it so that you can assess how reliable – how likely to be repeatable – your findings are. The scientific method deals with complexity: reduce systems to simpler components, define and measure quantities carefully, do experiments in which some conditions are held constant but others are varied systematically.

Beyond helping to quantify uncertainty and reliability, statistics provides another great insight of which most people are unaware. When dealing with systems involving multiple influences, it is possible and best to deal with those influences simultaneously. By appropriate data collection and analysis, the confusing tangle of influences can sometimes be straightened out. In other words, statistics goes hand-in-hand with the scientific method when it comes to dealing with complexity and understanding how systems work.

The statistical methods that can accomplish this are often considered advanced: multiple regression, analysis of covariance, logistic regression, among others. With appropriate software, any method is accessible in the sense of being able to produce a summary report on the computer. But a method is useful only when the user has a way to understand whether the method is appropriate for the situation, what the method is telling them about the data, and what the method is not capable of revealing. Computer scientist Richard Hamming (1915-1998) said: “The purpose of computing is insight, not numbers.” Without a solid understanding of the theory that underlies a method, the numbers generated by the computer may not give insight.

Advanced methods of statistics can give tremendous insight. For this reason, these methods need to be accessible both computationally and theoretically to the widest possible audience. Historically, access has been limited because few people have the algebraic skills needed to approach the methods in the way they are usually presented. But there are many paths to understanding and I have undertaken to find one – the “fresh approach” in the title – that takes the greatest advantage of the actual skills that most people already have in abundance.

In trying to meet that challenge, I have made many unconventional choices. Theory becomes simpler when there is a unified framework for treating many aspects of statistics, so I have chosen to present just about everything in the context of models: descriptive statistics as well as inference.

Consequently, algebraic notation and formulas are strongly de-emphasized in this book. The traditional role that formulas have played in providing instructions for how to carry out a calculation is no longer essential for effective use of statistical methods. Software now implements the calculations. What’s needed is not a formula-based description that allows people to reproduce what computers do, but a way to understand the methods at a high level so that the rapidity and reliability of computers in performing calculations can be used to provide insight into real-world problems.

I have been fortunate to have the assistance and support of many people. Some of the colleagues who have played important roles are David Bressoud, George Cobb, Dan Flath, Tom Halverson, Gary Krueger, Weiwen Miao, Phil Poronnik, Victor Addona, Alicia Johnson, Karen Saxe, Michael Schneider, and Libby Shoop. Critical institutional support was given by Brian Rosenberg, Jan Serie, Dan Hornbach, Helen Warren, and Diane Michelfelder at Macalester and Mercedes Talley at the Keck Foundation.

I received encouragement from many in the statistics education community, including George Cobb, Joan Garfield, Dick De Veaux, Bob delMas, Julie Legler, Milo Schield, Paul Alper, Dennis Pearl, Jean Scott, Ben Hansen, Tom Short, Andy Zieffler, Sharon Lane-Getaz, Katie Makar, Michael Bulmer, Frank Shaw, and the participants in our monthly “Stat Chat” sessions. Helpful suggestions came from Simon Blomberg, Dominic Hyde, Michael Lavine, Erik Larson, Julie Dolan, and Kendrick Brown. Michael Edwards helped with proofreading. Nick Trefethen and Dave Saville provided important insights about the geometry of fitting linear models.

It’s important to recognize the role played by the developers of the R software – the “core” R team as well as the group of volunteers who have provided numerous packages that extend R’s capabilities. Hadley Wickham, in particular, developed the ggplot2 package used to create many of the graphics in this Second Edition, as well as a remarkable array of other utilities for treating data in a unified way. The design of R (and of its progenitor, S) is not just a matter of good software design, but of a brilliant understanding and systematization of statistics that makes the underlying logic of statistics accessible to students as well as experts. Further extending the reach of R, J.J. Allaire, Joe Chang, and Joshua Paulson have created the RStudio interface to R, which makes it much easier to teach and learn with R.

Special thanks are due to Randall Pruim and Nicholas Horton who, as mosaic activists, have improved the extensions to R used in this book and provided a wide range of suggestions that have found their way into the Second Edition.

Thanks also go to the hundred or so students at Macalester College who enrolled in the early, experimental sessions of Math 155 where many of the ideas in this book were first tested. Among those students, I want to acknowledge particular help from Alan Eisinger, Caroline Ettinger, Bernd Verst, Wes Hart, Sami Saqer, and Michael Snavely. Approximately 500 Macalester students have used the First Edition of this book, many of whom have helped identify errors and suggested clarifications and other improvements.

Crucial early support for this project was provided by a grant from the Howard Hughes Medical Institute. A Keck Foundation grant was important to the continuing refinement of the approach and the writing of this book. Google provided summer-of-code funding for my student Andrew Rich to develop interactive applets that can be used along with this book.

Finally, my thanks and love to my wife, Maya, and daughters, Tamar, Liat, and Netta, who endured the many, many hours during which I was preoccupied by one or another statistics-related enthusiasm, challenge, or difficulty.