The need

You’ve heard the news. Data are now a “torrent.” We are being “inundated” and “swamped” by data. “Big data” have grown so big that the term is defined by the inability of standard professional tools such as databases to handle them. But those tools have become so powerful that a new term may be needed, perhaps “massive data,” for data that a standard laptop can now handle but that ten years ago required the resources of a large company or a government research lab.

Data are flooding every aspect of commercial, scientific, and government work. Some estimates put the potential value of using this flood effectively at trillions of dollars per year.

Part of the reason the metaphors for data are “floods,” “inundations,” and “torrents” is that new data are being generated constantly. Thomas Friedman, the well-known columnist and author on globalization, uses the word “supernova” in his recent book Thank You for Being Late: An Optimist’s Guide to Thriving in the Age of Accelerations. (A supernova is the explosion of a star, one of the most energetic events in the universe.) But why not peaceful and benevolent metaphors such as “treasures,” “riches,” or “boons”? Because the vast majority of workers in every field are ill-equipped to extract any meaning from the data. And, by and large, colleges and universities are not in a good position to train new specialists in data science, or even generalists who can make use of the specialists’ work. Undergraduate and graduate programs in data science are only now being developed. Their graduates are eagerly sought in the commercial world, so only a trickle of new faculty are emerging who can help students develop data-science skills.

The economy, scientific research, and government all need people who understand data, and the shortage is great. An often-cited report estimates that:

the United States alone faces a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts to analyze big data and make decisions based on their findings.

How can we train such people when we don’t even have the infrastructure to train teachers of data science? StatPREP seeks to be part of the answer.

The strategy

There is a large, untapped reserve of quantitatively minded college and university faculty with the background needed to begin learning data science. Many of these faculty are mathematicians like you who have been drawn into teaching statistics to fill the severe shortage of faculty trained specifically in statistics.

The StatPREP strategy is to help you learn data science while you teach statistics. Over time, you can develop solid data-science skills and the insight needed to teach them to your students. There is a benefit even in the short run: as you teach statistics, the data science you are learning can help you adopt the practices recommended for bringing statistics education in line with contemporary statistical practice.

One widely respected set of recommendations is contained in the GAISE report¹ from the American Statistical Association. Among the GAISE recommendations are these:

The wording of the last point is unfortunately vague, and there is an understandable reason for this. Too often, the “technology” used in the statistics classroom is a graphing calculator. This practice is so widespread in colleges and universities that the GAISE authors saw no practicable way to require modern computing, even though leading statistics educators clearly agree that graphing calculators are inappropriate.

This is not a new conclusion. In 2004, the Mathematical Association of America issued a report on its meetings with “partner” disciplines, which pointed to

… the unimportance of graphing calculators; very few workshop participants reported their use in disciplinary courses. Therefore, if calculators are chosen as the technology for a mathematics course, it must be understood that this is done for pedagogical reasons, not to support uses in other disciplines. (p. 7)

The StatPREP approach

Today’s faculty hardly have time for their teaching, let alone for picking up a new discipline such as data science. The StatPREP approach is to

  • encourage even small steps that can be used in existing courses,
  • provide ready access to modern computing for both instructors and their students, and
  • build classroom demos and lesson plans tailored to your teaching situation.

“Access to computing” means more than putting a screen and keyboard in front of students. It means coping with the realities of students’ lives and finances, providing a computing notation that is clear and comprehensible, and delivering all this in a framework where students can get helpful, encouraging feedback from the computer. And since the objective is to move toward teaching data science, the tools must fit in with those actually used in the data-science community.

That’s a tall order, but we believe we have a realistic plan to accomplish it.

This StatPREP workshop consists of two major components.

  • In the first, you’ll get started with computer software well suited to data science. As you learn it, you’ll be using the same technology we’ll use to provide general computing access for you and your students.
  • In the second, you’ll take on a topic from your current statistics course to see how it can be taught (better!) by using genuine data of the sort encountered in data science.

Our goal is that, by the end of the workshop, you will feel prepared and comfortable teaching that topic in your upcoming statistics course. Of course, it’s hard to achieve mastery in a 1.5-day workshop, so online and human resources will be available for you to draw on afterward.

Can we do it?

We’ve thought about this problem for many years, and we believe we have developed a practical way forward. Inevitably, though, our approach will have flaws and shortcomings. With your patience and feedback, we hope to work through them.

Accomplishing this means staying in touch with you so that we can find out what is working and what is not as you start to introduce data science into your statistics course. We need to hear from you so that we can provide the appropriate resources and services to make this undertaking successful and attractive.

A Programming Tip


  1. “Guidelines for Assessment and Instruction in Statistics Education: College Report 2016”