Chapter 25 Appendix: Data by design

DRAFT: NOTES FOR THE APPENDIX

25.1 Approaches to collecting data

https://en.wikipedia.org/wiki/Nonprobability_sampling

Focus on observational data.

  1. What population are the results to be applied to? The sampling frame is your representation of this.
    • census
    • patients, purchasers, college grads, …
      • restricting the frame this way can introduce spurious correlations (give an example with a collider in the simulation chapter)
  2. Sampling
    • simple random sampling
    • convenience sampling (not good!) – contrast it with simple random sampling when the sampling frame is a census
    • cluster sampling
    • case-control
    • longitudinal versus cross-sectional
      • prospective cohort
      • retrospective cohort
      • survival bias
  3. Experiment and orthogonality: you are part of the system!
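The contrast between convenience sampling and simple random sampling from a census frame can be made concrete with a small simulation. This is a sketch under invented assumptions:

- the census is synthetic;
- each person has a hypothetical `distance` from whoever is collecting the data;
- the measured value `y` happens to rise with distance.

A convenience sample of the 200 easiest-to-reach people badly misses the census mean. A simple random sample of the same size lands close to it.

```python
import random
import statistics

# Synthetic census (hypothetical): each person has a distance from the
# data collector and a value y that happens to increase with distance.
rng = random.Random(2019)
census = []
for _ in range(10_000):
    distance = rng.uniform(0, 100)
    y = 50 + 0.5 * distance + rng.gauss(0, 10)
    census.append((distance, y))

population_mean = statistics.fmean(y for _, y in census)

# Convenience sample: the 200 people easiest to reach (smallest distance).
convenience = sorted(census)[:200]
convenience_mean = statistics.fmean(y for _, y in convenience)

# Simple random sample: every member of the census equally likely to be chosen.
srs = rng.sample(census, 200)
srs_mean = statistics.fmean(y for _, y in srs)

print(f"census mean:      {population_mean:.1f}")
print(f"convenience mean: {convenience_mean:.1f}")
print(f"SRS mean:         {srs_mean:.1f}")
```

Nothing about the size of the convenience sample would reveal the problem. The bias comes from how the members were chosen, not from how many there are.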

Then introduce experiment as a technique for dealing with covariates and pro

  1. Common collection practices and their consequences
    1. Census (example: employment discrimination and hypothesis testing)
    2. Simple random sample
    3. Cluster sampling
    4. Case/control study (always retrospective; refer to the example about the HBC high-blood-pressure study in the discussion of classification error)
    5. Cohort (prospective?)
    6. Retrospective cohort study
    7. Experiment (refer to the fun examples from Kahneman’s Thinking, Fast and Slow)
    8. Pathetic reaching out
      • telephone polling
      • convenience samples
  2. Measuring covariates
  3. Proxies for latent variables.

Additional precautions against confounding/sampling-bias

  1. Random selection within strata. Why? It provides a kind of certificate, believable by others, that your sampling wasn’t biased.

  2. Assignment. Cut off any possible causal link X <- C between a covariate C and the explanatory variable X. You can do this by setting X yourself from a source U that is unconnected to C, so that the causal structure is U -> X, C -> Y, and possibly X -> Y. But you can fool yourself (“this patient isn’t healthy enough to be put on the experimental drug” or “this is a lost cause, so there’s no risk in trying the new drug”). So random assignment is a good way to go.
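Here is a minimal simulation of the point about assignment. Everything in it is an assumption made for illustration:

- a covariate `C` (think baseline health);
- a clinician who gives the drug to healthier patients more often, so C -> X;
- an outcome built as Y = 2C + X + noise, so the true effect of X is 1.

The difference in mean outcome between treated and untreated patients is badly inflated when C drives assignment. When a coin flip U does the assigning, the difference comes out close to 1.

```python
import random
import statistics

TRUE_EFFECT = 1.0  # assumed effect of treatment X on outcome Y

def simulate(n, randomize, seed=0):
    """Return the treated-minus-control difference in mean outcome."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        c = rng.gauss(0, 1)  # covariate C, e.g. baseline health
        if randomize:
            # U -> X: a coin flip, unconnected to C
            x = rng.random() < 0.5
        else:
            # C -> X: healthier patients are more likely to get the drug
            x = rng.random() < (0.8 if c > 0 else 0.2)
        y = 2.0 * c + TRUE_EFFECT * x + rng.gauss(0, 1)  # C -> Y and X -> Y
        (treated if x else control).append(y)
    return statistics.fmean(treated) - statistics.fmean(control)

confounded = simulate(20_000, randomize=False)
randomized = simulate(20_000, randomize=True)
print(f"clinician-assigned estimate: {confounded:.2f}")
print(f"randomly assigned estimate:  {randomized:.2f}")
```

Randomization does not make C disappear; it only severs the arrow into X. Recording C is still worthwhile for checking balance and for more precise estimates.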

Still, keep a record of covariates.

Intent-to-treat and instrumental variables approaches.

SEE MATERIALS IN 033-samples.Rmd

To avoid unnecessary and confusing abstraction, I’ll start with a specific setting: predicting whether a person will develop diabetes. Ideally, you should have a precise definition of “develop diabetes”: what constitutes diabetes, the time horizon of the prediction, and so on. There are many ways in which appropriate data can be collected. The different ways are called study designs.

One possible study design is a prospective cohort study. A cohort is a group of subjects who share some features and differ in others. In a cohort study, a group of subjects is identified (“assembling” the cohort), say adults in the US with no previous signs of diabetes. Relevant observations are then made of existing conditions, which will play the role of explanatory variables: e.g. age, sex, weight, diet, exercise, … whatever you think might be relevant. In a prospective study, the group is then followed forward over time and the eventual outcome (developing diabetes, in our example) is recorded. To use the data for prediction for a new subject, compare the conditions for the new subject (age, sex, weight, etc.) to the observations on the cohort subjects. Pick out the members of the cohort whose original observations best resemble those of the new subject; call this the matching subset of the cohort. Then tabulate the eventual outcomes of the matching subset. This tabulation constitutes the prediction for the new subject.
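Here is one way to sketch the matching-subset procedure. Several pieces are hypothetical:

- the cohort rows are invented;
- the choice of `k = 5` nearest members is arbitrary;
- the scale factors `age_scale` and `weight_scale` are one simple way to put age and weight on a comparable footing.

A real study would use more variables and a more careful notion of “resemble”.

```python
import math
from collections import Counter

# Hypothetical cohort: (age in years, weight in kg, developed diabetes?)
# These rows are invented for illustration; they are not real data.
cohort = [
    (34, 70, False), (36, 95, True), (41, 68, False), (45, 102, True),
    (52, 80, False), (55, 110, True), (58, 77, False), (61, 99, True),
    (63, 72, False), (29, 64, False), (48, 88, False), (57, 105, True),
]

def matching_subset(cohort, age, weight, k=5, age_scale=10.0, weight_scale=15.0):
    """Return the k cohort members whose baseline values best resemble the new subject's.

    Resemblance is a scaled Euclidean distance; the scales are hypothetical choices.
    """
    def distance(row):
        a, w, _ = row
        return math.hypot((a - age) / age_scale, (w - weight) / weight_scale)
    return sorted(cohort, key=distance)[:k]

def predict(cohort, age, weight, k=5):
    """Tabulate the eventual outcomes in the matching subset: that tabulation is the prediction."""
    subset = matching_subset(cohort, age, weight, k)
    counts = Counter(outcome for _, _, outcome in subset)
    return {outcome: n / k for outcome, n in counts.items()}

print(predict(cohort, age=54, weight=100))  # prints {True: 0.8, False: 0.2}
```

For a 54-year-old weighing 100 kg, four of the five closest cohort members went on to develop diabetes. The prediction is that tabulation, not a single yes/no answer.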

It’s helpful to distinguish a prospective cohort study from a retrospective cohort study. In a retrospective study, the cohort subjects are not followed forward over time but backward. That is, the outcome is already known at the time the cohort is assembled. The values of the explanatory variables are extracted from historical records, say, the subjects’ medical records.

THIS IS FOR DETECTION, NOT PREDICTION. Still another design is a case/control study. If the outcome being studied is rare, a cohort might have to be very large in order to include enough people who go on to develop the condition. To illustrate, suppose we believe that drinking large amounts of sugary beverages (e.g. “Big Gulps”) is a factor that can be used to predict the onset of diabetes. The onset of diabetes is fairly rare: according to the US Centers for Disease Control and Prevention (“National Diabetes Statistics Report, 2017” 2018), in a randomly selected group of 1000 US adults without signs of diabetes, approximately 7 will develop diabetes in the next year.
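The arithmetic behind “very large” is short. This sketch computes how many people must be followed to expect a given number of new cases. It rests on two simplifications: the incidence quoted above holds steady, and nobody drops out of the cohort. The function name `cohort_size_for` is made up for illustration.

```python
# Incidence quoted in the text: about 7 new cases per 1000 adults per year.
CASES_PER_1000_PER_YEAR = 7

def cohort_size_for(expected_cases, years=1):
    """Smallest cohort whose expected number of new cases reaches expected_cases.

    Assumes constant incidence and no dropout (simplifications).
    """
    # Integer ceiling division avoids floating-point round-off.
    return -(-expected_cases * 1000 // (CASES_PER_1000_PER_YEAR * years))

print(cohort_size_for(100))           # followed for one year
print(cohort_size_for(100, years=5))  # followed for five years
```

To expect 100 new cases in a single year you would need to follow 14,286 people. A case/control study sidesteps this by directly recruiting, say, 100 people who already have the condition and 100 who don't.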

Possible introduction to, or enrichment of, the need to look at multiple variables: see Hill-1937a-WA-I.pdf under “Definition of Statistics” and the following “Planning” section.

25.2 Generalization

A nice set of examples of selection bias is in Hill-1937a-WA-II.pdf.

25.3 Create a model representation of your system.

A model is a representation for a purpose. You want something you can easily manipulate, because you are going to be doing experiments on the model to better understand and interpret the real-world system.

25.4 Evaluate technical performance.

Feedback loop with (4)

25.5 Interpret and communicate.

Apply loss functions.

Express risk sensibly, attribute risk (causality) responsibly.

Express your uncertainty. Standardize your results (adjustment) to help decision-makers see contrasts that are meaningful. (Example: Mexico and US death rates.)
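Direct standardization is one common form of adjustment: apply each population’s age-specific rates to a single shared standard age mix. The numbers below are all invented and are not the Mexico/US data. That covers the two countries, their age structures and rates, and the standard weights. They are chosen so that the crude comparison and the age-specific comparison point in opposite directions.

```python
# Hypothetical numbers for two countries, invented for illustration only.
# Each age group maps to (population, deaths per person per year).
country_a = {"young": (900, 0.002), "old": (100, 0.05)}
country_b = {"young": (500, 0.001), "old": (500, 0.04)}

# A hypothetical standard population, shared by both countries.
standard_weights = {"young": 0.7, "old": 0.3}

def crude_rate(country):
    """Total deaths divided by total population, using the country's own age mix."""
    deaths = sum(pop * rate for pop, rate in country.values())
    population = sum(pop for pop, _ in country.values())
    return deaths / population

def standardized_rate(country, weights):
    """Direct standardization: the country's age-specific rates applied to the standard age mix."""
    return sum(weights[group] * rate for group, (_, rate) in country.items())

for name, country in [("A", country_a), ("B", country_b)]:
    print(name, round(crude_rate(country), 4),
          round(standardized_rate(country, standard_weights), 4))
```

Country B has the higher crude rate (0.0205 versus 0.0068), even though its death rate is lower in each age group; it simply has more old people. The standardized rates (about 0.0127 for B and 0.0164 for A) reflect the age-specific comparison.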

Be attentive to false discovery.
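A quick simulation shows why false discovery deserves attention. It runs 1000 two-sample z-tests in which there is no real difference at all; the standard deviation is assumed known. Roughly 5% of them still come out “significant” at p < 0.05. The Bonferroni threshold 0.05/m in the last count is shown only for contrast. It is one simple correction among several, not necessarily the best.

```python
import random
from statistics import NormalDist, fmean

def null_p_value(rng, n=30):
    """Two-sided z-test p-value when both groups come from the same N(0, 1)."""
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    z = (fmean(a) - fmean(b)) / (2 / n) ** 0.5  # sigma assumed known (= 1)
    return 2 * (1 - NormalDist().cdf(abs(z)))

rng = random.Random(1)
m = 1000  # number of hypotheses tested; none of them is a real effect
p_values = [null_p_value(rng) for _ in range(m)]

naive_hits = sum(p < 0.05 for p in p_values)           # roughly 5% of m
bonferroni_hits = sum(p < 0.05 / m for p in p_values)  # usually zero
print(naive_hits, bonferroni_hits)
```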

Don’t be afraid to frame things in terms of causation, but only do so if you have handled the possibility of confounding in a responsible way.

Not p-values:
  • ASA editorial, March 2019
  • Nature article, March 2019

25.6 On a second reading

After reading the methods, go back and re-read this.

Other points …

25.7 Sensitivity

About why we shouldn’t rely only on sampling variation to indicate what we know about the effect size. We should look at a variety of models and model architectures, proxies for the measured quantities (since a variable is not necessarily what we want it to be) to get a sense of how much variation there is among models of equal plausibility.

“The Statistical Confidence Game” – why do we focus on doing the same thing over and over?

We should get similar results when using different proxies for the explanatory variable: e.g. in the SAT data, use expenditures, but also teachers’ salaries and class size, perhaps building and administrative expenses, …
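Here is a sketch of the proxy idea with synthetic data. The variable names echo the SAT example, but the numbers are not the SAT data. The assumptions are:

- a hypothetical latent quantity (“resources”) drives the outcome with slope 1;
- three hypothetical proxies measure it with different amounts of noise;
- `class_size` is built to move opposite to resources.

Regressing the outcome on each proxy in turn gives noticeably different slopes. That spread is information no single model’s standard error would reveal.

```python
import random
import statistics

def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(7)
n = 500
# Hypothetical latent quantity we actually care about.
resources = [rng.gauss(0, 1) for _ in range(n)]
outcome = [1.0 * r + rng.gauss(0, 1) for r in resources]

# Three hypothetical proxies, each measuring the latent quantity with different noise.
proxies = {
    "expenditure": [r + rng.gauss(0, 0.3) for r in resources],
    "teacher_salary": [r + rng.gauss(0, 0.8) for r in resources],
    "class_size": [-r + rng.gauss(0, 1.5) for r in resources],  # bigger classes, fewer resources
}

estimates = {name: slope(x, outcome) for name, x in proxies.items()}
for name, b in estimates.items():
    print(f"{name:15s} slope = {b:+.2f}")
```

The proxies agree on direction once you allow for class size running the other way, but they disagree on magnitude, because noisier proxies attenuate the slope. With real data, disagreement about direction would be a stronger warning.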

Motivated by Michael Lavine’s essay “Frequentist, Bayes, or Other?” in The American Statistician. https://doi.org/10.1080/00031305.2018.1459317

Also see Steven Ziliak’s article in the same issue, from which these quotes are taken:

G-7 Minimize “Real Error” with the 3 R’s: Represent, Replicate, Reproduce

A test of significance on a single set of data is nearly valueless. Fisher’s p, Student’s t, and other tests should only be used when there is actual repetition of the experiment. “One and done” is scientism, not scientific. Random error is not equal to real error, and is usually smaller and less important than the sum of nonrandom errors. Measurement error, confounding, specification error, and bias of the auspices, are frequently larger in all the testing sciences, agronomy to medicine. Guinnessometrics minimizes real error by repeating trials on stratified and balanced yet independent experimental units, controlling as much as possible for local fixed effects.

G-6 Economize With “Less Is More”: Small Samples of Independent Experiments

Small-sample analysis and distribution theory has an economic origin and foundation: changing inputs to the beer on the large scale (for Guinness, enormous global scale) is risky, with more than money at stake. But smaller samples, as Gosset showed in decades of barley and hops experimentation, does not mean “less than”, and Big Data is in any case not the solution for many problems.

References

Whelton, P. K., et al. 2018. “2017 ACC/AHA/AAPA/ABC/ACPM/AGS/APhA/ASH/ASPC/NMA/PCNA Guideline for the Prevention, Detection, Evaluation, and Management of High Blood Pressure in Adults: A Report of the American College of Cardiology/American Heart Association Task Force on Clinical Practice Guidelines.” J Am Coll Cardiol 71: e127–e248. https://www.acc.org/latest-in-cardiology/ten-points-to-remember/2017/11/09/11/41/2017-guideline-for-high-blood-pressure-in-adults.

American Statistical Association. 2016. “Guidelines for Assessment and Instruction in Statistics Education.” American Statistical Association. http://www.amstat.org/asa/files/pdfs/GAISE/GaiseCollege_Full.pdf.

Baudry, Julia, Karen Assmann, et al. 2018. “Association of Frequency of Organic Food Consumption with Cancer Risk: Findings from the NutriNet-Santé Prospective Cohort Study.” JAMA Internal Medicine online. https://doi.org/10.1001/jamainternmed.2018.4357.

Bradbury, K. E., A. Balkwill, et al. 2014. “Organic Food Consumption and the Incidence of Cancer in a Large Prospective Study of Women in the United Kingdom.” British Journal of Cancer 110: 2321–6.

Brennan, Tim, William Dieterich, and Beate Ehret. 2008. “Evaluating the Predictive Validity of the COMPAS Risk and Needs Assessment System.” Criminal Justice and Behavior 36 (1). https://jpo.wrlc.org/handle/11204/1123.

Bruder, C., J. L. Bulliard, S. Germann, I. Konzelmann, M. Bochud, M. Leyvraz, and A. Chiolero. 2018. “Estimating Lifetime and 10-Year Risk of Lung Cancer.” Preventive Medicine Reports 11: 125–30. https://doi.org/10.1016/j.pmedr.2018.06.010.

Charlesworth, Brian, and Anthony W.F. Edwards. 2018. “A Century of Variance.” Significance 15 (4): 21–25.

Cleveland, William S., and Robert McGill. 1984. “Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods.” Journal of the American Statistical Association 79 (387): 531–54. https://www.jstor.org/stable/2288400.

Fisher, Ronald A. 1925. Statistical Methods for Research Workers. Oliver and Boyd. http://psychclassics.yorku.ca/Fisher/Methods/.

———. 1926. “The Arrangement of Field Experiments.” Journal of the Ministry of Agriculture of Great Britain 33: 503–13.

———. 1936. “The Coefficient of Racial Likeness.” Journal of the Royal Anthropological Institute of Great Britain and Ireland 66: 57–63.

Fisher, Ronald A., and Frank Yates. 1953. Statistical Tables for Biological, Agricultural and Medical Research. Oliver and Boyd.

Pearson, Karl. 1900. “On the Criterion That a Given System of Deviations from the Probable in the Case of a Correlated System of Variables Is Such That It Can Be Reasonably Supposed to Have Arisen from Random Sampling.” The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 50 (302). Taylor & Francis: 157–75. https://doi.org/10.1080/14786440009463897.

Fry, Hannah. 2018. Hello World: Being Human in the Age of Algorithms. W.W. Norton.

Galton, Francis. 1886. “Regression Towards Mediocrity in Hereditary Stature.” The Journal of the Anthropological Institute of Great Britain and Ireland 15: 246–63. http://www.jstor.org/stable/2841583.

Gelman, Andrew, and Eric Loken. 2014. “The Statistical Crisis in Science.” American Scientist 102 (6). https://doi.org/10.1511/2014.111.460.

Gould, Stephen Jay. 1996. The Mismeasure of Man. Revised and expanded. W.W. Norton.

Greene, David L. 1981. “Estimated Speed/Fuel Consumption Relationships for a Large Sample of Cars.” Energy 6 (5): 441–46. https://doi.org/10.1016/0360-5442(81)90006-2.

Yerushalmy, J. 1971. “The Relationship of Parents’ Cigarette Smoking to Outcome of Pregnancy: Implications as to the Problem of Inferring Causation from Observed Associations.” American Journal of Epidemiology, 443–56. https://doi.org/10.1093/oxfordjournals.aje.a121278.

Kahn, Michael. 2005. “An Exhalent Problem for Teaching Statistics.” Journal of Statistics Education 13 (2). Taylor & Francis. https://doi.org/10.1080/10691898.2005.11910559.

Kahneman, Daniel. 2011. Thinking, Fast and Slow. Farrar, Straus and Giroux.

Kaplan, Daniel T. 2011. Statistical Modeling: A Fresh Approach. 2nd ed. Project Mosaic Books. https://project-mosaic-books.com.

Kramer, Adam, JE Guillory, and JT Hancock. 2014. “Experimental Evidence of Massive-Scale Emotional Contagion Through Social Networks.” Proceedings of the National Academy of Sciences 111 (24): 8788–90. https://doi.org/10.1073/pnas.1320040111.

Larson, Jeff, Surya Mattu, Lauren Kirchner, and Julia Angwin. n.d. “How We Analyzed the COMPAS Recidivism Algorithm.” ProPublica. https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm.

Mathews, Fiona, Paul J. Johnson, and Andrew Neil. 2008. “You Are What Your Mother Eats: Evidence for Maternal Preconception Diet Influencing Foetal Sex in Humans.” Proceedings of the Royal Society B 275: 1661–8.

Michelson, Albert A, and Edward W Morley. 1887. “On the Relative Motion of the Earth and the Luminiferous Ether.” American Journal of Science 34: 333–45. http://spiff.rit.edu/classes/phys314/images/mm/mm_all.pdf.

“National Diabetes Statistics Report, 2017.” 2018. Centers for Disease Control and Prevention. https://www.cdc.gov/diabetes/pdfs/data/statistics/national-diabetes-statistics-report.pdf.

Pearl, Judea, and Dana Mackenzie. 2018. The Book of Why: The New Science of Cause and Effect. Basic Books.

Pearson, Karl. 1900. “On the Criterion That a Given System of Deviations from the Probable in the Case of a Correlated System of Variables Is Such That It Can Be Reasonably Supposed to Have Arisen from Random Sampling.” Philosophical Magazine Series 5 50 (302): 157–75.

Rabin, Roni Caryn. n.d. “Can Eating Organic Food Lower Your Cancer Risk?” New York Times. https://www.nytimes.com/2018/10/23/well/eat/can-eating-organic-food-lower-your-cancer-risk.html.

Schrek, Robert, Lyle A. Baker, George P. Ballard, and Sidney Dolgoff. 1950. “Tobacco Smoking as an Etiologic Factor in Disease. I. Cancer.” Cancer Research 10: 49–58.

Speed, T., and D. Nolan. 2000. Stat Labs: Mathematical Statistics Through Applications. New York: Springer. https://www.stat.berkeley.edu/users/statlabs/labs.html#babies.

Stark, Philip B., and Andrea Saltelli. 2018. “Cargo-Cult Statistics and Scientific Crisis.” Significance 15 (4): 40–43. https://doi.org/10.1111/j.1740-9713.2018.01174.x.

Student. 1908. “The Probable Error of a Mean.” Biometrika 6 (1): 1–25. https://doi.org/10.1093/biomet/6.1.1.

Tversky, Amos, and Daniel Kahneman. 1974. “Judgment Under Uncertainty: Heuristics and Biases.” Science 185 (4157): 1124–31.

Vanderpump, M. P., et al. 1995. “The Incidence of Thyroid Disorders in the Community: A Twenty-Year Follow-up of the Whickham Survey.” Clinical Endocrinology 43: 55–69.

Wasserstein, R. L., and N. A. Lazar. 2016. “The ASA’s Statement on p-Values: Context, Process, and Purpose.” The American Statistician 70. http://dx.doi.org/10.1080/00031305.2016.1154108.

Wickham, Hadley. 2014. “Tidy Data.” Journal of Statistical Software 59 (10). https://www.jstatsoft.org/article/view/v059i10/.

Woloshin, Steven, et al. 2008. “The Risk of Death by Age, Sex, and Smoking Status in the United States: Putting Health Risks in Context.” Journal of the National Cancer Institute 100 (12): 845–53. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3298961/.

Young, S. Stanley, Heejung Bang, and Kutluk Oktay. 2009. “Cereal-Induced Gender Selection? Most Likely a Multiple Testing False Positive.” Proceedings of the Royal Society B 276: 1211–2.

Zeise, Lauren, Richard Wilson, and Edmund A. C. Crouch. 1987. “Dose-Response Relationships for Carcinogens: A Review.” Environmental Health Perspectives 73 (Aug): 259–306.
