Chapter 23 Adjustment
JUST NOTES
- Comparing like things
- Assembling conditional probabilities into a whole, e.g. survey weighting
More things…
- Making meaningful comparisons: stratify and compare strata
- Add up the comparisons across strata, using an agreed-upon, standard weight for each stratum.
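One way to write the stratify-and-weight idea from the notes above is sketched here. The symbols are assumptions introduced for illustration: $\bar{y}_{g,s}$ is the average outcome for group $g$ within stratum $s$, and $w_s$ is the standard weight agreed on for stratum $s$.

```latex
\[
\bar{y}^{\text{adj}}_{g} = \sum_{s} w_s \, \bar{y}_{g,s},
\qquad
\bar{y}^{\text{adj}}_{1} - \bar{y}^{\text{adj}}_{2}
  = \sum_{s} w_s \left( \bar{y}_{1,s} - \bar{y}_{2,s} \right),
\qquad
\sum_{s} w_s = 1 .
\]
```

Because both groups are averaged with the same weights $w_s$, a difference in how the groups are spread across strata cannot by itself produce a difference in the adjusted averages.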
23.1 Weighting in polls
https://www.nytimes.com/2018/09/06/upshot/live-poll-explainer.html see also https://www.nytimes.com/2018/09/06/upshot/midterms-2018-polls-live.html
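As a rough illustration of poll weighting, the sketch below reweights respondents so that each group counts in proportion to its share of the population rather than its share of the sample. The respondents, the group labels, and the population shares are all made up for the example, and the function name `poststratified_mean` is hypothetical. It is not the procedure used in the linked polls.

```python
from collections import Counter

# Hypothetical poll: (age group, answer) with answer 1 = yes, 0 = no.
respondents = [
    ("young", 1), ("young", 1),
    ("old", 0), ("old", 0), ("old", 1), ("old", 0), ("old", 0), ("old", 1),
]

# Assumed population shares for each group (made up for illustration).
population_share = {"young": 0.5, "old": 0.5}


def poststratified_mean(respondents, population_share):
    """Weight each respondent by (population share) / (sample share) of their group."""
    n = len(respondents)
    counts = Counter(group for group, _ in respondents)
    weights = [population_share[group] / (counts[group] / n)
               for group, _ in respondents]
    weighted_total = sum(w * answer
                         for w, (_, answer) in zip(weights, respondents))
    return weighted_total / sum(weights)


# Unweighted share answering yes: old respondents are over-represented.
raw = sum(answer for _, answer in respondents) / len(respondents)

# Weighted share: each group contributes according to its population share.
adjusted = poststratified_mean(respondents, population_share)

print(f"raw: {raw:.3f}, adjusted: {adjusted:.3f}")
```

In this toy sample the young respondents make up a quarter of the sample but half of the assumed population, so their answers are weighted up and the estimate moves toward their response.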
23.2 Minimum wage example
See Figure 5.3.
23.3 Comparing to an index
Case mortality from Hill-1937a-WA-III.pdf
(perhaps for exercise).
23.4 Mismeasure of man
As described in Gould (1996) (pp. 87-99), in the 19th century, racist assertions of superiority were tied to measurements of brain size, with so-called superior races said to have, on average, the largest brains. There’s no justification for associating superior intellect with brain size. People of short stature tend to have smaller brains than those of large stature, and females tend to have smaller brains than males. In assembling the averages by race, no attempt was made to adjust for the stature of individuals. The claimed differences between the races were substantially the product of the samples containing different mixtures of small- and large-statured people.
23.5 Age adjustment
A specific example in a context that’s easy to understand: mortality rates among countries.
Mortality rates: adjusting for age
Cancer rates: adjusting for age
See https://ourworldindata.org/causes-of-death for many death-rate stats.
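A minimal sketch of age adjustment by direct standardization follows. The two countries, their age-specific death rates, their age structures, and the standard population are all invented for illustration; none of the numbers come from the sources cited here.

```python
# Hypothetical age-specific death rates (deaths per 1000 people per year).
rates = {
    "country_A": {"0-39": 1.0, "40-69": 5.0, "70+": 50.0},
    "country_B": {"0-39": 2.0, "40-69": 8.0, "70+": 60.0},
}

# Hypothetical share of each country's population in each age group.
age_share = {
    "country_A": {"0-39": 0.40, "40-69": 0.40, "70+": 0.20},
    "country_B": {"0-39": 0.70, "40-69": 0.25, "70+": 0.05},
}

# Assumed standard population applied to both countries.
standard_share = {"0-39": 0.55, "40-69": 0.35, "70+": 0.10}


def weighted_rate(rate_by_age, share_by_age):
    """Average the age-specific rates using the given age-group shares."""
    return sum(rate_by_age[age] * share_by_age[age] for age in rate_by_age)


# Crude rate: each country weighted by its own age structure.
crude = {c: weighted_rate(rates[c], age_share[c]) for c in rates}

# Age-adjusted rate: both countries weighted by the same standard population.
adjusted = {c: weighted_rate(rates[c], standard_share) for c in rates}

for country in rates:
    print(country, round(crude[country], 2), round(adjusted[country], 2))
```

In this made-up case country A has the lower death rate in every age group but the higher crude rate, because its population is older. Applying the same standard weights to both countries removes that difference in age structure from the comparison.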
23.6 Adjustment for covariates
23.7 Seasonal adjustment
Separating out known sources of variation from unknown ones.
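The idea of separating a known source of variation from the rest can be sketched with a very simple additive adjustment: estimate each quarter's typical deviation from the overall mean and subtract it. The quarterly series is invented, the function name `seasonally_adjust` is hypothetical, and the sketch assumes there is no trend; a series with a trend would need the trend handled separately.

```python
from statistics import mean

# Hypothetical quarterly series covering three years (no trend assumed).
series = [105.0, 98.0, 96.0, 101.0,
          107.0, 99.0, 97.0, 102.0,
          104.0, 97.0, 95.0, 100.0]


def seasonally_adjust(values, period):
    """Subtract each season's average deviation from the overall mean."""
    overall = mean(values)
    # Seasonal effect for season k: mean of that season's values minus overall mean.
    seasonal = [mean(values[k::period]) - overall for k in range(period)]
    return [v - seasonal[i % period] for i, v in enumerate(values)]


adjusted = seasonally_adjust(series, period=4)

# After adjustment, every quarter has the same average as the whole series,
# so what remains is the variation not explained by the quarter.
quarter_means = [mean(adjusted[k::4]) for k in range(4)]
print([round(m, 3) for m in quarter_means])
```

What is left in `adjusted` is the year-to-year variation within each quarter, which is the part the seasonal pattern does not account for.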
23.8 Example
Age-adjusted incidence and mortality rates by year of diagnosis, see https://seer.cancer.gov/archive/csr/1975_2014/results_merged/topic_annualrates.pdf tables 4.5 and 4.6
23.9 More details
From conditioning to correlation: p(x, y) = p(y | x) p(x) = p(x | y) p(y)
Use a tilde instead of |, so that p(x, y) = p(y ~ x) p(x)
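The factorization can be checked numerically on a small joint distribution. The table below is invented for illustration, and the helper names `marginal` and `conditional` are hypothetical.

```python
import math

# Hypothetical joint distribution p(x, y) over two binary variables.
joint = {
    ("x0", "y0"): 0.1, ("x0", "y1"): 0.3,
    ("x1", "y0"): 0.4, ("x1", "y1"): 0.2,
}


def marginal(joint, axis, value):
    """p(x) when axis == 0, p(y) when axis == 1."""
    return sum(p for key, p in joint.items() if key[axis] == value)


def conditional(joint, x, y, given_axis):
    """p(y | x) when given_axis == 0, p(x | y) when given_axis == 1."""
    given_value = (x, y)[given_axis]
    return joint[(x, y)] / marginal(joint, given_axis, given_value)


x, y = "x1", "y1"
via_x = conditional(joint, x, y, given_axis=0) * marginal(joint, 0, x)  # p(y | x) p(x)
via_y = conditional(joint, x, y, given_axis=1) * marginal(joint, 1, y)  # p(x | y) p(y)

# Both factorizations recover the same joint probability p(x, y).
print(math.isclose(via_x, joint[(x, y)]), math.isclose(via_y, joint[(x, y)]))
```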
##
## Call:
## glm(formula = Y ~ A, family = binomial, data = A_data)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -1.9081   0.5944   0.5944   0.7535   0.7535
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  1.11395    0.09379  11.877  < 2e-16 ***
## AA           0.52982    0.16654   3.181  0.00147 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1038.4 on 999 degrees of freedom
## Residual deviance: 1027.9 on 998 degrees of freedom
## AIC: 1031.9
##
## Number of Fisher Scoring iterations: 4
##
## Call:
## glm(formula = Y ~ A + C, family = binomial, data = A_data)
##
## Deviance Residuals:
##      Min        1Q    Median        3Q       Max
## -2.88294   0.09448   0.17776   0.75942   1.25211
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  -0.1739     0.1221  -1.424    0.154
## AA            1.2698     0.1890   6.720 1.82e-11 ***
## CC            4.3138     0.4258  10.130  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1038.42 on 999 degrees of freedom
## Residual deviance: 713.18 on 997 degrees of freedom
## AIC: 719.18
##
## Number of Fisher Scoring iterations: 7
##   A meanI_lower meanI_upper
## 1 o   0.9373592    1.307905
## 2 A   1.3942770    1.944251
##
##   A C meanI_lower meanI_upper
## 1 o o  -0.4100024  0.07603221
## 2 A o   0.8123784  1.39481499
## 3 o C   3.4278546  5.63746818
## 4 A C         Inf         Inf
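The sketch below may help in reading the output above. It assumes that the `meanI_lower` and `meanI_upper` columns are on the log-odds scale; that is consistent with the coefficient estimates, but the notes do not state it. Under that assumption, the inverse logit converts each bound to a probability, and exponentiating the coefficient on A gives an odds ratio before and after adjusting for C. All numbers are copied from the output; the variable names are hypothetical.

```python
import math


def inv_logit(log_odds):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Interval bounds from the Y ~ A + C model, keyed by (A, C) level.
# Assumption: these bounds are on the log-odds scale.
intervals = {
    ("o", "o"): (-0.4100024, 0.07603221),
    ("A", "o"): (0.8123784, 1.39481499),
    ("o", "C"): (3.4278546, 5.63746818),
    ("A", "C"): (math.inf, math.inf),  # reported as Inf in the output
}

# The same bounds expressed as probabilities.
prob_intervals = {
    key: (inv_logit(lower), inv_logit(upper))
    for key, (lower, upper) in intervals.items()
}

# Odds ratio for A from each fitted model: exp(coefficient on A).
or_unadjusted = math.exp(0.52982)  # from Y ~ A
or_adjusted = math.exp(1.2698)     # from Y ~ A + C

for key, (lower, upper) in prob_intervals.items():
    print(key, round(lower, 3), round(upper, 3))
print(round(or_unadjusted, 2), round(or_adjusted, 2))
```

On this reading, the estimated odds ratio for A roughly doubles once C is included in the model, and the `Inf` bounds in the last row correspond to a probability of 1.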
References
Gould, Stephen Jay. 1996. The Mismeasure of Man. Revised and expanded. W.W. Norton.