Important word pairs
Many of the vocabulary terms used in statistical thinking come in pairs. We list several such pairs below, in roughly the order they first appear in the Lessons. The list can serve as a reference while reading, but it is also helpful to return to it to sharpen your understanding of the distinctions.
Explanatory vs response variables. Models (in these Lessons) always involve a single response variable. In contrast, a model can have zero or more explanatory variables.
Variable vs covariate. “Covariate” is another word for an explanatory variable. The word “covariate” signals that the variable is not itself of direct interest to the modeler but puts another explanatory variable in a correct context.
Categorical vs quantitative variables. Always be aware of whether a model’s response variable is categorical or quantitative. When it is categorical, expect to use zero_one() to convert it to quantitative before modeling. In contrast, explanatory variables can be either categorical or quantitative.
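As a rough illustration of the conversion, the sketch below turns a two-level categorical response into zeros and ones and then fits a regression to it. It is a base-R stand-in rather than a call to zero_one() itself; the data frame `patients`, its columns, and the choice of "sick" as the level coded 1 are all made up for the example.

```r
# Hypothetical data: a two-level categorical response (status)
# and one quantitative explanatory variable (age).
patients <- data.frame(
  status = c("sick", "healthy", "sick", "healthy", "healthy", "sick"),
  age    = c(71, 34, 65, 42, 29, 58)
)

# Base-R stand-in for the zero-one conversion (an assumption, not zero_one() itself):
# "sick" becomes 1 and "healthy" becomes 0.
patients$status01 <- as.numeric(patients$status == "sick")

# With a quantitative 0/1 response, an ordinary regression can be fit.
mod <- lm(status01 ~ age, data = patients)
coef(mod)
```

Which level is coded as 1 is a choice made by the modeler; swapping it flips the sign of the slope but not the substance of the model.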
Regression model vs classifier. A regression model always has a quantitative response variable. A classifier has a categorical response variable. In these Lessons, as in much professional use of data, our categorical response variables will have two levels (e.g., healthy or sick, up or down, yes or no). In this situation, regression techniques suffice to build classifiers.
Model vs model function. By “model,” we will almost always mean “regression model.” A regression model, typically constructed by the lm() function, contains a variety of information useful for summarizing the model. The “model function” provides the mechanism for one important task: calculating, from values of the explanatory variables, the corresponding model output.
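The distinction can be sketched in base R using the built-in `mtcars` data. The wrapper name `model_fun` is hypothetical, introduced only to make the "model function" idea concrete; the model object itself comes from lm().

```r
# Fit a regression model: fuel economy (mpg) as a function of weight (wt).
mod <- lm(mpg ~ wt, data = mtcars)

# The model object holds information used to summarize the model,
# such as the fitted coefficients.
coef(mod)

# Hypothetical wrapper: the model function takes values of the
# explanatory variable and returns the corresponding model output.
model_fun <- function(wt) {
  unname(predict(mod, newdata = data.frame(wt = wt)))
}

# Model output for a car with wt = 3 (weight is recorded in 1000s of pounds).
model_fun(3)
```

The same model object supports many other summaries, while the model function does only this one job of mapping inputs to an output.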
Model coefficient vs effect size. Model coefficients are numerical parameters of the model. Training determines the appropriate values for the coefficients. In contrast, an effect size describes the relationship between the response variable and a selected explanatory variable: how the model output changes when that explanatory variable changes.
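One way to see the difference is with a model that includes an interaction, so that no single coefficient tells the whole story. The sketch below assumes the effect size is measured as the change in model output per unit change in the selected explanatory variable, holding the others fixed; the helper `model_fun` and the chosen input values are illustrative only.

```r
# A model with an interaction between weight (wt) and horsepower (hp).
mod <- lm(mpg ~ wt * hp, data = mtcars)

# The coefficients: four numbers determined by training.
b <- coef(mod)
b

# Hypothetical helper that evaluates the model function at given inputs.
model_fun <- function(wt, hp) {
  unname(predict(mod, newdata = data.frame(wt = wt, hp = hp)))
}

# Effect size of wt at hp = 100, computed as a finite difference:
# the change in model output when wt goes from 3 to 4 with hp held fixed.
effect_wt <- (model_fun(4, 100) - model_fun(3, 100)) / (4 - 3)
effect_wt

# For this model the effect size combines two coefficients.
b[["wt"]] + 100 * b[["wt:hp"]]
```

Here the effect size of `wt` depends on the value at which `hp` is held, whereas each coefficient is a single fixed number.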
Point estimate vs interval estimate. A point estimate is a single number. For instance, a model coefficient is a point estimate, as is the output from a model function. In contrast, interval estimates involve two numbers; one specifies the lower end of the interval and the other number specifies the upper end.
Prediction interval vs confidence interval. A prediction interval describes the anticipated range of the actual result for which we have made a prediction, e.g., “tomorrow’s wind will be between 5 and 10 mph.” A confidence interval is often used to express the uncertainty in a coefficient or effect size.
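Both kinds of interval can be computed from the same fitted model. The sketch below uses base R and the built-in `mtcars` data; the 95% level and the input value `wt = 3` are arbitrary choices made for the example.

```r
# Fit a regression model: fuel economy (mpg) as a function of weight (wt).
mod <- lm(mpg ~ wt, data = mtcars)

# Point estimates: the coefficients are single numbers.
coef(mod)

# Confidence intervals: a lower and an upper bound for each coefficient.
ci <- confint(mod, level = 0.95)
ci

# Prediction interval: the anticipated range of the actual mpg
# for a new car with wt = 3 (weight is recorded in 1000s of pounds).
new_car <- data.frame(wt = 3)
pred_int <- predict(mod, newdata = new_car, interval = "prediction", level = 0.95)
pred_int
```

The `fit` column of `pred_int` is the point estimate from the model function, and `lwr` and `upr` are the two ends of the prediction interval.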