9 Hypothesis and likelihood

Models are central to this book’s account of quantitative reasoning. Recall that a model is a “representation for a purpose.” Since we build models for a purpose that often relates to the real world, the models we build tend to represent real-world objects or processes. One example from the author’s past work: modeling how a prospective college student (and their family) decides which college to attend. The purpose was to help design a new tuition discounting policy for the college where the author taught.¹ The college’s administration hoped to encourage more students to attend while maintaining or increasing revenue.


So many things go into a prospective student’s choice that it might seem impossible to model. The modeler is not a mind reader. However, a successful model does not always need to mimic the real-world process. Instead of examining the psychological factors behind a student’s decision, the model merely hypothesized that many students and their families would prefer a lower-cost college, all else being equal. That hypothesis might seem obvious, but some college officials worked under another hypothesis: that each family has its own “affordability” threshold and that, so long as the discount reached that threshold, any additional discount would not influence decision-making.


Notice the switch in wording between the above two paragraphs. In the first paragraph, the word “model” is used. In the second, the word “hypothesis” is introduced.

It is reasonable to think of a “model” as a “hypothesis.” To show the parallel, consider this informative response from an AI to the prompt, “What is a hypothesis?” We break the response into parts to better point out the similarities and differences between a hypothesis and a model.

“A hypothesis is a testable explanation for a phenomenon that serves as a starting point for further investigation through experimentation or observation.”

The above sentence is in the spirit of the “☞ scientific method ☜.” Philosophers of science sometimes prefer the name ☞ hypothetico-deductive model ☜. The emphasis of the sentence is on continuing research: “further investigation through experimentation or observation.” In contrast, for a model, the emphasis is on the human modeler’s purpose: helping to inform a decision or provide insight.

“It is an educated guess that proposes a relationship between variables and guides research by making a specific prediction about the outcome of a study.”

Models often have to represent the “relationship between variables.” However, for the quoted sentence to correspond better to what modeling is, change “guides research” into “guides action.” Moreover, rather than being “about the outcome of a study,” models are more often about the real-world setting.


“A good hypothesis must be clear, testable, and falsifiable, meaning it can be proven wrong by evidence.”

The “proven wrong by evidence” comes as a surprise to some people, who understandably think that the goal is to prove a hypothesis right. Nonetheless, it is a mainstream conception of the scientific method that hypotheses are never “proven right”; they are always in a state of “not yet proven wrong” until, at some point in the future that may never come, they are proven wrong.

For models, the test is not about being wrong but whether the model successfully meets the purpose for which we built it. An eminent statistician, George Box, framed this more eloquently:

“All models are wrong, but some are useful.”

Now focus attention on the word “proven.” A common interpretation of “proven” is “the matter is settled once and for all.” Mathematical proofs honor this interpretation. However, models are not about definitively settling matters; models are for using what we know about a matter to serve some purpose (such as prediction). Rather than “proof” or “wrong” or “right,” models call for another framework for resolving disputes between hypotheses. The following section introduces a framework for evaluating hypotheses about the origins of quantities in the face of uncertainty.


9.1 The likelihood framework

Chapter 7 discussed how to use probability distributions to quantify uncertainty. Section 7.2 introduced a Bayesian notion of where probability distributions come from: they are hypotheses concocted by humans.

In the ☞ likelihood framework ☜, we set aside the idea of an individual hypothesis being correct or incorrect, right or wrong, true or false. Instead, we consider collections of hypotheses and put them into competition with one another, just as sports playoffs pit athletes or teams against one another. In sports, a higher score gives a team a claim to being better than its opponent.


In the likelihood framework, the competitors are hypotheses, each represented by a probability distribution that assigns a relative probability to every possible observed outcome. The “score” for each competing hypothesis is its “☞ likelihood ☜.” And just as sports teams compete in a particular setting, a game or match, hypotheses compete in a setting: data.

The reader who has never heard of “likelihood” in this sense may wonder if the framework is the right or best one for placing hypotheses into competition. We turn to the field of statistics to answer. As described in Section 7.2, Bayesian and Frequentist statisticians fundamentally disagree. Despite their conflicting stances, both camps put likelihood at the center of their methodologies.

To illustrate how likelihood works, consider how to model uncertainty in the time interval between major events, for example, the time between economic depressions or the time between magnitude 6.5+ earthquakes in a region. We base the model on the hypothesis that the recurrence is random and does not depend on how long it has been since the last event. A competing hypothesis, consistent with the intuition of many people, is that the probability of an event increases as we approach the “anniversary” of the last event. Suppose we have data on the last four events: events occurred in 1889, 1906, 1929, and 1960. The times between successive events are 17, 23, and 31 years, respectively. Thus, events occur roughly every 25 years on average.
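The intervals come from subtracting successive event years. A minimal Python sketch of that arithmetic follows; the variable names are illustrative. It also shows that the mean of the three intervals is about 23.7 years, which the text rounds up to "roughly 25."

```python
# Years of the four events given in the text
event_years = [1889, 1906, 1929, 1960]

# Interval between each pair of successive events
intervals = [later - earlier for earlier, later in zip(event_years, event_years[1:])]
mean_interval = sum(intervals) / len(intervals)

print(intervals)                # [17, 23, 31]
print(round(mean_interval, 1))  # 23.7
```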

Translating the two competing hypotheses into probability distributions:

Hypothesis 1: The data come from an exponential distribution with a rate parameter of 0.04 per year (corresponding to an average of 25 years between events).
Hypothesis 2: The data come from a normal distribution with a mean of 25 years and a standard deviation of 10 years.

Figure 9.1 (a): Exponential distribution with a rate of 0.04 per year.


Note that the two hypotheses tell different stories about the possibility of future events. For instance, the exponential-distribution hypothesis gives a fairly large probability that a future inter-event interval will be shorter than 10 years, much larger than the corresponding probability under the normal-distribution hypothesis. Likewise, the normal distribution indicates that inter-event intervals longer than 50 years are rare, whereas the exponential distribution gives them a substantial probability.
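To put numbers on "fairly large" and "rare," here is a sketch that computes the two tail probabilities under each hypothesis. It assumes Python 3.8+ for `statistics.NormalDist`; the exponential cumulative probability is written out by hand as 1 − e^(−rate·x), and the helper name `exp_cdf` is only illustrative.

```python
import math
from statistics import NormalDist

RATE = 0.04                            # Hypothesis 1: exponential, rate per year
normal = NormalDist(mu=25, sigma=10)   # Hypothesis 2: normal, mean 25, sd 10 years

def exp_cdf(x, rate=RATE):
    """P(interval <= x) under the exponential hypothesis."""
    return 1 - math.exp(-rate * x) if x > 0 else 0.0

# Probability of an interval shorter than 10 years
p_short_exp = exp_cdf(10)
p_short_norm = normal.cdf(10)

# Probability of an interval longer than 50 years
p_long_exp = 1 - exp_cdf(50)
p_long_norm = 1 - normal.cdf(50)

print(f"P(< 10 yr): exponential {p_short_exp:.3f}, normal {p_short_norm:.3f}")
print(f"P(> 50 yr): exponential {p_long_exp:.3f}, normal {p_long_norm:.3f}")
```

Under these assumptions, the short-interval probability is about 0.33 for the exponential versus about 0.07 for the normal, and the long-interval probability is about 0.135 versus about 0.006.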

The data seem to favor the normal-distribution hypothesis. For example, the data contain no intervals longer than 31 years or shorter than 17 years. The likelihood calculation quantifies the vague “seem to favor.” To find the likelihood of a hypothesis, compute the product of the relative probabilities that the hypothesis assigns to each observed interval. For the exponential hypothesis, Figure 9.1 (a) shows that those relative probabilities are approximately 0.020, 0.016, and 0.012. The likelihood is the product of these relative probabilities: \[0.020 \times 0.016 \times 0.012 \approx 0.0000038\ .\]
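Rather than reading the relative probabilities off the figure, one can compute them as the exponential probability density, rate × e^(−rate × x). The following sketch, with a hypothetical helper `exponential_density`, does exactly that and multiplies the three values together.

```python
import math

intervals = [17, 23, 31]   # observed inter-event intervals, in years
RATE = 0.04                # Hypothesis 1: exponential with rate 0.04 per year

def exponential_density(x, rate):
    """Relative probability (probability density) of interval x under an exponential."""
    return rate * math.exp(-rate * x)

densities = [exponential_density(x, RATE) for x in intervals]
likelihood_exp = math.prod(densities)   # product over the observed intervals

print(" ".join(f"{d:.3f}" for d in densities))  # 0.020 0.016 0.012
print(f"{likelihood_exp:.2e}")                  # 3.74e-06
```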

For the normal distribution, shown in Figure 9.1 (b), the relative probabilities are 0.029, 0.039, and 0.033. Correspondingly, the likelihood for the normal distribution is \[0.029 \times 0.039 \times 0.033 \approx 0.000037\ .\]
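The same calculation for the normal hypothesis can lean on the standard library's `statistics.NormalDist.pdf` (Python 3.8+). This is a sketch of the arithmetic, not the author's own code.

```python
import math
from statistics import NormalDist

intervals = [17, 23, 31]               # observed inter-event intervals, in years
normal = NormalDist(mu=25, sigma=10)   # Hypothesis 2: mean 25 years, sd 10 years

densities = [normal.pdf(x) for x in intervals]
likelihood_norm = math.prod(densities)  # product over the observed intervals

print(" ".join(f"{d:.3f}" for d in densities))  # 0.029 0.039 0.033
print(f"{likelihood_norm:.2e}")                  # 3.77e-05
```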

The normal distribution wins! Its likelihood is roughly 10 times that of the exponential distribution. In the terminology of statistics, the ☞ likelihood ratio ☜ (the normal likelihood divided by the exponential likelihood) is roughly 10.
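A compact sketch of the whole comparison, assuming the two hypotheses exactly as stated above. It computes each likelihood from unrounded densities and then takes the ratio. The function names are illustrative.

```python
import math
from statistics import NormalDist

intervals = [17, 23, 31]

def likelihood(density, data):
    """Likelihood of a hypothesis: product of the densities it assigns to each observation."""
    return math.prod(density(x) for x in data)

def exp_hyp(x):
    """Hypothesis 1: exponential density with rate 0.04 per year."""
    return 0.04 * math.exp(-0.04 * x)

norm_hyp = NormalDist(mu=25, sigma=10).pdf   # Hypothesis 2: normal density

ratio = likelihood(norm_hyp, intervals) / likelihood(exp_hyp, intervals)
print(f"likelihood ratio (normal / exponential): {ratio:.1f}")  # 10.1
```

Because the result depends on every relative probability, rounding them to two significant figures before multiplying moves the ratio by several percent. Working with unrounded values avoids that.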

9.2 Putting numerical evidence into words

A likelihood ratio is a measure of the strength of evidence. In interpreting likelihood, verbal descriptions such as “weak,” “moderate,” and “strong” evidence turn out to correspond to the magnitude of the likelihood ratio. Here “magnitude” is used in the sense of Chapter 4: a ratio of 10 has magnitude 1, a ratio of 100 has magnitude 2, and a ratio between 1 and 10 has a magnitude between 0 and 1.


The translation of a continuous quantity into a small set of verbal descriptions necessarily involves imposing some breakpoints. Table 9.1 provides a widely accepted set of breakpoints for translating the magnitude of the likelihood ratio into a verbal interpretation.

Table 9.1: A scale for translating the likelihood ratio into verbal categories.

Magnitude	Likelihood ratio	Verbal interpretation
0	1	No evidence.
1	10	Limited evidence to support the hypothesis.
2	100	Moderate evidence to support the hypothesis.
3	1000	Moderately strong evidence to support the hypothesis.
4	10000	Strong evidence to support the hypothesis.

According to Table 9.1, a likelihood ratio of roughly 10 corresponds to “limited evidence” in favor of the normal-distribution hypothesis.
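The table lookup can be expressed as a small function. This is a sketch under two assumptions that the table itself does not state: a ratio is assigned to the nearest whole magnitude (its base-10 logarithm, rounded), and anything beyond magnitude 4 is treated as magnitude 4. The name `verbal_interpretation` is hypothetical.

```python
import math

# Rows of Table 9.1, indexed by magnitude
SCALE = [
    "No evidence.",
    "Limited evidence to support the hypothesis.",
    "Moderate evidence to support the hypothesis.",
    "Moderately strong evidence to support the hypothesis.",
    "Strong evidence to support the hypothesis.",
]

def verbal_interpretation(ratio):
    """Map a likelihood ratio (favored / other, so ratio >= 1) to a Table 9.1 phrase.

    Assumption: use the nearest whole magnitude, capped at 4.
    """
    if ratio < 1:
        raise ValueError("put the favored hypothesis in the numerator so that ratio >= 1")
    magnitude = math.log10(ratio)
    nearest = min(math.floor(magnitude + 0.5), len(SCALE) - 1)
    return SCALE[nearest]

print(verbal_interpretation(10.1))  # Limited evidence to support the hypothesis.
print(verbal_interpretation(250))   # Moderate evidence to support the hypothesis.
```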


9.3 A level playing field?

As a later chapter will describe, the probability distributions compared via likelihood most commonly come from the same family of distributions. The previous example, however, compared likelihoods from two different families: the exponential and the normal.

The challenge when comparing hypotheses from two different families is that one hypothesis may have a built-in advantage. For instance, the normal family involves two parameters (the mean and standard deviation), whereas the exponential family uses only a single parameter (the rate). More parameters give a family more flexibility to match the observed data.

To illustrate the influence of the number of parameters, consider a third hypothesis, one with three parameters: the inter-event interval is equally likely to be 17, 23, or 31 years. The likelihood of the observed data under this hypothesis is \[\frac{1}{3} \times \frac{1}{3} \times \frac{1}{3} = \frac{1}{27} \approx 0.037\ .\] This is much bigger than the likelihood of either the exponential or the normal distribution.
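The third hypothesis is easy to write down as a function that returns 1/3 for each of the three observed values and 0 otherwise. This sketch uses exact fractions so that the product comes out as 1/27. The name `uniform_over_observed` is only illustrative.

```python
import math
from fractions import Fraction

intervals = [17, 23, 31]

def uniform_over_observed(x):
    """Hypothesis 3: the interval is 17, 23, or 31 years, each with probability 1/3."""
    return Fraction(1, 3) if x in (17, 23, 31) else Fraction(0)

likelihood_3 = math.prod(uniform_over_observed(x) for x in intervals)
print(likelihood_3, f"{float(likelihood_3):.3f}")  # 1/27 0.037
```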

The spectators cry “foul!” The new hypothesis is competing unfairly. It was constructed specifically to match the observed data. True enough, but fairness calls on us to recognize that the parameters in the original two hypotheses were also selected based on a glance at the data.

How are we to avoid such foul play when conducting a hypothesis competition?

An informal approach applies an aesthetic to weed out ugly hypotheses. The 17-23-31 hypothesis is ugly because it rules out the possibility of any other inter-event interval (such as 18.5 years). We do not have any reason to think that the numbers 17, 23, and 31 are special in any way other than being the observed values.
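The "ugliness" shows up concretely if a new, unseen interval arrives. In this sketch, which uses illustrative names and the normal hypothesis from Section 9.1, the 17-23-31 hypothesis assigns an 18.5-year interval zero probability, while the normal hypothesis still gives it a positive relative probability.

```python
from statistics import NormalDist

def hypothesis_3(x):
    """Probability of interval x under the 17-23-31 hypothesis."""
    return 1 / 3 if x in (17, 23, 31) else 0.0

normal = NormalDist(mu=25, sigma=10)

new_interval = 18.5   # an interval not seen in the data
print(hypothesis_3(new_interval))         # 0.0: ruled out entirely
print(f"{normal.pdf(new_interval):.3f}")  # about 0.032
```

Because likelihood is a product, this single zero would make the 17-23-31 hypothesis's likelihood zero for any data set that includes the new interval, however well it matched the other observations.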


The principle called ☞ Occam’s Razor ☜ embodies a similar aesthetic. William of Occam’s 14th-century Latin statement translates directly to “Entities must not be multiplied beyond necessity.” Occam’s statement is hard to understand unless one already knows what it means. A modern version, attributed to Albert Einstein, is “Everything should be made as simple as possible, but not simpler.”

The spirit of dealing with data quantitatively calls for turning the above aesthetics into quantitative terms. Without going into the details, we merely name some approaches so that the reader can recognize them in the somewhat technical literature on likelihood. Two quantitative embodiments of Occam’s razor are formulas: the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). Both penalize a hypothesis according to its number of parameters. Another approach, ☞ cross-validation ☜, uses randomly selected subsets of the data to estimate parameters, then evaluates how well the estimates perform by computing the likelihood of the remaining data.
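For readers who want to see the two formulas in action, here is an illustrative sketch. It uses the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of parameters, n the number of observations, and L the likelihood; smaller values are better. One caveat: formally, L should be the *maximized* likelihood with parameters fit to the data. Here the hypotheses' stated parameter values are simply plugged in, so the numbers only illustrate how the penalty works.

```python
import math
from statistics import NormalDist

intervals = [17, 23, 31]
n = len(intervals)

def log_likelihood(density, data):
    """Sum of log densities, i.e., the log of the likelihood product."""
    return sum(math.log(density(x)) for x in data)

def aic(log_lik, k):
    """Akaike Information Criterion: 2k - 2 ln L (smaller is better)."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian Information Criterion: k ln n - 2 ln L (smaller is better)."""
    return k * math.log(n) - 2 * log_lik

def exp_density(x):
    """Hypothesis 1: exponential density with rate 0.04 per year."""
    return 0.04 * math.exp(-0.04 * x)

hypotheses = {
    "exponential (k=1)": (exp_density, 1),
    "normal (k=2)": (NormalDist(mu=25, sigma=10).pdf, 2),
}

for name, (density, k) in hypotheses.items():
    ll = log_likelihood(density, intervals)
    print(f"{name}: AIC = {aic(ll, k):.1f}, BIC = {bic(ll, k, n):.1f}")
```

With these inputs, the normal hypothesis pays a larger parameter penalty but still comes out ahead on both criteria, because its log-likelihood advantage (about ln 10) outweighs the cost of one extra parameter.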

Footnotes

¹ College administrators talk about “discounted” tuition internally, but when facing the outside world, they prefer to frame it as “financial aid.” The word “aid” encourages the usually unfounded conception that the college is giving money to students.
