9 Exercises: Likelihood

Exercise 9.1 An important task for quantitative reasoning is evaluating evidence for a claim. Non-Bayesian statistical methodology can offer great insight so long as data can be collected in a repeatable and reliable way, as in a laboratory experiment. The logic of non-Bayesian statistics is based on making assumptions that the data meet the required standards and have the properties needed for the techniques to give a correct answer. And, for the idea of a “correct” answer to apply at all, non-Bayesian statistics limits its purview to mathematically idealized questions that often do not directly target the questions of actual interest. An example: the “fail to reject the Null Hypothesis” conclusion that is at the heart of frequentist statistics.

In his autobiography, Benjamin Franklin (1706-1790) gave a good description of the personality trait of requiring mathematically correct answers:

Thomas Godfrey, a self-taught mathematician, great in his way, and afterward inventor of what is now called Hadley’s Quadrant. But he knew little out of his way, and was not a pleasing companion; as, like most great mathematicians I have met with, he expected universal precision in everything said, or was for ever denying or distinguishing upon trifles, to the disturbance of all conversation. He soon left us.

Often, useful quantitative reasoning accepts uncertainty and imprecision as inevitable and recognizes that precise statements about a matter are not usually available. An excellent example of such quantitative reasoning is described in David Spiegelhalter’s The Art of Statistics and is based on a 2014 report published in Nature: “Identification of the remains of King Richard III.”

Richard III was killed on August 22, 1485 at the Battle of Bosworth, aged 32. He was buried in the town of Leicester in the medieval church of the Grey Friars. The church was torn down on the orders of Henry VIII in about 1540. The location of the church is documented, but there are no grave markers and, today, the site is occupied by a parking lot. Around 2012, an interdisciplinary group (archeologists, anthropologists, geneticists, historians, etc.) attempted to locate Richard’s skeleton. Digging at a likely location in the parking lot led almost immediately to a skeleton.

Finding a skeleton on the site of a church graveyard is hardly surprising, but was the skeleton Richard’s? The team sought to evaluate the evidence provided by the skeleton and, importantly, the recovered DNA from the skeleton.

The Nature report states:

To obtain a probability that Skeleton 1 is that of Richard III, we considered the non-genetic data (radiocarbon data, estimated age at death, sex, presence of scoliosis and presence of perimortem wounds) together with the genetic data (mtDNA and Y-chromosome). For each data type, we computed likelihoods for the observed data under hypothesis 1 (H1—that Skeleton 1 is Richard III) and under hypothesis 2 (H2—that Skeleton 1 is not Richard III).

Here are the non-genetic observations:

The skeleton was that of a male aged 30 to 34 years, with severe scoliosis rendering one shoulder higher than the other, with numerous perimortem battle injuries. Modelled radiocarbon dating … [indicated death in the interval] 1456–1530 AD at 95.4% probability.

  1. Age and sex of skeleton. The observation of a “male aged 30 to 34 years” is entirely consistent with the historical facts about Richard. So, under hypothesis 1, the likelihood would be high, perhaps 75%. On the other hand, under hypothesis 2, it’s fair to see the likelihood as the product of 1/2 (for sex) and perhaps 1/5 (for age, given the age distribution at death of the sorts of people likely to be buried at the church). The likelihood ratio divides the likelihood under hypothesis 1 by the likelihood under hypothesis 2, something like 0.75 / 0.10 = 7.5. On the verbal scale, this is “weak support” for hypothesis 1.

  2. Radiocarbon dating to AD 1456-1530. Given the historical record, this is obviously consistent with hypothesis 1, so the likelihood is high (say, 50%) under hypothesis 1. But under hypothesis 2, given the location and age of the church, it’s hardly surprising that any skeleton would date to this period. Let’s call it a likelihood of 1/3 to 1/4. The likelihood ratio is thus about 1.5 to 2, also “weak support.” It would have been much stronger evidence, against hypothesis 1, if the skeleton dated to 1400 or 1600.

  3. Scoliosis. Under hypothesis 1, the likelihood of this evidence is high. Under hypothesis 2, the likelihood can be estimated from the fraction of skeletons found with scoliosis, perhaps 1 in 300. The likelihood ratio can reasonably be assigned as 200, “moderately strong evidence.”

  4. Perimortem wounds (inflicted at or around the time of death). Under hypothesis 1, this likelihood is high, but it is hard to know how high. (The Wars of the Roses, for which Richard’s death marked the end, were bitterly fought.) Under hypothesis 2, one could look at how often perimortem wounds are found in such burials. The researchers assigned a likelihood ratio of 40: “moderate support.”
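
The arithmetic behind items 1 and 2 can be checked in a few lines of R. This is only a sketch: the likelihood values are the rough guesses quoted in those items, and the variable names are invented for the illustration.

```r
# Item 1: age and sex. Rough likelihoods quoted in the text.
lik_h1_age_sex <- 0.75                 # under hypothesis 1
lik_h2_age_sex <- (1 / 2) * (1 / 5)    # under hypothesis 2: sex times age
lr_age_sex <- lik_h1_age_sex / lik_h2_age_sex

# Item 2: radiocarbon date. The hypothesis 2 likelihood is a range, 1/3 to 1/4.
lik_h1_radiocarbon <- 0.50
lik_h2_radiocarbon <- c(1 / 3, 1 / 4)
lr_radiocarbon <- lik_h1_radiocarbon / lik_h2_radiocarbon

lr_age_sex       # 7.5
lr_radiocarbon   # 1.5 and 2
```

Because the hypothesis 2 likelihood in item 2 is given as a range, the likelihood ratio comes out as a range as well.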

A. Given the evidence presented in (1) - (4), what’s a reasonable estimate of the overall likelihood ratio?

1,000       5,000       10,000       100,000       1,000,000       5,000,000       10,000,000

B. What does (A) correspond to on the verbal scale of strength of evidence (in favor of hypothesis 1)?

Moderate support       Moderately strong support       Strong support       Very strong support       Extremely strong support      

The genetic evidence is mixed. Richard left no known living descendants, but there are two kinds of DNA that are well conserved over time. The first is DNA from the Y chromosome, which is passed patrilineally from generation to generation. The second is mitochondrial DNA, which is passed exclusively from the mother.

  5. Y-chromosome DNA. To find a continuous line of male descendants, it was necessary to go back to Richard’s great-great-grandfather, Edward III (1312-1377). Edward’s descendants, tracked through the male line, reach the present after 18-20 generations. No match was found between the Y-chromosome DNA of Edward’s living descendants and that of the skeleton. The likelihood of no match is high, practically 1, under hypothesis 2. But even under hypothesis 1, given the many opportunities for false paternity in the many generations between Edward and today’s male descendants, the likelihood of no match is something like 10%.

C. Combine the two likelihoods given in (5) to find the likelihood ratio of hypothesis 1 versus hypothesis 2 under the observation of no genetic match. Which number is closest?

0.01       0.1       1.0       10       100      

  6. Mitochondrial DNA. Richard had a sister, and the sister has a continuous line of female descendants (through 17 to 19 generations) to the present. These descendants provided a strong genetic match. The likelihood ratio was estimated by the researchers to be approximately 500: “moderately strong support.”

D. Combining the likelihood ratios for (1) through (6), what is the overall likelihood ratio in favor of hypothesis 1?

1,000       5,000       10,000       100,000       1,000,000       5,000,000       10,000,000

E. What is the verbal description of the overall likelihood ratio found in (D)?

Moderate support       Moderately strong support       Strong support       Very strong support       Extremely strong support      

Each of the individual likelihood ratios (1) - (6) is subject to dispute, but only within a limited range, say, a factor of roughly 2. This renders the overall likelihood ratio uncertain on a numeric scale, but much less uncertain on a logarithmic scale. Note that the likelihood ratios corresponding to the verbal scale are spaced logarithmically. In mathematics, being uncertain by a factor of 10 is considered unacceptable precision for calling an answer “correct.” But on the logarithmic likelihood scale, a factor of 10 is not necessarily large.
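
To see how a factor-of-2 uncertainty in each ratio plays out on the two scales, here is a small sketch in R. The six ratios are made-up illustrative values, not the ones from the exercise, and the worst case assumes every ratio is off in the same direction.

```r
# Hypothetical set of six likelihood ratios (illustrative values only).
lr <- c(3, 5, 20, 0.5, 10, 40)

# Worst case: every ratio is too high, or too low, by a factor of 2.
overall      <- prod(lr)
overall_low  <- prod(lr / 2)
overall_high <- prod(lr * 2)

# On the numeric scale the two bounds differ by a factor of 4^6 = 4096 ...
overall_high / overall_low

# ... but on the log10 scale each bound is only about 1.8 units from the estimate.
log10(c(low = overall_low, estimate = overall, high = overall_high))
```

A shift of about 1.8 units on the base-10 logarithmic scale is what six independent factors of 2 amount to, since 2^6 = 64.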

Exercise 9.2 Should the calculations be done with probability or with magnitude? The product of probabilities corresponds to the sum of the magnitudes.
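
A minimal sketch of that correspondence, assuming that “magnitude” means the base-10 logarithm and using three made-up probabilities:

```r
# Hypothetical probabilities to be multiplied together.
p <- c(0.5, 0.1, 0.02)

# Calculation with probabilities: a product.
prod(p)

# Calculation with magnitudes (base-10 logarithms): a sum.
magnitudes <- log10(p)
sum(magnitudes)

# The two routes agree: the sum of magnitudes is the magnitude of the product.
all.equal(sum(magnitudes), log10(prod(p)))
```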

Exercise 9.3 Do the calculations to produce the years-saved versus cost function from Figure 13.7. Or maybe just the average years saved, that is, how to compute an expectation value.

Exercise 9.4 A little drill with three or four data points: Calculate likelihood by hand for a given distribution. Maybe have them do both normal and exponential for a few points to show that both distributions are compatible.
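
One possible shape for such a drill is sketched below. The three data points and the distribution parameters are invented for illustration. The likelihood of the whole data set is the product of the densities at the individual points, assuming the observations are independent.

```r
# Hypothetical data: three made-up observations.
x <- c(2.1, 3.4, 1.7)

# Hypothesis A: normal distribution with mean 2.5 and sd 1 (illustrative parameters).
lik_each_normal <- dnorm(x, mean = 2.5, sd = 1)
lik_normal <- prod(lik_each_normal)

# Hypothesis B: exponential distribution with rate 1/2.5 (illustrative parameter).
lik_each_exp <- dexp(x, rate = 1 / 2.5)
lik_exp <- prod(lik_each_exp)

# The "by hand" version of the normal density, for comparison with dnorm().
by_hand <- exp(-(x - 2.5)^2 / 2) / sqrt(2 * pi)

c(normal = lik_normal, exponential = lik_exp)
```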

Contrast maximum likelihood parameters with likelihoods that are some distance away. Contour plot for likelihood for mean and sd. Add more data points, then see how the likelihood contracts.

It’s the magnitude of the likelihood that we usually work with. Often, we can calculate the magnitude but not the actual value, which suffers from computer round-off.
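
The round-off problem is easy to provoke. The sketch below uses 2000 made-up data values and a standard normal hypothesis, and it works with the natural logarithm (via the `log = TRUE` argument of `dnorm()`) instead of the base-10 magnitude. The point is the same on either logarithmic scale.

```r
# Hypothetical data: 2000 evenly spaced values, used only to illustrate round-off.
x <- seq(-3, 3, length.out = 2000)

# Multiplying 2000 densities, each below 0.4, underflows to exactly zero.
lik <- prod(dnorm(x))

# The log-likelihood is a sum of moderate-sized numbers and stays finite.
log_lik <- sum(dnorm(x, log = TRUE))

lik       # 0
log_lik   # a finite negative number
```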

Exercise 9.5 Absolute and relative probability: absolute probability is set by the criterion that the total probability, across all possibilities, adds up to 1. We’ll talk about how this calculation is done in Chapter 13.

Exercise 9.6 Consider the risk of a serious-injury-producing automobile accident. The mileage driven until the next such accident is unknown. But we can frame a hypothesis: the relative probability of the mileage until the next accident is an exponential function (Chapter 7) with a rate of 1 in 50,000 miles.

There is an infinite number of other hypotheses that might be applied to the automobile-accident setting. For example, an exponential distribution with a rate of 1 in 72,983.5 miles. Or, perhaps a uniform distribution between a minimum of 138 miles and 21,709 miles. This might be starting to sound silly, but Bayesian reasoning saves the day by adding an additional concept: that every hypothesis can be assigned a relative “goodness.” There are two components that go into finding the “goodness” of a hypothesis. One of these is called “prior belief” and will be introduced in Chapter 10. The other is called “likelihood.”

To illustrate, let’s work with the specific hypothesis that “the relative probability of the mileage until the next accident is exponentially distributed with a mean of 50,000 miles.” We are keeping track of a car. Suppose the car has an accident at 38,231 miles. To find the likelihood, simply evaluate the probability distribution at the observed value. Here’s the relevant computing command for the relative probability:

dexp(38231, rate = 1 / 50000)
[1] 9.310216e-06

We can calculate the likelihood for any and all of the other hypotheses we are considering. For example, we earlier mentioned a different hypothesis: that the exponential distribution has a rate of 1 in 72,983.5 miles. Here’s the calculation of the likelihood:

dexp(38231, rate = 1 / 72983.5)
[1] 8.114813e-06

Now a third hypothesis for the accident mileage: a uniform distribution between a minimum of 138 miles and 21,709 miles.

dunif(38231, min = 138, max = 21709)
[1] 0

The observation of the accident at 38,231 miles produces different likelihoods for the three different hypotheses.
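
One way to compare the three hypotheses is through ratios of their likelihoods. The following is a small sketch using only base R. The variable names are invented for the illustration.

```r
obs <- 38231  # observed mileage at the accident

lik_50000 <- dexp(obs, rate = 1 / 50000)
lik_72983 <- dexp(obs, rate = 1 / 72983.5)
lik_unif  <- dunif(obs, min = 138, max = 21709)

# Likelihood ratio of the first exponential hypothesis to the second.
lik_50000 / lik_72983

# The uniform hypothesis assigns zero likelihood: the observation is outside its range.
lik_unif == 0
```

The ratio between the two exponential hypotheses is only a little above 1, so this single observation barely distinguishes them, while it rules out the uniform hypothesis entirely.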

It’s useful to consider a likelihood function that tells us the likelihood induced by an observation at 38,231 miles for each of a large set of hypotheses. Here’s the likelihood function for an exponential distribution: it takes the form of likelihood versus the hypothesized rate.

slice_plot(
  dexp(38231, rate = 1 / miles) ~ miles,
  domain(miles = 3000:1000000),
  npts = 500
) |>
  #gf_refine(scale_x_log10()) |>
  gf_labs(x = "Rate: 1 per n miles",
          y = "Likelihood of 38,231 miles observation")
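
The peak of this likelihood function can also be located numerically. The sketch below uses base R’s `optimize()` on the log-likelihood, searching the same range of hypotheses as the plot. The function name `log_lik` is made up for the illustration.

```r
obs <- 38231  # observed mileage at the accident

# Log-likelihood as a function of the hypothesized mean miles between accidents.
log_lik <- function(miles) dexp(obs, rate = 1 / miles, log = TRUE)

# Search the same range of hypotheses used in the plot.
best <- optimize(log_lik, interval = c(3000, 1000000), maximum = TRUE)

best$maximum    # close to the observed 38,231 miles
best$objective  # log-likelihood at that hypothesis
```

With a single observation, the exponential hypothesis with the highest likelihood is the one whose mean equals the observed value.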

Exercise 9.7 The exponential distribution has one parameter, the normal has two, and the 17-23-31 has three. Formulas to take into account the number of parameters when comparing likelihoods have been offered. Two well-known ones are:

  • The Akaike Information Criterion (AIC). The total score is \(2 k - 2 \ln(L)\) where \(k\) is the number of parameters and \(L\) is the calculated likelihood.

  • The Bayesian Information Criterion (BIC). The score incorporates not only the number of parameters (\(k\)), but also the number of data points (\(n\), which is 3 in our example). The formula is \(k \ln(n) - 2 \ln(L)\).

| Hypothesis | \(k\) | \(L\) | AIC | BIC |
|---|---|---|---|---|
| Exponential | 1 | 0.000006 | 26 | 30 |
| Normal | 2 | 0.00004 | 28.3 | 22.5 |
| 17-23-31 | 3 | | 12.6 | 8.7 |

A lower score for AIC or BIC is better.
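
The two formulas translate directly into R. In this sketch, `aic()` and `bic()` are hypothetical helper names (not functions from any package), and the inputs are illustrative values that are not taken from the table.

```r
# Hypothetical helper functions implementing the two formulas above.
# log() in R is the natural logarithm.
aic <- function(k, L) 2 * k - 2 * log(L)
bic <- function(k, n, L) k * log(n) - 2 * log(L)

# Illustrative inputs only: a 2-parameter hypothesis, 10 data points, likelihood 1e-4.
k <- 2
n <- 10
L <- 1e-4

aic(k, L)
bic(k, n, L)
```

The two scores differ only in the per-parameter penalty: 2 for AIC and \(\ln(n)\) for BIC. With \(n = 10\), \(\ln(n)\) exceeds 2, so BIC penalizes each parameter more heavily than AIC. With \(n = 3\), it penalizes less.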

Let’s imagine two scenarios where we collect new data.

Scenario 1: The next two intervals turn out to be 5 and 47 years.

Scenario 2 (in the spirit of science fiction): The next two measurements are 31 and 31 years.

Calculate the likelihood, AIC, and BIC for each of the three hypotheses using just the 2 new data points. Which hypothesis is favored according to L (higher is better), AIC, and BIC?

Exercise 9.8

“Probability” is a standard high-school mathematics topic, and it’s likely that you have spent some time calculating the probability of all heads from three coin flips or the probability of a 7 from rolling a pair of dice. Why coins? Why dice? One reason is that there is not a lot to know about coins and dice and we can be confident in the idea that heads or tails have the same relative probability, and, similarly, that the outcomes one through six of an individual die have the same relative probability. Other reasons: we are familiar from an early age with games that involve multiple throws of dice; we feel justified in the belief that just about any coin or any pair of dice thrown by just about any person provides a fair randomization device.
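
For reference, the two high-school calculations mentioned here take only a few lines of R. The sketch assumes a fair coin and fair dice, which is exactly the hypothesis these calculations rest on.

```r
# Three fair coin flips: each flip is heads with probability 1/2.
p_all_heads <- (1 / 2)^3

# Two fair dice: tabulate the sum for all 36 equally likely outcomes.
sums <- outer(1:6, 1:6, "+")
p_seven <- mean(sums == 7)

p_all_heads  # 0.125
p_seven      # 6 of 36 outcomes, about 0.167
```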

Whatever the educational virtues of learning the high-school probability calculations for dice and coins, the same reasons they are in the curriculum are also reasons why they are poor examples for dealing with uncertainty in significant events such as floods, fires, earthquakes, heat waves, financial collapses, illnesses, automobile mishaps, or—to reach for an extreme—the risk of an accidental detonation of a nuclear bomb. It would be foolish to think that the probability of a flood is the same at any location at any time or that we are all the same when it comes to becoming ill or the victim of an automobile mishap. And, unlike dice and coins, there is a lot to know about significant events and how they depend on circumstances. Finally, and thankfully, significant events are rare. We are always working with limited data and the understanding that risk can change over time.