At any point, you can submit your answers by collecting them and uploading them to the class site.
If requested by your instructor, please identify here the people from whom you received assistance on this assignment.
Exercise 10.1 You’re trying to predict the outcome of a football game, updating your predictions as the game proceeds. The competing hypotheses here are the possible end scores for your team: 2, 3, 4, 5, 6, 7, 8, 9, …, 100, 100+. I don’t know which team is yours or what your beliefs about it are, so I can’t tell you much about what relative probability you assign to each of these 100 possible outcomes, but you have your own views, well informed or not.
Purely for the sake of simplicity, let’s consider just these 20 hypotheses: the scores 5 through 24, with equal relative probability assigned to each: 1.
A. You don’t need to translate this relative probability distribution into an (absolute) probability, but if you were to do so, what would it be?
Now it’s half-time. Your team has a score of 10 pts. To update your before-the-game opinion, you need to multiply that relative probability by a likelihood function: What is the probability of the half-time outcome given each of the hypotheses?
B. Some of the hypotheses can be assigned zero likelihood given the half-time observation. Which are these?
C. What’s a better name for the relative probability distribution that encodes your “before-the-game opinion” on your team’s score?
D. What’s a proper name for your updated distribution of relative probabilities taking into account the half-time score?
Exercise 10.2 We are going to pick up on drill problem 10.B.2 [give a link here.] As a reminder, that drill problem was about trying to predict the outcome of a football game involving your favorite team, the particular form of the prediction being the end-of-game score for your team. For simplicity, we assumed that your prior makes each score from 5 to 24 equally probable and every other score impossible, that is, probability 0 in your prior. (This is unrealistic, since obviously 25, 26, and so on might be the outcome, but our interest here is in showing how a calculation might be structured.)
So, your prior is 0.05 for each of the 20 scores between 5 and 24. (Note that they add up to 1, so this prior is in the form of an absolute probability distribution, which is of course a valid relative probability distribution as well.)
The observation, made at half-time, was a score of 10 pts. Based on this observation, we want to know the posterior distribution for the end-of-game score. To find this, we simply multiply the prior for each hypothesis by the likelihood of the observation under that hypothesis. Likelihoods should not be simply pulled from the air; there should be some firm reasoning behind them.
Common sense suggests that a half-time score of 10 is reasonably connected to an end-of-game score of 20. That is, it seems rough justice to say that it’s reasonable to expect the second half of the game to go more or less like the first half. But it would be silly to demand that the end-of-game score must be double the half-time score. A stickler for details might point out that the rules of football imply that an end-of-game score of 11 is impossible from a half-time score of 10. A Moneyball enthusiast might turn to the teams’ history in past games to look at how many points were scored in the second half of the game. More narrowly, the statistician might look only at previous games where the half-time score was 10, or perhaps throw the opposing team’s half-time score into the mix, and so on. But such detail requires more data. More data requires going back further in history. Going back further in history makes more questionable the relevance of that past record to the current performance of your team.
Here is a very rough (but reasonable) model of scoring in a football game: at the end of the game, each point earned is equally likely to have come from the first half of the game or from the second half. With this in mind, there is a road forward to calculate the likelihood of each end-of-game hypothesis given the half-time observation of 10 pts. Moving forward requires some technical knowledge about probability theory which there is no reason for you to have already gained. But in this situation, as in many real-world situations, there is often an expert who can guide you with technical matters.
I’m going to play the role of the expert consultant and tell you the calculation to do. The binomial distribution is well suited to this situation. The intuitive setting for the binomial is a series of coin flips: the binomial distribution tells you the probability, out of n flips total, of getting 0 heads, or 1 head, or 2, and so on all the way up to n heads.
The R chunk contains the command needed to calculate the probability of getting k heads out of n flips. To use it we provide three numbers: the number of heads, the number of flips (size=) and the probability of a head in any one flip (prob = 1/2). (If we thought points come harder in the second half of the game, we might set, for instance prob = 1/4, but we will stick to 1/2.)
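As a concrete check on how the three arguments fit together, here is a minimal sketch for a single hypothetical end-of-game score of 20 (the "second half goes like the first half" case). The variable names are illustrative and not part of the original exercise.

```r
# Probability of exactly 10 "heads" (points scored in the first half)
# out of 20 "flips" (points at the end of the game), when each point is
# equally likely to have come from either half.
half_time_score <- 10
end_score <- 20   # hypothetical end-of-game score, for illustration only

likelihood_20 <- dbinom(half_time_score, size = end_score, prob = 1/2)
likelihood_20   # about 0.176

# The same number from the binomial formula choose(n, k) * p^k * (1 - p)^(n - k),
# which simplifies to choose(n, k) * (1/2)^n when p = 1/2.
by_hand <- choose(end_score, half_time_score) * (1/2)^end_score
```

Changing `prob = 1/2` to another value (such as the 1/4 mentioned above) changes the model of how points are split between the halves; the pattern of the call stays the same.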
The table below lists each of the hypotheses for the end-of-game score as well as the prior probability we have assigned to each. There is also a column for the likelihood of observing a half-time score of 10 given each hypothesis. I’ve filled in the computer command that will calculate the likelihood for the first few hypotheses; the others follow the same pattern.
Fill in the table by calculating each likelihood, then calculate the posterior relative probability by multiplying together the prior by the likelihood.
| hypothesis (pts) | prior prob | likelihood | posterior relative prob |
|---|---|---|---|
| 5 | 0.05 | dbinom(10, prob=1/2, size = 5) | |
| 6 | 0.05 | dbinom(10, prob=1/2, size = 6) | |
| 7 | 0.05 | dbinom(10, prob=1/2, size = 7) | |
| 8 | 0.05 | | |
| 9 | 0.05 | | |
| 10 | 0.05 | | |
| 11 | 0.05 | | |
| 12 | 0.05 | | |
| 13 | 0.05 | | |
| 14 | 0.05 | | |
| 15 | 0.05 | | |
| 16 | 0.05 | | |
| 17 | 0.05 | | |
| 18 | 0.05 | | |
| 19 | 0.05 | | |
| 20 | 0.05 | | |
| 21 | 0.05 | | |
| 22 | 0.05 | | |
| 23 | 0.05 | | |
| 24 | 0.05 | | |
To convert the posterior relative probability to an absolute posterior probability, add up all the relative probabilities and then divide each relative probability by this sum.
This is a lot of calculations, and a professional would automate them. You’re not expected to know how to do the automation, but perhaps you can understand what’s involved.
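One way the automation might look is sketched below. It relies on `dbinom()` accepting a whole vector for `size`, so all 20 likelihoods are computed in one call. The names `hypotheses`, `Score_table`, and so on are illustrative choices, not part of the original exercise.

```r
# The 20 hypotheses for the end-of-game score and the flat prior.
hypotheses <- 5:24
prior <- rep(0.05, length(hypotheses))

# Likelihood of a half-time score of 10 under each hypothesis.
# dbinom() returns 0 when the number of heads exceeds the number of flips,
# so end-of-game scores below 10 automatically get zero likelihood.
likelihood <- dbinom(10, size = hypotheses, prob = 1/2)

# Posterior: prior times likelihood, then normalize so the values sum to 1.
posterior_relative <- prior * likelihood
posterior_absolute <- posterior_relative / sum(posterior_relative)

Score_table <- data.frame(
  hypothesis = hypotheses,
  prior = prior,
  likelihood = likelihood,
  posterior_relative = posterior_relative,
  posterior_absolute = posterior_absolute
)
Score_table
```

Questions about ranges of scores can then be answered by summing rows, for example `sum(Score_table$posterior_absolute[Score_table$hypothesis >= 20])`.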
- What is the end-of-game posterior for a total score of 10 pts? (That is, no points scored in the second half.)
- A friend, impressed with your Bayesian powers, asks you at half-time, “What’s the probability that the final score will be \(\leq 15\)?” (Pick the closest answer.)
- Another friend, more competitive than the first, offers you an even-money bet claiming that the end-of-game score will be \(\geq 20\). How enthusiastic should you be to accept this bet?
- A third friend thinks that the number 21 has magical properties. She’s not so silly as to make an even money bet that the end-of-game outcome will be 21, and asks you to give her 7-to-1 odds. (This means, if she wins you pay her $7. If she loses, she pays you $1.) Are you willing to take the bet?
Exercise 10.3 It’s common in mathematics to look at extremes. When it comes to competing hypotheses, there are two extremes: there is an infinite number of hypotheses in competition; or, there are just two hypotheses in competition. Naturally, it’s possible and common to have three competing hypotheses, four, or any other number. But there is no such thing as having just one hypothesis in the competition.
In every case, whether it’s comparing two hypotheses, ten, or an infinite number, the procedure for calculating the Bayesian posterior probabilities on the hypotheses is the same.
1. You assign a prior to each hypothesis. This can be in the form of a relative probability. For two hypotheses, this will be a pair of numbers, for ten hypotheses this will be a list of 10 numbers, and for an infinite number of hypotheses this will be a function that takes any of the hypotheses as input and produces as output a relative probability.
2. Having made an observation, you calculate the likelihood of that observation under each of the hypotheses. Again, for two hypotheses this will be a pair of numbers, for ten hypotheses it will be a list of 10 numbers, and for an infinite number of hypotheses this will be a function that takes any of the hypotheses as input and produces a number as output.
3. Multiply the prior in (1) by the likelihood in (2) to get a posterior, expressed as a relative probability. Once again, for two hypotheses, the posterior will be two numbers. For 10 hypotheses, 10 numbers. For an infinite number of hypotheses it will be a function.
4. Cast the relative probability posterior into the form of an absolute probability. This involves adding up all the relative probabilities and dividing each by the sum. For instance, imagine two hypotheses:
| Hypothesis | prior | likelihood | relative posterior | absolute posterior |
|---|---|---|---|---|
| A | 7 | 0.4 | 2.8 | 2.8 / (2.8 + 2.7) = 0.51 |
| B | 3 | 0.9 | 2.7 | 2.7 / (2.8 + 2.7) = 0.49 |
- In the above example, what are the odds of hypothesis A?
- Each of the likelihoods is an absolute probability calculated in its own world, a world where the corresponding hypothesis is true. Is there any requirement for the likelihoods to add up to 1 across all the hypotheses?
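The four-step procedure, applied to the two-hypothesis example, can be sketched in a few lines of R. The vector names are illustrative; the numbers are the priors and likelihoods from the table.

```r
# Step 1: relative prior for each hypothesis.
prior <- c(A = 7, B = 3)
# Step 2: likelihood of the observation under each hypothesis.
likelihood <- c(A = 0.4, B = 0.9)
# Step 3: relative posterior is prior times likelihood, hypothesis by hypothesis.
relative_posterior <- prior * likelihood
# Step 4: divide by the sum to get absolute probabilities.
absolute_posterior <- relative_posterior / sum(relative_posterior)
round(absolute_posterior, 2)   # A: 0.51, B: 0.49
```

The same four lines work unchanged for ten hypotheses; only the lengths of `prior` and `likelihood` change. Note that step 4 needs the whole vector, since the sum runs over all hypotheses.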
Now let’s make up an example where there are four hypotheses, say four different kinds of watercraft where the observation is seeing the craft with binoculars at a distance of 5 miles:
| Hypothesis | Prior | Likelihood | Relative Posterior | Absolute Posterior |
|---|---|---|---|---|
| Aircraft carrier | 0.01 | 0.8 | | |
| Luxury yacht | 2 | 0.5 | | |
| Canoe | 5 | 0.1 | | |
| Rubber dinghy | 8 | 0.01 | | |
Note that the priors reflect a genuine reality: there are many more rubber dinghies and canoes than yachts or aircraft carriers.
- Give a sensible explanation of why the likelihood for the aircraft carrier is 80 times that for a rubber dinghy.
- Which of the four hypotheses has the greatest posterior probability?
- You can compute the relative posterior for each hypothesis by a simple multiplication of the numbers on that hypothesis’s row in the table.
You can likewise find each absolute posterior probability by using just the entries in that row.
- What is the absolute probability, given the observation, of the craft being a luxury yacht? (Pick the closest answer.)
Exercise 10.4 Consider a bad movie plot organized along these lines: The protagonist has a disturbing dream that the universe has changed from “Normal” to a new state: “Doom.” Doom was hitherto thought impossible. Doom and Normal constitute two hypotheses about the state of the universe.
To say that a hypothesis is impossible is equivalent to assigning it probability zero. That’s what everyone else believes, but our protagonist, on account of his dream, assigns Doom the rhetorical odds “one-in-a-million.”
- If an outcome has probability zero, what are the odds of that outcome?
Now, in typical movie fashion, the protagonist starts to observe events—call them “strange events”—that shouldn’t be happening. In the Doom universe, such events have a high likelihood. But in the normal universe they could also be observed due to optical illusions, distraction, or tricks played by the neighborhood children who think the protagonist is crazy. Let’s assign each observed strange event a likelihood ratio of 2 in favor of Doom.
- Explain what feature of Bayesian reasoning allows evidence to be in favor of a hypothesis that we believe is impossible.
- Translate a likelihood ratio of 3 according to the verbal scale for evidence:
The Bayesian updating calculation has a particularly simple mathematical form when the prior probability is stated in terms of odds.
\[\text{posterior odds} = \text{prior odds} \times \text{likelihood ratio} .\]
For the character who claims that the prior odds of Doom are zero, the posterior odds must always remain zero regardless of the likelihood ratio. In the movies, such a character always ends up dying in a dramatic way, just punishment for his skepticism.
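The odds form of the update is easy to express as a small function. The sketch below uses made-up numbers (prior odds of 1-in-100, a likelihood ratio of 3, four observations) rather than the ones in this exercise, and it assumes the observations are independent given each hypothesis, so that their likelihood ratios simply multiply. The function names `update_odds()` and `odds_to_prob()` are hypothetical helpers, not from any package.

```r
# Posterior odds after n_events observations that each carry the same
# likelihood ratio: the prior odds get multiplied by the ratio n_events times.
update_odds <- function(prior_odds, likelihood_ratio, n_events = 1) {
  prior_odds * likelihood_ratio ^ n_events
}

# Convert odds to a probability.
odds_to_prob <- function(odds) {
  odds / (1 + odds)
}

# Illustrative numbers only (not the ones from the exercise).
example_odds <- update_odds(prior_odds = 1/100, likelihood_ratio = 3, n_events = 4)
example_prob <- odds_to_prob(example_odds)

# Prior odds of zero stay at zero no matter how strong the evidence.
zero_stays_zero <- update_odds(prior_odds = 0, likelihood_ratio = 1000, n_events = 10)
```

Here `example_odds` is 0.81 and `example_prob` is about 0.45, while `zero_stays_zero` is exactly 0.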
- For our protagonist, who started with a prior odds of one-in-a-million, the posterior odds after one such “strange” event will be what?
Our protagonist starts to encounter other “strange” reports, hearing about them on the radio, seeing them in news reports from diverse locations, and so on. For the skeptic, these are all of no import. It is impossible to move the skeptic off his prior odds of zero. But our protagonist takes them seriously. Or, at least, he is willing to fold them into a Bayesian calculation.
- After five strange events (in total), what should be our protagonist’s posterior odds? (Pick the closest answer.)
- Another five strange events are observed, making 10 in all. What should be our protagonist’s posterior odds?
Amazingly, we have all been exposed to series of strange events through web “doomscrolling.” We Bayesian thinkers might deal with the deluge by adding new hypotheses to consider. Not just Normal and Doom, but also Clickbait/AI.
Exercise 10.5 Pick up on Exercise 9.1 to estimate the posterior probability that the skeleton was Richard III.
1 Activities
Exercise 10.6 The Federalist Papers is a collection of 85 essays written weekly from the end of 1787 through 1788. Authored by James Madison, Alexander Hamilton, and John Jay, the essays presented a case in favor of the new Constitution proposed to replace the Articles of Confederation originally adopted during the revolutionary war.
The essays were all published under a pseudonym, Publius. In later years, however, Madison, Hamilton, and Jay claimed the essays they wrote. Sometimes the claims overlapped because an essay was authored jointly. Other times, the overlaps might be the result of a mistake or due to editing by another author.
In the early 1900s, historians tried to identify the true author of the disputed papers. This became the subject of a famous project in the 1950s by renowned statistician Fred Mosteller and co-workers.
Computers having improved considerably since the 1950s, I undertook a project of a few hours to reproduce Mosteller’s work.
Since the different authors covered different domains of the Constitution, Mosteller and his team looked for non-contextual words used frequently by one author but not so often by the others. Such words include “upon,” “whilst,” and “while.” We might call these “discriminating words” since they discriminate (in part) between which author wrote each essay.
Looking at the non-disputed papers, I calculated separately for each author the probability that a randomly selected word would match “upon,” “whilst,” or “while.” In essence, that probability is a likelihood. For instance, under the hypothesis that Madison wrote an essay, the word “whilst” has a probability of 0.000294.
The table gives, for the disputed essays, those words that appeared and the likelihood of the word under each of the two hypotheses, Hamilton vs Madison.
| word | essay | count | favors | HAMILTON | MADISON | ratio |
|---|---|---|---|---|---|---|
| whilst | XLIX | 1 | madison | 0.00000875 | 0.000294 | |
| upon | L | 1 | hamilton | 0.00331 | 0.000171 | |
| whilst | LI | 2 | madison | 0.00000875 | 0.000294 | |
| whilst | LIII | 1 | madison | 0.00000875 | 0.000294 | |
| upon | LIV | 2 | hamilton | 0.00331 | 0.000171 | |
| whilst | LVI | 1 | madison | 0.00000875 | 0.000294 | |
| whilst | LVII | 3 | madison | 0.00000875 | 0.000294 | |
| whilst | LXIII | 1 | madison | 0.00000875 | 0.000294 | |
- For each of the disputed essays calculate the likelihood ratio and translate it into the verbal scale for strength of evidence.
Although the table attributes two of the essays to Hamilton, the consensus among historians, according to the internet, is that Madison wrote all of the disputed essays, although Hamilton might have had a hand in editing them.
- Calculate the likelihood ratio across all of the disputed papers by multiplying the individual ratios together. Give the number and translate this into the verbal equivalent for strength of evidence.
- Since Hamilton wrote roughly twice as many essays as Madison, reasonable prior odds for each essay are 2 to 1 in favor of Hamilton. Use this to calculate the prior probability for Madison having written all of the essays. Then, convert that prior probability into odds form and multiply it by the overall likelihood ratio from the previous item to get the posterior odds that Madison wrote all the disputed essays. Finally, convert the posterior odds into a posterior probability that Madison wrote all the disputed essays.
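The likelihood-ratio arithmetic for the table above can be set up along the following lines. This is only a sketch: the data frame name `Disputed` and its column names are illustrative, and treating repeated occurrences of a word as independent evidence (raising the single-occurrence ratio to the power `count`) is an assumption made here, not something the text specifies. The ratios are expressed in favor of Madison.

```r
# Word likelihoods under each hypothesis, copied from the table above.
Disputed <- data.frame(
  essay    = c("XLIX", "L", "LI", "LIII", "LIV", "LVI", "LVII", "LXIII"),
  word     = c("whilst", "upon", "whilst", "whilst", "upon", "whilst", "whilst", "whilst"),
  count    = c(1, 1, 2, 1, 2, 1, 3, 1),
  hamilton = c(0.00000875, 0.00331, 0.00000875, 0.00000875,
               0.00331, 0.00000875, 0.00000875, 0.00000875),
  madison  = c(0.000294, 0.000171, 0.000294, 0.000294,
               0.000171, 0.000294, 0.000294, 0.000294)
)

# Likelihood ratio (Madison : Hamilton) for one occurrence of the word.
Disputed$ratio_one <- Disputed$madison / Disputed$hamilton

# ASSUMPTION: each occurrence counts as independent evidence, so a word
# appearing `count` times contributes the single-occurrence ratio that many times.
Disputed$ratio <- Disputed$ratio_one ^ Disputed$count

# Overall ratio across all disputed essays: the product of the per-essay ratios.
overall_ratio <- prod(Disputed$ratio)
```

A ratio above 1 favors Madison and a ratio below 1 favors Hamilton; taking the reciprocal gives the ratio in favor of Hamilton instead.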
Exercise 10.7 A NON-BAYES problem. Calculate the number of accidents per million miles in the Crash_data record. [[It comes to 5.1 per million miles]] One of the two (fictional) activists in the text claimed, before the experiment started, that the accident rate for self-driving cars would be 1 per million miles; the other held for 10 per million miles. Are the actual data compatible with either or both of these two priors? Could the differences between either of the priors and the data be just due to chance? What kind of analysis would let you make an authoritative statement?
Exercise 10.8 Calculate the posterior if the prior chooses to include all seven successes from the previous tests, but only two of the failures. [[Need to give them the software.]]
Exercise 10.9 The figure is a version of Figure 10.3(b), focusing on the domain near the peak of the likelihood function.

Zooming in near the peak of the likelihood function from Figure 10.3(b), which was based on data from 10 cars which drove 785,000 miles in total.
The peak magnitude for the likelihood function is about 1.1. As described in the text, the approximate confidence interval can be read off from the locations of the hypotheses where the magnitude of the likelihood is down by 2 from the peak. Here, that will be where the likelihood function crosses the value 1.1 - 2 = -0.9.
Perhaps it’s intuitive that increasing the amount of data would decrease uncertainty in the result, that is, narrow the confidence interval. Researchers often collect some preliminary data and then estimate how much more data would be needed to make the confidence interval acceptably narrow.
This estimate is based on imagining that the new data would look very much like the preliminary data. That’s not necessarily true, which is why we call it an “estimate.”
Suppose the researchers double the amount of data. So instead of the 785,000 miles of experience used to construct the likelihood function shown in the figure above, we will have about 1,500,000 miles. We’re imagining that the new 750,000 miles of data will have the same likelihood function as the first 785,000 miles of data. Since the likelihood is the product of the individual likelihoods, the likelihood for the full 1,500,000-mile data set will be the square of the 785,000-mile likelihood.
You may remember from your earlier school work that the magnitude of the square of \(L\) (that is, the magnitude of \(L^2\)) is double the magnitude of \(L\) itself. The likelihood function for the doubled data set will look just like the figure above, but with the numbers on the vertical axis doubled.
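The doubling rule can be checked numerically. The sketch below assumes that "magnitude" means the logarithm of the likelihood (consistent with the negative value -0.9 mentioned earlier); the likelihood values themselves are made up for illustration.

```r
# Hypothetical likelihood values at three different hypotheses.
L <- c(0.5, 3, 12)

# ASSUMPTION: "magnitude" is taken here to be the natural logarithm.
magnitude <- log(L)

# Doubling the data squares the likelihood, which doubles its logarithm.
magnitude_doubled_data <- log(L^2)
all.equal(magnitude_doubled_data, 2 * magnitude)   # TRUE
```

The identity \(\log(L^2) = 2 \log(L)\) holds for any base of logarithm, so the choice of natural log here does not affect the conclusion.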
Draw a sketch of the figure above, but double each of the numbers on the vertical axis. Then, find the confidence interval for the doubled data. (Look where the likelihood is 2 less than the peak.)
That’s still a pretty broad interval. Perhaps we want to double the amount of data again, to 3,000,000 miles (40 cars). What would the confidence interval be then?
Exercise 10.10 Here is an image generated by the Gemini AI that is intended to represent Bayesian inference. It was created in response to a prompt constructed by one of the instructors for this course.

- Do the best you can to explain what about the image makes sense in terms of Bayesian reasoning. What does the tugboat represent? What about the barge? And the red-and-white buoys?
- Now imagine a detail not included in the prompt for the image. There should be a crewman at the rail of the tugboat with a boat-hook. The crewman will be pulling up each data buoy as the tug passes by. Does this detail fit in with the idea that the picture is about Bayesian inference? Explain why.
- Now a hard one. Does the lighthouse in the picture make any Bayesian sense?