# 14  Probability models

Simulations of the sort introduced in Lesson sec-simulations are often a combination of random number generators and ordinary, deterministic arithmetic operations. For example, the height simulation from sec-creating-dependents (reproduced in the margin, for convenience) involves three variables. The third variable, child is made up of two parts: (mom+dad)/2, a simple deterministic calculations involving the first two variables; and a source of random numbers rnorm(n,mean=0,sd=sqrt(5.2)).

Height_sim <- datasim_make(
mom <- rnorm(n, mean=64.1,
sd=sqrt(5.32)),
sd=sqrt(6.10)),
child <- (mom + dad)/2 +
rnorm(n, mean=0, sd=sqrt(5.2))
)

The height simulation introduced in sec-creating-dependents.

The random number generators (RNGs) used in simulations come from a large set of “parametric probability models” that has been invented and refined over more than a century. Each of the parametric probability models has two major characteristics:

1. A family name that specifies the “shape” of the distribution of random values generated by the corresponding RNG. For instance, the “normal distribution,” corresponding to the rnorm() RNG, has the distinctive single-humped, symmetrical shape seen in Figure fig-half-violins. Other family names, discussed in more detail later in this Lesson, include the “uniform distribution,” the “exponential distribution,” the “bernoulli distribution,” and the “poisson distribution.”

2. One or two “parameters” that specify the exact shape of the distribution. For instance, rnorm() has two parameters, mean= and sd=.

## Normal distribution

The “normal” distribution has a familiar “bell” shape: peaked and relatively flat in the middle, then falling steeply, then trailing off gently, as in Figure fig-normal-example3.

The parameter mean is the location (along the vertical axis) of the peak of the distribution. The parameter sd is shown by the distance between baseline and peak, which you can measure using the red rulers. The greater the sd, the more spread out the distribution and, consequently, the lower the density near the peak.

In Figure fig-normal-example3, three different normal distributions are shown:

1. mean=50, sd=5. This has the tightest and most highly peaked density.
2. mean=0, sd=10. Only half as peaky in density as (i).
3. mean=25, sd=20. Only one-quarter as peaky in density as (i). ::: {.callout-note} ## “Particularism” in politics

“Moral foundations theory” holds that people’s political views are shaped by a few philosophical principals, such as the spectrum of attitudes toward fairness or acceptance of authority. A 2023 news story reported on an empirical study by political scientists of one such attitude spectrum, “particularism” versus “universality.” As described by the story, “The particularism-universalism axis tracks how much people favour those close to them, such as family or neighbours, over those with whom their ties are weaker.”

“What drives people to vote the way they do?” (2023) The Economist Aug. 17. link

To assess the extent to which the two major US political parties are spread out on this spectrum, the scientists examined the text of speeches given in Congress, scoring each according to particularism/universality. The results are shown in Figure fig-particularism1. Figure 14.2: Scores of congressional speeches on the particularism/universality axis, stratified by political party, as reported by The Economist.

Each of the curves in Figure fig-particularism1 shows a distribution of the score assigned to each of many speeches. In such a presentation, it’s easy to forget the individual speeches, which are not shown. (For this reason, we prefer presentations such as in Figure fig-normal-example3 that show the literal density of data along with the half-violin curve summary.)

It’s easy to see from Figure fig-particularism1 that the distributions are approximately normal. Thinking of the distribution as a hill, the mean parameter corresponds to the location of the peak, the standard deviation parameter can be estimated as the half-width of the distribution at a height about 60% of the way up the hill. Even though neither of the axes has a numerical scale, you can estimate the standard deviation with a ruler. Place the ruler horizontally, sliding it upward until it’s about 60% up the hill. Then measure the width of the hill at this elevation and divide by two to get the standard deviation. The result will depend on the units marked on your ruler, perhaps inches or centimeters.

The standard deviations measured from the Democratic and Republican curves are very similar. Since the standard deviations are similar, so are the variances. Consequently, the amount of variation in the particularism of Democratic speeches is about the same as in Republican speeches.

Recall that variance is the square of the standard deviation.

Using the same ruler, measure the distance between the Republican and Democratic peaks, that is, the mean value of particularism for each party. The difference in means is roughly two-thirds of the standard deviation.

According to the news report, “The authors found that Republican lawmakers typically used far more particularist language than Democratic ones did.” The qualitative statement “far more” is a weasel word. In fact, the same difference in particularism between the parties could also reasonably be quantified as “just a little,” so long as the data are displayed and not just the density summary.

Much better to use a quantitative measure of “how much more.” By using the normal distribution as a model, we have easily quantified “how much” using the difference in means measured on a scale set by the amount of variation. The result is two-thirds. But should we regard this as big or small?

Figure fig-particularism2 shows data with the same ratio of difference in means to standard deviation. In that format, the difference in particularism between the parties does not seem so large, even though the curves are related in the same way. Figure 14.3: Simulated distributions of particularism presented in the style of Figure fig-normal-example3.

Using a probability distribution to summarize the data enables us to construct measures of “far more” or “just a little” that aren’t influenced by a graphical format creating a visual impression. For example, we know that the means differ by 0.66 (two-thirds) of a standard deviation. Let’s see how often two speeches by members of the same party will differ by this amount. A quick simulation gives the answer:

One_party_sim <- datasim_make(
speech1 <- rnorm(n, mean = 0, sd = 1),
speech2 <- rnorm(n, mean = 0, sd = 1)
)

Simdata <- datasim_run(One_party_sim, n=10000)
Simdata |>
mutate(big = abs(speech1 - speech2) > 2) |>
summarize(mean(big))
  mean(big)
----------
0.1512

Two speeches from the same party will differ by two standard deviation of particularism about 15% of the time.

Now compare speeches from the different parties, where the means are separated by two-thirds of a standard deviation. How often will the Republican speech have a higher particularism by two standard deviations?

Two_parties_sim <- datasim_make(
democrat <- rnorm(n, mean = 0, sd = 1),
republican <- rnorm(n, mean = 0.66, sd = 1)
)

Simdata <- datasim_run(Two_parties_sim, n=10000)
Simdata |>
mutate(big = (republican - democrat) > 2) |>
summarize(mean(big))
  mean(big)
----------
0.1705

Any two speeches from different parties will differ by two standard deviation only about 17% of the time.

:::

SAY THAT the normal is the go-to parametric probability distribution for pure random noise: mean zero, sd measures the amplitude of the noise.

## Exponential distribution

The exponential distribution is commonly used to model the time until a random-seeming event. As a concrete example, consider a used-book store and narrow things down to a specific book, say, The Hinge of Fate, a history published in 1950. Each week, customers with varied interests browse the history section of the store. Most customers have little or no interest in the obscurely titled Hinge. But there is some probability that, on any given week, a customer—perhaps recognizing the famous author’s name—will decide to buy the book.

The exponential distribution is based on an idealization that the weekly probability of the Hinge being bought remains the same from week to week. Given this assumption, the exponential distribution is the appropriate model of how long the book will sit on the shelf until it’s sold.

The “attractiveness” of a book is subjective. But imagine lumping books together according to genre, condition, and price. Hinge is a World War II history. Suppose the price is $5 and the book is in “good” condition. Now consider all books in the class “$5 World War II histories in good condition.” If there are, say, 100 books in this class in the store, we can estimate the median duration on the shelf by noting when the 50th book was sold.

Part of “knowing the used-book business” is having a good system for stratifying books based on genre, condition, price, etc. Such a system lets the store owner decide from the outset whether it’s likely to be profitable to stock a book.

Obviously, books differ in their attractiveness to buyers. Attractive books are likely to be sold sooner than unattractive ones. The exponential distribution accommodates such diversity by having a parameter: the mean time until sale. This parameter does not dictate when the book will be sold, but it allows us to incorporate the book’s attractiveness into the model.

For readers not in the used-book business, more compelling examples of using the exponential distribution are earthquakes and large storms. You may have heard, for instance, of a big storm being categorized as a “100-year storm.” Intuitively, and wrongly, many people expect such storms to occur once every 100 years, so that if a storm has just occurred the next one will happen in about 100 years. It’s more meaningful to define a 100-year storm as a storm of such large magnitude that the probability, in any one year, is 1/100. m-seeming events, for instance the time between earthquakes or the time interval between the rare, large storms that are described as “100-year storms.”

In my experience, asking students to draw the probability distribution for the time until the next 100-year storm is likely to elicit the question, “How long ago was the last 100-year storm?” Students can find it hard to accept my answer: “It doesn’t matter.”

The reality of the exponential distribution is utterly counter-intuitive. Figure fig-exp-example3 shows three examples of exponential distributions, one each for storms occuring on average every 25 years, 50 years, and 100 years respectively. (The red rulers have been positioned vertically at the average for each distribution.)

Note in draft: This figure should be redrawn using “ribbon” so that the peak at year zero is evident. Using violin leads to a gentle fall-off of density when approaching zero from above. Figure 14.4: Three exponential distributions with different average times between events.

What’s surprising about the exponential distribution is that the peak does not occur at the average value. Instead, the most likely intervals between storms are very short. And with very small probability, the intervals might be several times the mean. Indeed, the mathematical form of the probability distribution is exponential decay, thus the name “exponential” distribution.

Exponential parameters: rate, mean, standard deviation

CHANGE THIS TO A simulation: generate 100-year storm intervals, add them up to get the total time covered, and calculate the average time per year.

storm_sim <- datasim_make(
t <- rexp(n, rate = 1/100)
)
datasim_run(storm_sim, n=1000) |>
summarize(timespan = sum(t), nstorms=n(), mean(t), var(t)) |>
mutate(storms_per_year = nstorms/timespan)
 timespan   nstorms    mean(t)     var(t)   storms_per_year
---------  --------  ---------  ---------  ----------------
96218.94      1000   96.21894   8758.633          0.010393

Recall that the parameters for the normal probability distribution are the “mean” and “standard” deviation. In contrast, R uses “rate” as the parameter for the exponential distribution. “Rate,” as the name suggests, refers to the average rate (e.g. storms-per-year). For instance, the rate for a 100-year storm is 0.01 per year, the rate for a 10-year storm is 0.1 per year, and the rate for a 1-year storm is 1 per year.

The RNG for an exponential distribution is rexp(). For example:

Exp_sim <- datasim_make(
x <- rexp(n, rate=0.01),
y <- rexp(n, rate=0.1),
z <- rexp(n, rate=1)
)

RUN THIS AND SHOW THE MEAN INTERVAL BETWEEN STORMS and the standard deviation. COMPARE THEM TO THE “rate”. Explain why we’ve put them in separate columns.

## Poisson distribution

Another way to look at the timing (or spacing) between random events is to count them. For instance, sit by a quiet country road and count the number of cars that pass by in a 10-minute period. The count must always be an integer: 0, 1, 2, $$\ldots$$, cars. Repeat the observation on the next day, and the day after that, and so on. The counts will be similar from day to day but will vary somewhat. That variation is, to the observer, random.

Suppose you count 87 cars in 10-day’s observations: a hundred minutes in total. It’s fair to say that the rate at which the cars pass is 87 cars/100 minutes = 0.87 cars per minute or 8.7 cars per 10 minutes.

This rate does not need to be an integer; there will never be a 10-minute interval in which 8.7 cars pass by. Instead, you might see 6 or 7 or 10 or 13 cars in any one 10-minute interval. The “Poisson distribution” assigns a probability to each possible outcome. The Poisson distribution has a single parameter, the rate.

Named after Baron Siméon Denis Poisson (1781-1840)

?(caption)

 count   probability
------  ------------
0     0.0001666
1     0.0014490
2     0.0063040
3     0.0182800
4     0.0397700
5     0.0691900
6     0.1003000
7     0.1247000
8     0.1356000
9     0.1311000
10     0.1140000
11     0.0902000
12     0.0653900
13     0.0437600
14     0.0272000
15     0.0157700
16     0.0085770
17     0.0043890
18     0.0021220
19     0.0009714
20     0.0004226
21     0.0001751
22     0.0000692
23     0.0000262
24     0.0000095
25     0.0000033

Table tbl-poisson-87 shows the Poisson distribution for a rate of 8.7. The most likely outcomes are 8 or 9: about what you would expect when the rate is 8.7. But outcomes that are 5, 6, 7, 10, 11, 12 are not much less likely. Awareness of the Poisson distribution is helpful when interpreting data. For instance, suppose that on one day’s 10-minute observation you counted 13 cars. Does this mean that usage of the road has changed from previous days? Looking at the probability from the Poisson distribution with a rate of 8.7 you can see that 13 is a plausible, if uncommon, outcome.

Like all the probability distributions we will work with, the Poisson distribution has a mean and a variance. It happens that these are numerically equal to the rate parameter.

Example: Making use of the Poisson distribution

Providing medical supplies to remote clinics is a challenge. The clinic needs to have enough on hand to deal with emergencies, but the shelf life of the supplies—especially blood products—is limited.

Consider the situation of a clinic where the average rate of use of transfusion blood is 0.3 unit per day. Suppose roads are poor and deliveries are unreliable, coming perhaps once per week. The average demand is 2.1 units per week, so how much blood should be delivered each week? The Poisson distribution with a rate parameter of 2.1 units per week provides insight. As Table tbl-poisson-21 shows, ordering 3 units per week is risky: in any one week there is about a 15% chance of needing more than that. The order would have to be 6 units per week to keep the probability of running out below 1% each week.

On the other hand, ordering 6 units per week means that, on average, 3.9 units will go to waste each week.

?(caption)

 count   probability
------  ------------
0      1.22e-01
1      2.57e-01
2      2.70e-01
3      1.89e-01
4      9.92e-02
5      4.17e-02
6      1.46e-02
7      4.38e-03
8      1.15e-03
9      2.68e-04
10      5.63e-05 Figure 14.5: An airborne drone delivering (by parachute) two units of whole blood for emergency transfusion in a remote, rural clinic in Rwanda. (Link to video)

There are advantages to a centralized storage depot that can quickly deliver supplies as needed to each of many clinics. One such storage/rapid-delivery system is depicted in Figure fig-drone-delivery, where drones are used to parachute supplies down to where they are needed. To see the advantages, consider a system such as that in Figure fig-drone-delivery which serves 360 remote clinics.

For the whole set of 360 clinics the demand for blood in any one week will be 2.1 units per week times 360, or 756 units per week. The clinic needs to have enough blood on hand to meet this demand, including rare, extreme fluctuations due to chance. Figure fig-300-clinics shows the Poisson distribution for a rate of 108 per day.

Code
tibble(count=600:900, prob=dpois(count, 756)) |>
gf_point(prob ~ count) Figure 14.6: The probability of depand per week for the 360-clinics in the system.

If the system keeps 840 units of blood in stock at the start of each week, demand will exceed supply only one week in 5 years. The average waste will be 840-756 = 84 units per week. This is not negligible, but far lower than the total average waste over 360 clinics: $$3.9 \times 360 \approx 1400$$ units per week. So the centralized storage and delivery provides much greater reliability and much less waste!

## Bernoulli distribution

Can be rendered either as 0-1 or as a pair of labels.

1. Start by looking at the features of the normal distribution. Use a point-violin graph out child, mother, and father to point out how the distribution for children is shifted and (slightly) broader for the children.

2. Introduce the quantile() function to find the central 95% of each distribution, and show how this corresponds to plus-or-minus 2 standard deviations.

1. Exercise: Show that rnorm(n, mean=10, sd=sqrt(50)) gives the same distribution as 10 + rnorm(n, mean=0, sd=sqrt(50)). Similar stuff for scalar addition (changes mean but not sd) and scalar multiplication (changes both mean and sd, unless the mean is zero.
2. Text and exercise: show what happens to the variance when taking the sum of values from two normal distributions. Link this to the variance results seen in Height_sim in the previous Lesson.
3. Introduce the uniform, exponential, and poisson distributions and their parameters.

1. Text and exercise: Show that the mean and variance have exactly the same properties with respect to scalar addition, scalar multiplication, as well as vector addition (but not necessarily multiplication).
4. Dependence: the above rules don’t apply for vector addition (for the variance) when the variables being added are dependent.

5. Simulations are independent row-by-row but not variable-by-variable. Mention time series.

Independent trials

Each trial will be one row of the data frame. By running many trials, we can use statistical methods to study the resulting patterns.

Always have in mind that we are constructing n trials.

sample(1:6, size=10, replace=TRUE)
  4 6 1 4 4 5 1 5 1 3
1. Error propagation (optional)

2. More named distributions (optional)

3. Likelihood (optional)

## Exercises

Exercise 13.1 rabbit-storm-doll

Show 4 normal distributions, kinda like Figure fig-normal-example3, but with the plot oriented horizontally. Ask which has the largest and which the smallest sd. Which has the larges and which the smallest mean?

Similar for exponential (not vertically jittered). Measure the mean time between events.

Similar for bernoulli. As them to estimate p from the (jittered) density plot.

Similar for poisson (jittered).

And similar for uniform (not vertically jittered)

Exercise 13.2 fish-storm-doll

For a 100-year storm, what’s the probability of an inter-storm interval of five years or less? (Do with rexp().) If the distribution were normal, with the same mean and variance as the exponential distribution corresponding to 100-year storms, what would be the probability of two successive 100-year storms having an interval of five years (or less).

Exercise 13.3 fish-storm-doll

Repeat some of the problems but using dnorm (for relative probabilities) and pnorm (for absolute probabilities).

Exercise 13.4 fish-rain-doll

Set up the exponential race between the parking violators and the police issuing the ticket. Create a simulation.

Exercise 13.5 whale-storm-doll

Use a simulation to estimate the mean and variance of the outcomes from an Exponential and from a Poisson distribution.

Exercise 13.6 fish-storm-car

Consider a drone delivery service that sends shipments at an average rate of 10 per hour and where each mission takes an average of 2 hours (until the drone is ready for re-use). How many drones should the service keep in working inventory so that the chance of running out of drones is less than 0.01% in a single two-hour shift?

Exercise 13.7 fish-cloud-doll

DRAFT: Use the 108 units per day demand to find how often supply will exceed demand for a stock of 120 units, 130, and 140. What is the average waste per day. Then translate this to a waste per week. Compare this to the weekly waste for 360 clinics wasting an average of 3

Perhaps exercise fish-dive-plant as an example.

## OLD STUFF

Note in draft/transition

This Lesson has to be oriented more toward simulation and only peripherally toward *causation**.

Also, include error propagation.

Carl Wieman is a Nobel-prize-winning physicist and professor of education at Stanford University. Weiman writes, “For many years, I had two parallel research programs: one blasting atoms with lasers to see how they’d behave, and another studying how people learn.” Some of Wieman’s work on learning deals with the nature of “expertise.” He points out that experts have ways to monitor their own thinking and learning; they have a body of knowledge relevant to checking their own understanding.

This lesson presents you with two tools: simulation and repetition. Simulation enables you to generate variation with known properties, where you know the sources of variation and how variables are connected. Then you can experiment to find out how to use statistical models to reveal the underlying properties.

The second tool, repetition, helps you deal with randomness and quantify precision. By repeating the same simulation many times while introducing basic random fluctuations, you can figure out the extent to which randomness can be tamed.

Starting in Lesson sec-accounting-for-variation we have been working with regression models that relate a response variable to one or more explanatory variables. The regression model splits the variation in the response into two parts: the variation of the model values and the residual variation.

The residuals are generally treated as pure noise, with no relationship whatsoever to the explanatory variables. In this Lesson, we will look at some commonly used models for pure noise: probability models. Probability models don’t seek an explanation of specimen-by-specimen variation. Instead, they merely describe which noise values are more common and which less common.

A simple illustration using the Galton data frame will suffice to show what we mean by “more and less common.” As we’ve done before, we’ll build a model of (full-grown) childs height as a function of sex, mother’s height and father’s height, but instead of focusing attention on the model values, we’ll look directly at the residuals. (Figure fig-galton-resids)

Resids <- Galton |>
model_train(height ~ sex + mother + father) |>
model_eval() |>
select(sex, .resid) 
Using training data as input to model_eval().
Resids |> tilde_graph(.resid ~ sex, alpha=0.3, annot="violin")
Warning in tilde_graph(Resids, .resid ~ sex, alpha = 0.3, annot = "violin"): x-axis variable is numerical, so only one violin drawn for all rows.
Perhaps you want to use ntiles() or factor() on that variable? Figure 14.7: Residuals from the model height ~ sex + mother + father trained on the Galton data.

A residual of zero means that the child’s actual height is an exact match for the model value for that child. In contrast, a positive residual means the child is taller than his or her model value; a negative residual means the child is shorter than their model value. Residual values larger than 5 inches or smaller than -5 inches are very uncommon. On the other hand, residuals less than 3 inches or greater than -3 inches are very common.

A probability model for the height residuals assigns a relative probability for each and every possible outcome. Figure fig-galton-prob-model shows one reasonable probability model (in red) for the Galton residuals.

Instructors should note that we are using the term “relative probability” as a unifying term covering both “probability” and “probability density”.

Instructors may find more familiar the mathematical practice of graphing the relative probability as a function output on the vertical axis and the function input on the horizontal axis. The data graphics frame convention and the mathematical practice are, unfortunately, inconsistent. These Lessons prefer the data graphics frame layout.  Figure 14.8: A “normal” probability model for the Galton residuals. The word “normal” refers to the smooth bell-shaped curve drawn in red.

## Three basic probability models

Models for unbounded phenomena

1. Normal–residuals, parameter: deviation
2. Exponential–random intervals: one-over-mean (= rate)
3. Poisson–random counts: mean count

Why earthquake intervals aren’t normal

Operations:

1. RNG
2. relative probability

1. Uniform
2. Beta
3. Binomial

## Back to the simulation

Now that we know some of the facts of adding variances, we can return to the height simulation to try to make it accord better with the patterns seen in the Galton data.

Instructor note: DRAFT DRAFT Of historical importance: disproving Larmarkian inheritance. How could people have thought this given the experience with circumcision over 200 generations?

Add the gene to the simulation

Gene_sim <- datasim_make(
momgene <- rnorm(n, mean=64.1, sd=sqrt(2.7)),
mom <- momgene + rnorm(n, sd=sqrt(2.6)),
childgene <- (momgene + dadgene)/2 + rnorm(n, sd=sqrt(1.25)),
child <- childgene + rnorm(n, sd=sqrt(2.6))
)
Foo2 <- datasim_run(Gene_sim, n=100000)
Foo2 |> model_train(child ~ dad) |> conf_interval()
term                 .lwr        .coef         .upr
------------  -----------  -----------  -----------
(Intercept)    49.0879817   49.5044365   49.9208913
dad             0.2416679    0.2476836    0.2536992
Foo2 |> model_train(child ~ dad + mom) |> conf_interval()
term                 .lwr        .coef         .upr
------------  -----------  -----------  -----------
(Intercept)    32.7034752   33.2460896   33.7887039
mom             0.2484872    0.2541949    0.2599026
Foo2 |> model_train(dad ~ child) |> conf_interval()
term                .lwr        .coef         .upr
------------  ----------  -----------  -----------
(Intercept)    52.340699   52.7404878   53.1402770
child           0.240862    0.2468575    0.2528531
Foo2 |> model_train(dad ~ child + mom) |> conf_interval()
term                 .lwr        .coef         .upr
------------  -----------  -----------  -----------
(Intercept)    55.2887424   55.7789002   56.2690579
child           0.2577325    0.2639253    0.2701181
mom            -0.0712420   -0.0651363   -0.0590306
Foo2 |>
summarize(sd(child), sd(childgene),
sd(mom), sd(momgene))
 sd(child)   sd(childgene)    sd(dad)   sd(dadgene)    sd(mom)   sd(momgene)
----------  --------------  ---------  ------------  ---------  ------------
2.264516         1.59152   2.260737      1.583166   2.296827      1.636654

In Galton data

Foo3 <- Galton |>
mutate(child = ifelse(sex=="F", 1.08*height, height))
lm(child ~ father + mother, data=Foo3) |> conf_interval()
term                 .lwr        .coef         .upr
------------  -----------  -----------  -----------
(Intercept)    13.1445960   18.7015436   24.2584912
father          0.3636723    0.4228831    0.4820940
mother          0.2682681    0.3316683    0.3950685
lm(father ~ child + mother, data=Foo3) |> conf_interval()
term                 .lwr        .coef         .upr
------------  -----------  -----------  -----------
(Intercept)    39.7644128   44.6646935   49.5649742
child           0.3660707    0.4256721    0.4852734
mother         -0.1435670   -0.0765038   -0.0094407
lm(father ~ mother, data=Foo3) |> conf_interval()
term                .lwr        .coef         .upr
------------  ----------  -----------  -----------
(Intercept)    59.688333   64.1780869   68.6678402
mother          0.008862    0.0788767    0.1488913
Foo3 |>
summarize(sd(child), sd(father), sd(mother))
 sd(child)   sd(father)   sd(mother)
----------  -----------  -----------
2.595853     2.470256     2.307025

SIMPLER: parent -> child. and parent <- gene -> child

Parent_sim <- datasim_make(
parent <- rnorm(n, mean=69, sd=sqrt(5.1)),
child <- parent + rnorm(n, 0, sd=sqrt(5.1))
)
Parent_gene_sim <- datasim_make(
gene <- rnorm(n, mean=69, sd=sqrt(2.6)),
parent <- gene + rnorm(n, sd=sqrt(2.5)),
child <- gene + rnorm(n,  sd=sqrt(2.5))
)

Goo1 <- datasim_run(Parent_sim, n=100000)
Goo2 <- datasim_run(Parent_gene_sim, n=100000)
lm(child ~ parent, data = Goo1) |> conf_interval()
term                 .lwr       .coef        .upr
------------  -----------  ----------  ----------
(Intercept)    -0.6080272   -0.179597   0.2488333
parent          0.9964192    1.002625   1.0088304
lm(child ~ parent, data = Goo2) |> conf_interval()
term                 .lwr        .coef         .upr
------------  -----------  -----------  -----------
(Intercept)    33.5630446   33.9309645   34.2988844
parent          0.5029448    0.5082737    0.5136027
Goo1 |> summarize(sd(child), sd(parent))
 sd(child)   sd(parent)
----------  -----------
3.198536     2.257375
Goo2 |> summarize(sd(child), sd(parent), sd(gene))
 sd(child)   sd(parent)   sd(gene)
----------  -----------  ---------
2.256735     2.259507   1.612266

YOU WERE HERE. MOVE ON TO THE NEXT CHAPTER and i. introduce rnorm() with a description of the normal distribution. ii. give the parents some children and compare regression results to Galton iii. Introduce the gene component to the simulation.