dag02sample = sample(dag02, size=25)
dag02sample %>% select(y)
# A tibble: 25 × 1
y
<dbl>
1 2.29
2 8.90
3 6.56
4 1.29
5 6.97
6 8.76
7 8.23
8 0.759
9 5.18
10 10.2
# … with 15 more rows
Jane Doe
22.1 Describe the logical origin of sampling variation as the variation between multiple samples from the same source.
22.2 Recognize the several formats in which we describe sampling variation—sampling variance, standard error, margin of error, confidence interval—and show how they are related.
22.3 Using repeated sampling trials, observe how sampling variance scales with sample size \(n\).
Using `dag02`, obtain a sample of size 25 and show the values of `y`.
Compute the mean of those 25 values of `y` in two different but entirely equivalent ways: (1) use data wrangling; (2) construct a model `y ~ 1` and report the intercept coefficient. Show that these give the same answer.
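As a minimal sketch of the two approaches, the base-R code below uses `mean()` as a stand-in for the data-wrangling step and `lm(y ~ 1, ...)` for the model. The function `sim_dag02()` is a hypothetical substitute for `sample(dag02, size = 25)`; its formula for `y` is an assumption chosen to resemble the values shown, not the definition of `dag02`.

```r
set.seed(101)

# Hypothetical stand-in for sample(dag02, size = n).
# ASSUMPTION: the formula for y is illustrative only.
sim_dag02 <- function(n) {
  x <- rnorm(n)
  a <- rnorm(n)
  data.frame(x = x, a = a, y = 3 * x - 1.5 * a + 5 + rnorm(n))
}

samp <- sim_dag02(25)

# Method 1: compute the mean of y directly from the data.
mean_wrangle <- mean(samp$y)

# Method 2: fit the intercept-only model and read off the coefficient.
mean_model <- unname(coef(lm(y ~ 1, data = samp))[1])

# The two numbers agree up to floating-point rounding.
print(c(wrangling = mean_wrangle, model = mean_model))
print(all.equal(mean_wrangle, mean_model))
```

The agreement is not a coincidence: with no explanatory variables, the least-squares intercept is the value that minimizes the sum of squared deviations from `y`, which is the sample mean.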
Create a new chunk that repeats the generation of a sample from `dag02` and the two methods for calculating the mean of the `y` values. Run the new chunk and observe that the calculated value of the mean differs somewhat from the one you found in Part 1. Run the chunk over and over again; the mean value will differ each time.
Task 2.1. Each time you run the chunk, you are performing a new sampling trial. Run a dozen or so trials, observing the calculated value of the mean of `y` to get a sense of how much it varies from trial to trial. Then summarize your observations by giving a rough interval for the range of the mean of `y` across the trials.
We are going to automate the process of performing sampling trials so that we can run hundreds of them.
Using the `do` operator, calculate the sampling variance for a set of trials from `dag02`. The following code chunk shows how to run 500 trials, in each of which the mean of `y` is calculated using the `y ~ 1` method and the intercept coefficient is reported. The results are collected into a data frame named `dag02trials25`.
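A minimal sketch of such a set of trials is given below. It is written in base R and uses `replicate()` in place of the `do` operator, so it is an approximation of the idea rather than the course's own chunk. The helper names `sim_dag02()` and `one_trial()` and the column name `Intercept` are hypothetical, and the formula inside `sim_dag02()` is an assumed stand-in for `dag02`.

```r
set.seed(202)

# Hypothetical stand-in for sample(dag02, size = n).
# ASSUMPTION: the formula for y is illustrative only.
sim_dag02 <- function(n) {
  x <- rnorm(n)
  a <- rnorm(n)
  data.frame(x = x, a = a, y = 3 * x - 1.5 * a + 5 + rnorm(n))
}

# One sampling trial: draw a fresh sample of size n, fit y ~ 1,
# and return the intercept coefficient (the sample mean of y).
one_trial <- function(n) {
  unname(coef(lm(y ~ 1, data = sim_dag02(n)))[1])
}

# 500 trials at n = 25, collected into a one-column data frame.
dag02trials25 <- data.frame(Intercept = replicate(500, one_trial(25)))

print(head(dag02trials25))
```

Each row of `dag02trials25` is the result of one trial, so the spread of the `Intercept` column is a direct picture of sampling variation at this sample size.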
Task 2.2. Run the chunk above to create `dag02trials25`. Then use data wrangling commands to compute three summaries of the trials: (i) the mean of the coefficient across the trials; (ii) the variance of the coefficient across the trials; (iii) the standard deviation of the coefficient across the trials.
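The three summaries can be sketched in base R as follows. The trial values here come from a hypothetical simulator `sim_dag02()`, whose formula is an assumption rather than the definition of `dag02`, and the sample mean is used directly in place of the `y ~ 1` intercept, to which it is equal.

```r
set.seed(303)

# Hypothetical stand-in for sample(dag02, size = n).
# ASSUMPTION: the formula for y is illustrative only.
sim_dag02 <- function(n) {
  x <- rnorm(n)
  a <- rnorm(n)
  data.frame(x = x, a = a, y = 3 * x - 1.5 * a + 5 + rnorm(n))
}

# 500 trials, each recording the mean of y from a new sample of size 25.
coefs <- replicate(500, mean(sim_dag02(25)$y))

# Summaries across trials. The variance is the sampling variance;
# the standard deviation of the trial results is the standard error.
trial_summary <- c(
  mean     = mean(coefs),
  variance = var(coefs),
  std_dev  = sd(coefs)
)

print(round(trial_summary, 3))
```

The standard deviation is simply the square root of the variance, which is why the two right-hand columns of a sampling-variance table carry the same information in different units.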
Task 2.3. Repeat (2) with four different sample sizes (try 50, 100, 200, and 400). Fill in the table below. What do you notice about the standard error as sample size increases?
Sample size | Sampling variance | Standard error |
---|---|---|
n=25 | 0.439 | 0.663 |
n=50 | 0.253 | 0.503 |
n=100 | 0.129 | 0.359 |
n=200 | 0.059 | 0.243 |
n=400 | 0.032 | 0.178 |
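
One way to see the pattern in the table is to check two relationships numerically: the standard error should be the square root of the sampling variance, and the sampling variance should shrink roughly in proportion to \(1/n\). The sketch below uses only the values from the table; the data frame and column names are illustrative.

```r
# Values copied from the table above.
results <- data.frame(
  n                 = c(25, 50, 100, 200, 400),
  sampling_variance = c(0.439, 0.253, 0.129, 0.059, 0.032),
  standard_error    = c(0.663, 0.503, 0.359, 0.243, 0.178)
)

# Check 1: standard error is (up to rounding) sqrt(sampling variance).
results$sqrt_variance <- round(sqrt(results$sampling_variance), 3)

# Check 2: if variance is proportional to 1/n, then n * variance
# should be roughly constant from row to row.
results$n_times_variance <- results$n * results$sampling_variance

# Check 3: each doubling of n should multiply the standard error
# by about 1 / sqrt(2), i.e. roughly 0.71.
k <- nrow(results)
results$se_ratio <- c(NA, results$standard_error[-1] / results$standard_error[-k])

print(results)
```

In these results `n * variance` stays between roughly 11 and 13 while `n` grows sixteen-fold, and each doubling of the sample size cuts the standard error to about 70% of its previous value. Halving the standard error therefore takes about four times as much data.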