```r
post <- function(D) {
  Lno(85300, D) * Lno(65200, D) *
    Laccident(13495, D) * Lno(131200, D) *
    # similarly for each of the remaining cars ...
    prior(D)
}
```
Block 6 Models
Safety of self-driving cars
One of the goals for self-driving cars is to reduce road accidents, especially fatal accidents. People are understandably skeptical that an automated system can cope with all the varying conditions of traffic, visibility, road damage, and so on, without the benefit of human judgment or experience. For this reason, society will need to accumulate substantial evidence that self-driving cars are safer before accepting any claims to that effect.
This project is about how to accumulate such evidence.
Based on experience with tens of millions of regular cars driving hundreds of billions of total miles, suppose we decide the accident probability is approximately 10% per 20,000 miles. Note that this does not mean an accident is certain to occur in the first 200,000 miles of driving. The probability that, for a representative car, an accident occurs at m miles is modeled with an exponential distribution.
Take note that we have parameterized the exponential with D.
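Under an exponential model the accident risk per mile is constant, so a 10% chance per 20,000 miles compounds over successive stretches rather than adding up. Below is a minimal R sketch of that arithmetic. It assumes this constant-risk reading, and the variable names are hypothetical:

```r
# ASSUMPTION: constant accident risk per mile (exponential model), so the
# 10% risk per 20,000 miles applies independently to each successive
# 20,000-mile stretch of driving.
p_no_accident_per_stretch <- 0.9
n_stretches <- 200000 / 20000            # ten stretches in 200,000 miles

p_no_accident_200k <- p_no_accident_per_stretch^n_stretches
p_accident_200k    <- 1 - p_no_accident_200k

print(c(no_accident = p_no_accident_200k, accident = p_accident_200k))
```

The chance of at least one accident in 200,000 miles is 1 - 0.9^10, about 0.65, not 1.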
Part A. Estimate the value of D for ordinary cars.
Recognize that the probability of an accident “at or before” a given mileage corresponds to a cumulative probability function: the integral of the exponential probability density from 0 to m miles.
Construct the cumulative probability function in symbolic form. It will be a function of both m and D. (Hint: its value at m = 0 must be zero, and it will have a roughly sigmoid increase.) Set its value at m = 20,000 miles equal to 0.1, per our assumption for ordinary cars of a 10% risk in 20,000 miles of driving. Solve this to find D.
Show your work.
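Here is a minimal sketch of Part A in R. It assumes the exponential is parameterized by its mean D (in miles), so that the cumulative probability of an accident at or before m miles is 1 - exp(-m/D). `P_cum` is a hypothetical name. The closed-form solution is checked against base R's `uniroot()`:

```r
# ASSUMPTION: the exponential is parameterized by its mean D (miles), so the
# cumulative probability of an accident at or before m miles is
# 1 - exp(-m / D).
P_cum <- function(m, D) 1 - exp(-m / D)   # hypothetical name

# Closed-form solution of 1 - exp(-20000 / D) = 0.1
D_closed <- -20000 / log(0.9)

# Numerical check: root of P_cum(20000, D) - 0.1 on a bracket containing it
D_numeric <- uniroot(function(D) P_cum(20000, D) - 0.1,
                     lower = 1e3, upper = 1e7, tol = 1e-6)$root

print(c(closed_form = D_closed, numeric = D_numeric))
```

Under this assumption D = -20000 / log(0.9), roughly 190,000 miles.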
To start the calculations, we will need a prior relative density function for D.
Part B: Implement the function prior(D).
- Graph out prior(D) on the domain 2,000 to 2,000,000.
- Referring to your graph, write down your intuition about whether this prior seems to favor self-driving cars being safer than ordinary cars or not.
- Now remember that prior(D) shows relative density. So compare the total probability that D is greater than the ordinary-car value from Part A to the probability that it is smaller. Does this indicate that the prior is biased toward the assumption that self-driving cars will be no safer than ordinary cars? Explain your reasoning.
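The last bullet turns on the fact that a relative density has to be integrated, not eyeballed. The sketch below shows the mechanics with base R's `integrate()`. The project's own prior is not reproduced here, so the sketch uses a hypothetical stand-in, prior_example(D) = 1/D, which is equivalent to a uniform density on log(D). It also assumes the ordinary-car value D0 = -20000 / log(0.9) from a mean-parameterized exponential. Substitute your prior(D) and your Part A result:

```r
# HYPOTHETICAL stand-in prior (not the project's prior): relative density
# proportional to 1/D, i.e. uniform in log(D).
prior_example <- function(D) 1 / D

D_lo <- 2000        # lower end of the domain (miles)
D_hi <- 2000000     # upper end of the domain (miles)

# Ordinary-car value of D, ASSUMING the exponential is parameterized by its
# mean, so that 1 - exp(-20000 / D0) = 0.1
D0 <- -20000 / log(0.9)

# A relative density only gives probabilities after integration: normalize
# by the total area over the domain, then compare areas on each side of D0.
total   <- integrate(prior_example, D_lo, D_hi)$value
p_below <- integrate(prior_example, D_lo, D0)$value / total
p_above <- integrate(prior_example, D0, D_hi)$value / total

print(c(below = p_below, above = p_above))
```

For this stand-in you can check the result by hand, since p_above = log(D_hi / D0) / log(D_hi / D_lo). To draw the graph, `curve(prior_example(x), from = D_lo, to = D_hi, log = "x")` is one option.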
Now imagine that you work for a safety organization collecting accident information. To assess the safety of self-driving cars, you are monitoring a fleet of 100 self-driving cars. Each year you get a report giving the odometer reading of each car or, if the car has been in an accident that year, the odometer reading at the time of the accident. The data might look like the following table (entirely fictitious; it should not be interpreted as representing real-world self-driving cars):
| Car | Status | Mileage |
|---|---|---|
| 1 | on road | 85,300 |
| 2 | on road | 65,200 |
| 3 | accident | 13,495 |
| 4 | on road | 131,200 |
| 5 | on road | 96,000 |
| 6 | accident | 54,682 |
| 7 | accident | 105,200 |
| 8 | on road | 53,900 |
| 9 | accident | 86,000 |
| 10 | on road | 94,300 |
| ... | ... | ... |
| 100 | on road | 107,200 |
The first two cars in the fleet have accumulated 85,300 and 65,200 accident-free miles respectively. The third car was in an accident at 13,495 miles and is no longer on the road.
In response to the data, you issue a yearly report in the form of a posterior distribution on D.
To update the original prior into a posterior, you need to construct the likelihood functions. There are two functions, because there are two different kinds of observations on each car:
- If the car was in an accident, then you want the likelihood of D given the mileage at which the accident happened. The likelihood function for a car that had an accident at m miles is the probability model itself, evaluated at m and treated as a function of D.
- If the car has not been in an accident, then you want the likelihood of D given the number of miles traveled.
The second likelihood function is based on the probability that the car has not had an accident in its m miles of driving, which is one minus the cumulative probability of an accident at or before m miles.
Part C. Implement the two likelihood functions in R as Laccident(m, D) and Lno(m, D).
Plot out the two functions and explain why their shape makes sense. Show the code that implements the two functions.
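Here is one possible implementation. It assumes the exponential is parameterized by its mean D (miles). Under that assumption the density of an accident at m miles is exp(-m/D)/D, and the probability of no accident in m miles is exp(-m/D):

```r
# ASSUMPTION: the exponential model is parameterized by its mean D (miles),
# so the density of an accident at m miles is exp(-m / D) / D.

# Likelihood of D for a car whose accident happened at m miles:
# the probability density at m, viewed as a function of D
Laccident <- function(m, D) exp(-m / D) / D

# Likelihood of D for a car that has gone m miles with no accident:
# 1 minus the cumulative probability 1 - exp(-m / D)
Lno <- function(m, D) exp(-m / D)

# Evaluate at a few candidate values of D for car 3 (accident at
# 13,495 miles) and car 1 (85,300 accident-free miles)
D_values <- c(5000, 13495, 50000, 500000)
print(Laccident(13495, D_values))
print(Lno(85300, D_values))
```

Viewed as a function of D, Laccident(m, D) rises and then falls. It peaks at D = m, since setting the derivative of -m/D - log(D) to zero gives D = m. Lno(m, D) instead increases steadily toward 1, because many accident-free miles are most plausible when accidents are rare. Either function can be plotted with, for example, `curve(Lno(85300, x), from = 2000, to = 2e6, log = "x")`.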
Now you are in a position to update the prior to create the posterior probability density on D.
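Part D. To keep things simple, we will use just the first 10 cars in the fleet report. The unnormalized posterior function is the prior multiplied by one likelihood factor per car: Laccident() for each car that had an accident and Lno() for each car still on the road.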
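Writing out ten factors by hand is error-prone, so the sketch below loops over the table instead. It assumes mean-parameterized exponential likelihoods. Because the project's prior is not reproduced here, it also uses a hypothetical flat stand-in prior. `post_flat` and the other names are illustrative:

```r
# Sketch: unnormalized posterior on D from the first 10 cars in the table.
# ASSUMPTIONS: exponential model parameterized by its mean D (miles), and a
# HYPOTHETICAL flat stand-in prior (replace prior_flat with your prior(D)).
Laccident  <- function(m, D) exp(-m / D) / D   # accident at m miles
Lno        <- function(m, D) exp(-m / D)       # no accident in m miles
prior_flat <- function(D) 1                    # hypothetical stand-in

# Cars 1-10, copied from the fleet table
mileage  <- c(85300, 65200, 13495, 131200, 96000,
              54682, 105200, 53900, 86000, 94300)
accident <- c(FALSE, FALSE, TRUE, FALSE, FALSE,
              TRUE, TRUE, FALSE, TRUE, FALSE)

# One likelihood factor per car, multiplied together, times the prior.
# Takes a single value of D.
post_flat <- function(D) {
  lik <- ifelse(accident, Laccident(mileage, D), Lno(mileage, D))
  prod(lik) * prior_flat(D)
}

# Evaluate on a logarithmically spaced grid over the Part B domain
D_grid    <- 10^seq(log10(2000), log10(2e6), length.out = 2000)
post_vals <- vapply(D_grid, post_flat, numeric(1))
D_mode    <- D_grid[which.max(post_vals)]
print(D_mode)
```

With a flat prior the product simplifies to D^(-k) exp(-T/D), where k is the number of accidents and T is the total mileage. The mode should therefore land near T/k = 785,277 / 4, about 196,000 miles. Swap in your own prior(D) before answering the question below.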
Based on the data from the first 10 cars, do you think that the self-driving cars are safer than the ordinary cars? Recall that