QUIZ 7: (questions are in italics).

This quiz is about the concepts of the (empirical) sampling distribution and the confidence interval.

Part I: The (empirical) sampling distribution

As we have said, the notion of the (empirical) sampling distribution can pertain to anything one can compute from a sample, e.g., the sample mean, sample median, sample minimum, sample maximum, or a sample percentile. To address the last one,

a) Write code to build the (empirical) sampling distribution of the 90th sample percentile (or 0.9 sample quantile) of the sample. Let the number of trials be 10,000, and the sample size be 100. For the population, use standard normal, i.e., use rnorm(), NOT sample(), to take samples.
Important: start your code with set.seed(1).

   set.seed(1)
   n.trial = 10000
   sample.size = 100
   sample.stat = numeric(n.trial)
   for (trial in 1:n.trial) {
     samp = rnorm(sample.size, 0, 1)                    # rnorm(), not sample()
     sample.stat[trial] = quantile(samp, probs = 0.9)   # important part
   }
   hist(sample.stat)                             # FYI, mean = 1.25, sd = 0.166
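
As a rough cross-check of the FYI values, the simulated mean and sd can be compared with large-sample theory, under which the sample p-quantile is approximately normal with mean equal to the population quantile q and sd sqrt(p(1-p)/n)/f(q), where f is the population density. The sketch below is not part of the quiz solution; the names p, q.true, and asym.sd are introduced here only for illustration, and the theory is an approximation at n = 100.

```r
set.seed(1)
n.trial = 10000
sample.size = 100
p = 0.9

# Same simulation as in part a, written with replicate() instead of a for loop.
sample.stat = replicate(n.trial, quantile(rnorm(sample.size), probs = p))

# Large-sample approximation for a sample quantile (assumption: n = 100 is "large").
q.true  = qnorm(p)                                          # population 0.9 quantile
asym.sd = sqrt(p * (1 - p) / sample.size) / dnorm(q.true)   # asymptotic sd

c(sim.mean = mean(sample.stat), theory.mean = q.true)
c(sim.sd = sd(sample.stat), theory.sd = asym.sd)
```

The theoretical values are about 1.28 and 0.171. They should be close to, but not identical with, the simulated ones, because the theory is only asymptotic.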

b) Write code to test the normality of the histogram in part a.

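   qqnorm(sample.stat)    # An approximately straight line suggests normality.
   qqline(sample.stat)    # Reference line through the quartiles, to judge straightness.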
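
If a numerical summary of the Q-Q plot is wanted, one option (not asked for in the quiz) is the correlation between the theoretical and sample quantiles, which is near 1 when the points fall close to a straight line. This is a minimal sketch; the names qq and r are introduced here, and the number is a complement to looking at the plot, not a replacement for it.

```r
set.seed(1)
# Rebuild the sampling distribution from part a so the block runs on its own.
sample.stat = replicate(10000, quantile(rnorm(100), probs = 0.9))

# qqnorm() returns the plotted coordinates: x = theoretical, y = sample quantiles.
qq = qqnorm(sample.stat, plot.it = FALSE)
r = cor(qq$x, qq$y)    # close to 1 when the Q-Q plot is nearly straight
r
```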

Part II: Confidence Interval

In a hw you just turned in, you found/built the CI for the b parameter of the Unif(0,b) distribution. Let's confirm that the formula given in the soln (posted on the course website) has the correct coverage. To that end,

c) Write code to take 1000 samples of size 100 from Unif(0,2), and count/report the number of times the 95% CI for b covers b. Important: start your code with set.seed(1).
 
   set.seed(1)
   b = 2
   n.trial = 1000
   sample.size = 100
   CI = matrix(nrow=n.trial, ncol=2)
   for (i in 1:n.trial) {
     x = runif(sample.size, 0, b)
     # 95% CI for b; consistent with inverting the normal approximation
     # mean(x) ~ N(b/2, b^2/(12*sample.size)).
     lower = 2*mean(x)/(1+qnorm(.975)/sqrt(3*sample.size))
     upper = 2*mean(x)/(1-qnorm(.975)/sqrt(3*sample.size))
     CI[i,] = c(lower, upper)
   }
   cnt = 0
   for (i in 1:n.trial) {
     if (CI[i,1] <= b && CI[i,2] > b) {   # unimportant where "=" is.
       cnt = cnt+1
     }
   }
   cnt    # 948. If all is good, the count should be around 950.
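
To make "around 950" concrete: if the true coverage is 0.95, the count over 1000 independent trials is Binomial(1000, 0.95), so it has its own Monte Carlo variability. The sketch below, which is not part of the quiz solution, compares the reported count with that variability; the names expected, mc.sd, and z are introduced here for illustration.

```r
n.trial = 1000
cnt = 948                                # count reported in part c

expected = 0.95 * n.trial                # 950 under exact 95% coverage
mc.sd = sqrt(n.trial * 0.95 * 0.05)      # binomial sd of the count, about 6.9
z = (cnt - expected) / mc.sd             # about -0.29

c(expected = expected, mc.sd = mc.sd, z = z)
```

A count within roughly two binomial sds of 950 (about 936 to 964) is consistent with 95% coverage, and 948 is well inside that range.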


Moral:
Part I: Many things have sampling distributions that are (at least, approximately) Normal; even something like the sample percentile turns out to be one of them.
Part II: One can always confirm by simulation that the formulas we derive for CIs have the correct coverage.