7.2: Problems 7: Variances and covariances of offspring counts

1. (a) A pink-flowered plant is crossed with a white-flowered plant. Consider just one offspring. Write X=1 if the offspring has pink flowers, X=0 if not: P(X=1)=P(X=0)=1/2. What are E(X) and V(X)?
(b) Now consider 20 offspring from the cross. Let Y be the number with pink flowers. What is the probability mass function (pmf) of Y? How could you use this to find E(Y) and V(Y), if you could do the sums?
(c) Instead, think of Y as the sum of the independent random variables Xi, where Xi=1 if offspring i has pink flowers, and 0 if not. Using this, find E(Y) and V(Y).
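
If you want to check that the two routes in (b) and (c) agree, here is a minimal sketch in Python using exact fractions. The names binomial_pmf and mean_and_variance are just illustrative helpers, not part of the problem.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p):
    """Return [P(Y=0), ..., P(Y=n)] for Y ~ B(n, p), as exact fractions."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def mean_and_variance(pmf):
    """Mean and variance of a pmf on 0, 1, ..., len(pmf)-1 by direct summation."""
    mean = sum(k * pk for k, pk in enumerate(pmf))
    second_moment = sum(k * k * pk for k, pk in enumerate(pmf))
    return mean, second_moment - mean**2


n, p = 20, Fraction(1, 2)

# Route (b): do the sums over the pmf of Y.
mean_b, var_b = mean_and_variance(binomial_pmf(n, p))

# Route (c): one indicator has mean p and variance p(1-p);
# add these up over the n independent offspring.
mean_c, var_c = n * p, n * p * (1 - p)

print("pmf sums:      ", mean_b, var_b)
print("indicator sums:", mean_c, var_c)
```

The computer is happy to do the 21-term sums, but route (c) needs no sums at all.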

ALL the other problems here work this same way. Consider the counts of offspring of given types as the sum of the 20 independent 1/0 ("indicator") random variables, one for each offspring.

2. (a) Two plants with wrinkled seeds, but known to be genotype Ww, are crossed. Consider one offspring. Write X=1 if the offspring has wrinkled seeds (P(X=1)=3/4) and X=0 otherwise. What are E(X) and V(X)?
(b) For 20 offspring, let Y be the number with wrinkled seeds. Find E(Y) and V(Y).

3. (a) Two pink-flowered plants are crossed. Consider one offspring. Write X=1 if the offspring has red flowers, and X=0 otherwise (P(X=1)=1/4). Write Z=1 if the offspring has pink flowers, and Z=0 otherwise (P(Z=1)=1/2). Find cov(X,Z) and corr(X,Z).
(b) In a set of 20 offspring from this cross, let Y be the number with red flowers, and T the number with pink flowers. Find cov(Y,T) and corr(Y,T).
(c) There are Y with red flowers, and 20-Y with pink or white flowers. What is cov(Y,20-Y)? What is corr(Y,20-Y)?
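
One way to check your answers to #3(a) and #3(b) is to list the three possible outcomes for one offspring and compute the moments directly. The sketch below assumes P(white)=1/4, so that the three colour probabilities sum to 1; the helper name expect is illustrative only.

```python
from fractions import Fraction
from math import sqrt

# One offspring: (X, Z, probability), where X=1 means red and Z=1 means pink.
# Assumption: the remaining outcome (white, X=0 and Z=0) has probability 1/4.
outcomes = [
    (1, 0, Fraction(1, 4)),  # red
    (0, 1, Fraction(1, 2)),  # pink
    (0, 0, Fraction(1, 4)),  # white
]


def expect(f):
    """Expected value of f(X, Z) over the three outcomes."""
    return sum(pr * f(x, z) for x, z, pr in outcomes)


ex = expect(lambda x, z: x)
ez = expect(lambda x, z: z)
var_x = expect(lambda x, z: x * x) - ex**2
var_z = expect(lambda x, z: z * z) - ez**2
cov_xz = expect(lambda x, z: x * z) - ex * ez
corr_xz = float(cov_xz) / sqrt(float(var_x * var_z))

# For 20 independent offspring, variances and covariances each scale by n,
# so the correlation is unchanged.
n = 20
cov_yt = n * cov_xz
corr_yt = float(cov_yt) / sqrt(float((n * var_x) * (n * var_z)))

print(cov_xz, corr_xz)
print(cov_yt, corr_yt)
```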

4. OK, let's tidy up some notation -- I've run out of letters. In the cross described in the second half of the previous section, consider first one offspring and write

X1 =1, if it has pink flowers, wrinkled seeds, P(X1=1)=0.4
X2 =1, if it has pink flowers, round seeds, P(X2=1)=0.1
X3 =1, if it has white flowers, wrinkled seeds, P(X3=1)=0.1
X4 =1, if it has white flowers, round seeds, P(X4=1)=0.4
In a set of 20 offspring, let
Y1 be the number with pink flowers and wrinkled seeds, and
Y2 be the number with pink flowers and round seeds, and
Y3 be the number with white flowers and wrinkled seeds, and
Y4 be the number with white flowers and round seeds.
In each case, you will find it easier to consider Yi (i=1,2,3,4) as the sum of the 20 indicators Xi for the individual offspring.
(a) Find E(Y3), V(Y3), and cov(Y3,Y4). What is each of these, in words?
(b) Find the covariance of the number of offspring with pink flowers, and the number with white flowers and round seeds.
(c) Find the covariance of the number of offspring with pink flowers, and the number with pink flowers and round seeds.
(d) Find the covariance of the number of offspring with white flowers, and the number with wrinkled seeds. Find also the correlation.
(e) Find the covariance of the number of offspring with white flowers, and the number with round seeds. Find also the correlation.
(f) Is there any count of types of offspring which has zero covariance with some other count? Why or why not?
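
Parts (b) to (e) all ask for the covariance of two counts that are sums of some of the Yi. Here is a hedged sketch of how to check such answers, using the facts V(Yi) = n pi(1-pi) and cov(Yi,Yj) = -n pi pj from #6 and the linearity of covariance. The 0/1 lists (white, wrinkled, and so on) and the function names are illustrative choices, not notation from the problem.

```python
from fractions import Fraction
from math import sqrt

n = 20
# P(X1=1), ..., P(X4=1) from the problem statement.
p = [Fraction(2, 5), Fraction(1, 10), Fraction(1, 10), Fraction(2, 5)]


def cov_counts(a, b):
    """cov(sum_i a[i]*Yi, sum_j b[j]*Yj), expanded term by term."""
    total = Fraction(0)
    for i, pi in enumerate(p):
        for j, pj in enumerate(p):
            # n*pi*(1-pi) on the diagonal, -n*pi*pj off the diagonal.
            sigma_ij = n * ((pi if i == j else 0) - pi * pj)
            total += a[i] * b[j] * sigma_ij
    return total


def corr_counts(a, b):
    """Correlation of the two grouped counts."""
    return float(cov_counts(a, b)) / sqrt(float(cov_counts(a, a) * cov_counts(b, b)))


# Which of the four types go into each grouped count.
pink = [1, 1, 0, 0]
white = [0, 0, 1, 1]
wrinkled = [1, 0, 1, 0]
round_ = [0, 1, 0, 1]

print(cov_counts(white, wrinkled), corr_counts(white, wrinkled))
print(cov_counts(white, round_), corr_counts(white, round_))
```

Calling cov_counts(pink, [0, 0, 0, 1]) or cov_counts(pink, [0, 1, 0, 0]) covers parts (b) and (c) in the same way.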

5. Returning to #3, consider again the 20 offspring plants, each with P(red)=P(white)=1/4, P(pink)=1/2. Let Y=number red, T=number pink, and W = number white (Y+T+W=20)
(a) What is the probability P(Y=y, T=t, W=w)? What is the name of this probability mass function?
(b) What is the pmf P(Y=y)? What is E(Y)? What is V(Y)?
(c) Given that a plant is not white, what is the probability it is red?
Given there are 6 white plants, how many red ones do you expect?
(d) Given there are 6 white plants, what is the variance of the number of red plants? Is it larger or smaller than the variance of Y found in part (b)? Is this what you would expect?
(e) Given there are 6 white plants, what is the covariance of the number of red ones and the number of pink ones? Compare this with your answers to #3(b) and #3(c).
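
To check parts (c) to (e) by brute force, you can condition the joint pmf on W=6 and do the sums exactly. This is only a sketch: trinomial_pmf is an illustrative helper that writes out the three-type pmf from part (a), and the dictionary cond is just one convenient way to hold the conditional distribution.

```python
from fractions import Fraction
from math import factorial

n = 20
p_red, p_pink, p_white = Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)


def trinomial_pmf(y, t, w):
    """P(Y=y, T=t, W=w) for y + t + w = n."""
    coef = factorial(n) // (factorial(y) * factorial(t) * factorial(w))
    return coef * p_red**y * p_pink**t * p_white**w


w = 6
# All (y, t) pairs compatible with W = 6, i.e. y + t = 14.
joint = {(y, n - w - y): trinomial_pmf(y, n - w - y, w) for y in range(n - w + 1)}
p_w = sum(joint.values())  # this is P(W = 6)
cond = {yt: pr / p_w for yt, pr in joint.items()}  # P(Y=y, T=t | W=6)

e_y = sum(y * pr for (y, t), pr in cond.items())
e_t = sum(t * pr for (y, t), pr in cond.items())
var_y = sum(y * y * pr for (y, t), pr in cond.items()) - e_y**2
cov_yt = sum(y * t * pr for (y, t), pr in cond.items()) - e_y * e_t

print(e_y, var_y, cov_yt)
```

Given W=6, T is exactly 14-Y, which is why the conditional covariance is so closely tied to the conditional variance.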

6. Here are the general formulae for means and variances in a binomial distribution, and for covariances in a multinomial distribution.

(i) Suppose X is B(n,p), so P(X=k) = [n!/(k! (n-k)!)] p^k (1-p)^(n-k), k=0,1,2,...,n.
(a) For the case n=1, show E(X)=p, V(X)=p(1-p)
Then sum up the means and variances for the n independent trials, as in the other examples, to get E(X)=np, V(X)=np(1-p)
(b) By summing k*P(X=k), show E(X)=np.
This method is no better than the other, and is a lot messier.
(c) By summing k*(k-1)*P(X=k), show E(X(X-1)) = n(n-1)p^2, and hence show V(X)=np(1-p).
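
Before attempting the algebra in (b) and (c), it can help to confirm the target identities on a few cases. The sketch below is a check, not a proof; the function names and the particular (n, p) values tried are arbitrary illustrative choices.

```python
from fractions import Fraction
from math import comb


def pmf(n, p, k):
    """P(X=k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def identities_hold(n, p):
    """Check E(X)=np, E(X(X-1))=n(n-1)p^2 and V(X)=np(1-p) by direct summation."""
    mean = sum(k * pmf(n, p, k) for k in range(n + 1))
    fact2 = sum(k * (k - 1) * pmf(n, p, k) for k in range(n + 1))
    # V(X) = E(X(X-1)) + E(X) - E(X)^2
    var = fact2 + mean - mean**2
    return mean == n * p and fact2 == n * (n - 1) * p**2 and var == n * p * (1 - p)


cases = [(n, Fraction(a, b)) for n in (1, 5, 20) for a, b in ((1, 2), (3, 4), (1, 10))]
all_ok = all(identities_hold(n, p) for n, p in cases)
print(all_ok)
```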

(ii) Now suppose (X1,..., Xr) is multinomial Mn(n, (p1,...,pr)). This is the probability distribution we get when we have n independent objects, but they have r different possible types, and P(object is type j)=pj, j=1,...,r. We could use the pmf for this one also, but that gets really messy. Instead, we consider one object at a time, and sum.

(a) Consider just the first type. Then X1 is just B(n,p1). Why is this? So now you can write down E(X1) and V(X1). Also E(Xj) and V(Xj).
(b) Consider just one object (n=1). Show E(X1X2)=0, and cov(X1, X2) = -p1p2. What is cov(Xj, Xi) for i ≠ j?
(c) Now add up over n independent objects to show that cov(Xj, Xi) = -n pipj for i ≠ j.
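
For small n you can also verify the result in (c) without any algebra, by listing every possible sequence of object types. The sketch below does this for an arbitrary illustrative case (n=4 objects, r=3 types); multinomial_moments is a made-up helper name, and the enumeration grows as r^n, so this is only sensible for tiny examples.

```python
from fractions import Fraction
from itertools import product


def multinomial_moments(n, probs):
    """Enumerate all r**n sequences of types; return exact means and covariances."""
    r = len(probs)
    mean = [Fraction(0)] * r
    second = [[Fraction(0)] * r for _ in range(r)]
    for types in product(range(r), repeat=n):
        # Probability of this particular sequence of n independent objects.
        pr = Fraction(1)
        for t in types:
            pr *= probs[t]
        counts = [types.count(j) for j in range(r)]
        for i in range(r):
            mean[i] += pr * counts[i]
            for j in range(r):
                second[i][j] += pr * counts[i] * counts[j]
    cov = [[second[i][j] - mean[i] * mean[j] for j in range(r)] for i in range(r)]
    return mean, cov


n = 4
probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
mean, cov = multinomial_moments(n, probs)

print(mean)
print(cov[0][1], -n * probs[0] * probs[1])
```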

(iii) Now suppose I know Xr=k. How does this affect the probability distribution of (X1,..., Xr-1)?
(a) Well, chucking away the k objects I know are type r, now I have n-k independent objects, and P(type j) = p*j = pj/(1-pr) (why?)
(b) So Xj is B(n-k, p*j) (why?). So what is E(Xj | Xr = k) ? And what is V(Xj | Xr = k) ?