Homework 2. Statistics 771, Spring 09
Posted online Thursday Feb 5/09
Due in class Wednesday Feb 11/09.

1. Consider a linear regression problem which involves, in addition to the n x p design matrix X and the n x 1 response vector Y, an n x n diagonal weight matrix W (i.e. w_ii > 0; w_ij = 0 if i != j). The weighted least squares estimator beta.hat is the solution of the linear system:

(X^t W^(-1) X) beta.hat = X^t W^(-1) Y    **

a. Explain how this set of equations could arise as the maximum likelihood estimating equations for a normal model in which the response variances possibly change across the experimental units.

b. Explain how the QR decomposition could be used to solve **.

2. Given a response vector Y and a design matrix X as in the usual multiple linear regression problem, the ridge regression estimate beta.hat.ridge minimizes (in beta)

(Y - X beta)^t (Y - X beta) + lambda beta^t beta

for a fixed positive parameter lambda controlling the complexity of the solution.

a. Show that beta.hat.ridge = (X^t X + lambda I)^(-1) X^t Y, where I is the identity matrix.

b. Consider the singular value decomposition X = U D V^t. Show that the predicted values X beta.hat.ridge can be expressed as a certain linear combination of columns of U.

3. Consider the linear system A x = b for a p x p upper triangular matrix A, where b is a given p x 1 vector and x is to be solved for. Suppose also that a_ii > 0 for all i. Show how back-solving is used to identify x. Show that A^(-1) is also upper triangular.

4. The following problem arises in a study of mutation frequency among certain immune system cells. Random variables X_1, X_2, ..., X_m, X_{m+1}, ..., X_{m+n} are mutually independent and Poisson distributed, with the first m having Poisson mean theta and the next n having Poisson mean (k theta), for a known positive constant k and an unknown positive parameter theta.

a. If we could observe all the X_i's as x_i's, what would be the maximum likelihood estimator of theta?

b. The X_i's are actually unobservable counts. Instead, we can measure Bernoulli trials Y_i = 1[ X_i > 0 ]. Derive an expression for the log-likelihood l(theta) having observed the y_i's. Plot the log-likelihood function for theta from data in which m = n = 384, k = 2, s = sum_{i=1}^{m} y_i = 100, and t = sum_{i=m+1}^{m+n} y_i = 30. Find the MLE theta.hat.
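
As a starting point for the numerical part of 4b, here is a minimal Python sketch, not a reference solution. It assumes the log-likelihood takes the form l(theta) = s log(1 - exp(-theta)) - (m - s) theta + t log(1 - exp(-k theta)) - (n - t) k theta, which follows from P(Y_i = 1) = 1 - exp(-theta) for i <= m and 1 - exp(-k theta) for i > m. The function names (loglik, score, mle), the bisection bracket, and the plotting grid are illustrative choices, not part of the assignment.

```python
import math


def loglik(theta, m, n, k, s, t):
    """Log-likelihood of theta from the indicators y_i (assumed form, see lead-in)."""
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for small x.
    return (s * math.log(-math.expm1(-theta)) - (m - s) * theta
            + t * math.log(-math.expm1(-k * theta)) - (n - t) * k * theta)


def score(theta, m, n, k, s, t):
    """Derivative of loglik with respect to theta."""
    # d/dtheta log(1 - exp(-c*theta)) = c / (exp(c*theta) - 1)
    return (s / math.expm1(theta) - (m - s)
            + t * k / math.expm1(k * theta) - (n - t) * k)


def mle(m, n, k, s, t, lo=1e-8, hi=10.0, tol=1e-12):
    """Bisection on the score, which is strictly decreasing in theta.

    Assumes the score is positive at lo and negative at hi, which holds
    for the data in 4b; other data may need a different bracket.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, m, n, k, s, t) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Data from problem 4b.
m, n, k, s, t = 384, 384, 2, 100, 30

theta_hat = mle(m, n, k, s, t)

# (theta, l(theta)) pairs on an arbitrary grid, ready to hand to a plotting tool.
curve = [(0.02 + 0.001 * i, loglik(0.02 + 0.001 * i, m, n, k, s, t))
         for i in range(281)]

print("theta.hat is approximately", theta_hat)
```

The pairs in curve can be passed to any plotting library to draw l(theta). Bisection on the score is used here only because each term of the assumed log-likelihood is concave in theta, so the score has a single root; a general-purpose optimizer would serve equally well.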