Homework #1 for Stat 610
Due Thursday, February 1, in class.

1. Read chapter 6 in Wasserman.

2. Models:
   a. Translate into words the following models for a random sample of real-valued random variables:
      i.   scriptF_1 = { F: F is a cdf }
      ii.  scriptF_2 = { F in scriptF_1: F(x) = int_{-infty}^x f(y) dy }
      iii. scriptF_3 = { F in scriptF_2: f(x) = f(2 mu - x) for some real mu and for all real x <= mu }
      iv.  scriptF_4 = { F in scriptF_2: h(x) = -log[ f(x) ] is convex }
      v.   scriptF_5 = { F in scriptF_2: f(x) = f(x;mu,sigma) = (1/sigma) phi[ (x - mu)/sigma ] for some real mu and sigma > 0, where phi is the density of the standard normal distribution }
      Also, show that scriptF_5 is contained in scriptF_4 and that scriptF_5 is contained in scriptF_3. What is the relationship between scriptF_3 and scriptF_4?
   b. A simple Markov chain model for a vector of binary random variables X_1, X_2, ..., X_n specifies the marginal probability m(1) = P[X_1=1] and conditional probabilities of the form
         P[ X_{i+1} = x | X_1=x_1, ..., X_i=x_i ] = t(x_i, x)
      for transition probabilities t(.,.), where x and each x_i are in {0,1}. Argue that the Markov chain model includes the model in which the X_i's are a random sample of Bernoulli trials.
   c. The univariate logistic regression model may be useful when the data are ordered pairs (X_i, Y_i) that are mutually independent across i = 1, 2, ..., n and Y_i takes only two values. It specifies the conditional distribution of Y_i given X_i = x as Bernoulli[ h(x) ], where
         log[ h(x)/{ 1 - h(x) } ] = alpha + beta x,
      (alpha, beta) is a vector of real labels, and the marginal distribution of X_i is left unspecified. The univariate probit model is similar: Y_i given X_i = x is Bernoulli[ g(x) ], where
         g(x) = int_{-infty}^{ a + b x } phi(u) du,
      (a, b) are real labels, and phi is the standard normal density function. Describe the intersection of the probit and logit models.

3. Identifiability:
   a. The Rasch model is sometimes useful when data are arranged as an m x n matrix of Bernoulli trials (X_ij).
      Specifically, the model asserts mutual independence of all the random variables and, with h_ij = P[ X_ij = 1 ], that
         log[ h_ij/{ 1 - h_ij } ] = mu + alpha_i + beta_j,
      where theta = (mu, alpha_1, ..., alpha_m, beta_1, ..., beta_n) is a vector of real values labeling each distribution in the model. Show that without constraints on the elements of theta, this parameterization is not identifiable. Describe an identifiable parameterization and confirm this property.
   b. Consider the two-normal mixture model described in class for the real-valued random variable X. Densities in this model have the form
         f(x;theta) = p (1/sigma1) phi[ (x-mu1)/sigma1 ] + (1-p) (1/sigma2) phi[ (x-mu2)/sigma2 ],
      where theta = (p, mu1, sigma1, mu2, sigma2) is a vector labeling the density. The model requires p in (0,1) and sigma1, sigma2 > 0. Show that without further constraints on the elements of theta, this parameterization is not identifiable. Describe an identifiable parameterization and confirm this property.

4. Limits. Consider a random sample X_1, X_2, ..., X_n from a normal distribution with mean mu and variance sigma^2, and let bar X_n = (1/n) sum_{i=1}^n X_i be the sample mean. The statistic
         T_n = (1/n) sum_{i=1}^n ( X_i - bar X_n )^2
   estimates sigma^2. Show that T_n -->p sigma^2 as n --> infty.
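A simulation is no substitute for the proof that problem 4 asks for, but it can make the meaning of T_n -->p sigma^2 concrete. The sketch below uses only the Python standard library. The helper names `t_n` and `prob_far` are hypothetical, and the values mu = 1, sigma = 2, and tolerance eps = 0.5 are arbitrary illustrative choices. It estimates P(|T_n - sigma^2| > eps) by Monte Carlo for several sample sizes n. Convergence in probability says this probability should shrink toward 0 as n grows.

```python
import random


def t_n(sample):
    """Divisor-n sample variance: (1/n) * sum (x_i - xbar)^2."""
    n = len(sample)
    xbar = sum(sample) / n
    return sum((x - xbar) ** 2 for x in sample) / n


def prob_far(n, mu, sigma, eps, reps, rng):
    """Monte Carlo estimate of P(|T_n - sigma^2| > eps) for normal samples of size n."""
    hits = 0
    for _ in range(reps):
        # Draw one random sample X_1, ..., X_n from N(mu, sigma^2).
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        if abs(t_n(sample) - sigma ** 2) > eps:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    rng = random.Random(610)  # fixed seed so repeated runs give the same estimates
    for n in (10, 100, 1000):
        print(n, prob_far(n, mu=1.0, sigma=2.0, eps=0.5, reps=200, rng=rng))
```

The seeded `random.Random` instance keeps the estimates reproducible. Changing eps, sigma, or the number of replications `reps` changes the individual estimates, but the trend in n should persist.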