Here is how to use R to do a linear regression analysis on our in-class example of 6 data points (x_i, y_i) = (1,1), (2,2), (3,2), (3,3), (4,3), (5,4).

Notes:

1. > is the R prompt.
2. c() combines (concatenates) its arguments into a vector.
3. x <- c(4,1,2) puts the data 4, 1, 2 into a vector called x.
4. R ignores everything after the # symbol, so comments for the reader can follow #.
5. It is possible to specify other null hypotheses for alpha (the intercept) and beta (the slope). By default, as in the example below, R tests the null hypothesis beta = 0 against the two-sided alternative beta not equal to 0, and similarly for alpha.
6. R also returns the P-value for each hypothesis. In this situation it is P(|T| > |t|), where T has a t distribution with n - 2 = 4 degrees of freedom under the null hypothesis and t is the observed value of T. A value of t close to 0 is consistent with the null hypothesis; the further t is from 0, the more evidence you have in favor of the alternative hypothesis (and the smaller the P-value will be). You can't calculate the exact P-value yourself, though, because the textbook doesn't contain a full table of probabilities for the t distribution.
7. In the output below, the F-statistic tests the hypothesis that the observed variable (Y) is unaffected by the value of the control variable (X), in other words, that the regression line is horizontal. With just one control variable the F-test is redundant: a horizontal regression line means beta = 0, and we already have a t-test for that (in fact F = t^2, so the two P-values are identical). In multiple regression (several control variables), however, the F-test is relevant.

After you start R, you get something like the following, then a prompt symbol. Type in each line as below and hit Return after each line.

    R : Copyright 2002, The R Development Core Team
    Version 1.6.1  (2002-11-01)

    R is free software and comes with ABSOLUTELY NO WARRANTY.
    You are welcome to redistribute it under certain conditions.
    Type `license()' or `licence()' for distribution details.

    R is a collaborative project with many contributors.
    Type `contributors()' for more information.

    Type `demo()' for some demos, `help()' for on-line help, or
    `help.start()' for a HTML browser interface to help.
    Type `q()' to quit R.

    > x <- c(1,2,3,3,4,5)              # enter x-values
    > x                                # display x
    [1] 1 2 3 3 4 5
    > y <- c(1,2,2,3,3,4)              # enter corresponding y-values
    > inputdata <- data.frame(x, y)    # form an R object called a "data frame"
    > inputdata                        # display inputdata; the first column just numbers the data points
      x y
    1 1 1
    2 2 2
    3 3 2
    4 3 3
    5 4 3
    6 5 4
    > samplemodel <- lm(y ~ x, data = inputdata)   # linear regression on this data frame
    > # lm stands for "linear model"; y ~ x means x is the control variable and
    > # y is the observed variable; the results are stored in samplemodel
    > summary(samplemodel)             # display results

    Call:
    lm(formula = y ~ x, data = inputdata)

    Residuals:
       1    2    3    4    5    6
    -0.1  0.2 -0.5  0.5 -0.2  0.1

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   0.4000     0.4000   1.000  0.37390
    x             0.7000     0.1225   5.715  0.00464 **
    ---
    Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

    Residual standard error: 0.3873 on 4 degrees of freedom
    Multiple R-Squared: 0.8909,     Adjusted R-squared: 0.8636
    F-statistic: 32.67 on 1 and 4 DF,  p-value: 0.004636

    > # The p-value for F is the same as the p-value for the coefficient of x.
    > # The reason for this is explained in Note 7 above.
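As a check on the summary output, the estimates and standard errors can be reproduced by hand with the usual least-squares formulas: beta-hat = Sxy/Sxx, alpha-hat = ybar - beta-hat * xbar, and s^2 = (sum of squared residuals)/(n - 2). This is a minimal sketch assuming those textbook formulas; the variable names (Sxx, beta_hat, se_alpha, and so on) are illustrative choices, not R built-ins. The printed values should agree with the Estimate and Std. Error columns and the residual standard error above.

```r
# Minimal sketch: reproduce the lm() estimates by hand.
# Variable names (Sxx, beta_hat, ...) are illustrative, not R built-ins.
x <- c(1, 2, 3, 3, 4, 5)
y <- c(1, 2, 2, 3, 3, 4)
n <- length(x)

Sxx <- sum((x - mean(x))^2)                 # sum of squares of x about its mean
Sxy <- sum((x - mean(x)) * (y - mean(y)))   # sum of cross-products

beta_hat  <- Sxy / Sxx                      # least-squares slope
alpha_hat <- mean(y) - beta_hat * mean(x)   # least-squares intercept

resid_y <- y - (alpha_hat + beta_hat * x)   # residuals: observed minus fitted
s <- sqrt(sum(resid_y^2) / (n - 2))         # residual standard error, n - 2 df

se_beta  <- s / sqrt(Sxx)                       # standard error of the slope
se_alpha <- s * sqrt(1 / n + mean(x)^2 / Sxx)   # standard error of the intercept

print(round(c(alpha = alpha_hat, beta = beta_hat, s = s,
              se_alpha = se_alpha, se_beta = se_beta), 4))
print(round(resid_y, 1))
```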
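Notes 5 and 6 can be made concrete with pt(), R's cumulative distribution function for the t distribution. The sketch below recomputes the two-sided P-value P(|T| > |t|) for the slope, then tests a different null value, beta0 = 0.5, which is a made-up value chosen only for illustration. It also shows one possible way (an assumption about approach, not something this handout prescribes) to make lm() itself test a nonzero null: an offset(beta0 * x) term shifts the slope, so the default beta = 0 test on the x coefficient becomes a test of beta = beta0.

```r
# Sketch: two-sided t-test P-values for the slope.
# beta0 = 0.5 is a hypothetical null value used only for illustration.
x <- c(1, 2, 3, 3, 4, 5)
y <- c(1, 2, 2, 3, 3, 4)
fit <- lm(y ~ x)
ctab <- coef(summary(fit))     # columns: Estimate, Std. Error, t value, Pr(>|t|)
df_resid <- df.residual(fit)   # n - 2 = 4

# Default null hypothesis beta = 0: P(|T| > |t|) with T ~ t(4)
t_obs <- ctab["x", "t value"]
p_default <- 2 * pt(-abs(t_obs), df = df_resid)

# Non-default null hypothesis beta = beta0, computed by hand
beta0 <- 0.5
t_new <- (ctab["x", "Estimate"] - beta0) / ctab["x", "Std. Error"]
p_new <- 2 * pt(-abs(t_new), df = df_resid)

# Same test via an offset: the x coefficient now estimates beta - beta0
fit_off <- lm(y ~ x + offset(beta0 * x))

print(c(t_obs = t_obs, p_default = p_default, t_new = t_new, p_new = p_new))
print(coef(summary(fit_off))["x", ])
```

The hand computation and the offset fit should give the same t value for the shifted test, since the offset changes the slope estimate by exactly beta0 while leaving its standard error unchanged.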
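The claim in Note 7 can be checked directly: with a single control variable, the F-statistic equals the square of the slope's t value, and the upper tail of the F(1, 4) distribution, computed with pf(), gives the same P-value as the two-sided t-test. This sketch reads the F-statistic from the fstatistic component of summary(), a named vector with entries value, numdf and dendf.

```r
# Sketch: with one control variable, F = t^2 and the two P-values coincide.
x <- c(1, 2, 3, 3, 4, 5)
y <- c(1, 2, 2, 3, 3, 4)
fit <- lm(y ~ x)

t_x    <- coef(summary(fit))["x", "t value"]   # t statistic for the slope
f_stat <- summary(fit)$fstatistic              # named: value, numdf, dendf

f_value <- unname(f_stat["value"])
# Upper-tail probability of the F distribution with (1, 4) df
p_F <- unname(pf(f_value, f_stat["numdf"], f_stat["dendf"], lower.tail = FALSE))

print(c(t_squared = t_x^2, F = f_value, p_F = p_F,
        p_t = coef(summary(fit))["x", "Pr(>|t|)"]))
```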