## Regression in S-PLUS

Chapter 10 of the textbook gives examples
of fitting many statistical models in S-PLUS,
including regression and analysis of variance.
The basic techniques for fitting any kind of model in S-PLUS are very similar.
Generally, you ought to:

**Put your data into a data frame with each column
correctly specified as a quantitative variable,
factor, or ordered factor.**
Section 10.1 in the textbook gives more details on the syntax
for doing these tasks in S-PLUS.
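As a minimal sketch of this first step, the code below builds a small data frame and declares each column's type. The name `mycars` and every value in it are invented for illustration and do not come from the textbook. The code uses only `data.frame`, `factor`, and `ordered`, and is written so that it should also run unchanged in R.

```r
# Hypothetical raw data; all values are invented purely for illustration.
mpg    <- c(21.0, 22.8, 18.7, 14.3, 24.4, 30.4)
weight <- c(2620, 2320, 3440, 3570, 3190, 1615)
origin <- c("US", "Japan", "US", "US", "Europe", "Japan")
size   <- c("medium", "small", "large", "large", "medium", "small")

mycars <- data.frame(
  mpg    = mpg,              # quantitative variable
  weight = weight,           # quantitative variable
  origin = factor(origin),   # unordered factor
  size   = ordered(size, levels = c("small", "medium", "large"))  # ordered factor
)

# Check how each column is stored before fitting anything.
sapply(mycars, is.numeric)
sapply(mycars, is.factor)
levels(mycars$size)
```

Giving `levels` explicitly matters for the ordered factor: without it, the levels would be sorted alphabetically rather than by size.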

**Fit a model using the appropriate function.**
Regression is an example of a linear model, and regression models
are fit with the S-PLUS function `lm`.
The following example is closely related to one
in Section 10.5 of the textbook.
Suppose that you have a data frame named `auto` with quantitative
variables `mpg` (miles per gallon),
`weight`, `len` (length), and `disp` (displacement).
To predict miles per gallon on the basis of the other variables:

> fit <- lm(mpg ~ weight + len + disp, data=auto)

This creates an object named `fit`
that stores the fitted model, including the coefficients, residuals, and fitted values.
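Written out, the formula `mpg ~ weight + len + disp` corresponds to the usual multiple regression model shown here. The symbols $\beta_j$, $\varepsilon_i$, and $n$ are notation introduced for this sketch and may differ from the textbook's.

$$
\text{mpg}_i = \beta_0 + \beta_1\,\text{weight}_i + \beta_2\,\text{len}_i + \beta_3\,\text{disp}_i + \varepsilon_i,
\qquad i = 1, \dots, n
$$

The intercept $\beta_0$ does not appear in the formula because `lm` includes it by default. The $\varepsilon_i$ are the error terms, and the residuals are their estimates.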
The textbook gives more details on changing the formula statement
to fit alternative models.

**Use diagnostic tools to examine the quality of the fitted model.**
Other functions may then be used to extract information about the model.
For example,

> summary(fit)

produces a summary of the fit,
including:
(1) a five-number summary of the residuals;
(2) a table of the estimated regression coefficients with standard errors,
t statistics, and p-values from two-sided tests
of the hypothesis that each regression coefficient equals zero;
(3) other numerical measures of the fit, such as the R-squared statistic; and
(4) a table showing the correlation of the coefficients.
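For reference, the t statistic and two-sided p-value in item (2) are computed for each coefficient in the standard way. The notation is introduced here and is not taken from the textbook.

$$
t_j = \frac{\hat\beta_j}{\operatorname{SE}(\hat\beta_j)},
\qquad
p_j = 2\,P\bigl(T_{n-p} \ge |t_j|\bigr)
$$

Here $n$ is the number of observations, $p$ is the number of estimated coefficients (four in the `mpg` example, counting the intercept), and $T_{n-p}$ has a t distribution with $n - p$ degrees of freedom.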
If you wish to extract only the coefficients,

> coef(fit)

does the job.
If you want to work with the fitted values or residuals,

> fv <- fitted.values(fit)
> r <- residuals(fit)

will do the trick.
The function `plot` applied to `fit` will produce six
separate diagnostic plots in rapid succession.
Setting `par(mfrow=c(3,2))` prior to the plot call allows you to
see all of these plots on one screen or page.
You may exercise more control over what you wish to view
by working with the residuals, fitted values, and data directly.
For example,

> plot(fitted.values(fit), residuals(fit), xlab="Fitted Values",
+ ylab="Residuals")
> abline(0,0)

produces a plot of the residuals versus the fitted values.
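The same approach extends to other views of the residuals. The sketch below plots the residuals against one predictor and then draws a normal quantile plot. Because the real `auto` data are not reproduced here, the data frame in this sketch is a hypothetical stand-in with invented values. It is written so that it should also run in R.

```r
# Hypothetical stand-in for the `auto` data frame; every value is invented.
auto <- data.frame(
  mpg    = c(33, 30, 27, 25, 22, 20, 18, 16),
  weight = c(1800, 2000, 2300, 2500, 2900, 3200, 3600, 4000),
  len    = c(155, 160, 170, 172, 185, 190, 200, 212),
  disp   = c(85, 98, 110, 121, 150, 200, 250, 318)
)
fit <- lm(mpg ~ weight + len + disp, data = auto)
r <- residuals(fit)

# Residuals against a single predictor, with a horizontal line at zero.
plot(auto$weight, r, xlab = "Weight", ylab = "Residuals")
abline(0, 0)

# Normal quantile plot of the residuals, with a reference line.
qqnorm(r)
qqline(r)
```

A curved pattern in the first plot would suggest that the relationship with `weight` is not well described by a straight line.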
In applied statistics courses,
you learn how to examine residual plots for patterns
and how to examine the numerical summaries
in an effort to build effective models.
S-PLUS is well suited to this work:
you can fit and update models rapidly
and check each one with a variety of diagnostic tools.
The book *Modern Applied Statistics with S-PLUS*
by Venables and Ripley
and the book *Statistical Models in S* edited by Chambers and Hastie
(of which I have given you an excerpt)
include much more detailed information on practical model fitting
using S-PLUS.
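As a sketch of the fit-and-update cycle described above, the function `update` refits a model after a change to its formula, and `anova` compares the two nested fits. The `auto` data frame below is a hypothetical stand-in with invented values, and dropping `len` is an arbitrary choice made only to show the syntax.

```r
# Hypothetical stand-in for the `auto` data frame; every value is invented.
auto <- data.frame(
  mpg    = c(33, 30, 27, 25, 22, 20, 18, 16),
  weight = c(1800, 2000, 2300, 2500, 2900, 3200, 3600, 4000),
  len    = c(155, 160, 170, 172, 185, 190, 200, 212),
  disp   = c(85, 98, 110, 121, 150, 200, 250, 318)
)
fit <- lm(mpg ~ weight + len + disp, data = auto)

# Refit without len; the dots stand for "the same as in the original formula".
fit2 <- update(fit, . ~ . - len)
coef(fit2)

# F test comparing the smaller model with the larger one.
anova(fit2, fit)
```

Using `update` avoids retyping the whole call, which makes it easier to try several candidate models in succession.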

Last modified: May 2, 1997

Bret Larget,
larget@mathcs.duq.edu