To introduce these ideas, we analyze the data from a student experiment to explore the nature of the relationship between a person's heart rate and the frequency at which that person stepped up and down on steps of various heights. The student experimenters tested the effects of two different step heights crossed with three different stepping speeds, for a total of six different treatment combinations. In addition, they used six different subject/experimenter combinations as blocks. They used a balanced incomplete block design, meaning that every pair of factor levels appears in an equal number of blocks. For this experiment, each of the six height/frequency combinations appears in five blocks, where a block is a student exerciser and a pair of experimenters.

- Order: the overall performance order of the trial
- Block: the subject and experimenters' block
- Height: low (5.75") or high (11.5")
- Frequency: the rate of stepping. slow (14 steps/min), moderate (21 steps/min), or high (28 steps/min)
- RestHR: the resting heart rate of the subject before a trial, in beats per minute
- HR: the final heart rate of the subject after a trial, in beats per minute
- Time: the performance order within a block

% cp ~/larget/496/step.dat ~/496/step.dat % Splus -e > step.frame <- read.table("step.dat",header=T)Any variable that takes on character values will be interpreted as a factor, and any variable with only numerical values will be interpreted as quantitative by default. For the data we have read in, Time and Frequency should also be ordered factors. This S-PLUS code changes the status of both variables.

> step.frame$Frequency <- ordered(step.frame$Frequency, + levels = c("slow","moderate","fast")) > step.frame$Time <- ordered(step.frame$Time)For Frequency, you need to explicitly tell S-PLUS the proper order. For Time, S-PLUS uses the numerical order by default.

You can check that all variables are correctly classified
by using the function `sapply`,
which is similar to `apply`,
and applies a specified function to all objects in a list,
or in the case of data frames,
to each column.
The function `is.factor` returns T if a column is a factor
or an ordered factor.
The function `is.ordered` returns T
if a column is an ordered factor.

> sapply(step.frame,is.factor) > sapply(step.frame,is.ordered)The response variable we are interested in is the difference between heart rate (HR) and resting heart rate (RestHR). Create a variable called DiffHR for this difference and add it to the data frame.

> attach(step.frame) > step.frame$DiffHR <- HR - RestHR > attach(step.frame)

> plot.factor(DiffHR ~ Block + Time + Height + Frequency)Each of these pictures gives an informal indication as to whether a given factor should be included in the model. Another useful summary plot is created with

> plot.design(step.frame)

**Put your data into a data frame with each column
correctly specified as a quantitative variable,
factor, or ordered factor.**
Section 10.1 in the textbook gives more details on the syntax
for doing these tasks in S-PLUS.

**Fit a model using the appropriate function.**
Analysis of variance models are fit with the S-PLUS function `aov`.
This example continues the data introduced above.
To fit a model that predicts the difference in heart rate
based on the group of students (Block),
number in sequence of a group of experiments (Time),
height of the step (Height),
and frequency with which the exerciser steps (Frequency),

> fit <- aov(DiffHR ~ Block + Time + Height + Frequency, data=step.frame)This creates an object named

**Use diagnostic tools to examine the quality of the fitted model.**
Other functions may then be used to extract information about the model.
For example,

> summary(fit)produces the ANOVA table. If you wish to extract the fitted coefficients,

> coef(fit)does the job. If you want to work with the fitted values or residuals,

> fv <- fitted.values(fit) > r <- residuals(fit)will do the trick. Notice that the functions used above are identical to those used to examine the object created by fitting a regression model and, in fact, will work for the several other models available in S-PLUS.

The function `plot` applied to `fit` produces
a few diagnostic plots.
Again, you may exercise more control over what you wish to view
by working with the residuals, fitted values, and data directly.
For example,

> plot(fitted.values(fit),residuals(fit),xlab="Fitted Values", + ylab="Residuals") > abline(0,0)produces a plot of the residuals versus the fitted values.

Just as with regression, finding the appropriate model to describe the data requires fitting several models and examining the fits interactively. S-PLUS gives you the tools to do this. This course attempts to show you how to use these tools computationally. Courses in applied statistics are necessary for you to learn more about why these tools can be effective.

The book *Modern Applied Statistics with S-Plus*
by Venables and Ripley
and the book *Statistical Models in S* edited by Chambers and Hastie
(of which I have given you an excerpt)
include much more detailed information on practical model fitting
using S-PLUS.

Last modified: May 2, 1997

Bret Larget, larget@mathcs.duq.edu