- Regression
- Analysis of Variance (ANOVA)
- Analysis of Covariance (ANCOVA)
- General Linear Models (GLM)
- Sums of Squares Types: I, II, III & IV
- Stepwise Regression

    proc reg;                 /* simple linear regression */
      model y = x;
    proc reg;                 /* weighted linear regression */
      model y = x;
      weight w;
    proc reg;                 /* multiple regression */
      model y = x1 x2 x3;

The `model` phrase accepts a number of options:

    model y = x / noint;      /* regression with no intercept */
    model y = x / ss1;        /* print type I sums of squares */
    model y = x / p;          /* print predicted values and residuals */
    model y = x / r;          /* option p plus residual diagnostics */
    model y = x / clm;        /* option p plus 95% CI for estimated mean */
    model y = x / cli;        /* option p plus 95% CI for predicted value */
    model y = x / r cli clm;  /* options can be combined */
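For checking the `clm` and `cli` output by hand, the standard formulas for simple linear regression are sketched here. The symbols (`x_0` for the point of interest, `s` for the root mean square error, `S_xx` for the corrected sum of squares of `x`) are my notation, not SAS's.

```latex
\begin{align*}
\hat{y}_0 &= b_0 + b_1 x_0, \qquad
s^2 = \frac{\mathrm{SSE}}{n-2}, \qquad
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 \\
\mathrm{se}_{\mathrm{mean}}(x_0) &= s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}
  && \text{(used by \texttt{clm})} \\
\mathrm{se}_{\mathrm{pred}}(x_0) &= s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}
  && \text{(used by \texttt{cli})} \\
\text{95\% interval:} \quad & \hat{y}_0 \pm t_{n-2,\,0.975} \cdot \mathrm{se}
\end{align*}
```

For the four complete points used in the prediction example that follows (x = 1, 2, 4, 5 with y = 0, 3, 3, 6) and x0 = 3, a hand calculation gives b0 = -0.6, b1 = 1.2 and s^2 = 1.8, so the fitted value is 3.0 with standard errors of about 0.67 (mean) and 1.50 (prediction). With t = 4.303 on 2 degrees of freedom, the intervals are roughly 3 ± 2.89 and 3 ± 6.45.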

It is possible to let SAS do the predicting of new observations and/or
estimating of mean responses. The way to do this is to enter the
`x` values (or `x1,x2,x3` for multiple regression)
you are interested in during the data input step, but put a period (.)
for the unknown `y` value. That is,

    data new;
      input x y;
      cards;
    1 0
    2 3
    3 .
    4 3
    5 6
    ;
    proc reg;
      model y = x / r cli clm;

Try it, and check standard errors and confidence intervals by hand. Here are some other `model` options for more advanced work:

    model y = x / covb;        /* covariance matrix for estimates */
    model y = x / collin;      /* collinearity diagnostic */
    model y = x / collinoint;  /* collin without intercept */

The `output` phrase saves values in a new data set:

    output out=b predicted=py;  /* predicted values in "py" */
    output out=b p=py;          /* same as predicted */
    output out=b residual=ry;   /* residual values in "ry" */
    output out=b r=ry;          /* same as residual */
    output out=b stdr=sr;       /* standard error of residuals "sr" */
    output out=b student=sy;    /* studentized residuals "sy" */

Only one `output` phrase is needed to save several of these at once:

    output out=b p=py r=ry stdr=sr student=sy;

Those new variables created in set `b` can then be passed to other procedures and data steps.
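As a rough sketch of how the saved variables might be used afterwards, the steps below plot residuals against predicted values, check the residuals for normality, and list large studentized residuals. The data set `new` is the small example above; the cutoff of 2 is only a common rule of thumb, not something these notes prescribe.

```sas
/* Sketch: fit the model, save diagnostics in data set b, then reuse them. */
proc reg data=new;
  model y = x;
  output out=b p=py r=ry student=sy;
run;

proc plot data=b;              /* residuals against predicted values */
  plot ry*py / vref=0;         /* reference line at zero */
run;

proc univariate data=b normal plot;   /* normality checks on the residuals */
  var ry;
run;

proc print data=b;             /* list possible outliers */
  where abs(sy) > 2;           /* assumed rule-of-thumb cutoff */
run;
```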

    proc anova;               /* one-way analysis of variance */
      class trt;
      model y = trt;
    proc anova;               /* 1-way with multiple comparisons */
      class trt;
      model y = trt;
      means trt / lsd snk;    /* LSD and Student-Newman-Keuls */
    proc anova;               /* two-way anova */
      class fert var;
      model y = fert var;
      means fert var / lsd;   /* means by fert and var with LSD */
    proc anova;               /* two-way anova with interaction */
      class fert var;
      model y = fert var fert*var;  /* interaction signified by asterisk */
      means fert var / lsd;
      means fert*var;         /* for each fert-var combination */

The `means` phrase accepts options for multiple comparisons:

    means trt / t;              /* Least Significant Difference */
    means trt / lsd;            /* Least Significant Difference */
    means trt / bon;            /* Bonferroni */
    means trt / snk;            /* Student-Newman-Keuls */
    means trt / lsd alpha=.05;  /* LSD at level 5% (default) */
    means trt / lsd lines;      /* force ordering of means */
    means trt / lsd cldiff;     /* force pairwise tests of means */

If you want to save predicted values or residuals, or to evaluate
contrasts, you must use `proc glm` instead of `proc anova`.
See below.
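A minimal sketch of the switch, assuming a one-way layout: the same `class`, `model` and `means` phrases carry over to `proc glm`, which also accepts an `output` phrase. The data set name `trial` is hypothetical.

```sas
/* One-way anova in proc glm so that residuals can be saved. */
proc glm data=trial;           /* "trial" is a hypothetical data set */
  class trt;
  model y = trt;
  means trt / lsd;             /* same multiple comparisons as in proc anova */
  output out=b p=py r=ry;      /* predicted values and residuals */
run;
quit;

proc plot data=b;              /* residual plot from the saved values */
  plot ry*py;
run;
```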

    proc glm;                 /* analysis of covariance */
      class trt;              /* trt = factor, x = covariate */
      model y = x trt;
    proc glm;                 /* analysis of covariance */
      class trt;              /* with different slopes */
      model y = x trt x*trt;

More advanced use of ANCOVA can be found in the section on Multiple Responses.
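Written out as equations, the two ANCOVA models above look roughly like this. The symbols are my own notation: `i` indexes the levels of `trt` and `j` the observations within a level.

```latex
\begin{align*}
\text{common slope } (\texttt{y = x trt}): \quad
  & y_{ij} = \mu + \tau_i + \beta x_{ij} + e_{ij} \\
\text{separate slopes } (\texttt{y = x trt x*trt}): \quad
  & y_{ij} = \mu + \tau_i + (\beta + \gamma_i)\, x_{ij} + e_{ij}
\end{align*}
```

The test for the `x*trt` term is then a test of equal slopes, that is, of all the `\gamma_i` being zero.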

    proc glm;                 /* simple linear regression */
      model y = x / solution;
    proc glm;                 /* weighted linear regression */
      model y = x / solution;
      weight w;
    proc glm;                 /* multiple regression */
      model y = x1 x2 x3 / solution;
    proc glm;                 /* one-way analysis of variance */
      class trt;
      model y = trt;
    proc glm;                 /* additive two-factor anova */
      class fert var;
      model y = fert var;
    proc glm;                 /* full two-factor anova */
      class fert var;
      model y = fert | var;
    proc glm;                 /* analysis of covariance */
      class trt;              /* trt = factor, x = covariate */
      model y = x trt;
    data testlin;
      set resps;
      x = level;              /* numeric copy of level, used as covariate */
    proc glm;                 /* test for non-linearity */
      class level;
      model resp = x level;

The `model` phrase in `proc glm` accepts options such as:

    model y = trt x / solution;   /* print parameter estimates and SEs */
    model y = x / noint;          /* no intercept (as in proc reg) */
    model y = x / ss1;            /* print only type I sums of squares */
    model y = x / ss2;            /* print only type II sums of squares */
    model y = x / p;              /* print predicted values and residuals */
    model y = x / clm;            /* option p plus 95% CI for estimated mean */
    model y = x / cli;            /* option p plus 95% CI for predicted value */
    model y = x / cli alpha=.01;  /* only .01, .05 and .10 available */

The default way of estimating model parameters in SAS is to set the last group estimate to zero.
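For a one-way layout with `k` groups and no covariate, setting the last group's effect to zero makes that group the reference level, as sketched below. The notation (`\tau_i` for group effects, `\bar{y}_i` for group means) is mine.

```latex
\begin{align*}
y_{ij} &= \mu + \tau_i + e_{ij}, \qquad i = 1, \dots, k, \qquad \tau_k = 0 \\
\hat{\mu} &= \bar{y}_k, \qquad
\hat{\tau}_i = \bar{y}_i - \bar{y}_k \quad (i = 1, \dots, k-1)
\end{align*}
```

So the intercept printed by the `solution` option is the mean of the last group, and each remaining estimate is a difference from that group.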

The `means` phrase works much the same in `proc glm`
as in `proc anova`.
Contrasts can be set up if `means` aren't enough. Here is an
example from the glue data. The `contrast` phrase contains a
quoted title, variable name and the contrast coefficient values.
Note that the order of factor levels is lexicographic, which may not
be what you expect. This can be checked by examining the order under
the `solution` option to the `model` phrase.
Further, these can get very complicated for higher order designs.
Consult a book for further help.

    contrast 'A vs. rest' glue 1 -.25 -.25 -.25 -.25;
    contrast 'BD vs. CE' glue 0 .5 -.5 .5 -.5;

Predicted and residual (and other) values can be passed to other procedures and data steps using the `output` phrase, as in `proc reg`.
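To make the two glue contrasts above concrete, here are the null hypotheses they would test, assuming the five glue levels are labelled A to E and sort in that order (the contrast titles suggest this, but it is an assumption). The `\mu` symbols are population means for each glue.

```latex
\begin{align*}
\text{`A vs. rest':} \quad
  & H_0:\ \mu_A - \tfrac{1}{4}\,(\mu_B + \mu_C + \mu_D + \mu_E) = 0 \\
\text{`BD vs. CE':} \quad
  & H_0:\ \tfrac{1}{2}\,(\mu_B + \mu_D) - \tfrac{1}{2}\,(\mu_C + \mu_E) = 0
\end{align*}
```

In both cases the coefficients sum to zero, which is what makes the expression a contrast among group means.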

- Short Summary of Types of Sums of Squares
- Hypotheses for Unbalanced Data
- General Form of Estimable Functions

- balanced data (each cell has exactly r replicates)
- unbalanced data but each cell has at least one observation
- unbalanced data with one or more empty cells

    Source   Type I SS        Type II SS       Type III or IV SS
    A        SS(A|u)          SS(A|u,B)        SS(A|B,AB)
    B        SS(B|u,A)        SS(B|u,A)        SS(B|A,AB)
    A*B      SS(A*B|u,A,B)    SS(A*B|u,A,B)    SS(AB|A,B)

- Type I (sequential): incremental improvement in the error SS as each effect is added to the model
- Type II (hierarchical): reduction in error SS due to adding the term to the model after all other terms except those that contain it
- Type III (orthogonal): reduction in error SS due to adding the term after all other terms have been added to the model
- Type IV (balanced): variation explained by balanced comparison of averages of cell means

The Type II approach is appropriate for model building, and is the natural choice for regression.

Type III and Type IV tests differ only if the design has empty cells.
SAS automatically gives you Types I and III with `proc glm`.
You can explicitly choose types with options to the `model`
phrase:

    proc glm;
      class a b;
      model y = a b a*b / ss1 ss2 ss3 ss4;  /* select all 4 types */
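One way to see the difference between the types is to fit the same unbalanced data twice with the main effects in a different order. The sketch below uses a small made-up 2x2 data set (`unbal` and its values are hypothetical, chosen only so that the cell counts are unequal). The Type I lines for `a` and `b` should generally change between the two runs, while the Type III lines should not.

```sas
/* Hypothetical unbalanced 2x2 data: cell counts are 3, 1, 1, 3. */
data unbal;
  input a $ b $ y @@;
  cards;
a1 b1 12  a1 b1 14  a1 b1 13  a1 b2 20
a2 b1 15  a2 b2 22  a2 b2 25  a2 b2 23
;

proc glm data=unbal;           /* a entered first */
  class a b;
  model y = a b a*b / ss1 ss3;
run;

proc glm data=unbal;           /* b entered first */
  class a b;
  model y = b a a*b / ss1 ss3;
run;
quit;
```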

- I/II: Hypotheses are functions of the cell counts (they differ from those tested if the data were balanced). This is usually undesirable. Type I hypotheses depend on the order of terms in `model`.
- III/IV: Hypotheses are the same for balanced and unbalanced data, involving simple, marginal averages of (population) cell means.
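As an illustration of the distinction above, consider a two-factor design with no empty cells, population cell means `\mu_{ij}` and cell counts `n_{ij}` (my notation; `a` and `b` are the numbers of levels of the two factors). The hypotheses for factor A can be sketched as follows.

```latex
\begin{align*}
\text{Type III:} \quad
  & H_0:\ \bar{\mu}_{1\cdot} = \bar{\mu}_{2\cdot} = \cdots = \bar{\mu}_{a\cdot},
  \qquad \bar{\mu}_{i\cdot} = \frac{1}{b} \sum_{j=1}^{b} \mu_{ij} \\
\text{Type I (A entered first):} \quad
  & H_0:\ \sum_{j=1}^{b} \frac{n_{ij}}{n_{i\cdot}}\, \mu_{ij}
  \ \text{ is the same for every } i,
  \qquad n_{i\cdot} = \sum_{j=1}^{b} n_{ij}
\end{align*}
```

The Type I version weights each cell mean by its share of the observations in that row, so the hypothesis itself changes when the cell counts change. With balanced data the two coincide.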

- I/II: Caution! Remember that hypotheses depend on cell counts.
- III: Hypotheses do not depend on the order of effects or on the labels of levels. However, the orthogonal contrasts used are difficult to interpret unless you are willing to assume some interactions are zero.
- IV: Hypotheses are balanced and easily interpretable. However, the SS may change if the labels of the factor levels are changed! Thus the exact tests performed depend on the order and labels of factor levels! Essentially, Type IV contrasts correspond to analysing subsets of factor levels chosen automatically.

1. analyse with all data (use Type IV automated hypotheses)
2. analyse with all data (using Type II) for additive model
3. analyse combination of factors with missing cells as a single factor
4. pick subset(s) with no empty cells and analyse them
5. compare the subset analyses with the analysis in (1)
6. if the results are consistent, write them up
7. if they are not consistent--dig and find out why! (get help!)
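The step of treating a combination of factors as a single factor could be set up along these lines. This is only a sketch: the data set name `design` is hypothetical, and it assumes `a` and `b` are character variables (numeric factors would need converting first, for example with `put`).

```sas
/* Build one factor "ab" whose levels are the observed a-b cells. */
data combo;
  set design;                  /* "design" is a hypothetical data set */
  length ab $ 20;
  ab = trim(a) || '*' || trim(b);
run;

proc glm data=combo;           /* one-way analysis over the non-empty cells */
  class ab;
  model y = ab;
  means ab / lsd;
run;
quit;
```

Only cells that actually occur in the data become levels of `ab`, so the empty cells simply drop out of the one-way analysis.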

    proc glm;
      class a b;
      model y = a | b / e;         /* general form of estimable functions */
    proc glm;
      class a b;                   /* estimable function coefficients */
      model y = a | b / e1 e2 e3;  /* for Types I, II, III */

    proc stepwise;
      model y = x1 x2 x3;

Here are `model` options for the method of selection and elimination:

    model y = x1 x2 x3 / forward;      /* forward selection */
    model y = x1 x2 x3 / backward;     /* backward elimination */
    model y = x1 x2 x3 / stepwise;     /* forward in & backward out */
    model y = x1 x2 x3 / maxr stop=4;  /* like stepwise, but using R^2 */

Other `model` options for `proc stepwise`:

    model y = x1 x2 x3 / noint;        /* no intercept */
    model y = x1 x2 x3 / slentry=0.5;  /* signif. level for selection */
    model y = x1 x2 x3 / slstay=0.1;   /* signif. level for elimination */
    model y = x1 x2 x3 / include=2;    /* force in first 2 variables */
    model y = x1 x2 x3 / start=2;      /* start with 2 variables */
    model y = x1 x2 x3 / details;      /* more details of R^2, F stats */

The significance levels `slentry` and `slstay` are the thresholds a variable must meet to enter the model and to stay in it.
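In later SAS releases, as far as I am aware, `proc stepwise` is no longer provided and the same selection methods are requested through the `selection=` option of the `model` phrase in `proc reg`. A sketch of the equivalent calls follows; the significance levels shown are illustrative values, not recommendations.

```sas
proc reg;                      /* stepwise: forward in & backward out */
  model y = x1 x2 x3 / selection=stepwise slentry=0.15 slstay=0.15;
run;

proc reg;                      /* backward elimination */
  model y = x1 x2 x3 / selection=backward slstay=0.10;
run;

proc reg;                      /* maximum R^2 improvement, stop at 2 variables */
  model y = x1 x2 x3 / selection=maxr stop=2;
run;
quit;
```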


Last modified: Mon Jun 19 14:23:40 1995 by Brian Yandell
Wed Mar 22 11:26:59 1995 by Stat Www
*(statwww@stat.wisc.edu)*