GUIDE Classification and Regression Trees and Forests (version 17.10)
© Wei-Yin Loh 1997-2014
Sand sculpture by Haoyang Fan, Xu He, Dong Liu, and Wenwen Zhang, on
Sentosa Island, Singapore, March 22, 2014
GUIDE is a multi-purpose machine learning algorithm for
constructing classification and regression trees. It is designed and maintained by
Wei-Yin Loh at the University of Wisconsin, Madison. GUIDE stands for
Generalized, Unbiased, Interaction Detection and Estimation.
This material is based upon work supported by grants from the
U.S. Army Research Office, the National Science Foundation, and the
National Institutes of Health.
Properties and features:
- Choice of classification or regression trees
- Negligible bias in split variable selection
- Importance ranking and identification of unimportant variables
- Power to detect local interactions between pairs of predictor variables
- Ability to use ordered (continuous) and unordered (categorical) predictor variables
- Automatic handling of missing values, including splits on missingness
- Automatic prediction for new (unseen) samples
- Choice of weighted least squares (Gaussian), least median of
squares, Poisson, quantile (including median), proportional hazards,
or multi-response (e.g., longitudinal) regression tree models
- Choice of piecewise constant, best simple polynomial, multiple,
or stepwise linear regression models
- Choice of roles for predictor variables (splitting only, node modeling
only, both, or none)
- Choice of using categorical variables for splitting only or both
splitting and fitting through dummy 0-1 vectors (ANCOVA)
- Choice of stopping rules: no pruning, pruning by
cross-validation, or pruning with a test sample
- Choice of batch or interactive mode of operation
- On-the-fly generation of products and powers of predictor
variables as regressor variables
- Generation of LaTeX (MikTeX for Windows) source code for
the tree diagrams in PostScript and PDF formats. The LaTeX code requires the
PSTricks package, which is included in most LaTeX distributions. See the
PSTricks User Guide and the India doc for some excellent documentation on PSTricks. The
PostScript files may be converted to PDF with the
ps2pdf program, which is part of Ghostscript.
- Generation of R source code for
prediction of future cases
- Free executables for Windows, Macintosh, and Linux computers
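The "negligible bias in split variable selection" property comes from choosing the split variable with significance tests instead of an exhaustive search over all split points (Loh, 2002, in the list below). As a rough illustration only, the Python sketch below fits a constant model at a node, cross-tabulates the signs of the residuals against quartile-based groups of each ordered predictor, and picks the predictor with the smallest chi-square p-value. It is not the GUIDE Fortran code: all function names are hypothetical, and it ignores categorical predictors, interaction tests, and the other refinements GUIDE applies.

```python
import bisect
import math
import random


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        k, p = 2, math.exp(-half)              # exact tail for df = 2
    else:
        k, p = 1, math.erfc(math.sqrt(half))   # exact tail for df = 1
    while k < df:
        # Recurrence: sf_{k+2}(x) = sf_k(x) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
        p += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return min(p, 1.0)


def quartile_groups(values):
    """Label each value 0..3 by the sample-quartile interval it falls in (hypothetical helper)."""
    s = sorted(values)
    n = len(s)
    # Upper end of each of the first three quarters of the sorted sample
    cuts = [s[max((q * n) // 4 - 1, 0)] for q in (1, 2, 3)]
    return [bisect.bisect_left(cuts, v) for v in values]


def curvature_pvalue(x, resid):
    """Chi-square p-value for association between residual signs and grouped x."""
    counts = {}
    for g, r in zip(quartile_groups(x), resid):
        key = (r > 0, g)          # row: residual sign, column: quartile group
        counts[key] = counts.get(key, 0) + 1
    rows = sorted({i for i, _ in counts})
    cols = sorted({j for _, j in counts})
    if len(rows) < 2 or len(cols) < 2:
        return 1.0                # no variation, so no evidence of association
    n = sum(counts.values())
    row_tot = {i: sum(counts.get((i, j), 0) for j in cols) for i in rows}
    col_tot = {j: sum(counts.get((i, j), 0) for i in rows) for j in cols}
    stat = 0.0
    for i in rows:
        for j in cols:
            expected = row_tot[i] * col_tot[j] / n
            stat += (counts.get((i, j), 0) - expected) ** 2 / expected
    return chi2_sf(stat, (len(rows) - 1) * (len(cols) - 1))


def select_split_variable(X, y):
    """Return the predictor with the smallest p-value, plus all p-values.

    X maps predictor names to lists of floats; y is the response list.
    """
    mean = sum(y) / len(y)
    resid = [v - mean for v in y]   # residuals from a constant node model
    pvals = {name: curvature_pvalue(col, resid) for name, col in X.items()}
    return min(pvals, key=pvals.get), pvals


if __name__ == "__main__":
    random.seed(0)
    x1 = [random.random() for _ in range(200)]
    x2 = [random.random() for _ in range(200)]
    # y depends on x1 only, through a U-shaped (curved) effect
    y = [10 * (a - 0.5) ** 2 + random.gauss(0, 0.1) for a in x1]
    best, pvals = select_split_variable({"x1": x1, "x2": x2}, y)
    print(best, pvals)
```

Comparing p-values rather than raw chi-square statistics matters when predictors produce tables of different sizes, because larger tables have larger statistics on average even when there is no association.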
See Table 1 for a feature comparison between GUIDE and other
classification tree algorithms.
See Table 2 for a feature comparison between GUIDE and other
regression tree algorithms.
- Loh, W.-Y. (2014),
Fifty years of classification and regression trees (with discussion),
International Statistical Review,
vol. 82, 329-370.
- Loh, W.-Y., He, X., and Man, M. (2014), A regression tree approach
to identifying subgroups with differential treatment effects,
submitted for publication.
- Loh, W.-Y. and Zheng, W. (2013),
Regression trees for longitudinal and multiresponse data,
Annals of Applied Statistics,
vol. 7, 496-522.
- Loh, W.-Y. (2012),
Variable selection for classification and regression in large p, small n problems,
Lecture Notes in Statistics - Proceedings, A. Barbour, H. P. Chan and D.
Siegmund (Eds.), vol. 205, Springer, pp. 133-157.
- Loh, W.-Y. (2011),
Classification and regression trees,
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 1, 14-23.
- Loh, W.-Y. (2010),
Wiley Interdisciplinary Reviews: Computational Statistics, vol. 2, 364-369.
- Loh, W.-Y. (2009),
Improving the precision of classification trees,
Annals of Applied Statistics, vol. 3, 1710-1737.
[The definitive reference for GUIDE classification.]
- Loh, W.-Y. (2008),
Classification and regression tree methods,
Encyclopedia of Statistics in Quality and Reliability, F. Ruggeri, R. Kenett, and F. W. Faltin (Eds.), Wiley, pp. 315-323.
- Loh, W.-Y. (2008),
Regression by parts: Fitting visually interpretable models with GUIDE,
Handbook of Computational Statistics, vol. III, 447-469, Springer.
- Loh, W.-Y., Chen, C.-W., and Zheng, W. (2007),
Extrapolation errors in linear model trees,
ACM Transactions on Knowledge Discovery in
Data, vol. 1, issue 2, article 6.
- Kim, H., Loh, W.-Y., Shih, Y.-S., and Chaudhuri, P. (2007),
Visualizable and interpretable regression models with good prediction power,
IIE Transactions, vol. 39, issue 6 (June 2007), pp. 565-579.
- Loh, W.-Y. (2006), Regression tree models
for designed experiments, Second Lehmann
Symposium, Institute of Mathematical Statistics Lecture
Notes-Monograph Series, vol. 49, 210-228.
- Loh, W.-Y. (2002),
Regression trees with unbiased variable selection and interaction
detection, Statistica Sinica,
vol. 12, 361-386. [The definitive reference for GUIDE regression.]
- Chaudhuri, P. and Loh, W.-Y. (2002),
Nonparametric estimation of conditional quantiles using quantile
regression trees, Bernoulli,
vol. 8, 561-576.
- Chaudhuri, P., Lo, W.-D., Loh, W.-Y., and Yang, C.-C. (1995),
Generalized regression trees, Statistica
Sinica, vol. 5, 641-666.
- Chaudhuri, P., Huang, M.-C., Loh, W.-Y., and Yao, R. (1994),
Piecewise-polynomial regression trees, Statistica Sinica, vol. 4,
- Loh, W.-Y., and Vanichsetakul, N. (1988),
Tree-structured classification via generalized discriminant analysis (with discussion), Journal of the American Statistical Association,
vol. 83, 715-728. [This is the article that started it all.]
(Mostly) third-party applications of GUIDE, QUEST, CRUISE, and LOTUS: Look here
GUIDE compiled binaries: The following executable files may be freely
distributed but not sold for profit.
- guide.gz for 64-bit Linux (compiled with Intel Fortran 12.1.3, Red Hat Enterprise Linux Server release 6.5 (Santiago), kernel 2.6.32-431.3.1.el6.x86_64)
- guide.gz for 32-bit Linux (compiled with GFortran 4.6.3, Ubuntu 12.04 LTS (precise), kernel 3.2.0-60-generic)
- guide.gz for Mac OS X Mavericks 10.9.5 (compiled with GFortran 4.9.1)
- guide.gz for Mac OS X Yosemite 10.10.1 (compiled with GFortran 5.0.0)
- guide.zip for 32-bit Windows
- guide.zip for 64-bit Windows
GUIDE manual and data and description files: bbdat.txt,
GUIDE revision history: See
Earlier algorithms developed by Wei-Yin Loh and his students:
- QUEST: Binary classification tree
- CRUISE: Classification tree that splits each node into two or more subnodes
- LOTUS: Logistic regression tree
Copyright (c) 1997-2014 Wei-Yin Loh. All rights reserved.
Redistribution and use in binary forms, with or without modification, are
permitted provided that the following condition is met:
Redistributions in binary form must reproduce the above copyright notice,
this condition and the following disclaimer in the documentation and/or other
materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY WEI-YIN LOH "AS IS" AND ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL WEI-YIN LOH BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
OF THE POSSIBILITY OF SUCH DAMAGE.
The views and conclusions contained in the software and documentation are those
of the author and should not be interpreted as representing official policies,
either expressed or implied, of the University of Wisconsin.
Last modified: December 21, 2014 by Wei-Yin Loh