GUIDE Regression Tree (version 7.9)

© Wei-Yin Loh 1997-2009

GUIDE is a multi-purpose machine learning algorithm for constructing classification and regression trees. It is designed and maintained by Wei-Yin Loh at the University of Wisconsin-Madison. GUIDE stands for Generalized, Unbiased, Interaction Detection and Estimation. This material is based upon work supported by grants from the U.S. Army Research Office and the National Science Foundation.

Properties and features:

  1. Choice of classification or regression trees
  2. Negligible bias in split variable selection
  3. Importance ranking and identification of unimportant variables
  4. Power to detect local interactions between pairs of predictor variables
  5. Ability to use ordered (continuous) and unordered (categorical) predictor variables
  6. Automatic handling of missing values
  7. Automatic prediction of new samples
  8. Choice of weighted least squares (Gaussian), least median of squares, Poisson, quantile (including median), or proportional hazards regression tree models
  9. Choice of piecewise constant, best simple polynomial, multiple linear, or stepwise regression models
  10. Choice of roles for predictor variables (splitting only, node modeling only, both, or none)
  11. Choice of using categorical variables for splitting only, or for both splitting and fitting through dummy 0-1 vectors
  12. Choice of stopping rules: no pruning, pruning by cross-validation, or pruning with a test sample
  13. Choice of batch or interactive mode of operation
  14. Automatic generation of products and powers of predictor variables as regressor variables
  15. Automatic generation of LaTeX (MiKTeX) or allCLEAR source code for the tree diagrams in PostScript and PDF formats. The LaTeX code requires the PSTricks package, which is included in most LaTeX distributions. Displaying and printing the PostScript files requires Ghostscript and Ghostview.
  16. Free executables for Windows, Macintosh, and Linux computers
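Feature 2 concerns how the split variable is chosen. For regression, Loh (2002), listed under Documentation, describes this in terms of chi-squared tests: residuals from the node model are classified as positive or non-positive, cross-tabulated against groups of each predictor, and the predictor with the most significant test is selected. The Python sketch below illustrates only that core idea under simplifying assumptions: a piecewise-constant node model, ordered predictors split into four groups at crude sample quartiles, and no interaction tests, categorical predictors, or bias corrections. The function names (chi2_sf, quartile_groups, curvature_pvalue, select_split_variable) are hypothetical and are not part of the GUIDE program.

```python
from __future__ import annotations

import bisect
import math
from typing import Sequence


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: start at df = 1, then Q(x; k+2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) * math.exp(-half) / math.gamma(1.5)
    for k in range(1, df, 2):
        total += term
        term *= half / (k / 2.0 + 1.0)
    return total


def quartile_groups(x: Sequence[float]) -> list[int]:
    """Assign each value to one of four groups cut at crude sample quartiles (an assumption)."""
    s = sorted(x)
    n = len(s)
    cuts = [s[n // 4], s[n // 2], s[(3 * n) // 4]]
    # bisect_right counts how many cut points are <= v, giving a group index 0..3.
    return [bisect.bisect_right(cuts, v) for v in x]


def curvature_pvalue(residuals: Sequence[float], x: Sequence[float]) -> float:
    """Pearson chi-squared test of residual sign versus the quartile group of x."""
    table = [[0] * 4 for _ in range(2)]  # row 0: positive, row 1: non-positive
    for r, g in zip(residuals, quartile_groups(x)):
        table[0 if r > 0 else 1][g] += 1
    row_tot = [sum(row) for row in table]
    col_tot = [table[0][j] + table[1][j] for j in range(4)]
    n = sum(row_tot)
    rows = [i for i in range(2) if row_tot[i] > 0]  # drop empty rows and columns
    cols = [j for j in range(4) if col_tot[j] > 0]
    df = (len(rows) - 1) * (len(cols) - 1)
    if df == 0:
        return 1.0  # no evidence either way, e.g. all residuals share a sign
    stat = 0.0
    for i in rows:
        for j in cols:
            expected = row_tot[i] * col_tot[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return chi2_sf(stat, df)


def select_split_variable(y: Sequence[float], predictors: dict[str, Sequence[float]]) -> str:
    """Pick the predictor whose curvature test has the smallest p-value."""
    mean = sum(y) / len(y)
    residuals = [v - mean for v in y]  # residuals from a piecewise-constant node model
    pvalues = {name: curvature_pvalue(residuals, xs) for name, xs in predictors.items()}
    return min(pvalues, key=pvalues.get)


# Usage: y is curved in x1 and unrelated to x2 (a permutation of 0..19).
x1 = list(range(20))
x2 = [(7 * i) % 20 for i in range(20)]
y = [(v - 9.5) ** 2 for v in x1]
print(select_split_variable(y, {"x1": x1, "x2": x2}))  # prints x1
```

Here the sign pattern of the residuals (positive at both ends of x1, non-positive in the middle) gives x1 a p-value of about 0.004, while x2 gives about 0.64, so x1 is selected. Using residual signs and grouped predictors puts every predictor on the same contingency-table footing, which is the intuition behind the low selection bias claimed in feature 2.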

Documentation:

  1. Loh, W.-Y. (2009), Improving the precision of classification trees, Annals of Applied Statistics, to appear. [This is the definitive reference for GUIDE classification.]
  2. Loh, W.-Y., Chen, C.-W., and Zheng, Z. (2007), Extrapolation errors in linear model trees, ACM Transactions on Knowledge Discovery from Data, vol. 1.
  3. Loh, W.-Y. (2007), Regression by parts: Fitting visually interpretable models with GUIDE, Handbook of Computational Statistics, vol. III, Springer, in press.
  4. Kim, H., Loh, W.-Y., Shih, Y.-S., and Chaudhuri, P. (2007), Visualizable and interpretable regression models with good prediction power, IIE Transactions, vol. 39, no. 6, 565-579. [This is the author's version of the work. It is posted here by permission of Taylor & Francis for personal use, not for redistribution.]
  5. Loh, W.-Y. (2006), Regression tree models for designed experiments, Second Lehmann Symposium, Institute of Mathematical Statistics Lecture Notes-Monograph Series, vol. 49, 210-228.
  6. Loh, W.-Y. (2002), Regression trees with unbiased variable selection and interaction detection, Statistica Sinica, vol. 12, 361-386. [This is the original reference for GUIDE regression.]
  7. Chaudhuri, P. and Loh, W.-Y. (2002), Nonparametric estimation of conditional quantiles using quantile regression trees, Bernoulli, vol. 8, 561-576. [This paper extends GUIDE to quantile regression.]
  8. Chaudhuri, P., Lo, W.-D., Loh, W.-Y., and Yang, C.-C. (1995), Generalized regression trees, Statistica Sinica, vol. 5, 641-666. [This is the first paper on Poisson and logistic regression trees.]
  9. Chaudhuri, P., Huang, M.-C., Loh, W.-Y., and Yao, R. (1994), Piecewise-polynomial regression trees, Statistica Sinica, vol. 4, 143-167. [This is the first paper on polynomial regression trees.]
  10. GUIDE manual in PDF format. The manual uses the example data and description files bbdat.txt, bbdsc.txt, irisdata.txt, irisdsc.txt, solderdat.txt, and solderdsc.txt for illustration.

Compiled binaries: The compiled GUIDE executables may be freely distributed but not sold for profit.

Revision history: See the file history.txt

Closely related algorithms developed by Wei-Yin Loh and his students:

  • QUEST: A binary classification tree
  • CRUISE: A classification tree that splits each node into two or more subnodes
  • LOTUS: A logistic regression tree
  • Application papers that use CRUISE, GUIDE, LOTUS, or QUEST: See file

License:

GUIDE is free software. You may use the Program without restriction. You may copy and distribute the Program in executable form provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; and give any other recipients of the Program a copy of this license along with the Program.

Disclaimer of Warranty:

The copyright holder provides the Program "as is" without warranty of any kind, either expressed or implied, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. The entire risk as to the quality and performance of the Program is with you. Should the Program prove defective, you assume the cost of all necessary servicing, repair or correction. In no event will the copyright holder be liable to you for damages, including any general, special, incidental or consequential damages arising out of the use or inability to use the Program (including but not limited to loss of data or data being rendered inaccurate or losses sustained by you or third parties or a failure of the Program to operate with any other programs), even if such holder has been advised of the possibility of such damages.


Last modified: November 14, 2009 by Wei-Yin Loh