QUEST Classification Tree (version 1.9.2)
© Yu-Shan Shih 1997-2005
QUEST is a binary-split decision tree algorithm for classification
and data mining developed by Wei-Yin Loh (University of
Wisconsin-Madison) and Yu-Shan Shih (National Chung Cheng University,
Taiwan). QUEST stands for Quick, Unbiased and Efficient Statistical Tree.
The objective of QUEST is similar to that of the CART(TM)
algorithm described in the book, Classification and Regression Trees, by
Breiman, Friedman, Olshen and Stone (1984). [CART is a registered
trademark of California Statistical Software, Inc.] The major differences
are:
- QUEST uses an unbiased variable selection technique by default
- QUEST uses imputation instead of surrogate splits to deal with missing values
- QUEST can easily handle categorical predictor variables with many categories
If there are no missing values in the data, QUEST can optionally use
the CART algorithm to produce a tree with
univariate splits.
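The first difference listed above can be illustrated with a minimal sketch. It assumes, following our reading of Loh and Shih (1997), that ordered predictors are screened with a one-way ANOVA F-test before any split point is searched, so that a variable is not favored merely because it offers more candidate splits. The function names anova_f and select_variable and the toy data are hypothetical and are not part of the QUEST program. Because every column here has the same sample size and the same number of classes, ranking by F statistic is equivalent to ranking by p-value. Categorical predictors, multiple-comparison corrections, and missing values are omitted.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic of a numeric predictor across class labels.

    Requires at least two classes and more observations than classes.
    """
    groups = {}
    for v, c in zip(values, labels):
        groups.setdefault(c, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    # Between-class and within-class sums of squares.
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if within == 0:
        # Classes are perfectly separated (or the predictor is constant).
        return float("inf") if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def select_variable(columns, labels):
    """Pick the predictor with the largest F statistic (illustrative only)."""
    scores = {name: anova_f(col, labels) for name, col in columns.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    labels = ["a", "a", "a", "b", "b", "b"]
    columns = {
        "x1": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],  # class means differ
        "x2": [1.0, 3.0, 2.0, 2.0, 1.0, 3.0],  # class means are equal
    }
    best, scores = select_variable(columns, labels)
    print(best, scores)
```

Only after a variable has been chosen this way would a split point on that variable be searched for, which keeps the two decisions separate.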
For a comparison of features between QUEST and other algorithms, see table.
Documentation:
- Loh, W.-Y. and Shih, Y.-S. (1997),
Split selection methods for classification trees, Statistica Sinica, vol. 7, 815-840.
[This is the definitive reference for QUEST.]
- Lim, T.-S., Loh, W.-Y., and Shih, Y.-S. (2000),
A comparison of prediction accuracy, complexity, and training time of
thirty-three old and new classification algorithms, Machine Learning Journal, vol. 40,
203-228. [This paper compares the performance of version 1.7 of
QUEST against other methods.] A separate appendix
contains more detailed results.
The datasets used in the study are in the
gzipped tar archive (5.8 MB).
- QUEST User Manual in PDF format. The manual uses the example data and
description files hepdat.txt and hepdsc.txt for illustration.
- Loh, W.-Y. and Vanichsetakul, N. (1988), Tree-structured
classification via generalized discriminant analysis (with
discussion), Journal of the American
Statistical Association, vol. 83, 715-728. [This paper
documents an older algorithm called FACT.]
- Shih, Y.-S. (1999), Families of splitting criteria for
classification trees, Statistics and Computing, vol. 9, 309-315.
[This paper documents the enlarged class of splitting criteria in
version 1.8 of QUEST. Downloadable from Shih's page.]
Compiled binaries: The following files may be freely
distributed but not sold for profit.
- Intel and compatibles (Windows 9x/NT/2000) in pkzip format: download (download pkunzip.exe)
- Intel and compatibles (Linux 2.0) in gzip format: download
- Sun SPARCstation/Ultra (Sun Solaris OS 5) in gzip format: download
Revision history: See the file
history.txt
Commercial implementations of earlier versions of the algorithm
for the Windows platform are available from SPSS (AnswerTree)
and StatSoft
(STATISTICA).
Tree diagrams:
The QUEST program can optionally produce LaTeX (MiKTeX) or allCLEAR
source code for the tree diagrams.
The LaTeX code, which requires the PSTricks package, can be compiled
to PDF or PostScript files (the latter can be viewed and printed using
Ghostscript and GSView).
Some application papers that use QUEST and FACT (its
precursor):
- Bertelli, D., Plessi, M., Sabatini, A.G., Lollo, M. and
Grillenzoni, F. (2007), Classification of Italian honeys by
mid-infrared diffuse reflectance spectroscopy (DRIFTS),
Food Chemistry, vol. 101, 1582-1587
- Hoque, M.O., Feng, Q.H., Toure, P., Dem, A., Critchlow, C.W.,
Hawes, S.E., Wood, T., Jeronimo, C., Rosenbaum, E., Stern, J., Yu,
M.J., Trink, B., Kiviat, N.B., and Sidransky, D. (2006), Detection of
aberrant methylation of four genes in plasma DNA for the detection of
breast cancer, Journal of Clinical Oncology, vol. 24, 4262-4269
- Sullivan, M.S., Jones, M.J., Lee, D.C., Marsden, S.J., Fielding,
A.H., and Young, E.V. (2006),
A comparison of predictive methods in extinction risk studies:
contrasts and decision trees,
Biodiversity and Conservation,
vol. 15, 1977-1991
- Stahl, K. (2005),
Influence of hydroclimatology and socioeconomic conditions on water-related
international relations,
Water International,
vol. 30, 270-282
- Kannebley, S., Porto, G.S., and Pazello, E.T. (2005),
Characteristics of Brazilian innovative firms: An empirical analysis based on
PINTEC - industrial research on technological innovation,
Research Policy,
vol. 34, 872-893
- Balaras, C.A., Droutsa, K., Dascalaki, E., and Kontoyiannidis, S.
(2005), Deterioration of European apartment buildings,
Energy and Buildings,
vol. 37, 515-527
- Pal, M. and Mather, P.M. (2003), An assessment of the
effectiveness of decision tree methods for land cover classification,
Remote Sensing of Environment,
vol. 86, 554-565
- Kedia, S. and Williams, C. (2003), Predictors of substance abuse
treatment outcomes in Tennessee, Journal of
Drug Education, vol. 33, 25-47
- Okura, Y., Matsumura, Y., Harauchi, H., Sukenobu, Y., Kou, H., Kohyama,
S., Yasuda, N., Yamamoto, Y., and Inamura, K. (2002), An inductive method
for automatic generation of referring physician prefetch rules for
PACS, Journal of Digital
Imaging, vol. 14, 226-231
- Olden, J.D. and Jackson, D.A. (2002),
A comparison of statistical approaches for modelling fish species
distributions, Freshwater
Biology, vol. 47, 1976-1995
- Pryse-Phillips, W., Aube, M., Gawel, M., Nelson, R., Purdy, A., and
Wilson, K. (2002), A headache diagnosis project,
Headache: The Journal of Head and Face
Pain, vol. 42, 728-737
- Croisier, D., Chavanet, P., Lequeu, C., Ahanou, A., Nierlich, A.,
Neuwirth, C., Piroth, L., Duong, M., Buisson, M., and Portier,
H. (2002), Efficacy and pharmacodynamics of simulated human-like treatment
with levofloxacin on experimental pneumonia induced with
penicillin-resistant pneumococci with various susceptibilities
to fluoroquinolones, Journal of
Antimicrobial Chemotherapy, vol. 50, 349-360
- Carter, M., Elsner, J., and Bennett, S. (2000), A quantitative
precipitation forecast experiment for Puerto Rico, Journal of Hydrology, vol. 239,
162-178
- Connor, G. and Woodcock, F. (2000), The application of synoptic
stratification to precipitation forecasting in the trade wind regime,
Weather and Forecasting,
vol. 15, 276-297
- Elsner, J., Lehmiller, G., and Kimberlain, T. (1996), Objective
classification of Atlantic hurricanes,
Journal of Climate, vol. 9, 2880-2889
- Banerjee, M., Mitra, S., and Pal, S.K. (1998), Rough fuzzy MLP:
Knowledge encoding and classification, IEEE
Transactions on Neural Networks, vol. 9, 1203-1216
- Carter, M. and Elsner, J. (1997), A statistical method for
forecasting rainfall over Puerto Rico,
Weather and Forecasting, vol. 12, 515-525
- Wolberg, W.H., Tanner, M.A., and Loh, W.-Y. (1988), Diagnostic
schemes for fine needle aspirates of breast masses, Analytical and Quantitative Cytology and
Histology, vol. 10, 225-228
- Wolberg, W.H., Tanner, M.A., Loh, W.-Y., and Vanichsetakul,
N. (1987), Statistical approach to fine needle aspiration diagnosis of
breast masses, Acta Cytologica,
vol. 31, 737-741
- Wolberg, W.H., Tanner, M.A., and Loh, W.-Y. (1989), Fine needle
aspiration for breast mass diagnosis,
Archives of Surgery, vol. 124, 814-818
Related algorithms with unbiased splits:
CRUISE: Classification trees with more than two splits per node
GUIDE: Piecewise-linear least-squares, quantile, and Poisson regression trees
Application papers that use CRUISE, GUIDE, LOTUS, or QUEST: See file
License:
QUEST is free software. You may use the Program without
restriction. You may copy and distribute the Program in executable
form provided that you conspicuously and appropriately publish on each
copy an appropriate copyright notice and disclaimer of warranty; and
give any other recipients of the Program a copy of this license along
with the Program.
Disclaimer of Warranty:
The copyright holder provides the Program "as is" without warranty of
any kind, either expressed or implied, including, but not limited to,
the implied warranties of merchantability and fitness for a
particular purpose. The entire risk as to the quality and
performance of the Program is with you. Should the Program prove
defective, you assume the cost of all necessary servicing, repair or
correction. In no event will the copyright holder be liable to you
for damages, including any general, special, incidental or
consequential damages arising out of the use or inability to use the
Program (including but not limited to loss of data or data being
rendered inaccurate or losses sustained by you or third parties or a
failure of the Program to operate with any other programs), even if
such holder has been advised of the possibility of such damages.
Last modified: August 4, 2008 by Wei-Yin Loh