QUEST Classification Tree (version 1.9.2)

© Yu-Shan Shih 1997-2005

QUEST is a binary-split decision tree algorithm for classification and data mining developed by Wei-Yin Loh (University of Wisconsin-Madison) and Yu-Shan Shih (National Chung Cheng University, Taiwan). QUEST stands for Quick, Unbiased and Efficient Statistical Tree.

The objective of QUEST is similar to that of the CART(TM) algorithm described in the book, Classification and Regression Trees, by Breiman, Friedman, Olshen and Stone (1984). [CART is a registered trademark of California Statistical Software, Inc.] The major differences are:

If there are no missing values in the data, QUEST can optionally use the CART algorithm to produce a tree with univariate splits.
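To illustrate what a univariate split is, here is a minimal Python sketch of a CART-style exhaustive search for the best split of the form "x[j] <= c", scored by weighted Gini impurity. It is not the QUEST program's code and does not show QUEST's own split selection method. The function names `gini` and `best_univariate_split` are hypothetical, and the sketch assumes numeric predictors, a non-empty dataset, and no missing values.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_univariate_split(rows, labels):
    """Search every predictor j and threshold c for the split x[j] <= c
    with the lowest weighted Gini impurity.

    Returns (feature_index, threshold, weighted_impurity),
    or None if no predictor has two distinct values.
    """
    n = len(rows)
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(row[j] for row in rows))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            c = (lo + hi) / 2.0
            left = [y for row, y in zip(rows, labels) if row[j] <= c]
            right = [y for row, y in zip(rows, labels) if row[j] > c]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            # Strict comparison keeps the first split found among ties.
            if best is None or score < best[2]:
                best = (j, c, score)
    return best


# Toy data: two numeric predictors, two classes.
rows = [[1.0, 5.0], [2.0, 4.0], [3.0, 7.0], [4.0, 6.0]]
labels = ["a", "a", "b", "b"]
feature, threshold, impurity = best_univariate_split(rows, labels)
```

On the toy data, both predictors separate the classes perfectly, and the strict comparison keeps the first one found (predictor 0 at threshold 2.5). An exhaustive search of this kind examines every predictor and every candidate threshold at each node.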

For a comparison of features between QUEST and other algorithms, see table.

Documentation:

  1. Loh, W.-Y. and Shih, Y.-S. (1997), Split selection methods for classification trees, Statistica Sinica, vol. 7, 815-840. [This is the definitive reference for QUEST.]
  2. Lim, T.-S., Loh, W.-Y., and Shih, Y.-S. (2000), A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms, Machine Learning, vol. 40, 203-228. [This paper compares the performance of version 1.7 of QUEST against other methods.] A separate appendix contains more detailed results. The datasets used in the study are in the gzipped tar archive (5.8 MB).
  3. QUEST User Manual in pdf format. The manual uses the example data and description files hepdat.txt and hepdsc.txt for illustration.
  4. Loh, W.-Y. and Vanichsetakul, N. (1988), Tree-structured classification via generalized discriminant analysis (with discussion), Journal of the American Statistical Association, vol. 83, 715-728. [This paper documents an older algorithm called FACT.]
  5. Shih, Y.-S. (1999), Families of splitting criteria for classification trees, Statistics and Computing, vol. 9, 309-315. [This paper documents the enlarged class of splitting criteria in version 1.8 of QUEST.] Downloadable from Shih's page.

Compiled binaries: The following files may be freely distributed but not sold for profit.

  • Intel and compatibles (Windows 9x/NT/2000) in pkzip format --- download (download pkunzip.exe)
  • Intel and compatibles (Linux 2.0) in gzip format --- download
  • Sun SPARCstation/Ultra (Sun Solaris OS 5) in gzip format --- download
  • Revision history: See the file history.txt

Commercial implementations of earlier versions of the algorithm for the Windows platform are available from SPSS (AnswerTree) and StatSoft (STATISTICA).

Tree diagrams: The QUEST program can optionally produce LaTeX (MiKTeX) or allCLEAR source code for the tree diagrams. The LaTeX code, which requires the PSTricks package, can be compiled to PDF or PostScript files (the latter can be viewed and printed using Ghostscript and GSView).

Some application papers that use QUEST and FACT (its precursor):

    1. Bertelli, D., Plessi, M., Sabatini, A.G., Lollo, M. and Grillenzoni, F. (2007), Classification of Italian honeys by mid-infrared diffuse reflectance spectroscopy (DRIFTS), Food Chemistry, vol. 101, 1582-1587
    2. Hoque, M.O., Feng, Q.H., Toure, P., Dem, A., Critchlow, C.W., Hawes, S.E., Wood, T., Jeronimo, C., Rosenbaum, E., Stern, J., Yu, M.J., Trink, B., Kiviat, N.B., and Sidransky, D. (2006), Detection of aberrant methylation of four genes in plasma DNA for the detection of breast cancer, Journal of Clinical Oncology, vol. 24, 4262-4269
    3. Sullivan, M.S., Jones, M.J., Lee, D.C., Marsden, S.J., Fielding, A.H., and Young, E.V. (2006), A comparison of predictive methods in extinction risk studies: contrasts and decision trees, Biodiversity and Conservation, vol. 15, 1977-1991
    4. Stahl, K. (2005), Influence of hydroclimatology and socioeconomic conditions on water-related international relations, Water International, vol. 30, 270-282
    5. Kannebley, S., Porto, G.S., and Pazello, E.T. (2005), Characteristics of Brazilian innovative firms: An empirical analysis based on PINTEC - industrial research on technological innovation, Research Policy, vol. 34, 872-893
    6. Balaras, C.A., Droutsa, K., Dascalaki, E., and Kontoyiannidis, S. (2005), Deterioration of European apartment buildings, Energy and Buildings, vol. 37, 515-527
    7. Pal, M. and Mather, P.M. (2003), An assessment of the effectiveness of decision tree methods for land cover classification, Remote Sensing of Environment, vol. 86, 554-565
    8. Kedia, S. and Williams, C. (2003), Predictors of substance abuse treatment outcomes in Tennessee, Journal of Drug Education, vol. 33, 25-47
    9. Okura, Y., Matsumura, Y., Harauchi, H., Sukenobu, Y., Kou, H., Kohyama, S., Yasuda, N., Yamamoto, Y., and Inamura, K. (2002), An inductive method for automatic generation of referring physician prefetch rules for PACS, Journal of Digital Imaging, vol. 14, 226-231
    10. Olden, J.D. and Jackson, D.A. (2002), A comparison of statistical approaches for modelling fish species distributions, Freshwater Biology, vol. 47, 1976-1995
    11. Pryse-Phillips, W., Aube, M., Gawel, M., Nelson, R., Purdy, A., and Wilson, K. (2002), A headache diagnosis project, Headache: The Journal of Head and Face Pain, vol. 42, 728-737
    12. Croisier, D., Chavanet, P., Lequeu, C., Ahanou, A., Nierlich, A., Neuwirth, C., Piroth, L., Duong, M., Buisson, M., and Portier, H. (2002), Efficacy and pharmacodynamics of simulated human-like treatment with levofloxacin on experimental pneumonia induced with penicillin-resistant pneumococci with various susceptibilities to fluoroquinolones, Journal of Antimicrobial Chemotherapy, vol. 50, 349-360
    13. Carter, M., Elsner, J., and Bennett, S. (2000), A quantitative precipitation forecast experiment for Puerto Rico, Journal of Hydrology, vol. 239, 162-178
    14. Connor, G. and Woodcock, F. (2000), The application of synoptic stratification to precipitation forecasting in the trade wind regime, Weather and Forecasting, vol. 15, 276-297
    15. Elsner, J., Lehmiller, G., and Kimberlain, T. (1996), Objective classification of Atlantic hurricanes, Journal of Climate, vol. 9, 2880-2889
    16. Banerjee, M., Mitra, S., and Pal, S.K. (1998), Rough fuzzy MLP: Knowledge encoding and classification, IEEE Transactions on Neural Networks, vol. 9, 1203-1216
    17. Carter, M. and Elsner, J. (1997), A statistical method for forecasting rainfall over Puerto Rico, Weather and Forecasting, vol. 12, 515-525
    18. Wolberg, W.H., Tanner, M.A., and Loh, W.-Y. (1988), Diagnostic schemes for fine needle aspirates of breast masses, Analytical and Quantitative Cytology and Histology, vol. 10, 225-228
    19. Wolberg, W.H., Tanner, M.A., Loh, W.-Y., and Vanichsetakul, N. (1987), Statistical approach to fine needle aspiration diagnosis of breast masses, Acta Cytologica, vol. 31, 737-741
    20. Wolberg, W.H., Tanner, M.A., and Loh, W.-Y. (1989), Fine needle aspiration for breast mass diagnosis, Archives of Surgery, vol. 124, 814-818

Related algorithms with unbiased splits:

  • CRUISE: Classification trees with more than two splits per node
  • GUIDE: Piecewise-linear least-squares, quantile, and Poisson regression trees
  • Application papers that use CRUISE, GUIDE, LOTUS, or QUEST: See file

License:

QUEST is free software. You may use the Program without restriction. You may copy and distribute the Program in executable form provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; and give any other recipients of the Program a copy of this license along with the Program.

Disclaimer of Warranty:

The copyright holder provides the Program "as is" without warranty of any kind, either expressed or implied, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. The entire risk as to the quality and performance of the Program is with you. Should the Program prove defective, you assume the cost of all necessary servicing, repair or correction. In no event will the copyright holder be liable to you for damages, including any general, special, incidental or consequential damages arising out of the use or inability to use the Program (including but not limited to loss of data or data being rendered inaccurate or losses sustained by you or third parties or a failure of the Program to operate with any other programs), even if such holder has been advised of the possibility of such damages.

    Last modified: August 4, 2008 by Wei-Yin Loh