Tree-Based Methods and Their Applications

  • Reference work entry
Springer Handbook of Engineering Statistics

Part of the book series: Springer Handbooks (SHB)

Abstract

The first part of this chapter introduces the basic structure of tree-based methods using two examples. First, a classification tree is presented that uses e-mail text characteristics to identify spam. The second example uses a regression tree to estimate structural costs for seismic rehabilitation of various types of buildings. Our main focus in this section is the interpretive value of the resulting models.
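
As a concrete illustration of the structure such a tree takes, the following minimal sketch fits a small classification tree to synthetic "spam-like" data. scikit-learn, the feature names, and the label-generating rule are assumptions of this sketch, not the chapter's actual data or software.

```python
# Minimal sketch of a spam-style classification tree (illustrative only).
# scikit-learn and the synthetic features below are assumptions of this
# sketch, not the data or software used in the chapter.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.random((500, 3))  # hypothetical e-mail text characteristics
# Hypothetical rule: messages heavy in '$' signs and the word "free" are spam.
y = ((X[:, 0] > 0.6) & (X[:, 1] > 0.5)).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
# Print the fitted tree as nested split conditions.
print(export_text(tree, feature_names=["freq_dollar", "freq_free", "pct_caps"]))
```

The printed output is exactly the kind of readily interpretable structure this section emphasizes: a short cascade of threshold tests ending in class labels.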

This brief introduction is followed by a more detailed look at how these tree models are constructed. In the second section, we describe the algorithm employed by classification and regression tree (CART), a popular commercial software program for constructing trees for both classification and regression problems. In each case, we outline the processes of growing and pruning trees and discuss available options. The section concludes with a discussion of practical issues, including estimating a tree's predictive ability, handling missing data, assessing variable importance, and considering the effects of changes to the learning sample.
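
CART's grow-then-prune strategy is usually formalized as cost-complexity pruning: grow a deliberately large tree, then choose among a nested sequence of pruned subtrees by penalizing size with a complexity parameter selected via cross-validation or a test sample. A hedged sketch of that idea follows, using scikit-learn's cost-complexity pruning as an assumed stand-in for the CART program itself.

```python
# Sketch of CART-style growing and cost-complexity pruning (scikit-learn is
# an assumed stand-in for the commercial CART program described here).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# Grow a full tree and enumerate the nested sequence of pruned subtrees,
# indexed by the complexity parameter alpha.
path = DecisionTreeClassifier(random_state=1).cost_complexity_pruning_path(X_tr, y_tr)

# Pick alpha by cross-validation on the learning sample.
cv_scores = [cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=1),
                             X_tr, y_tr, cv=5).mean()
             for a in path.ccp_alphas]
best_alpha = path.ccp_alphas[int(np.argmax(cv_scores))]

pruned = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=1).fit(X_tr, y_tr)
print(f"alpha = {best_alpha:.5f}, test accuracy = {pruned.score(X_te, y_te):.3f}")
```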

The third section presents several alternatives to the algorithms used by CART. We begin with a look at one class of algorithms – including QUEST, CRUISE, and GUIDE – which is designed to reduce potential bias toward variables with large numbers of available splitting values. Next, we explore C4.5, another program popular in the artificial-intelligence and machine-learning communities. C4.5 offers the added functionality of converting any tree to a series of decision rules, providing an alternative means of viewing and interpreting its results. Finally, we discuss chi-square automatic interaction detection (CHAID), an early classification-tree construction algorithm used with categorical predictors. The section concludes with a brief comparison of the characteristics of CART and each of these alternative algorithms.
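
The rule view offered by C4.5 is easy to mimic with any fitted tree, since each root-to-leaf path is a conjunction of split conditions. The sketch below walks a scikit-learn tree, an assumed stand-in (C4.5 itself is a separate program), and prints one IF-THEN rule per leaf:

```python
# Sketch: converting a fitted tree into IF-THEN decision rules, in the
# spirit of C4.5's rule view (scikit-learn is an assumed stand-in).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)
t = clf.tree_

def print_rules(node=0, conds=()):
    if t.children_left[node] == -1:  # leaf node: emit one rule
        label = data.target_names[t.value[node].argmax()]
        print("IF", " AND ".join(conds) if conds else "TRUE", "THEN class =", label)
        return
    name = data.feature_names[t.feature[node]]
    thr = t.threshold[node]
    print_rules(t.children_left[node], conds + (f"{name} <= {thr:.2f}",))
    print_rules(t.children_right[node], conds + (f"{name} > {thr:.2f}",))

print_rules()
```

Note that C4.5 additionally simplifies and reorders the extracted rules; this sketch performs only the path-to-rule conversion.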

In the fourth section, we discuss the use of ensemble methods for improving predictive ability. Ensemble methods generate collections of trees using different subsets of the training data. Final predictions are obtained by aggregating over the predictions of individual members of these collections. The first ensemble method we consider is boosting, a recursive method of generating small trees, each of which specializes in predicting the cases on which its predecessors perform poorly. Next, we explore the use of random forests, which generate collections of trees based on bootstrap sampling procedures. We also comment on the tradeoff between the predictive power of ensemble methods and the interpretive value of their single-tree counterparts.
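
Both ensemble ideas reduce to a few lines in most libraries. The sketch below cross-validates a single tree against a boosted ensemble of small trees and a random forest; scikit-learn's AdaBoost and random-forest implementations are assumed stand-ins for the methods discussed in this section.

```python
# Sketch: single tree vs. boosting vs. random forest (scikit-learn models
# are assumed stand-ins for the ensemble methods discussed here).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=2)

models = {
    "single tree": DecisionTreeClassifier(random_state=2),
    # Boosting: a sequence of small trees, each reweighted toward the cases
    # its predecessors misclassified (AdaBoost's default base learner is a
    # depth-1 tree, i.e. a stump).
    "boosted stumps": AdaBoostClassifier(n_estimators=200, random_state=2),
    # Random forest: many trees grown on bootstrap samples with random
    # feature subsets, aggregated by majority vote.
    "random forest": RandomForestClassifier(n_estimators=200, random_state=2),
}
for name, model in models.items():
    print(f"{name}: {cross_val_score(model, X, y, cv=5).mean():.3f}")
```

On most datasets the two ensembles beat the single tree on accuracy, which illustrates the tradeoff noted above: the aggregated predictor is no longer a single readable tree.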

The chapter concludes with a discussion of tree-based methods in the broader context of supervised learning techniques. In particular, we compare classification and regression trees to multivariate adaptive regression splines, neural networks, and support vector machines.
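
A rough empirical analogue of that comparison is to cross-validate a tree against the competing learners on one dataset. The models and data below are illustrative assumptions (MARS is omitted, as it has no implementation in scikit-learn):

```python
# Sketch: tree vs. SVM vs. small neural network on a common dataset
# (an illustrative assumption; the chapter's comparison is conceptual).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
models = {
    "classification tree": DecisionTreeClassifier(random_state=3),
    "support vector machine": make_pipeline(StandardScaler(), SVC()),
    "neural network": make_pipeline(StandardScaler(),
                                    MLPClassifier(max_iter=2000, random_state=3)),
}
for name, model in models.items():
    print(f"{name}: {cross_val_score(model, X, y, cv=5).mean():.3f}")
```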


Abbreviations

CART: classification and regression tree

CRUISE: classification rule with unbiased interaction selection and estimation

CVP: critical value pruning

EBP: error-based pruning

GUIDE: generalized, unbiased interaction detection and estimation

LDA: linear discriminant analysis

MARS: multivariate adaptive regression splines

MART: multiple additive regression tree

MEP: minimum error pruning

MSE: mean squared error

PEP: pessimistic error pruning

QUEST: quick, unbiased and efficient statistical tree

REP: reduced error pruning

RF: random forest

SVM: support vector machine

iid: independent and identically distributed


Author information

Correspondence to Nan Lin, Douglas Noe or Xuming He.

Copyright information

© 2006 Springer-Verlag

About this entry

Cite this entry

Lin, N., Noe, D., He, X. (2006). Tree-Based Methods and Their Applications. In: Pham, H. (ed.) Springer Handbook of Engineering Statistics. Springer Handbooks. Springer, London. https://doi.org/10.1007/978-1-84628-288-1_30

  • DOI: https://doi.org/10.1007/978-1-84628-288-1_30

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-85233-806-0

  • Online ISBN: 978-1-84628-288-1

  • eBook Packages: Engineering (R0)
