Interpretable Survival Gradient Boosting Models with Bagged Trees Base Learners

Jarmulski, Wojciech; Wieczorkowska, Alicja

doi:10.1007/978-3-030-48861-1_3

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11948)

Included in the following conference series:

International Workshop on New Frontiers in Mining Complex Patterns

Abstract

We present a novel approach to survival analysis modeling based on gradient boosting with bagged trees as base learners. The resulting models are additive: they consist of single-variable components and their pairwise interactions, which makes them visually interpretable. We show that our method produces competitive results, often with higher predictive power than full-complexity models. This is achieved while maintaining full interpretability of the model, which makes our method useful in medical applications.
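
The idea of an additive survival model built by boosting bagged learners can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: the loss is taken to be the Cox partial likelihood, the base learners are bagged one-split stumps rather than full trees, features are visited cyclically, and pairwise interaction terms are left out. All function names and the toy data are hypothetical.

```python
import math
import random

def cox_negative_gradient(times, events, scores):
    """Negative gradient of the Cox partial-likelihood loss per subject
    (the martingale residual), with Breslow-style handling of tied times."""
    n = len(times)
    exp_s = [math.exp(s) for s in scores]
    # For each observed event j: (t_j, 1 / sum of exp(score) over subjects at risk at t_j).
    inv_risk = [
        (times[j], 1.0 / sum(exp_s[k] for k in range(n) if times[k] >= times[j]))
        for j in range(n) if events[j]
    ]
    return [
        events[i] - exp_s[i] * sum(w for t, w in inv_risk if t <= times[i])
        for i in range(n)
    ]

def fit_stump(x, r):
    """Least-squares stump on one feature: (threshold, left value, right value)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    total = sum(r)
    best_gain, best = -1.0, (math.inf, total / n, total / n)  # fallback: constant fit
    left_sum = 0.0
    for pos in range(1, n):
        left_sum += r[order[pos - 1]]
        lo, hi = x[order[pos - 1]], x[order[pos]]
        if lo == hi:
            continue  # no threshold fits between equal values
        right_sum = total - left_sum
        # Maximizing this quantity is equivalent to minimizing squared error.
        gain = left_sum ** 2 / pos + right_sum ** 2 / (n - pos)
        if gain > best_gain:
            best_gain = gain
            best = ((lo + hi) / 2, left_sum / pos, right_sum / (n - pos))
    return best

def fit_bagged_stumps(x, r, n_bags, rng):
    """Bagging: one stump per bootstrap resample of the (feature, residual) pairs."""
    n = len(x)
    stumps = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([x[i] for i in idx], [r[i] for i in idx]))
    return stumps

def predict_bag(stumps, v):
    """Average prediction of the bagged stumps at feature value v."""
    return sum(left if v <= thr else right for thr, left, right in stumps) / len(stumps)

def fit_additive_survival_model(X, times, events, n_rounds=10, n_bags=10, lr=0.1, seed=0):
    """Cyclic gradient boosting in which every base learner sees a single feature."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    scores = [0.0] * n
    components = [[] for _ in range(p)]  # components[j]: bagged learners for feature j
    for _ in range(n_rounds):
        for j in range(p):
            residuals = cox_negative_gradient(times, events, scores)
            column = [row[j] for row in X]
            bag = fit_bagged_stumps(column, residuals, n_bags, rng)
            components[j].append(bag)
            scores = [s + lr * predict_bag(bag, v) for s, v in zip(scores, column)]
    return {"components": components, "lr": lr}

def shape(model, j, v):
    """Contribution f_j(v) of feature j to the log-risk score."""
    return model["lr"] * sum(predict_bag(bag, v) for bag in model["components"][j])

def predict(model, row):
    """Log-risk score of one subject: the sum of its per-feature contributions."""
    return sum(shape(model, j, v) for j, v in enumerate(row))

# Toy data (made up): larger x0 means an earlier event; x1 is unrelated to the outcome.
n = 30
X = [[i / n, ((7 * i) % n) / n] for i in range(n)]
times = [float(n - i) for i in range(n)]
events = [1] * n
model = fit_additive_survival_model(X, times, events)
print([round(shape(model, 0, v), 3) for v in (0.0, 0.5, 1.0)])
```

Because each base learner depends on one variable only, the fitted score decomposes into one curve per feature, and plotting `shape(model, j, v)` over a grid of `v` gives the kind of visual inspection the abstract refers to. A pairwise term would follow the same pattern with a learner restricted to two features.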

Author information

Authors and Affiliations

Polish-Japanese Academy of Information Technology, Koszykowa 86, 02-008, Warsaw, Poland
Wojciech Jarmulski & Alicja Wieczorkowska

Corresponding author

Correspondence to Wojciech Jarmulski.

Editor information

Editors and Affiliations

University of Bari Aldo Moro, Bari, Italy
Michelangelo Ceci
University of Bari Aldo Moro, Bari, Italy
Corrado Loglisci
CNR-ICAR, Rende, Italy
Giuseppe Manco
Federico II University, Naples, Italy
Elio Masciari
University of North Carolina, Charlotte, NC, USA
Zbigniew Ras

About this paper

Cite this paper

Jarmulski, W., Wieczorkowska, A. (2020). Interpretable Survival Gradient Boosting Models with Bagged Trees Base Learners. In: Ceci, M., Loglisci, C., Manco, G., Masciari, E., Ras, Z. (eds) New Frontiers in Mining Complex Patterns. NFMCP 2019. Lecture Notes in Computer Science, vol 11948. Springer, Cham. https://doi.org/10.1007/978-3-030-48861-1_3

DOI: https://doi.org/10.1007/978-3-030-48861-1_3
Published: 14 May 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-48860-4
Online ISBN: 978-3-030-48861-1
eBook Packages: Computer Science, Computer Science (R0)
