Lightweight surrogate random forest support for model simplification and feature relevance

Published in Applied Intelligence

Abstract

In this study, we propose a lightweight surrogate random forest (L-SRF) algorithm that can be interpreted through a new rule distillation method. Conventional surrogate models replace a heavy, deep, but high-performance black-box model using a teacher–student learning framework. However, because the student model obtained in this way must preserve the performance of the teacher model, the achievable degree of model simplification and transparency is extremely limited. Therefore, to increase model transparency while maintaining the performance of the surrogate model, we propose two methods. First, we propose a cross-entropy Shapley value to evaluate the contribution of each rule in the student surrogate model. Second, a random mini-grouping method is devised to effectively distill less important rules while minimizing the overfitting caused by model simplification. The proposed rule-contribution-based L-SRF improves the degree of simplification and transparency of the model by achieving a large distillation ratio relative to the initial SRF model. In addition, because the proposed L-SRF removes only unnecessary rules, it minimizes the loss of the importance and relevance of each feature. To demonstrate the performance of the proposed L-SRF method, several comparative experiments were conducted on various data sets. We show experimentally that the proposed method outperforms black-box AI models in terms of model transparency and memory requirements, as well as in the interpretation of feature relevance.
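The two ideas in the abstract — scoring each rule's contribution and pruning low-contribution rules in random mini-groups — can be illustrated with a toy sketch. Note this is not the paper's implementation: the rule representation, the data, and the leave-one-out cross-entropy score (used here as a crude stand-in for the cross-entropy Shapley value) are all illustrative assumptions.

```python
import math
import random

# A "rule" here is a pair (predicate, predicted probability of class 1).
# This representation and the toy data are hypothetical, for illustration only.

def predict(rules, x, default=0.5):
    # Average the predictions of all rules whose predicate fires on x.
    probs = [p for cond, p in rules if cond(x)]
    return sum(probs) / len(probs) if probs else default

def cross_entropy(rules, data):
    # Mean binary cross-entropy of the rule ensemble on labeled data.
    eps = 1e-9
    total = 0.0
    for x, y in data:
        p = min(max(predict(rules, x), eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(data)

def rule_contributions(rules, data):
    # Leave-one-out proxy for a Shapley-style contribution: how much the
    # ensemble's cross-entropy rises when a single rule is removed.
    base = cross_entropy(rules, data)
    return [cross_entropy(rules[:i] + rules[i + 1:], data) - base
            for i in range(len(rules))]

def prune_mini_groups(rules, data, group_size=2, keep_ratio=0.5, seed=0):
    # Random mini-grouping: shuffle rule indices into small groups and keep
    # only the highest-contribution rules within each group.
    rng = random.Random(seed)
    order = list(range(len(rules)))
    rng.shuffle(order)
    contrib = rule_contributions(rules, data)
    kept = []
    for g in range(0, len(order), group_size):
        group = sorted(order[g:g + group_size],
                       key=lambda i: contrib[i], reverse=True)
        kept.extend(group[:max(1, int(len(group) * keep_ratio))])
    return [rules[i] for i in sorted(kept)]

# Toy data: x is a single float, label is 1 when x > 0.
data = [(x / 10.0, int(x > 0)) for x in range(-5, 6)]
rules = [
    (lambda x: x > 0.2, 0.9),
    (lambda x: x <= 0.2, 0.1),
    (lambda x: True, 0.5),       # uninformative rule: adds little
    (lambda x: x > -0.1, 0.6),
]
pruned = prune_mini_groups(rules, data)
print(len(pruned), "of", len(rules), "rules kept")
```

Scoring within small random groups rather than globally is what lets the pruning run many rounds without repeatedly ranking the full rule set against itself.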


Notes

  1. A shorter version of this paper was presented at the NeurIPS 2020 Workshop.


Acknowledgements

This research was supported by the Bisa Research Grant of Keimyung University in 2021.

Author information

Corresponding author

Correspondence to Byoung Chul Ko.


About this article

Cite this article

Kim, S., Jeong, M. & Ko, B.C. Lightweight surrogate random forest support for model simplification and feature relevance. Appl Intell 52, 471–481 (2022). https://doi.org/10.1007/s10489-021-02451-x
