
Application of Bayesian Automated Hyperparameter Tuning on Classifiers Predicting Customer Retention in Banking Industry

  • Conference paper
Data Management, Analytics and Innovation

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1175)

Abstract

This paper compares the accuracy achieved by nine fundamental Machine Learning (ML) classifiers on the task of predicting which customers a bank is likely to retain. Bayesian Automated Hyperparameter Tuning (AHT) with a Tree-structured Parzen Estimator (TPE) was performed on all nine classifiers. After visualizing the dataset and its constraints of class imbalance and limited training examples, Feature Engineering was performed to compensate for these constraints. The methodology proceeds in stages. First, six classifiers (K-Nearest Neighbors, Naive Bayes, Decision Tree, Random Forest, SVM, and an Artificial Neural Network, ANN) were applied individually to the dataset with their default hyperparameters, both with and without Feature Engineering. Second, three boosting classifiers (AdaBoost, XGBoost, and GradientBoost) were applied, again with default hyperparameters. Third, Bayesian AHT with TPE was performed on each of the six classifiers to find the hyperparameters giving the best results on the training data, and the same tuning was then applied to the three boosting classifiers as well. The cross-validation mean training accuracy achieved compares favorably with results reported for this dataset on Kaggle and in other research papers. To the authors' knowledge, such an extensive comparison of nine classifiers after Bayesian AHT on a banking-industry dataset has not been made before.
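To make the described procedure concrete, the following is a minimal sketch of Bayesian AHT with a Tree-structured Parzen Estimator as the abstract outlines it. It is not the authors' code: it assumes the open-source hyperopt library (a standard TPE implementation) and scikit-learn, and a synthetic imbalanced dataset stands in for the bank-churn data.

```python
# A minimal sketch of TPE-based Bayesian hyperparameter tuning, not the
# authors' code. Assumes the hyperopt library and scikit-learn; the
# synthetic, imbalanced dataset below is a stand-in for the bank-churn data.
from hyperopt import STATUS_OK, Trials, fmin, hp, tpe
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Illustrative imbalanced binary-classification data (roughly 80/20 classes).
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.8, 0.2], random_state=0)

# Example search space for one of the nine classifiers (Random Forest);
# each classifier would get its own space.
space = {
    "n_estimators": hp.choice("n_estimators", [100, 200, 400]),
    "max_depth": hp.quniform("max_depth", 3, 12, 1),
    "min_samples_split": hp.uniform("min_samples_split", 0.01, 0.2),
}

def objective(params):
    clf = RandomForestClassifier(
        n_estimators=params["n_estimators"],
        max_depth=int(params["max_depth"]),
        min_samples_split=params["min_samples_split"],
        random_state=0,
    )
    # Cross-validation mean training accuracy, the metric the paper reports.
    acc = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
    # TPE minimizes the objective, so return the negated accuracy.
    return {"loss": -acc, "status": STATUS_OK}

trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=50, trials=trials)
print("Best hyperparameters found:", best)
```

The same objective wrapper can be reused for each of the other classifiers (KNN, SVM, XGBoost, and so on) by swapping in the estimator and its search space, which is the shape of the nine-way comparison the abstract describes.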



Author information


Corresponding author

Correspondence to Akash Sampurnanand Pandey.



Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Pandey, A.S., Shukla, K.K. (2021). Application of Bayesian Automated Hyperparameter Tuning on Classifiers Predicting Customer Retention in Banking Industry. In: Sharma, N., Chakrabarti, A., Balas, V.E., Martinovic, J. (eds) Data Management, Analytics and Innovation. Advances in Intelligent Systems and Computing, vol 1175. Springer, Singapore. https://doi.org/10.1007/978-981-15-5619-7_7

