Handling Imbalanced Data in Churn Prediction Using RUSBoost and Feature Selection (Case Study: PT.Telekomunikasi Indonesia Regional 7)

Dwiyanti, Erna; Adiwijaya; Ardiyanti, Arie

doi:10.1007/978-3-319-51281-5_38

Erna Dwiyanti¹⁸,
Adiwijaya¹⁸ &
Arie Ardiyanti¹⁸

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 549))

Included in the following conference series:

International Conference on Soft Computing and Data Mining

1275 Accesses
5 Citations

Abstract

Solving imbalance problems is a challenging tasks in data mining and machine learning. Most classifiers are biased towards the majority class examples when learning from highly imbalanced data. In practice, churn prediction is considered as one of data mining application that reflects imbalance problems. This study investigates how to handle class imbalance in churn prediction using RUSBoost, a combination of random under-sampling and boosting algorithm, which is combined with feature selection for better performance result. The datasets used are broadband internet data collected from a telecommunication industry in Indonesia. The study firstly select the important features using Information Gain, and then building churn prediction model using RUSBoost with C4.5 as the weak learner. The result shows that feature selection and RUSBoost improve 16% of the performance of prediction and reduce 48% of the processing time.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Institutional subscriptions

References

Chawla, N.V., Japkowicz, N., Kotcz, A.: Editorial: special issue on learning from imbalanced data sets. ACM SIGKDD Explor. Newsl. 6(1), 1–6 (2004)
Article Google Scholar
Weiss, G.M.: Mining with rarity: a unifying framework. ACM SIGKDD Explor. Newsl. 6(1), 7–19 (2004)
Article Google Scholar
Berson, A., Smith, S.J., Thearling, K.: Building Data Mining Applications for CRM. McGraw-Hill Osborne (2000)
Google Scholar
Kotsiantis, S., Kanellopoulos, D., Pintelas, P., et al.: Handling imbalanced datasets: a review. GESTS Int. Trans. Comput. Sci. Eng. 30(1), 25–36 (2006)
Google Scholar
Tang, J., Alelyani, S., Liu, H.: Feature selection for classification: a review. In: Data Classification: Algorithms and Applications, p. 37 (2014)
Google Scholar
Jamali, I., Bazmara, M., Jafari, S.: Feature Selection in Imbalance data sets. Int. J. Comput. Sci. Issues 9(3), 42–45 (2012)
Google Scholar
Batista, G.E., Prati, R.C., Monard, M.C.: A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explor. Newsl. 6(1), 20–29 (2004)
Article Google Scholar
Drummond, C., Holte, R.C., et al.: C4. 5, class imbalance, and cost sensitivity: why under-sampling beats over-sampling. In: Workshop on Learning from Imbalanced Datasets II, vol. 11 (2003)
Google Scholar
Seiffert, C., Khoshgoftaar, T.M., Van Hulse, J., Napolitano, A.: RUSBoost: A hybrid approach to alleviating class imbalance. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 40(1), 185–197 (2010)
Article Google Scholar
Chawla, N.V., Lazarevic, A., Hall, L.O., Bowyer, K.W.: SMOTEBoost: improving prediction of the minority class in boosting. In: Lavrač, N., Gamberger, D., Todorovski, L., Blockeel, H. (eds.) PKDD 2003. LNCS (LNAI), vol. 2838, pp. 107–119. Springer, Heidelberg (2003). doi:10.1007/978-3-540-39804-2_12
Chapter Google Scholar
Effendy, V., Adiwijaya, Z., Baizal, A.: Handling imbalanced data in customer churn prediction using combined sampling and weighted random forest. In: 2014 2nd International Conference on Information and Communication Technology (ICoICT), pp. 325–330 (2014)
Google Scholar

Download references

Author information

Authors and Affiliations

School of Computing, Telkom University, Bandung, Indonesia
Erna Dwiyanti, Adiwijaya & Arie Ardiyanti

Authors

Erna Dwiyanti
View author publications
You can also search for this author in PubMed Google Scholar
Adiwijaya
View author publications
You can also search for this author in PubMed Google Scholar
Arie Ardiyanti
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Erna Dwiyanti .

Editor information

Editors and Affiliations

Department of Information System, University of Malaya, Kuala Lumpur, Malaysia
Tutut Herawan
Universiti Tun Hussein Onn Malaysia, Batu Pahat, Malaysia
Rozaida Ghazali
Universiti Tun Hussein Onn Malaysia, Batu Pahat, Malaysia
Nazri Mohd Nawi
Universiti Tun Hussein Onn Malaysia, Batu Pahat, Malaysia
Mustafa Mat Deris

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Dwiyanti, E., Adiwijaya, Ardiyanti, A. (2017). Handling Imbalanced Data in Churn Prediction Using RUSBoost and Feature Selection (Case Study: PT.Telekomunikasi Indonesia Regional 7). In: Herawan, T., Ghazali, R., Nawi, N.M., Deris, M.M. (eds) Recent Advances on Soft Computing and Data Mining. SCDM 2016. Advances in Intelligent Systems and Computing, vol 549. Springer, Cham. https://doi.org/10.1007/978-3-319-51281-5_38

Download citation

DOI: https://doi.org/10.1007/978-3-319-51281-5_38
Published: 29 December 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-51279-2
Online ISBN: 978-3-319-51281-5
eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics