Handling Imbalanced Data in Churn Prediction Using RUSBoost and Feature Selection (Case Study: PT.Telekomunikasi Indonesia Regional 7)

  • Conference paper
Recent Advances on Soft Computing and Data Mining (SCDM 2016)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 549)

Abstract

Solving class imbalance problems is a challenging task in data mining and machine learning. Most classifiers are biased towards the majority class when learning from highly imbalanced data. In practice, churn prediction is a data mining application that typically exhibits class imbalance. This study investigates how to handle class imbalance in churn prediction using RUSBoost, a combination of random under-sampling and boosting, together with feature selection to improve performance. The data used are broadband internet data collected from a telecommunication company in Indonesia. The study first selects the important features using Information Gain and then builds a churn prediction model using RUSBoost with C4.5 as the weak learner. The results show that feature selection and RUSBoost improve prediction performance by 16% and reduce processing time by 48%.
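
The two building blocks named in the abstract can be illustrated with a minimal sketch: Information Gain for ranking categorical features, and the random under-sampling step that RUSBoost applies to the training data in each boosting round. This is not the authors' implementation. The feature names (`contract`, `region`), the toy records, and the choice to balance the classes 1:1 are all assumptions made for illustration, and the boosting loop and the C4.5 weak learner are omitted.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the expected entropy after splitting
    on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def random_undersample(rows, labels, rng):
    """Randomly keep as many rows per class as the smallest class has.
    The 1:1 target ratio is an assumption of this sketch."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(members) for members in by_class.values())
    sampled = []
    for label, members in by_class.items():
        for row in rng.sample(members, n_min):
            sampled.append((row, label))
    return sampled


# Hypothetical toy data: 2 churners and 8 non-churners.
labels = ["churn"] * 2 + ["stay"] * 8
contract = ["monthly"] * 4 + ["yearly"] * 6   # informative feature
region = ["east", "west"] * 5                 # uninformative feature

# Step 1: rank features by Information Gain.
features = {"contract": contract, "region": region}
gains = {name: information_gain(values, labels)
         for name, values in features.items()}
ranked = sorted(gains, key=gains.get, reverse=True)

# Step 2: balance the training data by random under-sampling.
rows = list(zip(contract, region))
balanced = random_undersample(rows, labels, random.Random(0))

print("Features ranked by information gain:", ranked)
print("Class counts after under-sampling:",
      Counter(label for _, label in balanced))
```

In a full RUSBoost run, the under-sampling step would be repeated inside every boosting iteration, with a weak learner trained on each balanced sample and the example weights updated as in AdaBoost.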

Author information

Corresponding author

Correspondence to Erna Dwiyanti.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Dwiyanti, E., Adiwijaya, Ardiyanti, A. (2017). Handling Imbalanced Data in Churn Prediction Using RUSBoost and Feature Selection (Case Study: PT.Telekomunikasi Indonesia Regional 7). In: Herawan, T., Ghazali, R., Nawi, N.M., Deris, M.M. (eds) Recent Advances on Soft Computing and Data Mining. SCDM 2016. Advances in Intelligent Systems and Computing, vol 549. Springer, Cham. https://doi.org/10.1007/978-3-319-51281-5_38

  • DOI: https://doi.org/10.1007/978-3-319-51281-5_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-51279-2

  • Online ISBN: 978-3-319-51281-5

  • eBook Packages: Engineering, Engineering (R0)
