Ensemble Classification Method for Imbalanced Data Using Deep Learning

  • Conference paper
The Ecosystem of e-Business: Technologies, Stakeholders, and Connections (WEB 2018)

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 357)

Abstract

Devices of many kinds now generate abundant data, and many businesses want to pinpoint the cases they care about in that data, for applications such as targeted marketing and fraudulent-transaction detection. However, current classification algorithms in data mining perform poorly on imbalanced data.

To improve classification performance on the minority class in imbalanced datasets, we present an ensemble learning method that combines UnderBagging, a majority vote, and a meta classifier that gives higher decision priority to the classifier that predicts a class better, with a Deep Neural Network (DNN) as the base classifier. We tested our method on two imbalanced datasets from the UCI Data Repository and compared its performance with that of four other techniques. The results showed that our method classified minority-class instances better than the other four techniques.
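As an illustration only, the UnderBagging and majority-vote steps named above might be sketched as follows. This is not the paper's implementation: it assumes binary labels with the minority class labelled 1, draws each majority sample without replacement, and uses a nearest-centroid classifier as a stand-in for the DNN so that the example runs with the standard library alone. The meta classifier is not specified in the abstract and is left out. The names `under_bags`, `CentroidClassifier`, and `majority_vote` are hypothetical.

```python
import random
from collections import Counter


def under_bags(X, y, n_bags, rng, minority=1):
    """Build balanced bags: every minority row plus an equal-size
    random sample (without replacement) of majority rows."""
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    bags = []
    for _ in range(n_bags):
        idx = minority_idx + rng.sample(majority_idx, len(minority_idx))
        bags.append(([X[i] for i in idx], [y[i] for i in idx]))
    return bags


class CentroidClassifier:
    """Stand-in base learner (assumption): nearest class centroid,
    used here in place of the paper's DNN."""

    def fit(self, X, y):
        counts = Counter(y)
        sums = {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
        self.centroids = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict_one(self, row):
        def squared_distance(label):
            return sum((a - b) ** 2 for a, b in zip(row, self.centroids[label]))

        return min(self.centroids, key=squared_distance)


def majority_vote(models, row):
    """Return the label predicted by the largest number of base models."""
    votes = Counter(model.predict_one(row) for model in models)
    return votes.most_common(1)[0][0]


# Toy imbalanced data (made up for the demo): 200 majority points
# near (0, 0) and 20 minority points near (3, 3).
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(20)]
y = [0] * 200 + [1] * 20

bags = under_bags(X, y, n_bags=5, rng=rng)  # odd count avoids binary ties
models = [CentroidClassifier().fit(bag_X, bag_y) for bag_X, bag_y in bags]
print(majority_vote(models, [3.0, 3.0]))
```

Because every bag reuses all minority rows but sees a different slice of the majority class, each base model trains on balanced data while the ensemble as a whole still covers much of the majority class.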

Author information

Corresponding author

Correspondence to Yoon Sang Lee.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Lee, Y.S. (2019). Ensemble Classification Method for Imbalanced Data Using Deep Learning. In: Xu, J., Zhu, B., Liu, X., Shaw, M., Zhang, H., Fan, M. (eds) The Ecosystem of e-Business: Technologies, Stakeholders, and Connections. WEB 2018. Lecture Notes in Business Information Processing, vol 357. Springer, Cham. https://doi.org/10.1007/978-3-030-22784-5_16

  • DOI: https://doi.org/10.1007/978-3-030-22784-5_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-22783-8

  • Online ISBN: 978-3-030-22784-5

  • eBook Packages: Computer Science, Computer Science (R0)
