Ensemble Classification Method for Imbalanced Data Using Deep Learning

Lee, Yoon Sang

doi:10.1007/978-3-030-22784-5_16

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 357)

Included in the following conference series:

Workshop on E-Business

Abstract

Nowadays, many kinds of devices generate abundant data, and many businesses want to pinpoint the items they care about in that data, for example for targeted marketing or fraudulent transaction detection. However, current classification algorithms in data mining perform poorly when classifying imbalanced data.

To improve classification of the minority class in imbalanced datasets, we present an ensemble learning method that combines UnderBagging, a majority vote, and a meta classifier that gives higher decision priority to the classifier that predicts a class better, using a Deep Neural Network (DNN) as the base classifier. We tested our method on two imbalanced datasets from the UCI Data Repository and compared its performance with four other techniques. The results showed that our method classified minority class instances better than the other four techniques.
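As a rough illustration of two of the ingredients named in the abstract (UnderBagging and a majority vote), the following minimal sketch may help. It is not the paper's implementation: the base learner is a nearest-centroid stand-in rather than a DNN, the meta classifier is omitted because the abstract gives no details on it, the data are synthetic, and every function name here (`under_bag`, `fit_centroid`, `predict_centroid`, `majority_vote`) is hypothetical.

```python
import random
from collections import Counter


def under_bag(majority, minority, n_bags, rng):
    """Build n_bags balanced training sets: every minority row plus an
    equally sized random sample (without replacement) of majority rows.
    This is one simple under-sampling variant, assumed for illustration."""
    k = len(minority)
    return [rng.sample(majority, k) + list(minority) for _ in range(n_bags)]


def fit_centroid(rows):
    """Stand-in base learner (NOT a DNN): store the mean feature vector
    of each class."""
    sums, counts = {}, Counter()
    for x, y in rows:
        counts[y] += 1
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_centroid(model, x):
    """Predict the class whose centroid is closest to x (squared distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda y: dist(model[y]))


def majority_vote(models, x):
    """Return the label predicted by most base models.
    Use an odd number of models to avoid ties in the two-class case."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Synthetic, heavily imbalanced two-class data (200 vs. 10 rows).
rng = random.Random(0)
majority = [((rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)), 0) for _ in range(200)]
minority = [((rng.gauss(5.0, 1.0), rng.gauss(5.0, 1.0)), 1) for _ in range(10)]

bags = under_bag(majority, minority, n_bags=5, rng=rng)
models = [fit_centroid(bag) for bag in bags]

print(majority_vote(models, (5.0, 5.0)))  # query near the minority cluster
print(majority_vote(models, (0.0, 0.0)))  # query near the majority cluster
```

Each base model sees a balanced bag, so no single model is dominated by the majority class, while different bags expose different majority samples. In the method described above, a meta classifier would additionally weight the base classifiers' decisions. That step is left out of this sketch.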

Author information

Authors and Affiliations

Columbus State, Columbus, GA, 31909, USA
Yoon Sang Lee

Authors

Yoon Sang Lee

Corresponding author

Correspondence to Yoon Sang Lee.

Editor information

Editors and Affiliations

Bentley University, Waltham, MA, USA
Jennifer J. Xu
Oregon State University, Corvallis, OR, USA
Bin Zhu
University of Utah, Salt Lake City, UT, USA
Xiao Liu
University of Illinois, Urbana-Champaign, USA
Michael J. Shaw
Georgia Institute of Technology, Atlanta, GA, USA
Han Zhang
University of Washington, Seattle, WA, USA
Ming Fan

About this paper

Cite this paper

Lee, Y.S. (2019). Ensemble Classification Method for Imbalanced Data Using Deep Learning. In: Xu, J., Zhu, B., Liu, X., Shaw, M., Zhang, H., Fan, M. (eds) The Ecosystem of e-Business: Technologies, Stakeholders, and Connections. WEB 2018. Lecture Notes in Business Information Processing, vol 357. Springer, Cham. https://doi.org/10.1007/978-3-030-22784-5_16

DOI: https://doi.org/10.1007/978-3-030-22784-5_16
Published: 27 June 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22783-8
Online ISBN: 978-3-030-22784-5
eBook Packages: Computer Science, Computer Science (R0)
