
LIUBoost: Locality Informed Under-Boosting for Imbalanced Data Classification

  • Conference paper in Emerging Technologies in Data Mining and Information Security

Abstract

Class imbalance, together with class overlap, has become a major issue in supervised learning. Most classification algorithms assume equal class cardinality while optimising their cost function; this assumption does not hold for imbalanced datasets and leads to suboptimal classification. Various approaches have therefore been proposed, including undersampling, oversampling, cost-sensitive learning and ensemble-based methods. However, undersampling suffers from information loss, oversampling from increased runtime and potential overfitting, and cost-sensitive methods from inadequately defined cost-assignment schemes. In this paper, we propose a novel boosting-based method called Locality Informed Under-Boosting (LIUBoost). Like Random Undersampling with Boosting (RUSBoost), LIUBoost balances the dataset in every boosting iteration through undersampling; in addition, it incorporates into the weight-update formula a cost term for every instance based on its hardness, which mitigates the information loss introduced by undersampling. LIUBoost has been extensively evaluated on 18 imbalanced datasets, and the results indicate significant improvement over the existing best-performing method, RUSBoost.
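The abstract's main loop can be sketched as follows. This is a hypothetical illustration based only on the abstract, not the paper's actual algorithm: the `locality_cost` hardness measure (fraction of opposite-class nearest neighbours), the AdaBoost-style `alpha`, and the cost-modulated weight update are all stand-in assumptions; the paper's exact formulas may differ. It assumes scikit-learn and binary 0/1 labels.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

def locality_cost(X, y, k=5):
    # Hardness of each instance = fraction of its k nearest neighbours
    # belonging to the opposite class (an assumed stand-in for the
    # paper's locality-informed cost term).
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)           # idx[:, 0] is the point itself
    return np.array([(y[idx[i, 1:]] != y[i]).mean() for i in range(len(X))])

def liuboost_sketch(X, y, rounds=10, seed=0):
    # y is assumed to be a 0/1 array.
    rng = np.random.default_rng(seed)
    n = len(X)
    w = np.full(n, 1.0 / n)
    cost = locality_cost(X, y)
    labels, counts = np.unique(y, return_counts=True)
    minority, majority = labels[np.argmin(counts)], labels[np.argmax(counts)]

    models, alphas = [], []
    for _ in range(rounds):
        # Balance each round by randomly undersampling the majority
        # class, as in RUSBoost.
        maj_idx = np.flatnonzero(y == majority)
        min_idx = np.flatnonzero(y == minority)
        keep = rng.choice(maj_idx, size=len(min_idx), replace=False)
        sub = np.concatenate([min_idx, keep])

        clf = DecisionTreeClassifier(max_depth=2, random_state=0)
        clf.fit(X[sub], y[sub], sample_weight=w[sub])
        pred = clf.predict(X)

        err = np.clip(w[pred != y].sum() / w.sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)

        # Cost-modulated AdaBoost-style update: misclassified instances
        # gain weight, and "hard" (high-cost) instances gain more, so
        # information discarded by undersampling is not forgotten.
        miss = (pred != y).astype(float)
        w *= np.exp(alpha * miss * (1.0 + cost))
        w /= w.sum()

        models.append(clf)
        alphas.append(alpha)
    return models, np.array(alphas)

def ensemble_predict(models, alphas, X):
    # Weighted vote over the rounds (labels mapped to {-1, +1}).
    votes = sum(a * np.where(m.predict(X) == 1, 1.0, -1.0)
                for m, a in zip(models, alphas))
    return (votes > 0).astype(int)
```

The key design point the abstract emphasises is that the cost term lives inside the boosting weight update rather than in the sampling step, so undersampling can stay simple while hard-to-classify instances retain influence across rounds.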



Author information

Correspondence to Dewan Md. Farid.


Copyright information

© 2019 Springer Nature Singapore Pte Ltd.


Cite this paper

Ahmed, S., Rayhan, F., Mahbub, A., Rafsan Jani, M., Shatabda, S., Farid, D.M. (2019). LIUBoost: Locality Informed Under-Boosting for Imbalanced Data Classification. In: Abraham, A., Dutta, P., Mandal, J., Bhattacharya, A., Dutta, S. (eds) Emerging Technologies in Data Mining and Information Security. Advances in Intelligent Systems and Computing, vol 813. Springer, Singapore. https://doi.org/10.1007/978-981-13-1498-8_12
