
LIUBoost: Locality Informed Under-Boosting for Imbalanced Data Classification

  • Conference paper in Emerging Technologies in Data Mining and Information Security

Abstract

Class imbalance, together with class overlap, has become a major issue in supervised learning. Most classification algorithms assume equal class cardinality while optimising their cost function; this assumption does not hold for imbalanced datasets and leads to suboptimal classification. Various approaches have therefore been proposed, including undersampling, oversampling, cost-sensitive learning and ensemble-based methods. However, undersampling suffers from information loss, oversampling from increased runtime and potential overfitting, and cost-sensitive methods from inadequately defined cost-assignment schemes. In this paper, we propose a novel boosting-based method called Locality Informed Under-Boosting (LIUBoost). Like Random Undersampling with Boosting (RUSBoost), LIUBoost balances the dataset in every boosting iteration through undersampling; in addition, it incorporates into the weight-update formula a cost term for every instance based on its hardness, which mitigates the information loss introduced by undersampling. LIUBoost has been extensively evaluated on 18 imbalanced datasets, and the results indicate significant improvement over the existing best-performing method, RUSBoost.
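The abstract's main loop can be sketched as follows. This is a hypothetical illustration based only on the abstract, not the paper's actual algorithm: the `locality_cost` hardness measure (fraction of opposite-class nearest neighbours), the AdaBoost-style `alpha`, and the cost-modulated weight update are all stand-in assumptions; the paper's exact formulas may differ. It assumes scikit-learn and binary 0/1 labels.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

def locality_cost(X, y, k=5):
    # Hardness of each instance = fraction of its k nearest neighbours
    # belonging to the opposite class (an assumed stand-in for the
    # paper's locality-informed cost term).
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)           # idx[:, 0] is the point itself
    return np.array([(y[idx[i, 1:]] != y[i]).mean() for i in range(len(X))])

def liuboost_sketch(X, y, rounds=10, seed=0):
    # y is assumed to be a 0/1 array.
    rng = np.random.default_rng(seed)
    n = len(X)
    w = np.full(n, 1.0 / n)
    cost = locality_cost(X, y)
    labels, counts = np.unique(y, return_counts=True)
    minority, majority = labels[np.argmin(counts)], labels[np.argmax(counts)]

    models, alphas = [], []
    for _ in range(rounds):
        # Balance each round by randomly undersampling the majority
        # class, as in RUSBoost.
        maj_idx = np.flatnonzero(y == majority)
        min_idx = np.flatnonzero(y == minority)
        keep = rng.choice(maj_idx, size=len(min_idx), replace=False)
        sub = np.concatenate([min_idx, keep])

        clf = DecisionTreeClassifier(max_depth=2, random_state=0)
        clf.fit(X[sub], y[sub], sample_weight=w[sub])
        pred = clf.predict(X)

        err = np.clip(w[pred != y].sum() / w.sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)

        # Cost-modulated AdaBoost-style update: misclassified instances
        # gain weight, and "hard" (high-cost) instances gain more, so
        # information discarded by undersampling is not forgotten.
        miss = (pred != y).astype(float)
        w *= np.exp(alpha * miss * (1.0 + cost))
        w /= w.sum()

        models.append(clf)
        alphas.append(alpha)
    return models, np.array(alphas)

def ensemble_predict(models, alphas, X):
    # Weighted vote over the rounds (labels mapped to {-1, +1}).
    votes = sum(a * np.where(m.predict(X) == 1, 1.0, -1.0)
                for m, a in zip(models, alphas))
    return (votes > 0).astype(int)
```

The key design point the abstract emphasises is that the cost term lives inside the boosting weight update rather than in the sampling step, so undersampling can stay simple while hard-to-classify instances retain influence across rounds.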



Author information

Correspondence to Dewan Md. Farid.


Copyright information

© 2019 Springer Nature Singapore Pte Ltd.


Cite this paper

Ahmed, S., Rayhan, F., Mahbub, A., Rafsan Jani, M., Shatabda, S., Farid, D.M. (2019). LIUBoost: Locality Informed Under-Boosting for Imbalanced Data Classification. In: Abraham, A., Dutta, P., Mandal, J., Bhattacharya, A., Dutta, S. (eds) Emerging Technologies in Data Mining and Information Security. Advances in Intelligent Systems and Computing, vol 813. Springer, Singapore. https://doi.org/10.1007/978-981-13-1498-8_12
