Granular Computing and Parameters Tuning in Imbalanced Data Preprocessing

Borowska, Katarzyna; Stepaniuk, Jarosław

doi:10.1007/978-3-319-99954-8_20

Katarzyna Borowska¹⁵ &
Jarosław Stepaniuk¹⁵

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 11127))

Included in the following conference series:

IFIP International Conference on Computer Information Systems and Industrial Management

1163 Accesses
3 Citations

Abstract

Selective preprocessing, representing data–level approach to the imbalanced data problem, is one of the most successful methods. This paper introduces novel algorithm combining this kind of technique with the filtering phase. The information granules are formed to distinguish specific types of positive examples that should be adequately treated. Three modes of oversampling, dedicated to minority class instances placed in specific areas of the feature space, are available. The rough set theory is applied to filter and remove inconsistencies from the generated positive samples. The experimental study shows that proposed method in most cases obtains better or similar performance of standard classifiers, such as C4.5 decision tree, in comparison with other techniques. Additionally, multiple values of algorithm’s parameters are evaluated. It is experimentally proven that two of the examined parameters values are the most appropriate to various applications. However, the automatic parameters tuning, based on the specific requirements of different data distributions, is recommended.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Alcala-Fdez, J., et al.: KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J. Mult. Valued Log. Soft Comput. 17(2–3), 255–287 (2011)
Google Scholar
Batista, G.E.A.P.A., Prati, R.C., Monard, M.C.: A study of the behavior of several methods for balancing machine learning training data. SIGKDD Explor. Newsl. 6(1), 20–29 (2004)
Article Google Scholar
Borowska, K., Stepaniuk, J.: Imbalanced data classification: a novel re-sampling approach combining versatile improved SMOTE and rough sets. In: Saeed, K., Homenda, W. (eds.) CISIM 2016. LNCS, vol. 9842, pp. 31–42. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-45378-1_4
Chapter Google Scholar
Borowska, K., Stepaniuk, J.: Rough sets in imbalanced data problem: improving re–sampling process. In: Saeed, K., Homenda, W., Chaki, R. (eds.) CISIM 2017. LNCS, vol. 10244, pp. 459–469. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59105-6_39
Chapter Google Scholar
Bunkhumpornpat, C., Sinapiromsaran, K., Lursinsap, C.: Safe-Level-SMOTE: Safe-Level-Synthetic Minority Over-Sampling TEchnique for handling the class imbalanced problem. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, T.-B. (eds.) PAKDD 2009. LNCS (LNAI), vol. 5476, pp. 475–482. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-01307-2_43
Chapter Google Scholar
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Int. Res. 16(1), 321–357 (2002)
MATH Google Scholar
Galar, M., Fernandez, A., Barrenechea, E., Bustince, H., Herrera, F.: A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 42(4), 463–484 (2012)
Article Google Scholar
Garcia, V., Mollineda, R.A., Sanchez, J.S.: On the k-NN performance in a challenging scenario of imbalance and overlapping. Pattern Anal. Appl. 11(3–4), 269–280 (2008)
Article MathSciNet Google Scholar
Han, H., Wang, W.-Y., Mao, B.-H.: Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. In: Huang, D.-S., Zhang, X.-P., Huang, G.-B. (eds.) ICIC 2005. LNCS, vol. 3644, pp. 878–887. Springer, Heidelberg (2005). https://doi.org/10.1007/11538059_91
Chapter Google Scholar
Hu, S., Liang, Y., Ma, L., He, Y.: MSMOTE: improving classification performance when training data is imbalanced. In: Second International Workshop on Computer Science and Engineering, WCSE 2009, Qingdao, pp. 13–17 (2009)
Google Scholar
López, V., Fernández, A., García, S., Palade, V., Herrera, F.: An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inf. Sci. 250, 113–141 (2013)
Article Google Scholar
Napierała, K., Stefanowski, J., Wilk, S.: Learning from imbalanced data in presence of noisy and borderline examples. In: Szczuka, M., Kryszkiewicz, M., Ramanna, S., Jensen, R., Hu, Q. (eds.) RSCTC 2010. LNCS (LNAI), vol. 6086, pp. 158–167. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13529-3_18
Chapter Google Scholar
Pawlak, Z., Skowron, A.: Rudiments of rough sets. Inf. Sci. 177(1), 3–27 (2007)
Article MathSciNet Google Scholar
Ramentol, E., Caballero, Y., Bello, R., Herrera, F.: SMOTE-RSB\(_{*}\): a hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using SMOTE and rough sets theory. Knowl. Inf. Syst. 33(2), 245–265 (2011)
Article Google Scholar
Stefanowski, J.: Dealing with data difficulty factors while learning from imbalanced data. In: Matwin, S., Mielniczuk, J. (eds.) Challenges in Computational Statistics and Data Mining. SCI, vol. 605, pp. 333–363. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-18781-5_17
Chapter Google Scholar
Stepaniuk, J.: Rough-Granular Computing in Knowledge Discovery and Data Mining, vol. 152. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-70801-8
Book MATH Google Scholar
UC Irvine Machine Learning Repository. http://archive.ics.uci.edu/ml/. Accessed 28 Apr 2018
Wilson, D.R., Martinez, T.R.: Improved heterogeneous distance functions. J. Artif. Intell. Res. 6, 1–34 (1997)
Article MathSciNet Google Scholar
Zhu X., Pedrycz W.: Granular under-sampling for processing imbalanced data. IEEE (2018, in Print)
Google Scholar

Download references

Acknowledgements

This research was supported by the grant S/WI/1/2018 of the Polish Ministry of Science and Higher Education.

Author information

Authors and Affiliations

Faculty of Computer Science, Bialystok University of Technology, Wiejska 45A, 15-351, Bialystok, Poland
Katarzyna Borowska & Jarosław Stepaniuk

Authors

Katarzyna Borowska
View author publications
You can also search for this author in PubMed Google Scholar
Jarosław Stepaniuk
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jarosław Stepaniuk .

Editor information

Editors and Affiliations

Bialystok University of Technology, Bialystok, Poland
Khalid Saeed
Warsaw University of Technology, Warsaw, Poland
Władysław Homenda

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Borowska, K., Stepaniuk, J. (2018). Granular Computing and Parameters Tuning in Imbalanced Data Preprocessing. In: Saeed, K., Homenda, W. (eds) Computer Information Systems and Industrial Management. CISIM 2018. Lecture Notes in Computer Science(), vol 11127. Springer, Cham. https://doi.org/10.1007/978-3-319-99954-8_20

Download citation

DOI: https://doi.org/10.1007/978-3-319-99954-8_20
Published: 05 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99953-1
Online ISBN: 978-3-319-99954-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics