
Granular Computing and Parameters Tuning in Imbalanced Data Preprocessing

  • Conference paper
Computer Information Systems and Industrial Management (CISIM 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11127)

Abstract

Selective preprocessing, a data-level approach to the imbalanced data problem, is one of the most successful methods. This paper introduces a novel algorithm that combines this kind of technique with a filtering phase. Information granules are formed to distinguish specific types of positive (minority class) examples that should be treated accordingly. Three modes of oversampling are available, each dedicated to minority class instances located in specific areas of the feature space. Rough set theory is applied to filter out inconsistencies from the generated positive samples. The experimental study shows that, in most cases, the proposed method yields better or similar performance of standard classifiers, such as the C4.5 decision tree, compared with other techniques. Additionally, multiple values of the algorithm's parameters are evaluated. The experiments show that two of the examined parameter values are the most appropriate for various applications. However, automatic parameter tuning, based on the specific requirements of different data distributions, is still recommended.
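The pipeline summarized in the abstract (granule-based categorization of positive examples, targeted oversampling, rough-set filtering of the synthetic samples) can be illustrated with a minimal NumPy sketch. It is not the authors' algorithm. The SAFE/BOUNDARY/NOISE granules and their k-NN thresholds, the `seed_categories` switch standing in for the three oversampling modes, SMOTE-style interpolation, Euclidean distance, and the `eps`-neighbourhood used as the similarity relation for the rough-set lower approximation are all assumptions made for illustration. All function names are hypothetical.

```python
import numpy as np


def categorize_minority(X, y, k=5, minority=1):
    """Assign each minority example to a granule (SAFE / BOUNDARY / NOISE).

    Hypothetical rule based on the k nearest neighbours (Euclidean distance):
    SAFE if at least k-1 neighbours are minority, NOISE if none are,
    BOUNDARY otherwise.
    """
    labels = {}
    for i in np.flatnonzero(y == minority):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf  # a point is not its own neighbour
        nn = np.argsort(d)[:k]
        same = int(np.sum(y[nn] == minority))
        if same >= k - 1:
            labels[int(i)] = "SAFE"
        elif same == 0:
            labels[int(i)] = "NOISE"
        else:
            labels[int(i)] = "BOUNDARY"
    return labels


def oversample(X, y, labels, n_new, seed_categories=("SAFE", "BOUNDARY"),
               k=5, minority=1, rng=None):
    """SMOTE-style interpolation, seeded only from the selected granules.

    `seed_categories` is a hypothetical stand-in for the paper's
    oversampling modes.
    """
    rng = np.random.default_rng(rng)
    seeds = [i for i, c in labels.items() if c in seed_categories]
    minority_idx = np.flatnonzero(y == minority)
    synthetic = []
    if seeds and len(minority_idx) > 1:
        for _ in range(n_new):
            i = rng.choice(seeds)
            d = np.linalg.norm(X[minority_idx] - X[i], axis=1)
            order = minority_idx[np.argsort(d)]
            neighbours = order[order != i][:k]  # k nearest minority neighbours
            j = rng.choice(neighbours)
            gap = rng.random()  # uniform in [0, 1)
            synthetic.append(X[i] + gap * (X[j] - X[i]))
    return np.array(synthetic).reshape(-1, X.shape[1])


def rough_set_filter(X, y, synthetic, eps, minority=1):
    """Keep a synthetic sample only if every original example within
    distance `eps` (an assumed similarity relation) is a minority example,
    i.e. the sample falls in the lower approximation of the minority class."""
    keep = [s for s in synthetic
            if np.all(y[np.linalg.norm(X - s, axis=1) <= eps] == minority)]
    return np.array(keep).reshape(-1, X.shape[1])


# Toy data: a majority cluster near the origin, a minority cluster near (10, 10)
# and one isolated minority point inside the majority region.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.2, 0.8],
              [10, 10], [10, 11], [11, 10], [11, 11], [10.5, 10.5], [10.2, 10.8],
              [0.5, 0.6]], dtype=float)
y = np.array([0] * 6 + [1] * 7)

labels = categorize_minority(X, y)  # index 12 is NOISE, indices 6..11 are SAFE
synthetic = oversample(X, y, labels, n_new=20, rng=0)
kept = rough_set_filter(X, y, synthetic, eps=1.0)
dropped = rough_set_filter(X, y, np.array([[0.5, 0.55]]), eps=1.0)
```

In this sketch `k`, `seed_categories` and `eps` play the role of the tunable parameters the abstract discusses. The defaults shown carry no recommendation; which values suit a given data distribution is exactly the question the paper's parameter study addresses. The last line shows the filter discarding a candidate sample that lies among majority examples.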



Acknowledgements

This research was supported by the grant S/WI/1/2018 of the Polish Ministry of Science and Higher Education.

Author information

Authors: K. Borowska, J. Stepaniuk

Corresponding author: Jarosław Stepaniuk


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Borowska, K., Stepaniuk, J. (2018). Granular Computing and Parameters Tuning in Imbalanced Data Preprocessing. In: Saeed, K., Homenda, W. (eds) Computer Information Systems and Industrial Management. CISIM 2018. Lecture Notes in Computer Science, vol 11127. Springer, Cham. https://doi.org/10.1007/978-3-319-99954-8_20


  • DOI: https://doi.org/10.1007/978-3-319-99954-8_20


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99953-1

  • Online ISBN: 978-3-319-99954-8

  • eBook Packages: Computer Science, Computer Science (R0)
