A First Approach to Deal with Imbalance in Multi-label Datasets

  • Conference paper
Hybrid Artificial Intelligent Systems (HAIS 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8073)

Abstract

Learning from imbalanced datasets has been studied in depth for binary and multi-class classification. The problem also affects multi-label datasets. In fact, the imbalance level in multi-label datasets tends to be much larger than in binary or multi-class datasets. Nevertheless, proposals on how to measure and deal with imbalance in multi-label classification are scarce.

In this paper, we introduce two measures for assessing the imbalance level in multi-label datasets. Furthermore, we propose two preprocessing methods designed to reduce the imbalance level in multi-label datasets, and validate their effectiveness experimentally. Finally, we provide an analysis for determining when these methods should be applied, depending on the characteristics of the dataset.
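
The abstract does not define its two measures, so the following is only an illustrative sketch of one plausible way to quantify label imbalance, not the paper's definition. It assumes a per-label imbalance ratio (the count of the most frequent label divided by the count of each label) and the mean of those ratios as a dataset-level summary. The function names and the toy dataset are hypothetical.

```python
from typing import Dict, List, Set


def label_counts(labelsets: List[Set[str]]) -> Dict[str, int]:
    """Count how many instances carry each label."""
    counts: Dict[str, int] = {}
    for labels in labelsets:
        for label in labels:
            counts[label] = counts.get(label, 0) + 1
    return counts


def imbalance_ratio_per_label(labelsets: List[Set[str]]) -> Dict[str, float]:
    """Assumed measure: majority-label count divided by each label's count.

    The most frequent label gets 1.0; rarer labels get larger values.
    """
    counts = label_counts(labelsets)
    if not counts:
        raise ValueError("dataset contains no labels")
    majority = max(counts.values())
    return {label: majority / n for label, n in counts.items()}


def mean_imbalance_ratio(labelsets: List[Set[str]]) -> float:
    """Assumed dataset-level summary: average of the per-label ratios."""
    ratios = imbalance_ratio_per_label(labelsets)
    return sum(ratios.values()) / len(ratios)


# Hypothetical toy dataset: each instance is the set of labels it carries.
toy = [{"a"}, {"a", "b"}, {"a"}, {"a", "c"}, {"b"}]
ratios = imbalance_ratio_per_label(toy)
mean_ir = mean_imbalance_ratio(toy)
```

On the toy data, label "a" appears four times, "b" twice, and "c" once, so the rarest label receives the largest ratio. A preprocessing method of the kind the abstract mentions could, under this assumption, use such ratios to decide which labels to resample.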

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Charte, F., Rivera, A., del Jesus, M.J., Herrera, F. (2013). A First Approach to Deal with Imbalance in Multi-label Datasets. In: Pan, JS., Polycarpou, M.M., Woźniak, M., de Carvalho, A.C.P.L.F., Quintián, H., Corchado, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2013. Lecture Notes in Computer Science, vol 8073. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40846-5_16

  • DOI: https://doi.org/10.1007/978-3-642-40846-5_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40845-8

  • Online ISBN: 978-3-642-40846-5

  • eBook Packages: Computer Science, Computer Science (R0)
