HSIM: A Supervised Imputation Method for Hierarchical Classification Scenario

  • Conference paper
Discovery Science (DS 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9956)

Abstract

Missing value imputation is a preprocessing step that fills in missing attribute values in incomplete datasets. Currently, incomplete datasets in the hierarchical classification scenario must be handled with unsupervised missing value imputation methods, because no supervised methods exist for the hierarchical context. In this work, we therefore propose and evaluate a supervised missing value imputation method for datasets used in hierarchical classification problems in which the classes are organized into a tree structure. Experiments were performed on incomplete datasets to evaluate the effect of the proposed method on classification performance when using a global hierarchical classifier. The results showed that handling missing attribute values with the proposed method yielded higher classifier predictive performance than the unsupervised missing value imputation methods it was compared against.
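To make the distinction between unsupervised and supervised imputation concrete, the sketch below contrasts a label-blind column mean with a label-aware mean that backs off along a class tree. It is not the HSIM algorithm, whose details are not given in this abstract; it is only an illustration under stated assumptions. The function names, the dotted label encoding (for example "1.2" as a child of "1"), and the toy data are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def ancestors(label, sep="."):
    """Yield the label itself, then each ancestor up to the top level.

    Assumes (hypothetically) that tree-structured classes are encoded as
    dotted paths, so "1.2.3" yields "1.2.3", "1.2", "1".
    """
    parts = label.split(sep)
    for depth in range(len(parts), 0, -1):
        yield sep.join(parts[:depth])


def global_mean_impute(rows):
    """Unsupervised baseline: fill None with the column mean, ignoring labels."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(mean(observed) if observed else None)
    return [[col_means[j] if v is None else v for j, v in enumerate(r)]
            for r in rows]


def hierarchy_aware_impute(rows, labels, sep="."):
    """Illustrative supervised variant (an assumption, not the paper's method).

    A missing value is filled with the mean of that attribute over instances
    in the subtree of the nearest class node that has observed values.
    If no node on the path has any, the global column mean is used.
    """
    n_cols = len(rows[0])
    # observed[j][node]: values of attribute j seen in the subtree of node
    observed = [defaultdict(list) for _ in range(n_cols)]
    for row, label in zip(rows, labels):
        for j, v in enumerate(row):
            if v is not None:
                for node in ancestors(label, sep):
                    observed[j][node].append(v)

    fallback = global_mean_impute(rows)
    filled = []
    for i, (row, label) in enumerate(zip(rows, labels)):
        new_row = list(row)
        for j, v in enumerate(row):
            if v is None:
                new_row[j] = fallback[i][j]
                for node in ancestors(label, sep):
                    if observed[j][node]:
                        new_row[j] = mean(observed[j][node])
                        break
        filled.append(new_row)
    return filled


# Toy data: one numeric attribute, None marks a missing value.
rows = [[1.0], [3.0], [None], [10.0], [None]]
labels = ["1.1", "1.1", "1.1", "2.1", "1.2"]

print(global_mean_impute(rows))
print(hierarchy_aware_impute(rows, labels))
```

In this toy data the instance labelled "1.2" has no sibling with an observed value, so the label-aware variant falls back to its parent class "1" instead of the whole dataset. That back-off is one simple way class structure could inform imputation; the paper's own procedure may differ.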

Acknowledgements

This research was partially supported by CNPq, FAPEMIG, UFOP, and by individual grants from CAPES.

Author information

Corresponding author

Correspondence to Leandro R. Galvão.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Galvão, L.R., Merschmann, L.H.C. (2016). HSIM: A Supervised Imputation Method for Hierarchical Classification Scenario. In: Calders, T., Ceci, M., Malerba, D. (eds) Discovery Science. DS 2016. Lecture Notes in Computer Science, vol 9956. Springer, Cham. https://doi.org/10.1007/978-3-319-46307-0_9

  • DOI: https://doi.org/10.1007/978-3-319-46307-0_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46306-3

  • Online ISBN: 978-3-319-46307-0

  • eBook Packages: Computer Science, Computer Science (R0)
