
Improvement of the Training Dataset for Supervised Multiclass Classification

  • Conference paper
Intelligent Decision Technologies (IDT 2020)

Abstract

Classifying objects into their corresponding classes is an important task in official statistics. In a previous study, an overlapping classifier that assigns classes to an object based on a reliability score was proposed. That reliability score was defined to account for both the uncertainty from data and the uncertainty from the latent classification structure in data, and it was generalized using the idea of the T-norm in statistical metric space. This paper proposes a new procedure for improving the training dataset, based on the pattern of reliability scores, in order to obtain better classification accuracy. A numerical example shows that the proposed procedure gives a better result than that of our previous study.
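
As a rough illustration of how a T-norm can combine two sources of uncertainty into one reliability score, the following Python sketch may help. It is not the authors' definition: the abstract gives no formulas, so the sketch assumes that the uncertainty from data is represented by an object's membership degrees to each class, and that the uncertainty from the latent classification structure is summarized by Bezdek's partition coefficient (the sum of squared memberships). The function names, the example membership values, and the choice of three standard T-norms (minimum, algebraic product, Łukasiewicz) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# A T-norm maps two values in [0, 1] to a value in [0, 1].
TNorm = Callable[[float, float], float]


def t_min(a: float, b: float) -> float:
    """Minimum T-norm."""
    return min(a, b)


def t_product(a: float, b: float) -> float:
    """Algebraic product T-norm."""
    return a * b


def t_lukasiewicz(a: float, b: float) -> float:
    """Lukasiewicz T-norm."""
    return max(0.0, a + b - 1.0)


def partition_coefficient(memberships: List[float]) -> float:
    """Sum of squared memberships: 1.0 for a crisp assignment,
    1/K when an object is spread evenly over K classes."""
    return sum(u * u for u in memberships)


def reliability_scores(memberships: List[float], t_norm: TNorm) -> List[float]:
    """Hypothetical per-class score: combine each membership degree with
    the partition coefficient of the whole membership pattern."""
    pc = partition_coefficient(memberships)
    return [t_norm(u, pc) for u in memberships]


# Assumed example: one object with memberships to three candidate classes.
memberships = [0.7, 0.2, 0.1]
t_norms: Dict[str, TNorm] = {
    "minimum": t_min,
    "product": t_product,
    "lukasiewicz": t_lukasiewicz,
}
for name, t_norm in t_norms.items():
    scores = reliability_scores(memberships, t_norm)
    print(name, [round(s, 3) for s in scores])
```

Swapping the T-norm changes how strongly an ambiguous membership pattern pulls the scores down: for any inputs in [0, 1], the Łukasiewicz T-norm gives the smallest value and the minimum T-norm the largest.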



Author information

Corresponding author

Correspondence to Yukako Toko.

Copyright information

© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Toko, Y., Sato-Ilic, M. (2020). Improvement of the Training Dataset for Supervised Multiclass Classification. In: Czarnowski, I., Howlett, R., Jain, L. (eds) Intelligent Decision Technologies. IDT 2020. Smart Innovation, Systems and Technologies, vol 193. Springer, Singapore. https://doi.org/10.1007/978-981-15-5925-9_25
