Automatic Occupation Coding with Combination of Machine Learning and Hand-Crafted Rules

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3518)

Abstract

We apply machine learning to occupation coding, the task of categorizing answers to open-ended questions about the respondent’s occupation. Specifically, we use Support Vector Machines (SVMs) and combine them with hand-crafted rules. Manual occupation coding is expensive and can yield inconsistent results when the coders are not experts in occupation coding. For this reason, a rule-based automatic method has been developed and used. However, its categorization performance is not satisfactory. We therefore adopt SVMs, which have shown high performance in various fields, and compare them with the rule-based method. We also investigate effective ways of combining SVMs with the rule-based method. In our methods, the output of the rule-based method is used as features for the SVMs. We empirically show that SVMs outperform the rule-based method in occupation coding and that combining the two methods yields even better accuracy.
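The idea of feeding the rule-based output into the SVM as features can be illustrated with a minimal sketch. It is not the authors' implementation: the toy rules, the code labels `CODE_A` and `CODE_B`, the whitespace tokenization, and the function names `rule_based_code`, `build_features`, and `to_vectors` are all assumptions made for illustration. The sketch only builds the combined feature vectors; training an SVM on them is left to whatever SVM library is at hand.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy rules mapping a keyword to an occupation code.
# The codes are made up and are not taken from any real classification.
TOY_RULES: Dict[str, str] = {
    "teacher": "CODE_A",
    "nurse": "CODE_B",
}


def rule_based_code(answer: str) -> Optional[str]:
    """Return the code of the first matching toy rule, or None if no rule fires."""
    for token in answer.lower().split():
        if token in TOY_RULES:
            return TOY_RULES[token]
    return None


def build_features(answer: str) -> Dict[str, float]:
    """Combine word features with the rule-based output as one extra feature."""
    features: Dict[str, float] = {}
    for token in answer.lower().split():
        features["word=" + token] = 1.0
    code = rule_based_code(answer)
    if code is not None:
        # The rule-based prediction becomes an additional binary feature.
        features["rule=" + code] = 1.0
    return features


def to_vectors(
    feature_dicts: List[Dict[str, float]],
) -> Tuple[List[str], List[List[float]]]:
    """Turn feature dictionaries into dense vectors over a shared, sorted vocabulary."""
    names = sorted({name for feats in feature_dicts for name in feats})
    index = {name: i for i, name in enumerate(names)}
    vectors = []
    for feats in feature_dicts:
        row = [0.0] * len(names)
        for name, value in feats.items():
            row[index[name]] = value
        vectors.append(row)
    return names, vectors


if __name__ == "__main__":
    answers = ["high school teacher", "farm worker"]
    names, vectors = to_vectors([build_features(a) for a in answers])
    print(names)
    print(vectors)
```

With this layout the classifier sees the rule's suggestion alongside the words of the answer, so it can learn when to trust the rule and when the words point elsewhere.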

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Takahashi, K., Takamura, H., Okumura, M. (2005). Automatic Occupation Coding with Combination of Machine Learning and Hand-Crafted Rules. In: Ho, T.B., Cheung, D., Liu, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2005. Lecture Notes in Computer Science, vol 3518. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11430919_34

  • DOI: https://doi.org/10.1007/11430919_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26076-9

  • Online ISBN: 978-3-540-31935-1

  • eBook Packages: Computer Science, Computer Science (R0)
