A Kalman Filter Based Human-Computer Interactive Word Segmentation System for Ancient Chinese Texts

  • Conference paper
Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data (NLP-NABD 2013, CCL 2013)

Abstract

Previous research showed that a Kalman filter based human-computer interaction algorithm for Chinese word segmentation is effective at reducing user interventions. This paper designs an improved statistical model for ancient Chinese texts and integrates it with the Kalman filter based framework. An online interactive system is presented for segmenting ancient Chinese corpora. Experiments showed that this approach has an advantage in processing domain-specific text without the support of dictionaries or annotated corpora. The improved statistical model outperformed the baseline model by 30% in segmentation precision.
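For readers unfamiliar with the underlying technique, the following is a minimal sketch of a generic one-dimensional Kalman filter update. It is not the paper's model: the abstract does not specify the state, the observations, or the noise parameters, so the class name `ScalarKalman`, the helper `run_filter`, the random-walk state model, the reading of the state as a confidence score nudged by user feedback, and all numeric defaults are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """Generic 1-D Kalman filter with a random-walk state model (illustrative only)."""
    x: float = 0.5   # current state estimate (hypothetical confidence score)
    p: float = 1.0   # variance of the estimate
    q: float = 0.01  # process noise variance (assumed value)
    r: float = 0.25  # measurement noise variance (assumed value)

    def update(self, z: float) -> float:
        """Fold one observation z into the estimate and return the new estimate."""
        # Predict: a random walk keeps the mean and inflates the variance.
        p_pred = self.p + self.q
        # Kalman gain: how much to trust the observation over the prediction.
        k = p_pred / (p_pred + self.r)
        # Correct: move the estimate toward the observation by the gain.
        self.x = self.x + k * (z - self.x)
        self.p = (1.0 - k) * p_pred
        return self.x


def run_filter(observations):
    """Run a fresh filter over a sequence of observations; return all estimates."""
    kf = ScalarKalman()
    return [kf.update(z) for z in observations]


if __name__ == "__main__":
    # Hypothetical feedback signal: 1.0 = user confirms, 0.0 = user corrects.
    print(run_filter([1.0, 1.0, 0.0, 1.0]))
```

In this sketch each observation pulls the estimate toward itself by an amount set by the gain, and the gain shrinks as the estimate's variance falls, so later feedback changes the estimate less than early feedback does.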

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, T., Zhu, W., Lv, X., Hu, J. (2013). A Kalman Filter Based Human-Computer Interactive Word Segmentation System for Ancient Chinese Texts. In: Sun, M., Zhang, M., Lin, D., Wang, H. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD 2013, CCL 2013. Lecture Notes in Computer Science, vol 8202. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41491-6_3

  • DOI: https://doi.org/10.1007/978-3-642-41491-6_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41490-9

  • Online ISBN: 978-3-642-41491-6

  • eBook Packages: Computer Science, Computer Science (R0)
