Tibetan Syllable-Based Functional Chunk Boundary Identification

  • Conference paper
Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data (NLP-NABD 2017, CCL 2017)

Abstract

Tibetan syntactic functional chunk parsing aims to identify the syntactic constituents of Tibetan sentences. Based on the Tibetan syntactic functional chunk description system, we propose a method that groups syllables directly, without word segmentation or tagging, and uses Conditional Random Fields (CRFs) to identify the functional chunk boundaries of a sentence. Reflecting the characteristics of the Tibetan language, we first identify and extract syntactic markers in the text preprocessing stage and use them as features for chunk boundary identification; these markers occur in both a sticky written form and a non-sticky written form. We then identify the syntactic functional chunk boundaries with a CRF. Experiments on a Tibetan corpus containing 46,783 syllables achieve precision, recall and F value of 75.70%, 82.54% and 79.12%, respectively. The results show that the proposed method is effective on a small-scale unlabeled corpus and can provide foundational support for natural language processing applications such as machine translation.
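The syllable-level formulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the B/I label scheme, the single marker-flag feature column, the helper names (`split_syllables`, `to_crf_rows`, `format_crfpp`), the toy chunks and the one-entry marker set are all assumptions made for illustration. The sketch splits text into syllables at the tsheg (U+0F0B), flags syllables that match a known non-sticky syntactic marker, and writes one syllable per line in the column layout that CRF++ reads (last column is the label, a blank line ends a sentence). Sticky markers, which are fused to the preceding syllable, would need an extra suffix-matching step that is not shown here.

```python
TSHEG = "\u0f0b"  # Tibetan intersyllabic mark (tsheg)
SHAD = "\u0f0d"   # Tibetan clause/sentence-final mark (shad)

# Hypothetical marker inventory: one non-sticky genitive particle only.
MARKERS = {"ཀྱི"}


def split_syllables(text):
    """Split Tibetan text into syllables at tsheg, treating shad as a separator."""
    cleaned = text.replace(SHAD, TSHEG)
    return [s.strip() for s in cleaned.split(TSHEG) if s.strip()]


def to_crf_rows(chunks, markers):
    """Turn a list of chunk strings into (syllable, marker_flag, label) rows.

    Assumed label scheme: "B" for the first syllable of a chunk, "I" otherwise.
    Assumed feature: "M" if the syllable is a known marker, "O" otherwise.
    """
    rows = []
    for chunk in chunks:
        for i, syllable in enumerate(split_syllables(chunk)):
            flag = "M" if syllable in markers else "O"
            label = "B" if i == 0 else "I"
            rows.append((syllable, flag, label))
    return rows


def format_crfpp(rows):
    """Render rows as tab-separated columns, ending the sentence with a blank line."""
    return "\n".join("\t".join(row) for row in rows) + "\n\n"


# Toy example: two illustrative chunks, not a linguistically vetted annotation.
chunks = ["བོད་ཀྱི་", "སྐད་ཡིག་"]
rows = to_crf_rows(chunks, MARKERS)
print(format_crfpp(rows), end="")
```

In this layout the marker flag is available to CRF++ feature templates as a second column, so a template can condition the boundary label on whether the current or neighbouring syllable is a marker.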

Notes

  1. CRF++: http://taku910.github.io/crfpp/

Acknowledgement

This work is supported by the National Natural Science Foundation of China (61671064, 61201352, and 61132009), the National Key Basic Research Program of China (2013CB329303) and the Fundamental Research Fund of Beijing Institute of Technology (20130742010).

Author information

Corresponding author

Correspondence to Shumin Shi.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Shi, S., Liu, Y., Wang, T., Long, C., Huang, H. (2017). Tibetan Syllable-Based Functional Chunk Boundary Identification. In: Sun, M., Wang, X., Chang, B., Xiong, D. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD 2017, CCL 2017. Lecture Notes in Computer Science, vol 10565. Springer, Cham. https://doi.org/10.1007/978-3-319-69005-6_36

  • DOI: https://doi.org/10.1007/978-3-319-69005-6_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69004-9

  • Online ISBN: 978-3-319-69005-6

  • eBook Packages: Computer Science, Computer Science (R0)
