
Collaborative Recognition and Recovery of the Chinese Intercept Abbreviation

  • Conference paper
Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data (NLP-NABD 2017, CCL 2017)

Abstract

One important task in information content security is identifying the theme words of a text. Because Chinese offers many ways to express the same content, especially through abbreviation, monitoring theme words is difficult. The goal of this paper is to discover intercept abbreviations quickly and accurately in text crawled within a short time period. The method first segments the target texts and then uses a Support Vector Machine (SVM) to recognize abbreviations among the incorrectly segmented fragments, treating them as candidates. It then recovers the full forms collaboratively: an improved Conditional Random Field (CRF) model predicts the word corresponding to each character of the abbreviation, and, to handle the 1:n relationship in which one abbreviation maps to several full forms, the ranked list from the prediction step is merged with matches from an abbreviation thesaurus. Experiments show that the recognition stage achieves 76.5% accuracy and 77.8% recall. In the recovery stage, accuracy reaches 62.1%, which is 20.8% higher than that of a method based on the Hidden Markov Model (HMM).
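The abstract does not state how the CRF ranking and the thesaurus matches are combined. The following Python sketch assumes one simple possibility, a linear interpolation between normalized model scores and a binary thesaurus signal; the function name `merge_rankings`, the weight `alpha`, and the example scores are hypothetical and are not taken from the paper. It illustrates the 1:n case with 人大, which can abbreviate either 人民代表大会 or 人民大学 (as in 中国人民大学).

```python
from typing import List, Tuple


def merge_rankings(
    model_ranking: List[Tuple[str, float]],
    thesaurus_matches: List[str],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Fuse a model-ranked list of full forms with thesaurus matches.

    Hypothetical fusion rule (an assumption, not the paper's method):
        fused(f) = (1 - alpha) * normalized_model_score(f) + alpha * [f in thesaurus]
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if any(score < 0 for _, score in model_ranking):
        raise ValueError("model scores must be non-negative")

    # Normalize model scores by their total so they are comparable
    # with the 0/1 thesaurus indicator.
    total = sum(score for _, score in model_ranking)
    model = {f: (s / total if total > 0 else 0.0) for f, s in model_ranking}

    thesaurus = set(thesaurus_matches)
    # The union keeps thesaurus-only full forms (the 1:n case) in the list.
    candidates = set(model) | thesaurus
    fused = {
        f: (1 - alpha) * model.get(f, 0.0) + alpha * (1.0 if f in thesaurus else 0.0)
        for f in candidates
    }
    # Highest fused score first; ties broken by the string for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores for the abbreviation 人大.
ranking = merge_rankings(
    [("人民代表大会", 3.0), ("人民大会", 1.0)],
    ["人民代表大会", "人民大学"],
    alpha=0.5,
)
print(ranking)
# → [('人民代表大会', 0.875), ('人民大学', 0.5), ('人民大会', 0.125)]
```

A thesaurus-only form such as 人民大学 still enters the list even though the sequence model never proposes it, which is the motivation for merging the two sources; in practice `alpha` would need to be tuned on held-out data.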

This work is supported by the National Science Foundation of China (Nos. 61672393, U1536204).


Notes

  1. http://www.icl.pku.edu.cn/icl_groups/corpus/
  2. http://www.sogou.com/labs/resource/
  3. NLPIR: http://ictclas.nlpir.org/


Author information

Correspondence to Juan Deng.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Liu, J., Chen, Y., Deng, J., Ji, D., Pan, J. (2017). Collaborative Recognition and Recovery of the Chinese Intercept Abbreviation. In: Sun, M., Wang, X., Chang, B., Xiong, D. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD CCL 2017. Lecture Notes in Computer Science, vol 10565. Springer, Cham. https://doi.org/10.1007/978-3-319-69005-6_19


  • DOI: https://doi.org/10.1007/978-3-319-69005-6_19


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69004-9

  • Online ISBN: 978-3-319-69005-6

  • eBook Packages: Computer Science, Computer Science (R0)
