Collaborative Recognition and Recovery of the Chinese Intercept Abbreviation

Liu, Jinshuo; Chen, Yusen; Deng, Juan; Ji, Donghong; Pan, Jeff

doi:10.1007/978-3-319-69005-6_19


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10565)


Abstract

An important task in information content security is evaluating the theme words of a text. Chinese expressions are highly varied, and abbreviations in particular make theme words harder to supervise. The goal of this paper is to discover intercept abbreviations quickly and accurately in text crawled over a short time period. First, the paper segments the target texts and then uses a Support Vector Machine (SVM) to recognize abbreviations among the incorrectly segmented fragments, which serve as candidates. Second, the paper presents a collaborative recovery method. An improved Conditional Random Field (CRF) model predicts the word corresponding to each character of the abbreviation. To handle the 1:n relationship, the ranked list produced by the prediction step is then merged with the matches found in an abbreviation thesaurus. Experiments show that the method achieves 76.5% accuracy and 77.8% recall in the recognition stage. In the recovery stage its accuracy is 62.1%, which is 20.8% higher than that of a method based on the Hidden Markov Model (HMM).
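The abstract does not say how the CRF's per-character predictions become a ranked list, or how that list is merged with the thesaurus. The Python sketch below shows one possible reading and is not the authors' method. It assumes each abbreviation character comes with a list of (word, probability) pairs standing in for the improved CRF's output. Under that assumption, a full form is scored by the product of its word probabilities. The merge step uses a hypothetical reciprocal-rank score plus a fixed `bonus` for thesaurus matches. The function names, the `bonus` value, and the toy data for 北大 are all illustrative assumptions.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Candidate = Tuple[str, float]


def rank_expansions(
    char_candidates: Sequence[Sequence[Candidate]], top_k: int = 5
) -> List[Candidate]:
    """Combine per-character word predictions into ranked full-form candidates.

    char_candidates holds one list of (word, probability) pairs per
    abbreviation character; this format is a hypothetical stand-in for the
    improved CRF's output. A full form is scored by the product of the
    probabilities of the words it combines.
    """
    scored: List[Candidate] = []
    for combo in product(*char_candidates):
        form = "".join(word for word, _ in combo)
        score = 1.0
        for _, prob in combo:
            score *= prob
        scored.append((form, score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def merge_with_thesaurus(
    ranked: Sequence[Candidate], thesaurus_matches: Sequence[str], bonus: float = 0.5
) -> List[str]:
    """Fuse the model ranking with thesaurus matches (hypothetical fusion rule).

    Each model candidate receives a reciprocal-rank score 1 / (rank + 1),
    keeping its best rank if it appears twice; a form that also appears in
    the thesaurus gains a fixed bonus, and a thesaurus-only form enters with
    the bonus alone.
    """
    scores: Dict[str, float] = {}
    for rank, (form, _) in enumerate(ranked):
        scores.setdefault(form, 1.0 / (rank + 1))
    for form in thesaurus_matches:
        scores[form] = scores.get(form, 0.0) + bonus
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda form: (-scores[form], form))


if __name__ == "__main__":
    # Toy (made-up) predictions for the abbreviation 北大.
    predictions = [
        [("北京", 0.7), ("北方", 0.3)],
        [("大学", 0.8), ("大厦", 0.2)],
    ]
    ranked = rank_expansions(predictions)
    print(ranked)
    print(merge_with_thesaurus(ranked, ["北京大厦"]))
```

The thesaurus bonus lets a dictionary-attested full form outrank higher-probability model guesses, which is one way to realize the "collaborative" merge. Enumerating the Cartesian product grows exponentially with abbreviation length. That cost is acceptable for short abbreviations, but longer inputs would call for beam search.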

This work is supported by the National Natural Science Foundation of China (Nos. 61672393, U1536204).


Notes

1. http://www.icl.pku.edu.cn/icl_groups/corpus/
2. http://www.sogou.com/labs/resource/
3. NLPIR: http://ictclas.nlpir.org/


Author information

Authors and Affiliations

Computer School, Wuhan University, Wuhan, 430072, China
Jinshuo Liu, Yusen Chen & Donghong Ji
International School of Software, Wuhan University, Wuhan, 430072, China
Juan Deng
University of Aberdeen, Aberdeen, AB24 3FX, UK
Jeff Pan


Corresponding author

Correspondence to Juan Deng.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Maosong Sun
Beijing University of Posts and Telecommunications, Beijing, China
Xiaojie Wang
Peking University, Beijing, China
Baobao Chang
Soochow University, Suzhou, China
Deyi Xiong


About this paper

Cite this paper

Liu, J., Chen, Y., Deng, J., Ji, D., Pan, J. (2017). Collaborative Recognition and Recovery of the Chinese Intercept Abbreviation. In: Sun, M., Wang, X., Chang, B., Xiong, D. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. CCL NLP-NABD 2017. Lecture Notes in Computer Science, vol. 10565. Springer, Cham. https://doi.org/10.1007/978-3-319-69005-6_19


DOI: https://doi.org/10.1007/978-3-319-69005-6_19
Published: 07 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69004-9
Online ISBN: 978-3-319-69005-6
eBook Packages: Computer Science, Computer Science (R0)
