Incorporating Lexicon for Named Entity Recognition of Traditional Chinese Medicine Books

Song, Bingyan; Bao, Zhenshan; Wang, YueZhang; Zhang, Wenbo; Sun, Chao

doi:10.1007/978-3-030-60457-8_39


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12431)

Included in the following conference series:

CCF International Conference on Natural Language Processing and Chinese Computing


Abstract

Little research has addressed Named Entity Recognition (NER) for Traditional Chinese Medicine (TCM) books, and most existing work relies on statistical models such as Conditional Random Fields (CRFs). These methods do not fully exploit lexicon information or large-scale unlabeled corpus data. To improve NER performance on TCM books, we propose a method based on the biLSTM-CRF model that incorporates lexicon information into the representation layer to enrich its semantic information. We compare our approach with several previous character-based and word-based methods. Experiments on the “Shanghan Lun” dataset show that our method outperforms previous models. In addition, because no large corpus was previously available in this field, we collected 376 TCM books to construct a large-scale corpus and used it to obtain pre-trained vectors. We have released the corpus and the pre-trained vectors to the public.
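The abstract does not say how lexicon information enters the representation layer. As a rough illustration only, the sketch below shows one common way to attach lexicon matches to individual characters: for every character, collect the lexicon words that cover it, grouped by whether the character begins (B), sits inside (M), ends (E), or alone forms (S) the word. The function name `match_lexicon`, the `max_len` parameter, and the toy lexicon are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def match_lexicon(chars, lexicon, max_len=4):
    """Collect, for each character, the lexicon words covering it.

    Words are grouped by the character's role in the matched word:
    B (begin), M (middle), E (end), S (single-character word).
    This is an assumed scheme, not the paper's confirmed method.
    """
    feats = [defaultdict(set) for _ in chars]
    n = len(chars)
    for start in range(n):
        # Try every candidate span of up to max_len characters.
        for end in range(start + 1, min(n, start + max_len) + 1):
            word = "".join(chars[start:end])
            if word not in lexicon:
                continue
            if end - start == 1:
                feats[start]["S"].add(word)
            else:
                feats[start]["B"].add(word)
                feats[end - 1]["E"].add(word)
                for mid in range(start + 1, end - 1):
                    feats[mid]["M"].add(word)
    return feats


# Hypothetical toy lexicon of TCM terms (illustrative only).
lexicon = {"桂枝", "桂枝汤", "汤"}
sentence = list("宜桂枝汤")
feats = match_lexicon(sentence, lexicon)

for ch, f in zip(sentence, feats):
    print(ch, {role: sorted(words) for role, words in f.items()})
```

In a full model, each of these word sets would typically be mapped to a vector (for example, by pooling pre-trained word embeddings) and concatenated with the character embedding before the biLSTM layer; that step is omitted here and is likewise an assumption about the general approach.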



Author information

Authors and Affiliations

College of Computer Science, Beijing University of Technology, Beijing, 100124, China
Bingyan Song, Zhenshan Bao, YueZhang Wang & Wenbo Zhang
College of Chinese Medicine, Capital Medical University, Beijing, 100069, China
Chao Sun


Corresponding author

Correspondence to Wenbo Zhang.

Editor information

Editors and Affiliations

ECE & Ingenuity Labs Research Institute, Queen’s University, Kingston, ON, Canada
Xiaodan Zhu
Department of Computer Science and Technology, Tsinghua University, Beijing, China
Min Zhang
School of Computer Science and Technology, Soochow University, Suzhou, China
Yu Hong
College of Intelligence and Computing, Tianjin University, Tianjin, China
Ruifang He


About this paper

Cite this paper

Song, B., Bao, Z., Wang, Y., Zhang, W., Sun, C. (2020). Incorporating Lexicon for Named Entity Recognition of Traditional Chinese Medicine Books. In: Zhu, X., Zhang, M., Hong, Y., He, R. (eds) Natural Language Processing and Chinese Computing. NLPCC 2020. Lecture Notes in Computer Science, vol 12431. Springer, Cham. https://doi.org/10.1007/978-3-030-60457-8_39


DOI: https://doi.org/10.1007/978-3-030-60457-8_39
Published: 02 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60456-1
Online ISBN: 978-3-030-60457-8
eBook Packages: Computer Science, Computer Science (R0)
