Abstract
Little research has addressed Named Entity Recognition (NER) for Traditional Chinese Medicine (TCM) books, and most existing work relies on statistical models such as Conditional Random Fields (CRFs). These methods do not fully exploit lexicon information or large-scale unlabeled corpora. To improve NER performance on TCM books, we propose a method based on the BiLSTM-CRF model that incorporates lexicon information into the representation layer to enrich its semantic information. We compared our approach with several previous character-based and word-based methods. Experiments on the “Shanghan Lun” dataset show that our method outperforms previous models. In addition, because no large corpus was previously available in this field, we collected 376 TCM books to build a large-scale corpus from which we obtained pre-trained vectors. We have released the corpus and the pre-trained vectors to the public.
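To make the idea of adding lexicon information at the representation layer more concrete, here is a minimal sketch of one common way to do it: for each character, collect the lexicon words in which it appears as the beginning (B), middle (M), end (E), or as a single-character word (S), then reduce those matches to a small feature vector that could be concatenated with the character embedding. This is an illustration under stated assumptions, not the authors' implementation; the function names, the `max_len` parameter, the count-based feature, and the toy lexicon are all hypothetical.

```python
from typing import Dict, List, Set


def lexicon_match_features(
    sentence: str, lexicon: Set[str], max_len: int = 4
) -> List[Dict[str, List[str]]]:
    """For each character, list the lexicon words that contain it,
    grouped by the character's position in the word (B/M/E/S).

    Hypothetical helper: the abstract does not specify how matches are encoded.
    """
    feats = [{"B": [], "M": [], "E": [], "S": []} for _ in sentence]
    n = len(sentence)
    for start in range(n):
        # Try every substring of length 1..max_len that starts here.
        for end in range(start + 1, min(n, start + max_len) + 1):
            word = sentence[start:end]
            if word not in lexicon:
                continue
            if len(word) == 1:
                feats[start]["S"].append(word)
            else:
                feats[start]["B"].append(word)
                feats[end - 1]["E"].append(word)
                for mid in range(start + 1, end - 1):
                    feats[mid]["M"].append(word)
    return feats


def bmes_counts(char_feats: Dict[str, List[str]]) -> List[int]:
    """Reduce one character's matches to a 4-dim count vector [B, M, E, S].

    Assumption: counts stand in for whatever lexicon embedding a real model
    would concatenate with the character embedding.
    """
    return [len(char_feats[tag]) for tag in ("B", "M", "E", "S")]


if __name__ == "__main__":
    toy_lexicon = {"太阳", "太阳病", "发热"}  # hypothetical toy TCM lexicon
    sentence = "太阳病发热"
    for ch, f in zip(sentence, lexicon_match_features(sentence, toy_lexicon)):
        print(ch, bmes_counts(f))
```

In a full model, a vector like this (or a pooled embedding of the matched words) would be joined to each character's pre-trained embedding before the BiLSTM layer, with the CRF layer decoding the tag sequence on top.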
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Song, B., Bao, Z., Wang, Y., Zhang, W., Sun, C. (2020). Incorporating Lexicon for Named Entity Recognition of Traditional Chinese Medicine Books. In: Zhu, X., Zhang, M., Hong, Y., He, R. (eds) Natural Language Processing and Chinese Computing. NLPCC 2020. Lecture Notes in Computer Science(), vol 12431. Springer, Cham. https://doi.org/10.1007/978-3-030-60457-8_39
DOI: https://doi.org/10.1007/978-3-030-60457-8_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60456-1
Online ISBN: 978-3-030-60457-8
eBook Packages: Computer Science, Computer Science (R0)