Abstract
Domain-specific knowledge graph construction and its applications are attracting growing attention from researchers. However, the lack of professional knowledge and term datasets restricts the development of domain-specific knowledge graphs. In the electric power field, knowledge graphs have proven effective in electric fault monitoring, power consumer service, and dispatching decision-making. Although electric power knowledge graphs have great application prospects, it is difficult for artificial intelligence experts to create the professional knowledge and terms needed for knowledge graph construction. To assist in building electric power knowledge graphs, we introduce a new Chinese electric power term dataset (ELETerm) containing 10,043 terms. We extract the terms from reliable data resources from the State Grid Jiangsu Electric Power Company Research Institute. Our approach includes four stages: word extraction, candidate term selection, term expansion, and dataset generation. We give the statistics and analysis of the dataset. The dataset is publicly available on GitHub under CC BY-SA 4.0.
Acknowledgements
This work was supported by the Science and Technology Project of State Grid Jiangsu Electric Power Co., Ltd. under Grant J2021129, "Research on the construction technology of relay protection knowledge graph".
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Yang, Y., Song, L., Zhuang, S., Chen, S., Li, J. (2022). ELETerm: A Chinese Electric Power Term Dataset. In: Sun, M., et al. Knowledge Graph and Semantic Computing: Knowledge Graph Empowers the Digital Economy. CCKS 2022. Communications in Computer and Information Science, vol 1669. Springer, Singapore. https://doi.org/10.1007/978-981-19-7596-7_17
DOI: https://doi.org/10.1007/978-981-19-7596-7_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-7595-0
Online ISBN: 978-981-19-7596-7
eBook Packages: Computer Science, Computer Science (R0)