Abstract
As the processing power of mobile terminals increases, wireless network applications such as voice assistants can move more context-sensitive tasks onto the terminals themselves, reducing both the wireless bandwidth required and the cost of cloud data storage. Co-reference annotation, which identifies expressions that share the same semantics in context, is one of the critical techniques for these tasks. However, existing co-reference annotation standards have three problems. First, the annotation is incomplete. Second, the types of annotated mentions are inconsistent. Third, there are currently no metrics for measuring completeness and consistency. After analyzing these issues, this paper proposes a new co-reference annotation standard that annotates more semantics and co-reference relations while adopting only two types of mentions. This paper also presents a performance evaluation corpus and designs three performance metrics for evaluating the new standard: the completeness of semantic annotation, the completeness of co-reference annotation, and the consistency of mention types. Experiments show that the new standard outperforms all baseline methods, achieving 0.95 in the completeness of semantic annotation, 0.68 in the completeness of co-reference annotation, and 0.57 in the consistency of mention types.
Data availability
All data generated or analyzed during this study are included in this published article and its supplementary information file.
Acknowledgements
The authors would like to thank the editors and the reviewers who made valuable comments that helped us improve this paper.
Funding
No funding was received for conducting this study.
Ethics declarations
Conflict of interest
The authors declare they have no financial interests.
About this article
Cite this article
Xu, Y., Farha, F., Wan, Y. et al. Improving completeness and consistency of co-reference annotation standard. Wireless Netw (2022). https://doi.org/10.1007/s11276-022-03077-8
DOI: https://doi.org/10.1007/s11276-022-03077-8