ChEMU 2021: Reaction Reference Resolution and Anaphora Resolution in Chemical Patents

Advances in Information Retrieval (ECIR 2021)

Abstract

Chemical patents serve as an indispensable source of information about new discoveries of chemical compounds. The ChEMU (Cheminformatics Elsevier Melbourne University) lab addresses information extraction over chemical patents, and aims to advance the state of the art on this topic. ChEMU lab 2021, as part of the 12th Conference and Labs of the Evaluation Forum (CLEF-2021), will be the second ChEMU lab. ChEMU 2021 will provide two distinct tasks related to reference resolution in chemical patents. Task 1—Chemical Reaction Reference Resolution—focuses on paragraph-level references and aims to identify the chemical reactions or general conditions specified in one reaction description referred to by another. Task 2—Anaphora Resolution—focuses on expression-level references and aims to identify the reference relationships between expressions in chemical reaction descriptions. In this paper, we introduce ChEMU 2021, including its motivation, goals, tasks, resources, and evaluation framework.

Notes

  1. Our main website is http://chemu.eng.unimelb.edu.au.

  2. Reaxys® Copyright ©2020 Elsevier Limited except certain content provided by third parties. Reaxys is a trademark of Elsevier Limited. https://www.reaxys.com.

Acknowledgements

Funding for the ChEMU project is provided by an Australian Research Council Linkage Project, project number LP160101469, and Elsevier.

Author information

Corresponding author

Correspondence to Karin Verspoor.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

He, J. et al. (2021). ChEMU 2021: Reaction Reference Resolution and Anaphora Resolution in Chemical Patents. In: Hiemstra, D., Moens, MF., Mothe, J., Perego, R., Potthast, M., Sebastiani, F. (eds) Advances in Information Retrieval. ECIR 2021. Lecture Notes in Computer Science, vol 12657. Springer, Cham. https://doi.org/10.1007/978-3-030-72240-1_71

  • DOI: https://doi.org/10.1007/978-3-030-72240-1_71

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-72239-5

  • Online ISBN: 978-3-030-72240-1

  • eBook Packages: Computer Science, Computer Science (R0)
