A Framework to Automatically Extract Funding Information from Text

Kayal, Subhradeep; Afzal, Zubair; Tsatsaronis, George; Doornenbal, Marius; Katrenko, Sophia; Gregory, Michelle

doi:10.1007/978-3-030-13709-0_27

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11331)

Included in the following conference series:

International Conference on Machine Learning, Optimization, and Data Science

Abstract

Many would argue that the currency of research is citations; however, researchers and funding organizations alike lack tools for exploring how this currency translates into funding opportunities. Motivated by this need, we address one of the fundamental problems in developing such a tool: automatically extracting funding information from scientific articles. To this end, we experiment with a two-stage framework that ingests text, selects the paragraphs that contain funding information, and then detects named entities with a novel ensemble of sequential learning methods. We present a comparative analysis of each independent component of this pipeline, named FundingFinder. The results indicate that the pipeline can extract funding organizations and their associated grants from scientific articles accurately and efficiently.
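The two-stage flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' FundingFinder implementation: the abstract does not say how paragraphs are filtered or how the sequential models are combined, so a keyword heuristic stands in for the paragraph filter, three toy rule-based taggers stand in for trained sequence labellers, and strict majority voting stands in for the ensemble. All names below (`FUNDING_CUES`, `has_funding_cue`, `majority_vote`, `extract_funding`, the `ORG`/`GRANT`/`O` labels) are hypothetical.

```python
import re
from collections import Counter
from typing import Callable, List, Tuple

Label = str  # "ORG", "GRANT", or "O" (outside any entity); hypothetical label set
Tagger = Callable[[List[str]], List[Label]]

# Stage 1 stand-in (assumption): a keyword heuristic instead of a trained filter.
FUNDING_CUES = re.compile(r"\b(funded|supported|grant|fellowship)\b", re.IGNORECASE)


def has_funding_cue(paragraph: str) -> bool:
    """Return True if the paragraph looks like it mentions funding."""
    return bool(FUNDING_CUES.search(paragraph))


# Stage 2 stand-ins (assumption): toy rules in place of trained sequence labellers.
def lexicon_tagger(tokens: List[str]) -> List[Label]:
    known_orgs = {"NSF", "NIH", "ERC"}
    return ["GRANT" if any(c.isdigit() for c in t)
            else "ORG" if t in known_orgs else "O" for t in tokens]


def pattern_tagger(tokens: List[str]) -> List[Label]:
    return ["GRANT" if re.match(r"[A-Z]*-?\d+", t)
            else "ORG" if t.isalpha() and t.isupper() and len(t) > 1 else "O"
            for t in tokens]


def shape_tagger(tokens: List[str]) -> List[Label]:
    return ["GRANT" if t.isdigit()
            else "ORG" if t[:1].isupper() else "O" for t in tokens]


def majority_vote(predictions: List[List[Label]]) -> List[Label]:
    """Keep a label per token only if more than half of the taggers agree."""
    voted = []
    for labels in zip(*predictions):
        label, count = Counter(labels).most_common(1)[0]
        voted.append(label if count > len(labels) / 2 else "O")
    return voted


def extract_funding(paragraphs: List[str], taggers: List[Tagger]) -> List[Tuple[str, Label]]:
    """Filter paragraphs, tag the survivors, and return the voted entities."""
    entities = []
    for paragraph in paragraphs:
        if not has_funding_cue(paragraph):
            continue  # stage 1: drop paragraphs without funding cues
        tokens = re.findall(r"[\w-]+", paragraph)
        labels = majority_vote([tagger(tokens) for tagger in taggers])
        entities.extend((tok, lab) for tok, lab in zip(tokens, labels) if lab != "O")
    return entities


TAGGERS = [lexicon_tagger, pattern_tagger, shape_tagger]
```

Calling `extract_funding(["This work was funded by NSF under grant 1234567."], TAGGERS)` keeps only tokens on which at least two of the three stand-in taggers agree. A real system would replace both stages with trained models; the sketch only shows how a filter and a voting ensemble fit together.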



Author information

Authors and Affiliations

Content and Innovation Group, Elsevier B.V., Amsterdam, Netherlands
Subhradeep Kayal, Zubair Afzal, George Tsatsaronis, Marius Doornenbal, Sophia Katrenko & Michelle Gregory

Corresponding author

Correspondence to Subhradeep Kayal.

Editor information

Editors and Affiliations

University of Catania, Catania, Italy and University of Reading, Reading, UK
Giuseppe Nicosia
University of Florida, Gainesville, FL, USA
Panos Pardalos
University of Catania, Catania, Italy
Giovanni Giuffrida
Harvard University, Cambridge, MA, USA
Renato Umeton
IBM, Tivoli Research Lab, Rome, Italy
Vincenzo Sciacca

About this paper

Cite this paper

Kayal, S., Afzal, Z., Tsatsaronis, G., Doornenbal, M., Katrenko, S., Gregory, M. (2019). A Framework to Automatically Extract Funding Information from Text. In: Nicosia, G., Pardalos, P., Giuffrida, G., Umeton, R., Sciacca, V. (eds) Machine Learning, Optimization, and Data Science. LOD 2018. Lecture Notes in Computer Science, vol 11331. Springer, Cham. https://doi.org/10.1007/978-3-030-13709-0_27

DOI: https://doi.org/10.1007/978-3-030-13709-0_27
Published: 14 February 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-13708-3
Online ISBN: 978-3-030-13709-0
eBook Packages: Computer Science, Computer Science (R0)