An Ontology-Enabled Natural Language Processing Pipeline for Provenance Metadata Extraction from Biomedical Text (Short Paper)

Valdez, Joshua; Rueschman, Michael; Kim, Matthew; Redline, Susan; Sahoo, Satya S.

doi:10.1007/978-3-319-48472-3_43

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10033)

Included in the following conference series:

OTM Confederated International Conferences "On the Move to Meaningful Internet Systems"

Abstract

Extracting structured information from biomedical literature is challenging because of the complexity of the biomedical domain and the lack of appropriate natural language processing (NLP) techniques. High-quality domain ontologies model both data and metadata at a fine level of granularity, and they can be used to accurately extract structured information from biomedical text. Extracting provenance metadata, which describes the history or source of information, from published articles is an important task in support of scientific reproducibility. The reproducibility of results reported by previous research studies is a foundational component of scientific advancement, as highlighted by the recent US National Institutes of Health initiative called “Principles of Rigor and Reproducibility”. In this paper, we describe an approach to extracting provenance metadata from published biomedical research literature using an ontology-enabled NLP platform developed as part of the Provenance for Clinical and Healthcare Research (ProvCaRe) project. The ProvCaRe-NLP tool extends the clinical Text Analysis and Knowledge Extraction System (cTAKES) platform using both provenance and biomedical domain ontologies. We evaluate the ProvCaRe-NLP tool on a corpus of 20 peer-reviewed publications. The results show that the ProvCaRe-NLP tool has significantly higher recall in extracting provenance metadata than existing NLP pipelines such as MetaMap.
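
The general idea of ontology-driven extraction and recall-based evaluation summarized above can be illustrated with a minimal sketch. It is not the ProvCaRe-NLP implementation and does not use cTAKES or MetaMap: it assumes a plain dictionary lookup of ontology class labels, and the class names (`ex:StudyDesign`, `ex:Instrument`, `ex:Population`), the labels, the sample sentence, and the gold annotations are all hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical ontology fragment: class name -> surface labels.
# These names are illustrative only and are not taken from the ProvCaRe ontology.
LABELS: Dict[str, List[str]] = {
    "ex:StudyDesign": ["randomized controlled trial"],
    "ex:Instrument": ["polysomnography", "actigraphy"],
}


def annotate(text: str, labels: Dict[str, List[str]]) -> List[Tuple[int, int, str]]:
    """Return sorted (start, end, class) spans for case-insensitive whole-word label matches."""
    spans = []
    for cls, terms in labels.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                spans.append((match.start(), match.end(), cls))
    return sorted(spans)


def recall(predicted: Set[str], gold: Set[str]) -> float:
    """Fraction of gold-standard items that were also predicted."""
    if not gold:
        return 0.0
    return len(predicted & gold) / len(gold)


# Hypothetical sentence and gold annotations, for illustration only.
TEXT = (
    "Participants were enrolled in a randomized controlled trial "
    "and monitored with polysomnography."
)
GOLD = {"ex:StudyDesign", "ex:Instrument", "ex:Population"}

found = {cls for _, _, cls in annotate(TEXT, LABELS)}
print(sorted(found), recall(found, GOLD))
```

In this toy setting, a class with no label in the dictionary (here `ex:Population`) is never found, which lowers recall. This is one intuition for why adding provenance-specific ontology terms to a pipeline's vocabulary could increase recall.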

Notes

1. Courier New font is used to represent ontology classes. The provcare namespace refers to http://www.case.edu/ProvCaRe/provcare. The ProvCaRe ontology is available at: https://sites.google.com/a/case.edu/bmhinformaticsgroup/research/provcare/.

Acknowledgement

This work is supported in part by the NIH-NIBIB Big Data to Knowledge (BD2K) 1U01EB020955 and NIH-NHLBI R24 HL114473 grants.

Author information

Authors and Affiliations

Division of Medical Informatics and Electrical Engineering and Computer Science Department, Case Western Reserve University, Cleveland, OH, USA
Joshua Valdez & Satya S. Sahoo
Departments of Medicine, Brigham and Women’s Hospital and Beth Israel Deaconess Medical Center, Harvard University, Boston, MA, USA
Michael Rueschman, Matthew Kim & Susan Redline

Corresponding author

Correspondence to Satya S. Sahoo.

Editor information

Editors and Affiliations

ADAPT Centre, Trinity College Dublin, Dublin 2, Ireland
Christophe Debruyne
University of Lorraine, Vandoeuvre-les-Nancy, France
Hervé Panetto
TU Graz, Graz, Austria
Robert Meersman
La Trobe University, Melbourne, Australia
Tharam Dillon
Institute of Computer Languages, TU Wien, Vienna, Austria
eva Kühn
ADAPT Centre, Trinity College Dublin, Dublin 2, Ireland
Declan O'Sullivan
Università degli Studi di Milano Crema, Crema, Italy
Claudio Agostino Ardagna

About this paper

Cite this paper

Valdez, J., Rueschman, M., Kim, M., Redline, S., Sahoo, S.S. (2016). An Ontology-Enabled Natural Language Processing Pipeline for Provenance Metadata Extraction from Biomedical Text (Short Paper). In: Debruyne, C., et al. On the Move to Meaningful Internet Systems: OTM 2016 Conferences. OTM 2016. Lecture Notes in Computer Science, vol 10033. Springer, Cham. https://doi.org/10.1007/978-3-319-48472-3_43

DOI: https://doi.org/10.1007/978-3-319-48472-3_43
Published: 18 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48471-6
Online ISBN: 978-3-319-48472-3
eBook Packages: Computer Science, Computer Science (R0)
