Analysis of Protein/Protein Interactions Through Biomedical Literature: Text Mining of Abstracts vs. Text Mining of Full Text Articles

Martin, Eric P. G.; Bremer, Eric G.; Guerin, Marie-Claude; DeSesa, Catherine; Jouve, Olivier

doi:10.1007/978-3-540-30478-4_9

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3303)

Included in the following conference series:

International Symposium on Knowledge Exploration in Life Science Informatics

Abstract

The challenge of knowledge management in the pharmaceutical industry is twofold. First, it must integrate sequence data with the vast and growing body of data from functional analysis of genes and with the information held in huge historical archival databases. Second, as the number of biomedical publications increases exponentially (Medline now contains more than 13 million records), researchers need assistance to broaden their view and understanding of scientific domains. Just as data mining uncovers relationships in information, text mining uncovers relationships in a text collection and leverages the creativity of the knowledge worker in exploring these relationships and discovering new knowledge. We describe a text mining method that automatically detects protein interactions described across a large number of scientific publications. The method relies on natural language processing to identify protein names, their synonyms, and the various interactions they can have with other proteins. We then compared text mining of abstracts with the same analysis of full-text articles to assess how much information is lost when only abstracts are processed. Our results show that: 1) LexiQuest Mine is a very versatile and accurate tool for mining biomedical literature to analyze interactions between proteins; 2) mining only abstracts can be sufficient and time-saving for applications that do not require a high level of detail on a large scale, whereas mining full-text articles is preferable for more exhaustive applications designed to address a specific issue. Availability: LexiQuest Mine is available for commercial licensing from SPSS, Inc.
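
To make the idea in the abstract concrete, the following is a minimal Python sketch of dictionary-based protein name normalization, sentence-level interaction detection, and an abstract versus full-text comparison. It is not LexiQuest Mine's method, which this page does not describe in detail: the synonym dictionary, the list of interaction verb stems, the function names `extract_interactions` and `abstract_coverage`, and the co-occurrence heuristic are all assumptions chosen for illustration.

```python
import re
from itertools import combinations

# Hypothetical synonym dictionary: lowercase surface form -> canonical name.
# A real system would use a much larger curated lexicon.
SYNONYMS = {
    "p53": "TP53",
    "tp53": "TP53",
    "mdm2": "MDM2",
    "hdm2": "MDM2",
    "p21": "CDKN1A",
    "cdkn1a": "CDKN1A",
}

# Hypothetical interaction verb stems (assumed, not taken from the paper).
INTERACTION_STEMS = ("bind", "interact", "inhibit", "activat", "phosphorylat")

TOKEN_RE = re.compile(r"[A-Za-z0-9]+")
SENTENCE_RE = re.compile(r"(?<=[.!?])\s+")  # naive sentence splitter


def extract_interactions(text):
    """Return a set of (protein_a, verb_stem, protein_b) triples.

    Two proteins are reported as interacting when both appear in the same
    sentence as a token starting with an interaction verb stem. Protein
    names are canonicalized and sorted, so the triple is undirected.
    """
    found = set()
    for sentence in SENTENCE_RE.split(text):
        tokens = [t.lower() for t in TOKEN_RE.findall(sentence)]
        proteins = sorted({SYNONYMS[t] for t in tokens if t in SYNONYMS})
        stems = {s for s in INTERACTION_STEMS for t in tokens if t.startswith(s)}
        for a, b in combinations(proteins, 2):
            for stem in stems:
                found.add((a, stem, b))
    return found


def abstract_coverage(abstract, full_text):
    """Fraction of interactions found in the full text that the abstract also yields."""
    full = extract_interactions(full_text)
    if not full:
        return 0.0
    return len(extract_interactions(abstract) & full) / len(full)
```

Under these assumptions, calling `abstract_coverage(abstract, full_text)` on a paper whose abstract mentions only one of three interactions found in the body would return one third, which is one simple way to quantify how much information is lost when only abstracts are processed.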

Author information

Authors and Affiliations

SPSS, Tour Europlazza, La Défense 4, F-92925 Cedex, Paris-la-Défense, France
Eric P. G. Martin, Marie-Claude Guerin & Olivier Jouve
Children’s Memorial Hospital and Northwestern University, Chicago, IL, 60614, USA
Eric G. Bremer
SPSS, 233 S. Wacker Drive, Chicago, IL, 60606, USA
Catherine DeSesa

Editor information

Editors and Affiliations

Faculty of Sciences, School of Mathematics and Computing, University of Southern Queensland, 4350, Toowoomba, QLD, Australia
Jesús A. López
Istituto di Ricerche Farmacologiche “Mario Negri”, Via Eritrea 62, 20157, Milano, Italy
Emilio Benfenati
School of Biomedical Sciences, Bioinformatics Research Group, University of Ulster, Cromore Road, BT52 1SA, Coleraine, Northern Ireland, UK
Werner Dubitzky

Cite this paper

Martin, E.P.G., Bremer, E.G., Guerin, M.-C., DeSesa, C., Jouve, O. (2004). Analysis of Protein/Protein Interactions Through Biomedical Literature: Text Mining of Abstracts vs. Text Mining of Full Text Articles. In: López, J.A., Benfenati, E., Dubitzky, W. (eds) Knowledge Exploration in Life Science Informatics. KELSI 2004. Lecture Notes in Computer Science, vol 3303. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30478-4_9

DOI: https://doi.org/10.1007/978-3-540-30478-4_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23927-7
Online ISBN: 978-3-540-30478-4
eBook Packages: Springer Book Archive
