Reconstruction of Protein-Protein Interaction Pathways by Mining Subject-Verb-Objects Intermediates

Ling, Maurice HT; Lefevre, Christophe; Nicholas, Kevin R.; Lin, Feng

doi:10.1007/978-3-540-75286-8_28

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4774)

Included in the following conference series:

IAPR International Workshop on Pattern Recognition in Bioinformatics

Abstract

The exponential increase in the publication rate of new articles is limiting researchers' access to relevant literature. This has prompted the use of text mining tools to extract key biological information. Previous studies have reported extensive modification of existing generic text processors to process biological text. However, the need for such modification has not been examined. In this study, we constructed Muscorian using MontyLingua, a generic text processor. Muscorian follows a previously proposed two-layered generalization-specialization paradigm, in which text is first generically processed into a suitable intermediate format and domain-specific data extraction techniques are then applied at the specialization layer. Evaluation using a corpus and experts indicated 86-90% precision and approximately 30% recall in extracting protein-protein interactions, which is comparable to previous studies using either specialized biological text processing tools or modified existing tools. Our study also demonstrates the flexibility of the two-layered generalization-specialization paradigm by using the same generalization layer for two specialized information extraction tasks.
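
The two-layered idea in the abstract can be illustrated with a minimal Python sketch. It is not Muscorian's implementation and does not call MontyLingua. It assumes the generalization layer has already produced subject-verb-object triples, and it shows only how one set of triples could feed two different specialization filters, plus how precision and recall would be computed against a gold standard. The names `SVO`, `specialize` and `precision_recall`, the protein names (`ProtA`, `ProtB`, `ProtC`), the verb lists and the second (localization) task are all hypothetical and chosen purely for illustration.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class SVO(NamedTuple):
    """One subject-verb-object triple (assumed output of the generalization layer)."""
    subject: str
    verb: str
    obj: str


def specialize(triples: Iterable[SVO], subjects: Set[str],
               objects: Set[str], verbs: Set[str]) -> List[SVO]:
    """Specialization layer: keep triples matching task-specific vocabularies."""
    return [
        t for t in triples
        if t.subject.lower() in subjects
        and t.obj.lower() in objects
        and t.verb.lower() in verbs
    ]


def precision_recall(extracted: Iterable[SVO], gold: Set[SVO]) -> Tuple[float, float]:
    """Precision = TP / extracted, recall = TP / gold (0.0 when the denominator is empty)."""
    found = set(extracted)
    true_positives = len(found & gold)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy intermediate triples; placeholder names, not results from the paper.
TRIPLES = [
    SVO("ProtA", "activate", "ProtB"),
    SVO("ProtC", "localize", "nucleus"),
    SVO("researchers", "study", "ProtA"),
]
PROTEINS = {"prota", "protb", "protc"}

# Task 1 (hypothetical vocabulary): protein-protein interactions.
ppi_hits = specialize(TRIPLES, PROTEINS, PROTEINS, {"activate", "bind", "inhibit"})

# Task 2 (hypothetical): a different extraction task over the same triples.
loc_hits = specialize(TRIPLES, PROTEINS, {"nucleus", "membrane"}, {"localize"})

# Toy gold standard containing one interaction the filter cannot find.
GOLD = {SVO("ProtA", "activate", "ProtB"), SVO("ProtB", "bind", "ProtC")}

print(ppi_hits)
print(loc_hits)
print(precision_recall(ppi_hits, GOLD))
```

In this toy setup only the vocabularies change between the two tasks while the triples stay fixed, which is the kind of reuse the abstract attributes to a shared generalization layer.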

Author information

Authors and Affiliations

BioInformatics Research Centre, Nanyang Technological University, Singapore
Maurice HT Ling & Feng Lin
CRC for Innovative Dairy Products, Department of Zoology, The University of Melbourne, Australia
Maurice HT Ling & Kevin R. Nicholas
Victorian Bioinformatics Consortium, Monash University, Australia
Christophe Lefevre

Editor information

Jagath C. Rajapakse, Bertil Schmidt, Gwenn Volkert

Cite this paper

Ling, M.H., Lefevre, C., Nicholas, K.R., Lin, F. (2007). Reconstruction of Protein-Protein Interaction Pathways by Mining Subject-Verb-Objects Intermediates. In: Rajapakse, J.C., Schmidt, B., Volkert, G. (eds) Pattern Recognition in Bioinformatics. PRIB 2007. Lecture Notes in Computer Science, vol 4774. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75286-8_28

DOI: https://doi.org/10.1007/978-3-540-75286-8_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75285-1
Online ISBN: 978-3-540-75286-8
eBook Packages: Computer Science, Computer Science (R0)
