Virus Causes Flu: Identifying Causality in the Biomedical Domain Using an Ensemble Approach with Target-Specific Semantic Embeddings

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2021)

Abstract

Identification of the Cause-Effect (CE) relation is crucial for creating a scientific knowledge-base and facilitating question-answering in the biomedical domain. An example sentence with a CE relation from the biomedical domain (specifically, Leukemia) is: "viability of THP-1 cells was inhibited by COR". Here, "COR" is the cause argument, "viability of THP-1 cells" is the effect argument, and "inhibited" is the trigger word that creates the causal scenario. Notably, a CE relation imposes a temporal order between the cause and effect arguments. In this paper, we harness this property and hypothesize that the temporal order of the CE relation can be captured well by a Long Short Term Memory (LSTM) network with independently obtained semantic embeddings of words trained on data for the targeted disease. These focused semantic embeddings of words overcome the labeled data requirement of the LSTM network. We extensively validate our hypothesis using three types of word embeddings, viz., GloVe, PubMed, and target-specific, where the target (focus) is Leukemia. We obtain a statistically significant improvement in performance with LSTM using GloVe and target-specific embeddings over other baseline models. Furthermore, we show that an ensemble of LSTM models gives a significant improvement (about 3%) over the individual models as per the t-test. Using a rule-based approach, the results of our CE relation classification system yield a knowledge-base of 277,478 CE relation mentions.
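As a rough illustration of the ensemble step and the significance check, the sketch below combines the labels of three classifiers by majority vote and computes a paired t statistic over per-fold scores. The abstract does not say how the ensemble combines its LSTM models, so majority voting, the function names, and all numbers here are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
from math import sqrt
from statistics import mean, stdev


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label list by majority vote."""
    n_items = len(predictions_per_model[0])
    if any(len(p) != n_items for p in predictions_per_model):
        raise ValueError("all models must predict the same number of items")
    ensemble = []
    for labels in zip(*predictions_per_model):
        # Pick the most frequent label; an odd number of binary voters cannot tie.
        ensemble.append(Counter(labels).most_common(1)[0][0])
    return ensemble


def accuracy(predicted, gold):
    """Fraction of items whose predicted label matches the gold label."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def paired_t_statistic(scores_a, scores_b):
    """t statistic for paired samples, e.g. per-fold scores of two systems."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Toy labels, invented for illustration only (1 = causal, 0 = not causal).
gold = [1, 0, 1, 1, 0, 1]
glove_lstm = [1, 0, 0, 1, 0, 1]    # hypothetical LSTM with GloVe embeddings
pubmed_lstm = [1, 1, 1, 0, 0, 1]   # hypothetical LSTM with PubMed embeddings
target_lstm = [0, 0, 1, 1, 0, 1]   # hypothetical LSTM with target-specific embeddings

ensemble = majority_vote([glove_lstm, pubmed_lstm, target_lstm])
print("ensemble accuracy:", accuracy(ensemble, gold))

# Hypothetical per-fold accuracies of the ensemble and of one individual model.
t_value = paired_t_statistic([0.80, 0.82, 0.81], [0.77, 0.80, 0.78])
print("paired t statistic:", round(t_value, 3))
```

In this toy setting each model errs on a different sentence, so the vote corrects every individual mistake. The t statistic would still need to be compared against a t distribution with n - 1 degrees of freedom to obtain a p-value.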

Notes

  1. Causal questions are frequently asked on the Web in general. Naver Knowledge iN (http://kin.naver.com) reported 130,000 causal questions from a 950,000 sentence-sized database [18].

  2. https://en.wikipedia.org/wiki/Leukemia.

  3. Download: https://nlp.stanford.edu/projects/glove/.

  4. Available for download: http://evexdb.org/pmresources/vec-space-models/.

References

  1. Ananiadou, S., Mcnaught, J.: Text mining for biology and biomedicine. Citeseer (2006)

  2. Berry, K.J., Mielke, P.W., Jr.: A generalization of Cohen’s kappa agreement measure to interval measurement and multiple raters. Educ. Psychol. Meas. 48(4), 921–933 (1988)

  3. Chang, D.-S., Choi, K.-S.: Causal relation extraction using cue phrase and lexical pair probabilities. In: Su, K.-Y., Tsujii, J., Lee, J.-H., Kwong, O.Y. (eds.) IJCNLP 2004. LNCS (LNAI), vol. 3248, pp. 61–70. Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-30211-7_7

  4. Cohen, K.B., Hunter, L.: Getting started in text mining. PLoS Comput. Biol. 4(1), e20 (2008)

  5. Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman, R.: Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41(6), 391 (1990)

  6. Do, Q.X., Chan, Y.S., Roth, D.: Minimally supervised event causality identification. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 294–303. Association for Computational Linguistics (2011)

  7. Garcia, D.: COATIS, an NLP system to locate expressions of actions connected by causality links. In: Plaza, E., Benjamins, R. (eds.) EKAW 1997. LNCS, vol. 1319, pp. 347–352. Springer, Heidelberg (1997). https://doi.org/10.1007/BFb0026799

  8. Girju, R.: Automatic detection of causal relations for question answering. In: Proceedings of the ACL 2003 Workshop on Multilingual Summarization and Question Answering-Volume 12, pp. 76–83. Association for Computational Linguistics (2003)

  9. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)

  10. Joskowicz, L., Ksiezyck, T., Grishman, R.: Deep domain models for discourse analysis. In: AI Systems in Government Conference, 1989, Proceedings of the Annual, pp. 195–200. IEEE (1989)

  11. Kaplan, R.M., Berry-Rogghe, G.: Knowledge-based acquisition of causal relationships in text. Knowl. Acquisition 3(3), 317–337 (1991)

  12. Khoo, C.S., Kornfilt, J., Oddy, R.N., Myaeng, S.H.: Automatic extraction of cause-effect information from newspaper text without knowledge-based inferencing. Literary Linguist. Comput. 13(4), 177–186 (1998)

  13. Kim, H.D., et al.: InCaToMi: integrative causal topic miner between textual and non-textual time series data. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 2689–2691. ACM (2012)

  14. Lilleberg, J., Zhu, Y., Zhang, Y.: Support vector machines and word2vec for text classification with semantic features. In: 2015 IEEE 14th International Conference on Cognitive Informatics & Cognitive Computing (ICCI*CC), pp. 136–140. IEEE (2015)

  15. Mihăilă, C., Ananiadou, S.: Recognising discourse causality triggers in the biomedical domain. J. Bioinform. Comput. Biol. 11(06), 1343008 (2013)

  16. Mihăilă, C., Ohta, T., Pyysalo, S., Ananiadou, S.: BioCause: annotating and analysing causality in the biomedical domain. BMC Bioinform. 14(1), 2 (2013)

  17. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)

  18. Moldovan, D., Paşca, M., Harabagiu, S., Surdeanu, M.: Performance issues and error analysis in an open-domain question answering system. ACM Trans. Inf. Syst. (TOIS) 21(2), 133–154 (2003)

  19. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12(Oct), 2825–2830 (2011)

  20. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. EMNLP 14, 1532–1543 (2014)

  21. Radinsky, K., Davidovich, S., Markovitch, S.: Learning causality from textual data. In: Proceedings of Learning by Reading for Intelligent Question Answering Conference (2011)

  22. Sharma, R., Palshikar, G., Pawar, S.: An unsupervised approach for cause-effect relation extraction from biomedical text. In: Silberztein, M., Atigui, F., Kornyshova, E., Métais, E., Meziane, F. (eds.) NLDB 2018. LNCS, vol. 10859, pp. 419–427. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-91947-8_43

  23. Yin, Y., Jin, Z.: Document sentiment classification based on the word embedding. In: 4th International Conference on Mechatronics, Materials, Chemistry and Computer Engineering (2015)

Author information

Corresponding author

Correspondence to Raksha Sharma.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Sharma, R., Palshikar, G. (2021). Virus Causes Flu: Identifying Causality in the Biomedical Domain Using an Ensemble Approach with Target-Specific Semantic Embeddings. In: Métais, E., Meziane, F., Horacek, H., Kapetanios, E. (eds) Natural Language Processing and Information Systems. NLDB 2021. Lecture Notes in Computer Science, vol 12801. Springer, Cham. https://doi.org/10.1007/978-3-030-80599-9_9

  • DOI: https://doi.org/10.1007/978-3-030-80599-9_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-80598-2

  • Online ISBN: 978-3-030-80599-9

  • eBook Packages: Computer Science, Computer Science (R0)
