
Identification of concept domains and its application in biomedical information retrieval

  • Original Article
  • Information Systems and e-Business Management

Abstract

With the explosive growth in the volume of biomedical information, there is an increasing need for effective and efficient indexing and retrieval tools. Automatic indexing and retrieval in the biomedical domain faces several challenges, such as recognizing the terms that denote concepts and disambiguating those terms. In this paper, we focus on identifying the (sub-)domains of concepts in ontologies. We propose two algorithms for identifying the most appropriate (sub-)domain of a concept in the context of a document or query, and we integrate these methods into a semantic indexing and retrieval framework. An experimental evaluation on the OHSUMED collection shows that our semantic indexing and retrieval approaches outperform the state-of-the-art approach.
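To make the task concrete, the sketch below illustrates what picking a (sub-)domain for a concept in the context of a document can look like. It is not either of the two algorithms proposed in the paper, which are not described in this abstract. It is a simple voting heuristic, and the concept names, domain labels, and the function `pick_domain` are all hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy data (not taken from MeSH or from the paper):
# each concept maps to the set of sub-domains it may belong to.
CONCEPT_DOMAINS = {
    "Cold": {"Diseases", "Phenomena and Processes"},
    "Rhinovirus": {"Organisms"},
    "Cough": {"Diseases"},
    "Fever": {"Diseases"},
}


def pick_domain(concept, context_concepts, concept_domains):
    """Return the candidate sub-domain of `concept` that is shared by
    the largest number of other concepts in the same document or query."""
    candidates = concept_domains[concept]
    votes = Counter()
    for other in context_concepts:
        if other == concept:
            continue
        # Each context concept votes for the candidate domains it also belongs to.
        for domain in concept_domains.get(other, ()):
            if domain in candidates:
                votes[domain] += 1
    if not votes:
        # No supporting context: fall back to a deterministic choice.
        return min(candidates)
    best = max(votes.values())
    # Break ties alphabetically so the result is reproducible.
    return min(d for d, v in votes.items() if v == best)


document = ["Cold", "Cough", "Fever", "Rhinovirus"]
print(pick_domain("Cold", document, CONCEPT_DOMAINS))  # prints "Diseases"
```

Here the ambiguous concept "Cold" is assigned to "Diseases" because two other concepts in the document share that sub-domain, while none support the alternative. A real system would presumably weight this evidence rather than count it equally.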


Notes

  1. In MeSH, a main heading is the preferred term denoting a concept; it can also be referred to as the concept name.

  2. http://www.cismef.org/.

  3. http://sourceforge.net/projects/irtoolkit/.

  4. http://sourceforge.net/projects/cxtractor/.

  5. http://www.just-the-word.com/.

  6. http://www.natcorp.ox.ac.uk/.

  7. http://www.sketchengine.co.uk/.

  8. http://www.senseval.org/.

  9. http://biocreative.sourceforge.net/.


Author information


Corresponding author

Correspondence to Duy Dinh.


Cite this article

Dinh, D., Tamine, L. Identification of concept domains and its application in biomedical information retrieval. Inf Syst E-Bus Manage 13, 647–672 (2015). https://doi.org/10.1007/s10257-014-0259-y


  • DOI: https://doi.org/10.1007/s10257-014-0259-y
