Bridging the Gap – Using External Knowledge Bases for Context-Aware Document Retrieval

  • Conference paper
Digital Libraries: Social Media and Community Networks (ICADL 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8279)

Abstract

Today, a vast amount of information is available on the Web as unstructured text indexed by Web search engines. However, especially for searches on abstract concepts or context terms, a simple keyword-based Web search may compromise retrieval quality, because the query terms do not necessarily occur in the texts (the vocabulary problem). The state-of-the-art solution is query expansion, which increases recall but often also causes a steep decrease in retrieval precision. This decrease is a severe problem for digital library providers: in libraries it is vital to ensure high-quality retrieval that meets current standards. In this paper we present an approach that supports even abstract context searches (conceptual queries) with high retrieval quality by using Wikipedia to semantically bridge the gap between query terms and textual content. We do not expand queries; instead, we extract the most important terms from each text document in a focused Web collection and enrich them with features gathered from Wikipedia. These enriched terms are then used to compute the relevance of a document with respect to a conceptual query. The evaluation shows significant improvements over query expansion approaches: the overall retrieval quality is increased up to 74.5% in mean average precision.
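
The document-side enrichment idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which Wikipedia features are used or how relevance is computed. The feature table TOY_WIKI_FEATURES, the frequency-based term selection, and the functions top_terms, enrich, and relevance are all hypothetical stand-ins chosen for illustration.

```python
from collections import Counter

# Hypothetical stand-in for features gathered from Wikipedia (for example,
# category labels of the article matching a term). Not real Wikipedia data.
TOY_WIKI_FEATURES = {
    "aspirin": {"analgesics", "pharmacology"},
    "ibuprofen": {"analgesics", "pharmacology"},
    "benzene": {"hydrocarbons", "solvents"},
    "toluene": {"hydrocarbons", "solvents"},
}


def top_terms(text, k=3):
    """Assumed importance measure: the k most frequent known terms."""
    tokens = [t.strip(".,;:").lower() for t in text.split()]
    counts = Counter(t for t in tokens if t in TOY_WIKI_FEATURES)
    return [term for term, _ in counts.most_common(k)]


def enrich(terms):
    """Attach the external features to each extracted term."""
    return {term: TOY_WIKI_FEATURES.get(term, set()) for term in terms}


def relevance(query_concept, doc_text):
    """Assumed score: share of top terms whose features contain the concept."""
    enriched = enrich(top_terms(doc_text))
    if not enriched:
        return 0.0
    hits = sum(1 for feats in enriched.values() if query_concept in feats)
    return hits / len(enriched)


docs = {
    "d1": "Aspirin and ibuprofen dosage. Aspirin interactions.",
    "d2": "Benzene and toluene as solvents; benzene toxicity.",
}

# The query is left untouched (no query expansion); only documents are enriched.
ranking = sorted(docs, key=lambda d: relevance("analgesics", docs[d]), reverse=True)
print(ranking)
```

In this toy collection the conceptual query "analgesics" never occurs literally in either document, yet the first document can still be ranked above the second because its extracted terms carry that feature. This is the vocabulary gap the abstract refers to, bridged on the document side rather than by rewriting the query.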

Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Köhncke, B., Siehndel, P., Balke, W.-T. (2013). Bridging the Gap – Using External Knowledge Bases for Context-Aware Document Retrieval. In: Urs, S.R., Na, J.-C., Buchanan, G. (eds) Digital Libraries: Social Media and Community Networks. ICADL 2013. Lecture Notes in Computer Science, vol 8279. Springer, Cham. https://doi.org/10.1007/978-3-319-03599-4_2

  • DOI: https://doi.org/10.1007/978-3-319-03599-4_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-03598-7

  • Online ISBN: 978-3-319-03599-4

  • eBook Packages: Computer Science, Computer Science (R0)
