Representing Context Information for Document Retrieval

  • Conference paper
Flexible Query Answering Systems (FQAS 2009)

Abstract

The bag-of-words (BoW) representation, which is widely used in information retrieval (IR), represents documents and queries as word lists that carry no context information. When we look for information, however, not everything is explicitly stated in a document, so context information is needed to understand its content. This paper proposes the use of bag of concepts (BoC) and holographic reduced representations (HRR) in IR. These representations go beyond BoW by incorporating context information into document representations. Both HRR and BoC are produced using a vector space methodology known as Random Indexing, and both allow additional knowledge from different sources to be expressed. Our experiments show that the representations are feasible and that they improve mean average precision by up to 7% compared with the traditional vector space model.
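To make the ideas in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes one common formulation of Random Indexing (each document gets a sparse ternary index vector, and a term's context vector is the sum of the index vectors of the documents it occurs in), a BoC document vector built as the unweighted sum of its terms' context vectors, and HRR binding by circular convolution. All function names, the toy corpus, the dimensionality and the role/filler vectors are assumptions chosen for illustration.

```python
import numpy as np


def index_vector(dim, nonzeros, rng):
    """Sparse ternary index vector: a few random +1/-1 entries, zeros elsewhere."""
    vec = np.zeros(dim)
    positions = rng.choice(dim, size=nonzeros, replace=False)
    vec[positions] = rng.choice([-1.0, 1.0], size=nonzeros)
    return vec


def term_context_vectors(docs, dim, nonzeros, rng):
    """Random Indexing: a term's context vector is the sum of the index
    vectors of the documents in which the term occurs."""
    context = {}
    for tokens in docs:
        doc_index = index_vector(dim, nonzeros, rng)
        for term in set(tokens):
            context[term] = context.get(term, np.zeros(dim)) + doc_index
    return context


def boc_vector(tokens, context, dim):
    """Bag of concepts: sum the context vectors of the terms in a text.
    Raw term frequency is the only weight; unknown terms are skipped."""
    vec = np.zeros(dim)
    for term in tokens:
        if term in context:
            vec += context[term]
    return vec


def bind(a, b):
    """HRR binding: circular convolution, computed through the FFT."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))


def involution(a):
    """Approximate inverse used for HRR unbinding: result[i] = a[-i mod n]."""
    return np.roll(a[::-1], 1)


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


DIM = 1024  # assumed dimensionality, chosen only for this toy example
rng = np.random.default_rng(0)

# Hypothetical toy corpus: documents 0 and 1 share context but not the word "car".
docs = [
    ["car", "engine", "road"],
    ["automobile", "engine", "road"],
    ["banana", "fruit", "market"],
]
context = term_context_vectors(docs, DIM, nonzeros=8, rng=rng)
query = boc_vector(["car"], context, DIM)
scores = [cosine(query, boc_vector(d, context, DIM)) for d in docs]
print("BoC scores for query 'car':", scores)

# HRR: bind a hypothetical role vector to a filler vector, then unbind it.
role = rng.normal(0.0, 1.0 / np.sqrt(DIM), DIM)
filler = rng.normal(0.0, 1.0 / np.sqrt(DIM), DIM)
bound = bind(role, filler)
recovered = bind(involution(role), bound)
print("bound vs filler:", cosine(bound, filler))
print("recovered vs filler:", cosine(recovered, filler))
```

Under plain BoW, the query "car" has no term in common with the second document, so their cosine similarity is zero. In this sketch the second document should still score well above the unrelated third one, because "engine" and "road" co-occur with "car" in the first document. The bound HRR vector should be nearly orthogonal to its filler, while unbinding with the role's involution should recover a noisy but recognisable copy of the filler.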

The first author was supported by scholarship 217251/208265, the second author by scholarship 165545 granted by CONACYT, while the third, fifth and sixth authors were partially supported by SNI, Mexico.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Carrillo, M., Villatoro-Tello, E., López-López, A., Eliasmith, C., Montes-y-Gómez, M., Villaseñor-Pineda, L. (2009). Representing Context Information for Document Retrieval. In: Andreasen, T., Yager, R.R., Bulskov, H., Christiansen, H., Larsen, H.L. (eds) Flexible Query Answering Systems. FQAS 2009. Lecture Notes in Computer Science, vol 5822. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04957-6_21

  • DOI: https://doi.org/10.1007/978-3-642-04957-6_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04956-9

  • Online ISBN: 978-3-642-04957-6

  • eBook Packages: Computer Science, Computer Science (R0)
