Abstract
Most common text retrieval scoring functions need the length of each indexed document in order to rank it with respect to the current query. For efficiency, information retrieval systems keep these lengths in main memory. This paper proposes a novel strategy that encodes the length of each document directly in its document identifier, thus reducing main memory demand. The technique relies on a simple document identifier assignment method and on a function from which the approximate length of each indexed document can be computed analytically.
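The abstract does not spell out the assignment method or the analytic function, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes identifiers are assigned in order of increasing document length, and it substitutes a simple piecewise-linear interpolation over a few retained sample points for the paper's function. The names `assign_ids_by_length`, `build_samples`, and `approx_length` are hypothetical and are not taken from the paper.

```python
def assign_ids_by_length(lengths):
    """Assign new docids in order of increasing length (an assumed scheme).

    Returns (order, sorted_lengths), where order[new_id] is the original
    position of the document and sorted_lengths[new_id] is its true length.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    return order, [lengths[i] for i in order]


def build_samples(sorted_lengths, step):
    """Keep the true length of every step-th docid, plus the last docid.

    These few (docid, length) pairs stand in for the full per-document
    length array that would otherwise be held in main memory.
    """
    n = len(sorted_lengths)
    ids = list(range(0, n, step))
    if ids[-1] != n - 1:
        ids.append(n - 1)
    return [(i, sorted_lengths[i]) for i in ids]


def approx_length(doc_id, samples):
    """Estimate a length from the docid alone by linear interpolation.

    This interpolation is an illustrative stand-in, not the paper's function.
    """
    for i, length in samples:
        if i == doc_id:
            return float(length)
    for (i0, l0), (i1, l1) in zip(samples, samples[1:]):
        if i0 < doc_id < i1:
            return l0 + (l1 - l0) * (doc_id - i0) / (i1 - i0)
    raise ValueError(f"doc_id {doc_id} is outside the sampled range")


# Hypothetical toy collection: document lengths in tokens.
lengths = [120, 30, 75, 300, 45, 90]
order, sorted_lengths = assign_ids_by_length(lengths)
samples = build_samples(sorted_lengths, step=2)
estimates = [approx_length(d, samples) for d in range(len(lengths))]
```

Because identifiers follow length order, the estimate never decreases as the identifier grows, and only the sample points need to stay in memory. The trade-off in this sketch is the approximation error between sample points, which grows where lengths change sharply.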
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Perego, R., Silvestri, F., Tonellotto, N. (2011). Representing Document Lengths with Identifiers. In: Clough, P., et al. Advances in Information Retrieval. ECIR 2011. Lecture Notes in Computer Science, vol 6611. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20161-5_66
DOI: https://doi.org/10.1007/978-3-642-20161-5_66
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20160-8
Online ISBN: 978-3-642-20161-5
eBook Packages: Computer Science, Computer Science (R0)