Skip to main content

On Inverted Index Compression for Search Engine Efficiency

  • Conference paper
Advances in Information Retrieval (ECIR 2014)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 8416))

Included in the following conference series:

Abstract

Efficient access to the inverted index data structure is a key aspect for a search engine to achieve fast response times to users’ queries. While the performance of an information retrieval (IR) system can be enhanced through the compression of its posting lists, there is little recent work in the literature that thoroughly compares and analyses the performance of modern integer compression schemes across different types of posting information (document ids, frequencies, positions). In this paper, we experiment with different modern integer compression algorithms, integrating these into a modern IR system. Through comprehensive experiments conducted on two large, widely used document corpora and large query sets, our results show the benefit of compression for different types of posting information to the space- and time-efficiency of the search engine. Overall, we find that the simple Frame of Reference compression scheme results in the best query response times for all types of posting information. Moreover, we observe that the frequency and position posting information in Web corpora that have large volumes of anchor text are more challenging to compress, yet compression is beneficial in reducing average query response times.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Dean, J.: Challenges in building large-scale information retrieval systems: invited talk. In: Proc. WSDM 2009 (2009)

    Google Scholar 

  2. Anh, V.N., Moffat, A.: Pruned query evaluation using pre-computed impact scores. In: Proc. SIGIR 2006 (2006)

    Google Scholar 

  3. Moffat, A., Webber, W., Zobel, J., Baeza-Yates, R.: A pipelined architecture for distributed text query evaluation. Inf. Retr. 10 (2007)

    Google Scholar 

  4. Broccolo, D., Macdonald, C., Orlando, S., Ounis, I., Perego, R., Tonellotto, N.: Load-sensitive selective pruning for distributed search. In: Proc. CIKM 2013 (2013)

    Google Scholar 

  5. Witten, I.H., Bell, T.C., Moffat, A.: Managing Gigabytes: Compressing and Indexing Documents and Images, 1st edn. (1994)

    Google Scholar 

  6. Elias, P.: Universal codeword sets and representations of the integers. Trans. Info. Theory 21(2) (1975)

    Google Scholar 

  7. Yan, H., Ding, S., Suel, T.: Compressing term positions in web indexes. In: Proc. SIGIR 2009 (2009)

    Google Scholar 

  8. Zukowski, M., Heman, S., Nes, N., Boncz, P.: Super-scalar RAM-CPU cache compression. In: Proc. ICDE 2006 (2006)

    Google Scholar 

  9. Yan, H., Ding, S., Suel, T.: Inverted index compression and query processing with optimized document ordering. In: Proc. WWW 2009 (2009)

    Google Scholar 

  10. Lemire, D., Boytsov, L.: Decoding billions of integers per second through vectorization. Software: Practice and Experience (2013)

    Google Scholar 

  11. Delbru, R., Campinas, S., Samp, K., Tummarello, G.: Adaptive frame of reference for compressing inverted lists. Technical Report 2010-12-16, DERI (2010)

    Google Scholar 

  12. Ounis, I., Amati, G., Plachouras, V., He, B., Macdonald, C., Lioma, C.: Terrier: A High Performance and Scalable IR Platform. In: Proc. OSIR 2006 (2006)

    Google Scholar 

  13. Wang, L., Lin, J., Metzler, D.: Learning to efficiently rank. In: Proc. SIGIR 2010 (2010)

    Google Scholar 

  14. Shurman, E., Brutlag, J.: Performance related changes and their user impacts. In: Velocity: Web Performance and Operations Conference (2009)

    Google Scholar 

  15. Broder, A.Z., Carmel, D., Herscovici, M., Soffer, A., Zien, J.: Efficient query evaluation using a two-level retrieval process. In: Proc. CIKM 2003 (2003)

    Google Scholar 

  16. Cambazoglu, B.B., Zaragoza, H., Chapelle, O., Chen, J., Liao, C., Zheng, Z., Degenhardt, J.: Early exit optimizations for additive machine learned ranking systems. In: Proc. WSDM 2010 (2010)

    Google Scholar 

  17. Wang, L., Lin, J., Metzler, D.: A cascade ranking model for efficient ranked retrieval. In: Proc. SIGIR 2011 (2011)

    Google Scholar 

  18. Macdonald, C., Santos, R.L., Ounis, I., He, B.: About learning models with multiple query dependent features. Trans. Info. Sys. 13(3) (2013)

    Google Scholar 

  19. Williams, H.E., Zobel, J.: Compressing integers for fast file access. The Computer Journal 42 (1999)

    Google Scholar 

  20. Scholer, F., Williams, H.E., Yiannis, J., Zobel, J.: Compression of inverted indexes for fast query evaluation. In: Proc. SIGIR 2002 (2002)

    Google Scholar 

  21. Golomb, S.: Run-length encodings. Trans. Infor. Theory 12(3) (1966)

    Google Scholar 

  22. Rice, R., Plaunt, J.: Adaptive variable-length coding for efficient compression of spacecraft television data. Trans. Communication Technology 19(6) (1971)

    Google Scholar 

  23. Anh, V.N., Moffat, A.: Inverted index compression using word-aligned binary codes. Inf. Retr. 8(1) (2005)

    Google Scholar 

  24. Goldstein, J., Ramakrishnan, R., Shaft, U.: Compressing relations and indexes. In: Proc. ICDE 1998 (1998)

    Google Scholar 

  25. Zhang, J., Long, X., Suel, T.: Performance of compressed inverted list caching in search engines. In: Proc. WWW 2008 (2008)

    Google Scholar 

  26. Zhang, J., Suel, T.: Efficient search in large textual collections with redundancy. In: Proc. WWW 2007 (2007)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Catena, M., Macdonald, C., Ounis, I. (2014). On Inverted Index Compression for Search Engine Efficiency. In: de Rijke, M., et al. Advances in Information Retrieval. ECIR 2014. Lecture Notes in Computer Science, vol 8416. Springer, Cham. https://doi.org/10.1007/978-3-319-06028-6_30

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-06028-6_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06027-9

  • Online ISBN: 978-3-319-06028-6

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics