A Practical Alphabet-Partitioning Rank/Select Data Structure

  • Conference paper
String Processing and Information Retrieval (SPIRE 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11811)

Abstract

This paper proposes a practical implementation of an alphabet-partitioning compressed data structure, which represents a string in compressed space and efficiently supports the fundamental operations \(\mathsf {rank}\) and \(\mathsf {select}\). We show experimental results indicating that our implementation outperforms the current realizations of the alphabet-partitioning approach (which is one of the most efficient approaches in practice). In particular, the time for operation \(\mathsf {select}\) can be reduced by about 80%, using only 11% more space than current alphabet-partitioning schemes. We also show the impact of our data structure on several applications, such as the intersection of inverted lists (where improvements of up to 60% are achieved, using only 2% of extra space) and the distributed-computation processing of \(\mathsf {rank}\) and \(\mathsf {select}\) operations. As far as we know, this is the first study of support for \(\mathsf {rank}\)/\(\mathsf {select}\) operations in a distributed-computing environment.
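For readers unfamiliar with the two operations, the following is a minimal, uncompressed sketch of what \(\mathsf {rank}\) and \(\mathsf {select}\) compute, together with a toy version of the general alphabet-partitioning idea. It is an illustration only, not the implementation evaluated in the paper. The conventions used here are assumptions: positions are 0-based, `rank(seq, c, i)` counts occurrences of `c` in `seq[0:i]`, and `select(seq, c, j)` returns the position of the `j`-th occurrence (`j >= 1`) or `-1`. The class name `AlphabetPartition` and the class-assignment rule (symbols sorted by decreasing frequency, with the symbol at 1-based frequency rank `r` placed in class `floor(log2(r))`) are likewise hypothetical choices made for the example.

```python
from collections import Counter


def rank(seq, c, i):
    """Number of occurrences of c in seq[0:i] (naive linear scan)."""
    return sum(1 for x in seq[:i] if x == c)


def select(seq, c, j):
    """0-based position of the j-th occurrence (j >= 1) of c, or -1."""
    seen = 0
    for pos, x in enumerate(seq):
        if x == c:
            seen += 1
            if seen == j:
                return pos
    return -1


class AlphabetPartition:
    """Toy alphabet partitioning: a class sequence K plus one
    subsequence per class. Everything is stored in plain lists; a real
    structure would use compressed representations instead."""

    def __init__(self, seq):
        freq = Counter(seq)
        # Assumed rule: sort symbols by decreasing frequency; the symbol
        # with 1-based frequency rank r goes to class floor(log2(r)).
        order = sorted(freq, key=lambda s: (-freq[s], s))
        self.cls = {s: r.bit_length() - 1 for r, s in enumerate(order, 1)}
        # K[i] is the class of the symbol at position i of seq.
        self.K = [self.cls[x] for x in seq]
        # sub[k] is the subsequence of seq restricted to symbols of class k.
        self.sub = {}
        for x in seq:
            self.sub.setdefault(self.cls[x], []).append(x)

    def rank(self, c, i):
        if c not in self.cls:
            return 0
        k = self.cls[c]
        # Map prefix length i in seq to a prefix length in sub[k].
        return rank(self.sub[k], c, rank(self.K, k, i))

    def select(self, c, j):
        if c not in self.cls:
            return -1
        k = self.cls[c]
        p = select(self.sub[k], c, j)  # position inside sub[k]
        # The (p + 1)-th symbol of class k in K sits at the answer.
        return -1 if p < 0 else select(self.K, k, p + 1)


seq = "abracadabra"
ap = AlphabetPartition(seq)
r = ap.rank("a", 5)    # 2: "abrac" contains two occurrences of 'a'
s = ap.select("a", 3)  # 5: the third 'a' is at position 5
```

Both operations on the original string reduce to one operation on the class sequence and one on a per-class subsequence, which is why the cost of \(\mathsf {rank}\) and \(\mathsf {select}\) on those components dominates in such schemes.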

Funded by the Millennium Institute for Foundational Research on Data (IMFD).

Notes

  1. https://trec.nist.gov/data/million.query07.html

Corresponding author

Correspondence to Diego Arroyuelo.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Arroyuelo, D., Sepúlveda, E. (2019). A Practical Alphabet-Partitioning Rank/Select Data Structure. In: Brisaboa, N., Puglisi, S. (eds) String Processing and Information Retrieval. SPIRE 2019. Lecture Notes in Computer Science, vol 11811. Springer, Cham. https://doi.org/10.1007/978-3-030-32686-9_32

  • DOI: https://doi.org/10.1007/978-3-030-32686-9_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32685-2

  • Online ISBN: 978-3-030-32686-9

  • eBook Packages: Computer Science, Computer Science (R0)
