Skip to main content

Indexing Highly Repetitive Collections

  • Conference paper
Combinatorial Algorithms (IWOCA 2012)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 7643))

Included in the following conference series:

Abstract

The need to index and search huge highly repetitive sequence collections is rapidly arising in various fields, including computational biology, software repositories, versioned collections, and others. In this short survey we briefly describe the progress made along three research lines to address the problem: compressed suffix arrays, grammar compressed indexes, and Lempel-Ziv compressed indexes.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Abeliuk, A., Navarro, G.: Compressed Suffix Trees for Repetitive Texts. In: Calderón-Benavides, L., González-Caro, C., Chávez, E., Ziviani, N. (eds.) SPIRE 2012. LNCS, vol. 7608, pp. 30–41. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  2. Bille, P., Landau, G., Raman, R., Sadakane, K., Rao Satti, S., Weimann, O.: Random access to grammar-compressed strings. In: Proc. 22nd SODA, pp. 373–389 (2011)

    Google Scholar 

  3. Chan, T., Larsen, K., Patrascu, M.: Orthogonal range searching on the RAM, revisited. In: Proc. 27th SoCG, pp. 1–10 (2011)

    Google Scholar 

  4. Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Rasala, A., Sahai, A., Shelat, A.: Approximating the smallest grammar: Kolmogorov complexity in natural models. In: Proc. 34th STOC, pp. 792–801 (2002)

    Google Scholar 

  5. Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Sahai, A., Shelat, A.: The smallest grammar problem. IEEE Trans. Inf. Theo. 51(7), 2554–2576 (2005)

    Article  MathSciNet  Google Scholar 

  6. Claude, F., Fariña, A., Martínez-Prieto, M., Navarro, G.: Compressed q-gram indexing for highly repetitive biological sequences. In: Proc. 10th BIBE, pp. 86–91 (2010)

    Google Scholar 

  7. Claude, F., Fariña, A., Martínez-Prieto, M., Navarro, G.: Indexes for highly repetitive document collections. In: Proc. 20th CIKM, pp. 463–468 (2011)

    Google Scholar 

  8. Claude, F., Navarro, G.: Improved Grammar-Based Compressed Indexes. In: Calderón-Benavides, L., González-Caro, C., Chávez, E., Ziviani, N. (eds.) SPIRE 2012. LNCS, vol. 7608, pp. 180–192. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  9. Do, H.-H., Jansson, J., Sadakane, K., Sung, W.-K.: Fast relative Lempel-Ziv self-index for similar sequences. In: Proc. FAW-AAIM, pp. 291–302 (2012)

    Google Scholar 

  10. Fischer, J., Mäkinen, V., Navarro, G.: Faster entropy-bounded compressed suffix trees. Theor. Comp. Sci. 410(51), 5354–5364 (2009)

    Article  MATH  Google Scholar 

  11. Gagie, T., Gawrychowski, P., Kärkkäinen, J., Nekrich, Y., Puglisi, S.J.: A Faster Grammar-Based Self-index. In: Dediu, A.-H., Martín-Vide, C. (eds.) LATA 2012. LNCS, vol. 7183, pp. 240–251. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  12. Grossi, R., Gupta, A., Vitter, J.: High-order entropy-compressed text indexes. In: Proc. 14th SODA, pp. 841–850 (2003)

    Google Scholar 

  13. Grossi, R., Vitter, J.: Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J. Comp. 35(2), 378–407 (2006)

    Article  MathSciNet  Google Scholar 

  14. Huang, S., Lam, T.W., Sung, W.K., Tam, S.L., Yiu, S.M.: Indexing Similar DNA Sequences. In: Chen, B. (ed.) AAIM 2010. LNCS, vol. 6124, pp. 180–190. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  15. Kärkkäinen, J.: Repetition-Based Text Indexing. PhD thesis, Dept of Comp. Sci., Univ. of Helsinki, Finland (1999)

    Google Scholar 

  16. Kreft, S., Navarro, G.: On compressing and indexing repetitive sequences. Theor. Comp. Sci. (to appear, 2012); Earlier versions in Proc. DCC 2010 and Proc. CPM 2011

    Google Scholar 

  17. Mäkinen, V., Navarro, G., Sirén, J., Välimäki, N.: Storage and retrieval of highly repetitive sequence collections. J. Comp. Biol. 17(3), 281–308 (2010)

    Article  Google Scholar 

  18. Manber, U., Myers, E.: Suffix arrays: a new method for on-line string searches. SIAM J. Comp., 935–948 (1993)

    Google Scholar 

  19. Manzini, G.: An analysis of the Burrows-Wheeler transform. J. ACM 48(3), 407–430 (2001)

    Article  MathSciNet  Google Scholar 

  20. Maruyama, S., Nakahara, M., Kishiue, N., Sakamoto, H.: ESP-Index: A Compressed Index Based on Edit-Sensitive Parsing. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds.) SPIRE 2011. LNCS, vol. 7024, pp. 398–409. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  21. Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Comp. Surv. 39(1), article 2 (2007)

    Google Scholar 

  22. Rytter, W.: Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theo. Comp. Sci. 302(1-3), 211–222 (2003)

    Article  MathSciNet  MATH  Google Scholar 

  23. Sadakane, K.: New text indexing functionalities of the compressed suffix arrays. J. Alg. 48(2), 294–313 (2003)

    Article  MathSciNet  MATH  Google Scholar 

  24. Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Trans. Inf. Theo. 23(3), 337–343 (1977)

    Article  MathSciNet  MATH  Google Scholar 

  25. Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theo. 24(5), 530–536 (1978)

    Article  MathSciNet  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Navarro, G. (2012). Indexing Highly Repetitive Collections. In: Arumugam, S., Smyth, W.F. (eds) Combinatorial Algorithms. IWOCA 2012. Lecture Notes in Computer Science, vol 7643. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35926-2_29

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-35926-2_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35925-5

  • Online ISBN: 978-3-642-35926-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics