Fingerprints in Compressed Strings

  • Conference paper
Algorithms and Data Structures (WADS 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8037)

Abstract

The Karp-Rabin fingerprint of a string is a type of hash value that, due to its strong properties, has been used in many string algorithms. In this paper we show how to construct a data structure for a string S of size N compressed by a context-free grammar of size n that answers fingerprint queries. That is, given indices i and j, the answer to a query is the fingerprint of the substring S[i,j]. We present the first O(n) space data structures that answer fingerprint queries without decompressing any characters. For Straight Line Programs (SLPs) we get O(log N) query time, and for Linear SLPs (an SLP derivative that captures LZ78 compression and its variations) we get O(log log N) query time. Hence, our data structures have the same time and space complexity as for random access in SLPs. We utilize the fingerprint data structures to solve the longest common extension (LCE) problem in query time O(log N log ℓ) and O(log ℓ log log ℓ + log log N) for SLPs and Linear SLPs, respectively. Here, ℓ denotes the length of the LCE.
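The fingerprint and LCE queries described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's data structure. It assumes an SLP given as a list of rules (either a terminal character or a pair (y, z) of earlier rule indices, with the last rule deriving S) and uses the Horner form φ(S) = Σ S[k]·r^(|S|-1-k) mod p with p = 2^61 - 1, a convention chosen only for this illustration. Each rule stores its length, its fingerprint and r^|X| mod p, so the concatenation identity φ(UV) = φ(U)·r^|V| + φ(V) yields a prefix fingerprint from a single root-to-leaf descent. That descent costs O(h) for an SLP of height h, not the O(log N) bound the paper obtains with additional machinery. A substring fingerprint then follows from φ(S[a, a+l-1]) = φ(S[0, a+l-1]) - φ(S[0, a-1])·r^l mod p. The LCE routine compares substring fingerprints with exponential search followed by binary search, so it issues O(log ℓ) fingerprint queries and, like any Karp-Rabin method, is correct with high probability. The names `karp_rabin`, `SLPFingerprints`, `prefix`, `substring` and `lce` are hypothetical.

```python
import random


def karp_rabin(s, r, p):
    """Plain Karp-Rabin fingerprint phi(s) = sum s[k] * r^(|s|-1-k) mod p (Horner form)."""
    h = 0
    for c in s:
        h = (h * r + ord(c)) % p
    return h


class SLPFingerprints:
    """Hypothetical sketch: fingerprint and LCE queries on an SLP without decompressing it.

    rules[k] is either a one-character string (terminal rule X_k -> c) or a pair
    (y, z) with y, z < k (binary rule X_k -> X_y X_z). The last rule derives S.
    Positions are 0-indexed.
    """

    def __init__(self, rules, p=(1 << 61) - 1, r=None):
        self.p = p
        self.r = r if r is not None else random.randrange(2, p - 1)
        self.rules = rules
        self.length, self.fp, self.rpow = [], [], []  # |X_k|, phi(X_k), r^|X_k| mod p
        for rule in rules:
            if isinstance(rule, str):
                self.length.append(1)
                self.fp.append(ord(rule) % p)
                self.rpow.append(self.r % p)
            else:
                y, z = rule
                self.length.append(self.length[y] + self.length[z])
                # Concatenation rule: phi(UV) = phi(U) * r^|V| + phi(V) (mod p)
                self.fp.append((self.fp[y] * self.rpow[z] + self.fp[z]) % p)
                self.rpow.append(self.rpow[y] * self.rpow[z] % p)
        self.root = len(rules) - 1
        self.n = self.length[self.root]

    def prefix(self, i):
        """phi(S[0:i]) via one root-to-leaf descent: O(height of the SLP) steps."""
        acc, node, p = 0, self.root, self.p
        while i > 0:
            if i == self.length[node]:
                return (acc * self.rpow[node] + self.fp[node]) % p
            y, z = self.rules[node]  # 0 < i < |X_node|, so X_node is a binary rule
            if i <= self.length[y]:
                node = y
            else:
                # The whole left child lies inside the prefix: fold it into acc
                acc = (acc * self.rpow[y] + self.fp[y]) % p
                i -= self.length[y]
                node = z
        return acc

    def substring(self, a, l):
        """phi(S[a:a+l]) = phi(S[0:a+l]) - phi(S[0:a]) * r^l (mod p)."""
        return (self.prefix(a + l) - self.prefix(a) * pow(self.r, l, self.p)) % self.p

    def lce(self, i, j):
        """Length of the longest common prefix of S[i:] and S[j:] (correct w.h.p.)."""
        limit = self.n - max(i, j)

        def same(l):
            return self.substring(i, l) == self.substring(j, l)

        # Exponential search: double l while the length-l substrings still match
        hi = 1
        while hi <= limit and same(hi):
            hi *= 2
        lo = hi // 2             # largest length confirmed to match (0 if none)
        hi = min(hi, limit + 1)  # smallest length known to fail or exceed the bound
        # Binary search for the largest matching length in [lo, hi)
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if same(mid):
                lo = mid
            else:
                hi = mid
        return lo


# Fibonacci-word SLP deriving S = "abaababa"
rules = ["a", "b", (0, 1), (2, 0), (3, 2), (4, 3)]
slp = SLPFingerprints(rules)
print(slp.lce(0, 3))
```

In the example, the rules derive the Fibonacci word S = abaababa. Its suffixes starting at positions 0 and 3 (abaababa and ababa) share the prefix aba, so the true LCE is 3. The fixed Mersenne prime and random base are a common Karp-Rabin choice rather than one prescribed by the paper.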




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bille, P., Cording, P.H., Gørtz, I.L., Sach, B., Vildhøj, H.W., Vind, S. (2013). Fingerprints in Compressed Strings. In: Dehne, F., Solis-Oba, R., Sack, JR. (eds) Algorithms and Data Structures. WADS 2013. Lecture Notes in Computer Science, vol 8037. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40104-6_13


  • DOI: https://doi.org/10.1007/978-3-642-40104-6_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40103-9

  • Online ISBN: 978-3-642-40104-6

  • eBook Packages: Computer Science, Computer Science (R0)
