
Probabilistic analysis of generalized suffix trees

Extended abstract

  • Conference paper
Combinatorial Pattern Matching (CPM 1992)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 644)


Abstract

Suffix trees have several applications in computer science and telecommunications, most notably in algorithms on strings, data compression, and codes. We consider, in a probabilistic framework, a family of generalized suffix trees, called b-suffix trees, built from the first n suffixes of a random word. In this family, a noncompact suffix tree (i.e., one in which every edge is labeled by a single symbol) corresponds to b = 1, and a compact suffix tree (i.e., one without unary nodes) is asymptotically equivalent to b → ∞. Several parameters of b-suffix trees are of interest, namely the typical depth, the depth of insertion, the height, the external path length, and so forth. We establish some results concerning the typical, that is, almost sure (a.s.), behavior of these parameters. These findings are used to obtain several insights into certain algorithms on words and universal data compression schemes.
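To make the parameters named in the abstract concrete, the following is a minimal sketch for the b = 1 case only (the noncompact suffix tree). It is an illustration under stated assumptions, not the construction analyzed in the paper: the random word is assumed to come from a memoryless binary source, the word is truncated to n + pad symbols, and the names `lcp`, `leaf_depths`, `summarize`, and `pad` are hypothetical. It relies on the fact that, in a noncompact tree, the leaf of a suffix sits one level below its longest common prefix with any other stored suffix.

```python
import random


def lcp(a, b):
    """Length of the longest common prefix of two strings."""
    k = 0
    for x, y in zip(a, b):
        if x != y:
            break
        k += 1
    return k


def leaf_depths(word, n):
    """Leaf depth of each of the first n suffixes of `word` (case b = 1).

    Suffix i is stored at depth 1 + max over j != i of lcp(suffix_i, suffix_j).
    After sorting, that maximum is attained at a neighbor in sorted order.
    Assumes no stored suffix is a prefix of another (long padding or a
    terminator symbol guarantees this).
    """
    suffixes = sorted((word[i:], i) for i in range(n))
    depths = [0] * n
    for pos, (s, i) in enumerate(suffixes):
        best = 0
        if pos > 0:
            best = max(best, lcp(s, suffixes[pos - 1][0]))
        if pos + 1 < n:
            best = max(best, lcp(s, suffixes[pos + 1][0]))
        depths[i] = best + 1
    return depths


def summarize(depths):
    """Height, external path length, and mean depth of a randomly chosen leaf."""
    return {
        "height": max(depths),
        "external_path_length": sum(depths),
        "average_depth": sum(depths) / len(depths),
    }


if __name__ == "__main__":
    rng = random.Random(0)   # fixed seed, chosen only for repeatability
    n, pad = 2000, 200       # pad is an assumption: extra symbols beyond suffix n
    word = "".join(rng.choice("01") for _ in range(n + pad))
    print(summarize(leaf_depths(word, n)))
```

As a small check, for the word `banana$` with n = 6 the function `leaf_depths` returns `[1, 4, 3, 4, 3, 2]`, giving height 4 and external path length 17. Quadratic memory from storing the suffixes explicitly keeps the sketch short; it is not meant for large n.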

This research was supported in part by NSF Grants CCR-8900305 and INT-8912631, AFOSR Grant 90-0107, NATO Grant 0057/89, and Grant R01 LM05118 from the National Library of Medicine.




Editor information

Alberto Apostolico, Maxime Crochemore, Zvi Galil, Udi Manber


Copyright information

© 1992 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Szpankowski, W. (1992). Probabilistic analysis of generalized suffix trees. In: Apostolico, A., Crochemore, M., Galil, Z., Manber, U. (eds) Combinatorial Pattern Matching. CPM 1992. Lecture Notes in Computer Science, vol 644. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-56024-6_1


  • DOI: https://doi.org/10.1007/3-540-56024-6_1


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-56024-1

  • Online ISBN: 978-3-540-47357-2

  • eBook Packages: Springer Book Archive
