Skip to main content

On-Line Linear-Time Construction of Word Suffix Trees

  • Conference paper
Combinatorial Pattern Matching (CPM 2006)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 4009))

Included in the following conference series:

Abstract

Suffix trees are the key data structure for text string matching, and are used in wide application areas such as bioinformatics and data compression. Sparse suffix trees are kind of suffix trees that represent only a subset of suffixes of the input string. In this paper we study word suffix trees, which are one variation of sparse suffix trees. Let D be a dictionary of words and w be a string in D  + , namely, w is a sequence w 1w k of k words in D. The word suffix tree of w w.r.t. D is a path-compressed trie that represents only the k suffixes in the form of w i w k . A typical example of its application is word- and phrase-level search on natural language documents. Andersson et al. proposed an algorithm to build word suffix trees in O(n) expected time with O(k) space. In this paper we present a new word suffix tree construction algorithm with O(n) running time and O(k) space in the worst cases. Our algorithm is on-line, which means that it can sequentially process the characters in the input, each by each, from left to right.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Aho, A.V., Corasick, M.: Efficient string matching: An aid to bibliographic search. Comm. ACM 18(6), 333–340 (1975)

    Article  MathSciNet  MATH  Google Scholar 

  2. Andersson, A., Larsson, N.J., Swanson, K.: Suffix trees on words. Algorithmica 23(3), 246–260 (1999)

    Article  MathSciNet  MATH  Google Scholar 

  3. Apostolico, A.: The myriad virtues of subword trees. Combinatorial Algorithms on Words F12, 85–96 (1985)

    Google Scholar 

  4. Baeza-Yates, R., Gonnet, G.H.: Efficient text searching of regular expressions. In: Ronchi Della Rocca, S., Ausiello, G., Dezani-Ciancaglini, M. (eds.) ICALP 1989. LNCS, vol. 372, pp. 46–62. Springer, Heidelberg (1989)

    Chapter  Google Scholar 

  5. Bannai, H., Inenaga, S., Shinohara, A., Takeda, M., Miyano, S.: Efficiently finding regulatory elements using correlation with gene expression. Journal of Bioinformatics and Computational Biology 2(2), 273–288 (2004)

    Article  Google Scholar 

  6. Clifford, R., Sergot, M.: Distributed and paged suffix trees for large genetic databases. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds.) CPM 2003. LNCS, vol. 2676, pp. 70–82. Springer, Heidelberg (2003)

    Chapter  Google Scholar 

  7. Dorohonceanu, B., Nevill-Manning, C.G.: Accelerating protein classification using suffix trees. In: Proc. 8th International Conference on Intelligent Systems for Molecular Biology (ISMB 2000), pp. 128–133. AAAI Press, Menlo Park (2000)

    Google Scholar 

  8. Gusfield, D.: Algorithms on Strings, Trees, and Sequences. Cambridge University Press, Cambridge (1997)

    Book  MATH  Google Scholar 

  9. Inenaga, S., Bannai, H., Hyyrö, H., Shinohara, A., Takeda, M., Nakai, K., Miyano, S.: Finding optimal pairs of cooperative and competing patterns with bounded distance. In: Suzuki, E., Arikawa, S. (eds.) DS 2004. LNCS (LNAI), vol. 3245, pp. 32–46. Springer, Heidelberg (2004)

    Chapter  Google Scholar 

  10. Inenaga, S., Funamoto, T., Takeda, M., Shinohara, A.: Linear-time off-line text compression by longest-first substitution. In: Nascimento, M.A., de Moura, E.S., Oliveira, A.L. (eds.) SPIRE 2003. LNCS, vol. 2857, pp. 137–152. Springer, Heidelberg (2003)

    Chapter  Google Scholar 

  11. Inenaga, S., Kivioja, T., Mäkinen, V.: Finding missing patterns. In: Jonassen, I., Kim, J. (eds.) WABI 2004. LNCS (LNBI), vol. 3240, pp. 463–474. Springer, Heidelberg (2004)

    Chapter  Google Scholar 

  12. Kärkkänen, J., Ukkonen, E.: Sparse suffix trees. In: Cai, J.-Y., Wong, C.K. (eds.) COCOON 1996. LNCS, vol. 1090, pp. 219–230. Springer, Heidelberg (1996)

    Google Scholar 

  13. Larsson, N.J.: Extended application of suffix trees to data compression. In: Proc. Data Compression Conference 1996 (DCC 1996), pp. 190–199. IEEE Computer Society, Los Alamitos (1996)

    Chapter  Google Scholar 

  14. Marsan, L., Sagot, M.-F.: Extracting structured motifs using a suffix tree - algorithms and application to promoter consensus identification. In: Proc. 4th Annual International Conference on Computational Molecular Biology (RECOMB 2000), pp. 210–219. ACM, New York (2000)

    Google Scholar 

  15. McCreight, E.M.: A space-economical suffix tree construction algorithm. Journal of ACM 23(2), 262–272 (1976)

    Article  MathSciNet  MATH  Google Scholar 

  16. Na, J.C., Apostolico, A., Iliopoulos, C.S., Park, K.: Truncated suffix trees and their application to data compression. Theoretical Computer Science 304(1–3), 87–101 (2003)

    Article  MathSciNet  MATH  Google Scholar 

  17. Takeda, M., Miyamoto, S., Kida, T., Shinohara, A., Fukamachi, S., Shinohara, T., Arikawa, S.: Processing Text Files as Is: Pattern Matching over Compressed Texts, Multi-byte Character Texts, and Semi-structured Texts. In: Laender, A.H.F., Oliveira, A.L. (eds.) SPIRE 2002. LNCS, vol. 2476, pp. 170–186. Springer, Heidelberg (2002)

    Chapter  Google Scholar 

  18. Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14(3), 249–260 (1995)

    Article  MathSciNet  MATH  Google Scholar 

  19. Weiner, P.: Linear pattern-matching algorithms. In: Proc. of 14th IEEE Ann. Symp. on Switching and Automata Theory, pp. 1–11 (1973)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Inenaga, S., Takeda, M. (2006). On-Line Linear-Time Construction of Word Suffix Trees. In: Lewenstein, M., Valiente, G. (eds) Combinatorial Pattern Matching. CPM 2006. Lecture Notes in Computer Science, vol 4009. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11780441_7

Download citation

  • DOI: https://doi.org/10.1007/11780441_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-35455-0

  • Online ISBN: 978-3-540-35461-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics