On-Line Linear-Time Construction of Word Suffix Trees

Inenaga, Shunsuke; Takeda, Masayuki

doi:10.1007/11780441_7

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4009)

Included in the following conference series:

Annual Symposium on Combinatorial Pattern Matching

Abstract

Suffix trees are the key data structure for text string matching, and are used in a wide range of application areas such as bioinformatics and data compression. Sparse suffix trees are a kind of suffix tree that represents only a subset of the suffixes of the input string. In this paper we study word suffix trees, which are one variation of sparse suffix trees. Let $D$ be a dictionary of words and $w$ be a string in $D^+$, namely, $w$ is a sequence $w_1 \cdots w_k$ of $k$ words in $D$. The word suffix tree of $w$ w.r.t. $D$ is a path-compressed trie that represents only the $k$ suffixes of the form $w_i \cdots w_k$. A typical example of its application is word- and phrase-level search on natural language documents. Andersson et al. proposed an algorithm to build word suffix trees in $O(n)$ expected time with $O(k)$ space, where $n$ is the length of $w$. In this paper we present a new word suffix tree construction algorithm with $O(n)$ running time and $O(k)$ space in the worst case. Our algorithm is on-line, which means that it can sequentially process the characters of the input one by one, from left to right.
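To make the definition concrete, here is a minimal Python sketch of a word suffix tree. It assumes that every word of D ends with a space (so w splits uniquely into words) and that the character `$` does not occur in w. It is a naive construction that inserts the k word suffixes one at a time, taking O(nk) time in the worst case; it is not the on-line linear-time algorithm of this paper. The names `build_word_suffix_tree`, `find_phrase`, `word_starts`, and `count_nodes` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Label of the edge entering this node, as the slice s[start:end] of the text.
    start: int = 0
    end: int = 0
    # Children keyed by the first character of their edge label.
    children: dict = field(default_factory=dict)
    # For a leaf: start position of the word suffix it represents (-1 otherwise).
    suffix_start: int = -1


def word_starts(text: str, delim: str = " ") -> list[int]:
    """Start positions of the words w_1, ..., w_k (ASSUMES each word ends with delim)."""
    starts, pos = [], 0
    while pos < len(text):
        starts.append(pos)
        nxt = text.find(delim, pos)
        pos = len(text) if nxt == -1 else nxt + 1
    return starts


def _insert(root: Node, s: str, i: int) -> None:
    """Insert the suffix s[i:] into the path-compressed trie rooted at root."""
    node, pos = root, i
    while True:
        c = s[pos]
        child = node.children.get(c)
        if child is None:  # no edge starts with c: attach a new leaf
            node.children[c] = Node(pos, len(s), {}, i)
            return
        j = child.start
        while j < child.end and s[j] == s[pos]:  # walk down the edge label
            j += 1
            pos += 1
        if j == child.end:  # whole edge matched: continue from the child
            node = child
            continue
        # Mismatch inside the edge: split it with a new branching node.
        mid = Node(child.start, j, {})
        node.children[c] = mid
        child.start = j
        mid.children[s[j]] = child
        mid.children[s[pos]] = Node(pos, len(s), {}, i)
        return


def build_word_suffix_tree(text: str, delim: str = " ") -> tuple[Node, str]:
    """Naive O(nk) construction: insert each of the k word suffixes in turn."""
    s = text + "$"  # ASSUMES '$' does not occur in text; makes every suffix end at a leaf
    root = Node()
    for i in word_starts(text, delim):
        _insert(root, s, i)
    return root, s


def find_phrase(root: Node, s: str, pattern: str) -> list[int]:
    """Sorted start positions of the word suffixes that begin with pattern."""
    node, k = root, 0
    while k < len(pattern):
        child = node.children.get(pattern[k])
        if child is None:
            return []
        for j in range(child.start, child.end):
            if k == len(pattern):
                break
            if s[j] != pattern[k]:
                return []
            k += 1
        node = child
    out, stack = [], [node]  # every leaf below node is a match
    while stack:
        n = stack.pop()
        if n.suffix_start >= 0:
            out.append(n.suffix_start)
        stack.extend(n.children.values())
    return sorted(out)


def count_nodes(node: Node) -> int:
    """Total number of nodes (root, branching nodes, and leaves)."""
    return 1 + sum(count_nodes(c) for c in node.children.values())


root, s = build_word_suffix_tree("to be or not to be ")
print(find_phrase(root, s, "to be "))  # [0, 13]
print(find_phrase(root, s, "o"))       # [6]
print(count_nodes(root))               # 9
```

Searching for `"o"` returns only position 6, because the occurrences of `o` inside `to` and `not` do not start a word; this word-level behaviour is what distinguishes a word suffix tree from an ordinary suffix tree. Since each insertion adds one leaf and at most one branching node, the tree has at most 2k + 1 nodes, which is consistent with the O(k) space bound (not counting the text itself, which edge labels index into).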

References

Aho, A.V., Corasick, M.: Efficient string matching: An aid to bibliographic search. Comm. ACM 18(6), 333–340 (1975)
Andersson, A., Larsson, N.J., Swanson, K.: Suffix trees on words. Algorithmica 23(3), 246–260 (1999)
Apostolico, A.: The myriad virtues of subword trees. Combinatorial Algorithms on Words F12, 85–96 (1985)
Baeza-Yates, R., Gonnet, G.H.: Efficient text searching of regular expressions. In: Ronchi Della Rocca, S., Ausiello, G., Dezani-Ciancaglini, M. (eds.) ICALP 1989. LNCS, vol. 372, pp. 46–62. Springer, Heidelberg (1989)
Bannai, H., Inenaga, S., Shinohara, A., Takeda, M., Miyano, S.: Efficiently finding regulatory elements using correlation with gene expression. Journal of Bioinformatics and Computational Biology 2(2), 273–288 (2004)
Clifford, R., Sergot, M.: Distributed and paged suffix trees for large genetic databases. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds.) CPM 2003. LNCS, vol. 2676, pp. 70–82. Springer, Heidelberg (2003)
Dorohonceanu, B., Nevill-Manning, C.G.: Accelerating protein classification using suffix trees. In: Proc. 8th International Conference on Intelligent Systems for Molecular Biology (ISMB 2000), pp. 128–133. AAAI Press, Menlo Park (2000)
Gusfield, D.: Algorithms on Strings, Trees, and Sequences. Cambridge University Press, Cambridge (1997)
Inenaga, S., Bannai, H., Hyyrö, H., Shinohara, A., Takeda, M., Nakai, K., Miyano, S.: Finding optimal pairs of cooperative and competing patterns with bounded distance. In: Suzuki, E., Arikawa, S. (eds.) DS 2004. LNCS (LNAI), vol. 3245, pp. 32–46. Springer, Heidelberg (2004)
Inenaga, S., Funamoto, T., Takeda, M., Shinohara, A.: Linear-time off-line text compression by longest-first substitution. In: Nascimento, M.A., de Moura, E.S., Oliveira, A.L. (eds.) SPIRE 2003. LNCS, vol. 2857, pp. 137–152. Springer, Heidelberg (2003)
Inenaga, S., Kivioja, T., Mäkinen, V.: Finding missing patterns. In: Jonassen, I., Kim, J. (eds.) WABI 2004. LNCS (LNBI), vol. 3240, pp. 463–474. Springer, Heidelberg (2004)
Kärkkänen, J., Ukkonen, E.: Sparse suffix trees. In: Cai, J.-Y., Wong, C.K. (eds.) COCOON 1996. LNCS, vol. 1090, pp. 219–230. Springer, Heidelberg (1996)
Larsson, N.J.: Extended application of suffix trees to data compression. In: Proc. Data Compression Conference 1996 (DCC 1996), pp. 190–199. IEEE Computer Society, Los Alamitos (1996)
Marsan, L., Sagot, M.-F.: Extracting structured motifs using a suffix tree - algorithms and application to promoter consensus identification. In: Proc. 4th Annual International Conference on Computational Molecular Biology (RECOMB 2000), pp. 210–219. ACM, New York (2000)
McCreight, E.M.: A space-economical suffix tree construction algorithm. Journal of ACM 23(2), 262–272 (1976)
Na, J.C., Apostolico, A., Iliopoulos, C.S., Park, K.: Truncated suffix trees and their application to data compression. Theoretical Computer Science 304(1–3), 87–101 (2003)
Takeda, M., Miyamoto, S., Kida, T., Shinohara, A., Fukamachi, S., Shinohara, T., Arikawa, S.: Processing Text Files as Is: Pattern Matching over Compressed Texts, Multi-byte Character Texts, and Semi-structured Texts. In: Laender, A.H.F., Oliveira, A.L. (eds.) SPIRE 2002. LNCS, vol. 2476, pp. 170–186. Springer, Heidelberg (2002)
Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14(3), 249–260 (1995)
Weiner, P.: Linear pattern-matching algorithms. In: Proc. of 14th IEEE Ann. Symp. on Switching and Automata Theory, pp. 1–11 (1973)

Author information

Authors and Affiliations

Japan Society for the Promotion of Science,
Shunsuke Inenaga
Department of Informatics, Kyushu University, Fukuoka, 812-8581, Japan
Shunsuke Inenaga & Masayuki Takeda
SORST, Japan Science and Technology Agency (JST),
Masayuki Takeda

Editor information

Editors and Affiliations

Department of Computer Science, Bar-Ilan University, 52900, Ramat-Gan, Israel
Moshe Lewenstein
Department of Software, Technical University of Catalonia, 08034, Barcelona, Spain
Gabriel Valiente

About this paper

Cite this paper

Inenaga, S., Takeda, M. (2006). On-Line Linear-Time Construction of Word Suffix Trees. In: Lewenstein, M., Valiente, G. (eds) Combinatorial Pattern Matching. CPM 2006. Lecture Notes in Computer Science, vol 4009. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11780441_7

DOI: https://doi.org/10.1007/11780441_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35455-0
Online ISBN: 978-3-540-35461-1
eBook Packages: Computer Science, Computer Science (R0)
