
Search-Optimized Suffix-Tree Storage for Biological Applications

  • Conference paper
High Performance Computing – HiPC 2005 (HiPC 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3769)


Abstract

Suffix-trees are popular indexing structures for various sequence processing problems in biological data management. We investigate here the possibility of enhancing the search efficiency of disk-resident suffix-trees through customized layouts of tree-nodes to disk-pages. Specifically, we propose a new layout strategy, called Stellar, that provides significantly improved search performance on a representative set of real genomic sequences. Further, Stellar supports both the standard root-to-leaf lookup queries and sophisticated sequence-search algorithms that exploit the suffix-links of suffix-trees. Our results are encouraging with regard to the ultimate objective of seamlessly integrating sequence processing in database engines.
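
The abstract does not describe how Stellar assigns nodes to pages, so the following is not Stellar. It is a minimal Python sketch of the underlying problem: the same tree laid out on pages in two different ways costs a different number of page reads for a root-to-leaf lookup. Several simplifying assumptions are made: an uncompressed suffix trie stands in for a suffix tree (no edge-label compression, no suffix-links), page capacity is counted in nodes, the buffer holds only the most recently read page, and the two layouts (plain breadth-first packing and a greedy subtree-clustering packing) are generic illustrations. All function names are hypothetical.

```python
from collections import deque


class Node:
    """Node of an uncompressed suffix trie (a simplified stand-in for a suffix tree)."""

    def __init__(self, node_id):
        self.id = node_id
        self.children = {}  # edge character -> child Node


def build_suffix_trie(text):
    """Insert every suffix of text + '$' into a trie; nodes[0] is the root."""
    text += "$"
    nodes = [Node(0)]
    for start in range(len(text)):
        cur = nodes[0]
        for ch in text[start:]:
            if ch not in cur.children:
                child = Node(len(nodes))
                nodes.append(child)
                cur.children[ch] = child
            cur = cur.children[ch]
    return nodes


def bfs_layout(root, page_size):
    """Assumed baseline: pack nodes onto pages in level (breadth-first) order."""
    page_of = {}
    queue = deque([root])
    count = 0
    while queue:
        node = queue.popleft()
        page_of[node.id] = count // page_size
        count += 1
        for ch in sorted(node.children):
            queue.append(node.children[ch])
    return page_of


def subtree_layout(root, page_size):
    """Greedy subtree clustering: fill each page with a breadth-first region
    of a single subtree; nodes that do not fit become seeds of later pages."""
    page_of = {}
    seeds = deque([root])
    page = 0
    while seeds:
        region = deque([seeds.popleft()])
        used = 0
        while region:
            node = region.popleft()
            if used < page_size:
                page_of[node.id] = page
                used += 1
                for ch in sorted(node.children):
                    region.append(node.children[ch])
            else:
                seeds.append(node)  # page is full: this node starts a new page
        page += 1
    return page_of


def page_reads(root, pattern, page_of):
    """Walk pattern from the root, counting a page read whenever the next
    node lives on a different page than the one currently buffered."""
    node = root
    current = page_of[root.id]
    reads = 1  # the root's page is always read
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            break  # pattern does not occur; the lookup stops here
        if page_of[node.id] != current:
            current = page_of[node.id]
            reads += 1
    return reads


if __name__ == "__main__":
    nodes = build_suffix_trie("GATTACAGATTACA")
    root = nodes[0]
    for name, layout in (("breadth-first", bfs_layout(root, 8)),
                         ("subtree", subtree_layout(root, 8))):
        print(name, page_reads(root, "GATTACA", layout))
```

Breadth-first packing places a parent and its deep descendants on different pages, so a long lookup tends to cross a page boundary at almost every level; clustering a connected region of one subtree per page lets several consecutive steps of the walk stay on one page. Layouts designed for suffix-tree search, such as the one the paper proposes, also have to account for traversals along suffix-links, which this sketch ignores.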




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bedathur, S.J., Haritsa, J.R. (2005). Search-Optimized Suffix-Tree Storage for Biological Applications. In: Bader, D.A., Parashar, M., Sridhar, V., Prasanna, V.K. (eds) High Performance Computing – HiPC 2005. HiPC 2005. Lecture Notes in Computer Science, vol 3769. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11602569_8


  • DOI: https://doi.org/10.1007/11602569_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-30936-9

  • Online ISBN: 978-3-540-32427-0

  • eBook Packages: Computer Science, Computer Science (R0)
