Multi-pattern Matching with Bidirectional Indexes

  • Conference paper
Computing and Combinatorics (COCOON 2012)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 7434))

Abstract

We study multi-pattern matching in a scenario where the pattern set is to be matched against several texts, so that indexing the pattern set is affordable. Scenarios of this kind arise, for example, in metagenomics, where the pattern set represents the DNA of several species and the goal is to find out which species are represented in the sample and in what quantity. We develop a generic search method that exploits bidirectional indexes for both the pattern set and the texts, and analyze the best-case and worst-case running time of the method on a worst-case text. We show that finding the instance of the search method with minimum best-case running time on a worst-case text is NP-hard. The positive result is that an instance with a logarithmic-factor approximation to the minimum best-case running time can be found in polynomial time using a bidirectional index called the affix tree. We further show that affix trees can be simulated, in reduced space, using a bidirectional variant of compressed suffix trees.
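To make the notion of a bidirectional index concrete, the following is a minimal sketch of the interface such an index offers: a match can be started anywhere inside a pattern and extended one character to the left or to the right. It is a toy illustration only, not the affix tree or the compressed structure of the paper. The class `NaiveBidirectionalIndex`, the function `search_from`, and the `pivot` parameter are hypothetical names introduced here, and the occurrence list is scanned naively rather than stored in an index.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open occurrence [begin, end) in the text


class NaiveBidirectionalIndex:
    """Toy stand-in for a bidirectional index (assumption: not the paper's
    data structure). It tracks all occurrences of the current substring and
    lets the caller extend that substring by one character on either side."""

    def __init__(self, text: str) -> None:
        self.text = text

    def start(self) -> List[Span]:
        # The empty string occurs at every position of the text.
        return [(i, i) for i in range(len(self.text) + 1)]

    def extend_right(self, occ: List[Span], c: str) -> List[Span]:
        # Keep the occurrences that are followed by character c.
        return [(b, e + 1) for (b, e) in occ
                if e < len(self.text) and self.text[e] == c]

    def extend_left(self, occ: List[Span], c: str) -> List[Span]:
        # Keep the occurrences that are preceded by character c.
        return [(b - 1, e) for (b, e) in occ
                if b > 0 and self.text[b - 1] == c]


def search_from(index: NaiveBidirectionalIndex, pattern: str, pivot: int) -> List[int]:
    """Match `pattern` starting at pattern[pivot]: extend right to the end of
    the pattern, then left to its beginning. Returns the start positions."""
    occ = index.start()
    for c in pattern[pivot:]:
        occ = index.extend_right(occ, c)
        if not occ:
            return []
    for c in reversed(pattern[:pivot]):
        occ = index.extend_left(occ, c)
        if not occ:
            return []
    return sorted(b for b, _ in occ)
```

Whatever value of `pivot` is chosen, the reported positions are the same; only the order in which characters are examined changes. A real bidirectional index would represent the occurrence set compactly (for example as an interval) instead of as an explicit list.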

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gog, S., Karhu, K., Kärkkäinen, J., Mäkinen, V., Välimäki, N. (2012). Multi-pattern Matching with Bidirectional Indexes. In: Gudmundsson, J., Mestre, J., Viglas, T. (eds) Computing and Combinatorics. COCOON 2012. Lecture Notes in Computer Science, vol 7434. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32241-9_33

  • DOI: https://doi.org/10.1007/978-3-642-32241-9_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32240-2

  • Online ISBN: 978-3-642-32241-9

  • eBook Packages: Computer Science, Computer Science (R0)
