Abstract
We study the pattern matching automaton introduced in [1] for the purpose of seed-based similarity search. We show that our definition provides a compact automaton, much smaller than the one obtained by applying the Aho-Corasick construction. We study properties of this automaton and present an efficient implementation of the automaton construction. We also present some experimental results and show that this automaton can be successfully applied to more general situations.
References
Kucherov, G., Noé, L., Roytberg, M.: A unifying framework for seed sensitivity and its application to subset seeds. Journal of Bioinformatics and Computational Biology 4, 553–569 (2006)
Burkhardt, S., Kärkkäinen, J.: Better filtering with gapped q-grams. Fundamenta Informaticae 56, 51–70 (2003)
Ma, B., Tromp, J., Li, M.: PatternHunter: Faster and more sensitive homology search. Bioinformatics 18, 440–445 (2002)
Brown, D., Li, M., Ma, B.: A tutorial of recent developments in the seeding of local alignment. Journal of Bioinformatics and Computational Biology 2, 819–842 (2004)
Brown, D.: A survey of seeding for sequence alignments. In: Bioinformatics Algorithms: Techniques and Applications (to appear, 2007)
Li, M., Ma, B., Kisman, D., Tromp, J.: PatternHunter II: Highly sensitive and fast homology search. Journal of Bioinformatics and Computational Biology 2, 417–439 (2004)
Noé, L., Kucherov, G.: YASS: enhancing the sensitivity of DNA similarity search. Nucleic Acids Research 33(web-server issue), W540–W543 (2005)
Califano, A., Rigoutsos, I.: Flash: A fast look-up algorithm for string homology. In: Proceedings of the 1st International Conference on Intelligent Systems for Molecular Biology (ISMB), pp. 56–64 (1993)
Tsur, D.: Optimal probing patterns for sequencing by hybridization. In: Bücher, P., Moret, B.M.E. (eds.) WABI 2006. LNCS (LNBI), vol. 4175, pp. 366–375. Springer, Heidelberg (2006)
Schwartz, S., Kent, J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., Haussler, D., Miller, W.: Human–mouse alignments with BLASTZ. Genome Research 13, 103–107 (2003)
Sun, Y., Buhler, J.: Choosing the best heuristic for seeded alignment of DNA sequences. BMC Bioinformatics 7 (2006)
Csürös, M., Ma, B.: Rapid homology search with two-stage extension and daughter seeds. In: Wang, L. (ed.) COCOON 2005. LNCS, vol. 3595, pp. 104–114. Springer, Heidelberg (2005)
Mak, D., Gelfand, Y., Benson, G.: Indel seeds for homology search. Bioinformatics 22, e341–e349 (2006)
Brejová, B., Brown, D., Vinar, T.: Vector seeds: An extension to spaced seeds. Journal of Computer and System Sciences 70, 364–380 (2005)
Keich, U., Li, M., Ma, B., Tromp, J.: On spaced seeds for similarity search. Discrete Applied Mathematics 138, 253–263 (2004); preliminary version in 2002
Buhler, J., Keich, U., Sun, Y.: Designing seeds for similarity search in genomic DNA. In: Proceedings of the 7th Annual International Conference on Computational Molecular Biology (RECOMB), pp. 67–75 (2003)
Brejová, B., Brown, D., Vinar, T.: Optimal spaced seeds for homologous coding regions. Journal of Bioinformatics and Computational Biology 1, 595–610 (2004)
Cole, R., Hariharan, R., Indyk, P.: Tree pattern matching and subset matching in deterministic O(n log^3 n)-time. In: Proceedings of 10th Symposium on Discrete Algorithms (SODA), pp. 245–254 (1999)
Holub, J., Smyth, W.F., Wang, S.: Fast pattern-matching on indeterminate strings. Journal of Discrete Algorithms (2006)
Rahman, S., Iliopoulos, C., Mouchard, L.: Pattern matching in degenerate DNA/RNA sequences. In: Proceedings of the Workshop on Algorithms and Computation (WALCOM), pp. 109–120 (2007)
Noé, L., Kucherov, G.: Improved hit criteria for DNA local alignment. BMC Bioinformatics 5 (2004)
Aho, A.V., Corasick, M.J.: Efficient string matching: An aid to bibliographic search. Communications of the ACM 18, 333–340 (1975)
Amir, A., Porat, E., Lewenstein, M.: Approximate subset matching with don’t cares. In: Proceedings of 12th Symposium on Discrete Algorithms (SODA), pp. 305–306 (2001)
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kucherov, G., Noé, L., Roytberg, M. (2007). Subset Seed Automaton. In: Holub, J., Žďárek, J. (eds) Implementation and Application of Automata. CIAA 2007. Lecture Notes in Computer Science, vol 4783. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76336-9_18
DOI: https://doi.org/10.1007/978-3-540-76336-9_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-76335-2
Online ISBN: 978-3-540-76336-9
eBook Packages: Computer Science, Computer Science (R0)