Abstract
Spaced seeds, introduced by PatternHunter, have been shown to be both more sensitive and faster than continuous seeds, and are now widely used for local alignment of biological sequences. However, finding optimal spaced seeds is an NP-hard problem. A seed digraph model is proposed that finds good spaced seeds, very close to optimal, by a different but effective route. With this approach, good long spaced seeds can be found that the usual optimal-sensitivity formulas cannot handle because of their exponential complexity.
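To make the terms above concrete, the following minimal sketch shows what it means for a seed to hit a region and why exact sensitivity is expensive to compute. It is an illustration under stated assumptions, not the digraph model of this article: the function names `seed_hits` and `sensitivity` are hypothetical, alignments are reduced to 0/1 strings (1 = match, 0 = mismatch), and each position is assumed to match independently with probability `p`.

```python
from itertools import product


def seed_hits(seed: str, region: str) -> bool:
    """Return True if `seed` hits the 0/1 match string `region` at any offset.

    In `seed`, '1' marks a position that must match and '0' is a
    don't-care position. In `region`, '1' is a match and '0' a mismatch.
    """
    required = [i for i, c in enumerate(seed) if c == "1"]
    for start in range(len(region) - len(seed) + 1):
        if all(region[start + i] == "1" for i in required):
            return True
    return False


def sensitivity(seed: str, length: int, p: float) -> float:
    """Exact hit probability of `seed` on a random region of `length`.

    Assumption: positions match independently with probability `p`.
    The loop enumerates all 2**length regions, so the running time is
    exponential in `length`; this is only usable for tiny examples.
    """
    total = 0.0
    for bits in product("01", repeat=length):
        region = "".join(bits)
        if seed_hits(seed, region):
            matches = region.count("1")
            total += p ** matches * (1 - p) ** (length - matches)
    return total
```

For example, `seed_hits("1101", "110110")` is true while `seed_hits("111", "110110")` is false, although both seeds require three matching positions. Calling `sensitivity` on regions of realistic length is impractical with this enumeration, which is the kind of cost that motivates looking for good seeds by other means.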
This article is published with open access at Springerlink.com
About this article
Cite this article
Chen, K., She, K. & Zhu, Q. Overlap digraph: An effective model for finding good spaced seeds for biological sequence local alignment. Chin. Sci. Bull. 56, 1100–1107 (2011). https://doi.org/10.1007/s11434-010-4161-9
DOI: https://doi.org/10.1007/s11434-010-4161-9