What’s Behind Blast

Myers, Gene

doi:10.1007/978-1-4471-5298-9_1

Part of the book series: Computational Biology (COBO, volume 19)

Abstract

The BLAST search engine was published and released in 1990. It is a heuristic that uses the idea of a neighborhood to find seed matches that are then extended. This approach grew out of work this author was doing to leverage these ideas into a deterministic algorithm with a characterized and superior time complexity. The resulting \(O(en^{\operatorname{pow}(e/p)} \log n)\) expected-time algorithm for finding all e-matches to a string of length p in a text of length n was completed in 1991. The function \(\operatorname{pow}(\epsilon)\), evaluated at ϵ = e/p, is 0 for ϵ = 0 and concave increasing, so the algorithm is truly sublinear in that its running time is \(O(n^c)\) for some c < 1 when ϵ is sufficiently small. This paper reviews the history and the unfolding of the basic concepts, and it attempts to describe intuitively the deeper result, whose time complexity, to this author’s knowledge, has yet to be improved upon.
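The seed-and-extend idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the BLAST implementation or the sublinear algorithm of the chapter: it assumes a DNA alphabet, defines the neighborhood of a word by Hamming distance (BLAST itself uses a scoring scheme with a threshold), and extends seeds without gaps under a fixed mismatch budget. All function names and parameters here are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations, product


def neighborhood(word, d, alphabet="ACGT"):
    """Return all strings within Hamming distance d of word (word included)."""
    result = {word}
    for k in range(1, d + 1):
        for positions in combinations(range(len(word)), k):
            for letters in product(alphabet, repeat=k):
                chars = list(word)
                for pos, ch in zip(positions, letters):
                    chars[pos] = ch
                result.add("".join(chars))
    return result


def index_kmers(text, k):
    """Map each k-mer of text to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(text) - k + 1):
        index[text[i:i + k]].append(i)
    return index


def seed_hits(query, text, k, d):
    """Return sorted (query_pos, text_pos) pairs where some neighborhood
    word of the query k-mer at query_pos occurs exactly in text at text_pos."""
    index = index_kmers(text, k)
    hits = set()
    for q in range(len(query) - k + 1):
        for w in neighborhood(query[q:q + k], d):
            for t in index.get(w, ()):
                hits.add((q, t))
    return sorted(hits)


def extend(query, text, q, t, k, budget):
    """Extend a seed along its diagonal, first right and then left, while the
    total number of mismatches stays within budget (assumes d <= budget).
    Returns (query_start, text_start, length, mismatches)."""
    mism = sum(a != b for a, b in zip(query[q:q + k], text[t:t + k]))
    lo_q, lo_t = q, t
    hi_q, hi_t = q + k, t + k
    while hi_q < len(query) and hi_t < len(text):
        cost = query[hi_q] != text[hi_t]
        if mism + cost > budget:
            break
        mism += cost
        hi_q += 1
        hi_t += 1
    while lo_q > 0 and lo_t > 0:
        cost = query[lo_q - 1] != text[lo_t - 1]
        if mism + cost > budget:
            break
        mism += cost
        lo_q -= 1
        lo_t -= 1
    return lo_q, lo_t, hi_q - lo_q, mism


if __name__ == "__main__":
    query, text = "ACGT", "TTACCTTT"
    for q, t in seed_hits(query, text, k=4, d=1):
        print(extend(query, text, q, t, k=4, budget=1))
```

Enumerating the neighborhood up front and looking each word up exactly is what lets the seeding step avoid comparing the query against every text position; the cost is that the neighborhood grows quickly with d and with the alphabet size, which is why this sketch is only practical for small parameters.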

Author information

Authors and Affiliations

MPI for Cellular Molecular Biology and Genetics, 01307, Dresden, Germany
Gene Myers

Corresponding author

Correspondence to Gene Myers.

Editor information

Editors and Affiliations

Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, Canada
Cedric Chauve
Computer Science and Operations Research, University of Montreal, Montreal, Québec, Canada
Nadia El-Mabrouk
Biometry and Evolutionary Biology, INRIA Rhône-Alpes, University of Lyon, Villeurbanne, France
Eric Tannier

About this chapter

Cite this chapter

Myers, G. (2013). What’s Behind Blast. In: Chauve, C., El-Mabrouk, N., Tannier, E. (eds) Models and Algorithms for Genome Evolution. Computational Biology, vol 19. Springer, London. https://doi.org/10.1007/978-1-4471-5298-9_1

DOI: https://doi.org/10.1007/978-1-4471-5298-9_1
Publisher Name: Springer, London
Print ISBN: 978-1-4471-5297-2
Online ISBN: 978-1-4471-5298-9
eBook Packages: Computer Science, Computer Science (R0)
