Part of the book series: Computational Biology (COBO, volume 19)

Abstract

The BLAST search engine was published and released in 1990. It is a heuristic that uses the idea of a neighborhood to find seed matches that are then extended. This approach came from work that this author was doing to leverage these ideas to arrive at a deterministic algorithm with a characterized and superior time complexity. The resulting \(O(en^{\operatorname{pow}(e/p)} \log n)\) expected-time algorithm for finding all e-matches to a string of length p in a text of length n was completed in 1991. The function \(\operatorname{pow}(\epsilon)\) is 0 for \(\epsilon = 0\) and concave increasing, so the algorithm is truly sublinear in that its running time is \(O(n^{c})\) for some \(c < 1\) when \(\epsilon\) is sufficiently small. This paper reviews the history and the unfolding of the basic concepts, and it attempts to describe intuitively the deeper result, whose time complexity, to this author’s knowledge, has yet to be improved upon.
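The seed-and-extend idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the BLAST implementation and not the sublinear algorithm of the chapter. It assumes a DNA alphabet, a Hamming-distance neighborhood (BLAST itself defines neighborhoods by a score threshold under a substitution matrix), a brute-force neighborhood enumeration, and an exact-match ungapped extension. All function names and parameters (`neighborhood`, `build_index`, `extend`, `seed_and_extend`, `k`, `d`) are hypothetical and chosen for illustration only.

```python
from itertools import product

ALPHABET = "ACGT"  # assumption: DNA alphabet


def neighborhood(word, d):
    """All strings over ALPHABET within Hamming distance d of word (brute force)."""
    return {
        "".join(cand)
        for cand in product(ALPHABET, repeat=len(word))
        if sum(a != b for a, b in zip(cand, word)) <= d
    }


def build_index(text, k):
    """Map every length-k substring of text to the list of its start positions."""
    index = {}
    for i in range(len(text) - k + 1):
        index.setdefault(text[i:i + k], []).append(i)
    return index


def extend(query, text, qi, ti, k):
    """Grow an ungapped seed left and right while the flanking characters agree."""
    left = 0
    while (qi - left > 0 and ti - left > 0
           and query[qi - left - 1] == text[ti - left - 1]):
        left += 1
    right = k
    while (qi + right < len(query) and ti + right < len(text)
           and query[qi + right] == text[ti + right]):
        right += 1
    # (start in query, start in text, length of the ungapped segment)
    return qi - left, ti - left, left + right


def seed_and_extend(query, text, k=4, d=1):
    """Look up each neighbor of each query k-mer in the text index, then extend."""
    index = build_index(text, k)
    hits = set()
    for qi in range(len(query) - k + 1):
        for word in neighborhood(query[qi:qi + k], d):
            for ti in index.get(word, ()):
                hits.add(extend(query, text, qi, ti, k))
    return sorted(hits)


if __name__ == "__main__":
    # The text contains the query with one substitution (G -> C at query offset 2).
    print(seed_and_extend("ACGTAC", "GGACCTACGG", k=4, d=1))
```

With `d=0` only exact k-mer seeds are found, so the substituted copy above is missed. With `d=1` the neighborhood lookup recovers it, at the cost of also reporting shorter chance hits that a real tool would filter by score.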

Corresponding author

Correspondence to Gene Myers.

Copyright information

© 2013 Springer-Verlag London

About this chapter

Cite this chapter

Myers, G. (2013). What’s Behind Blast. In: Chauve, C., El-Mabrouk, N., Tannier, E. (eds) Models and Algorithms for Genome Evolution. Computational Biology, vol 19. Springer, London. https://doi.org/10.1007/978-1-4471-5298-9_1

  • DOI: https://doi.org/10.1007/978-1-4471-5298-9_1

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-5297-2

  • Online ISBN: 978-1-4471-5298-9

  • eBook Packages: Computer Science (R0)
