Protein Similarity Search with Subset Seeds on a Dedicated Reconfigurable Hardware

  • Conference paper
Parallel Processing and Applied Mathematics (PPAM 2007)

Abstract

With the sharp increase in available DNA and protein sequence data, new precise and fast similarity search methods are needed for large-scale genome and proteome comparisons. Modern seed-based similarity search techniques (spaced seeds, multiple seeds, subset seeds) provide a better sensitivity/specificity ratio. We present an implementation of such a seed-based technique on specialized parallel hardware embedding a reconfigurable architecture (FPGA), where the FPGA is tightly connected to large-capacity Flash memories. This parallel system allows large databases to be fully indexed and rapidly accessed. Compared to traditional approaches, represented by the Blastp software, we obtain both a significant speed-up and better results. To the best of our knowledge, this is the first attempt to exploit efficient seed-based algorithms for parallelizing sequence similarity search.
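The general idea behind indexing a protein database with a subset seed can be sketched in a few lines of software. The sketch below is illustrative only and is not the hardware design described in the paper. It assumes that each seed position is defined by a partition of the amino-acid alphabet, so that two words match when their residues fall into the same group at every position. The seed symbols `#`, `@`, `_`, the partitions, and the function names `seed_key`, `build_index`, and `find_hits` are hypothetical choices made for this example.

```python
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Illustrative partitions only (hypothetical, not taken from the paper).
EXACT = list(AMINO_ACIDS)                                # '#': each residue is its own group
GROUPED = ["C", "FWY", "ILMV", "AGPST", "DENQ", "HKR"]   # '@': coarse similarity groups
ANY = [AMINO_ACIDS]                                      # '_': don't-care position


def group_map(partition):
    """Map each residue to the id of the group that contains it."""
    return {aa: gid for gid, group in enumerate(partition) for aa in group}


SEED_LETTERS = {"#": group_map(EXACT), "@": group_map(GROUPED), "_": group_map(ANY)}


def seed_key(word, seed):
    """Project a word onto the seed: one group id per seed position.

    Two words hit each other under the seed exactly when their keys are equal.
    Residues outside the 20 standard amino acids raise KeyError in this sketch.
    """
    return tuple(SEED_LETTERS[s][aa] for s, aa in zip(seed, word))


def build_index(database, seed):
    """Index every seed-length window of every database sequence by its key."""
    index = defaultdict(list)
    span = len(seed)
    for seq_id, seq in enumerate(database):
        for pos in range(len(seq) - span + 1):
            index[seed_key(seq[pos:pos + span], seed)].append((seq_id, pos))
    return index


def find_hits(query, index, seed):
    """Return (query_pos, seq_id, db_pos) for every window sharing a key."""
    span = len(seed)
    hits = []
    for qpos in range(len(query) - span + 1):
        key = seed_key(query[qpos:qpos + span], seed)
        for seq_id, dpos in index.get(key, []):
            hits.append((qpos, seq_id, dpos))
    return hits


if __name__ == "__main__":
    seed = "#@_#"
    index = build_index(["MKVLAW", "GGHRC"], seed)
    print(find_hits("MRAL", index, seed))
```

In a complete search tool, each hit would then be passed to an extension stage that scores the surrounding alignment; that stage is omitted here.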



Editor information

Roman Wyrzykowski, Jack Dongarra, Konrad Karczewski, Jerzy Wasniewski

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Peterlongo, P. et al. (2008). Protein Similarity Search with Subset Seeds on a Dedicated Reconfigurable Hardware. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Wasniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2007. Lecture Notes in Computer Science, vol 4967. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68111-3_131

  • DOI: https://doi.org/10.1007/978-3-540-68111-3_131

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68105-2

  • Online ISBN: 978-3-540-68111-3

  • eBook Packages: Computer Science, Computer Science (R0)
