Protein Similarity Search with Subset Seeds on a Dedicated Reconfigurable Hardware

Peterlongo, Pierre; Noé, Laurent; Lavenier, Dominique; Georges, Gilles; Jacques, Julien; Kucherov, Gregory; Giraud, Mathieu

doi:10.1007/978-3-540-68111-3_131

Pierre Peterlongo¹,
Laurent Noé²,
Dominique Lavenier¹,
Gilles Georges¹,
Julien Jacques¹,
Gregory Kucherov² &
Mathieu Giraud²

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4967)

Included in the following conference series:

International Conference on Parallel Processing and Applied Mathematics


Abstract

With the sharp increase in available DNA and protein sequence data, precise and fast similarity search methods are needed for large-scale genome and proteome comparisons. Modern seed-based similarity search techniques (spaced seeds, multiple seeds, subset seeds) provide a better sensitivity/specificity ratio. We present an implementation of such a seed-based technique on parallel specialized hardware that embeds a reconfigurable architecture (FPGA) tightly connected to high-capacity Flash memories. This parallel system allows large databases to be fully indexed and rapidly accessed. Compared with the traditional approach implemented in the Blastp software, we obtain both a significant speed-up and better results. To the best of our knowledge, this is the first attempt to exploit efficient seed-based algorithms to parallelize sequence similarity search.
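The abstract relies on subset seeds and on a fully indexed database without showing how the two fit together. The sketch below is a minimal illustration under stated assumptions, not the authors' design: it assumes each seed letter defines a partition of the amino-acid alphabet (`#` = identical residues, `@` = residues in the same group, `-` = joker). The seed letters, the grouping in `GROUPS`, and the names `class_of`, `seed_key`, `build_index` and `find_hits` are all hypothetical. Under this assumption, two words hit exactly when they map to the same tuple of per-position classes, so the database can be indexed by that tuple and queried with a dictionary lookup. The sketch is sequential and in-memory, whereas the system described above stores the index in Flash memory and processes it on the FPGA.

```python
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical amino-acid grouping for the '@' seed letter (an illustrative
# assumption, not the grouping used in the paper). Each residue is in one group.
GROUPS = ["C", "FYW", "MILV", "GP", "ATS", "NHQEDRK"]


def class_of(letter, aa):
    """Class index of amino acid `aa` under seed letter `letter`."""
    if letter == "#":  # identity: every residue is its own class
        return AMINO_ACIDS.index(aa)
    if letter == "@":  # residues match if they belong to the same group
        for i, group in enumerate(GROUPS):
            if aa in group:
                return i
        raise ValueError(f"unknown amino acid {aa!r}")
    if letter == "-":  # joker: every pair of residues matches
        return 0
    raise ValueError(f"unknown seed letter {letter!r}")


def seed_key(seed, word):
    """Key such that two words share a key iff they hit under `seed`.

    This only works because every seed letter here is an equivalence
    relation (a partition), which makes the hit relation transitive.
    """
    return tuple(class_of(letter, aa) for letter, aa in zip(seed, word))


def build_index(seed, database):
    """Map each seed key to the (sequence id, offset) pairs that carry it."""
    index = defaultdict(list)
    k = len(seed)
    for sid, seq in enumerate(database):
        for pos in range(len(seq) - k + 1):
            index[seed_key(seed, seq[pos:pos + k])].append((sid, pos))
    return index


def find_hits(seed, index, query):
    """Yield (query offset, sequence id, database offset) for every seed hit."""
    k = len(seed)
    for qpos in range(len(query) - k + 1):
        # .get does not insert missing keys into the defaultdict
        for sid, dpos in index.get(seed_key(seed, query[qpos:qpos + k]), []):
            yield qpos, sid, dpos


if __name__ == "__main__":
    # Hypothetical toy data: two database sequences and one query word.
    index = build_index("#@#", ["MALGK", "AVGAVG"])
    for hit in find_hits("#@#", index, "AIG"):
        print(hit)  # (0, 0, 1), then (0, 1, 0), then (0, 1, 3)
```

Replacing a `#` with a coarser letter such as `@` or `-` makes more word pairs share a key. This raises sensitivity at the cost of more hits to verify, which is the sensitivity/specificity trade-off the abstract refers to. In a full search, each hit would then be extended and scored; that step is omitted here.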



Author information

Authors and Affiliations

¹ Symbiose, IRISA, INRIA, CNRS, Université Rennes 1:
Pierre Peterlongo, Dominique Lavenier, Gilles Georges & Julien Jacques
² Sequoia/Bioinfo, LIFL, INRIA, CNRS, Université Lille 1:
Laurent Noé, Gregory Kucherov & Mathieu Giraud


Editor information

Roman Wyrzykowski, Jack Dongarra, Konrad Karczewski, Jerzy Wasniewski

About this paper

Cite this paper

Peterlongo, P. et al. (2008). Protein Similarity Search with Subset Seeds on a Dedicated Reconfigurable Hardware. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Wasniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2007. Lecture Notes in Computer Science, vol 4967. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68111-3_131

DOI: https://doi.org/10.1007/978-3-540-68111-3_131
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68105-2
Online ISBN: 978-3-540-68111-3
eBook Packages: Computer Science, Computer Science (R0)
