Data-Parallel Computational Model for Next Generation Sequencing on Commodity Clusters

  • Conference paper
Parallel Computing Technologies (PaCT 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11657)

Abstract

Next generation sequencing (NGS) technologies are poised to be the next big revolution in personalized healthcare and have caused the amount of available sequencing data to grow exponentially. Processing NGS data has become a major challenge for individual genomic research, and commodity computers, as a cost-effective platform for distributed and parallel processing in laboratories, can help process such huge volumes of data. To deploy sequence-processing methods on these platforms, we present a parallel computational model for BLAST on commodity clusters that works in a data-parallel manner. The model follows the master-worker paradigm. The master temporarily stores incoming requests and splits the database into chunks according to the number of available workers. Each worker pulls, formats, and searches queries against a unique chunk of the database. To evaluate the model, we used queries of different lengths to search against a small database (UniProtKB/SWISS-PROT) and a large database (UniProtKB/TrEMBL). The results were identical to the output of the gold-standard method (NCBI BLAST), and our model outperformed the most popular distributed form of BLAST (mpiBLAST), achieving 25% higher performance.
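The master-worker scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`split_database`, `search_chunk`, `parallel_search`) are hypothetical, threads stand in for cluster nodes, round-robin chunking is an assumed splitting policy, and a toy shared k-mer count replaces BLAST scoring so the example runs without external tools.

```python
from concurrent.futures import ThreadPoolExecutor


def split_database(records, n_workers):
    """Master step: deal records round-robin into one chunk per worker."""
    chunks = [[] for _ in range(n_workers)]
    for i, rec in enumerate(records):
        chunks[i % n_workers].append(rec)
    return chunks


def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def toy_score(query, subject, k=3):
    """Stand-in for BLAST scoring (assumption): count shared k-mers."""
    return len(kmers(query, k) & kmers(subject, k))


def search_chunk(chunk, queries):
    """Worker step: search every query against this worker's chunk only."""
    hits = []
    for qid, qseq in queries:
        for sid, sseq in chunk:
            score = toy_score(qseq, sseq)
            if score > 0:
                hits.append((qid, sid, score))
    return hits


def parallel_search(records, queries, n_workers):
    """Split the database, search chunks concurrently, merge the hits."""
    chunks = split_database(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial = list(pool.map(search_chunk, chunks, [queries] * n_workers))
    merged = [hit for hits in partial for hit in hits]
    # Sort by query, then best score first, then subject id for stability.
    return sorted(merged, key=lambda h: (h[0], -h[2], h[1]))


if __name__ == "__main__":
    db = [("s1", "MKTAYIAK"), ("s2", "GGGGGGGG"), ("s3", "TAYIAKQR")]
    qs = [("q1", "KTAYIA")]
    print(parallel_search(db, qs, n_workers=2))
```

Because every database record lands in exactly one chunk, the merged hit list matches what a single worker would produce over the whole database, which mirrors the abstract's claim that results equal those of the reference method. With real BLAST, statistics such as E-values depend on database size, so chunked runs would additionally need to account for the size of the full database.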

Corresponding author

Correspondence to Majid Hajibaba.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Hajibaba, M., Sharifi, M., Gorgin, S. (2019). Data-Parallel Computational Model for Next Generation Sequencing on Commodity Clusters. In: Malyshkin, V. (eds) Parallel Computing Technologies. PaCT 2019. Lecture Notes in Computer Science, vol 11657. Springer, Cham. https://doi.org/10.1007/978-3-030-25636-4_22

  • DOI: https://doi.org/10.1007/978-3-030-25636-4_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-25635-7

  • Online ISBN: 978-3-030-25636-4

  • eBook Packages: Computer Science, Computer Science (R0)
