Parallel Software Architecture for Experimental Workflows in Computational Biology on Clouds

  • Conference paper
Parallel Processing and Applied Mathematics (PPAM 2011)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7204)

Abstract

Cloud computing opens new possibilities for computational biologists. Given the pay-as-you-go model and the commodity hardware base, new tools for extensive parallelism are needed to make experimentation in the cloud an attractive option. In this paper, we present EasyProt, a parallel message-passing architecture designed for developing experimental workflows in computational biology while harnessing the power of cloud resources. The system exploits parallelism in two ways: by multithreading modular components on virtual machines while respecting data dependencies and by allowing expansion across multiple virtual machines. Components of the system, called elements, are easily configured for efficient modification and testing of workflows during ever-changing experimentation. Though EasyProt, as an abstract cloud programming model, can be extended beyond computational biology, current development brings cloud computing to experimenters in this important discipline who are facing unprecedented data-processing challenges, with a type system designed for proteomics, interactomics and comparative genomics data, and a suite of elements that perform useful analysis tasks on biological data using cloud resources.
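The message-passing idea described in the abstract can be illustrated with a minimal sketch. This is not EasyProt's actual API: the `Element` class, `connect`, `run_pipeline`, and the toy sequence-processing functions are all hypothetical names chosen for illustration. The sketch assumes a linear chain of elements on a single machine, where each element runs in its own thread and blocks until its upstream neighbour delivers a message, which is one simple way to respect data dependencies.

```python
import queue
import threading

_STOP = object()  # sentinel telling an element's thread to shut down


class Element:
    """Hypothetical workflow element: one thread, one inbox, one function."""

    def __init__(self, name, func):
        self.name = name
        self.func = func
        self.inbox = queue.Queue()
        self.downstream = []
        self.thread = threading.Thread(target=self._run, daemon=True)

    def connect(self, other):
        """Send this element's output to `other`; returns `other` for chaining."""
        self.downstream.append(other)
        return other

    def _run(self):
        while True:
            msg = self.inbox.get()  # blocks until upstream data is available
            if msg is _STOP:
                # Forward the shutdown signal so the chain drains in order.
                for element in self.downstream:
                    element.inbox.put(_STOP)
                return
            result = self.func(msg)
            for element in self.downstream:
                element.inbox.put(result)


def run_pipeline(elements, source, inputs):
    """Start every element, feed `inputs` to `source`, wait for completion."""
    for element in elements:
        element.thread.start()
    for item in inputs:
        source.inbox.put(item)
    source.inbox.put(_STOP)
    for element in elements:
        element.thread.join()


# Toy workflow (assumed for illustration): normalise sequences, measure them.
results = []
parse = Element("parse", lambda line: line.strip().upper())
length = Element("length", lambda seq: (seq, len(seq)))
collect = Element("collect", results.append)
parse.connect(length).connect(collect)

run_pipeline([parse, length, collect], parse, ["mkt\n", "acdefg\n"])
print(results)
```

Because each element has a single worker thread and a FIFO inbox, messages keep their order along the chain, while different elements work on different messages concurrently. Swapping the function passed to an element changes the workflow without touching the wiring, which loosely mirrors the configurability the abstract attributes to elements.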

Availability: EasyProt is available as a public Amazon Machine Image (AMI) on the Amazon EC2 cloud service, under an open source license, registered with manifest easyprot-ami/easyprot.img.manifest.xml.




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hodgkinson, L., Rosa, J., Brewer, E.A. (2012). Parallel Software Architecture for Experimental Workflows in Computational Biology on Clouds. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Waśniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2011. Lecture Notes in Computer Science, vol 7204. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31500-8_29


  • DOI: https://doi.org/10.1007/978-3-642-31500-8_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31499-5

  • Online ISBN: 978-3-642-31500-8

  • eBook Packages: Computer Science, Computer Science (R0)
