Skip to main content

HyperLoom Possibilities for Executing Scientific Workflows on the Cloud

  • Conference paper
  • First Online:
Complex, Intelligent, and Software Intensive Systems (CISIS 2017)

Abstract

We have developed HyperLoom - a platform for defining and executing scientific workflows in large-scale HPC systems. The computational tasks in such workflows often have non-trivial dependency patterns, unknown execution time and unknown sizes of generated outputs. HyperLoom enables to efficiently execute the workflows respecting task requirements and cluster resources agnostically to the shape or size of the workflow. Although HPC infrastructures provide an unbeatable performance, they may be unavailable or too expensive especially for small to medium workloads. Moreover, for some workloads, due to HPCs not very flexible resource allocation policy, the system energy efficiency may not be optimal at some stages of the execution. In contrast, current public cloud providers such as Amazon, Google or Exoscale allow users a comfortable and elastic way of deploying, scaling and disposing a virtualized cluster of almost any size. In this paper, we describe HyperLoom virtualization and evaluate its performance in a virtualized environment using workflows of various shapes and sizes. Finally, we discuss the Hyperloom potential for its expansion to cloud environments.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 259.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 329.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Amazon AWS. https://aws.amazon.com/

  2. Docker. https://www.docker.com/

  3. Docker Hub. https://hub.docker.com/

  4. Exoscale. https://www.exoscale.ch/

  5. Singularity. http://singularity.lbl.gov/

  6. Specsheet - Processor Intel Xeon E5 2680. http://ark.intel.com/products/81908/Intel-Xeon-Processor-E5-2680-v3-30M-Cache-2_50-GHz

  7. Chang, C.-C., Lin, C.-J.: Libsvm: a library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) 2(3), 27 (2011)

    Google Scholar 

  8. Chen, W., Deelman, E.: Workflow overhead analysis and optimizations. In: Proceedings of the 6th Workshop on Workflows in Support of Large-Scale Science, WORKS 2011, New York, NY, USA, pp. 11–20. ACM (2011)

    Google Scholar 

  9. Deelman, E., Singh, G., Mei-Hui, S., Blythe, J., Gil, Y., Kesselman, C., Mehta, G., Vahi, K., BruceBerriman, G., Good, J., Laity, A., Jacob, J.C., Katz, D.S.: Pegasus: a framework for mapping complex scientific workflows onto distributed systems. Sci. Program. 13(3), 219–237 (2005)

    Google Scholar 

  10. Red Hat: Red hat enterprise linux (2017). https://www.redhat.com/en/technologies/linux-platforms/enterprise-linux. Accessed 31 Mar 2017

  11. HTCondor: Htcondor (2017). https://research.cs.wisc.edu/htcondor/index.html. Accessed 31 Mar 2017

  12. Lampa, S., Alvarsson, J., Spjuth, O.: Towards agile large-scale predictive modelling in drug discovery with flow-based programming design principles. J. Cheminformatics 8(1), 67 (2016)

    Article  Google Scholar 

  13. Rocklin, M.: Dask: parallel computation with blocked algorithms and task scheduling. In: Proceedings of the 14th Python in Science Conference, pp. 130–136. Citeseer (2015)

    Google Scholar 

  14. White, T.: Hadoop: The Definitive Guide, 1st edn. O’Reilly Media Inc., Sebastopol (2009)

    Google Scholar 

  15. Wikipedia: Infiniband – wikipedia, the free encyclopedia (2017). https://en.wikipedia.org/w/index.php?title=InfiniBand&oldid=772443735. Accessed 31 Mar 2017

  16. Zaharia, M., Xin, R.S., Wendell, P., Das, T., Armbrust, M., Dave, A., Meng, X., Rosen, J., Venkataraman, S., Franklin, M.J., et al.: Apache spark: a unified engine for big data processing. Commun. ACM 59(11), 56–65 (2016)

    Article  Google Scholar 

Download references

Acknowledgements

This project has received funding from the European Union’s Horizon 2020 Research and Innovation programme under Grant Agreement No. 671555. This work was supported by The Ministry of Education, Youth and Sports from the National Programme of Sustainability (NPU II) project IT4Innovations excellence in science - LQ1602 and by the IT4Innovations infrastructure which is supported from the Large Infrastructures for Research, Experimental Development and Innovations project IT4Innovations National Supercomputing Center LM2015070.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Vojtech Cima .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Cima, V., Böhm, S., Martinovič, J., Dvorský, J., Ashby, T.J., Chupakhin, V. (2018). HyperLoom Possibilities for Executing Scientific Workflows on the Cloud. In: Barolli, L., Terzo, O. (eds) Complex, Intelligent, and Software Intensive Systems. CISIS 2017. Advances in Intelligent Systems and Computing, vol 611. Springer, Cham. https://doi.org/10.1007/978-3-319-61566-0_36

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-61566-0_36

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-61565-3

  • Online ISBN: 978-3-319-61566-0

  • eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics