HyperLoom Possibilities for Executing Scientific Workflows on the Cloud

Cima, Vojtech; Böhm, Stanislav; Martinovič, Jan; Dvorský, Jiří; Ashby, Thomas J.; Chupakhin, Vladimir

doi:10.1007/978-3-319-61566-0_36

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 611)

Included in the following conference series: Conference on Complex, Intelligent, and Software Intensive Systems

Abstract

We have developed HyperLoom, a platform for defining and executing scientific workflows on large-scale HPC systems. The computational tasks in such workflows often have non-trivial dependency patterns, unknown execution times, and unknown output sizes. HyperLoom executes workflows efficiently, respecting task requirements and cluster resources regardless of the shape or size of the workflow. Although HPC infrastructures provide unbeatable performance, they may be unavailable or too expensive, especially for small to medium workloads. Moreover, because HPC resource allocation policies are not very flexible, system energy efficiency may be suboptimal at some stages of the execution for some workloads. In contrast, current public cloud providers such as Amazon, Google, or Exoscale give users a convenient and elastic way of deploying, scaling, and disposing of a virtualized cluster of almost any size. In this paper, we describe HyperLoom virtualization and evaluate its performance in a virtualized environment using workflows of various shapes and sizes. Finally, we discuss HyperLoom's potential for expansion to cloud environments.

Acknowledgements

This project has received funding from the European Union’s Horizon 2020 Research and Innovation programme under Grant Agreement No. 671555. This work was supported by The Ministry of Education, Youth and Sports from the National Programme of Sustainability (NPU II) project IT4Innovations excellence in science - LQ1602 and by the IT4Innovations infrastructure which is supported from the Large Infrastructures for Research, Experimental Development and Innovations project IT4Innovations National Supercomputing Center LM2015070.

Author information

Authors and Affiliations

IT4Innovations, VŠB Technical University of Ostrava, Ostrava, Czech Republic
Vojtech Cima, Stanislav Böhm, Jan Martinovič & Jiří Dvorský
IMEC, Brussels, Belgium
Thomas J. Ashby
Janssen Pharmaceutica NV, Brussels, Belgium
Vladimir Chupakhin

Corresponding author

Correspondence to Vojtech Cima.

Editor information

Editors and Affiliations

Department of Information and Communication Engineering, Faculty of Information Engineering, Fukuoka Institute of Technology, Fukuoka, Japan
Leonard Barolli
Politècnico di Torino, Istituto Superiore Mario Boella, Turin, Italy
Olivier Terzo

About this paper

Cite this paper

Cima, V., Böhm, S., Martinovič, J., Dvorský, J., Ashby, T.J., Chupakhin, V. (2018). HyperLoom Possibilities for Executing Scientific Workflows on the Cloud. In: Barolli, L., Terzo, O. (eds) Complex, Intelligent, and Software Intensive Systems. CISIS 2017. Advances in Intelligent Systems and Computing, vol 611. Springer, Cham. https://doi.org/10.1007/978-3-319-61566-0_36

DOI: https://doi.org/10.1007/978-3-319-61566-0_36
Published: 05 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-61565-3
Online ISBN: 978-3-319-61566-0
eBook Packages: Engineering, Engineering (R0)
