Provenance of Dynamic Adaptations in User-Steered Dataflows

  • Conference paper
Provenance and Annotation of Data and Processes (IPAW 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11017)

Abstract

Due to the exploratory nature of scientific experiments, computational scientists need to steer dataflows running on High-Performance Computing (HPC) machines by tuning parameters, modifying input datasets, or adapting dataflow elements at runtime. This happens in several application domains, such as Oil and Gas, where scientists adjust simulation parameters, or Machine Learning, where they tune model hyperparameters during training. This practice is known as computational steering, or putting the “human-in-the-loop” of HPC simulations. Such adaptations must be tracked and analyzed, especially during long executions. Tracking adaptations with provenance not only improves the reproducibility and reliability of experiments but also helps scientists understand, online, the consequences of their adaptations. We propose PROV-DfA, a specialization of W3C PROV elements to model computational steering. We provide a provenance data representation for online adaptations that associates them with the adapted domain dataflow and with execution data, all in the same provenance database. We explore a case study in the Oil and Gas domain to show how PROV-DfA supports scientists in answering questions such as “who adapted which dataflow elements, when, and what happened to the dataflow and execution after the adaptation (e.g., how much execution time or processed data was reduced)”, in a real scenario.
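
The abstract states that PROV-DfA relates each online adaptation to the adapted dataflow element and to execution data in a single provenance database, so that steering questions become database queries. The sketch below illustrates that idea under explicit assumptions: it uses Python's built-in sqlite3 module, a hypothetical relational layout loosely named after the W3C PROV concepts Agent, Entity, and Activity (with used, wasGeneratedBy, and wasAssociatedWith modeled as foreign keys), and a hypothetical steerable parameter called tolerance. None of these table or column names are taken from PROV-DfA itself, and the inserted values are toy numbers, not results from the paper's case study.

```python
import sqlite3

# Hypothetical schema, loosely inspired by W3C PROV; NOT the PROV-DfA model.
SCHEMA = """
-- prov:Agent: the scientist who steers the running dataflow
CREATE TABLE agent (id TEXT PRIMARY KEY, name TEXT);
-- prov:Entity: one version of a steerable dataflow element (e.g. a parameter)
CREATE TABLE element_version (id TEXT PRIMARY KEY, element TEXT, value REAL);
-- prov:Activity: one online adaptation performed by an agent
CREATE TABLE adaptation (
    id TEXT PRIMARY KEY,
    agent_id TEXT REFERENCES agent(id),               -- prov:wasAssociatedWith
    used_id TEXT REFERENCES element_version(id),      -- prov:used (old version)
    generated_id TEXT REFERENCES element_version(id), -- new version prov:wasGeneratedBy it
    at_time REAL                                      -- when the adaptation happened
);
-- Execution data of the domain dataflow, stored in the same database
CREATE TABLE task (id INTEGER PRIMARY KEY, start REAL, duration REAL, input_mb REAL);
"""


def load_toy_run(con):
    """Insert illustrative toy values (not data from the paper)."""
    con.execute("INSERT INTO agent VALUES ('u1', 'scientist-1')")
    con.executemany("INSERT INTO element_version VALUES (?, ?, ?)",
                    [("v1", "tolerance", 0.001), ("v2", "tolerance", 0.01)])
    con.execute("INSERT INTO adaptation VALUES ('ad1', 'u1', 'v1', 'v2', 100.0)")
    con.executemany("INSERT INTO task (start, duration, input_mb) VALUES (?, ?, ?)",
                    [(0.0, 10.0, 50.0), (50.0, 10.0, 50.0),    # started before the adaptation
                     (120.0, 6.0, 30.0), (150.0, 4.0, 30.0)])  # started after it


# Who adapted which element, when, old/new values, and task averages
# (duration, input size) for tasks started before vs. after the adaptation.
REPORT = """
SELECT ag.name, ad.at_time, ov.element, ov.value, nv.value,
       (SELECT AVG(duration) FROM task WHERE start <  ad.at_time),
       (SELECT AVG(duration) FROM task WHERE start >= ad.at_time),
       (SELECT AVG(input_mb) FROM task WHERE start <  ad.at_time),
       (SELECT AVG(input_mb) FROM task WHERE start >= ad.at_time)
FROM adaptation ad
JOIN agent ag           ON ag.id = ad.agent_id
JOIN element_version ov ON ov.id = ad.used_id
JOIN element_version nv ON nv.id = ad.generated_id
ORDER BY ad.at_time
"""


def steering_report(con):
    """Return one row per adaptation, ordered by time."""
    return con.execute(REPORT).fetchall()


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    load_toy_run(con)
    for row in steering_report(con):
        print(row)
```

Running it prints one row per adaptation: who steered, when, which element changed from which value to which, and the average task duration and input size before and after the change. Splitting tasks by start time is a deliberate simplification; a real analysis would also have to account for tasks already running when the adaptation occurred and for other factors that vary over a long execution.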

Acknowledgement

This work was partially funded by CNPq, FAPERJ and HPC4E (EU H2020 and MCTI/RNP-Brazil).

Author information

Corresponding author: Renan Souza.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Souza, R., Mattoso, M. (2018). Provenance of Dynamic Adaptations in User-Steered Dataflows. In: Belhajjame, K., Gehani, A., Alper, P. (eds) Provenance and Annotation of Data and Processes. IPAW 2018. Lecture Notes in Computer Science, vol 11017. Springer, Cham. https://doi.org/10.1007/978-3-319-98379-0_2

  • DOI: https://doi.org/10.1007/978-3-319-98379-0_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-98378-3

  • Online ISBN: 978-3-319-98379-0

  • eBook Packages: Computer Science, Computer Science (R0)
