A Novel Multi-level Integrated Roofline Model Approach for Performance Characterization

Koskela, Tuomas; Matveev, Zakhar; Yang, Charlene; Adedoyin, Adetokunbo; Belenov, Roman; Thierry, Philippe; Zhao, Zhengji; Gayatri, Rahulkumar; Shan, Hongzhang; Oliker, Leonid; Deslippe, Jack; Green, Ron; Williams, Samuel

doi:10.1007/978-3-319-92040-5_12

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10876)

Included in the following conference series:

International Conference on High Performance Computing

Abstract

As energy-efficient architectures, including accelerators and many-core processors, gain traction, application developers face the challenge of optimizing their applications for multiple hardware features, including many-core parallelism, wide vector processing units, and on-chip high-bandwidth memory. In this paper, we discuss the development and use of a new application performance tool based on an extension of the classical Roofline model that simultaneously profiles multiple levels of the cache-memory hierarchy. The tool gives the developer a visual aid for framing the many-dimensional optimization problem in a tractable way. We present case studies of real scientific applications in which the Integrated Roofline Model provided performance insights.
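
As a rough illustration of the idea, the classical Roofline model bounds attainable performance by min(peak compute rate, arithmetic intensity × memory bandwidth). The sketch below assumes that a multi-level variant evaluates this bound once per level of the memory hierarchy, using the bytes moved at that level. The function names, level names, and all numbers are hypothetical placeholders; they are not taken from the paper or from any tool's API.

```python
def roofline_bound(peak_gflops, bandwidth_gbs, flops, bytes_moved):
    """Classical roofline bound in GFLOP/s: min(peak, AI * bandwidth)."""
    arithmetic_intensity = flops / bytes_moved  # FLOPs per byte
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)


def multilevel_roofline(peak_gflops, bandwidths_gbs, flops, bytes_per_level):
    """Evaluate the roofline bound at every memory level (assumed scheme).

    bandwidths_gbs and bytes_per_level are dicts keyed by level name.
    Returns ({level: (arithmetic intensity, bound)}, limiting level).
    """
    per_level = {}
    for level, bandwidth in bandwidths_gbs.items():
        ai = flops / bytes_per_level[level]
        bound = roofline_bound(peak_gflops, bandwidth, flops,
                               bytes_per_level[level])
        per_level[level] = (ai, bound)
    # The level with the lowest bound is the likely bottleneck.
    limiting = min(per_level, key=lambda lvl: per_level[lvl][1])
    return per_level, limiting


if __name__ == "__main__":
    # Hypothetical machine and kernel, chosen only for illustration.
    peak = 1000.0  # GFLOP/s
    bandwidths = {"L1": 5000.0, "L2": 2000.0, "DRAM": 100.0}  # GB/s
    traffic = {"L1": 4e9, "L2": 2e9, "DRAM": 5e8}  # bytes moved per level
    levels, bottleneck = multilevel_roofline(peak, bandwidths, 1e9, traffic)
    for name, (ai, bound) in levels.items():
        print(f"{name}: AI = {ai:.2f} FLOP/byte, bound = {bound:.0f} GFLOP/s")
    print("Limiting level:", bottleneck)
```

With these placeholder values the kernel is compute-bound with respect to L1 and L2 but bandwidth-bound with respect to DRAM, which is the kind of per-level distinction a multi-level roofline view is meant to expose.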

Author information

Authors and Affiliations

NERSC, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
Tuomas Koskela, Charlene Yang, Zhengji Zhao, Rahulkumar Gayatri & Jack Deslippe
CRD, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
Hongzhang Shan, Leonid Oliker & Samuel Williams
Intel Corporation, Santa Clara, USA
Zakhar Matveev, Roman Belenov & Philippe Thierry
Los Alamos National Laboratory, Los Alamos, USA
Adetokunbo Adedoyin & Ron Green
University of Helsinki, Helsinki, Finland
Tuomas Koskela
University of Turku, Turku, Finland
Tuomas Koskela

Corresponding author

Correspondence to Tuomas Koskela.

Editor information

Editors and Affiliations

Tokyo Institute of Technology, Tokyo, Japan
Rio Yokota
University of Edinburgh, Edinburgh, United Kingdom
Michèle Weiland
King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
David Keyes
Technische Universität München, Garching bei München, Germany
Carsten Trinitis

About this paper

Cite this paper

Koskela, T. et al. (2018). A Novel Multi-level Integrated Roofline Model Approach for Performance Characterization. In: Yokota, R., Weiland, M., Keyes, D., Trinitis, C. (eds) High Performance Computing. ISC High Performance 2018. Lecture Notes in Computer Science, vol 10876. Springer, Cham. https://doi.org/10.1007/978-3-319-92040-5_12

DOI: https://doi.org/10.1007/978-3-319-92040-5_12
Published: 29 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92039-9
Online ISBN: 978-3-319-92040-5
eBook Packages: Computer Science, Computer Science (R0)
