Cache-Aware Roofline Model and Medical Image Processing Optimizations in GPUs

Serrano, Estefania; Ilic, Aleksandar; Sousa, Leonel; Garcia-Blas, Javier; Carretero, Jesus

doi:10.1007/978-3-030-02465-9_36

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11203)

Included in the following conference series:

International Conference on High Performance Computing

Abstract

When optimizing or porting applications to new architectures, a preliminary characterization is necessary to exploit the maximum computing power of the target devices. Profiling tools are available for numerous architectures and programming models, making it easier to spot possible bottlenecks. However, to better interpret the collected results, current profilers rely on insightful performance models. In this paper, we describe the Cache-Aware Roofline Model (CARM) and tools for its generation to enable the performance characterization of GPU architectures and workloads. We use CARM to characterize two kernels that are part of a 3D iterative reconstruction application for Computed Tomography (CT). These two kernels account for most of the execution time of the whole method, which makes them suitable targets for a deeper analysis. By exploring the proposed model and methodology, the overall performance of the kernels has been improved by up to 2× compared to the previous implementations.
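As a rough illustration of the kind of bound a roofline-style model provides, the following Python sketch computes attainable performance as the minimum of the peak compute rate and the product of memory bandwidth and arithmetic intensity, with one roof per memory level. It is not the tooling described in the paper: the function names, the memory levels, and all peak and bandwidth figures are hypothetical placeholders chosen only to make the arithmetic easy to follow.

```python
# Minimal roofline-style bound, assuming hypothetical device figures.
# None of the numbers below come from the paper or from a real GPU.

def attainable_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Upper bound on performance: min(peak, bandwidth * intensity).

    arithmetic_intensity is in FLOP/byte, bandwidth_gbs in GB/s,
    so their product is in GFLOP/s.
    """
    if arithmetic_intensity <= 0:
        raise ValueError("arithmetic intensity must be positive")
    return min(peak_gflops, bandwidth_gbs * arithmetic_intensity)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity at which the memory roof meets the compute roof."""
    return peak_gflops / bandwidth_gbs


def classify(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Label a kernel as memory-bound or compute-bound under one roof."""
    if arithmetic_intensity < ridge_point(peak_gflops, bandwidth_gbs):
        return "memory-bound"
    return "compute-bound"


# Hypothetical peak compute rate (GFLOP/s) and per-level bandwidths (GB/s).
PEAK_GFLOPS = 4000.0
ROOF_BANDWIDTH_GBS = {
    "shared memory": 2000.0,
    "L2 cache": 1000.0,
    "DRAM": 250.0,
}

if __name__ == "__main__":
    kernel_ai = 0.5  # hypothetical kernel: 0.5 FLOP per byte accessed
    for level, bandwidth in ROOF_BANDWIDTH_GBS.items():
        bound = attainable_gflops(kernel_ai, PEAK_GFLOPS, bandwidth)
        label = classify(kernel_ai, PEAK_GFLOPS, bandwidth)
        print(f"{level}: at most {bound:.1f} GFLOP/s ({label})")
```

Plotting a measured kernel against several roofs at once shows which memory level is the likely limiter; a kernel sitting near the DRAM roof, for instance, suggests that improving data reuse in caches or shared memory is worth trying before tuning arithmetic.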

This work has been partially supported through grant TIN2016-79637-P “Towards unification of HPC and Big Data Paradigms” from the Spanish Ministry of Economy and Competitiveness, the COST Action IC1305 “Network for Sustainable Ultrascale Computing Platforms” (NESUS), grant FPU14/03875 from the Spanish Ministry of Education and by the Portuguese national funds through FCT under the projects UID/CEC/50021/2013 and LISBOA-01-0145-FEDER-031901 (PTDC/CCI-COM/31901/2017).

Notes

1. Using the compilation flag --maxrregcount=32.

Author information

Authors and Affiliations

University Carlos III of Madrid, Madrid, Spain
Estefania Serrano, Javier Garcia-Blas & Jesus Carretero
INESC-ID, Instituto Superior Tecnico, University of Lisbon, Lisbon, Portugal
Aleksandar Ilic & Leonel Sousa

Corresponding author

Correspondence to Estefania Serrano.

Editor information

Editors and Affiliations

Tokyo Institute of Technology, Tokyo, Japan
Rio Yokota
University of Edinburgh, Edinburgh, UK
Michèle Weiland
Lawrence Berkeley National Laboratory, Berkeley, CA, USA
John Shalf
Swiss National Supercomputing Centre, Lugano, Switzerland
Sadaf Alam

Cite this paper

Serrano, E., Ilic, A., Sousa, L., Garcia-Blas, J., Carretero, J. (2018). Cache-Aware Roofline Model and Medical Image Processing Optimizations in GPUs. In: Yokota, R., Weiland, M., Shalf, J., Alam, S. (eds) High Performance Computing. ISC High Performance 2018. Lecture Notes in Computer Science, vol 11203. Springer, Cham. https://doi.org/10.1007/978-3-030-02465-9_36

DOI: https://doi.org/10.1007/978-3-030-02465-9_36
Published: 25 January 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02464-2
Online ISBN: 978-3-030-02465-9
eBook Packages: Computer Science, Computer Science (R0)
