
A Transformation-Based Approach to Developing High-Performance GPU Programs

  • Conference paper
Perspectives of System Informatics (PSI 2017)

Abstract

We advocate the use of formal patterns and transformations for programming modern many-core processors such as Graphics Processing Units (GPUs), as an alternative to the currently used low-level, ad hoc programming approaches like CUDA or OpenCL. Our new contribution is the introduction of an intermediate level of low-level patterns in order to bridge the abstraction gap between popular high-level patterns (map, fold/reduce, zip, etc.) and imperative, executable code for many-cores. We define our low-level patterns based on the OpenCL programming model, which is portable across parallel architectures of different vendors, and we introduce semantics-preserving rewrite rules that transform programs with high-level patterns into programs with low-level patterns, from which executable OpenCL programs are automatically generated. We show that program design decisions and optimizations, which are usually applied ad hoc by experts, are expressed systematically in our approach as provably correct transformations of high- and low-level patterns. We evaluate our approach by systematically deriving several differently optimized OpenCL implementations of parallel reduction that achieve performance competitive with OpenCL programs manually written and highly tuned by performance experts.
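
As a simple illustration of the high-level patterns named above (our example, not taken from the paper): a dot product of two vectors xs and ys can be expressed entirely as a composition of zip, map and reduce, and it is expressions of this form that the rewrite rules subsequently lower towards executable OpenCL code:

\[ \mathit{dotProduct}(xs, ys) \;=\; \mathit{reduce}\,(+, 0)\,\big(\mathit{map}\,(\times)\,(\mathit{zip}\,(xs, ys))\big) \]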


References

  1. Aldinucci, M., Danelutto, M., Kilpatrick, P., Torquati, M.: FastFlow: high-level and efficient streaming on multi-core. In: Programming Multi-core and Many-core Computing Systems. Wiley-Blackwell, Hoboken (2011)

  2. AMD: Bolt C++ Template Library

  3. Backus, J.: Can programming be liberated from the von Neumann style? A functional style and its algebra of programs. Commun. ACM 21(8), 613–641 (1978)

  4. Bird, R.S.: Algebraic identities for program calculation. Comput. J. 32(2), 122–126 (1989)

  5. Burstall, R.M., Darlington, J.: A transformation system for developing recursive programs. J. ACM 24(1), 44–67 (1977)

  6. Chakravarty, M., Keller, G., Lee, S., McDonell, T.L., Grover, V.: Accelerating Haskell array codes with multicore GPUs. In: DAMP, pp. 3–14. ACM (2011)

  7. Gorlatch, S., Cole, M.: Parallel skeletons. In: Padua, D. (ed.) Encyclopedia of Parallel Computing, pp. 1417–1422. Springer, Boston (2011). https://doi.org/10.1007/978-0-387-09766-4

  8. Harris, M., et al.: Optimizing parallel reduction in CUDA. NVIDIA Developer Technol. 2(4), 1–39 (2007)

  9. Holk, E., Byrd, W.E., Mahajan, N., Willcock, J., Chauhan, A., Lumsdaine, A.: Declarative parallel programming for GPUs. In: PARCO, pp. 297–304 (2011)

  10. Khronos OpenCL Working Group: The OpenCL Specification

  11. Kuchen, H.: A skeleton library. In: Monien, B., Feldmann, R. (eds.) Euro-Par 2002. LNCS, vol. 2400, pp. 620–629. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45706-2_86

  12. Nvidia: CUDA Basic Linear Algebra Subroutines (cuBLAS). Version 6.5

  13. Steuwer, M., Fensch, C., Lindley, S., Dubach, C.: Generating performance portable code using rewrite rules: from high-level functional expressions to high-performance OpenCL code. In: ICFP, pp. 205–217. ACM (2015)

  14. Steuwer, M., Gorlatch, S.: High-level programming for medical imaging on multi-GPU systems using the SkelCL library. In: Procedia Computer Science, ICCS, vol. 18, pp. 749–758. Elsevier (2013)

  15. Steuwer, M., Kegel, P., Gorlatch, S.: SkelCL: a portable skeleton library for high-level GPU programming. In: HIPS @ IPDPS, pp. 1176–1182. IEEE (2011)

  16. Steuwer, M., Remmelg, T., Dubach, C.: Lift: a functional data-parallel IR for high-performance GPU code generation. In: CGO, pp. 74–85. ACM (2017)

  17. Svensson, J., Sheeran, M., Claessen, K.: Obsidian: a domain specific embedded language for parallel programming of graphics processors. In: Scholz, S.-B., Chitil, O. (eds.) IFL 2008. LNCS, vol. 5836, pp. 156–173. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24452-0_9


Acknowledgments

This work was supported by the German Research Council (DFG) within the Cluster of Excellence CiM (University of Münster), by the German Ministry of Education and Research (BMBF) within the project HPC²SE, and by a EuroLab-4-HPC collaboration. We thank Nvidia for their generous hardware donation used in our experiments.

Author information


Correspondence to Bastian Hagedorn.


Appendices


A Additional Rewrite Rules

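For orientation, two semantics-preserving rules of the kind used in this approach, written in our own notation (illustrative examples; not necessarily the rules listed in the paper's figure): map fusion and the split/join identity,

\[ \mathit{map}\,f \circ \mathit{map}\,g \;=\; \mathit{map}\,(f \circ g), \qquad \mathit{join} \circ \mathit{split}\,m \;=\; \mathit{id}. \]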

B Proof of a Rewrite Rule

Rewrite rules are proved using equational reasoning. As an example, we prove rule (25), which introduces layers in the computation hierarchy of a reduction: first a partial reduction is computed, followed by a reduction that combines all temporary results.

Proof

(Reduce-Promotion Variant). Let n be a number divisible by m.
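
A sketch of the underlying reasoning, in our own notation (assumptions for illustration: an associative operator \(\oplus\) with identity e, an input of length n = k·m, and split/map/reduce as above; the paper's exact statement of rule (25) may differ in detail):

\[ \mathit{reduce}\,(\oplus, e) \;=\; \mathit{reduce}\,(\oplus, e) \circ \mathit{map}\,\big(\mathit{reduce}\,(\oplus, e)\big) \circ \mathit{split}\,m \]

Applied to \(x_1, \dots, x_n\), the right-hand side splits the input into k chunks of length m, reduces each chunk to a partial result, and then reduces the k partial results. By associativity of \(\oplus\),

\[ x_1 \oplus \dots \oplus x_n \;=\; (x_1 \oplus \dots \oplus x_m) \oplus \dots \oplus (x_{n-m+1} \oplus \dots \oplus x_n), \]

so both sides compute the same value, which establishes the rule.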

C Derived Low-Level Reduction Programs

Fig. 8. Two more low-level programs implementing parallel reduction. They are equivalent to the fourth and the seventh most optimized versions described in [8], respectively.
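
For orientation, the following is a generic baseline OpenCL reduction kernel of the kind such derivations target (a hand-written sketch, not one of the paper's derived or generated programs; the kernel and argument names are ours):

    __kernel void reduce_sum(__global const float* in,
                             __global float* out,
                             __local float* scratch,
                             const unsigned int n)
    {
        const unsigned int lid = get_local_id(0);
        const unsigned int gid = get_global_id(0);

        /* Each work-item loads one element into local memory (0 if out of range). */
        scratch[lid] = (gid < n) ? in[gid] : 0.0f;
        barrier(CLK_LOCAL_MEM_FENCE);

        /* Tree-shaped reduction within the work-group. */
        for (unsigned int s = get_local_size(0) / 2; s > 0; s >>= 1) {
            if (lid < s) {
                scratch[lid] += scratch[lid + s];
            }
            barrier(CLK_LOCAL_MEM_FENCE);
        }

        /* Work-item 0 writes this work-group's partial sum. */
        if (lid == 0) {
            out[get_group_id(0)] = scratch[0];
        }
    }

Each work-group writes one partial sum to out; a second kernel launch (or a host-side loop) then reduces the per-group partial results, mirroring the two-layer structure introduced by the rule proved in Appendix B.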


Copyright information

© 2018 Springer International Publishing AG

About this paper


Cite this paper

Hagedorn, B., Steuwer, M., Gorlatch, S. (2018). A Transformation-Based Approach to Developing High-Performance GPU Programs. In: Petrenko, A., Voronkov, A. (eds) Perspectives of System Informatics. PSI 2017. Lecture Notes in Computer Science(), vol 10742. Springer, Cham. https://doi.org/10.1007/978-3-319-74313-4_14


  • DOI: https://doi.org/10.1007/978-3-319-74313-4_14


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-74312-7

  • Online ISBN: 978-3-319-74313-4
