Beyond Do Loops: Data Transfer Generation with Convex Array Regions

Guelton, Serge; Amini, Mehdi; Creusillet, Béatrice

doi:10.1007/978-3-642-37658-0_17

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7760)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing

Abstract

Automatic data transfer generation is a critical step for guided or automatic code generation for accelerators using distributed memories. Although good results have been achieved for loop nests, more complex control flows such as switches or while loops are generally not handled. This paper shows how to leverage the convex array regions abstraction to generate data transfers. The scope of this study ranges from inter-procedural analysis in simple loop nests with function calls, to inter-iteration data reuse optimization and arbitrary control flow in loop bodies. Generated transfers are approximated when an exact solution cannot be found. Array regions are also used to extend redundant load-store elimination to array variables. The approach has been successfully applied to GPUs and domain-specific hardware accelerators.
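
As a rough illustration of the idea in the abstract, the following sketch derives a transfer from an over-approximated read region of a loop whose body contains a branch. It is not the paper's algorithm: the `Interval` class, `read_region`, `copy_region`, `kernel` and `offload` are hypothetical names introduced here, a one-dimensional interval stands in for a general convex region, and plain Python lists stand in for host and accelerator memories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical 1-D convex region: all indices phi with lo <= phi <= hi."""
    lo: int
    hi: int

    def hull(self, other):
        """Convex hull of two intervals, a safe over-approximation of their union."""
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))


def read_region(first, last):
    """Over-approximated read region of `kernel` on array a for i in [first, last].

    One branch reads a[i], the other reads a[i + 1]. The hull of both
    branch regions covers every element that may be read.
    """
    even_branch = Interval(first, last)
    odd_branch = Interval(first + 1, last + 1)
    return even_branch.hull(odd_branch)


def copy_region(dst, src, region):
    """Stand-in for a host/accelerator transfer of one contiguous region."""
    dst[region.lo:region.hi + 1] = src[region.lo:region.hi + 1]


def kernel(b, a, first, last):
    """Loop body with a branch, so the accessed elements vary per iteration."""
    for i in range(first, last + 1):
        if i % 2 == 0:
            b[i] = a[i]
        else:
            b[i] = a[i + 1]


def offload(host_a, first, last):
    """Run `kernel` on simulated accelerator memory, moving only the regions."""
    n = len(host_a)
    dev_a = [None] * n  # simulated accelerator memory, initially undefined
    dev_b = [None] * n
    host_b = [0] * n
    copy_region(dev_a, host_a, read_region(first, last))  # load (approximated)
    kernel(dev_b, dev_a, first, last)
    copy_region(host_b, dev_b, Interval(first, last))     # store (written region)
    return host_b, dev_a
```

Calling `offload(list(range(16)), 4, 9)` copies only indices 4 to 10 of the input to the simulated accelerator, even though each iteration reads just one of its two candidate elements. The transfer is larger than strictly necessary but always sufficient, which is the sense in which an approximated region remains safe.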

Author information

Authors and Affiliations

Telecom Bretagne, Brest, France
Serge Guelton
MINES ParisTech/CRI, Fontainebleau, France
Mehdi Amini
HPC-Project, Meudon, France
Mehdi Amini & Béatrice Creusillet

Editor information

Editors and Affiliations

Faculty of Science and Engineering, Department of Computer Science and Engineering, Waseda University, 27 Waseda-machi, 162-0042, Shinjuku-ku, Tokyo, Japan
Hironori Kasahara & Keiji Kimura

About this paper

Cite this paper

Guelton, S., Amini, M., Creusillet, B. (2013). Beyond Do Loops: Data Transfer Generation with Convex Array Regions. In: Kasahara, H., Kimura, K. (eds) Languages and Compilers for Parallel Computing. LCPC 2012. Lecture Notes in Computer Science, vol 7760. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37658-0_17

DOI: https://doi.org/10.1007/978-3-642-37658-0_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37657-3
Online ISBN: 978-3-642-37658-0
eBook Packages: Computer Science, Computer Science (R0)
