Skip to main content

Beyond Do Loops: Data Transfer Generation with Convex Array Regions

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2012)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 7760))

Abstract

Automatic data transfer generation is a critical step for guided or automatic code generation for accelerators using distributed memories. Although good results have been achieved for loop nests, more complex control flows such as switches or while loops are generally not handled. This paper shows how to leverage the convex array regions abstraction to generate data transfers. The scope of this study ranges from inter-procedural analysis in simple loop nests with function calls, to inter-iteration data reuse optimization and arbitrary control flow in loop bodies. Generated transfers are approximated when an exact solution cannot be found. Array regions are also used to extend redundant load store elimination to array variables. The approach has been successfully applied to GPUs and domain-specific hardware accelerators.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 49.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Alias, C., Darte, A., Plesco, A.: Program Analysis and Source-Level Communication Optimizations for High-Level Synthesis. Rapport de recherche RR-7648, INRIA (June 2011), http://hal.inria.fr/inria-00601822

  2. Alias, C., Darte, A., Plesco, A.: Optimizing Remote Accesses for Offloaded Kernels: Application to High-Level Synthesis for FPGA. In: 2nd International Workshop on Polyhedral Compilation Techniques, Impact (January 2012)

    Google Scholar 

  3. Alias, C., Darte, A., Plesco, A.: Optimizing Remote Accesses for Offloaded Kernels: Application to High-level Synthesis for FPGA. In: Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP, pp. 1–10. ACM, New York (2012)

    Google Scholar 

  4. Amini, M., Coelho, F., Irigoin, F., Keryell, R.: Static compilation analysis for host-accelerator communication optimization. In: International Workshop on Languages and Compilers for Parallel Computing, LCPC (September 2011)

    Google Scholar 

  5. Amini, M., Creusillet, B., Even, S., Keryell, R., Goubier, O., Guelton, S., McMahon, J.O., Pasquier, F.X., PĂ©an, G., Villalon, P.: Par4All: From convex array regions to heterogeneous computing. In: 2nd International Workshop on Polyhedral Compilation Techniques, Impact (January 2012)

    Google Scholar 

  6. Baskaran, M.M., Ramanujam, J., Sadayappan, P.: Automatic C-to-CUDA Code Generation for Affine Programs. In: Gupta, R. (ed.) CC 2010. LNCS, vol. 6011, pp. 244–263. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  7. Benabderrahmane, M.-W., Pouchet, L.-N., Cohen, A., Bastoul, C.: The Polyhedral Model Is More Widely Applicable Than You Think. In: Gupta, R. (ed.) CC 2010. LNCS, vol. 6011, pp. 283–303. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  8. Bonnot, P., Lemonnier, F., Edelin, G., Gaillat, G., Ruch, O., Gauget, P.: Definition and SIMD implementation of a multi-processing architecture approach on FPGA. In: Design Automation and Test in Europe, DATE, pp. 610–615. IEEE Computer Society Press (2008)

    Google Scholar 

  9. Coelho, F.: Étude de la Compilation du High Performance Fortran. Ph.D. thesis, Université Paris VI (1993)

    Google Scholar 

  10. Creusillet, B.: Array Region Analyses and Applications. Ph.D. thesis, MINES ParisTech. (1996)

    Google Scholar 

  11. Creusillet, B., Irigoin, F.: Exact vs. Approximate Array Region Analyses. In: Sehr, D., Banerjee, U., Gelernter, D., Nicolau, A., Padua, D.A. (eds.) LCPC 1996. LNCS, vol. 1239, pp. 86–100. Springer, Heidelberg (1997)

    Chapter  Google Scholar 

  12. Creusillet, B., Irigoin, F.: Interprocedural array region analyses. International Journal of Parallel Programming 24(6), 513–546 (1996)

    Google Scholar 

  13. Entreprise, C.: HMPP workbench, http://www.caps-entreprise.com/hmpp.html

  14. Guelton, S.: Building Source-to-Source compilers for Heterogenous targets. Ph.D. thesis, Télécom Bretagne (2011)

    Google Scholar 

  15. Guelton, S.: Transformations for memory size and distribution. [14], chap. 6

    Google Scholar 

  16. Kandemir, M., Ramanujam, J., Irwin, M.J., Vijaykrishnan, N., Kadayif, I., Parikh, A.: A compiler-based approach for dynamically managing scratch-pad memories in embedded systems. In: Computer-Aided Design of Integrated Circuits and Systems, vol. 23, pp. 243–260. IEEE (February 2004)

    Google Scholar 

  17. Meister, B., Leung, A., Vasilache, N., Wohlford, D., Bastoul, C., Lethin, R.: Productivity via automatic code generation for PGAS platforms with the R-Stream compiler. In: Workshop on Asynchrony in the PGAS Programming Model, APGAS, Yorktown Heights, New York (June 2009)

    Google Scholar 

  18. Meister, B., Vasilache, N., Wohlford, D., Baskaran, M.M., Leung, A., Lethin, R.: R-Stream compiler. In: Padua, D.A. (ed.) Encyclopedia of Parallel Computing, pp. 1756–1765. Springer (2011)

    Google Scholar 

  19. NVIDIA, Cray, PGI, CAPS: The OpenACC Specification, version 1.0 (November 2011), http://www.openacc-standard.org/Downloads/OpenACC.1.0.pdf

  20. Pugh, W.: The Omega test: a fast and practical integer programming algorithm for dependence analysis. In: Conference on Supercomputing, pp. 4–13. ACM, New York (1991)

    Google Scholar 

  21. Silkan: Par4All initiative for automatic parallelization (2010), http://www.par4all.org

  22. Torquati, M., Vanneschi, M., Amini, M., Guelton, S., Keryell, R., Lanore, V., Pasquier, F.X., Barreteau, M., Barrère, R., Petrisor, C.T., Lenormand, É., Cantini, C., De Stefani, F.: An innovative compilation tool-chain for embedded multi-core architectures. In: Embedded World Conference (February 2012)

    Google Scholar 

  23. Triolet, R., Feautrier, P., Irigoin, F.: Direct parallelization of call statements. In: ACM SIGPLAN Symposium on Compiler Construction, pp. 176–185 (1986)

    Google Scholar 

  24. Ventroux, N., Sassolas, T., Guerre, A., Creusillet, B., Keryell, R.: SESAM/ Par4All: a tool for joint exploration of MPSoC architectures and dynamic dataflow code generation. In: Proceedings of the 2012 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, RAPIDO, pp. 9–16. ACM, New York (2012)

    Chapter  Google Scholar 

  25. Verdoolaege, S., Grosser, T.: Polyhedral Extraction Tool. In: 2nd International Workshop on Polyhedral Compilation Techniques, Impact (January 2012)

    Google Scholar 

  26. Wolfe, M.: Implementing the PGI accelerator model. In: Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units, GPGPU, pp. 43–50. ACM, New York (2010)

    Chapter  Google Scholar 

  27. Wolfe, M.: Optimizing Data Movement in the PGI Accelerator Programming Model (February 2011), http://www.pgroup.com/lit/articles/insider/v3n1a1.htm

  28. Wonnacott, D., Pugh, W.: Nonlinear array dependence analysis. In: Proceedings of the Third Workshop on Languages, Compilers and Run-Time Systems for Scalable Computers (1995)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Guelton, S., Amini, M., Creusillet, B. (2013). Beyond Do Loops: Data Transfer Generation with Convex Array Regions. In: Kasahara, H., Kimura, K. (eds) Languages and Compilers for Parallel Computing. LCPC 2012. Lecture Notes in Computer Science, vol 7760. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37658-0_17

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-37658-0_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37657-3

  • Online ISBN: 978-3-642-37658-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics