Abstract
Today, multiple frameworks exist that raise the level of abstraction at which programs for GPGPUs, which are massively data-parallel execution platforms, are written. Such frameworks are needed because writing correct and high-performing applications for GPGPUs is notoriously difficult due to the intricacies of the underlying architecture. However, the existing frameworks lack a formal foundation, which makes them difficult to use together with formal verification, testing, and design space exploration. In this chapter we present a novel software synthesis tool, called f2cc, which is capable of generating efficient GPGPU code from abstract formal models based on the synchronous model of computation. These models can be built using high-level modeling methodologies that hide low-level architecture details from the developer. The correctness of the tool has been experimentally validated on models derived from two applications. The experiments also demonstrate that the synthesized GPGPU code, executed on a graphics card with 96 cores, yielded a 28× speedup over a sequential version that uses only the CPU.
Notes
1. Source code is available at http://forsyde.ict.kth.se/trac/wiki/ForSyDe/f2cc.
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Blindell, G.H., Menne, C., Sander, I. (2016). Synthesizing Code for GPGPUs from Abstract Formal Models. In: Oppenheimer, F., Medina Pasaje, J. (eds) Languages, Design Methods, and Tools for Electronic System Design. Lecture Notes in Electrical Engineering, vol 361. Springer, Cham. https://doi.org/10.1007/978-3-319-24457-0_7
DOI: https://doi.org/10.1007/978-3-319-24457-0_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24455-6
Online ISBN: 978-3-319-24457-0
eBook Packages: Engineering, Engineering (R0)