BFCA+: automatic synthesis of parallel code with TLS capabilities

Aldea, Sergio; Llanos, Diego R.; Gonzalez-Escribano, Arturo

doi:10.1007/s11227-016-1623-0

Published: 21 January 2016

Volume 73, pages 88–99 (2017)

The Journal of Supercomputing


Abstract

Parallelizing a sequential application requires extracting information about its loops and about how their variables are accessed, and then augmenting the source code with extra code that depends on that information. In this paper we propose a framework that avoids this error-prone, time-consuming task. Our solution leverages compile-time information extracted from the source code to classify every variable used inside each loop according to how it is accessed. Then our system, called BFCA+, automatically instruments the source code with the OpenMP directives and clauses needed for parallel execution, using the standard shared and private clauses for variable classification. The framework can also instrument loops for speculative parallelization with the help of the ATLaS runtime system, which defines a new speculative clause to mark the variables that may lead to a dependency violation. As a result, the target loop is guaranteed to run correctly in parallel: its execution follows sequential semantics even in the presence of dependency violations. Our experimental evaluation shows that the framework not only saves development time but also produces code that runs faster than the manually parallelized version.
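The two kinds of loops described in the abstract can be illustrated with a small C sketch. It is hypothetical: the function names, the clause lists, and the dependence check are assumptions chosen for illustration, not actual BFCA+ output or the ATLaS implementation. In the first loop, compile-time analysis can classify `t` as private (it is written before it is read in every iteration) and the arrays as shared (each iteration touches a distinct element). The second loop accesses `v` through index arrays whose contents are known only at run time, which is the situation the ATLaS speculative clause is meant to cover. Because that clause's exact syntax is not reproduced here, the loop is left sequential, and a conceptual check shows the cross-iteration read-after-write dependence a TLS runtime must guard against.

```c
#include <stdbool.h>
#include <stdlib.h>

/* (1) Loop a compile-time classifier can handle.
 * t is written before it is read in every iteration -> private.
 * a and b are indexed by the loop variable, so iterations touch
 * distinct elements -> shared. i is predetermined private. */
void scale_and_offset(const double *a, double *b, long n, double k)
{
    long i;
    double t;
#pragma omp parallel for default(none) shared(a, b, n, k) private(t)
    for (i = 0; i < n; i++) {
        t = a[i] * k;
        b[i] = t + 1.0;
    }
}

/* (2) Loop whose dependences depend on run-time data: iteration i
 * reads v[rd[i]] and writes v[wr[i]]. v cannot be proved safe to share
 * or to privatize at compile time; this is the kind of variable a
 * speculative clause would mark. Kept sequential in this sketch. */
void indirect_update(double *v, const int *rd, const int *wr, int n)
{
    for (int i = 0; i < n; i++)
        v[wr[i]] = v[rd[i]] + 1.0;
}

/* Conceptual check (not the ATLaS algorithm): does any iteration read
 * an element written by an earlier iteration? If so, a naive parallel
 * run could read a stale value. m is the number of elements in v. */
bool has_cross_iteration_raw(const int *rd, const int *wr, int n, int m)
{
    int *last_writer = malloc((size_t)m * sizeof *last_writer);
    if (last_writer == NULL)
        return true;                  /* unknown: assume a dependence */
    for (int k = 0; k < m; k++)
        last_writer[k] = -1;          /* -1 means "not written yet" */
    bool dep = false;
    for (int i = 0; i < n && !dep; i++) {
        if (last_writer[rd[i]] >= 0)  /* written by an earlier iteration */
            dep = true;
        last_writer[wr[i]] = i;       /* recorded after the read, so a
                                         same-iteration read/write is
                                         not counted */
    }
    free(last_writer);
    return dep;
}
```

Compiled without OpenMP support (for example, without `-fopenmp` in GCC), the pragma is ignored and `scale_and_offset` runs sequentially with the same results.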


Notes

Only well-formed for loops, whose number of iterations is known when the loop starts, can be parallelized by the ATLaS framework. See [7] for additional details.
The current version of BFCA+ transforms only a single loop of the application, so that it never transforms two nested loops, a situation not supported by the ATLaS runtime system. We expect to overcome this limitation in the near future.
Note that the manual transformation process includes figuring out which loop would be the most profitable to parallelize and then performing an in-depth analysis of the data elements accessed inside that loop. This is an error-prone, time-consuming process that took between 10 and 30 h for the benchmarks considered.
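The two loop restrictions in these notes can be illustrated with a hypothetical C sketch; the function and type names are assumptions, not code from the paper. `row_sums` places a directive only on the outer loop of a nest, so the inner loop runs sequentially inside each outer iteration and no two nested loops are transformed. `list_sum` shows a common workaround, not a BFCA+ feature, for a loop whose trip count is unknown when it starts: collect the list nodes into an array first, then run a `for` loop whose iteration count is fixed on entry.

```c
#include <stddef.h>
#include <stdlib.h>

/* Only the outer loop carries a directive. j and s are written before
 * being read in every outer iteration -> private; i is predetermined
 * private; the matrix and output are shared. */
void row_sums(const double *m, double *out, int rows, int cols)
{
    int i, j;
    double s;
#pragma omp parallel for default(none) shared(m, out, rows, cols) private(j, s)
    for (i = 0; i < rows; i++) {
        s = 0.0;
        for (j = 0; j < cols; j++)     /* inner loop stays sequential */
            s += m[(size_t)i * cols + j];
        out[i] = s;
    }
}

struct node { double value; struct node *next; };

/* A pointer-chasing loop has no iteration count known at entry, so it
 * is not a well-formed for loop. Workaround sketched here: count and
 * collect the nodes, then run a counted loop over the array. */
double list_sum(const struct node *head)
{
    size_t n = 0;
    for (const struct node *p = head; p != NULL; p = p->next)
        n++;
    if (n == 0)
        return 0.0;
    const struct node **nodes = malloc(n * sizeof *nodes);
    if (nodes == NULL) {               /* allocation failed: sum sequentially */
        double s = 0.0;
        for (const struct node *p = head; p != NULL; p = p->next)
            s += p->value;
        return s;
    }
    size_t k = 0;
    for (const struct node *p = head; p != NULL; p = p->next)
        nodes[k++] = p;
    double total = 0.0;
    /* n is fixed before this loop starts: it is well formed */
#pragma omp parallel for reduction(+:total)
    for (size_t i = 0; i < n; i++)
        total += nodes[i]->value;
    free(nodes);
    return total;
}
```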

References

[1] Aldea S, Llanos DR, Gonzalez-Escribano A (2012) Using SPEC CPU2006 to evaluate the sequential and parallel code generated by commercial and open-source compilers. J Supercomput 59(1):486–498
[2] Cintra M, Llanos DR (2003) Toward efficient and robust software speculative parallelization on multiprocessors. In: PPoPP’03 proceedings, pp 13–24
[3] Dang FH, Yu H, Rauchwerger L (2002) The R-LRPD test: speculative parallelization of partially parallel loops. In: IPDPS’02 proceedings, pp 20–29
[4] Aldea S, Llanos DR, Gonzalez-Escribano A (2012) Support for thread-level speculation into OpenMP. In: IWOMP’12 proceedings, pp 275–278
[5] Aldea S, Llanos DR, Gonzalez-Escribano A (2014) The BonaFide C analyzer: automatic loop-level characterization and coverage measurement. J Supercomput 68(3):1378–1401
[6] Aldea S, Estebanez A, Llanos DR, Gonzalez-Escribano A (2014) A new GCC plugin-based compiler pass to add support for thread-level speculation into OpenMP. In: EuroPar’14 proceedings, LNCS 8632, Springer, pp 234–245
[7] Aldea S et al (2015) An OpenMP extension that supports thread-level speculation. IEEE Trans Parallel Distrib Syst (to appear)
[8] Oancea CE, Mycroft A, Harris T (2009) A lightweight in-place implementation for software thread-level speculation. In: SPAA 2009 proceedings, pp 223–232. ACM, New York
[9] Yiapanis P et al (2013) Optimizing software runtime systems for speculative parallelization. ACM Trans Arch Code Optim (TACO) 9(4):39
[10] Adhianto L et al (2000) Tools for OpenMP application development: the POST project. Concurr Pract Exp 12:1177–1191
[11] Ierotheou CS et al (2005) Generating OpenMP code using an interactive parallelization environment. Parallel Comput 31(10–12):999–1012
[12] Jin H et al (2003) Automatic multilevel parallelization using OpenMP. Sci Program 11(2):177–190
[13] Johnson S et al (2005) The ParaWise expert assistant—widening accessibility to efficient and scalable tool generated OpenMP code. In: WOMPAT’04 proceedings, pp 67–82
[14] Bondhugula U et al (2008) A practical automatic polyhedral parallelizer and locality optimizer. In: PLDI’08 proceedings, pp 101–113
[15] Trifunovic K et al (2010) Graphite two years after: first lessons learned from real-world polyhedral compilation. In: GROW’10 proceedings, pp 4–19
[16] Grosser T et al (2011) Polly—polyhedral optimization in LLVM. In: IMPACT’11 workshop proceedings, Chamonix, France, pp 1–6
[17] Lattner C, Adve V (2004) LLVM: a compilation framework for lifelong program analysis & transformation. In: CGO’04 proceedings, pp 75–86
[18] Amini M et al (2012) Par4All: from convex array regions to heterogeneous computing. In: IMPACT’12 HiPEAC workshop proceedings, Paris, France, pp 1–2
[19] Guelton S (2011) Building source-to-source compilers for heterogeneous targets. PhD thesis, Université européenne de Bretagne, Rennes
[20] Amini M et al (2011) PIPS is not (just) polyhedral software. In: IMPACT’11 workshop proceedings, Chamonix, France, pp 7–12
[21] Liao C et al (2008) Automatic parallelization using OpenMP based on STL semantics. In: Languages and compilers for parallel computing (LCPC)
[22] Dave C et al (2009) Cetus: a source-to-source compiler infrastructure for multicores. IEEE Comput 42(12):36–42
[23] Taillard J, Guyomarch F, Dekeyser JL (2008) A graphical framework for high performance computing using an MDE approach. In: PDP’08 proceedings, pp 165–173
[24] Nardi L et al (2012) YAO: a generator of parallel code for variational data assimilation applications. In: HPCC’12 proceedings, pp 224–232
[25] Clarkson KL, Mehlhorn K, Seidel R (1993) Four results on randomized incremental constructions. Comput Geom Theory Appl 3(4):185–212
[26] Devroye L, Mücke EP, Zhu B (1998) A note on point location in Delaunay triangulations of random points. Algorithmica 22:477–482
[27] Welzl E (1991) Smallest enclosing disks (balls and ellipsoids). In: New results and new trends in computer science. LNCS, vol 555. Springer, New York, pp 359–370
[28] Barnes JE (1997) TREE. Institute for Astronomy, University of Hawaii. ftp://hubble.ifa.hawaii.edu/pub/barnes/treecode/

Acknowledgments

This research has been partially supported by MICINN (Spain) and ERDF program of the European Union: HomProg-HetSys project (TIN2014-58876-P), CAPAP-H5 network (TIN2014-53522-REDT), and COST Program Action IC1305: Network for Sustainable Ultrascale Computing (NESUS).

Author information

Authors and Affiliations

ETS Ingeniería Informática, Universidad de Valladolid, Paseo Belén 15, 47011, Valladolid, Spain
Sergio Aldea, Diego R. Llanos & Arturo Gonzalez-Escribano

Corresponding author

Correspondence to Diego R. Llanos.

About this article

Cite this article

Aldea, S., Llanos, D.R. & Gonzalez-Escribano, A. BFCA+: automatic synthesis of parallel code with TLS capabilities. J Supercomput 73, 88–99 (2017). https://doi.org/10.1007/s11227-016-1623-0

Published: 21 January 2016
Issue Date: January 2017
DOI: https://doi.org/10.1007/s11227-016-1623-0
