Automatic Discovery of Coarse-Grained Parallelism in Media Applications

Ryoo, Shane; Ueng, Sain-Zee; Rodrigues, Christopher I.; Kidd, Robert E.; Frank, Matthew I.; Hwu, Wen-mei W.

doi:10.1007/978-3-540-71528-3_13

Part of the book series: Lecture Notes in Computer Science (THIPEAC, volume 4050)

Abstract

With the increasing use of multi-core microprocessors and hardware accelerators in embedded media processing systems, there is an increasing need to discover coarse-grained parallelism in media applications written in C and C++. Common versions of these codes use a pointer-heavy, sequential programming model to implement algorithms with high levels of inherent parallelism. The lack of automated tools capable of discovering this parallelism has hampered the productivity of parallel programmers and application-specific hardware designers, as well as inhibited the development of automatic parallelizing compilers. Automatic discovery is challenging due to shifts in the prevalent programming languages, scalability problems of analysis techniques, and the lack of experimental research in combining the numerous analyses necessary to achieve a clear view of the relations among memory accesses in complex programs. This paper is based on a coherent prototype system designed to automatically find multiple levels of coarse-grained parallelism. It visits several of the key analyses that are necessary to discover parallelism in contemporary media applications, distinguishing those that perform satisfactorily at this time from those that do not yet have practical, scalable solutions. We show that, contrary to common belief, a compiler with a strong, synergistic portfolio of modern analysis capabilities can automatically discover a very substantial amount of coarse-grained parallelism in complex media applications such as an MPEG-4 encoder. These results suggest that an automatic coarse-grained parallelism discovery tool can be built to greatly enhance the software and hardware development processes of future embedded media processing systems.

Author information

Authors and Affiliations

Center for Reliable and High-Performance Computing, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, USA
Shane Ryoo, Sain-Zee Ueng, Christopher I. Rodrigues, Robert E. Kidd, Matthew I. Frank & Wen-mei W. Hwu

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Chalmers University of Technology, 412 96, Gothenburg, Sweden
Per Stenström

About this paper

Cite this paper

Ryoo, S., Ueng, S.-Z., Rodrigues, C.I., Kidd, R.E., Frank, M.I., Hwu, W.W. (2007). Automatic Discovery of Coarse-Grained Parallelism in Media Applications. In: Stenström, P. (ed.) Transactions on High-Performance Embedded Architectures and Compilers I. Lecture Notes in Computer Science, vol 4050. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71528-3_13

DOI: https://doi.org/10.1007/978-3-540-71528-3_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71527-6
Online ISBN: 978-3-540-71528-3
eBook Packages: Computer Science, Computer Science (R0)
