Abstract
With the increasing use of multi-core microprocessors and hardware accelerators in embedded media processing systems, there is an increasing need to discover coarse-grained parallelism in media applications written in C and C++. Common versions of these codes use a pointer-heavy, sequential programming model to implement algorithms with high levels of inherent parallelism. The lack of automated tools capable of discovering this parallelism has hampered the productivity of parallel programmers and application-specific hardware designers, as well as inhibited the development of automatic parallelizing compilers. Automatic discovery is challenging due to shifts in the prevalent programming languages, scalability problems of analysis techniques, and the lack of experimental research in combining the numerous analyses necessary to achieve a clear view of the relations among memory accesses in complex programs. This paper is based on a coherent prototype system designed to automatically find multiple levels of coarse-grained parallelism. It visits several of the key analyses that are necessary to discover parallelism in contemporary media applications, distinguishing those that perform satisfactorily at this time from those that do not yet have practical, scalable solutions. We show that, contrary to common belief, a compiler with a strong, synergistic portfolio of modern analysis capabilities can automatically discover a very substantial amount of coarse-grained parallelism in complex media applications such as an MPEG-4 encoder. These results suggest that an automatic coarse-grained parallelism discovery tool can be built to greatly enhance the software and hardware development processes of future embedded media processing systems.
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ryoo, S., Ueng, SZ., Rodrigues, C.I., Kidd, R.E., Frank, M.I., Hwu, Wm.W. (2007). Automatic Discovery of Coarse-Grained Parallelism in Media Applications. In: Stenström, P. (eds) Transactions on High-Performance Embedded Architectures and Compilers I. Lecture Notes in Computer Science, vol 4050. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71528-3_13
DOI: https://doi.org/10.1007/978-3-540-71528-3_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71527-6
Online ISBN: 978-3-540-71528-3
eBook Packages: Computer Science (R0)