
Automatic Discovery of Coarse-Grained Parallelism in Media Applications

  • Conference paper
Transactions on High-Performance Embedded Architectures and Compilers I

Part of the book series: Lecture Notes in Computer Science (THIPEAC, volume 4050)

Abstract

With the increasing use of multi-core microprocessors and hardware accelerators in embedded media processing systems, there is an increasing need to discover coarse-grained parallelism in media applications written in C and C++. Widely used implementations of these applications use a pointer-heavy, sequential programming model to implement algorithms with high levels of inherent parallelism. The lack of automated tools capable of discovering this parallelism has hampered the productivity of parallel programmers and application-specific hardware designers, and has inhibited the development of automatic parallelizing compilers. Automatic discovery is challenging due to shifts in the prevalent programming languages, scalability problems of analysis techniques, and the lack of experimental research in combining the numerous analyses necessary to achieve a clear view of the relations among memory accesses in complex programs. This paper is based on a coherent prototype system designed to automatically find multiple levels of coarse-grained parallelism. It examines several of the key analyses that are necessary to discover parallelism in contemporary media applications, distinguishing those that perform satisfactorily at this time from those that do not yet have practical, scalable solutions. We show that, contrary to common belief, a compiler with a strong, synergistic portfolio of modern analysis capabilities can automatically discover a very substantial amount of coarse-grained parallelism in complex media applications such as an MPEG-4 encoder. These results suggest that an automatic coarse-grained parallelism discovery tool can be built to greatly enhance the software and hardware development processes of future embedded media processing systems.
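To make the memory-access problem concrete, the C sketch below shows the pointer-based style the abstract refers to. It is an illustration only, not the prototype system described in the paper: the names `scale_row`, `regions_disjoint`, and `scale_block` are hypothetical, and the run-time overlap test is a simple multi-version loop swapped in to show what a compiler must otherwise prove statically. Given only the two pointers `dst` and `src`, a compiler cannot tell whether one row's writes feed another row's reads, so the outer loop over rows cannot safely be split across cores without further analysis or a check like this one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel: scale one row of 16-bit samples. */
static void scale_row(int16_t *out, const int16_t *in, size_t cols, int16_t gain)
{
    for (size_t c = 0; c < cols; c++)
        out[c] = (int16_t)(in[c] * gain);
}

/* Conservative run-time test: nonzero if the n-element regions starting at
 * a and b cannot overlap. Addresses are compared as uintptr_t because
 * relational comparison of pointers into different objects is undefined
 * in C; this assumes a flat address space, as on typical targets. */
static int regions_disjoint(const int16_t *a, const int16_t *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    uintptr_t bytes = (uintptr_t)n * sizeof(int16_t);
    return pa + bytes <= pb || pb + bytes <= pa;
}

/* Two-version loop over rows. Returns 1 if the order-independent
 * ("parallel") version was taken, 0 if it fell back to sequential order. */
static int scale_block(int16_t *dst, const int16_t *src,
                       size_t rows, size_t cols, int16_t gain)
{
    if (regions_disjoint(dst, src, rows * cols)) {
        /* No iteration writes memory that another iteration reads, so rows
         * may run in any order. They run in reverse here to make that point;
         * each call could instead be dispatched to its own core. */
        for (size_t r = rows; r-- > 0;)
            scale_row(dst + r * cols, src + r * cols, cols, gain);
        return 1;
    }
    /* Possible overlap: keep the original sequential order. */
    for (size_t r = 0; r < rows; r++)
        scale_row(dst + r * cols, src + r * cols, cols, gain);
    return 0;
}
```

The check is deliberately conservative: an in-place call with `dst == src` is in fact row-independent, yet it is rejected because the regions overlap. Distinguishing such cases without run-time tests is the kind of precision that the combined static analyses discussed in the paper aim to provide.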




Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ryoo, S., Ueng, SZ., Rodrigues, C.I., Kidd, R.E., Frank, M.I., Hwu, Wm.W. (2007). Automatic Discovery of Coarse-Grained Parallelism in Media Applications. In: Stenström, P. (eds) Transactions on High-Performance Embedded Architectures and Compilers I. Lecture Notes in Computer Science, vol 4050. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71528-3_13


  • DOI: https://doi.org/10.1007/978-3-540-71528-3_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71527-6

  • Online ISBN: 978-3-540-71528-3

  • eBook Packages: Computer Science, Computer Science (R0)
