
A Vectorizing Compiler for Multimedia Extensions

International Journal of Parallel Programming

Abstract

In this paper, we present an implementation of a vectorizing C compiler for Intel's MMX (Multimedia Extension). The compiler identifies data-parallel sections of the code using scalar and array dependence analysis. To widen the scope for applying subword semantics, it performs several code transformations, including strip mining, scalar expansion, grouping and reduction, and distribution. It then generates inline assembly instructions for the data-parallel sections. We used the Stanford University Intermediate Format (SUIF), a public domain compiler tool, for our implementation. We evaluated the performance of the code generated by our compiler on a number of benchmarks. Initial results show that the compiler-generated code achieves a reasonable performance improvement (a speedup of 2 to 6.5) over code generated without the vectorizing transformations and inline assembly. In certain cases, the performance of the compiler-generated code is within 85% of that of hand-tuned code for the MMX architecture.
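As a rough illustration of two of the transformations named above, the sketch below shows strip mining and scalar expansion (followed by distribution of the inner loop) on simple loops. It is not the compiler's actual output: it is written in plain C rather than MMX inline assembly, the function names are hypothetical, and the strip width of four assumes 16-bit elements packed into one 64-bit MMX register.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: four 16-bit elements fit in one 64-bit MMX register. */
#define LANES 4

/* Original scalar loop: one 16-bit addition per iteration. */
void add_scalar(int16_t *c, const int16_t *a, const int16_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = (int16_t)(a[i] + b[i]);
}

/* Strip-mined form: the loop is split into strips of LANES elements.
 * Each inner loop is the unit a vectorizer could replace with a single
 * packed add; a scalar loop handles the leftover elements. */
void add_strip_mined(int16_t *c, const int16_t *a, const int16_t *b, size_t n)
{
    size_t i = 0;
    for (; i + LANES <= n; i += LANES)
        for (size_t lane = 0; lane < LANES; lane++)
            c[i + lane] = (int16_t)(a[i + lane] + b[i + lane]);
    for (; i < n; i++)
        c[i] = (int16_t)(a[i] + b[i]);
}

/* Scalar expansion: the loop body "t = a[i] + b[i]; c[i] = t * 2;" reuses
 * one scalar t in every iteration. Expanding t into a LANES-wide temporary
 * gives each lane its own copy, so the two statements can be distributed
 * into separate inner loops that each map onto one packed operation. */
void add_then_double_expanded(int16_t *c, const int16_t *a,
                              const int16_t *b, size_t n)
{
    size_t i = 0;
    for (; i + LANES <= n; i += LANES) {
        int16_t t[LANES];                      /* expanded scalar */
        for (size_t lane = 0; lane < LANES; lane++)
            t[lane] = (int16_t)(a[i + lane] + b[i + lane]);
        for (size_t lane = 0; lane < LANES; lane++)
            c[i + lane] = (int16_t)(t[lane] * 2);
    }
    for (; i < n; i++) {                       /* scalar remainder */
        int16_t t = (int16_t)(a[i] + b[i]);
        c[i] = (int16_t)(t * 2);
    }
}
```

For inputs whose sums stay within the 16-bit range, add_strip_mined produces the same results as add_scalar for any n, including lengths that are not a multiple of four. The remainder loop is one simple way to handle the tail; the paper's own treatment of leftover iterations may differ.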




Cite this article

Sreraman, N., Govindarajan, R. A Vectorizing Compiler for Multimedia Extensions. International Journal of Parallel Programming 28, 363–400 (2000). https://doi.org/10.1023/A:1007559022013


  • DOI: https://doi.org/10.1023/A:1007559022013
