The Superblock: An Effective Technique for VLIW and Superscalar Compilation

Hwu, Wen-Mei W.; Mahlke, Scott A.; Chen, William Y.; Chang, Pohua P.; Warter, Nancy J.; Bringmann, Roger A.; Ouellette, Roland G.; Hank, Richard E.; Kiyohara, Tokuzo; Haab, Grant E.; Holm, John G.; Lavery, Daniel M.

doi:10.1007/978-1-4615-3200-2_7

Part of the book series: The Springer International Series in Engineering and Computer Science (SECS, volume 235)

Abstract

A compiler for VLIW and superscalar processors must expose sufficient instruction-level parallelism (ILP) to effectively utilize the parallel hardware. However, ILP within basic blocks is extremely limited for control-intensive programs. We have developed a set of techniques for exploiting ILP across basic block boundaries. These techniques are based on a novel structure called the superblock. The superblock enables the optimizer and scheduler to extract more ILP along the important execution paths by systematically removing constraints due to the unimportant paths. Superblock optimization and scheduling have been implemented in the IMPACT-I compiler. This implementation gives us a unique opportunity to fully understand the issues involved in incorporating these techniques into a real compiler. Superblock optimizations and scheduling are shown to be useful while taking into account a variety of architectural features.
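
The abstract does not describe how a superblock is constructed. As a rough illustration only, the sketch below assumes the commonly described approach: pick a frequently executed path (a trace) and remove its side entrances by tail duplication, so that control enters the path only at its head while side exits remain. The function name `form_superblock`, the dictionary-based control-flow graph, the block names, and the `'` suffix for duplicated blocks are all hypothetical and are not taken from the IMPACT-I compiler. The sketch also assumes the trace is a simple path with no internal back edges other than one to its head.

```python
def form_superblock(cfg, trace):
    """Return a new CFG in which `trace` can only be entered at its head.

    cfg   : dict mapping block name -> list of successor block names
    trace : list of block names forming a path through cfg (assumed input,
            e.g. chosen from profile data; selection is not shown here)
    """
    new_cfg = {block: list(succs) for block, succs in cfg.items()}
    position = {block: i for i, block in enumerate(trace)}

    def is_side_entrance(src, dst):
        # An edge into a non-head trace block from anything other than
        # that block's predecessor in the trace.
        i = position.get(dst)
        if i is None or i == 0:
            return False
        return src != trace[i - 1]

    # Earliest trace position that is the target of a side entrance.
    first = min(
        (position[dst]
         for src, succs in cfg.items()
         for dst in succs
         if is_side_entrance(src, dst)),
        default=None,
    )
    if first is None:
        return new_cfg  # already single-entry

    # Duplicate the tail of the trace starting at that position.
    copy_of = {block: block + "'" for block in trace[first:]}
    for block, dup in copy_of.items():
        new_cfg[dup] = [copy_of.get(s, s) for s in cfg[block]]

    # Redirect every side entrance to the duplicated tail.
    for src, succs in cfg.items():
        new_cfg[src] = [
            copy_of[dst] if is_side_entrance(src, dst) else dst
            for dst in succs
        ]
    return new_cfg


# Hypothetical example: A -> B -> C -> D is the hot path, and the
# rarely taken block X jumps into the middle of it at C.
cfg = {"A": ["B", "X"], "B": ["C"], "X": ["C"], "C": ["D"], "D": []}
result = form_superblock(cfg, ["A", "B", "C", "D"])
# X now targets the copy C', so A, B, C, D is entered only through A.
```

The cost of this transformation is code growth: the duplicated tail exists only so that the hot path is free of merge points that would otherwise constrain optimization and scheduling along it.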

Author information

Authors and Affiliations

Center for Reliable and High-Performance Computing, University of Illinois, Urbana-Champaign, IL, 61801, USA
Wen-Mei W. Hwu, Scott A. Mahlke, William Y. Chen, Pohua P. Chang, Nancy J. Warter, Roger A. Bringmann, Roland G. Ouellette, Richard E. Hank, Tokuzo Kiyohara, Grant E. Haab, John G. Holm & Daniel M. Lavery

Editor information

Editors and Affiliations

Hewlett-Packard Laboratories, UK
B. R. Rau & J. A. Fisher

About this chapter

Cite this chapter

Hwu, WM.W. et al. (1993). The Superblock: An Effective Technique for VLIW and Superscalar Compilation. In: Rau, B.R., Fisher, J.A. (eds) Instruction-Level Parallelism. The Springer International Series in Engineering and Computer Science, vol 235. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-3200-2_7

DOI: https://doi.org/10.1007/978-1-4615-3200-2_7
Published: 13 May 2011
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-6404-7
Online ISBN: 978-1-4615-3200-2
eBook Packages: Springer Book Archive
