Exploiting Value Locality to Exceed the Dataflow Limit

International Journal of Parallel Programming

Abstract

The serialization constraints imposed by true data dependences have always been regarded as an absolute dataflow limit on the parallel execution of serial programs. This paper describes value prediction, a new technique that allows data-dependent instructions to issue and execute in parallel without violating program semantics. The technique exploits value locality: the likelihood that a previously seen value recurs within a storage location inside a computer system. Value prediction consists of predicting entire 32- and 64-bit register values based on previously seen values. We find that values loaded from memory or generated by ALU instructions are frequently predictable. Furthermore, we show that simple microarchitectural enhancements that enable value prediction in a modern microprocessor implementation based on the PowerPC 620 can effectively exploit value locality to collapse true dependences, reduce average memory and result latencies, and provide average performance gains of 3% to 23% by exceeding the dataflow limit.
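
To make the idea of predicting a register value from previously seen values concrete, the following is a minimal sketch of a last-value predictor with a small confidence counter. It is an illustration under stated assumptions, not the mechanism evaluated in the paper: the class name `LastValuePredictor`, the table size, the confidence threshold, the 4-byte instruction indexing, and the `run_trace` helper are all hypothetical choices made for this example.

```python
class LastValuePredictor:
    """Hypothetical direct-mapped table that predicts an instruction's
    result to be the value it produced the last time it executed."""

    def __init__(self, entries=1024, threshold=2, max_conf=3):
        self.entries = entries        # assumed table size
        self.threshold = threshold    # assumed confidence needed to predict
        self.max_conf = max_conf      # saturating counter ceiling
        self.values = [None] * entries
        self.conf = [0] * entries

    def _index(self, pc):
        # Assumes fixed 4-byte instructions, so the low two bits are dropped.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        """Return a predicted value, or None when confidence is too low."""
        i = self._index(pc)
        if self.conf[i] >= self.threshold:
            return self.values[i]
        return None

    def update(self, pc, actual):
        """Train the table with the value the instruction really produced."""
        i = self._index(pc)
        if self.values[i] == actual:
            self.conf[i] = min(self.conf[i] + 1, self.max_conf)
        else:
            self.values[i] = actual
            self.conf[i] = 0


def run_trace(trace, predictor):
    """Replay (pc, value) pairs; return (correct, wrong, unpredicted)."""
    correct = wrong = 0
    for pc, value in trace:
        guess = predictor.predict(pc)
        if guess is not None:
            if guess == value:
                correct += 1
            else:
                wrong += 1
        predictor.update(pc, value)
    return correct, wrong, len(trace) - correct - wrong


if __name__ == "__main__":
    # One instruction that produces 7 five times, then 9 once.
    trace = [(0x100, 7)] * 5 + [(0x100, 9)]
    print(run_trace(trace, LastValuePredictor()))
```

In a real pipeline a correct prediction would let dependent instructions issue before the producing instruction completes, while a wrong prediction would require the dependents to be re-executed; the sketch only counts the two cases and does not model timing.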

Cite this article

Lipasti, M.H., Shen, J.P. Exploiting Value Locality to Exceed the Dataflow Limit. International Journal of Parallel Programming 26, 505–538 (1998). https://doi.org/10.1023/A:1018706700344

  • DOI: https://doi.org/10.1023/A:1018706700344
