A High Performance Adaptive Miss Handling Architecture for Chip Multiprocessors

  • Chapter
Transactions on High-Performance Embedded Architectures and Compilers IV

Part of the book series: Lecture Notes in Computer Science (THIPEAC, volume 6760)

Abstract

Chip Multiprocessors (CMPs) mainly base their performance gains on exploiting thread-level parallelism. Consequently, powerful memory systems are needed to support an increasing number of concurrent threads. Conventional CMP memory systems do not account for thread interference, which can reduce overall system performance. Therefore, conventional high-bandwidth Miss Handling Architectures (MHAs) are not well suited to CMPs because they can create severe memory bus congestion. However, high miss bandwidth is desirable when sufficient bus bandwidth is available. This paper presents a novel, CMP-specific technique called the Adaptive Miss Handling Architecture (AMHA). When the memory bus is congested, AMHA improves performance by dynamically reducing the maximum number of concurrent L1 cache misses allowed for a processor core, provided that doing so yields a significant speedup for the other processors. Compared to a 16-wide conventional MHA, AMHA improves performance by 12% on average for one of the workload collections used in this work.
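
The policy described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how congestion is detected, how speedups are estimated, or by how much a limit is reduced. The congestion threshold, the per-core `own_slowdown` and `others_speedup` estimates, the halving step, and the reset to full width when the bus is uncongested are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

MAX_LIMIT = 16  # matches the 16-wide conventional MHA used as the baseline


@dataclass
class Core:
    name: str
    miss_limit: int = MAX_LIMIT   # max concurrent L1 misses currently allowed
    own_slowdown: float = 0.0     # hypothetical estimate: cost to this core of a lower limit
    others_speedup: float = 0.0   # hypothetical estimate: combined gain for the other cores


def adapt(cores: List[Core], bus_utilization: float,
          congested_above: float = 0.9, min_net_gain: float = 0.05) -> None:
    """Adjust per-core miss limits in place for one decision interval."""
    if bus_utilization <= congested_above:
        # Assumption: with spare bus bandwidth, every core gets full miss bandwidth.
        for core in cores:
            core.miss_limit = MAX_LIMIT
        return

    # Bus is congested: pick the core whose throttling helps the others most.
    candidates = [c for c in cores if c.miss_limit > 1]
    if not candidates:
        return
    best = max(candidates, key=lambda c: c.others_speedup - c.own_slowdown)
    if best.others_speedup - best.own_slowdown >= min_net_gain:
        best.miss_limit //= 2  # assumed step size: halve the limit


cores = [
    Core("cpu0", own_slowdown=0.02, others_speedup=0.20),  # bandwidth-hungry core
    Core("cpu1", own_slowdown=0.10, others_speedup=0.01),
]
adapt(cores, bus_utilization=0.95)
print([(c.name, c.miss_limit) for c in cores])
```

In this toy run only `cpu0` is throttled, because reducing its miss bandwidth is estimated to help the other core far more than it costs `cpu0` itself. A later call with low bus utilization restores the full limit, reflecting the abstract's point that high miss bandwidth is desirable when the bus has capacity to spare.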

This work was supported by the Norwegian Metacenter for Computational Science (NOTUR).

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Jahre, M., Natvig, L. (2011). A High Performance Adaptive Miss Handling Architecture for Chip Multiprocessors. In: Stenström, P. (eds) Transactions on High-Performance Embedded Architectures and Compilers IV. Lecture Notes in Computer Science, vol 6760. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24568-8_1

  • DOI: https://doi.org/10.1007/978-3-642-24568-8_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24567-1

  • Online ISBN: 978-3-642-24568-8

  • eBook Packages: Computer Science (R0)
