Architectural Considerations for Exascale Supercomputing

  • Conference paper
  • In: Sustained Simulation Performance 2012

Abstract

On the road to exascale supercomputing, both academia and industry have begun investigating future HPC technologies. One of the most difficult challenges is improving the energy efficiency of the overall system. In this paper, we discuss the energy efficiency of existing architectures. Based on an analysis of the performance of dense matrix–matrix multiplication (DGEMM), we propose a DGEMM-specialized Vector-SIMD architecture that requires only a small number of processor cores and low memory bandwidth. This architecture can outperform existing architectures on several metrics, as long as it is dedicated to a limited class of workloads such as DGEMM. We conclude that this type of analysis will be essential in designing future computer architectures.
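The memory-bandwidth argument behind this proposal can be illustrated with a simple roofline-style estimate. The sketch below is only illustrative and is not taken from the paper: the block size and the 1 Tflop/s target are assumed values, used to show that the arithmetic intensity of a blocked DGEMM update grows with the block size, so a design that keeps large blocks on chip needs comparatively little off-chip bandwidth.

    # Illustrative estimate only; the block size and flop rate are assumptions,
    # not the paper's design point.
    def dgemm_intensity(block: int) -> float:
        """Flops per byte of a block x block x block update C += A * B:
        2*b^3 flops over roughly 3*b^2 double-precision (8-byte) elements moved."""
        flops = 2.0 * block ** 3
        bytes_moved = 3 * block ** 2 * 8
        return flops / bytes_moved

    def bandwidth_needed(peak_flops: float, block: int) -> float:
        """Off-chip bandwidth (bytes/s) required to keep peak_flops busy."""
        return peak_flops / dgemm_intensity(block)

    # Example: with 256 x 256 blocks held on chip, sustaining 1 Tflop/s
    # needs less than 50 GB/s of off-chip bandwidth.
    print(bandwidth_needed(1e12, 256) / 1e9, "GB/s")

Because the intensity grows linearly with the block size, enlarging on-chip storage directly lowers the memory bandwidth a DGEMM-specialized design has to provide.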


Notes

  1. Each fused multiply-add operation requires two memory accesses (16 B).
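The figure in this footnote implies that, without on-chip data reuse, the kernel is severely bandwidth-bound: 16 B per fused multiply-add is 8 B per flop. A minimal sketch of that arithmetic follows; the 1 Tflop/s target is only an illustrative assumption.

    # 16 B per FMA (two 8-byte operand accesses) and 2 flops per FMA => 8 B/flop.
    BYTES_PER_FMA = 16
    FLOPS_PER_FMA = 2

    def bandwidth_without_reuse(flops_per_second: float) -> float:
        """Memory bandwidth (bytes/s) needed when every FMA fetches both operands."""
        return flops_per_second / FLOPS_PER_FMA * BYTES_PER_FMA

    # Example: sustaining 1 Tflop/s with no reuse would require 8 TB/s.
    print(bandwidth_without_reuse(1e12) / 1e12, "TB/s")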


Author information

Correspondence to Yasuo Ishii.



Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ishii, Y. (2013). Architectural Considerations for Exascale Supercomputing. In: Resch, M., Wang, X., Bez, W., Focht, E., Kobayashi, H. (eds) Sustained Simulation Performance 2012. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32454-3_2

