Skip to main content

Evaluation of OpenMP for the Cyclops Multithreaded Architecture

  • Conference paper
  • First Online:
OpenMP Shared Memory Parallel Programming (WOMPAT 2003)

Abstract

Multithreaded architectures have the potential of tolerating large memory and functional unit latencies and increase resource utilization. The Blue Gene/Cyclops architecture, being developed at the IBM T. J. Watson Research Center, is one such systems that offers massive intra-chip parallelism. Although the BG/C architecture was initially designed to execute specific applications, we believe that it can be effectively used on a broad range of parallel numerical applications. Programming such applications for this unconventional design requires a significant porting effort when using the basic built-in mechanisms for thread management and synchronization. In this paper, we describe the implementation of an OpenMP environment for parallelizing applications, currently under development at the CEPBA-IBM Research Institute, targeting BG/C. The environment is evaluated with a set of simple numerical kernels and a subset of the NAS OpenMP benchmarks. We identify issues that were not initially considered in the design of the BG/C architecture to support a programming model such as OpenMP. We also evaluate features currently offered by the BG/C architecture that should be considered in the implementation of an efficient OpenMP layer for massive intra-chip parallel architectures.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Anant Agarwal. Raw computation. Scientific American, August 1999.

    Google Scholar 

  2. George Almási, Călin Caşcaval, José G. Castaõs, Monty Denneau, Derek Lieber, José E. Moreira, and Jr. Henry S. Warren. Dissecting Cyclops: A detailed analysis of a multithreaded architecture. In MEDEA Workshop on On-Chip Multiprocessor: Processor Architecture and Memory Hierarchy related Issues, September 2002.

    Google Scholar 

  3. D. Bailey, T. Harris, W. Saphir, R. van der Wijngaart, A. Woo, and Maurice Yarrow. The NAS parallel benchmarks 2.0. Technical Report Technical Report NAS-95-020, NASA Ames Research Center, December 1995.

    Google Scholar 

  4. L. Barroso, K. Gharachorloo, R. McNamara, A. Nowatzyk, S. Qadeer, B. Sano, S. Smith, R. Stets, and B. Verghese. Piranha: A scalable architecture based on single-chip multiprocessing. In 27th Annual International Symposium on Computer Architecture, pages 282–293, June 2000.

    Google Scholar 

  5. Călin Caşcaval, José Castaõs, Luis Ceze, Monty Denneau, Manish Gupta, Derek Lieber, José E. Moreira, Karin Strauss, and Henry S. Warren, Jr. Evaluation of a multithreaded architecture for cellular computing. In Proceedings of the 8th International Symposium of High Performance Computer Architecture, February 2002.

    Google Scholar 

  6. Intel Corporation. Intel hyperthreading technology. http://www.intel.com/info/hyperthreading. 2003.

  7. D. Bailey, E. Barszcz, J. Barton, D. Browning, R. Carter, L. Dagum, R. Fatoohi, S. Fineberg, R. Frederickson, T. Lasinski, R. Schreiber, H. Simon, V. Venkatakrishnan, and S. Weeratunga. The NAS parallel benchmarks. Technical Report Technical Report RNR-94-007, NASA Ames Research Center, March 1994.

    Google Scholar 

  8. Susan Eggers, Joel Emer, Henry Levy, Jack Lo, Rebecca Stamm, and Dean Tullsen. Simultaneous multithreading: A platform for next-generation processors. IEEE Micro, pages 12–18, September/October 1997.

    Google Scholar 

  9. Frances Allen et al. Blue gene: A vision for protein science using a petaflop supercomputer. IBM Systems Journal, 40(2):310–328, 2001.

    Article  Google Scholar 

  10. M. Gonzalez, E. Ayguadé, X. Martorell, J. Labarta, N. Navarro, and J. Oliver. NanosCompiler: Supporting flexible multilevel parallelism in OpenMP. Concurrency: Practice and Experience, 12(9), August 2000.

    Google Scholar 

  11. M. W. Hall, P. Kogge, J. Koller, P. Diniz, J. Chame, J. Draper, J. LaCross, J. Brockman, W. Athas, A. Srivasava, V. Freech, J. Shin, and J. Park. Mapping irregular applications to DIVA, a PIM-based data-intensive architecture. In Proceedings of SC99, November 1999.

    Google Scholar 

  12. H. Jin, M. Frumkin, and J. Yan. The OpenMP implementation of the NAS parallel benchmarks and its performance. Technical Report Technical Report NAS-99-011, NASA Ames Research Center, October 1999.

    Google Scholar 

  13. Yi Kang, Michael Huang, Seung-Moon Yoo, Zhenzho Ge, Diana Keen, Vinh Lam, Prattap Pattnaik, and Josep Torrellas. FlexRAM: Toward an advanced intelligent memory system. In International Conference on Computer Design (ICCD), October 1999.

    Google Scholar 

  14. P. Kogge, S. Bass, J. Brockman, D. Chen, and E. Sha. Pursuing a petaflop: Point designs for 100 TF computers using PIM technologies. In Frontiers of Massively Parallel Computation Symposium, 1996.

    Google Scholar 

  15. Peter M. Kogge. The EXECUBE approach to massively parallel processing. In Intl. Conf. on Parallel Processing, August 1994.

    Google Scholar 

  16. Jack L. Lo, Susan J. Eggers, Henry M. Levy, Sujay S. Parekh, and Dean M. Tullsen. Tuning compiler optimizations for simultaneous multithreading. In International Symposium on Microarchitecture, pages 114–124, 1997.

    Google Scholar 

  17. H. Lu, Y. C. Hu, and W. Zwaenepoel. OpenMP on network of workstations. In Proc. of Supercomputing’98, 1998.

    Google Scholar 

  18. X. Martorell, E. Ayguadé, J.I. Navarro, J. Corbalán, M. González, and J. Labarta. Thread fork/join techniques for multi-level parallelism exploitation in NUMA multiprocessors. In Proceedings of the 13th Int. Conference on Supercomputing ICS’99, June 1999.

    Google Scholar 

  19. X. Martorell, J. Labarta, J.I. Navarro, and E. Ayguadé. A library implementation of the nano-threads programming model. In Proceedings of Euro-Par’96, August 1996.

    Google Scholar 

  20. OpenMP Organization. OpenMP Fortran application interface, v. 2.0. http://www.openmp.org, June 2000.

  21. Mark Oskin, Frederic T. Chong, and Timothy Sherwood. Active Pages: A computation model for intelligent memory. In International Symposium on Computer Architecture, pages 192–203, 1998.

    Google Scholar 

  22. David Patterson, Thomas Anderson, Neal Cardwell, Richard Fromm, Kimberly Keeton, Christoforos Kozyrakis, Randi Thomas, and Katherine Yelick. A case for intelligent RAM: IRAM. In Proceedings of IEEE Micro, April 1997.

    Google Scholar 

  23. Constantine D. Polychronopoulos, Milind B. Girkar, Mohammed Resa Haghighat, Chia Ling Lee, Bruce P. Leung, and Dale A. Schouten. Parafrase-2: An environment for parallelizing, partitioning, synchronizing and scheduling programs on multiprocessors. In 1989 International Conference on Parallel Processing, volume II, pages 39–48, St. Charles, Ill., 1989.

    Google Scholar 

  24. William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical recipes in C. In Cambridge University Press, 1992.

    Google Scholar 

  25. S. Rixner, W.J. Dally, U.J. Kapasi, B. Khailany, A. Lopez-Lagunas, P.R. Mattson, and J.D. Owens. A bandwidth-efficient architecture for media processing. In 31st International Symposium on Microarchitecture, November 1998.

    Google Scholar 

  26. M. Sato, S. Satoh, K. Kusano, and Y. Tanaka. Design of OpenMP compiler for an smp cluster, 1999.

    Google Scholar 

  27. Scientific Computing Associates, Inc. PCGPACK user’s guide.

    Google Scholar 

  28. A. Snavely, L. Carter, J. Boisseau, A. Majumdar, K. S. Gatlin, N. Mitchel, J. Feo, and B. Koblenz. Multiprocessor performance on the Tera MTA. In Proceedings Supercomputing’ 98, Orlando, Florida, Nov. 7–13 1998.

    Google Scholar 

  29. A. Snavely, G. Johnson, and J. Genetti. Data intensive volume visualization on the Tera MTA and Cray T3E. In Proceedings of the High Performance Computing Symposium-HPC’ 99, pages 59–64, 1999.

    Google Scholar 

  30. Silicon Graphics Computer Systems. Origin2000 and Onyx2 performance tuning and optimization guide. Technical Report Doc. num. 007-3430-002, 1998.

    Google Scholar 

  31. J. M. Tendler, J. S. Dodson, Jr. J. S. Fields, H. Le, and B. Sinharoy. POWER4 system microarchitecture. IBM Journal of Research and Development, 46(1):5–26, 2002.

    Google Scholar 

  32. Josep Torrellas, Liuxi Yang, and Anthony-Trung Nguyen. Toward a cost-effective DSM organization that exploits processor-memory integration. In Sixth International Symposium on High-Performance Computer Architecture, January 2000.

    Google Scholar 

  33. M. Tremblay. MAJC: Microprocessor architecture for Java computing. In Hot Chips, August 1999.

    Google Scholar 

  34. Dean M. Tullsen, Susan J. Eggers, and Henry M. Levy. Simultaneous multithreading: Maximizing on-chip parallelism. In Proceedings of the 22nd Annual International Symposium on Computer Architecture, pages 392–403, June 1995.

    Google Scholar 

  35. Dean M. Tullsen, Jack L. Lo, Susan J. Eggers, and Henry M. Levy. Supporting fine-grained synchronization on a simultaneous multithreading processor. In HPCA, pages 54–58, 1999.

    Google Scholar 

  36. Elliot Waingold, Michael Taylor, Devabhaktuni Srikrishna, Vivek Sarkar, Walter Lee, Victor Lee, Jang Kim, Matthew Frank, Peter Finch, Rajeev Barua, Jonathan Babb, Saman Amarasinghe, and Anant Agarwal. Baring it all to software: Raw machines. IEEE Computer, pages 86–93, September 1997.

    Google Scholar 

  37. M. Yankelevsky and C. D. Polychronopoulos. α-Coral: A multigrain, multithreading processor architecture. In Proceedings of International Conference on Supercomputing’01, 2001.

    Google Scholar 

  38. H. P. Zima and T. Sterling. The Gilgamesh processor-in-memory architecture and its execution model. In Workshop on Compilers for Parallel Computers, June 2001.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Almasi, G. et al. (2003). Evaluation of OpenMP for the Cyclops Multithreaded Architecture. In: Voss, M.J. (eds) OpenMP Shared Memory Parallel Programming. WOMPAT 2003. Lecture Notes in Computer Science, vol 2716. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45009-2_6

Download citation

  • DOI: https://doi.org/10.1007/3-540-45009-2_6

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40435-4

  • Online ISBN: 978-3-540-45009-2

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics