Evaluation of OpenMP for the Cyclops Multithreaded Architecture

Almasi, George; Ayguadé, Eduard; Caşcaval, Călin; Castaños, José; Labarta, Jesús; Martínez, Francisco; Martorell, Xavier; Moreira, José

doi:10.1007/3-540-45009-2_6

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2716)

Included in the conference series: International Workshop on OpenMP Applications and Tools

Abstract

Multithreaded architectures have the potential to tolerate large memory and functional-unit latencies and to increase resource utilization. The Blue Gene/Cyclops (BG/C) architecture, being developed at the IBM T. J. Watson Research Center, is one such system, offering massive intra-chip parallelism. Although the BG/C architecture was initially designed to execute specific applications, we believe that it can be used effectively for a broad range of parallel numerical applications. Programming such applications for this unconventional design requires a significant porting effort when using the basic built-in mechanisms for thread management and synchronization. In this paper, we describe the implementation of an OpenMP environment for parallelizing applications, currently under development at the CEPBA-IBM Research Institute and targeting BG/C. The environment is evaluated with a set of simple numerical kernels and a subset of the NAS OpenMP benchmarks. We identify issues, not initially considered in the design of the BG/C architecture, that affect its support for a programming model such as OpenMP. We also evaluate features currently offered by the BG/C architecture that should be considered in the implementation of an efficient OpenMP layer for massive intra-chip parallel architectures.

Author information

Authors and Affiliations

CEPBA-IBM Research Institute, UPC, Barcelona, Spain
Eduard Ayguadé, Jesús Labarta, Francisco Martínez & Xavier Martorell
IBM Thomas J. Watson Research Center, Yorktown Heights, NY
George Almasi, Călin Caşcaval, José Castaños & José Moreira

Editor information

Editors and Affiliations

Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, 10 King’s College Road, Toronto, Ontario, M5S 3G4, Canada
Michael J. Voss

About this paper

Cite this paper

Almasi, G. et al. (2003). Evaluation of OpenMP for the Cyclops Multithreaded Architecture. In: Voss, M.J. (eds) OpenMP Shared Memory Parallel Programming. WOMPAT 2003. Lecture Notes in Computer Science, vol 2716. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45009-2_6

DOI: https://doi.org/10.1007/3-540-45009-2_6
Published: 27 May 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40435-4
Online ISBN: 978-3-540-45009-2
eBook Packages: Springer Book Archive
