Abstract
Multithreaded architectures have the potential of tolerating large memory and functional unit latencies and increase resource utilization. The Blue Gene/Cyclops architecture, being developed at the IBM T. J. Watson Research Center, is one such systems that offers massive intra-chip parallelism. Although the BG/C architecture was initially designed to execute specific applications, we believe that it can be effectively used on a broad range of parallel numerical applications. Programming such applications for this unconventional design requires a significant porting effort when using the basic built-in mechanisms for thread management and synchronization. In this paper, we describe the implementation of an OpenMP environment for parallelizing applications, currently under development at the CEPBA-IBM Research Institute, targeting BG/C. The environment is evaluated with a set of simple numerical kernels and a subset of the NAS OpenMP benchmarks. We identify issues that were not initially considered in the design of the BG/C architecture to support a programming model such as OpenMP. We also evaluate features currently offered by the BG/C architecture that should be considered in the implementation of an efficient OpenMP layer for massive intra-chip parallel architectures.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Anant Agarwal. Raw computation. Scientific American, August 1999.
George Almási, Călin Caşcaval, José G. Castaõs, Monty Denneau, Derek Lieber, José E. Moreira, and Jr. Henry S. Warren. Dissecting Cyclops: A detailed analysis of a multithreaded architecture. In MEDEA Workshop on On-Chip Multiprocessor: Processor Architecture and Memory Hierarchy related Issues, September 2002.
D. Bailey, T. Harris, W. Saphir, R. van der Wijngaart, A. Woo, and Maurice Yarrow. The NAS parallel benchmarks 2.0. Technical Report Technical Report NAS-95-020, NASA Ames Research Center, December 1995.
L. Barroso, K. Gharachorloo, R. McNamara, A. Nowatzyk, S. Qadeer, B. Sano, S. Smith, R. Stets, and B. Verghese. Piranha: A scalable architecture based on single-chip multiprocessing. In 27th Annual International Symposium on Computer Architecture, pages 282–293, June 2000.
Călin Caşcaval, José Castaõs, Luis Ceze, Monty Denneau, Manish Gupta, Derek Lieber, José E. Moreira, Karin Strauss, and Henry S. Warren, Jr. Evaluation of a multithreaded architecture for cellular computing. In Proceedings of the 8th International Symposium of High Performance Computer Architecture, February 2002.
Intel Corporation. Intel hyperthreading technology. http://www.intel.com/info/hyperthreading. 2003.
D. Bailey, E. Barszcz, J. Barton, D. Browning, R. Carter, L. Dagum, R. Fatoohi, S. Fineberg, R. Frederickson, T. Lasinski, R. Schreiber, H. Simon, V. Venkatakrishnan, and S. Weeratunga. The NAS parallel benchmarks. Technical Report Technical Report RNR-94-007, NASA Ames Research Center, March 1994.
Susan Eggers, Joel Emer, Henry Levy, Jack Lo, Rebecca Stamm, and Dean Tullsen. Simultaneous multithreading: A platform for next-generation processors. IEEE Micro, pages 12–18, September/October 1997.
Frances Allen et al. Blue gene: A vision for protein science using a petaflop supercomputer. IBM Systems Journal, 40(2):310–328, 2001.
M. Gonzalez, E. Ayguadé, X. Martorell, J. Labarta, N. Navarro, and J. Oliver. NanosCompiler: Supporting flexible multilevel parallelism in OpenMP. Concurrency: Practice and Experience, 12(9), August 2000.
M. W. Hall, P. Kogge, J. Koller, P. Diniz, J. Chame, J. Draper, J. LaCross, J. Brockman, W. Athas, A. Srivasava, V. Freech, J. Shin, and J. Park. Mapping irregular applications to DIVA, a PIM-based data-intensive architecture. In Proceedings of SC99, November 1999.
H. Jin, M. Frumkin, and J. Yan. The OpenMP implementation of the NAS parallel benchmarks and its performance. Technical Report Technical Report NAS-99-011, NASA Ames Research Center, October 1999.
Yi Kang, Michael Huang, Seung-Moon Yoo, Zhenzho Ge, Diana Keen, Vinh Lam, Prattap Pattnaik, and Josep Torrellas. FlexRAM: Toward an advanced intelligent memory system. In International Conference on Computer Design (ICCD), October 1999.
P. Kogge, S. Bass, J. Brockman, D. Chen, and E. Sha. Pursuing a petaflop: Point designs for 100 TF computers using PIM technologies. In Frontiers of Massively Parallel Computation Symposium, 1996.
Peter M. Kogge. The EXECUBE approach to massively parallel processing. In Intl. Conf. on Parallel Processing, August 1994.
Jack L. Lo, Susan J. Eggers, Henry M. Levy, Sujay S. Parekh, and Dean M. Tullsen. Tuning compiler optimizations for simultaneous multithreading. In International Symposium on Microarchitecture, pages 114–124, 1997.
H. Lu, Y. C. Hu, and W. Zwaenepoel. OpenMP on network of workstations. In Proc. of Supercomputing’98, 1998.
X. Martorell, E. Ayguadé, J.I. Navarro, J. Corbalán, M. González, and J. Labarta. Thread fork/join techniques for multi-level parallelism exploitation in NUMA multiprocessors. In Proceedings of the 13th Int. Conference on Supercomputing ICS’99, June 1999.
X. Martorell, J. Labarta, J.I. Navarro, and E. Ayguadé. A library implementation of the nano-threads programming model. In Proceedings of Euro-Par’96, August 1996.
OpenMP Organization. OpenMP Fortran application interface, v. 2.0. http://www.openmp.org, June 2000.
Mark Oskin, Frederic T. Chong, and Timothy Sherwood. Active Pages: A computation model for intelligent memory. In International Symposium on Computer Architecture, pages 192–203, 1998.
David Patterson, Thomas Anderson, Neal Cardwell, Richard Fromm, Kimberly Keeton, Christoforos Kozyrakis, Randi Thomas, and Katherine Yelick. A case for intelligent RAM: IRAM. In Proceedings of IEEE Micro, April 1997.
Constantine D. Polychronopoulos, Milind B. Girkar, Mohammed Resa Haghighat, Chia Ling Lee, Bruce P. Leung, and Dale A. Schouten. Parafrase-2: An environment for parallelizing, partitioning, synchronizing and scheduling programs on multiprocessors. In 1989 International Conference on Parallel Processing, volume II, pages 39–48, St. Charles, Ill., 1989.
William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical recipes in C. In Cambridge University Press, 1992.
S. Rixner, W.J. Dally, U.J. Kapasi, B. Khailany, A. Lopez-Lagunas, P.R. Mattson, and J.D. Owens. A bandwidth-efficient architecture for media processing. In 31st International Symposium on Microarchitecture, November 1998.
M. Sato, S. Satoh, K. Kusano, and Y. Tanaka. Design of OpenMP compiler for an smp cluster, 1999.
Scientific Computing Associates, Inc. PCGPACK user’s guide.
A. Snavely, L. Carter, J. Boisseau, A. Majumdar, K. S. Gatlin, N. Mitchel, J. Feo, and B. Koblenz. Multiprocessor performance on the Tera MTA. In Proceedings Supercomputing’ 98, Orlando, Florida, Nov. 7–13 1998.
A. Snavely, G. Johnson, and J. Genetti. Data intensive volume visualization on the Tera MTA and Cray T3E. In Proceedings of the High Performance Computing Symposium-HPC’ 99, pages 59–64, 1999.
Silicon Graphics Computer Systems. Origin2000 and Onyx2 performance tuning and optimization guide. Technical Report Doc. num. 007-3430-002, 1998.
J. M. Tendler, J. S. Dodson, Jr. J. S. Fields, H. Le, and B. Sinharoy. POWER4 system microarchitecture. IBM Journal of Research and Development, 46(1):5–26, 2002.
Josep Torrellas, Liuxi Yang, and Anthony-Trung Nguyen. Toward a cost-effective DSM organization that exploits processor-memory integration. In Sixth International Symposium on High-Performance Computer Architecture, January 2000.
M. Tremblay. MAJC: Microprocessor architecture for Java computing. In Hot Chips, August 1999.
Dean M. Tullsen, Susan J. Eggers, and Henry M. Levy. Simultaneous multithreading: Maximizing on-chip parallelism. In Proceedings of the 22nd Annual International Symposium on Computer Architecture, pages 392–403, June 1995.
Dean M. Tullsen, Jack L. Lo, Susan J. Eggers, and Henry M. Levy. Supporting fine-grained synchronization on a simultaneous multithreading processor. In HPCA, pages 54–58, 1999.
Elliot Waingold, Michael Taylor, Devabhaktuni Srikrishna, Vivek Sarkar, Walter Lee, Victor Lee, Jang Kim, Matthew Frank, Peter Finch, Rajeev Barua, Jonathan Babb, Saman Amarasinghe, and Anant Agarwal. Baring it all to software: Raw machines. IEEE Computer, pages 86–93, September 1997.
M. Yankelevsky and C. D. Polychronopoulos. α-Coral: A multigrain, multithreading processor architecture. In Proceedings of International Conference on Supercomputing’01, 2001.
H. P. Zima and T. Sterling. The Gilgamesh processor-in-memory architecture and its execution model. In Workshop on Compilers for Parallel Computers, June 2001.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Almasi, G. et al. (2003). Evaluation of OpenMP for the Cyclops Multithreaded Architecture. In: Voss, M.J. (eds) OpenMP Shared Memory Parallel Programming. WOMPAT 2003. Lecture Notes in Computer Science, vol 2716. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45009-2_6
Download citation
DOI: https://doi.org/10.1007/3-540-45009-2_6
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40435-4
Online ISBN: 978-3-540-45009-2
eBook Packages: Springer Book Archive