
Impact of Node Level Caching in MPI Job Launch Mechanisms

  • Conference paper
Recent Advances in Parallel Virtual Machine and Message Passing Interface (EuroPVM/MPI 2009)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 5759)

Abstract

The quest for petascale computing systems has seen cluster sizes, measured in number of processor cores, increase rapidly. The Message Passing Interface (MPI) has emerged as the de facto standard on these modern, large-scale clusters. This has resulted in an increased focus on research into the scalability of MPI libraries. However, as clusters grow in size, the scalability and performance of job launch mechanisms also need to be revisited.

In this work, we study the information exchange involved in the job launch phase of MPI applications. With the emergence of multi-core processing nodes, we examine the benefits of caching information at the node level during the job launch phase. We propose four design alternatives for such node-level caches and evaluate their performance benefits. We also propose enhancements that make these caches memory efficient while retaining their performance benefits by taking advantage of communication patterns during the job startup phase. One of our cache designs, Hierarchical Cache with Message Aggregation, Broadcast and LRU (HCMAB-LRU), reduces the time involved in typical communication stages to one tenth while capping the memory used at a fixed upper bound based on the number of processes. This enables scalable MPI job launching for next-generation clusters with hundreds of thousands of processor cores.
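Only the abstract is available here, so the sketch below is a rough illustration of the general idea rather than the paper's designs: a node-level cache answers repeated PMI-style "get" requests from processes on the same node locally, falls back to the central launcher on a miss, and bounds its memory with LRU eviction sized by the number of local processes. The names fetch_from_launcher and entries_per_proc, and the sizing rule itself, are illustrative assumptions.

# Minimal sketch (not the authors' implementation) of a node-level
# key-value cache with LRU eviction for MPI job launch information.
from collections import OrderedDict

class NodeLevelCache:
    def __init__(self, local_procs, entries_per_proc=4):
        # Hypothetical sizing rule: cap the cache at a fixed bound
        # derived from the number of processes on this node.
        self.capacity = local_procs * entries_per_proc
        self.entries = OrderedDict()  # key -> value, kept in LRU order

    def get(self, key, fetch_from_launcher):
        if key in self.entries:
            self.entries.move_to_end(key)      # mark as most recently used
            return self.entries[key]           # cache hit: no round trip to launcher
        value = fetch_from_launcher(key)       # cache miss: query the central launcher
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used entry
        return value

# Example: two local processes requesting the same key cause only one launcher query.
if __name__ == "__main__":
    queries = []
    def fetch_from_launcher(key):
        queries.append(key)
        return "value-of-" + key

    cache = NodeLevelCache(local_procs=8)
    cache.get("rank0-businesscard", fetch_from_launcher)
    cache.get("rank0-businesscard", fetch_from_launcher)  # served from the node cache
    assert len(queries) == 1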

This research is supported in part by U.S. Department of Energy grants #DE-FC02-06ER25749 and #DE-FC02-06ER25755; National Science Foundation grants #CNS-0403342, #CCF-0702675 and #CCF-0833169; grant from Wright Center for Innovation #WCI04-010-OSU-0; grants from Intel, Mellanox, Cisco, and Sun Microsystems; and equipment donations from Intel, Mellanox, AMD, Advanced Clustering, Appro, QLogic, and Sun Microsystems.





Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sridhar, J.K., Panda, D.K. (2009). Impact of Node Level Caching in MPI Job Launch Mechanisms. In: Ropo, M., Westerholm, J., Dongarra, J. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2009. Lecture Notes in Computer Science, vol 5759. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03770-2_29


  • DOI: https://doi.org/10.1007/978-3-642-03770-2_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03769-6

  • Online ISBN: 978-3-642-03770-2

  • eBook Packages: Computer Science (R0)
