Abstract
We describe an approach and tools for optimizing the performance of the spanning trees used by collective operations. The allreduce operation is analyzed using performance data collected within the communication system, at a lower level than traditional monitoring systems use. We calculate latencies and wait times to detect load balance problems, find subtrees with similar behavior, perform cost breakdowns, and compare the performance of two spanning tree configurations. We evaluate the performance of different allreduce configurations and mappings on clusters of different sizes and with different numbers of CPUs per host, achieving a speedup of up to 1.49 for allreduce. The monitoring overhead is low, and the analysis is simplified because many subtrees behave similarly. However, the calculated values vary widely, and reconfiguring part of a spanning tree may affect the performance of parts that were not changed.
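The abstract does not give the formulas behind the latency and wait-time calculations, so the following is only a minimal sketch under our own assumptions. Each process in a reduction tree is assumed to record when it entered allreduce, when it forwarded its partial result to its parent, and when each child's partial result arrived, all on synchronized clocks. The names `TreeNode`, `wait_time`, `link_latencies`, and `critical_path` are hypothetical and are not the authors' tools. Wait time at a node is taken as the gap between its own entry and the arrival of its slowest child. Following the slowest child downward from the root exposes the branch that delays the whole operation, which is one way to locate a load balance problem.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TreeNode:
    """One process in a hypothetical allreduce reduction tree.

    Assumption: timestamps (e.g. in microseconds) come from synchronized clocks.
    """
    name: str
    enter: float                                    # when the process entered allreduce
    sent: Optional[float] = None                    # when it forwarded its partial result (None for the root)
    recv: Dict[str, float] = field(default_factory=dict)   # child name -> arrival time of its partial result
    children: List["TreeNode"] = field(default_factory=list)


def wait_time(node: TreeNode) -> float:
    """Idle time between entering allreduce and the slowest child's arrival (0 for a leaf)."""
    if not node.recv:
        return 0.0
    return max(0.0, max(node.recv.values()) - node.enter)


def link_latencies(node: TreeNode) -> Dict[tuple, float]:
    """Latency of every (child, parent) edge below node: parent's receive time minus child's send time."""
    result = {}
    for child in node.children:
        result[(child.name, node.name)] = node.recv[child.name] - child.sent
        result.update(link_latencies(child))
    return result


def critical_path(root: TreeNode) -> List[str]:
    """Names along the path that always follows the last-arriving child."""
    path = [root.name]
    node = root
    while node.children:
        parent = node
        node = max(parent.children, key=lambda c: parent.recv[c.name])
        path.append(node.name)
    return path
```

For example, with two leaves where leaf `b` enters late, `critical_path(root)` returns `["root", "b"]`, and the root's wait time is dominated by `b`'s arrival. Comparing these per-node values between two tree configurations is in the spirit of the comparison the abstract describes, though the paper's exact metrics may differ.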
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bongo, L.A., Anshus, O.J., Bjørndalen, J.M. (2004). Collective Communication Performance Analysis Within the Communication System. In: Danelutto, M., Vanneschi, M., Laforenza, D. (eds) Euro-Par 2004 Parallel Processing. Euro-Par 2004. Lecture Notes in Computer Science, vol 3149. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27866-5_21
DOI: https://doi.org/10.1007/978-3-540-27866-5_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22924-7
Online ISBN: 978-3-540-27866-5
eBook Packages: Springer Book Archive