Abstract
We consider the realization of matrix-matrix multiplication and propose a hierarchical algorithm implemented in a task-parallel way using multiprocessor tasks on distributed memory. The algorithm has been designed to minimize communication overhead while exhibiting high locality of memory references. The task-parallel realization makes the algorithm especially well suited to clusters of SMPs, since tasks can be mapped to the different cluster nodes in order to exploit the cluster architecture efficiently. Experiments on current cluster machines show that the resulting execution times are competitive with state-of-the-art methods such as PDGEMM.
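The abstract does not spell out the algorithm, so the following is only a minimal, sequential sketch of the general idea of a hierarchical (block-recursive) decomposition, not the authors' method. The function names (`matmul_naive`, `split`, `join`, `matmul_hier`), the `cutoff` parameter, and the restriction to square matrices are assumptions made for illustration. The eight sub-products at each level are mutually independent, which is the property a task-parallel realization could exploit by assigning them to separate groups of processors.

```python
def matmul_naive(A, B):
    """Plain triple-loop product of two matrices given as lists of rows."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[p] * B[p][j] for p in range(inner)) for j in range(cols)]
            for row in A]


def split(M):
    """Split a square matrix of even order into its four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def add(X, Y):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def join(C11, C12, C21, C22):
    """Reassemble four quadrants into one matrix."""
    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom


def matmul_hier(A, B, cutoff=2):
    """Block-recursive product of two square matrices (illustrative only).

    Falls back to the naive product when the order is at most `cutoff`
    or is odd, so the recursion always terminates.
    """
    n = len(A)
    if n <= cutoff or n % 2 != 0:
        return matmul_naive(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # The eight recursive products below do not depend on each other;
    # here they simply run one after another.
    C11 = add(matmul_hier(A11, B11, cutoff), matmul_hier(A12, B21, cutoff))
    C12 = add(matmul_hier(A11, B12, cutoff), matmul_hier(A12, B22, cutoff))
    C21 = add(matmul_hier(A21, B11, cutoff), matmul_hier(A22, B21, cutoff))
    C22 = add(matmul_hier(A21, B12, cutoff), matmul_hier(A22, B22, cutoff))
    return join(C11, C12, C21, C22)
```

In a distributed-memory setting, the quadrants would typically live on different processors, and the choice of which sub-products run where determines the communication volume; this sketch deliberately ignores data distribution and communication.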
© 2004 Springer-Verlag Berlin Heidelberg
Hunold, S., Rauber, T., Rünger, G. (2004). Hierarchical Matrix-Matrix Multiplication Based on Multiprocessor Tasks. In: Bubak, M., van Albada, G.D., Sloot, P.M.A., Dongarra, J. (eds) Computational Science - ICCS 2004. ICCS 2004. Lecture Notes in Computer Science, vol 3037. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24687-9_1
DOI: https://doi.org/10.1007/978-3-540-24687-9_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22115-9
Online ISBN: 978-3-540-24687-9
eBook Packages: Springer Book Archive