Scalability of Sparse Direct Solvers

  • Conference paper
Graph Theory and Sparse Matrix Computation

Part of the book series: The IMA Volumes in Mathematics and its Applications (IMA, volume 56)

Abstract

We shall say that an algorithm is scalable if its efficiency remains bounded away from zero as the number of processors and the problem size increase in such a way that the size of the data structures grows linearly with the number of processors. In this paper we show that the column-oriented approach to sparse Cholesky factorization for distributed-memory machines is not scalable. By considering message volume, node contention, and bisection width, one may obtain lower bounds on the time required for communication in a distributed algorithm. Applying this technique to distributed, column-oriented, dense Cholesky leads to the conclusion that N (the order of the matrix) must scale with P (the number of processors) in such a way that storage grows like P^2; the algorithm is therefore not scalable. The same conclusion has previously been reached by considering communication and computation latency along the critical path of the algorithm; the results here complement and reinforce it.
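As a rough illustration of how such a lower bound arises (a back-of-the-envelope reconstruction, not a derivation quoted from the paper; the per-word communication cost β and a unit per-flop cost are modeling assumptions), note that in a column-oriented distribution every processor must receive essentially every earlier column, or an aggregated update of the same length, so each node takes in on the order of N^2/2 words no matter how large P is, while the arithmetic is shared among the processors. With T_par the parallel running time (so T_par ≥ max(T_comp, T_comm)),

\[
T_{\mathrm{comp}} \approx \frac{N^3}{3P}, \qquad
T_{\mathrm{comm}} \gtrsim \beta\,\frac{N^2}{2}, \qquad
E \;=\; \frac{N^3/3}{P\,T_{\mathrm{par}}} \;\le\; \frac{N^3/3}{P\,T_{\mathrm{comm}}} \;\le\; \frac{2N}{3\beta P}.
\]

Keeping E bounded away from zero therefore forces N = Ω(P), so total storage Θ(N^2) must grow at least like P^2 rather than linearly in P, which is the failure of scalability claimed above.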

For the sparse case, both theory and the new experimental measurements reported here make the same point: for column-oriented distributed methods, the number of gridpoints (which is O(N)) must grow as P^2 in order to keep parallel efficiency bounded away from zero. Our sparse matrix results employ the “fan-in” distributed scheme, implemented on machines with either a grid or a fat-tree interconnect, using a subtree-to-submachine mapping of the columns.
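A similar sketch (again our reconstruction for the standard model problem, not a computation taken from the paper; c and β denote assumed per-flop and per-word costs) accounts for the P^2 requirement. For a two-dimensional grid problem whose N gridpoints are ordered by nested dissection, serial factorization costs Θ(N^{3/2}) operations, and the final separator is a dense subproblem of order Θ(N^{1/2}). Handling that dense block in column-oriented fashion makes every participating node receive Ω(N) words, so

\[
E \;\le\; \frac{c\,N^{3/2}}{P \cdot \beta\,N} \;=\; \frac{c}{\beta}\cdot\frac{\sqrt{N}}{P},
\]

and efficiency bounded away from zero again requires N = Ω(P^2) gridpoints, consistent with the measurements summarized above.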

The alternative of distributing the rows and columns of the matrix to the rows and columns of a grid of processors is shown to be scalable for the dense case. Its scalability for the sparse case has been established previously [10]. To date, however, none of these methods has achieved high efficiency on a highly parallel machine.
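For contrast, here is the corresponding estimate for the two-dimensional (row-and-column) distribution of a dense matrix over a √P × √P processor grid; the communication term is a modeling assumption in the same spirit as above, not a figure from the paper. Each processor now exchanges panels only with the processors in its own row and column, roughly N^2/√P words per node, so

\[
E \;\approx\; \frac{N^3/3}{P\left(\frac{N^3}{3P} + \beta\,\frac{N^2}{\sqrt{P}}\right)}
 \;=\; \frac{1}{1 + 3\beta\sqrt{P}/N},
\]

which stays bounded away from zero as long as N grows like √P, i.e. total storage Θ(N^2) grows only linearly with P. That is precisely the scalability criterion stated at the start of the abstract.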

Finally, open problems and other approaches that may be more fruitful are discussed.

Research Institute for Advanced Computer Science, MS T045-1, NASA Ames Research Center, Moffett Field, CA 94035. This author’s work was supported by the NAS Systems Division via Cooperative Agreement NCC 2-387 between NASA and the Universities Space Research Association (USRA).

References

  1. E. Anderson, A. Benzoni, J. Dongarra, S. Moulton, S. Ostrouchov, B. Tourancheau, and R. van de Geijn, LAPACK for distributed memory architectures: progress report, In Parallel Processing for Scientific Computing, SIAM, 1992.

  2. C. Ashcraft, S. C. Eisenstat, and J. W. H. Liu, A fan-in algorithm for distributed sparse numerical factorization, SIAM J. Scient. Stat. Comput. 11 (1990), pp. 593–599.

  3. C. Ashcraft, S. C. Eisenstat, J. W. H. Liu, and A. H. Sherman, A comparison of three column-based distributed sparse factorization schemes, Research Report YALEU/DCS/RR810, Comp. Sci. Dept., Yale Univ., 1990.

  4. C. Ashcraft, S. C. Eisenstat, J. W. H. Liu, B. W. Peyton, and A. H. Sherman, A compute-ahead fan-in scheme for parallel sparse matrix factorization, In D. Pelletier, editor, Proceedings, Supercomputing Symposium ’90, pp. 351–361, École Polytechnique de Montréal, 1990.

  5. C. Ashcraft, The fan-both family of column-based distributed Cholesky factorization algorithms, these proceedings.

  6. P. Bjørstad and M. D. Skogen, Domain decomposition algorithms of Schwarz type, designed for massively parallel computers, In Proceedings of the Fifth International Symposium on Domain Decomposition, SIAM, 1992.

  7. J. Dongarra, R. van de Geijn, and D. Walker, A look at scalable dense linear algebra libraries, Proceedings, Scalable High Performance Computing Conference, Williamsburg, VA, 1992.

  8. A. George, J. W. H. Liu, and E. Ng, Communication results for parallel sparse Cholesky factorization on a hypercube, Parallel Comput. 10 (1989), pp. 287–298.

  9. A. George, M. T. Heath, J. W. H. Liu, and E. Ng, Solution of sparse positive definite systems on a hypercube, J. Comput. Appl. Math. 27 (1989), pp. 129–156.

  10. J. R. Gilbert and R. Schreiber, Highly parallel sparse Cholesky factorization, SIAM J. Scient. Stat. Comput., to appear.

  11. J. R. Gilbert, C. Moler, and R. Schreiber, Sparse matrices in MATLAB: design and implementation, SIAM J. Matrix Anal. Appl. 13 (1992), pp. 333–356.

  12. S. W. Hammond, Mapping Unstructured Grid Computations to Massively Parallel Computers, PhD thesis, Dept. of Comp. Sci., Rensselaer Polytechnic Institute, 1992.

  13. S. W. Hammond and R. Schreiber, Mapping unstructured grid problems to the Connection Machine, In P. Mehrotra, J. Saltz, and R. Voigt, editors, Unstructured Scientific Computation on Multiprocessors, pp. 11–30, MIT Press, 1992.

  14. M. T. Heath, E. Ng, and B. W. Peyton, Parallel algorithms for sparse linear systems, SIAM Review 33 (1991), pp. 420–460.

  15. S. G. Kratzer, Massively parallel sparse matrix computations, In P. Mehrotra, J. Saltz, and R. Voigt, editors, Unstructured Scientific Computation on Multiprocessors, pp. 178–186, MIT Press, 1992. A more complete version will appear in J. Supercomputing.

  16. C. E. Leiserson, Fat-trees: universal networks for hardware-efficient supercomputing, IEEE Trans. Comput. C-34 (1985), pp. 892–901.

  17. G. Li and T. F. Coleman, A parallel triangular solver for a distributed memory multiprocessor, SIAM J. Scient. Stat. Comput. 9 (1988), pp. 485–502.

  18. M. Mu and J. R. Rice, Performance of PDE sparse solvers on hypercubes, In P. Mehrotra, J. Saltz, and R. Voigt, editors, Unstructured Scientific Computation on Multiprocessors, pp. 345–370, MIT Press, 1992.

  19. M. Mu and J. R. Rice, A grid based subtree-subcube assignment strategy for solving PDEs on hypercubes, SIAM J. Scient. Stat. Comput. 13 (1992), pp. 826–839.

  20. A. T. Ogielski and W. Aiello, Sparse matrix algebra on parallel processor arrays, these proceedings.

  21. D. P. O’Leary and G. W. Stewart, Data-flow algorithms for parallel matrix computations, Comm. ACM 28 (1985), pp. 840–853.

  22. L. S. Ostrouchov, M. T. Heath, and C. H. Romine, Modeling speedup in parallel sparse matrix factorization, Tech. Report ORNL/TM-11786, Mathematical Sciences Section, Oak Ridge National Lab., December 1990.

  23. D. Patterson, Massively parallel computer architecture: observations and ideas on a new theoretical model, Comp. Sci. Dept., Univ. of California at Berkeley, 1992.

  24. C. Pommerell, M. Annaratone, and W. Fichtner, A set of new mapping and coloring heuristics for distributed-memory parallel processors, SIAM J. Scient. Stat. Comput. 13 (1992), pp. 194–226.

  25. A. Pothen, H. D. Simon, and L. Wang, Spectral nested dissection, Report CS-92-01, Comp. Sci. Dept., Penn State Univ. Submitted to J. Parallel and Distrib. Comput.

  26. E. Rothberg and A. Gupta, The performance impact of data reuse in parallel dense Cholesky factorization, Stanford Comp. Sci. Dept. Report STAN-CS-92-1401.

  27. E. Rothberg and A. Gupta, An efficient block-oriented approach to parallel sparse Cholesky factorization, Stanford Comp. Sci. Dept. Tech. Report, 1992.

  28. Y. Saad and M. H. Schultz, Data communication in parallel architectures, Parallel Comput. 11 (1989), pp. 131–150.

  29. S. Venugopal and V. K. Naik, Effects of partitioning and scheduling sparse matrix factorization on communication and load balance, Proceedings, Supercomputing ’91, pp. 866–875, IEEE Computer Society Press, 1991.

  30. L. Hulbert and E. Zmijewski, Limiting communication in parallel sparse Cholesky factorization, SIAM J. Matrix Anal. Appl. 12 (1991), pp. 1184–1197.

Copyright information

© 1993 Springer-Verlag New York, Inc.

About this paper

Cite this paper

Schreiber, R. (1993). Scalability of Sparse Direct Solvers. In: George, A., Gilbert, J.R., Liu, J.W.H. (eds) Graph Theory and Sparse Matrix Computation. The IMA Volumes in Mathematics and its Applications, vol 56. Springer, New York, NY. https://doi.org/10.1007/978-1-4613-8369-7_9

  • DOI: https://doi.org/10.1007/978-1-4613-8369-7_9

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4613-8371-0

  • Online ISBN: 978-1-4613-8369-7
