Scalability of Sparse Direct Solvers

  • Conference paper
Graph Theory and Sparse Matrix Computation

Part of the book series: The IMA Volumes in Mathematics and its Applications (IMA, volume 56)

Abstract

We shall say that an algorithm is scalable if its efficiency remains bounded away from zero as the number of processors and the problem size increase in such a way that the size of the data structures grows linearly with the number of processors. In this paper we show that the column-oriented approach to sparse Cholesky factorization for distributed-memory machines is not scalable. By considering message volume, node contention, and bisection width, one may obtain lower bounds on the time required for communication in a distributed algorithm. Applying this technique to distributed, column-oriented, dense Cholesky leads to the conclusion that N (the order of the matrix) must scale with P (the number of processors) in such a way that storage grows like P^2; the algorithm is therefore not scalable. The same conclusion has previously been reached by considering communication and computation latency along the critical path of the algorithm; the results here complement and reinforce it.
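As a rough illustration of how such a lower bound arises (a back-of-the-envelope reconstruction, not a derivation quoted from the paper; the per-word communication cost β and a unit per-flop cost are modeling assumptions), note that in a column-oriented distribution every processor must receive essentially every earlier column, or an aggregated update of the same length, so each node takes in on the order of N^2/2 words no matter how large P is, while the arithmetic is shared among the processors. With T_par the parallel running time (so T_par ≥ max(T_comp, T_comm)),

\[
T_{\mathrm{comp}} \approx \frac{N^3}{3P}, \qquad
T_{\mathrm{comm}} \gtrsim \beta\,\frac{N^2}{2}, \qquad
E \;=\; \frac{N^3/3}{P\,T_{\mathrm{par}}} \;\le\; \frac{N^3/3}{P\,T_{\mathrm{comm}}} \;\le\; \frac{2N}{3\beta P}.
\]

Keeping E bounded away from zero therefore forces N = Ω(P), so total storage Θ(N^2) must grow at least like P^2 rather than linearly in P, which is the failure of scalability claimed above.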

For the sparse case, both theory and the new experimental measurements reported here make the same point: for column-oriented distributed methods, the number of gridpoints (which is O(N)) must grow as P^2 in order to keep parallel efficiency bounded away from zero. Our sparse matrix results employ the “fan-in” distributed scheme, implemented on machines with either a grid or a fat-tree interconnect, using a subtree-to-submachine mapping of the columns.
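A similar sketch (again our reconstruction for the standard model problem, not a computation taken from the paper; c and β denote assumed per-flop and per-word costs) accounts for the P^2 requirement. For a two-dimensional grid problem whose N gridpoints are ordered by nested dissection, serial factorization costs Θ(N^{3/2}) operations, and the final separator is a dense subproblem of order Θ(N^{1/2}). Handling that dense block in column-oriented fashion makes every participating node receive Ω(N) words, so

\[
E \;\le\; \frac{c\,N^{3/2}}{P \cdot \beta\,N} \;=\; \frac{c}{\beta}\cdot\frac{\sqrt{N}}{P},
\]

and efficiency bounded away from zero again requires N = Ω(P^2) gridpoints, consistent with the measurements summarized above.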

The alternative of distributing the rows and columns of the matrix to the rows and columns of a grid of processors is shown to be scalable for the dense case. Its scalability for the sparse case has been established previously [10]. To date, however, none of these methods has achieved high efficiency on a highly parallel machine.
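For contrast, here is the corresponding estimate for the two-dimensional (row-and-column) distribution of a dense matrix over a √P × √P processor grid; the communication term is a modeling assumption in the same spirit as above, not a figure from the paper. Each processor now exchanges panels only with the processors in its own row and column, roughly N^2/√P words per node, so

\[
E \;\approx\; \frac{N^3/3}{P\left(\frac{N^3}{3P} + \beta\,\frac{N^2}{\sqrt{P}}\right)}
 \;=\; \frac{1}{1 + 3\beta\sqrt{P}/N},
\]

which stays bounded away from zero as long as N grows like √P, i.e. total storage Θ(N^2) grows only linearly with P. That is precisely the scalability criterion stated at the start of the abstract.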

Finally, open problems and other approaches that may be more fruitful are discussed.

Research Institute for Advanced Computer Science, MS T045-1, NASA Ames Research Center, Moffett Field, CA 94035. This author’s work was supported by the NAS Systems Division via Cooperative Agreement NCC 2-387 between NASA and the Universities Space Research Association (USRA).

References

  1. E. Anderson, A. Benzoni, J. Dongarra, S. Moulton, S. Ostrouchov, B. Tourancheau, and R. van de Geijn, LAPACK for distributed memory architectures: progress report, In Parallel Processing for Scientific Computing, SIAM, 1992.

  2. C. Ashcraft, S. C. Eisenstat, and J. W. H. Liu, A fan-in algorithm for distributed sparse numerical factorization, SIAM J. Scient. Stat. Comput. 11 (1990), pp. 593–599.

  3. C. Ashcraft, S. C. Eisenstat, J. W. H. Liu, and A. H. Sherman, A comparison of three column-based distributed sparse factorization schemes, Research Report YALEU/DCS/RR810, Comp. Sci. Dept., Yale Univ., 1990.

  4. C. Ashcraft, S. C. Eisenstat, J. W. H. Liu, B. W. Peyton, and A. H. Sherman, A compute-ahead fan-in scheme for parallel sparse matrix factorization, In D. Pelletier, editor, Proceedings, Supercomputing Symposium ’90, pp. 351–361, École Polytechnique de Montréal, 1990.

  5. C. Ashcraft, The fan-both family of column-based distributed Cholesky factorization algorithms, these proceedings.

  6. P. Bjørstad and M. D. Skogen, Domain decomposition algorithms of Schwarz type, designed for massively parallel computers, In Proceedings of the Fifth International Symposium on Domain Decomposition, SIAM, 1992.

  7. J. Dongarra, R. van de Geijn, and D. Walker, A look at scalable dense linear algebra libraries, Proceedings, Scalable High Performance Computing Conference, Williamsburg, VA, 1992.

  8. A. George, J. W. H. Liu, and E. Ng, Communication results for parallel sparse Cholesky factorization on a hypercube, Parallel Comput. 10 (1989), pp. 287–298.

  9. A. George, M. T. Heath, J. W. H. Liu, and E. Ng, Solution of sparse positive definite systems on a hypercube, J. Comput. Appl. Math. 27 (1989), pp. 129–156.

  10. J. R. Gilbert and R. Schreiber, Highly parallel sparse Cholesky factorization, SIAM J. Scient. Stat. Comput., to appear.

  11. J. R. Gilbert, C. Moler, and R. Schreiber, Sparse matrices in MATLAB: design and implementation, SIAM J. Matrix Anal. Appl. 13 (1992), pp. 333–356.

  12. S. W. Hammond, Mapping Unstructured Grid Computations to Massively Parallel Computers, PhD thesis, Dept. of Comp. Sci., Rensselaer Polytechnic Institute, 1992.

  13. S. W. Hammond and R. Schreiber, Mapping unstructured grid problems to the Connection Machine, In P. Mehrotra, J. Saltz, and R. Voigt, editors, Unstructured Scientific Computation on Multiprocessors, pp. 11–30, MIT Press, 1992.

  14. M. T. Heath, E. Ng, and B. W. Peyton, Parallel algorithms for sparse linear systems, SIAM Review 33 (1991), pp. 420–460.

  15. S. G. Kratzer, Massively parallel sparse matrix computations, In P. Mehrotra, J. Saltz, and R. Voigt, editors, Unstructured Scientific Computation on Multiprocessors, pp. 178–186, MIT Press, 1992. A more complete version will appear in J. Supercomputing.

  16. C. E. Leiserson, Fat-trees: universal networks for hardware-efficient supercomputing, IEEE Trans. Comput. C-34 (1985), pp. 892–901.

  17. G. Li and T. F. Coleman, A parallel triangular solver for a distributed memory multiprocessor, SIAM J. Scient. Stat. Comput. 9 (1988), pp. 485–502.

  18. M. Mu and J. R. Rice, Performance of PDE sparse solvers on hypercubes, In P. Mehrotra, J. Saltz, and R. Voigt, editors, Unstructured Scientific Computation on Multiprocessors, pp. 345–370, MIT Press, 1992.

  19. M. Mu and J. R. Rice, A grid based subtree-subcube assignment strategy for solving PDEs on hypercubes, SIAM J. Scient. Stat. Comput. 13 (1992), pp. 826–839.

  20. A. T. Ogielski and W. Aiello, Sparse matrix algebra on parallel processor arrays, these proceedings.

  21. D. P. O’Leary and G. W. Stewart, Data-flow algorithms for parallel matrix computations, Comm. ACM 28 (1985), pp. 840–853.

  22. L. S. Ostrouchov, M. T. Heath, and C. H. Romine, Modeling speedup in parallel sparse matrix factorization, Tech. Report ORNL/TM-11786, Mathematical Sciences Section, Oak Ridge National Lab., December 1990.

  23. D. Patterson, Massively parallel computer architecture: observations and ideas on a new theoretical model, Comp. Sci. Dept., Univ. of California at Berkeley, 1992.

  24. C. Pommerell, M. Annaratone, and W. Fichtner, A set of new mapping and coloring heuristics for distributed-memory parallel processors, SIAM J. Scient. Stat. Comput. 13 (1992), pp. 194–226.

  25. A. Pothen, H. D. Simon, and L. Wang, Spectral nested dissection, Report CS-92-01, Comp. Sci. Dept., Penn State Univ. Submitted to J. Parallel and Distrib. Comput.

  26. E. Rothberg and A. Gupta, The performance impact of data reuse in parallel dense Cholesky factorization, Stanford Comp. Sci. Dept. Report STAN-CS-92-1401.

  27. E. Rothberg and A. Gupta, An efficient block-oriented approach to parallel sparse Cholesky factorization, Stanford Comp. Sci. Dept. Tech. Report, 1992.

  28. Y. Saad and M. H. Schultz, Data communication in parallel architectures, Parallel Comput. 11 (1989), pp. 131–150.

  29. S. Venugopal and V. K. Naik, Effects of partitioning and scheduling sparse matrix factorization on communication and load balance, Proceedings, Supercomputing ’91, pp. 866–875, IEEE Computer Society Press, 1991.

  30. L. Hulbert and E. Zmijewski, Limiting communication in parallel sparse Cholesky factorization, SIAM J. Matrix Anal. Appl. 12 (1991), pp. 1184–1197.

Copyright information

© 1993 Springer-Verlag New York, Inc.

About this paper

Cite this paper

Schreiber, R. (1993). Scalability of Sparse Direct Solvers. In: George, A., Gilbert, J.R., Liu, J.W.H. (eds) Graph Theory and Sparse Matrix Computation. The IMA Volumes in Mathematics and its Applications, vol 56. Springer, New York, NY. https://doi.org/10.1007/978-1-4613-8369-7_9

  • DOI: https://doi.org/10.1007/978-1-4613-8369-7_9

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4613-8371-0

  • Online ISBN: 978-1-4613-8369-7
