High-Performance Library Software for QR Factorization

  • Conference paper
Applied Parallel Computing. New Paradigms for HPC in Industry and Academia (PARA 2000)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1947)

Abstract

In [5],[6], we presented algorithm RGEQR3, a purely recursive formulation of the QR factorization. Using recursion leads to a natural way of choosing the k-way aggregating Householder transform of Schreiber and Van Loan [10]. RGEQR3 is a performance-critical subroutine of the main (hybrid recursive) routine RGEQRF for QR factorization of a general m×n matrix. This contribution presents a new version of RGEQRF and its accompanying SMP parallel counterpart, implemented for a future release of the IBM ESSL library. It represents a robust, high-performance piece of library software for QR factorization on uniprocessor and multiprocessor systems. The implementation builds on previous results [5],[6]. In particular, the new version is optimized in a number of ways to improve performance, e.g., for small matrices and for matrices with a very small number of columns. This is partly done by including mini blocking in the otherwise purely recursive RGEQR3. We describe the salient features of this implementation. Our serial implementation outperforms the corresponding LAPACK routine by 10–65% for square matrices and by 10–100% for tall and thin matrices on IBM POWER2 and POWER3 nodes. The tests covered matrix sizes ranging from very small to very large. The SMP parallel implementation shows close to perfect speedup on a 4-processor PPC604e node.
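
The recursive idea summarized in the abstract can be illustrated with a minimal sketch. It is not the ESSL implementation and contains none of its optimizations (no mini blocking, no hybrid outer routine, no parallelism). It assumes NumPy, m ≥ n, and real matrices; the names householder and rgeqr3 are illustrative only. The sketch factors the left half of the columns, applies the resulting transform to the right half, factors the remaining trailing block, and merges the two halves into one aggregated transform Q = I - Y T Yᵀ in the storage-efficient form of [10].

```python
import numpy as np


def householder(x):
    """Return (v, tau, beta) with (I - tau*v*v^T) x = beta*e_1 and v[0] = 1."""
    alpha = x[0]
    normx = np.linalg.norm(x)
    if normx == 0.0:
        v = np.zeros_like(x)
        v[0] = 1.0
        return v, 0.0, 0.0          # H = I, nothing to annihilate
    beta = -np.copysign(normx, alpha)  # sign chosen to avoid cancellation
    v = x / (alpha - beta)
    v[0] = 1.0
    tau = (beta - alpha) / beta
    return v, tau, beta


def rgeqr3(A):
    """Recursive QR sketch: returns (Y, T, R) with A = (I - Y T Y^T)[:, :n] R.

    Assumes m >= n. Y is m-by-n unit lower trapezoidal, T and R are
    n-by-n upper triangular.
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    if n == 1:
        v, tau, beta = householder(A[:, 0])
        return v.reshape(m, 1), np.array([[tau]]), np.array([[beta]])

    n1 = n // 2
    # Factor the left half of the columns.
    Y1, T1, R1 = rgeqr3(A[:, :n1])
    # Apply Q1^T = I - Y1 T1^T Y1^T to the right half.
    right = A[:, n1:]
    B = right - Y1 @ (T1.T @ (Y1.T @ right))
    # Factor the trailing block below the first n1 rows.
    Y2, T2, R2 = rgeqr3(B[n1:, :])

    # Assemble Y = [Y1, (0; Y2)].
    Y = np.zeros((m, n))
    Y[:, :n1] = Y1
    Y[n1:, n1:] = Y2
    # Assemble T = [[T1, T3], [0, T2]] with T3 = -T1 (Y1^T Y2_padded) T2.
    T = np.zeros((n, n))
    T[:n1, :n1] = T1
    T[n1:, n1:] = T2
    T[:n1, n1:] = -T1 @ (Y1.T @ Y[:, n1:]) @ T2
    # Assemble R = [[R1, B_top], [0, R2]].
    R = np.zeros((n, n))
    R[:n1, :n1] = R1
    R[:n1, n1:] = B[:n1, :]
    R[n1:, n1:] = R2
    return Y, T, R
```

Calling rgeqr3 on an m×n array returns the three factors; forming Q = np.eye(m) - Y @ T @ Y.T and comparing Q[:, :n] @ R with the input is a simple way to check the result. Splitting the columns in half at every level is what makes the block size of the aggregated transform follow from the recursion rather than from a tuning parameter.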

References

  1. R.C. Agarwal, F.G. Gustavson, and M. Zubair. Exploiting functional parallelism of POWER2 to design high-performance numerical algorithms. IBM J. Res. Develop., 38(5):563–576, September 1994.

  2. E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen. LAPACK Users’ Guide, Release 2.0. SIAM, Philadelphia, 1994.

  3. C. Bischof and C. Van Loan. The WY representation for products of Householder matrices. SIAM J. Scientific and Statistical Computing, 8(1):s2–s13, 1987.

  4. A. Chalmers and J. Tidmus. Practical Parallel Processing. International Thomson Computer Press, UK, 1996.

  5. E. Elmroth and F. Gustavson. Applying Recursion to Serial and Parallel QR Factorization Leads to Better Performance. IBM Journal of Research and Development, Vol. 44, No. 4, pages 605–624, 2000.

  6. E. Elmroth and F. Gustavson. New serial and parallel recursive QR factorization algorithms for SMP systems. In B. Kågström et al., editors, Applied Parallel Computing, Large Scale Scientific and Industrial Problems, Lecture Notes in Computer Science, No. 1541, pages 120–128, 1998.

  7. F. Gustavson. Recursion Leads to Automatic Variable Blocking for Dense Linear-Algebra Algorithms. IBM Journal of Research and Development, Vol. 41, No. 6, 1997.

  8. F. Gustavson, A. Henriksson, I. Jonsson, B. Kågström and P. Ling. Superscalar GEMM-based Level 3 BLAS-The On-going Evolution of a Portable and High-Performance Library. In Kågström et al. (eds), Applied Parallel Computing. Large Scale Scientific and Industrial Problems, Lecture Notes in Computer Science, Vol. 1541, pp 207–215, Springer-Verlag, 1998.

  9. F. Gustavson and I. Jonsson. High Performance Cholesky Factorization via Blocking and Recursion that uses Minimal Storage. This Proceedings.

  10. R. Schreiber and C. Van Loan. A storage efficient WY representation for products of Householder transformations. SIAM J. Scientific and Statistical Computing, 10(1):53–57, 1989.

  11. S. Toledo. Locality of reference in LU decomposition with partial pivoting. SIAM J. Matrix Anal. Appl., 18(4), 1997.

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Elmroth, E., Gustavson, F. (2001). High-Performance Library Software for QR Factorization. In: Sørevik, T., Manne, F., Gebremedhin, A.H., Moe, R. (eds) Applied Parallel Computing. New Paradigms for HPC in Industry and Academia. PARA 2000. Lecture Notes in Computer Science, vol 1947. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-70734-4_9

  • DOI: https://doi.org/10.1007/3-540-70734-4_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41729-3

  • Online ISBN: 978-3-540-70734-9

  • eBook Packages: Springer Book Archive
