Tall-and-skinny QR factorization with approximate Householder reflectors on graphics processors

Tomás, Andrés E.; Quintana-Ortí, Enrique S.

doi:10.1007/s11227-020-03176-3

Published: 24 January 2020

Volume 76, pages 8771–8786 (2020)

The Journal of Supercomputing

Abstract

We present a novel method for the QR factorization of large tall-and-skinny matrices that introduces an approximation technique for computing the Householder vectors. The approach is highly competitive on hybrid platforms equipped with a graphics processor, where it outperforms the conventional factorization because it reduces the volume of data transferred between the graphics accelerator and the main memory of the host. Our experiments show that, for tall-and-skinny matrices, the new approach outperforms the QR code in MAGMA by a large margin, and that it remains very competitive for square matrices when memory transfers and CPU computations are the bottleneck of the Householder QR factorization.
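The abstract does not spell out the approximation technique itself, so the sketch below shows only the conventional, exact Householder QR that the approximation targets: for each column, the Householder vector v (normalized so v[0] = 1) and the scalar tau are computed from the entire remaining column, using the same sign convention as LAPACK's xLARFG, and each reflector H = I - tau v v^T is then applied to the trailing columns. It is a minimal, unblocked, CPU-only illustration in pure Python under these assumptions; the function names `householder`, `apply_reflector`, `householder_qr` and `form_q` are hypothetical, and this is not the authors' GPU implementation.

```python
import math


def householder(x):
    """Compute a reflector H = I - tau * v v^T with v[0] == 1 such that
    H x = beta * e1 (sign convention as in LAPACK's xLARFG)."""
    alpha = x[0]
    sigma = math.sqrt(sum(xi * xi for xi in x[1:]))  # 2-norm of x[1:]
    if sigma == 0.0:
        # x is already a multiple of e1: take H = I (tau = 0)
        return [1.0] + [0.0] * (len(x) - 1), 0.0, alpha
    beta = -math.copysign(math.hypot(alpha, sigma), alpha)
    tau = (beta - alpha) / beta
    scale = 1.0 / (alpha - beta)
    v = [1.0] + [xi * scale for xi in x[1:]]
    return v, tau, beta


def apply_reflector(v, tau, M, j, cols):
    """Overwrite rows j: of the given columns of M with (I - tau v v^T) M."""
    m = len(M)
    for k in cols:
        w = sum(v[i - j] * M[i][k] for i in range(j, m))
        for i in range(j, m):
            M[i][k] -= tau * v[i - j] * w


def householder_qr(A):
    """Unblocked Householder QR of an m x n list-of-rows matrix (m >= n).
    Returns the n x n upper-triangular R and the list of (v, tau) pairs."""
    m, n = len(A), len(A[0])
    W = [list(map(float, row)) for row in A]  # work on a float copy
    reflectors = []
    for j in range(n):
        v, tau, beta = householder([W[i][j] for i in range(j, m)])
        W[j][j] = beta                 # diagonal entry of R
        for i in range(j + 1, m):      # entries annihilated by H_j
            W[i][j] = 0.0
        apply_reflector(v, tau, W, j, range(j + 1, n))  # trailing update
        reflectors.append((v, tau))
    R = [W[i][:n] for i in range(n)]
    return R, reflectors


def form_q(reflectors, m, n):
    """Form the thin m x n factor Q = H_0 H_1 ... H_{n-1} [I_n; 0]."""
    Q = [[1.0 if i == k else 0.0 for k in range(n)] for i in range(m)]
    for j in reversed(range(n)):       # backward accumulation
        v, tau = reflectors[j]
        apply_reflector(v, tau, Q, j, range(n))
    return Q


if __name__ == "__main__":
    A = [[1, 2], [3, 4], [5, 6], [7, 8]]   # a small 4 x 2 tall-skinny matrix
    R, refl = householder_qr(A)
    Q = form_q(refl, len(A), len(A[0]))
    err = max(abs(sum(Q[i][p] * R[p][k] for p in range(2)) - A[i][k])
              for i in range(4) for k in range(2))
    print("R =", R)
    print("max |QR - A| =", err)
```

Production codes block this loop: groups of reflectors are aggregated into a compact WY form I - V T V^T so that the trailing update becomes matrix-matrix (Level-3 BLAS) work. In hybrid CPU-GPU libraries such as MAGMA, the panel factorization, that is, the computation of the reflectors, is commonly performed on the host, which is where the host-device transfers mentioned above come from.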

Notes

Available at https://github.com/antodo/approxhouseholder.

Acknowledgements

This research was supported by the Project TIN2017-82972-R from the MINECO (Spain) and the EU H2020 Project 732631 “OPRECOMP. Open Transprecision Computing”.

Author information

Authors and Affiliations

Dept. d’Enginyeria i Ciència dels Computadors, Universitat Jaume I, 12071, Castelló de la Plana, Spain
Andrés E. Tomás
Dept. de Sistemes Informàtics i Computació, Universitat Politècnica de València, 46022, Valencia, Spain
Andrés E. Tomás
Dept. d’Informàtica de Sistemes i Computadors, Universitat Politècnica de València, 46022, Valencia, Spain
Enrique S. Quintana-Ortí

Corresponding author

Correspondence to Andrés E. Tomás.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Tomás, A.E., Quintana-Ortí, E.S. Tall-and-skinny QR factorization with approximate Householder reflectors on graphics processors. J Supercomput 76, 8771–8786 (2020). https://doi.org/10.1007/s11227-020-03176-3

Issue Date: November 2020
DOI: https://doi.org/10.1007/s11227-020-03176-3
