Abstract
Randomized algorithms provide solutions to two ubiquitous problems: (1) the distributed calculation of a principal component analysis or singular value decomposition of a highly rectangular matrix, and (2) the distributed calculation of a low-rank approximation (in the form of a singular value decomposition) to an arbitrary matrix. Carefully honed algorithms yield results that are uniformly superior to those of the stock, deterministic implementations in Spark (the popular platform for distributed computation); in particular, whereas the stock software will without warning return left singular vectors that are far from numerically orthonormal, a significantly burnished randomized implementation generates left singular vectors that are numerically orthonormal to nearly the machine precision.
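For concreteness, "numerically orthonormal" above can be quantified as the largest entry of |UᵀU − I|, where U is the matrix of left singular vectors. The following plain-Scala sketch (no Spark dependency; the object and method names here are illustrative, not taken from the paper's software) computes that deviation for a matrix stored as an array of rows:

```scala
// Illustrative check of how far the columns of a matrix u are from
// orthonormal, measured as the largest entry of |u^T u - I|.
object OrthoCheck {
  def deviation(u: Array[Array[Double]]): Double = {
    val n = u(0).length                  // number of columns
    var worst = 0.0
    for (p <- 0 until n; q <- 0 until n) {
      // Inner product of columns p and q.
      var dot = 0.0
      for (row <- u) dot += row(p) * row(q)
      // Compare against the corresponding entry of the identity.
      val target = if (p == q) 1.0 else 0.0
      worst = math.max(worst, math.abs(dot - target))
    }
    worst
  }
}
```

For a matrix with exactly orthonormal columns the deviation is 0; values near the machine precision (about 2.2 × 10⁻¹⁶ in double precision) indicate columns that are numerically orthonormal in the sense the abstract describes.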
Acknowledgements
We would like to thank the anonymous editor and referees for shaping the presentation.
Additional information
Communicated by: Gunnar J Martinsson
Y. Kluger and H. Li were supported in part by United States National Institutes of Health grant 1R01HG008383-01A1. Y. Kluger is with the Program in Applied Mathematics, the Program in Biological and Biomedical Sciences, the Cancer Center, the Center for Medical Informatics, and the Department of Pathology in the School of Medicine at Yale University.
Appendices
Appendix A: Restricting to ten times fewer executors
Tables 11, 12, 13, 14, 15, 16, 17 and 18 display results analogous to those in Tables 3–5, 6–8, 9 and 10, but with the number of executors, spark.dynamicAllocation.maxExecutors, set to 18 (rather than 180). The results are broadly comparable to those presented earlier. This indicates how the timings scale with the number of machines. Of course, other processing in Spark (not necessarily related to principal component analysis or singular value decomposition) can benefit from having the data stored over more executors, and moving data around the cluster can dominate the overall timings in real-world usage (see also Remark 2 in the introduction of the present paper).
Appendix B: Another example with ten times fewer executors
Similar to Appendix A, the present appendix presents Tables 19, 20, 21, 22, 23, 24, 25 and 26, reporting results analogous to those in Tables 3–5, 6–8, 9 and 10, with the number of executors, spark.dynamicAllocation.maxExecutors, again set to 18 (rather than 180), as in Appendix A. Following an anonymous reviewer's suggestion, the present appendix uses for the diagonal entries Σj,j of Σ in (2) singular values drawn from a fractal "Devil's staircase," with many repeated singular values of varying multiplicities; Fig. 1 plots the singular values for Tables 19–21. Specifically, the singular values arise from the following Scala code:
(0 until k).toArray.map(j =>
  Integer.parseInt(Integer.toOctalString(
    Math.round(j * Math.pow(8, 6).toFloat / k)
  ).replaceAll("[1-7]", "1"), 2)
  / Math.pow(2, 6) / (1 - Math.pow(2, -6))
).sorted.reverse
Here, k = n for Tables 19–21 and k = l for Tables 22–26. Thus, the singular values arise from replacing the octal digits 1–7 with the binary digit 1 (keeping the octal digit 0 as the binary digit 0) for rounded representations of the real numbers between 0 and 1, then rescaling so that the final singular values range from 0 to 1, inclusive.
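The construction above can be packaged as a small self-contained function for inspection; the object and method names below are ours, not from the paper's code base, but the body reproduces the computation just described:

```scala
// Sketch of the "Devil's staircase" singular-value generator described above.
object StaircaseCheck {
  def singularValues(k: Int): Array[Double] =
    (0 until k).toArray.map { j =>
      // Round j/k to the nearest multiple of 8^-6, held in fixed point.
      val x = Math.round(j * Math.pow(8, 6).toFloat / k)
      // Collapse every nonzero octal digit to the binary digit 1.
      val bits = Integer.toOctalString(x).replaceAll("[1-7]", "1")
      // Reinterpret the digits in base 2 and rescale into [0, 1].
      Integer.parseInt(bits, 2) / Math.pow(2, 6) / (1 - Math.pow(2, -6))
    }.sorted.reverse
}
```

Running, say, StaircaseCheck.singularValues(64) confirms the properties the text claims: all values lie in [0, 1], the smallest is exactly 0, and many values repeat (for instance, j = 1 through 7 all collapse to the same singular value, since each single nonzero octal digit becomes the same binary digit 1).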
Again, the results are broadly comparable to those presented earlier; in some cases some of the algorithms attain better accuracy on the examples of the present appendix, but otherwise the numbers in the tables are similar.
Appendix C: Timings for generating the test matrices
For comparative purposes, Tables 27, 28 and 29 list the times required to generate (2) with (3) or (5) using the settings in Table 2.
Cite this article
Li, H., Kluger, Y. & Tygert, M. Randomized algorithms for distributed computation of principal component analysis and singular value decomposition. Adv Comput Math 44, 1651–1672 (2018). https://doi.org/10.1007/s10444-018-9600-1