Abstract
A crucial technique for scaling kernel methods to very large datasets, reaching or exceeding millions of instances, is low-rank approximation of kernel matrices. The Nyström method is a popular technique for generating such approximations, but it requires sampling a large number of columns from the original matrix to achieve good accuracy. This chapter describes a new family of algorithms, Ensemble Nyström algorithms, based on mixtures of Nyström approximations, which yield more accurate low-rank approximations than the standard Nyström method. We give a detailed study of variants of these algorithms based on simple averaging, an exponential weight method, and regression-based methods. A theoretical analysis of these algorithms, including novel error bounds guaranteeing a better convergence rate than the standard Nyström method, is also presented. Finally, experiments with several datasets containing up to 1M points are reported, demonstrating significant improvements over the standard Nyström approximation.
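To make the construction concrete, below is a minimal sketch of the standard Nyström approximation and of the simple-averaging ensemble variant mentioned in the abstract. It assumes uniform column sampling without replacement and uniform mixture weights; the function names `nystrom` and `ensemble_nystrom`, the eigenvalue cutoff, and the toy RBF kernel are illustrative choices, not the chapter's reference implementation.

```python
import numpy as np

def nystrom(K, l, k, rng):
    """One standard rank-k Nystrom approximation of an n x n PSD matrix K.

    Samples l columns uniformly without replacement; with C the n x l block
    of sampled columns and W its l x l intersection block, returns
    C @ pinv(W_k) @ C.T, where W_k is the best rank-k approximation of W.
    """
    n = K.shape[0]
    idx = rng.choice(n, size=l, replace=False)
    C = K[:, idx]                   # n x l sampled columns
    W = C[idx, :]                   # l x l intersection block
    vals, vecs = np.linalg.eigh(W)  # eigendecomposition (W is symmetric PSD)
    top = np.argsort(vals)[::-1][:k]
    vals_k, vecs_k = vals[top], vecs[:, top]
    inv_k = np.zeros_like(vals_k)   # pseudoinverse of the top-k spectrum,
    pos = vals_k > 1e-12            # dropping numerically zero eigenvalues
    inv_k[pos] = 1.0 / vals_k[pos]
    # (vecs_k * inv_k) @ vecs_k.T == vecs_k @ diag(inv_k) @ vecs_k.T == pinv(W_k)
    return C @ (vecs_k * inv_k) @ vecs_k.T @ C.T

def ensemble_nystrom(K, l, k, p, rng):
    """Simple-averaging ensemble: mean of p independent Nystrom runs."""
    return sum(nystrom(K, l, k, rng) for _ in range(p)) / p

# Toy comparison on a small RBF kernel matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 10))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq_dists / 10.0)
err = lambda A: np.linalg.norm(K - A, "fro")
print(err(nystrom(K, l=50, k=20, rng=rng)))                 # single run
print(err(ensemble_nystrom(K, l=50, k=20, p=10, rng=rng)))  # ensemble
```

Under uniform weights, convexity of the norm guarantees that the ensemble's error is no worse than the average error of its members, which is consistent with the improvements the chapter reports for this and the more refined weighting schemes.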
Notes
1. Similar results (not reported here) were observed for other values of k and l as well.
Copyright information
© 2012 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Kumar, S., Mohri, M., Talwalkar, A. (2012). Ensemble Nyström. In: Zhang, C., Ma, Y. (eds) Ensemble Machine Learning. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-9326-7_7
DOI: https://doi.org/10.1007/978-1-4419-9326-7_7
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-9325-0
Online ISBN: 978-1-4419-9326-7
eBook Packages: Engineering