Abstract
A crucial technique for scaling kernel methods to very large datasets, reaching or exceeding millions of instances, is low-rank approximation of kernel matrices. The Nyström method is a popular technique for generating such approximations, but it requires sampling a large number of columns from the original matrix to achieve good accuracy. This chapter describes a new family of algorithms, Ensemble Nyström algorithms, based on mixtures of Nyström approximations, which yield more accurate low-rank approximations than the standard Nyström method. We give a detailed study of variants of these algorithms based on simple averaging, an exponential weight method, and regression-based methods. A theoretical analysis of these algorithms, including novel error bounds guaranteeing a better convergence rate than the standard Nyström method, is also presented. Finally, experiments with several datasets containing up to 1M points are reported, demonstrating significant improvements over the standard Nyström approximation.
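To make the construction concrete, below is a minimal sketch of the standard Nyström approximation and of the simple-averaging ensemble variant mentioned in the abstract. It assumes uniform column sampling without replacement and uniform mixture weights; the function names `nystrom` and `ensemble_nystrom`, the eigenvalue cutoff, and the toy RBF kernel are illustrative choices, not the chapter's reference implementation.

```python
import numpy as np

def nystrom(K, l, k, rng):
    """One standard rank-k Nystrom approximation of an n x n PSD matrix K.

    Samples l columns uniformly without replacement; with C the n x l block
    of sampled columns and W its l x l intersection block, returns
    C @ pinv(W_k) @ C.T, where W_k is the best rank-k approximation of W.
    """
    n = K.shape[0]
    idx = rng.choice(n, size=l, replace=False)
    C = K[:, idx]                   # n x l sampled columns
    W = C[idx, :]                   # l x l intersection block
    vals, vecs = np.linalg.eigh(W)  # eigendecomposition (W is symmetric PSD)
    top = np.argsort(vals)[::-1][:k]
    vals_k, vecs_k = vals[top], vecs[:, top]
    inv_k = np.zeros_like(vals_k)   # pseudoinverse of the top-k spectrum,
    pos = vals_k > 1e-12            # dropping numerically zero eigenvalues
    inv_k[pos] = 1.0 / vals_k[pos]
    # (vecs_k * inv_k) @ vecs_k.T == vecs_k @ diag(inv_k) @ vecs_k.T == pinv(W_k)
    return C @ (vecs_k * inv_k) @ vecs_k.T @ C.T

def ensemble_nystrom(K, l, k, p, rng):
    """Simple-averaging ensemble: mean of p independent Nystrom runs."""
    return sum(nystrom(K, l, k, rng) for _ in range(p)) / p

# Toy comparison on a small RBF kernel matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 10))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq_dists / 10.0)
err = lambda A: np.linalg.norm(K - A, "fro")
print(err(nystrom(K, l=50, k=20, rng=rng)))                 # single run
print(err(ensemble_nystrom(K, l=50, k=20, p=10, rng=rng)))  # ensemble
```

Under uniform weights, convexity of the norm guarantees that the ensemble's error is no worse than the average error of its members, which is consistent with the improvements the chapter reports for this and the more refined weighting schemes.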
Notes
1. Similar results (not reported here) were observed for other values of k and l as well.
Copyright information
© 2012 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Kumar, S., Mohri, M., Talwalkar, A. (2012). Ensemble Nyström. In: Zhang, C., Ma, Y. (eds) Ensemble Machine Learning. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-9326-7_7
DOI: https://doi.org/10.1007/978-1-4419-9326-7_7
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-9325-0
Online ISBN: 978-1-4419-9326-7
eBook Packages: Engineering