Abstract
Kernel methods are a broad class of algorithms that find application in approximation theory and non-parametric statistics. In this article, we review the literature with a focus on methods for uncertainty quantification, and we discuss the computational challenges of kernel methods. In particular, we focus on approximating kernel matrices, one of the main computational bottlenecks in kernel methods. The most popular approach for approximating kernel matrices is the Nyström method, which uses randomized sampling to construct a low-rank factorization of a kernel matrix. We present a parallel implementation of the Nyström method using the Elemental parallel linear algebra library and discuss an efficient variant called the one-shot Nyström method. We conclude with examples of regression problems for binary classification in high dimensions that illustrate the capabilities and limitations of Nyström methods. In our largest test, we consider a dataset from high-energy physics in 28 dimensions with ten million points.
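The low-rank factorization mentioned in the abstract can be made concrete with a short sketch. Below is a minimal NumPy implementation of the standard Nyström approximation \(K \approx C W^{+} C^{T}\) built from uniformly sampled columns; it is not the chapter's parallel Elemental implementation or its one-shot variant, and the Gaussian kernel, the sampling scheme, and all names are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    """Pairwise Gaussian kernel values K(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 h^2))."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def nystrom_factors(X, m, bandwidth=1.0, seed=None):
    """Standard Nystrom approximation K ~ C @ pinv(W) @ C.T from m uniformly sampled columns."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=m, replace=False)  # indices of the sampled (landmark) points
    C = gaussian_kernel(X, X[idx], bandwidth)            # n x m block: all points vs. landmarks
    W_pinv = np.linalg.pinv(C[idx, :])                    # pseudoinverse of the m x m landmark block
    return C, W_pinv

# Usage: apply the approximate kernel matrix to a weight vector without forming the n x n matrix.
X = np.random.default_rng(0).standard_normal((2000, 28))  # 2000 points in 28 dimensions
C, W_pinv = nystrom_factors(X, m=200, bandwidth=2.0, seed=1)
w = np.ones(X.shape[0])
y_approx = C @ (W_pinv @ (C.T @ w))                       # ~ K w in O(n m) work and storage
```

The point of the factorization is that the approximate kernel matrix is only ever applied through the thin factors \(C\) and \(W^{+}\), which is what makes the method attractive when the full kernel matrix does not fit in memory.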
Notes
1. In mathematical physics, the kernel is the Green's function of the partial differential equation (PDE) that models the target application, and the weights are the right-hand side of the PDE.
2. Throughout, we refer to a point \(\underline{x}_{i}\) for which we compute \(y_{i}\) as a target, and to a point \(\underline{x}_{j}\) with weight \(w_{j}\) as a source (see the sketch after these notes).
3. ASKIT stands for Approximate Skeletonization Kernel Independent Treecode.
4. For example, the intrinsic dimension of a set of points distributed on a curve in three dimensions is one.
5. We use the term interaction between two points \(\underline{x}_{i}\) and \(\underline{x}_{j}\) to refer to \(K(\underline{x}_{i},\underline{x}_{j})\).
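To make the source/target terminology of these notes concrete, the following is a small sketch of the dense kernel summation \(y_{i} = \sum_{j} K(\underline{x}_{i},\underline{x}_{j})\, w_{j}\) that fast kernel methods approximate; the Gaussian kernel and all names below are assumptions for illustration, not taken from the chapter.

```python
import numpy as np

def kernel_sum(targets, sources, weights, bandwidth=1.0):
    """Dense evaluation of y_i = sum_j K(x_i, x_j) w_j with a Gaussian kernel.
    Cost is O(N_targets * N_sources); this is the operation that fast methods approximate."""
    sq_dists = (
        np.sum(targets**2, axis=1)[:, None]
        + np.sum(sources**2, axis=1)[None, :]
        - 2.0 * targets @ sources.T
    )
    K = np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))  # K[i, j] = K(x_i, x_j)
    return K @ weights

rng = np.random.default_rng(0)
sources = rng.standard_normal((500, 3))   # source points x_j
weights = rng.standard_normal(500)        # source weights w_j
targets = rng.standard_normal((200, 3))   # target points x_i
y = kernel_sum(targets, sources, weights, bandwidth=0.5)  # y[i] = sum_j K(x_i, x_j) w_j
```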
Acknowledgements
This material is based upon work supported by AFOSR grants FA9550-12-10484 and FA9550-11-10339; by NSF grants CCF-1337393 and OCI-1029022; by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under Award Numbers DE-SC0010518, DE-SC0009286, and DE-FG02-08ER2585; by NIH grant 10042242; and by the Technische Universität München—Institute for Advanced Study, funded by the German Excellence Initiative (and the European Union Seventh Framework Programme under grant agreement 291763). Any opinions, findings, and conclusions or recommendations expressed herein are those of the authors and do not necessarily reflect the views of the AFOSR or the NSF. Computing time on the Texas Advanced Computing Center's Stampede system was provided by an allocation from TACC and the NSF.
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this chapter
Tharakan, S., March, W.B., Biros, G. (2015). Scalable Kernel Methods for Uncertainty Quantification. In: Mehl, M., Bischoff, M., Schäfer, M. (eds) Recent Trends in Computational Engineering - CE2014. Lecture Notes in Computational Science and Engineering, vol 105. Springer, Cham. https://doi.org/10.1007/978-3-319-22997-3_1
DOI: https://doi.org/10.1007/978-3-319-22997-3_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22996-6
Online ISBN: 978-3-319-22997-3