Part of the book series: Springer Handbooks (SHB)

Abstract

This chapter addresses the study of kernel methods, a class of techniques that play a major role in machine learning and nonparametric statistics. Among others, these methods include support vector machines (SVMs) and least squares SVMs, kernel principal component analysis, kernel Fisher discriminant analysis, and Gaussian processes. The use of kernel methods is systematic and properly motivated by statistical principles. In practical applications, kernel methods lead to flexible predictive models that often outperform competing approaches in terms of generalization performance. The core idea consists of mapping data into a high-dimensional space by means of a feature map. Since the feature map is normally chosen to be nonlinear, a linear model in the feature space corresponds to a nonlinear rule in the original domain. This suits many real-world data analysis problems, which often require nonlinear models to describe their structure.
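To make the feature-map idea concrete, the following minimal sketch (our own illustration, not code from the chapter; the Gaussian RBF kernel, the toy data, and all variable names are assumptions chosen for the example) fits a kernel ridge regressor. The predictor is linear in the implicit feature space, yet nonlinear in the input, and only kernel evaluations are ever computed:

    import numpy as np

    def rbf_kernel(X1, X2, sigma=1.0):
        """Gaussian RBF kernel matrix K[i, j] = exp(-||x_i - z_j||^2 / (2 sigma^2))."""
        sq_dists = (np.sum(X1**2, axis=1)[:, None]
                    + np.sum(X2**2, axis=1)[None, :]
                    - 2.0 * X1 @ X2.T)
        return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

    # Toy 1-D regression data with a nonlinear trend.
    rng = np.random.default_rng(0)
    X = rng.uniform(-3.0, 3.0, size=(50, 1))
    y = np.sin(X).ravel() + 0.1 * rng.standard_normal(50)

    # Kernel ridge regression: solve (K + lam I) alpha = y; the predictor
    # f(x) = sum_i alpha_i k(x, x_i) is linear in feature space but
    # nonlinear in the input x.
    lam = 0.1
    K = rbf_kernel(X, X)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

    X_test = np.linspace(-3.0, 3.0, 5)[:, None]
    y_pred = rbf_kernel(X_test, X) @ alpha

In the chapter's terminology, the coefficients alpha play the role of dual variables, and the representer theorem guarantees that the regularized solution admits exactly this kernel expansion over the training points.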

In Sect. 32.1 we present historical notes and summarize the main ingredients of kernel methods. In Sect. 32.2 we introduce the core ideas of statistical learning and show how regularization can be employed to devise practical learning algorithms. In Sect. 32.3 we show a selection of techniques that are representative of a large class of kernel methods; these techniques – termed primal–dual methods – use Lagrange duality as the main mathematical tool. Section 32.4 discusses Gaussian processes, a class of kernel methods that uses a Bayesian approach to perform inference and learning. Section 32.5 reviews different approaches to parameter tuning. In Sect. 32.6 we review the mathematical properties of different yet equivalent notions of kernels and recall a number of specialized kernels for learning problems involving structured data. We conclude the chapter by presenting applications in Sect. 32.7.
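The Bayesian viewpoint of Sect. 32.4 admits an equally compact summary. The sketch below (again our own illustration under standard assumptions: zero-mean Gaussian process prior, RBF covariance, i.i.d. Gaussian noise with known standard deviation) computes the Gaussian process regression posterior mean and pointwise variance via a Cholesky factorization, reusing the rbf_kernel function from the previous sketch:

    import numpy as np

    def rbf_kernel(X1, X2, sigma=1.0):
        sq_dists = (np.sum(X1**2, axis=1)[:, None]
                    + np.sum(X2**2, axis=1)[None, :]
                    - 2.0 * X1 @ X2.T)
        return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

    rng = np.random.default_rng(1)
    noise = 0.1                                    # assumed known noise std
    X = rng.uniform(-3.0, 3.0, size=(30, 1))
    y = np.sin(X).ravel() + noise * rng.standard_normal(30)
    X_star = np.linspace(-3.0, 3.0, 100)[:, None]  # test inputs

    # Posterior of f(X_star) given (X, y):
    #   mean = K_* (K + noise^2 I)^{-1} y
    #   var  = diag(K_**) - diag(K_* (K + noise^2 I)^{-1} K_*^T)
    L = np.linalg.cholesky(rbf_kernel(X, X) + noise**2 * np.eye(len(X)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    K_star = rbf_kernel(X_star, X)
    post_mean = K_star @ alpha
    v = np.linalg.solve(L, K_star.T)
    post_var = np.diag(rbf_kernel(X_star, X_star)) - np.sum(v**2, axis=0)

Note that the posterior mean coincides with the kernel ridge prediction for a matching regularization constant; the Bayesian treatment additionally yields the predictive variance.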


Abbreviations

ERM: empirical risk minimization
GACV: generalized approximate cross-validation
GP: Gaussian process
HS: Hilbert space
i.i.d.: independent and identically distributed
KKT: Karush–Kuhn–Tucker
LASSO: least absolute shrinkage and selection operator
LOO: leave-one-out
LS: least squares
MAP: maximum a posteriori
MEG: magnetoencephalography
MKL: multiple kernel learning
ML: maximum likelihood
MLP: multilayer perceptron
PCA: principal component analysis
QP: quadratic programming
r.k.: reproducing kernel
RBF: radial basis function
RKHS: reproducing kernel Hilbert space
SMO: sequential minimal optimization
SRM: structural risk minimization
SVC: support vector classification
SVD: singular value decomposition
SVM: support vector machine
VC: Vapnik–Chervonenkis


Author information

Correspondence to Marco Signoretto.

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Signoretto, M., Suykens, J.A.K. (2015). Kernel Methods. In: Kacprzyk, J., Pedrycz, W. (eds) Springer Handbook of Computational Intelligence. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43505-2_32


  • DOI: https://doi.org/10.1007/978-3-662-43505-2_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-43504-5

  • Online ISBN: 978-3-662-43505-2

