Notes
1. If the input dimensionality is higher than 2, the line has to be replaced with a plane or a hyperplane.
2. The number of solutions is (at least) \(\infty ^1\).
3. The signum function \(sgn(u)\) is defined as follows: \(sgn(u)=1\) if \(u>0\); \(sgn(u)=-1\) if \(u<0\); \(sgn(u)=0\) if \(u=0\).
4. This convention is adopted in the rest of the chapter.
5. The term regularization constant is motivated in Sect. 9.3.6.
6. \(\theta (\beta )\) is 1 if \(\beta >0\), 0 otherwise.
7. In [102] the continuity requirement is replaced with stability.
8. \(\delta _{ij}\) is 1 if \(i=j\), 0 otherwise.
9. \(\mathrm{MATLAB}^{\circledR}\) is a registered trademark of The MathWorks, Inc.
References
M. Aizerman, E. Braverman, and L. Rozonoer. Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25:821–837, 1964.
F. R. Bach and M. I. Jordan. Learning spectral clustering. Technical report, EECS Department, University of California, 2003.
A. Barla, E. Franceschi, F. Odone, and F. Verri. Image kernels. In Proceedings of SVM2002, pages 83–96, 2002.
A. Ben-Hur, D. Horn, H.T. Siegelmann, and V. Vapnik. A support vector method for clustering. In Advances in Neural Information Processing Systems, volume 12, pages 125–137, 2000.
A. Ben-Hur, D. Horn, H.T. Siegelmann, and V. Vapnik. Support vector clustering. Journal of Machine Learning Research, 2(2):125–137, 2001.
Y. Bengio, O. Delalleau, N. Le Roux, J.-F. Paiement, P. Vincent, and M. Ouimet. Learning eigenfunctions links spectral embedding and kernel PCA. Neural Computation, 16(10):2197–2219, 2004.
Y. Bengio, P. Vincent, and J.-F. Paiement. Spectral clustering and kernel PCA are learning eigenfunctions. Technical report, CIRANO, 2003.
C. Berg, J.P.R. Christensen, and P. Ressel. Harmonic analysis on semigroups. Springer-Verlag, 1984.
C.M. Bishop. Neural Networks for Pattern Recognition. Cambridge University Press, 1995.
M. Brand and K. Huang. A unifying theorem for spectral embedding and clustering. In Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, 2003.
L. M. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7:200–217, 1967.
F. Camastra and A. Verri. A novel kernel method for clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(5):801–805, 2005.
N. Cancedda, E. Gaussier, C. Goutte, and J.-M. Renders. Word-sequence kernels. Journal of Machine Learning Research, 3(1):1059–1082, 2003.
S. Canu, Y. Grandvalet, V. Guigue, and A. Rakotomamonjy. SVM and kernel methods Matlab toolbox. Technical report, Perception Systemes et Information, INSA de Rouen, 2005.
Y. Censor. Row-action methods for huge and sparse systems and their applications. SIAM Reviews, 23(4):444–467, 1981.
Y. Censor and A. Lent. An iterative row-action method for interval convex programming. Journal of Optimization Theory and Application, 34(3):321–353, 1981.
P.K. Chan, M. Schlag, and J.Y. Zien. Spectral k-way ratio-cut partitioning and clustering. In Proceedings of the 1993 International Symposium on Research on Integrated Systems, pages 123–142. MIT Press, 1993.
J.H. Chiang. A new kernel-based fuzzy clustering approach: support vector clustering with cell growing. IEEE Transactions on Fuzzy Systems, 11(4):518–527, 2003.
F.R.K. Chung. Spectral Graph Theory. American Mathematical Society, 1997.
R. Collobert and S. Bengio. SVMTorch: Support vector machines for large-scale regression problems. Journal of Machine Learning Research, 1(2):143–160, 2001.
R. Collobert, S. Bengio, and J. Mariethoz. Torch: a modular machine learning software library. Technical report, IDIAP, 2002.
C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
N. Cressie. Statistics for Spatial Data. John Wiley, 1993.
N. Cristianini, J. Shawe-Taylor, and J.S. Kandola. Spectral kernel methods for clustering. In Advances in Neural Information Processing Systems 14, pages 649–655. MIT Press, 2001.
A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B, 39(1):1–38, 1977.
I.S. Dhillon, Y. Guan, and B. Kulis. Kernel k-means: spectral clustering and normalized cuts. In Proceedings of the \(10^{th}\) ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 551–556. ACM Press, 2004.
I.S. Dhillon, Y. Guan, and B. Kulis. A unified view of kernel k-means, spectral clustering and graph partitioning. Technical report, UTCS, 2005.
I.S. Dhillon, Y. Guan, and B. Kulis. Weighted graph cuts without eigenvectors: A multilevel approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(11):1944–1957, 2007.
R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. John Wiley, 2001.
T. Evgeniou, M. Pontil, and T. Poggio. Regularization networks and support vector machines. Advances in Computational Mathematics, 13(1):1–50, 2001.
R.-E. Fan, P.-H. Chen, and C.-J. Lin. Working set selection using the second order information for training SVM. Journal of Machine Learning Research, 6:1889–1918, 2005.
P. Fermat. Methodus ad disquirendam maximam et minimam. In Oeuvres de Fermat. MIT Press, 1891 (First Edition 1679).
M. Ferris and T. Munson. Interior point method for massive support vector machines. Technical report, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin, 2000.
M. Ferris and T. Munson. Semi-smooth support vector machines. Technical report, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin, 2000.
M. Fiedler. Algebraic connectivity of graphs. Czechoslovak Math. J., 23(98):298–305, 1973.
M. Filippone, F. Camastra, F. Masulli, and S. Rovetta. A survey of spectral and kernel methods for clustering. Pattern Recognition, 41(1):176–190, 2008.
I. Fischer and I. Poland. New methods for spectral clustering. Technical report, IDSIA, 2004.
R. A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2):179–188, 1936.
J. Friedman. Regularized discriminant analysis. Journal of the American Statistical Association, 84(405):165–175, 1989.
T.T. Friess, N. Cristianini, and C. Campbell. The kernel adatron algorithm: a fast and simple learning procedure for support vector machines. In Proceedings of \(15^{th}\) International Conference on Machine Learning, pages 188–196. Morgan Kaufmann Publishers, 1998.
K. Fukunaga. An Introduction to Statistical Pattern Recognition. Academic Press, 1990.
T. Gärtner, J.W. Lloyd, and P.A. Flach. Kernels and distances for structured data. Machine Learning, 57(3):205–232, 2004.
M. Girolami. Mercer kernel based clustering in feature space. IEEE Transactions on Neural Networks, 13(3):780–784, 2002.
F. Girosi, M. Jones, and T. Poggio. Regularization theory and neural network architectures. Neural Computation, 7(2):219–269, 1995.
G.H. Golub and C.F. Van Loan. Matrix Computations. The Johns Hopkins University Press, 1996.
T. Graepel and K. Obermayer. Fuzzy topographic kernel clustering. In Proceedings of the Fifth GI Workshop Fuzzy Neuro Systems’98, pages 90–97, 1998.
J. Hadamard. Sur les problèmes aux dérivées partielles et leur signification physique. Bull. Univ. Princeton, 13:49–52, 1902.
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer-Verlag, 2001.
R. Herbrich. Learning Kernel Classifiers: Theory and Algorithms. MIT Press, 2004.
R. Inokuchi and S. Miyamoto. LVQ clustering and SOM using a kernel function. In Proceedings of IEEE International Conference on Fuzzy Systems, pages 367–373, 2004.
T. Joachims. Making large-scale SVM learning practical. In Advances in Kernel Methods, pages 169–184. MIT Press, 1999.
T. Joachims, N. Cristianini, and J. Shawe-Taylor. Composite kernels for hypertext classification. In Proceedings of the \(18^{th}\) International Conference on Machine Learning, pages 250–257. IEEE Press, 2001.
R. Kannan, S. Vempala, and A. Vetta. On clusterings: Good, bad and spectral. In Proceedings of the 41\(^{st}\) Annual Symposium on the Foundations of Computer Science, pages 367–380. IEEE Press, 2000.
A. Karatzoglou, A. Smola, K. Hornik, and A. Zeileis. kernlab – An S4 package for kernel methods in R. Journal of Statistical Software, 11(9):1–20, 2004.
S. Keerthi, S. Shevade, C. Bhattacharyya, and K. Murthy. Improvements to Platt's SMO algorithm for SVM classifier design. Technical report, Department of CSA, Bangalore, India, 1999.
S. Keerthi, S. Shevade, C. Bhattacharyya, and K. Murthy. A fast iterative nearest point algorithm for support vector machine design. IEEE Transactions on Neural Networks, 11(1):124–136, 2000.
B.W. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs. Bell System Technical Journal, 49(1):291–307, 1970.
G.A. Korn and T.M. Korn. Mathematical Handbook for Scientists and Engineers. Mc Graw-Hill, 1968.
R. Krishnapuram and J.M. Keller. A possibilistic approach to clustering. IEEE Transactions on Fuzzy Systems, 1(2):98–110, 1993.
R. Krishnapuram and J.M. Keller. The possibilistic c-means algorithm: insights and recommendations. IEEE Transactions on Fuzzy Systems, 4(3):385–393, 1996.
H.W. Kuhn and A.W. Tucker. Nonlinear programming. In Proceedings of \(2^{nd}\) Berkeley Symposium on Mathematical Statistics and Probabilistics, pages 367–380. University of California Press, 1951.
J.-L. Lagrange. Mécanique analytique. Chez La Veuve Desaint Libraire, 1788.
D. Lee. An improved cluster labeling method for support vector clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(3):461–464, 2005.
C. Leslie, E. Eskin, A. Cohen, J. Weston, and A. Noble. Mismatch string kernels for discriminative protein classification. Bioinformatics, 20(4):467–476, 2004.
D. Luenberger. Linear and Nonlinear Programming. Addison-Wesley, 1984.
D. Macdonald and C. Fyfe. The kernel self-organizing map. In Fourth International Conference on Knowledge-based Intelligent Engineering Systems and Allied Technologies, pages 317–320, 2000.
D.J.C. MacKay. A practical Bayesian framework for backpropagation networks. Neural Computation, 4(3):448–472, 1992.
O.L. Mangasarian. Linear and non-linear separation of patterns by linear programming. Operations Research, 13(3):444–452, 1965.
O.L. Mangasarian and D. Musicant. Lagrangian support vector regression. Technical report, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin, June 2000.
G. Matheron. Principles of geostatistics. Economic Geology, 58:1246–1266, 1963.
M. Meila and J. Shi. Spectral methods for clustering. In Advances in Neural Information Processing Systems 12, pages 873–879. MIT Press, 2000.
S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K.R. Müller. Fisher discriminant analysis with kernels. In Proceedings of IEEE Neural Networks for Signal Processing Workshop, pages 41–48. IEEE Press, 2001.
M.L. Minsky and S.A. Papert. Perceptrons. MIT Press, 1969.
J. Moody and C. Darken. Fast learning in networks of locally-tuned processing units. Neural Computation, 1(2):281–294, 1989.
R. Neal. Bayesian Learning in Neural Networks. Springer-Verlag, 1996.
A.Y. Ng, M.I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems 14, pages 849–856. MIT Press, 2002.
E. Osuna, R. Freund, and F. Girosi. An improved training algorithm for support vector machines. In Neural Networks for Signal Processing VII, Proceedings of the 1997 IEEE Workshop, pages 276–285. IEEE Press, 1997.
E. Osuna and F. Girosi. Reducing the run-time complexity in support vector machines. In Advances in Kernel Methods, pages 271–284. MIT Press, 1999.
A. Paccanaro, C. Chennubhotla, J.A. Casbon, and M.A.S. Saqi. Spectral clustering of protein sequences. In Proceedings of International Joint Conference on Neural Networks, pages 3083–3088. IEEE Press, 2003.
J.C. Platt. Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods, pages 185–208. MIT Press, 1999.
J.C. Platt, N. Cristianini, and J. Shawe-Taylor. Large margin DAGs for multiclass classification. In Advances in Neural Information Processing Systems 12, pages 547–553. MIT Press, 2000.
T. Poggio and F. Girosi. Networks for approximation and learning. Proceedings of the IEEE, 78(9):1481–1497, 1990.
M. Pontil and A. Verri. Support vector machines for 3-d object recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(6):637–646, 1998.
M.J.D. Powell. Radial basis functions for multivariable interpolation: A review. In Algorithms for Approximation, pages 143–167. Clarendon Press, 1987.
A.K. Qin and P.N. Suganthan. Kernel neural gas algorithms with application to cluster analysis. In Proceedings of the 17th International Conference on Pattern Recognition, ICPR 2004, pages 617–620. IEEE, 2004.
C.E. Rasmussen and C.K.I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
K. Rose. Deterministic annealing for clustering, compression, classification, regression, and related optimization problems. Proceedings of the IEEE, 86(11):2210–2239, 1998.
R. Rosipal and M. Girolami. An expectation maximization approach to nonlinear component analysis. Neural Computation, 13(3):505–510, 2001.
V. Roth, J. Laub, M. Kawanabe, and J.M. Buhmann. Optimal cluster preserving embedding of nonmetric proximity data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(12):1540–1551, 2003.
B. Schölkopf and A.J. Smola. Learning with Kernels. MIT Press, 2002.
B. Schölkopf, A.J. Smola, and K.R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299–1319, 1998.
B. Schölkopf, A.J. Smola, and K.R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Technical report, Max Planck Institut für Biologische Kybernetik, 1998.
B. Schölkopf, R.C. Williamson, A.J. Smola, J. Shawe-Taylor, and J. Platt. Support vector method for novelty detection. In Advances in Neural Information Processing Systems 12, pages 526–532. MIT Press, 2000.
J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.
D.M.J. Tax and R.P.W. Duin. Support vector domain description. Pattern Recognition Letters, 20(11–13):1191–1199, 1999.
A.N. Tikhonov. On solving ill-posed problem and method of regularization. Dokl. Acad. Nauk USSR, 153:501–504, 1963.
A.N. Tikhonov and V.Y. Arsenin. Solutions of Ill-Posed Problems. W.H. Winston, 1977.
I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support vector machine learning for interdependent and structured output spaces. In Proceedings of ICML04, 2004.
C.J. Twining and C.J. Taylor. The use of kernel principal component analysis to model data distributions. Pattern Recognition, 36(1):217–227, 2003.
V.N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, 1995.
V.N. Vapnik. Statistical Learning Theory. John Wiley, 1998.
V.N. Vapnik and A.Ya. Chervonenkis. A note on one class of perceptrons. Automation and Remote Control, 25:103–109, 1964.
V.N. Vapnik and A. Lerner. Pattern recognition using generalized portrait method. Automation and Remote Control, 24:774–780, 1963.
S. Vishwanathan and A.J. Smola. Fast kernels for string and tree matching. In Advances in Neural Information Processing Systems 15, pages 569–576. MIT Press, 2003.
U. von Luxburg, M. Belkin, and O. Bousquet. Consistency of spectral clustering. Technical report, Max Planck Institut für Biologische Kybernetik, 2004.
U. von Luxburg, M. Belkin, and O. Bousquet. Limits of spectral clustering. In Advances in Neural Information Processing Systems 17. MIT Press, 2005.
D. Wagner and F. Wagner. Between min cut and graph bisection. In Proceedings of the International Symposium on Mathematical Foundations of Computer Science, pages 744–750, 1993.
G. Wahba. Spline Models for Observational Data. SIAM, 1990.
J. Weston, A. Gammerman, M. Stitson, V. Vapnik, V. Vovk, and C. Watkins. Support vector density estimation. In Advances in Kernel Methods, pages 293–306. MIT Press, 1999.
J. Weston and C. Watkins. Multi-class support vector machines. In Proceedings of ESANN99, pages 219–224. D. Facto Press, 1999.
C.K.I. Williams and D. Barber. Bayesian classification with Gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(12):1342–1351, 1998.
W.H. Wolberg and O. Mangasarian. Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proceedings of the National Academy of Sciences, U.S.A., 87:9193–9196, 1990.
Z.D. Wu, W.X. Xie, and J.P. Yu. Fuzzy c-means clustering algorithm based on kernel method. In Proceedings of the Fifth International Conference on Computational Intelligence and Multimedia Applications, ICCIMA 2003, pages 49–54. IEEE, 2003.
J. Yang, V. Estivill-Castro, and S.K. Chalup. Support vector clustering through proximity graph modelling. In Neural Information Processing 2002, ICONIP'02, pages 898–903, 2002.
S.X. Yu and J. Shi. Multiclass spectral clustering. In ICCV’03: Proceedings of the Ninth IEEE Conference on Computer Vision. IEEE Computer Society, 2003.
D.-Q. Zhang and S.-C. Chen. Fuzzy clustering using kernel method. In The 2002 International Conference on Control and Automation, pages 162–163, 2002.
D.-Q. Zhang and S.-C. Chen. Kernel based fuzzy and possibilistic c-means clustering. In Proceedings of the Fifth International Conference on Artificial Neural Networks, ICANN 2003, pages 122–125, 2003.
D.-Q. Zhang and S.-C. Chen. A novel kernelized fuzzy c-means algorithms with applications in image segmentation. Artificial Intelligence in Medicine, 32(1):37–50, 2004.
Problems
9.1
Consider the function \(K: X \times X \rightarrow \mathbb {R}\), where \(X \subseteq \mathbb {R}^n\). Prove that if \(K(\mathbf {x}, \mathbf {y}) = \varPhi ( \mathbf {x}) \cdot \varPhi (\mathbf {y})\) then \(K(\cdot )\) is a Mercer kernel.
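A sketch of the central step, in the finite-sample form of Mercer's condition (symmetry plus positive semidefiniteness of every Gram matrix): for any \(\mathbf{x}_1, \ldots, \mathbf{x}_m \in X\) and any \(\mathbf{c} \in \mathbb{R}^m\),

```latex
\begin{aligned}
\sum_{i=1}^{m}\sum_{j=1}^{m} c_i c_j K(\mathbf{x}_i, \mathbf{x}_j)
  &= \sum_{i=1}^{m}\sum_{j=1}^{m} c_i c_j \,\varPhi(\mathbf{x}_i)\cdot\varPhi(\mathbf{x}_j) \\
  &= \Bigl\Vert \sum_{i=1}^{m} c_i \,\varPhi(\mathbf{x}_i) \Bigr\Vert^{2} \;\ge\; 0 ,
\end{aligned}
```

while the symmetry of \(K\) follows immediately from the symmetry of the inner product.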
9.2
Prove that the Cauchy kernel \(C(\mathbf {x}, \mathbf {y})= \alpha (1 + \Vert \mathbf {x}- \mathbf {y}\Vert ^2)^{-1}\) is positive definite for \(\alpha > 0\). (Hint: Read Appendix D).
9.3
Prove that the Epanechnikov kernel, defined by \(K(\mathbf {x}, \mathbf {y}) = \frac{3}{4}\,(1 - \Vert \mathbf {x}- \mathbf {y}\Vert ^2)\) for \(\Vert \mathbf {x}- \mathbf {y}\Vert \le 1\) and 0 otherwise,
is conditionally positive definite. (Hint: Read Appendix D).
9.4
Prove that the optimal hyperplane is unique.
9.5
Consider the SMO algorithm for classification. What is the minimum number of Lagrange multipliers which can be optimized in an iteration? Explain your answer.
9.6
Consider the SMO algorithm for classification. Show that in the case of unconstrained maximum we obtain the following updating rule
\(\alpha _2^{new} = \alpha _2 + \frac{y_2 (E_1 - E_2)}{\eta }\),
where \(\eta = K(\mathbf {x}_1, \mathbf {x}_1) + K(\mathbf {x}_2, \mathbf {x}_2) - 2 K(\mathbf {x}_1, \mathbf {x}_2)\) and \(E_i = f(\mathbf {x}_i) - y_i\).
9.7
Consider the data set A of the Santa Fe time series competition. Using a public domain SVM regression package and the four preceding values of the time series as input, predict the actual value of the time series. Data set A can be downloaded from http://www-psych.stanford.edu/~andreas/Time-Series/SantaFe.html. Implement a Gaussian process for regression and repeat the exercise replacing the SVM with the Gaussian process. Discuss the results.
9.8
Using the o-v-r method and a public domain SVM binary classifier (e.g., SVMLight or SVMTorch), test a multiclass SVM on the Iris data [38], which can be downloaded from ftp.ics.uci.edu/pub/machine-learning-databases/iris. Repeat the same experiment replacing the o-v-r method with the o-v-o strategy. Discuss the results.
9.9
Implement kernel PCA and test it on a dataset (e.g., the Iris data). Use the Gaussian as Mercer kernel and verify Twining and Taylor's result [100], that is, that for large values of the variance the kernel PCA eigenspectrum tends to the PCA eigenspectrum.
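A minimal NumPy sketch of kernel PCA with a Gaussian kernel (the function name and parameters are illustrative, not from the chapter): build the kernel matrix, center it in feature space, and take the leading eigenpairs of the centered matrix.

```python
import numpy as np

def kernel_pca(X, sigma, n_components=2):
    """Sketch of kernel PCA with a Gaussian (RBF) Mercer kernel."""
    n = X.shape[0]
    # Gaussian kernel matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2.0 * sigma ** 2))
    # Center the kernel matrix in feature space: Kc = J K J with J = I - 1/n
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J
    # Eigendecomposition (eigh returns eigenvalues in ascending order)
    vals, vecs = np.linalg.eigh(Kc)
    vals, vecs = vals[::-1], vecs[:, ::-1]
    # Eigenvalues scaled by 1/n play the role of variances along components
    return vals[:n_components] / n, vecs[:, :n_components]
```

For large \(\sigma\) the Gaussian kernel behaves like \(1 - \Vert \mathbf{x}-\mathbf{y}\Vert^2/(2\sigma^2)\), so the centered kernel matrix reduces to the centered linear Gram matrix scaled by \(1/\sigma^2\) and the eigenvalue ratios converge to those of ordinary PCA, which is the behaviour the exercise asks to verify.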
9.10
Consider the one-class SVM. Prove that there are no bounded support vectors when the regularization constant \(C\) is equal to 1.
9.11
Implement kernel K-Means and test your implementation on a dataset (e.g., the Iris data). Verify that when you choose the inner product as Mercer kernel you obtain the same results as batch K-Means.
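One possible sketch of kernel K-Means over a precomputed Mercer kernel matrix (names and the random initialization are illustrative choices): squared distances to cluster means in feature space are expanded purely in terms of kernel evaluations, so with the linear kernel \(K = XX^T\) the assignments coincide with batch K-Means.

```python
import numpy as np

def kernel_distances(K, labels, k):
    """Squared feature-space distances ||phi(x_i) - m_c||^2 computed from K alone:
    K_ii - (2/|c|) sum_{j in c} K_ij + (1/|c|^2) sum_{j,l in c} K_jl."""
    n = K.shape[0]
    dist = np.full((n, k), np.inf)  # empty clusters keep infinite distance
    for c in range(k):
        mask = labels == c
        nc = mask.sum()
        if nc == 0:
            continue
        dist[:, c] = (np.diag(K)
                      - 2.0 * K[:, mask].sum(1) / nc
                      + K[np.ix_(mask, mask)].sum() / nc ** 2)
    return dist

def kernel_kmeans(K, k, n_iter=100, seed=0):
    """Kernel K-Means: random initial labels, then nearest-mean reassignment."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=K.shape[0])
    for _ in range(n_iter):
        new = kernel_distances(K, labels, k).argmin(1)
        if np.array_equal(new, labels):
            break
        labels = new
    return labels
```

With the linear kernel the distance expansion above is exactly the squared Euclidean distance to each cluster centroid, which is the equivalence the exercise asks to verify.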
9.12
Implement the Ng-Jordan algorithm using a mathematical toolbox. Test your implementation on the Iris data. Compare your results with those reported in [12].
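A compact NumPy sketch of the Ng-Jordan(-Weiss) procedure, assuming the standard NIPS 2002 formulation (the function name and the farthest-point K-means initialization are illustrative choices): Gaussian affinity with zero diagonal, symmetric normalization, row-normalized top-\(k\) eigenvectors, then K-means on the embedded rows.

```python
import numpy as np

def ng_jordan(X, sigma, k, n_iter=50):
    """Sketch of Ng-Jordan-Weiss spectral clustering."""
    # 1. Gaussian affinity matrix with zero diagonal
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    A = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # 2. Symmetric normalization L = D^{-1/2} A D^{-1/2}
    dinv = 1.0 / np.sqrt(A.sum(1))
    L = dinv[:, None] * A * dinv[None, :]
    # 3. Top-k eigenvectors (eigh: ascending order), rows scaled to unit length
    _, vecs = np.linalg.eigh(L)
    Y = vecs[:, -k:]
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    # 4. K-means on the embedded rows; farthest-point init (a design choice)
    centers = np.empty((k, Y.shape[1]))
    centers[0] = Y[0]
    for c in range(1, k):
        d2 = ((Y[:, None, :] - centers[None, :c, :]) ** 2).sum(-1).min(1)
        centers[c] = Y[d2.argmax()]
    for _ in range(n_iter):
        labels = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = Y[labels == c].mean(0)
    return labels
```

On well-separated groups the normalized affinity matrix is nearly block diagonal, the embedded rows collapse to \(k\) tight point masses on the unit sphere, and the final K-means step becomes trivial.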
Copyright information
© 2015 Springer-Verlag London
Cite this chapter
Camastra, F., Vinciarelli, A. (2015). Kernel Methods. In: Machine Learning for Audio, Image and Video Analysis. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/978-1-4471-6735-8_9
Print ISBN: 978-1-4471-6734-1
Online ISBN: 978-1-4471-6735-8