Abstract
Models like support vector machines or Gaussian process regression often require positive semi-definite kernels. These kernels may be based on distance functions. While definiteness has been proven for common distances and kernels, a proof for a new kernel may require more time and effort than users who are mainly interested in practical application can afford. Furthermore, designing definite distances or kernels may be equally intricate. Alternatively, models can be adapted to handle indefinite kernels, but this may degrade the accuracy or increase the computational cost of the model. Hence, an efficient method to determine definiteness is required. We propose an empirical approach and show that both sampling and optimization with an evolutionary algorithm can be employed to determine definiteness. We provide a proof of concept with 16 different distance measures for permutations. Our approach can disprove definiteness if a respective counterexample is found, and it can estimate how likely it is to obtain indefinite kernel matrices. This yields a simple, efficient tool for deciding whether additional effort should be spent on designing or selecting a more suitable kernel or algorithm.
Notes
The package CEGO is available on CRAN at http://cran.r-project.org/package=CEGO.
References
Bader DA, Moret BM, Warnow T, Wyman SK, Yan M, Tang J, Siepel AC, Caprara A (2004) Genome rearrangements analysis under parsimony and other phylogenetic algorithms (GRAPPA) 2.0. https://www.cs.unm.edu/~moret/GRAPPA/. Accessed 16 Nov 2016
Bartz-Beielstein T, Zaefferer M (2017) Model-based methods for continuous and discrete global optimization. Appl Soft Comput 55:154–167
Berg C, Christensen JPR, Ressel P (1984) Harmonic analysis on semigroups, volume 100 of graduate texts in mathematics. Springer, New York
Beume N, Naujoks B, Emmerich M (2007) SMS-EMOA: multiobjective selection based on dominated hypervolume. Eur J Oper Res 181(3):1653–1669
Boytsov L (2011) Indexing methods for approximate dictionary searching: comparative analysis. J Exp Algorithmics 16:1–91
Burges CJ (1998) A tutorial on support vector machines for pattern recognition. Data Min Knowl Discov 2(2):121–167
Camastra F, Vinciarelli A (2008) Machine learning for audio, image and video analysis: theory and applications. Advanced information and knowledge processing. Springer, London
Campos V, Laguna M, Martí R (2005) Context-independent scatter and tabu search for permutation problems. INFORMS J Comput 17(1):111–122
Camps-Valls G, Martín-Guerrero JD, Rojo-Álvarez JL, Soria-Olivas E (2004) Fuzzy sigmoid kernel for support vector classifiers. Neurocomputing 62:501–506
Chen Y, Gupta MR, Recht B (2009) Learning kernels from indefinite similarities. In: Proceedings of the 26th annual international conference on machine learning (ICML ’09), New York, NY, USA. ACM, pp 145–152
Constantine G (1985) Lower bounds on the spectra of symmetric matrices with nonnegative entries. Linear Algebra Appl 65:171–178
Cortes C, Haffner P, Mohri M (2004) Rational kernels: theory and algorithms. J Mach Learn Res 5:1035–1062
Curriero F (2006) On the use of non-Euclidean distance measures in geostatistics. Math Geol 38(8):907–926
Deb K, Pratap A, Agarwal S, Meyarivan T (2002) A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans Evol Comput 6(2):182–197
Deza M, Huang T (1998) Metrics on permutations, a survey. J Comb Inf Syst Sci 23(1–4):173–185
Eiben AE, Smith JE (2003) Introduction to evolutionary computing. Springer, Berlin
Feller W (1971) An introduction to probability theory and its applications, vol 2. Wiley, Hoboken
Forrester A, Sobester A, Keane A (2008) Engineering design via surrogate modelling. Wiley, Hoboken
Gablonsky J, Kelley C (2001) A locally-biased form of the direct algorithm. J Glob Optim 21(1):27–37
Gärtner T, Lloyd J, Flach P (2003) Kernels for structured data. In: Matwin S, Sammut C (eds) Inductive logic programming, vol 2583. Lecture Notes in Computer Science. Springer, Berlin, pp 66–83
Gärtner T, Lloyd J, Flach P (2004) Kernels and distances for structured data. Mach Learn 57(3):205–232
Haussler D (1999) Convolution kernels on discrete structures. Technical report UCSC-CRL-99-10, Department of computer science, University of California at Santa Cruz
Hirschberg DS (1975) A linear space algorithm for computing maximal common subsequences. Commun ACM 18(6):341–343
Hutter F, Hoos HH, Leyton-Brown K (2011) Sequential model-based optimization for general algorithm configuration. In Proceedings of LION-5, pp 507–523
Ikramov K, Savel'eva N (2000) Conditionally definite matrices. J Math Sci 98(1):1–50
Jiao Y, Vert J.-P (2015) The Kendall and Mallows kernels for permutations. In: Proceedings of the 32nd international conference on machine learning (ICML-15), pp 1935–1944
Kendall M, Gibbons J (1990) Rank correlation methods. Oxford University Press, Oxford
Lee C (1958) Some properties of nonbinary error-correcting codes. IRE Trans Inf Theory 4(2):77–82
Li H, Jiang T (2004) A class of edit kernels for SVMs to predict translation initiation sites in eukaryotic mRNAs. In: Proceedings of the eighth annual international conference on research in computational molecular biology (RECOMB '04), New York, NY, USA. ACM, pp 262–271
Loosli G, Canu S, Ong C (2015) Learning SVM in Krein spaces. IEEE Trans Pattern Anal Mach Intell 38(6):1204–1216
Marteau P-F, Gibet S (2014) On recursive edit distance kernels with application to time series classification. IEEE Trans Neural Netw Learn Syst PP(99):1–1
Moraglio A, Kattan A (2011) Geometric generalisation of surrogate model based optimisation to combinatorial spaces. In: Proceedings of the 11th European conference on evolutionary computation in combinatorial optimization (EvoCOP'11), Berlin, Heidelberg, Germany. Springer, pp 142–154
Motwani R, Raghavan P (1995) Randomized algorithms. Cambridge University Press, Cambridge
Murphy KP (2012) Machine learning. MIT Press Ltd., Cambridge
Ong CS, Mary X, Canu S, Smola AJ (2004) Learning with non-positive kernels. In: Proceedings of the twenty-first international conference on machine learning (ICML ’04), New York, NY, USA. ACM, pp 81–88
Pawlik M, Augsten N (2015) Efficient computation of the tree edit distance. ACM Trans Database Syst 40(1):1–40
Pawlik M, Augsten N (2016) Tree edit distance: robust and memory-efficient. Inf Syst 56:157–173
Rasmussen CE, Williams CKI (2006) Gaussian processes for machine learning. The MIT Press, Cambridge
Reeves CR (1999) Landscapes, operators and heuristic search. Ann Oper Res 86:473–490
Schiavinotto T, Stützle T (2007) A review of metrics on permutations for search landscape analysis. Comput Oper Res 34(10):3143–3153
Schleif F-M, Tino P (2015) Indefinite proximity learning: a review. Neural Comput 27(10):2039–2096
Schleif F-M, Tino P (2017) Indefinite core vector machine. Pattern Recognit 71:187–195
Schölkopf B (2001) The kernel trick for distances. In: Leen TK, Dietterich TG, Tresp V (eds) Advances in neural information processing systems, vol 13. MIT Press, Cambridge, pp 301–307
Sevaux M, Sörensen K (2005) Permutation distance measures for memetic algorithms with population management. In: Proceedings of 6th metaheuristics international conference (MIC'05), University of Vienna, pp 832–838
Singhal A (2001) Modern information retrieval: a brief overview. IEEE Bull Data Eng 24(4):35–43
Smola AJ, Ovári ZL, Williamson RC (2000) Regularization with dot-product kernels. In: Advances in neural information processing systems vol 13, Proceedings. MIT Press, pp 308–314
van der Loo MP (2014) The stringdist package for approximate string matching. R J 6(1):111–122
Vapnik VN (1998) Statistical learning theory, vol 1. Wiley, New York
Voutchkov I, Keane A, Bhaskar A, Olsen TM (2005) Weld sequence optimization: the use of surrogate models for solving sequential combinatorial problems. Comput Methods Appl Mech Eng 194(30–33):3535–3551
Wagner RA, Fischer MJ (1974) The string-to-string correction problem. J ACM 21(1):168–173
Wu G, Chang EY, Zhang Z (2005) An analysis of transformation on non-positive semidefinite similarity matrix for kernel machines. In: Proceedings of the 22nd international conference on machine learning
Zaefferer M, Bartz-Beielstein T (2016) Efficient global optimization with indefinite kernels. In: Parallel problem solving from nature-PPSN XIV. Springer, pp 69–79
Zaefferer M, Stork J, Bartz-Beielstein T (2014a) Distance measures for permutations in combinatorial efficient global optimization. In: Bartz-Beielstein T, Branke J, Filipič B, Smith J (eds) Parallel problem solving from nature-PPSN XIII. Springer, Cham, pp 373–383
Zaefferer M, Stork J, Friese M, Fischbach A, Naujoks B, Bartz-Beielstein T (2014b) Efficient global optimization for combinatorial problems. In: Proceedings of the 2014 conference on genetic and evolutionary computation (GECCO ’14), New York, NY, USA. ACM, pp 871–878
Zhan X (2006) Extremal eigenvalues of real symmetric matrices with entries in an interval. SIAM J Matrix Anal Appl 27(3):851–860
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by V. Loia.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Distance measures for permutations
In the following, we describe the distance measures employed in the experiments.
-
The Levenshtein distance is an edit distance measure:
\({d} _{Lev}(\pi ,\pi ') = edits_{\pi \rightarrow \pi '}\)
Here, \(edits_{\pi \rightarrow \pi '}\) is the minimal number of deletions, insertions, or substitutions required to transform one string (or here: permutation) \(\pi \) into another string \(\pi '\). The implementation is based on Wagner and Fischer (1974).
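To make the computation concrete, the Wagner–Fischer dynamic program can be sketched in a few lines. This is a minimal Python illustration operating on arbitrary sequences, not the implementation used in our experiments:

```python
def levenshtein(a, b):
    # Wagner-Fischer dynamic program: after processing row i, prev[j] holds
    # the minimal number of deletions, insertions, or substitutions
    # turning a[:i] into b[:j].
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution
        prev = cur
    return prev[n]
```

Keeping only two rows of the table yields linear space in the length of the second sequence.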
-
Swaps are transpositions of two adjacent elements. The Swap distance [also: Kendall’s Tau (Kendall and Gibbons 1990; Sevaux and Sörensen 2005) or Precedence distance (Schiavinotto and Stützle 2007)] counts the minimum number of swaps required to transform one permutation into another. For permutations, it is (Sevaux and Sörensen 2005):
$$\begin{aligned} {d} _{Swa}(\pi ,\pi ')&= \sum _{i=1}^{m} \sum _{j=1}^{m} z_{ij} ~~ \text {with}\\ z_{ij}&= \left\{ \begin{array}{l l} 1 &{} \quad \text {if } \pi _i < \pi _j ~\text {and}~ \pi '_i > \pi '_j ,\\ 0 &{} \quad \text {otherwise.} \end{array} \right. \end{aligned}$$
-
An interchange operation is the transposition of two arbitrary elements. Respectively, the Interchange (also: Cayley) distance counts the minimum number of interchanges (\(interchanges_{\pi \rightarrow \pi '}\)) required to transform one permutation into another (Schiavinotto and Stützle 2007):
\({d} _{Int}(\pi ,\pi ') = interchanges_{\pi \rightarrow \pi '}\)
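Both distances admit short direct computations. The illustrative Python sketches below (permutations given as sequences of the values 1 to m) count discordant index pairs and permutation cycles, respectively; the cycle-based computation uses the standard fact that the minimum number of transpositions equals m minus the number of cycles of the permutation mapping one permutation onto the other:

```python
def swap_distance(p, q):
    # Kendall's tau: count index pairs whose relative order
    # differs between the two permutations (discordant pairs).
    m = len(p)
    return sum(1 for i in range(m) for j in range(m)
               if p[i] < p[j] and q[i] > q[j])

def interchange_distance(p, q):
    # Cayley distance: m minus the number of cycles of the
    # permutation that maps p onto q.
    m = len(p)
    pos = {v: i for i, v in enumerate(q)}  # position of each value in q
    target = [pos[v] for v in p]           # where each index of p must go
    seen, cycles = [False] * m, 0
    for i in range(m):
        if not seen[i]:
            cycles += 1
            j = i
            while not seen[j]:
                seen[j] = True
                j = target[j]
    return m - cycles
```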
-
The Insert distance is based on the longest common subsequence \(LCSeq(\pi ,\pi ')\), i.e., the largest number of elements that appear in the same order in both permutations, possibly with interruptions. The corresponding distance is
\({d} _{Ins}(\pi ,\pi ') = m-LCSeq(\pi ,\pi ').\)
We use the algorithm described by Hirschberg (1975). The name is due to its interpretation as an edit distance measure. The corresponding edit operation is a combination of insertion and deletion. A single element is moved from one position (delete) to a new position (insert). It is also called Ulam’s distance (Schiavinotto and Stützle 2007).
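As an illustration, the Insert distance can be computed with the standard dynamic program for the longest common subsequence; the Hirschberg (1975) algorithm additionally reduces memory consumption, which this minimal Python sketch omits:

```python
def insert_distance(p, q):
    # d_Ins = m - length of the longest common subsequence of p and q.
    # prev[j] holds LCSeq(p[:i], q[:j]) after processing row i.
    m = len(p)
    prev = [0] * (m + 1)
    for x in p:
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            if x == q[j - 1]:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return m - prev[m]
```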
-
The Longest Common Substring distance is based on the largest number of elements that follow each other in both permutations, without interruption. Unlike the longest common subsequence, the elements have to be adjacent. If \(LCStr(\pi ,\pi ')\) is the length of the longest common substring, the distance is
$$\begin{aligned} {d} _{LCStr}(\pi ,\pi ')= m-LCStr(\pi ,\pi '). \end{aligned}$$
-
The R-distance (Campos et al. 2005; Sevaux and Sörensen 2005) counts the number of times that one element follows another in one permutation, but not in the other. It is identical with the uni-directional adjacency distance (Reeves 1999). It is computed by
$$\begin{aligned} {d} _{R}(\pi ,\pi ')&= \sum _{i=1}^{m-1} y_i ~~ \text {with}\\ y_i&= \left\{ \begin{array}{ll} 0 &{} \quad \text {if }\exists j : \pi _i=\pi '_j ~\text {and}~ \pi _{i+1}=\pi '_{j+1} ,\\ 1 &{} \quad \text {otherwise.} \end{array} \right. \end{aligned}$$
-
The (bi-directional) Adjacency distance (Reeves 1999; Schiavinotto and Stützle 2007) counts the number of times two elements are neighbors in one, but not in the other permutation. Unlike R-distance (uni-directional), the order of the two elements does not matter. It is computed by
$$\begin{aligned} {d} _{Adj}(\pi ,\pi ')&= \sum _{i=1}^{m-1} y_i ~~ \text {with}\\ y_i&= \left\{ \begin{array}{l l} 0 &{} \quad \text {if }\exists j : \pi _i=\pi '_j ~\text {and}~ \pi _{i+1} \in \{\pi '_{j+1}, \pi '_{j-1} \},\\ 1 &{} \quad \text {otherwise.} \end{array} \right. \end{aligned}$$
-
The Position distance (Schiavinotto and Stützle 2007) is identical with the Deviation distance or Spearman’s footrule (Sevaux and Sörensen 2005), \({d} _{\text {Pos}}(\pi ,\pi ') = \sum _{k=1}^{m} |i-j | ~~\text {where}~~\pi _i = \pi '_j = k\) .
-
The non-metric Squared Position distance is Spearman’s rank correlation coefficient (Sevaux and Sörensen 2005). In contrast to the Position distance, the term \(|i-j|\) is replaced by \((i-j)^2\).
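The substring-, adjacency-, and position-based distances above translate directly into code. The following illustrative Python sketches (not the paper's implementation) follow the definitions:

```python
def lcstr_distance(p, q):
    # m minus the length of the longest common contiguous substring.
    m, best = len(p), 0
    prev = [0] * (m + 1)
    for x in p:
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            if x == q[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return m - best

def r_distance(p, q):
    # Uni-directional adjacency: count successions in p that are absent in q.
    follows = {q[j]: q[j + 1] for j in range(len(q) - 1)}
    return sum(1 for i in range(len(p) - 1)
               if follows.get(p[i]) != p[i + 1])

def adjacency_distance(p, q):
    # Bi-directional adjacency: neighborhood in q, order of the pair ignored.
    nbr = {}
    for j in range(len(q) - 1):
        nbr.setdefault(q[j], set()).add(q[j + 1])
        nbr.setdefault(q[j + 1], set()).add(q[j])
    return sum(1 for i in range(len(p) - 1)
               if p[i + 1] not in nbr.get(p[i], set()))

def position_distance(p, q, squared=False):
    # Spearman's footrule; with squared=True, the squared-position variant.
    pos_p = {v: i for i, v in enumerate(p)}
    pos_q = {v: i for i, v in enumerate(q)}
    diffs = (pos_p[k] - pos_q[k] for k in pos_p)
    return sum(d * d if squared else abs(d) for d in diffs)
```

Note how the reversal (4, 3, 2, 1) of (1, 2, 3, 4) has maximal R-distance but zero bi-directional Adjacency distance, since every pair of neighbors stays adjacent.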
-
The Hamming distance or Exact Match distance simply counts the number of unequal elements in two permutations, i.e., \({d} _{Ham}(\pi ,\pi ') = \sum _{i=1}^{m} a_i, ~~\text {where}~~ a_i = \left\{ \begin{array}{l l} 0 &{} \quad \text {if } \pi _i = \pi '_i,\\ 1 &{} \quad \text {otherwise.} \end{array} \right. \)
-
The Euclidean distance is \({d} _{Euc}(\pi ,\pi ') = \sqrt{\sum _{i=1}^{m} (\pi _i-\pi '_i)^2}\) .
-
The Manhattan distance (A-Distance, cf. (Sevaux and Sörensen 2005; Campos et al. 2005)) is \({d} _{Man}(\pi ,\pi ') = \sum _{i=1}^{m} |\pi _i-\pi '_i|\) .
-
The Chebyshev distance is \({d} _{Che}(\pi ,\pi ') = \underset{1 \le i \le m}{\max }(|\pi _i-\pi '_i|)\) .
-
For permutations, the Lee distance (Lee 1958; Deza and Huang 1998) is \({d} _{Lee}(\pi ,\pi ') = \sum _{i=1}^{m} \min (|\pi _i-\pi '_i|,m-|\pi _i-\pi '_i|)\) .
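The element-wise distances (Hamming, Euclidean, Manhattan, Chebyshev, and Lee) follow immediately from their formulas; a minimal Python sketch:

```python
import math

def hamming(p, q):
    # Number of positions with unequal elements.
    return sum(a != b for a, b in zip(p, q))

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def chebyshev(p, q):
    return max(abs(a - b) for a, b in zip(p, q))

def lee(p, q):
    # Element-wise difference, wrapped around modulo m.
    m = len(p)
    return sum(min(abs(a - b), m - abs(a - b)) for a, b in zip(p, q))
```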
-
The non-metric Cosine distance is based on the dot product of two permutations. It is derived from the cosine similarity (Singhal 2001) of two vectors:
$$\begin{aligned} {d} _{Cos}(\pi ,\pi ') = 1 - \frac{\pi \cdot \pi '}{||\pi ||~||\pi '||}. \end{aligned}$$
-
The Lexicographic distance regards the lexicographic ordering of permutations. If the position of a permutation \(\pi \) in the lexicographic ordering of all permutations with fixed m is \(L(\pi )\), then the Lexicographic distance metric is
$$\begin{aligned} {d} _{Lex}(\pi ,\pi ') =| L(\pi ) - L(\pi ')|. \end{aligned}$$
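The rank \(L(\pi )\) can be computed without enumerating all permutations via the Lehmer code (factorial number system); a minimal Python sketch:

```python
def lex_rank(p):
    # Zero-based position of p in the lexicographic ordering of all
    # permutations of its elements, via the Lehmer code: for each index i,
    # count the later elements smaller than p[i] and weight by a factorial.
    m, rank, fact = len(p), 0, 1
    for i in range(m - 1, -1, -1):
        smaller = sum(1 for j in range(i + 1, m) if p[j] < p[i])
        rank += smaller * fact
        fact *= m - i
    return rank

def lex_distance(p, q):
    return abs(lex_rank(p) - lex_rank(q))
```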
Appendix B: Minimal examples for indefinite sets
To showcase the usefulness of the proposed methods, this section lists small example datasets and the respective indefinite distance matrices. Besides the standard permutation distances we also tested:
-
Signed permutations, reversal distance Permutations in which each element has a sign are referred to as signed permutations. An application example is weld path optimization (Voutchkov et al. 2005). The reversal distance counts the number of reversals required to transform one permutation into another. We used the non-cyclic reversal distance provided by the GRAPPA library version 2.0 (Bader et al. 2004).
-
Labeled trees, tree edit distance Trees are widely used as a solution representation, e.g., in Genetic Programming. In this study, we considered labeled trees. The tree edit distance counts the number of node insertions, deletions, or relabelings. We used the efficient implementation in the APTED 0.1.1 library (Pawlik and Augsten 2015, 2016). The labeled trees are denoted in bracket notation: curly brackets indicate the tree structure, letters indicate labels (internal and terminal nodes).
-
Strings, Optimal String Alignment distance (OSA) The OSA is a non-metric edit distance that counts insertions, deletions, substitutions, and transpositions of characters, where each substring may be edited no more than once. It is also called the restricted Damerau–Levenshtein distance (Boytsov 2011). We used the implementation in the stringdist R-package (van der Loo 2014).
-
Strings, Jaro–Winkler distance The Jaro–Winkler distance is based on the number of matching characters in two strings, as well as the number of transpositions required to bring all matches into the same order. We used the implementation in the stringdist R-package (van der Loo 2014).
The respective results are listed in Table 3. All of the listed distance measures are shown to be non-CNSD.
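Probing such example sets amounts to checking whether their distance matrices are conditionally negative semi-definite (CNSD). As an illustrative sketch (using numpy, not the CEGO implementation), such a check can exploit the classical result (cf. Berg et al. 1984) that a symmetric distance matrix \(D\) with zero diagonal is CNSD if and only if the double-centered matrix \(-\frac{1}{2} J D J\), with \(J = I - \frac{1}{n}\mathbf {1}\mathbf {1}^T\), is positive semi-definite:

```python
import numpy as np

def is_cnsd(D, tol=1e-10):
    # D (symmetric, zero diagonal) is CNSD iff -1/2 * J D J is positive
    # semi-definite, where J = I - (1/n) * ones is the centering matrix.
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    K = -0.5 * J @ D @ J
    return np.linalg.eigvalsh(K).min() >= -tol
```

Sampling-based probing then generates many small sets of solutions, computes their distance matrices, and records how often this check fails; a single failure already disproves definiteness of the corresponding distance-based kernel.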
Zaefferer, M., Bartz-Beielstein, T. & Rudolph, G. An empirical approach for probing the definiteness of kernels. Soft Comput 23, 10939–10952 (2019). https://doi.org/10.1007/s00500-018-3648-1