Abstract
We present a theoretical framework for reproducing kernel-based reconstruction methods in certain generalized Besov spaces based on positive, essentially self-adjoint operators. An explicit representation of the reproducing kernel is given in terms of an infinite series. We provide stability estimates for the kernel, including inverse Bernstein-type estimates for kernel-based trial spaces, and we give condition estimates for the interpolation matrix. Then, a deterministic error analysis for regularized reconstruction schemes is presented by means of sampling inequalities. In particular, we provide error bounds for a regularized reconstruction scheme based on a numerically feasible approximation of the kernel. This allows us to derive explicit coupling relations between the series truncation, the regularization parameters and the data set.
1 Introduction
In this article, we develop an analysis for numerically feasible reproducing kernel-based reconstruction methods in the general setting of metric measure spaces with heat kernel-induced geometries, see [10, 18]. Such spaces are of practical importance in various machine learning applications. There, high-dimensional reconstruction problems in the Euclidean setting suffer from the so-called curse of dimensionality, see [15, 27]. One reason why reconstruction is nevertheless sometimes feasible is that the underlying metric structure of the data is highly non-Euclidean. In recent years, there has been considerable interest in exploiting this fact, see, for instance, [17, 63]. Moreover, practical data analysis and machine learning tasks are usually formulated in the setting of probability spaces, i.e., measure spaces, see, for instance, [54]. A further analytical challenge for the numerical analysis of machine learning problems is the fact that the underlying domain containing the data might not be an easily recognizable (sub-)manifold of the ambient Euclidean space but might rather have a discrete structure such as a graph or a tree, see [5, 6], and see [14] for reproducing kernels on Riemannian manifolds. The advantage of the general framework presented here is that both continuous Riemannian manifolds and discrete structures can be treated simultaneously. One focus of recent research in this direction has been on diffusion polynomials, see, for instance, [31]. Moreover, approximation problems in such spaces have also gained attention. Here, to the best of our knowledge, most works have been devoted to wavelet-type approximation schemes, see [34], and also [39] for the sphere and [7] for the torus.
Furthermore, many successful algorithms in machine learning make use of a reproducing kernel Hilbert space structure, see [52]. A practical benefit of these methods is that they rest on an energy optimization principle whose solution, though nominally the solution of an infinite-dimensional optimization problem, can be expressed as a finite linear combination of kernel translates. The weights can usually be obtained by solving a finite-dimensional optimization problem whose dimension is linear in the number of data points. Therefore, it is desirable to have reproducing kernels available in this general framework, see also [54].
To this end, generalized Besov spaces \(\mathcal {B}_{p,q}^{\sigma }(M;\mathcal {D})\) were introduced in [10], which are based on an essentially self-adjoint operator \(\mathcal {D}\) and its associated heat kernel defined on a rather general metric measure space \((M,\rho ,\mu )\). This setting covers in particular uniformly elliptic operators in divergence form, Laplace–Beltrami operators on Riemannian manifolds with nonnegative Ricci curvature, heat kernels generated by Jacobi operators, but also Schrödinger-type operators, see [20], and graph Laplacians if the graph satisfies a relative Faber–Krahn inequality, see [9] and also [25]. The embedding properties derived in [10] show that the generalized Besov spaces \(\mathcal {B}_{p,q}^{\sigma }(M;\mathcal {D})\) are indeed reproducing kernel Hilbert spaces for \(p=q=2\) and \(\sigma >0\) large enough. For those spaces, we derive an explicit multiscale representation of the associated reproducing kernel, see Theorem 2, i.e.,
$$\begin{aligned} K^{(\sigma )}(x,y)=\sum _{\ell =0}^{\infty }{{{\varvec{{S}}}^{(\ell ;\sigma )}}}(x,y), \end{aligned}$$(1)
where \(b>1\) is a fixed parameter, \(\sigma >0\) determines the smoothness, and \({{{\varvec{{S}}}^{(\ell ;\sigma )}}}\) are integral kernels corresponding to operators defined via smooth functional calculus in terms of the operator \(\mathcal {D}\), see Sects. 2 and 3 for the precise definitions. We show how the analysis from [19, 42, 43] for kernel-based approximation with kernels given in series form can be extended to this setting, and we show improved results using properties of kernels of the specific multiscale form (1).
Then, our main focus lies on the error analysis of reconstruction processes of the following form. Given data \(y_n=f(x_n)\), \(x_n\in X_N \subset M\), generated by an unknown function \(f\in \mathcal {B}_{2,2}^\sigma (M;\mathcal {D})\), we consider minimizers of the regression functional
$$\begin{aligned} J_{\mathbf {y};\alpha ;X_{N}}(f):=\sum _{n=1}^{N}\left( f(x_n)-y_n\right) ^{2}+\alpha \Vert f\Vert _{\mathcal {B}_{2,2}^\sigma (M;\mathcal {D})}^{2} \end{aligned}$$(2)
with a regularization parameter \(\alpha >0\). It is well known that the minimizer lies in
$$\begin{aligned} \mathcal {L}_{X_N}:={{\mathrm{span}}}\left\{ K^{(\sigma )}\left( x_n,\cdot \right) \ :\ x_n\in X_N\right\} . \end{aligned}$$(3)
The main technical difficulty here is that the kernel \(K^{\left( \sigma \right) }\) of (3) is usually not available in closed form, and hence, further numerical approximations have to be incorporated to obtain a numerically feasible scheme. In particular, we make use of the multiscale representation (1) and address both a careful truncation of the infinite series and the numerical errors in the approximate computation of the spectral projections. Thus, we first consider in the spirit of [19] the error made when, instead of \(\mathcal {L}_{X_N}\) from (3), one uses
$$\begin{aligned} \mathcal {L}^{L}_{X_{N}}:={{\mathrm{span}}}\left\{ K^{(\sigma ,L)}\left( x_n,\cdot \right) \ :\ x_n\in X_N\right\} ,\qquad K^{(\sigma ,L)}:=\sum _{\ell =0}^{L}{{{\varvec{{S}}}^{(\ell ;\sigma )}}}. \end{aligned}$$(4)
In view of the Mairhuber–Curtis theorem [12, 32], the truncation parameter L has to be coupled to the point set \(X_{N}\) in order to maintain unisolvency. To this end, we derive an explicit lower bound on L depending on \(X_{N}\), which guarantees the existence of a quasi-optimal interpolant from \(\mathcal {L}^L_{X_{N}}\) at \(X_{N}\) for arbitrary data, see [19]. The lower bound takes the form \(L\ge \ln (cq_{X_N}^{-1})\) with the separation distance \(q_{X_{N}}:=\inf _{x_n\ne x_m\in X_N}\rho (x_n,x_m)\) and a generic constant \(c>0\). Furthermore, in practical applications the mere truncation (4) of the kernel is not yet sufficient since, in general, the eigenvalues and eigenfunctions of \(\mathcal {D}\) which enter the definition of \({{{\varvec{{S}}}^{(\ell ;\sigma )}}}\) are not analytically available and must be approximated properly. Therefore, we also take into account the numerical error made by approximating these eigenfunctions with prescribed accuracy.
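The coupling between the point set and the truncation index can be mimicked numerically as follows; the base-b logarithm and the constants \(c=1\), \(b=2\) below are illustrative choices, not the constants of the paper.

```python
import numpy as np

def separation_distance(X):
    """q_X = inf over distinct pairs x_n != x_m of rho(x_n, x_m);
    here M is a subset of R with rho = |.|, so sorting suffices."""
    X = sorted(X)
    return min(b - a for a, b in zip(X, X[1:]))

def truncation_level(q, c=1.0, b=2.0):
    """Smallest integer L with L >= log_b(c / q): the finer the point set
    (smaller q), the more terms of the kernel series must be kept."""
    return max(0, int(np.ceil(np.log(c / q) / np.log(b))))

X = [0.0, 0.25, 0.5, 0.75, 1.0]
q = separation_distance(X)     # 0.25 for this equispaced set
L = truncation_level(q)
print(q, L)
```

Halving the separation distance increases the required truncation level only additively, which is what makes the series truncation practical.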
Subsequently, the discretization error is addressed by means of sampling inequalities for functions from generalized Besov spaces \(\mathcal {B}_{p,q}^{\sigma }(M;\mathcal {D})\). Such sampling inequalities have been studied in recent years for classical Sobolev spaces, see, e.g., [2,3,4, 16, 19, 30, 41, 44, 47, 49, 57, 61], and have proven to be useful for the error analysis of stable and consistent approximation schemes, including spline smoothing and support vector regression algorithms, see [47, 48, 61]. A typical example takes the form
$$\begin{aligned} \Vert f\Vert _{L^{\infty }\left( M;d\mu \right) }\le C\left( h_{X_N,M}^{\sigma -d/p}\Vert f\Vert _{\mathcal {B}_{p,q}^{\sigma }(M;\mathcal {D})}+\max _{x_n\in X_N}\left| f(x_n)\right| \right) , \end{aligned}$$(5)
where \(\sigma >d/p\), and \(X_N=\{x_1,\dots ,x_N\}\subset M\) is a discrete point set with sufficiently small fill distance \(h_{X_N,M}:=\sup _{z\in M}\inf _{x_n\in X_N}\rho (z,x_n)\le h_0\), compare also Theorem 6. Here, d stands for a parameter of the metric measure space M which generalizes the notion of dimension, see Sect. 2. Such inequalities can be used to obtain approximation error bounds for a large class of reconstruction methods in the following way, see [47] and the references given there: Suppose that a reconstruction process assigns to any function f sampled at the discrete points \(X_N\) an approximation \(\mathcal {R}f\) for which a stability property \(\Vert f-\mathcal {R}f\Vert _{\mathcal {B}_{p,q}^\sigma (M;\mathcal {D}) }\le C\Vert f\Vert _{\mathcal {B}_{p,q}^\sigma (M;\mathcal {D})}\) and a consistency property \(\max _{x_n\in X_{N}}|f(x_n)-\mathcal {R}f(x_n)|\le G(f)\) hold. Then, by (5) applied to the residual \(f-\mathcal {R}f\), we obtain the error estimate
$$\begin{aligned} \Vert f-\mathcal {R}f\Vert _{L^{\infty }\left( M;d\mu \right) }\le C\left( h_{X_N,M}^{\sigma -d/p}\Vert f\Vert _{\mathcal {B}_{p,q}^{\sigma }(M;\mathcal {D})}+G(f)\right) . \end{aligned}$$
Thus, applied to minimizers of a numerically feasible approximation of \( J_{\mathbf {y};\alpha ;X_{N}}\) as defined in (2), we obtain an upper bound on the approximation error which is explicit in the problem parameters and hence suggests coupling conditions on the parameters to ensure asymptotic convergence.
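For intuition, the geometric quantity entering such bounds is cheap to compute on a toy domain; discretizing the supremum over M by a fine grid is an illustrative shortcut, and \(M=[0,1]\) is our stand-in domain.

```python
import numpy as np

def fill_distance(X, M_grid):
    """h_{X,M} = sup_{z in M} min_{x_n in X} rho(z, x_n), with the
    supremum approximated over a fine grid discretizing M."""
    X = np.asarray(X)
    return float(max(np.min(np.abs(z - X)) for z in M_grid))

X = [0.0, 0.25, 0.5, 0.75, 1.0]
h = fill_distance(X, np.linspace(0.0, 1.0, 1001))
print(h)   # about 0.125: the gap midpoints are farthest from X
```

In the sampling-inequality machinery, driving this quantity to zero is what makes the \(h_{X_N,M}^{\sigma -d/p}\)-term, and hence the reconstruction error, decay.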
As a by-product of our analysis, we derive two important stability properties, namely inverse Bernstein-type inequalities for \(\mathcal {L}_{X_N}\) and lower bounds on the smallest eigenvalues \(\lambda _{\min }\) of interpolation matrices based on \(K^{(\sigma )}\). Precisely, we show (see Proposition 7) that for all point sets \(X_N\) with sufficiently small separation distance \(q_{X_N}\), it holds
$$\begin{aligned} \Vert g\Vert _{\mathcal {B}^{\sigma }_{2,2}(M;\mathcal {D})}\le Cq_{X_N}^{-\sigma }\Vert g\Vert _{L^{2}\left( M;d\mu \right) }\quad \text {for all }g\in \mathcal {L}_{X_N}, \end{aligned}$$(6)
and (see Proposition 8)
where \(K^{\left( \sigma \right) }_{X_{N},X_{N}}:=\left( K^{\left( \sigma \right) }(x_n,x_m)\right) _{x_n,x_m\in X_N}\) denotes the Gramian matrix. Proving Bernstein estimates of the form (6) for kernel-based trial spaces has been an active area of research in recent years (see, for example, [22, 33, 35, 37, 38, 46, 50, 58]). If combined with sampling inequalities of the form (5), such inverse estimates can be used to prove stability of reconstruction schemes (see [47]). We point out that we obtain inverse estimates for a variety of domains, including domains with boundaries. In our case, the kernel depends explicitly on the domain, whereas often the kernel is defined on the whole space and inverse estimates are proven for subdomains.
On the other hand, eigenvalue estimates of the form (7), see e.g. [8, 60] for related results, can be used to get upper and lower bounds on the covering numbers which are frequently used in learning theory [11, 64].
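The qualitative content of the eigenvalue bound (7), that the smallest Gramian eigenvalue degrades as points cluster, is easy to observe numerically; again, a Gaussian kernel on an interval serves as a toy stand-in for \(K^{(\sigma )}\).

```python
import numpy as np

def min_gram_eigenvalue(X, kernel):
    """Smallest eigenvalue of the Gramian (kernel(x_n, x_m))_{n,m};
    eigvalsh returns the eigenvalues in ascending order."""
    K = kernel(X[:, None], X[None, :])
    return np.linalg.eigvalsh(K)[0]

gauss = lambda s, t: np.exp(-((s - t) / 0.1) ** 2)
lams = []
for q in (0.5, 0.25, 0.125):
    X = np.arange(0.0, 1.0 + 1e-9, q)   # equispaced points with separation q
    lams.append(min_gram_eigenvalue(X, gauss))
print(lams)   # decreasing: lambda_min shrinks with the separation distance
```

Since the Gaussian kernel is positive definite, every \(\lambda _{\min }\) stays positive, but lower bounds that are explicit in \(q_{X_N}\), as in (7), are what controls the conditioning of the interpolation problem.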
The remainder of this paper is organized as follows: In Sect. 2, we introduce the necessary technical framework of metric measure spaces and heat kernels. In Sect. 3, we show how certain generalized Besov spaces carry the additional structure of a reproducing kernel Hilbert space and we characterize their kernels as infinite series (see Theorem 2). In Sect. 4, we present sampling inequalities for Besov spaces. Here we discuss two approaches. First, to derive bounds on arbitrary \(L^p(M;\hbox {d}\mu )\)-norms of Besov functions also on measure spaces with infinite mass, we consider maximal \(\delta \)-nets as sampling points (see Theorem 3). Second, for the special case of finite volume measure spaces, we derive bounds on the strongest \(L^\infty (M;\hbox {d}\mu )\)-norm for more general data sets, see Theorem 6. In Sect. 5, the truncation of the kernel series expansion (1) is discussed. We derive an explicit lower bound for the truncation parameter L depending on the point set \(X_N\) which guarantees the existence of a quasi-optimal interpolant to arbitrary given data at \(X_N\), see Theorem 7. The essential step in the proof is the derivation of a Riesz basis involving the integral kernels \({{{\varvec{{S}}}^{(\ell ;\sigma )}}}\), see Lemma 7. Section 5.2 is devoted to the proof of two stability properties of kernel-based approximation. Here we show inverse Bernstein-type estimates for \(\mathcal {L}_{X_N}\) as defined in (3), see Proposition 7, and derive bounds on the condition of the interpolation matrix based on the kernel \(K^{(\sigma )}\), see Proposition 8. Finally, in Sect. 6, we analyze a regularized reconstruction method using a numerically feasible approximation of the kernel function. Here, besides the discretization error, we take into account the series truncation error and the error that stems from the numerical approximation of the spectrum of \(\mathcal {D}\), see Theorem 9.
2 Notation and Auxiliary Results
In the sequel, we denote by C and c generic positive constants that may change from line to line and from expression to expression. We denote by \(C^\infty ([0,\infty ))\) the space of smooth functions \([0,\infty )\rightarrow \mathbb {R}\), and for \(R>0\), we denote by \(C_c^\infty ([0,R])\) the subspace of \(C^{\infty }([0,\infty ))\) of functions with compact support in [0, R].
2.1 Measurable Metric Spaces
Let us use the framework of [10] and briefly recall basic definitions and assumptions. Suppose \(\left( M,\rho ,\mu \right) \) is a metric measure space with the following properties:
-
(i)
\(\left( M,\rho \right) \) is a locally compact separable metric space with distance \(\rho : M \times M \rightarrow [0,\infty )\). Furthermore, \(\mu \) is a positive Radon measure with the volume doubling property, i.e., there is a constant \(d>0\) such that
$$\begin{aligned} 0< \mu \left( B\left( x,2r \right) \right) \le 2^{d} \mu \left( B\left( x,r \right) \right) < \infty \quad \text {for all } x \in M \text { and } r>0. \end{aligned}$$(8)
Here \(B\left( x,r \right) :=\left\{ y \in M \ : \ \rho \left( x,y \right) <r \right\} \) denotes the open ball with radius r around x, and the constant d is a parameter of the space \((M,\rho )\), which generalizes the notion of dimension.
-
(ii)
The reverse doubling condition holds, i.e., there is a constant \(\beta >0\) such that
$$\begin{aligned} \mu \left( B\left( x,2r \right) \right) \ge 2^{\beta }\mu \left( B\left( x,r \right) \right) \quad \text {for all } x\in M \text { and all } 0<r<{{\mathrm{diam}}}M /3. \end{aligned}$$(9)
Note that in general \(\beta \) can be different from d. The reverse doubling condition follows from (8) if M is connected (see [10, Proposition 2.2]).
-
(iii)
The non-collapsing condition holds, i.e., there is a constant \(c>0\) such that
$$\begin{aligned} \inf _{x \in M} \mu \left( B\left( x,1 \right) \right) \ge c >0. \end{aligned}$$(10)
It follows from (8) that for x, \(y\in M\) and \(r>0\) (see [10, (2.2)])
$$\begin{aligned} \mu \left( B\left( y,r \right) \right) \le 2^{d}\left( 1+\frac{\rho \left( x,y \right) }{r}\right) ^{d}\mu \left( B\left( x,r \right) \right) . \end{aligned}$$(11)
If M is connected and \(\mu (M)<\infty \), then (8) implies (9) and (10), see [10]. We refer to [26] for a discussion of the parameters and their connection to the Assouad dimension.
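As a sanity check of conditions (8)–(10), it may help to keep the model case \(M=\mathbb {R}^d\) with the Euclidean distance and the Lebesgue measure in mind (this example is ours and not taken from the references):

```latex
\mu\bigl(B(x,2r)\bigr)=(2r)^{d}\omega_{d}
  =2^{d}\,\mu\bigl(B(x,r)\bigr),
\qquad \omega_{d}:=\mu\bigl(B(0,1)\bigr),
```

so the doubling condition (8) holds with equality and the parameter d is the usual space dimension; the same identity yields the reverse doubling condition (9) with \(\beta =d\), and \(\mu \left( B(x,1)\right) =\omega _d>0\) uniformly in x gives the non-collapsing condition (10).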
2.2 Heat Kernels on Metric Measure Spaces
We recall that a family \((p_{t})_{t>0}\) of kernel functions \(p_{t}:M\times M \rightarrow \mathbb {R}\) is called a heat kernel if for almost all \(x,y \in M\), all \(s,t>0\) and all \(f \in L^{2}\left( M;d\mu \right) \)
$$\begin{aligned} p_t(x,y)=p_t(y,x),\qquad p_{s+t}(x,y)=\int _M p_s(x,z)\,p_t(z,y)\,d\mu (z),\qquad L^2\!\!-\!\!\lim _{t\rightarrow 0}\int _M p_t(\cdot ,y)f(y)\,d\mu (y)=f, \end{aligned}$$(12)
where we use the notation \(L^2\!\!-\!\!\lim _{t\rightarrow 0}a_t=b\) if \(\lim _{t\rightarrow 0}\Vert a_{t}-b\Vert _{L^2(M;d\mu )}=0\). As in [10], we impose the following additional conditions on the heat kernel:
-
(i)
Small time Gaussian upper bound: For all \(0< t\le 1\) and \(x,y \in M \)
$$\begin{aligned} |p_t\left( x,y\right) |\le C\frac{\exp \left( -c\frac{\rho ^2\left( x,y\right) }{t}\right) }{ \sqrt{\mu \left( B\left( x, \sqrt{t}\right) \right) \mu \left( B\left( y, \sqrt{t}\right) \right) }}. \end{aligned}$$(13)
-
(ii)
Hölder continuity with exponent \(\alpha _{H}>0\): For all \(0<t\le 1\) and \(x,y,\tilde{y} \in M\) with \(\rho \left( y,\tilde{y} \right) \le \sqrt{t}\),
$$\begin{aligned} \left| p_{t}\left( x,y \right) -p_{t}\left( x,\tilde{y} \right) \right| \le C_1 \left( \frac{\rho \left( y,\tilde{y} \right) }{\sqrt{t}} \right) ^{\alpha _{H}} \frac{\exp \left( -c \frac{\rho ^{2}\left( x,y \right) }{t} \right) }{\sqrt{\mu \left( B\left( x,\sqrt{t} \right) \right) \mu \left( B\left( y,\sqrt{t} \right) \right) }}. \end{aligned}$$(14)
-
(iii)
Markov property: For all \(t>0\)
$$\begin{aligned} \int _{M}p_{t}\left( x,y \right) \,d\mu \left( y \right) \equiv 1. \end{aligned}$$(15)
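A guiding example satisfying (13)–(15) is the classical Gauss–Weierstrass kernel on \(M=\mathbb {R}^d\) with the Lebesgue measure, generated by the Laplacian; we state it here for orientation only:

```latex
p_{t}(x,y)=(4\pi t)^{-d/2}
  \exp\!\Bigl(-\frac{|x-y|^{2}}{4t}\Bigr),
\qquad x,y\in\mathbb{R}^{d},\ t>0.
```

Since \(\mu \left( B(x,\sqrt{t})\right) \asymp t^{d/2}\) in this case, the Gaussian upper bound (13) and the Hölder estimate (14) hold with constants depending only on d (with \(\alpha _H=1\)), and (15) is the standard normalization of the Gaussian.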
A heat kernel gives rise to a family of operators
$$\begin{aligned} P_{t}f(x):=\int _{M}p_{t}(x,y)f(y)\,d\mu (y),\qquad f\in L^{2}\left( M;d\mu \right) ,\ t>0. \end{aligned}$$
As worked out in [21], the conditions on the heat kernel ensure that the family of operators \(\{P_{t}\}_{t>0}\) is a strongly continuous, symmetric Markovian semigroup in \(L^{2}\left( M;d \mu \right) \). The associated infinitesimal generator \(\mathcal {D}\) is defined by
$$\begin{aligned} \mathcal {D}f:=L^2\!\!-\!\!\lim _{t\rightarrow 0}\frac{f-P_{t}f}{t}. \end{aligned}$$
Moreover, the domain \({{\mathrm{Dom}}}\mathcal {D}\) of \(\mathcal {D}\), i.e., the subspace of \(L^{2}\left( M;d\mu \right) \) for which the limit exists, is a dense subspace of \(L^{2}\left( M;d\mu \right) \). By construction, \(\mathcal {D}\) is a self-adjoint and positive definite operator. Furthermore, there is a unique associated spectral resolution of the identity, denoted by \(\{\mathbb {E}_{\lambda } \}_{\lambda \in [0,\infty )}\), such that \(\mathbb {E}_\lambda \) is a bounded linear operator \(L^{2}\left( M;d\mu \right) \rightarrow L^{2}\left( M;d\mu \right) \) for every \(\lambda >0\), and
$$\begin{aligned} \mathcal {D}=\int _{0}^{\infty }\lambda \,d\mathbb {E}_{\lambda }. \end{aligned}$$
The spectral resolution can be used to define for continuous \({{\varvec{{t}}}:\mathbb {R}\rightarrow [0,\infty )}\) the operator
$$\begin{aligned} {\varvec{{t}}}(\sqrt{\mathcal {D}}):=\int _{0}^{\infty }{\varvec{{t}}}(\sqrt{\lambda })\,d\mathbb {E}_{\lambda }. \end{aligned}$$(16)
Such operators are often integral operators. In [13, Theorem 6], it is shown that a linear operator \(\mathfrak {g}:L^{1}\left( M;d\mu \right) \rightarrow L^{\infty }(M;d\mu )\) is bounded if and only if there is an integral kernel \( G:M \times M \rightarrow \mathbb {R}\) satisfying \( G \in L^{\infty }\left( M \times M \right) \) and
$$\begin{aligned} \mathfrak {g}f(x)=\int _{M}G(x,y)f(y)\,d\mu (y)\quad \text {for almost every }x\in M\text { and all }f\in L^{1}\left( M;d\mu \right) . \end{aligned}$$
From now on, we always consider a metric measure space \(\left( M,\rho ,\mu \right) \) with the properties outlined above, in particular (8), (9) and (10), and an essentially self-adjoint positive operator \(\mathcal {D}\) on \(L^2(M;d\mu )\) with corresponding heat kernel satisfying (13), (14) and (15). Throughout the remainder of this paper, we employ the following notation.
Notation 1
We denote smooth functions \([0,\infty ) \rightarrow [0,\infty )\) that are used to define operators via spectral calculus by bold lowercase letters, and the corresponding (integral) kernels by the corresponding bold uppercase letters. Thus, for \({{\varvec{{t}}}:\mathbb {R}\rightarrow \mathbb {R}}\) and \(\delta >0\) such that \({{\varvec{{t}}}(\delta \sqrt{\mathcal {D}})}\) given by (16) is bounded as an operator \(L^{1}\left( M;d\mu \right) \rightarrow L^{\infty }(M;d\mu )\), we denote the associated integral kernel by \({{\varvec{{T}}}_\delta }\). Precisely, for \(0<\delta \le 1\) and a smooth function \({{\varvec{{t}}}:[0,\infty )\rightarrow [0,\infty )}\), we set
$$\begin{aligned} {\varvec{{t}}}(\delta \sqrt{\mathcal {D}})f(x)=\int _{M}{\varvec{{T}}}_{\delta }(x,y)f(y)\,d\mu (y)\quad \text {for all }f\in L^{1}\left( M;d\mu \right) . \end{aligned}$$(17)
Moreover, we set \({{\varvec{{T}}}:={\varvec{{T}}}_1}\).
We note that under the above assumptions, the kernel \({{\varvec{{T}}}_\delta (x,y)}\) is real-valued, see [28, Sect. 2.5].
2.3 Smooth Cutoff Functions
Next, we will need several smooth cutoff functions with various properties that we collect in this subsection. Our computations follow the lines of [10].
Definition 1
Let \(b>1\), \( \zeta \ge 1\), \(1>c_1>0\) and \(c>0\).
-
(a)
We set
$$\begin{aligned} \mathcal {G}\left( b, \zeta \right):= & {} \left\{ {\varvec{{t}}}\in C^\infty ([0,\infty ){;[0,\infty )}):\right. \nonumber \\&\left. \left. \frac{d^\ell }{du^\ell }{\varvec{{t}}}(u)\right| _{u=0}=0 \quad \text {for all }1\le \ell \le \zeta ,\text { and }{\text {supp}}{\varvec{{t}}}\subset [0,b] \right\} . \end{aligned}$$(18)
Elements of \(\mathcal {G}(b, \zeta )\) are called smooth cutoff functions of order \(\zeta \) with support b. We set
$$\begin{aligned} \mathcal {G}(b):=\bigcap _{\zeta \ge 1}\mathcal {G}(b,\zeta ). \end{aligned}$$(19)
A function \({{\varvec{{t}}}\in \mathcal {G}(b,\zeta )}\) is called normalized if \({0\le {\varvec{{t}}}\le 1}\).
-
(b)
We define \(\mathcal {A}(b,c_1)\) to consist of those \({{\varvec{{t}}}\in \mathcal {G}(b)}\) for which
-
(i)
\({{\text {supp}}{\varvec{{t}}}\subset [0,b]}\),
-
(ii)
\({{\varvec{{t}}}\ge c_1>0}\) on \([0,b^{3/4}]\).
-
(c)
We define \(\mathcal {E}(b,c)\) to consist of those \({{\varvec{{t}}}\in \mathcal {G}(b)}\) for which
-
(iii)
\({{\text {supp}}{\varvec{{t}}}\subset [b^{-1},b]}\),
-
(iv)
\({{\varvec{{t}}}\ge c>0}\) on \([b^{-3/4},b^{3/4}]\).
-
(d)
A pair \({({\varvec{{t}}},{\varvec{{p}}})\in {\mathcal {C}^\infty ([0,\infty );[0,\infty ))\times \mathcal {C}^\infty ([0,\infty );[0,\infty ))}}\) is called a partition of unity if
$$\begin{aligned} {{\varvec{{t}}}(u)+\sum _{\ell =1}^\infty {\varvec{{p}}}(b^{-\ell }u)=1 \quad \text {for all }u\in [0,\infty ).} \end{aligned}$$(20)
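A concrete normalized element of \(\mathcal {G}(b)\) can be obtained by the standard gluing construction (this explicit choice is ours, not taken from the references):

```latex
\mathbf{t}(u):=\frac{g(b-u)}{g(b-u)+g(u-1)},
\qquad
g(s):=\begin{cases} e^{-1/s}, & s>0,\\[2pt] 0, & s\le 0. \end{cases}
```

Then \({{\varvec{{t}}}}\) is smooth, \({{\varvec{{t}}}}\equiv 1\) on [0, 1] (so all derivatives vanish at \(u=0\)), \({{\text {supp}}{\varvec{{t}}}\subset [0,b]}\), \({0\le {\varvec{{t}}}\le 1}\), and \({{\varvec{{t}}}>0}\) on [0, b), so that \({{\varvec{{t}}}\in \mathcal {A}(b,c_1)}\) with \(c_1:=\min _{[0,b^{3/4}]}{\varvec{{t}}}>0\).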
Remark 1
Let \(b>1\) and \(c>0\). If \({({\varvec{{t}}},{\varvec{{p}}})\in \mathcal {G}(b)\times \mathcal {G}(b)}\) is a partition of unity, then \({{\varvec{{t}}}(0)=1}\) and \({{\varvec{{p}}}(0)=0}\).
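The remark can be seen by evaluating the partition identity (20) at \(u=0\):

```latex
1=\mathbf{t}(0)+\sum_{\ell=1}^{\infty}\mathbf{p}\bigl(b^{-\ell}\cdot 0\bigr)
 =\mathbf{t}(0)+\sum_{\ell=1}^{\infty}\mathbf{p}(0),
```

which can only hold if \({{\varvec{{p}}}(0)=0}\), since otherwise the series diverges, and this in turn forces \({{\varvec{{t}}}(0)=1}\).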
We introduce some abbreviations that will be used for the rest of this paper. From now on, we assume that \(b>1\) is fixed, and the generic constants c and C may depend on b without further mentioning.
Notation 2
For \(b>1\) and \({{\varvec{{t}}} \in C^{\infty }([0,\infty ))}\), we set
$$\begin{aligned} {\varDelta }{\varvec{{t}}}(u):={\varvec{{t}}}(u)-{\varvec{{t}}}(bu),\qquad u\in [0,\infty ), \end{aligned}$$(21)
and, for \(\ell \in \mathbb {N}\)
Lemma 1
Let \({{\varvec{{t}}}\in C^\infty ([0,\infty ){;[0,\infty )})}\) with \({{\varvec{{t}}}(0)=1}\). Then
$$\begin{aligned} {\varvec{{t}}}(u)+\sum _{\ell =1}^{\infty }{\varDelta }{\varvec{{t}}}(b^{-\ell }u)=1\quad \text {for all }u\in [0,\infty ), \end{aligned}$$(23)
and \({({\varvec{{t}}},{\varDelta }{\varvec{{t}}})}\) form a partition of unity.
Proof
We compute the partial sums for \(L\in \mathbb {N}\). By a telescopic sum argument, we have for every \(u\in [0,\infty )\),
$$\begin{aligned} {\varvec{{t}}}(u)+\sum _{\ell =1}^{L}{\varDelta }{\varvec{{t}}}(b^{-\ell }u)={\varvec{{t}}}(u)+\sum _{\ell =1}^{L}\left( {\varvec{{t}}}(b^{-\ell }u)-{\varvec{{t}}}(b^{-\ell +1}u)\right) ={\varvec{{t}}}(b^{-L}u)\longrightarrow {\varvec{{t}}}(0)=1\quad \text {as }L\rightarrow \infty , \end{aligned}$$
where the convergence follows from the continuity of \({{\varvec{{t}}}}\) at 0.
\(\square \)
We collect some more properties of the cutoff functions that will be used later on.
Remark 2
Let \(b>1\).
-
(i)
Suppose that \(\tau >d+1\) and \({{\varvec{{t}}}\in \mathcal {G}(b)}\). By [28, Theorem 3.1] and [10, (2.7)], \({{\varvec{{t}}}(\delta \sqrt{\mathcal {D}})}\) for \(0<\delta \le 1\) is an integral operator and there exist \(c_\tau >0\) and \(c_\tau ^{\prime }>0\) such that, for every \(0<\delta \le 1\), the associated integral kernel (compare (17)) satisfies
$$\begin{aligned} {\big |{\varvec{{T}}}_\delta \left( x,y\right) \big | \le c_\tau \mu \left( B\left( x,\delta \right) \right) ^{-1}\Big (1+\frac{\rho \left( x,y\right) }{\delta }\Big )^{d/2-\tau };} \end{aligned}$$(24)
and, if \(\rho (x,x')\le \delta \), then with \(\alpha _{H}\) from (14) it holds
$$\begin{aligned} {\big |{\varvec{{T}}}_\delta \left( x,y\right) -{\varvec{{T}}}_\delta (x^{\prime },y) \big | \le c_\tau ^{\prime } \Big ( \frac{\rho ( x,x^{\prime })}{\delta } \Big )^{\alpha _{H} }\mu \left( B\left( x,\delta \right) \right) ^{-1}\Big (1+\frac{\rho \left( x,y\right) }{\delta }\Big )^{d/2-\tau }.} \end{aligned}$$(25)
Let \(1\le p\le \infty \). Then, by [10, Corollary 3.6], there is \(C>0\) such that for all \(0<\delta \le 1\) and all \(f\in L^p(M;d\mu )\)
$$\begin{aligned} {\Vert {\varvec{{t}}}(\delta \sqrt{\mathcal {D}})f\Vert _{L^{p}\left( M;d\mu \right) }\le C\Vert f\Vert _{L^{p}\left( M;d\mu \right) }.} \end{aligned}$$(26)
-
(ii)
Littlewood–Paley-type decomposition: Suppose further that \({\left( {\varvec{{t}}},{\varvec{{p}}} \right) \in \mathcal {G}\left( b \right) \times \mathcal {G}(b)}\) with \({\text {supp}}\mathbf {p} \subset [b^{-1}, b]\) is a partition of unity. Then, by [10, Corollary 3.9] it holds for all \(f\in L^{p}\left( M;d\mu \right) \) with \(1\le p<\infty \)
$$\begin{aligned} {f={\varvec{{t}}}(\sqrt{\mathcal {D}})f+\sum _{\ell =1}^\infty {\varvec{{p}}}(b^{-\ell }\sqrt{\mathcal {D}})f.} \end{aligned}$$(27)
-
(iii)
Suppose that \({{\varvec{{f}}}}\), \({{\varvec{{g}}}\in C_c^\infty ([0,R])}\) for some \(R>0\) and \(0\le {\varvec{{f}}}\le {\varvec{{g}}}\). Then, by [10, (3.43)], it holds for the associated integral kernels with \(\delta =1\) that
$$\begin{aligned} {0\le {\varvec{{F}}}(x,x)\le {\varvec{{G}}}(x,x)\text { for every }x\in M.}\end{aligned}$$
-
(iv)
By [10, Lemma 3.19 (b)], there exists \(\tilde{b}>1\) such that for all \(r\ge \max \{1,\, 3/{{\mathrm{diam}}}M\}\) and all \(x\in M\)
$$\begin{aligned} {C_1\mu \left( B(x,r^{-1}) \right) ^{-1}\le {\varvec{{1}}}_{[r,\tilde{b}r]}(x,x)\le C_2\mu \left( B(x,r^{-1}) \right) ^{-1},} \end{aligned}$$(28)
where \({{\varvec{{1}}}_{[r,\tilde{b}r]}}(\cdot ,\cdot )\) denotes the integral kernel associated with the operator \({{\varvec{{1}}}_{[r,\tilde{b}r]}(\sqrt{\mathcal {D}})}\), \({{\varvec{{1}}}_{[r,\tilde{b}r]}}\) being the characteristic function of the interval \([r,\tilde{b}r]\). The constants \(C_1\) and \(C_2\) depend only on the parameters of the space.
The parameter \(\tilde{b}\) from Remark 2 (iv) will play a crucial role in our analysis, see also [28]. Therefore, we present a lower bound on \(\tilde{b}\) in the Euclidean case \(M=\mathbb {R}^d\) in the appendix. In particular, it turns out that \(\tilde{b}\) depends on the space dimension d, with \(\tilde{b}(d)\rightarrow 1\) as \(d\rightarrow \infty \).
2.4 Spectral Spaces
Spectral spaces are usually defined as invariant sets under integral operators. Precisely, for \(1\le p\le \infty \) and a compact set \(K \subset [0,\infty )\), we define the associated spectral space \(\varSigma _{K}^{p}\), see [10, Definition 3.10],
$$\begin{aligned} \varSigma _{K}^{p}:=\left\{ f\in L^{p}\left( M;d\mu \right) \ :\ {\varvec{{\theta }}}(\sqrt{\mathcal {D}})f=f\ \text {for all }{\varvec{{\theta }}}\in C_c^\infty ([0,\infty ))\ \text {with }{\varvec{{\theta }}}\equiv 1\ \text {on }K\right\} . \end{aligned}$$(29)
We will need the following result proven in [10, Proposition 3.12].
Proposition 1
Let \(1\le p\le q\le \infty \). If \(R\ge 1\), then \(\varSigma _{[0,R]}^{p} \subset \varSigma _{[0,R]}^{q}\), and there is a constant \(C>0\) (independent of R) such that
$$\begin{aligned} \Vert f\Vert _{L^{q}\left( M;d\mu \right) }\le CR^{d\left( \frac{1}{p}-\frac{1}{q}\right) }\Vert f\Vert _{L^{p}\left( M;d\mu \right) }\quad \text {for all }f\in \varSigma _{[0,R]}^{p}, \end{aligned}$$(30)
and
with \(\alpha _{H}\) from (14).
Remark 3
Note that Proposition 1 implies in particular that functions in \(\varSigma _{[0,R]}^{p}\) for \(R\ge 1\) and \(1\le p\le \infty \) have continuous representatives.
3 Generalized Besov Spaces as Reproducing Kernel Hilbert Spaces
In this section, we recall the notion of generalized Besov spaces. Our main result is Proposition 4, in which we explicitly identify the reproducing kernels for such spaces and derive multiscale decompositions of them. To introduce Besov-type reproducing kernel Hilbert spaces, we define cutoff functions controlling the spectral decay in the kernel expansion. Here, we restrict ourselves to the case \(\delta _{\ell }:=b^{-\ell }\) with a fixed \(b>1\). We essentially follow the notion of Besov spaces based on spectral decompositions as introduced in [10, Section 6] and [28, Section 6], which in turn build on [45, 55, 56].
Now, we briefly recall the setting (see [10, Definition 6.1]) in the case \(p,q\ge 1\) and as in [28] allow for different normalizations of the support (\(b=2\) in the notation of [10]):
Definition 2
Let \(\sigma >0\), \(1\le p\le \infty \), and \({0<} q\le \infty \). Suppose that \({{\varvec{{\phi }}}\in \mathcal {A}(b,c_1)}\), and \({{\varvec{{\psi }}}\in \mathcal {E}(b,c_1)}\) for some \(0<c_1<1\). The Besov space \(\mathcal {B}_{p,q}^\sigma (M;\mathcal {D})\) is defined as
$$\begin{aligned} \mathcal {B}_{p,q}^\sigma (M;\mathcal {D}):=\left\{ f\in L^{p}\left( M;d\mu \right) \ :\ \Vert f\Vert _{\mathcal {B}_{p,q}^\sigma (M;\mathcal {D})}<\infty \right\} , \end{aligned}$$(32)
equipped with the norm (see [10, Definition 6.1] for \(q = \infty \))
$$\begin{aligned} \Vert f\Vert _{\mathcal {B}_{p,q}^\sigma (M;\mathcal {D})}:=\left( \Vert {\varvec{{\phi }}}(\sqrt{\mathcal {D}})f\Vert _{L^{p}\left( M;d\mu \right) }^{q}+\sum _{\ell =1}^{\infty }\left( b^{\ell \sigma }\Vert {\varvec{{\psi }}}(b^{-\ell }\sqrt{\mathcal {D}})f\Vert _{L^{p}\left( M;d\mu \right) }\right) ^{q}\right) ^{1/q}. \end{aligned}$$(33)
For \(\mathcal {B}_{2,2}^\sigma (M;\mathcal {D})\), we denote the associated inner product by
$$\begin{aligned} \left( f,g\right) _{\mathcal {B}_{2,2}^\sigma (M;\mathcal {D})}:=\left( {\varvec{{\phi }}}(\sqrt{\mathcal {D}})f,{\varvec{{\phi }}}(\sqrt{\mathcal {D}})g\right) _{L^{2}\left( M;d\mu \right) }+\sum _{\ell =1}^{\infty }b^{2\ell \sigma }\left( {\varvec{{\psi }}}(b^{-\ell }\sqrt{\mathcal {D}})f,{\varvec{{\psi }}}(b^{-\ell }\sqrt{\mathcal {D}})g\right) _{L^{2}\left( M;d\mu \right) }. \end{aligned}$$
The space \(\mathcal {B}^\sigma _{2,2}\left( M;\mathcal {D}\right) \) and its topology do not depend on the specific choice of the functions \({{\varvec{{\psi }}}}\) and \({{\varvec{{\phi }}}}\) (see [10, Section 6]). As in [28, Proof of Proposition 6.5], we will use special cutoff functions \({{\varvec{{\phi }}}}\) and \({{\varvec{{\psi }}}}\) for which \({({\varvec{{\phi }}}^2,{\varvec{{\psi }}}^2)}\) form a partition of unity, see [28, Section 4.4] for the construction. We now build on [10] to show a relation between the Besov spaces and Bessel potential spaces (see Theorem 1).
Lemma 2
Let \(\sigma >0\). Fix \({{\varvec{{\phi }}}}\) and \({{\varvec{{\psi }}}}\) that satisfy the assumptions of Definition 2, such that \({{\varvec{{\phi }}}}|_{[0, 1]}\equiv 1 \) and \({({\varvec{{\phi }}}^2,{\varvec{{\psi }}}^2)}\) form a partition of unity. We set
$$\begin{aligned} \tilde{{\varvec{{\phi }}}}:=b^{\sigma }{\varvec{{\phi }}},\qquad \tilde{{\varvec{{\psi }}}}(u):=u^{\sigma }{\varvec{{\psi }}}(u). \end{aligned}$$
Then, \({\tilde{{\varvec{{\phi }}}}}\) and \({\tilde{{\varvec{{\psi }}}}}\) satisfy the conditions of Definition 2 and it holds
$$\begin{aligned} c\left( 1+u^{2\sigma }\right) \le \tilde{{\varvec{{\phi }}}}^{2}(u)+\sum _{\ell =1}^{\infty }b^{2\ell \sigma }\tilde{{\varvec{{\psi }}}}^{2}(b^{-\ell }u)\le C\left( 1+u^{2\sigma }\right) \end{aligned}$$(34)
for all \(u\in [0,\infty )\).
Proof
We check the conditions on \({\tilde{{\varvec{{\phi }}}}}\) first. We have \({{\text {supp}}\tilde{{\varvec{{\phi }}}}={\text {supp}}{\varvec{{\phi }}}\subset [0,b]}\), and \({\left. \frac{d^\nu }{du^\nu }\tilde{{\varvec{{\phi }}}}(u)\right| _{u=0}=b^\sigma \left. \frac{d^\nu }{du^\nu }{\varvec{{\phi }}}(u)\right| _{u=0}=0}\) for \(\nu \ge 1\). Furthermore, if \(u\in [0,b^{3/4}]\), then \({\left| \tilde{{\varvec{{\phi }}}}(u)\right| =b^\sigma {\varvec{{\phi }}}(u)\ge b^\sigma c_{{1}}>0}\), with \(c_{1}\) from Definition 1. Similarly for \({\tilde{{\varvec{{\psi }}}}}\): We have \({{\text {supp}}\tilde{{\varvec{{\psi }}}}={\text {supp}}{\varvec{{\psi }}}\subset [b^{-1},b]}\); and if \(u\in [b^{-3/4},b^{3/4}]\), then \(\left| \tilde{{\varvec{{\psi }}}}(u)\right| = u^\sigma {\varvec{{\psi }}}(u)\ge u^\sigma c_{{1}}\ge b^{-3\sigma /4}c_{{1}}>0\). To show (34), note first that, since \({({\varvec{{\phi }}}^{2},{\varvec{{\psi }}}^{2})}\) is a partition of unity, there holds
which implies
To prove the upper bound in (34), observe that, for \(u\in [0,\infty )\), we have
To show the lower bound, note that, for \(u\le 1\), we have
while, for \(1\le u\), we have
This concludes the proof. \(\square \)
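With the choices \(\tilde{{\varvec{{\phi }}}}=b^\sigma {\varvec{{\phi }}}\) and \(\tilde{{\varvec{{\psi }}}}(u)=u^\sigma {\varvec{{\psi }}}(u)\) appearing in the proof, the two-sided bound (34) can be condensed into one identity (our summary of the computation above):

```latex
\tilde{\boldsymbol{\phi}}^{2}(u)
+\sum_{\ell=1}^{\infty}b^{2\ell\sigma}\,\tilde{\boldsymbol{\psi}}^{2}(b^{-\ell}u)
= b^{2\sigma}\boldsymbol{\phi}^{2}(u)
+u^{2\sigma}\sum_{\ell=1}^{\infty}\boldsymbol{\psi}^{2}(b^{-\ell}u)
= b^{2\sigma}\boldsymbol{\phi}^{2}(u)+u^{2\sigma}\bigl(1-\boldsymbol{\phi}^{2}(u)\bigr),
```

where \(b^{2\ell \sigma }(b^{-\ell }u)^{2\sigma }=u^{2\sigma }\) and the partition of unity \({({\varvec{{\phi }}}^2,{\varvec{{\psi }}}^2)}\) were used (the latter also gives \({{\varvec{{\phi }}}}^2\le 1\)). Since \({{\varvec{{\phi }}}}\equiv 1\) on [0, 1] and \({{\text {supp}}{\varvec{{\phi }}}\subset [0,b]}\), the right-hand side is bounded below by \(\frac{1}{2}\left( 1+u^{2\sigma }\right) \) and above by \(b^{2\sigma }\left( 1+u^{2\sigma }\right) \).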
Lemma 2 immediately implies that the Besov spaces \(\mathcal {B}^{\sigma }_{2,2}(M;\mathcal {D})\) are (norm-) equivalent to Bessel potential spaces based on \(\mathcal {D}\). Moreover, we have the following result for the Hilbert space case \(p=q=2\).
Theorem 1
Let \(b>1\). Then, there are constants c, \(C>0\) such that
$$\begin{aligned} c\Vert f\Vert _{\mathcal {B}^{\sigma }_{2,2}(M;\mathcal {D})}\le \left( \Vert f\Vert _{L^{2}\left( M;d\mu \right) }^{2}+\Vert \mathcal {D}^{\sigma /2}f\Vert _{L^{2}\left( M;d\mu \right) }^{2}\right) ^{1/2}\le C\Vert f\Vert _{\mathcal {B}^{\sigma }_{2,2}(M;\mathcal {D})}. \end{aligned}$$(35)
Proof
Consider more generally smooth cutoff functions \({{\varvec{{s}}}}\), \({{\varvec{{t}}}\in \mathcal {G}(b)}\), and suppose that \({{\varvec{{s}}}\le {\varvec{{t}}}}\) holds pointwise. Then, we have \({\Vert {\varvec{{s}}}(\sqrt{\mathcal {D}})f\Vert _{L^2(M;d\mu )}\le \Vert {\varvec{{t}}}({\sqrt{\mathcal {D}}})f\Vert _{L^2(M;d\mu )}}\) for all \(f\in L^2(M;d\mu )\) since
$$\begin{aligned} \Vert {\varvec{{s}}}(\sqrt{\mathcal {D}})f\Vert _{L^2(M;d\mu )}^{2}=\int _{0}^{\infty }{\varvec{{s}}}^{2}(\sqrt{\lambda })\,d\Vert \mathbb {E}_{\lambda }f\Vert _{L^2(M;d\mu )}^{2}\le \int _{0}^{\infty }{\varvec{{t}}}^{2}(\sqrt{\lambda })\,d\Vert \mathbb {E}_{\lambda }f\Vert _{L^2(M;d\mu )}^{2}=\Vert {\varvec{{t}}}(\sqrt{\mathcal {D}})f\Vert _{L^2(M;d\mu )}^{2}. \end{aligned}$$
Hence, the assertion follows by (34), using that \(\frac{1}{2}(\mu ^\sigma +1)^2\le \mu ^{2\sigma }+1\le (\mu ^\sigma +1)^2\) for all \(\mu \ge 0\). \(\square \)
3.1 The Reproducing Kernel and Its Multiscale Decomposition
We fix some cutoff functions that will be used throughout the remainder of this paper. First, we fix \({{\varvec{{\phi }}}}\) and \({{\varvec{{\psi }}}}\) satisfying the assumptions of Lemma 2. (This corresponds to the choice of the cutoff functions used in the proof of [28, Proposition 6.5].)
Notation 3
We set
Furthermore, for \(\ell \ge {0}\), we set
and (recall Notation 2), we define
and
Lemma 3
We have \({{{{\varvec{{w}}}^{(\ell ;\sigma )}}}\equiv {{{\varvec{{w}}}^{(2;\sigma )}}}}\) for all \(\ell \ge 2\). Furthermore, there is a \(c>0\) such that for all \(\ell \ge 1\), we have the lower bound
and in particular \({{{\varvec{{s}}}^{(\ell ;\sigma )}}\ge c\mathbf {1}_{[b^{\ell }, b^{\ell +\frac{3}{4}}]}}\). Moreover,
Proof
We note that \({b^\ell u\in {\text {supp}}{\varvec{{\phi }}}\subset [0,b]}\) implies that \(u\in [0,b^{-\ell +1}]\). Recall also that for \(\ell \ge 2\) and \({u\in {\text {supp}}{{{{\varvec{{w}}}^{(\ell ;\sigma )}}}}}\), we have \(b^\ell u\ge b^2b^{-1}=b\), and thus \({{\varvec{{\phi }}}(b^\ell u)=0}\). Hence, for all \(\ell \ge 2\),
This shows in particular that \({{{{\varvec{{w}}}^{(\ell ;\sigma )}}}\equiv {{{\varvec{{w}}}^{(2;\sigma )}}}}\) for all \(\ell \ge 2\).
It remains to consider \({{{\varvec{{s}}}^{(\ell ;\sigma )}}}\). We note that by definition there holds
Consider \(u\in [b^{\ell }, b^{\ell +\frac{3}{4}}]\). First, we get with (34)
Second, we have
Putting (40), (41) and (42) together, we obtain
Furthermore, note that \({{\varvec{{\phi }}}(u)-{\varvec{{\phi }}}(bu)\ge 0}\) for all \(u\in \mathbb {R}^+\) since \({{\varvec{{\phi }}}(u)=1\ge {\varvec{{\phi }}}(bu)}\) if \(u\le 1\), and \({{\varvec{{\phi }}}(bu)=0}\) if \(u\ge 1\), and thus, \({{\varvec{{s}}}^{(\ell ;\sigma )}\ge c\mathbf {1}_{[b^{\ell }, b^{\ell +\frac{3}{4}}]}}\). Finally, to show (39), we observe that by Lemma 1 since \({{\varvec{{\phi }}}(0)=1}\), we have
\(\square \)
Next, we aim to show that the Besov spaces \(\mathcal {B}^{\sigma }_{2,2}(M;\mathcal {D})\) are reproducing kernel Hilbert spaces if the index \(\sigma \) is sufficiently large. We recall that \(K^{(\sigma )}:M \times M \rightarrow \mathbb {R}\) is called reproducing kernel for \(\mathcal {B}^{\sigma }_{2,2}(M;\mathcal {D})\) if \(K^{(\sigma )}(x,\cdot ) \in \mathcal {B}^{\sigma }_{2,2}(M;\mathcal {D})\) for all \(x \in M\) and \(f(x)=\left( f,K^{(\sigma )}(x,\cdot )\right) _{ \mathcal {B}^{\sigma }_{2,2}(M;\mathcal {D})}\) for all \(x \in M\) and all \(f \in \mathcal {B}^{\sigma }_{2,2}(M;\mathcal {D})\). A necessary condition is that point evaluations are continuous linear functionals on \(\mathcal {B}^{\sigma }_{2,2}(M;\mathcal {D})\). Recall that, for classical Sobolev spaces \(W^{2,\sigma }(\varOmega )\) on Lipschitz domains \(\varOmega \subset \mathbb {R}^d\), the Sobolev embedding theorem guarantees continuous point evaluations if \(\sigma >\frac{d}{2}\), see [1]. In our abstract setting, we do not have a natural meaning of dimensionality, but, as already mentioned, the number d in (8) plays the role of the space dimension. Moreover, the parameter \(\sigma >0\) resembles the smoothness parameter of classical Sobolev spaces. Roughly speaking, the larger \(\sigma \), the smoother the respective kernel, and thus the smaller the reproducing kernel Hilbert space, where "smaller" is meant in the sense of embeddings, see [10].
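For orientation, it may help to recall the Euclidean model case (our illustration, with the normalization of the Fourier transform fixed for concreteness): for \(M=\mathbb {R}^d\) and \(\mathcal {D}=-\Delta \), the corresponding space \(\mathcal {B}^{\sigma }_{2,2}\) is the Sobolev space \(H^\sigma (\mathbb {R}^d)\), whose reproducing kernel for \(\sigma >d/2\) is, up to normalization, the Matérn (Sobolev) kernel

```latex
K^{(\sigma)}(x,y)
=(2\pi)^{-d}\int_{\mathbb{R}^{d}}
  \frac{e^{\,i\langle x-y,\xi\rangle}}
       {\bigl(1+|\xi|^{2}\bigr)^{\sigma}}\,d\xi,
```

and the condition \(\sigma >d/2\) is precisely what makes this integral converge, mirroring the assumption \(\sigma >d/2\) in Proposition 2 below.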
In the following proposition, we show that the reproducing kernel possesses a multilevel decomposition which in the sequel will be used to approximate the kernel. This is in the spirit of [7, 19, 36, 42, 43]. In the following, we shall use \({{\varvec{{s}}}^{(\ell ;\sigma )}}\) and \({{\varvec{{k}}}^{(\sigma )}}\) introduced in Notation 3. Recall that we denote the associated integral kernels by \({{{\varvec{{S}}}^{(\ell ;\sigma )}}}\) and \({\varvec{{K}}}^{(\sigma )}\), respectively.Footnote 1
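The scalar identity behind this multilevel decomposition is a telescoping partition of unity: by (40), the weighted pieces \(b^{-2\sigma \ell }{{\varvec{{s}}}^{(\ell ;\sigma )}}(u)={\varDelta _{\ell }}{\varvec{{\phi }}}(u){\varvec{{k}}}^{(\sigma )}(u)\) sum back to \({\varvec{{k}}}^{(\sigma )}\). The following minimal numerical sketch checks this telescoping on \(\mathbb {R}^+\); the concrete cutoff \({\varvec{{\phi }}}\) (a \(C^1\) smoothstep) and the model symbol \({\varvec{{k}}}^{(\sigma )}(u)=(1+u^2)^{-\sigma }\) are illustrative assumptions, not the paper's general hypotheses.

```python
import numpy as np

# Hypothetical smooth cutoff: phi = 1 on [0,1], phi = 0 on [b,inf),
# C^1 "smoothstep" in between (an assumption for illustration).
b = 2.0

def phi(u):
    t = np.clip((np.asarray(u, dtype=float) - 1.0) / (b - 1.0), 0.0, 1.0)
    return 1.0 - t**2 * (3.0 - 2.0 * t)

def delta_phi(u, ell):
    # Littlewood-Paley difference: Delta_ell phi(u) = phi(b^{-ell} u) - phi(b^{-ell+1} u)
    return phi(b**(-ell) * u) - phi(b**(-ell + 1) * u)

sigma = 1.5
def k_sigma(u):
    # model symbol k^(sigma)(u) = (1+u^2)^{-sigma} (an assumption)
    return (1.0 + u**2) ** (-sigma)

u = np.linspace(0.0, 50.0, 1001)
L = 8  # truncation level; b^L = 256 covers the sampled range of u
partial = phi(b * u) * k_sigma(u) + sum(
    delta_phi(u, ell) * k_sigma(u) for ell in range(0, L + 1))
# Telescoping: phi(bu) + sum_ell Delta_ell phi(u) = phi(b^{-L} u) = 1 for u <= b^L,
# so the truncated series reproduces k^(sigma) exactly on this range.
err = np.max(np.abs(partial - k_sigma(u)))
print(err)
```

The same telescoping, applied under the functional calculus for \(\sqrt{\mathcal {D}}\), is what makes the series in Proposition 2 converge to the kernel.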
Proposition 2
Let \(\sigma >d/2\) and let \(b>1\). Then, the Besov space \(\mathcal {B}^\sigma _{2,2}\left( M;\mathcal {D}\right) \) with cutoff functions \({\tilde{{\varvec{{\phi }}}}}\) and \({\tilde{{\varvec{{\psi }}}}}\) from Lemma 2 is a reproducing kernel Hilbert space with kernel
Proof
By [10, Propositions 3.12 and 6.7], we have the same embedding properties as for usual Besov spaces, namely \(\mathcal {B}_{p,q}^\sigma (M;\mathcal {D})\subset \mathcal {B}_{p_1,q_1}^{\sigma _1}(M;\mathcal {D})\) if \(1\le p\le p_1\le \infty \), \(0<q\le q_1\le \infty \), \(0<{\sigma _1\le \sigma }<\infty \) and \(\frac{\sigma }{d}-\frac{1}{p}=\frac{\sigma _1}{d}-\frac{1}{p_1}\). In particular, for \(p=q=2\), \(p_1=q_1=\infty \) and \(\sigma _1=\sigma -\frac{d}{2}>0\), we obtain \(\mathcal {B}^{\sigma }_{2,2}(M;\mathcal {D})\subset \mathcal {B}^{\sigma _1}_{\infty ,\infty }(M;\mathcal {D})\). Furthermore, by [10, Proposition 6.4(b)], we have \(\mathcal {B}^{\sigma _1}_{\infty ,\infty }(M;\mathcal {D})\subset {{\mathrm{Lip}}}\sigma _1\) for \(0<\sigma _1<\alpha _{H}\) with \(\alpha _{H}\) from (14), where, for \(L>0\), \({{\mathrm{Lip}}}L\) denotes the space of functions f for which \(\Vert f\Vert _{L^\infty (M;d\mu )}+\sup _{x\ne y}\frac{|f(x)-f(y)|}{\rho ^L(x,y)}\) is finite. Since we also have the trivial embedding \(\mathcal {B}_{p,q}^\sigma (M;\mathcal {D})\subset \mathcal {B}_{p,q}^{\tau }(M;\mathcal {D})\) if \(\tau \le \sigma \), we deduce that \(\mathcal {B}^{\sigma }_{2,2}(M;\mathcal {D})\subset {{\mathrm{Lip}}}{\tilde{ \sigma _1}}\) for some \(0<{\tilde{\sigma _1}}<\alpha _{H}\). Hence, point evaluations are well defined on \(\mathcal {B}^{\sigma }_{2,2}(M;\mathcal {D})\). We show the reproducing property
For that, we compute for \(f\in \mathcal {B}^{\sigma }_{2,2}(M;\mathcal {D})\) and \(x\in M\), using the notation introduced in (17)
Next, recall at this point that \({{{\varvec{{S}}}^{(\ell ;\sigma )}}}\) is as usual the integral kernel associated with the operator \({{{\varvec{{s}}}^{(\ell ;\sigma )}}(\sqrt{\mathcal {D}})={{\varvec{{g}}}^{(\ell ;\sigma )}}(b^{-\ell }\sqrt{\mathcal {D}}){\varDelta _{\ell }}{\varvec{{\phi }}}(\sqrt{\mathcal {D}})}\). If we denote the integral kernels associated with \({{{\varvec{{g}}}^{(\ell ;\sigma )}}(b^{-\ell }\sqrt{\mathcal {D}})}\) and \({{\varDelta _{\ell }}{\varvec{{\phi }}}(\sqrt{\mathcal {D}})}\) by \({{{\varvec{{G}}}_{b^{-\ell }}^{(\ell ;\sigma )}}}\) and \({{\varDelta _{\ell }}{\varvec{{\varPhi }}}}\), respectively, we obtain
that is,
Inserting this representation into (45), we obtain using (35)
where the last step follows from Lemma 1 and Remark 2 (ii), see [10, Corollary 3.9]. The second equality in (43) follows from (39). \(\square \)
So far, we have identified the reproducing kernel \(K^{(\sigma )}(x,y)\) for the Hilbert space \(\mathcal {B}_{2,2}^\sigma (M;\mathcal {D})\). In the remaining sections, we shall focus on kernel-based approximation schemes based on these reproducing kernels.
4 Sampling Inequalities
In this section, we prove sampling inequalities. They provide a systematic tool for the deterministic error analysis of stable and consistent processes for the reconstruction of functions \(f\in \mathcal {B}_{p,q}^\sigma (M;\mathcal {D})\) from given point values at discrete locations \(X\subset M\), as outlined in the introduction. We will present two approaches. First, we consider maximal \(\delta \)-nets as sampling points which allow in special cases also for estimates if \(\mu (M)=\infty \) (see Corollary 1). Then, we specialize to the case of finite volume and derive estimates in the \(L^{\infty }(M;d\mu )\)-norm using a norming set approach. The latter technique allows for more general scattered sampling points. Note that we derive sampling inequalities for general spaces \(\mathcal {B}_{p,q}^{\sigma }(M;\mathcal {D})\) which can be continuously embedded into the space of bounded continuous functions, while in the applications, we focus on kernel-based methods and hence on Hilbert spaces \(\mathcal {B}_{2,2}^{\sigma }(M;\mathcal {D})\). Related estimates can be found, for instance, in [16, 44].
4.1 Sampling Inequalities Based on Maximal \(\delta \)-Nets
To describe appropriate sampling points, we use the notion of maximal nets (see [10, Definition 2.4]). Let \(\delta >0\). A discrete (possibly infinite) set \(X\subset M\) is called \(\delta \)-net on M if \(q_{X} \ge \delta \), where
denotes the separation distance of X. A \(\delta \)-net X is called a maximal \(\delta \)-net on M if it cannot be enlarged, i.e., if for every \(z\in M \setminus X\) there is an \(x_{n} \in X\) such that \(\rho (x_{n},z){<} \delta \). Note that in this case
where \(h_{X,M}\) denotes the fill distance of X in M. We use the following result from [10, Proposition 2.5]. Under the assumptions on \((M,\mu ,\rho )\) outlined in Sect. 2, for every \(\delta >0\), there exists a maximal \(\delta \)-net \(X_{\delta }\), which consists of at most countably many points. Furthermore, there exists a family of pairwise disjoint, measurable sets \(A_{n}\), \(x_n\in X_\delta \), such that \(M=\bigcup _{x_n\in X_\delta }A_{n}\), and \(B(x_n,\frac{\delta }{2})\subset A_{n}\subset B(x_n,\delta )\).
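A maximal \(\delta \)-net can be generated greedily: accept a candidate point only if it keeps the separation distance at least \(\delta \); once no candidate can be added, maximality forces the fill distance below \(\delta \). The sketch below runs this on points sampled from the unit circle with the ambient Euclidean metric, a toy stand-in for \((M,\rho )\); the finite sample plays the role of M.

```python
import numpy as np

rng = np.random.default_rng(0)
# Candidate points on the unit circle (a toy metric space M, an assumption).
theta = rng.uniform(0.0, 2.0 * np.pi, size=2000)
pts = np.stack([np.cos(theta), np.sin(theta)], axis=1)

def dist(p, q):
    return np.linalg.norm(p - q)  # Euclidean chord distance as stand-in for rho

delta = 0.2
net = [pts[0]]
for p in pts[1:]:
    # accept p only if it keeps the separation distance q_X >= delta
    if min(dist(p, x) for x in net) >= delta:
        net.append(p)
net = np.array(net)

# Maximality: every rejected point lies within delta of the net, so the
# fill distance h_{X,M} (over the finite sample standing in for M) is < delta.
fill = max(min(dist(p, x) for x in net) for p in pts)
sep = min(dist(net[i], net[j])
          for i in range(len(net)) for j in range(i + 1, len(net)))
print(sep, fill)
```

By construction \(q_{X_\delta }\ge \delta \) and \(h_{X_\delta ,M}<\delta \), matching the two-sided relation (49) between separation and fill distance for maximal nets.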
It turns out that maximal nets are norming sets for the spectral spaces (29). Precisely, we have the following result from [10, Theorem 4.2].
Theorem 2
There is a constant \(\tilde{C}>0\) such that for \(0<\gamma <1\) with
with the constant \(\alpha _H\) from (14), the following holds: For \(R \ge b\) set \(\delta :=\frac{\gamma }{R}\) and consider a maximal \(\delta \)-net \(X_{\delta }\subset M\) with associated disjoint cover \(\left\{ A_{n} \right\} _{x_n \in X_{\delta }} \). Then, for every \(f_{R} \in \varSigma _{[0,R]}^{p}\) with \(1\le p < \infty \), we have
and for \(p=\infty \), we get
Our main contribution in this section is the following sampling inequality.
Theorem 3
Let \(\mu (M) < \infty \) and \(0<\gamma <1\) be such that condition (50) holds. Let \({r}, p \in [1,\infty ]\), \(q \in ( 0{,} \infty ]\), and \(\sigma > \max \left\{ d\left( {1/r-1/p}\right) _+,\,d/{r}\right\} =\frac{d}{r}\), where \((x)_+:=\max \{x,\ 0\}\) for \(x\in \mathbb {R}\). Suppose that \(r\le p\). Then, there is a constant \(c>0\) with the following property: For every \(\delta \in (0,\gamma /b]\) and for every maximal \(\delta \)-net \(X_{\delta }\) with associated disjoint partition \(A_{n}\) of M, \(x_n\in X_\delta \), we have for all \(f\in \mathcal {B}_{{r},q}^\sigma (M;\mathcal {D})\)
where we set \(\left( \sum _{x_n\in X_{\delta }} \mu (A_{n}) \left| f(x_n) \right| ^{p}\right) ^{1/p} := \sup _{x_n \in X_{\delta }} |f(x_n)|\) if \(p=\infty \).
Proof
We note that, by [10, Proposition 6.7], \(\mathcal {B}_{{r},q}^\sigma (M;\mathcal {D})\subset \mathcal {B}_{\infty ,\infty }^{\tau }(M;\mathcal {D})\) with \(\tau =\sigma -d/{r}>0\), and thus, by [10, Proposition 6.4(b)], every element of \(\mathcal {B}_{r,q}^\sigma (M;\mathcal {D})\) has a continuous representative, cf. also the proof of Theorem 2. Therefore, the last term in (53) is well defined. We proceed similarly to the proof of classical sampling inequalities (see, for example, [19, 41, 61]). Set \(R:=\gamma /\delta \ge b\). Let \(f\in \mathcal {B}_{r,q}^\sigma (M;\mathcal {D})\) and \(f_R\in \varSigma _{[0,R]}^{p}\) be arbitrary. Then, for every maximal \(\delta \)-net \(X_\delta \) with associated partition \(A_n\), \(x_n\in X_\delta \), we have for \(p<\infty \) by (51) and with \(\sum _{x_n\in X_\delta }\mu (A_n)=\mu (M)\),
and similarly, using (52),
We will show in Theorem 4 below that for every admissible p, \(r\), q and \(\sigma \), there is \(C>0\) such that for every \(f\in \mathcal {B}^\sigma _{r,q}(M;\mathcal {D})\), we have
which finishes the proof. \(\square \)
Remark 4
Note that for \(p=\infty \), we obtain in Theorem 4 the anticipated rate \(\delta ^{\sigma -d/r}\).Footnote 2
Corollary 1
In the case \(p=\infty \), we can skip the assumption \(\mu (M)<\infty \) in Theorem 3 and we get that there is a constant \(c>0\) with the following property: For every \(\delta \in (0,\gamma /b]\) and for every maximal \(\delta \)-net \(X_{\delta }\) with associated disjoint partition \(A_{n}\) of M, \(x_n\in X_\delta \), we have for all \(f\in \mathcal {B}_{r,q}^\sigma (M;\mathcal {D})\)
It remains to prove the best approximation error estimate (56). The following theorem generalizes results from [10, Theorem 3.15].
Theorem 4
Let \(R\ge b>1\). Suppose that \(1\le r,\, p \le \infty \), \(0<q\le \infty \) and \(\sigma > d\left( {1/r-1/p}\right) _+\), where again \((x)_{+}=\max \{x,0\}\). If \(r\le p\), then there is a constant \(c>0\) such that for every \(f\in \mathcal {B}_{r,q}^\sigma \left( M;\mathcal {D}\right) \)
If additionally \(\mu (M)<\infty \), then for all \(1\le p,r\le \infty \), there is a constant such that for every \(f\in \mathcal {B}_{r,q}^\sigma \left( M;\mathcal {D}\right) \)
Proof
We follow the lines of the proof of [10, Theorem 3.15 & Proposition 3.12]. Here we employ different truncation functions \({{\varvec{{\phi }}}}\) and \({{\varvec{{\psi }}}}\), which yield a norm equivalent to the Besov norm corresponding to the reproducing kernel \(K^{(\sigma )}\). Precisely, let \({{\varvec{{\phi }}} \in \mathcal {A}(b,c_1)}\), and set \({{\varvec{{\psi }}}:={\varDelta }{\varvec{{\phi }}}}\), such that \({{\varvec{{\phi }}}}\) and \({{\varvec{{\psi }}}}\) satisfy the assumptions of Definition 2, and \({({\varvec{{\phi }}},{\varvec{{\psi }}})}\) form a partition of unity. Now choose \(L_R\in \mathbb {N}\) such that \(b^{L_R+1}\le R \le b^{L_R+2}\). Note that this is possible since \(R\ge b\). Then, by Lemma 1, we have for all \(f\in {\mathcal {B}_{r,q}^\sigma }(M;\mathcal {D})\)
Since \({{\varvec{{\psi }}}({b^{-\ell }}\sqrt{\mathcal {D}})f\in \varSigma _{[0,b^{\ell +1}]}^p}\) for \(f\in {\mathcal {B}_{r,q}^\sigma }(M;\mathcal {D})\), we have by (27) and (30)
Note that
for every \(m\in \mathbb {N}\). Thus, we can further estimate (recall the choice of \(L_R\))
where the geometric series converges because of the condition \(\sigma > d\left( 1/r-1/p\right) \). This concludes the proof of (58).
If \(\mu (M)<\infty \) and \(r\ge p\), we proceed using observations from [10, Proposition 3.20]. We use Hölder’s inequality (\(1/p=1/r+(r-p)/(pr)\)) to obtain for every \(\ell \in \mathbb {N}\)
and similarly for \({{\varvec{{\phi }}}}\). This estimate replaces the Nikolskii-type inequality (30) in (60), and the rest of the proof follows as above. \(\square \)
4.2 \(L^\infty (M;d\mu )\)-Estimates for the Case \(\mu (M)<\infty \)
We now follow the lines of [19, 61] and derive a generalized polynomial reproduction, where the spectral spaces \( \varSigma _{[0,R]}^{\infty }\) play the role of the classical polynomial spaces. To prove reproduction formulas, we use norming sets (see [24, 59]) and proceed along the lines of [59] (see also [60]). In the case \(\mu (M)<\infty \), we will always work with finite discrete point sets \(X_{N}\subset M\). To stress this fact, we will indicate the number of points with a subscript \(N \in \mathbb {N}\).
Proposition 3
There is a constant \(\tilde{C}>0\) such that for all finite sets \(X_N=\{x_1,\dots ,x_N\}\subset M\) with \(h_{X_N,M}\le \tilde{C}\) and all \(R\le \tilde{C}/h_{X_N,M}\), the sampling operator \(T: \varSigma _{[0,R]}^{\infty }\rightarrow \mathbb {R}^N\), \(f\mapsto (f(x_1),\dots ,f(x_N))^T\), is injective with \(\Vert T^{-1}\Vert \le 2\), where \(\varSigma _{[0,R]}^{\infty }\) and \(\mathbb {R}^{N}\) are equipped with the \(\sup \)-norms.
Proof
Let \(f\in \varSigma _{[0,R]}^{\infty }\) with \(\Vert f\Vert _{L^\infty (M;d\mu )}=1\). Then, there exists \(x^*\in M\) such that \(|f(x^*)|\ge 3/4\). We need to show that there exists \(x_n\in X_N\) with \(|f(x_n)|\ge 1/2\). By definition of the fill distance, there exists \(x_n\in X_N\) such that \(\rho (x_n,x^*)\le h_{X_N,M}\). We get from (31) that
where \(\alpha _{H}\) is from (14) and the last inequality holds if \(\tilde{C}>0\) is chosen small enough. Therefore,
This concludes the argument. \(\square \)
Furthermore, we will use the following result, which is a special case of [60, Theorem 3.4].
Theorem 5
Suppose V is a finite-dimensional normed linear space and let \(\{x_1,\dots ,x_N\}\) be such that \(T:V\rightarrow \mathbb {R}^N\), \(v\mapsto (v(x_1),\dots ,v(x_N))^T\) is injective. Then, for every \({\varphi }\in V^*\), there exists a vector \(u\in \mathbb {R}^N\) such that \({\varphi }\left( v\right) =\sum _{n=1}^Nu_n v(x_n)\) for every \(v\in V\), and \(\left\| u\right\| _{(\mathbb {R}^{N})^*}\le \left\| {\varphi }\right\| _{V^*}\left\| T^{-1}\right\| _{T\left( V\right) \rightarrow V}\).
Combining Theorem 5 and Proposition 3 gives the following result.
Proposition 4
There is a constant \(\tilde{C}>0\) such that for all point sets \(X_N=\{x_1,\dots ,x_N\}\subset M\) with \(h_{X_N,M}\le \tilde{C}\), there exist \(a_n:M\rightarrow \mathbb {R}\), \(n=1,\dots , N\) such that for \(R\le \tilde{C}/h_{X_N,M}\)
-
(i)
\(\sum _{n=1}^Na_n\left( x\right) f_R\left( x_n\right) =f_R\left( x\right) \) for all \(x\in M\) and all \(f_R\in \varSigma _{[0,R]}^{\infty }\); and
-
(ii)
\(\sum _{n=1}^N\left| a_n\left( x\right) \right| \le 2\) for all \(x\in M\).
With Proposition 4 at hand, we can proceed along the lines of the proof of Theorem 3 and obtain finally the following result.
Theorem 6
Let \(1\le {r}\le \infty \), \(0<q\le \infty \) and \(\sigma >d/{r}\). There are C, \(h_0>0\) with the following property: For every set \(X_N=\{x_1,\dots ,x_N\}\subset M\) with \(h_{X_N,M}\le h_0\), we have for all \(f\in \mathcal {B}_{{r},q}^\sigma (M;\mathcal {D})\)
5 Truncation of the Kernel
In this section, we derive several technical estimates concerning the approximation of the kernel \(K^{\left( \sigma \right) }\). Here, we closely follow [7, 8], see also [35]. The estimates are practically relevant, since linear combinations of truncated kernels lie in finite-dimensional spectral spaces. Elements of spectral spaces will take the role of polynomials in classical analysis for Sobolev spaces on Euclidean domains. Furthermore, from now on we assume additionally that \(\sup _{x \in M}\mu (B(x,r))\le C(r){<\infty }\) for all \(r>0\). Note that this is trivially true if \(\mu (M)<\infty \).
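The effect of truncating the kernel series can be quantified in a toy model: on the unit circle with \(\mathcal {D}=-\mathrm {d}^2/\mathrm {d}t^2\) (an assumption made only for illustration), dropping the levels \(\ell >L\) roughly corresponds to discarding the frequencies \(n>b^L\) of a series with weights \((1+n^2)^{-\sigma }\), and the sup-norm of the discarded part decays geometrically in L.

```python
import numpy as np

# Toy circle model (an assumption): kernel modes weighted by (1+n^2)^{-sigma}.
sigma, b = 1.5, 2.0

def tail(L, n_cap=4000):
    # sup-norm bound for the modes discarded at truncation level L
    # (|cos| <= 1 bounds each mode; n_cap approximates the infinite tail)
    n = np.arange(int(b**L) + 1, n_cap + 1)
    return np.sum((1.0 + n**2) ** (-sigma)) / np.pi

tails = [tail(L) for L in range(2, 7)]
ratios = [tails[i] / tails[i + 1] for i in range(4)]
# per-level decay factor approaches b^{2*sigma - d} = 4 (with d = 1 on the circle)
print(tails, ratios)
```

This geometric decay is what allows the coupling of the truncation index L to the data set in the results below.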
Lemma 4
Let \(1\le p\le \infty \) and \(\sigma >d/2\). There is \(C>0\) such that for all N, \(\ell \in \mathbb {N}\), all \(X_{N}=\{x_1,\dots ,x_N\}\subset M\), and all \(a_1,\dots ,a_N\in \mathbb {R}\),
Proof
Recall from (40) that \({b^{-2\sigma \ell }{{\varvec{{s}}}^{(\ell ;\sigma )}}(u)= {\varDelta _{\ell }}{\varvec{{\phi }}}{(u)}{\varvec{{k}}}^{(\sigma )}(u)}\). Hence,
Thus, by the triangle inequality and (26), we obtain
This concludes the argument. \(\square \)
Now, we use the following notation. For \(X_{N}=\left\{ x_{1},\dots ,x_{N} \right\} \subset M\), \(x\in M\) and \(k\in \mathbb {N}_0\), we set
where \(q_{X_N}\) denotes the separation distance of \(X_N\) as defined in (48). We need a combinatorial estimate that we briefly discuss. Note that in the special case of quasi-uniform point sets in \(\mathbb {R}^d\), there is a constant \(C>0\) such that for all \(X_N\subset \mathbb {R}^d\), every \(k\in \mathbb {N}\) and every \(x\in \mathbb {R}^d\), we have
In the general setting, we have by (8) and (11) for all \(X_N\subset M\), every \(k\in \mathbb {N}_0\), every \(x\in M\) and every \(x_n\in X_N\cap \mathcal {A}_k(x)\)
Furthermore, by definition of the separation distance \(q_{X_N}\), we have \(B(x_n,q_{X_N}/2)\cap B(x_m,q_{X_N}/2)=\emptyset \) if \(x_n\ne x_m\in X_N\). Thus, since \(B(x_n,q_{X_N}/2)\subset B(x,(k+2)q_{X_N})\) for every \(x_n\in \mathcal {A}_k(x)\), we have the rough estimate
which implies that
In particular, there is some \(\tau >d+1\), depending only on d, such that there is a constant \(C_1>0\) with
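The gap between the quasi-uniform bound (61) and the rough general estimate can be seen concretely for the integer grid in \(\mathbb {R}^2\): counting the points of the grid in the annuli \(\mathcal {A}_k(x)\) around a fixed center, the counts grow like \((k+2)^{d-1}\), i.e., linearly for \(d=2\), well below a bound of order \((k+2)^d\). The following sketch (grid spacing \(q=1\), center at the origin) is an illustration under these assumptions.

```python
import numpy as np

# Quasi-uniform toy set: integer grid in R^2, separation q = 1 (an assumption).
g = np.arange(-40, 41)
X = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2).astype(float)
r = np.linalg.norm(X, axis=1)  # distances to the center x = origin

# annulus A_k(x) = { y : k*q <= rho(x,y) < (k+1)*q }; count grid points in each
counts = {k: int(np.sum((k <= r) & (r < k + 1))) for k in range(1, 20)}
# in the quasi-uniform case, counts/(k+2) stays bounded: linear growth, d-1 = 1
ratios = [counts[k] / (k + 2) for k in range(1, 20)]
print(min(ratios), max(ratios))
```

The bounded ratios confirm the \((k+2)^{d-1}\) growth for quasi-uniform points; general scattered sets only admit the weaker volume-comparison bound discussed above.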
Next, we will need the following auxiliary lemma, which is in the spirit of [7, Proposition 6.2].
Lemma 5
Let \(b>1\), \(\tau >d+1\) be such that (63) holds, and let \({{\varvec{{e}}}\in \mathcal {G}(b)}\). There are constants C, \(C_1>0\) such that, for all \(X_N\subset M\), all \(\delta <\min \{1,\,q_{X_N}\}\) and all \(x\in M\), we have for the associated integral kernel \({{\varvec{{E}}}_\delta }\) the estimates
The constant C depends only on the parameters of the space, the constant \(c_\tau \) from (24), and the constant \(C_1\) from (63).
Proof
We show the two inequalities separately. For all \(X_N\subset M\), all \(x\in M\) and all \(\delta \le \min \{q_{X_N},\,1\}\), we have by (24), (63) and since \(d/2-\tau {<0}\)
To show the second assertion, we note first that by (62) with \(k=0\), we have
Thus, we obtain by (24) and (65)
which is exactly (64). \(\square \)
For \(g\in \varSigma _{[0,R]}^{p}\) and \(X_{N}=\left\{ x_1,\dots ,x_N\right\} \) set
We will now, under suitable assumptions, derive explicit norm equivalence constants for \(\Vert \cdot \Vert _{\ell ^p(X_{N})}\) and \(\Vert \cdot \Vert _{L^p(M;d\mu )}\) on \(\varSigma _{[0,R]}^{p}\). The following lemma is in the spirit of [8, Theorem 4.3].
Lemma 6
Suppose that \(1\le p\le \infty \), \(R>1\), and let \(\tau >d+1\) be such that (63) holds. Then, there is \(C>0\) such that, for all \(\delta \le 1\), all \(g\in \varSigma _{[0,R/\delta ]}^p\) and all \(X_{N}\subset M\),
Proof
We follow the lines of [8, Theorem 4.3]. Let \({{\varvec{{e}}} \in C_c\left( [0,2R] \right) }\) with \({{\varvec{{e}}}|_{\left[ 0,R \right] }\equiv 1}\). Then, \({g={\varvec{{E}}}(\delta \sqrt{\mathcal {D}})g}\) for all \(g \in \varSigma _{[0,R/\delta ]}^{p}\), and we thus get from Lemma 5
Since \(X_N\subset M\), we have \(\left\| g \right\| _{\ell ^{\infty }\left( X_{N} \right) }\le \left\| g \right\| _{L^{\infty }\left( M;d\mu \right) }\). The assertion then follows by Riesz–Thorin interpolation. \(\square \)
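The norm equivalence of Lemma 6 can be checked numerically in the sup-norm case: for a band-limited function on the circle (a toy stand-in for \(\varSigma _{[0,R/\delta ]}^{\infty }\), an assumption), sampling on a sufficiently fine equispaced set recovers the continuous sup-norm up to a uniform constant, as a Bernstein-type argument predicts.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 16                                  # bandwidth: g lies in Sigma_{[0,n]} (toy model)
c = rng.standard_normal(2 * n + 1)      # random trigonometric polynomial of degree n

def g(t):
    t = np.asarray(t, dtype=float)
    out = c[0] * np.ones_like(t)
    for k in range(1, n + 1):
        out = out + c[2 * k - 1] * np.cos(k * t) + c[2 * k] * np.sin(k * t)
    return out

fine = np.linspace(0.0, 2.0 * np.pi, 4096, endpoint=False)  # proxy for M
X = np.linspace(0.0, 2.0 * np.pi, 8 * n, endpoint=False)    # oversampled node set

cont = np.max(np.abs(g(fine)))          # continuous sup-norm (on a fine grid)
disc = np.max(np.abs(g(X)))             # discrete sup-norm on X
# disc <= cont trivially; oversampling keeps cont <= C * disc with a modest C
print(disc, cont)
```

Here the oversampling factor 8 relative to the bandwidth makes the equivalence constant close to one, mirroring the role of \(\delta \le \min \{1,q_{X_N}\}\) in the lemma.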
Now we are in the position to show that the translates \({{{\varvec{{S}}}^{(\ell ;\sigma )}}\left( \cdot ,x_{n} \right) }\) for points \(x_{n}\in X_{N}\) form a weighted Riesz basis in \(L^{p}\left( M;d\mu \right) \). The following lemma is similar to a result from [23] which holds for a less general class of compact metric measure spaces.
Lemma 7
Suppose that \(b\ge \tilde{b}^{4/3}\) with \(\tilde{b}>1\) from Remark 2 and let \(1\le p,p^{\prime } \le \infty \) such that \(\frac{1}{p}+\frac{1}{p^{\prime }}=1\). There are constants \(c^{\prime },c,C>0\) and \(q_0>0\) such that, for all \(N\in \mathbb {N}\), all \(X_{N} \subset M\) with \(q_{X_N}\le q_0\), all \(\ell \ge \log _{b}\left( c^{\prime } q_{X_{N}}^{-1} \right) \), all \(\mathbf {a}=(a_{1},\dots ,a_{N})^{T}\in \mathbb {R}^N\) and all \(\sigma >0\), there holds
and similarly for \(p=\infty \).
Proof
We prove the two inequalities separately. Recall from (38) that \({{\varvec{{s}}}^{(\ell ;\sigma )}}(u)={{{{\varvec{{w}}}^{(\ell ;\sigma )}}}}(b^{-\ell }u)\). Since \({{{{\varvec{{w}}}^{(\ell ;\sigma )}}}\equiv {{{\varvec{{w}}}^{(2;\sigma )}}}}\) for all \(\ell \ge 2\) (see Lemma 3), we have by Lemma 5 with \(\delta =b^{-\ell }\)
where C is independent of \(\ell \). Furthermore, from (26) we get
Thus, combining (70) and (69), we obtain by Riesz–Thorin interpolation
This finishes the proof of the first inequality.
Since \(\rho (x_n,x_m)\ge q_{X_N}\) for \(x_n\ne x_m\in X_N\), we get with Lemma 5
Furthermore, Lemma 3 implies that \({{{\varvec{{s}}}^{(\ell ;\sigma )}}\ge c \mathbf {1}_{[b^{\ell },\,b^{\ell +3/4}]}}\), and we thus obtain from Remark 2 (note that for \(q_0>0\) small enough, the assumptions are satisfied)
Inserting (72) into (71) yields for \(\ell \ge \log _{b}\left( c^{\prime }q_{X_{N}}^{-1} \right) \)
for some constant \(c^{\prime }>0\) independent of \(\ell \). We now proceed similarly to [7]. Consider the Gramian matrix \({G:=\left( {{\varvec{{S}}}^{(\ell ;\sigma )}}(x_n,x_m)\right) _{n,m=1,\dots ,N}\in \mathbb {R}^{N\times N}}\). Note that G is a symmetric matrix and satisfies \({\sum _{\genfrac{}{}{0.0pt}{}{n=1}{n \ne m}}^{N}\left| {{\varvec{{S}}}^{(\ell ;\sigma )}}\left( x_{m},x_{n} \right) \right| \le \frac{1}{2}{{\varvec{{S}}}^{(\ell ;\sigma )}}\left( x_{m},x_{m} \right) }\) for all \(m=1,\dots ,N\), see (73). By (72) and (10), we have
Hence, it follows from [34, Proposition 6.1] that G is invertible, and
We use (75) and (67) with \(R:=b\) and \(\delta :=b^{-\ell }\) to deduce for \(1\le p<\infty \)
The assertion follows similarly for \(p=\infty \). \(\square \)
Remark 5
Since by (10) and (8) we have \(\mu \left( B\left( y,r\right) \right) \ge cr^d\) for all \(y\in M\) and all \(0<r<r_0\), we can estimate the quantity on the right-hand side of (68) by
If additionally \(\mu \left( B\left( y,r\right) \right) \le Cr^d\) for all \(y\in M\) and \(r>0\), then (68) reduces to
Note that the condition \(\ell \ge \log _b(c^\prime q_{X_N}^{-1})\) in Lemma 7 couples the truncation index L to the sampling points \(X_N\) (see also [19]). Thus, we are in the position to analyze approximation schemes using the properly truncated kernel. This is the first step toward a numerically feasible method for kernels that are not given analytically but merely by an infinite series as in (1). Moreover, since the \({{{\varvec{{S}}}^{(\ell ;\sigma )}}}\) are usually also not analytically given but must be approximated themselves, there is a second step needed where a proper approximation of the truncated kernel is computed. This will be discussed in Sect. 6. We now proceed along the general lines of [7, 19, 40]. We use the following short-hand notation.
Notation 4
For \(\sigma >d/2\), N, \(L\in \mathbb {N}\), \(\mathbf {a}\in \mathbb {R}^N\), and \(X_N=\{x_1,\dots ,x_N\}\subset M\) set
where the last equation follows as in (47), i.e., using Lemma 1 in the last line, we have
Lemma 8
Suppose that \(b\ge \tilde{b}^{4/3}\). Let \(2\sigma >d/p^\prime \), \(1\le p\le \infty \), and \(0<\gamma <1\). Then, there are constants C, \(q_0>0\) such that for all discrete sets \(X_{N}=\left\{ x_{1},\dots ,x_{N} \right\} \subset M\) with separation distance \(q_{X_N}\le q_0\), all \(L \ge {C}\log _{b}\left( Cq^{-1}_{X_{N}} \right) \) and all \(\mathbf {a} \in \mathbb {R}^{N}\), we have
Proof
By definition, we have
If we choose \(C>c'\), we have by Lemma 7, for all \(\ell \ge L+1\)
where in the last step we used (10) and (8) which imply that \(\mu \left( B(x,b^{-\ell }) \right) \ge Cb^{-\ell d}\) for all \(x\in M\). Using (79) to further estimate (78), we obtain
Choose now \(\tilde{\ell }\in \mathbb {N}\) such that \(\log _b(c^\prime q_{X_N}^{-1})\le \tilde{\ell }\le 2\log _b(c^\prime q_{X_N}^{-1})\). Note that this is possible if \(q_0\) is small enough. Then, we obtain by Lemmata 7 and 4, and Remark 5
since \(\mu \left( B(x,cq_{X_N}) \right) \le C\). Using this to further estimate (80), we obtain
if \(L\ge \frac{2\sigma +d/p}{2\sigma -d/p^{\prime }}\log _b(Cq_{X_N}^{-1})\) and if \(q_0>0\) is chosen small enough. \(\square \)
5.1 Interpolation with Truncated Kernel
We now focus on the Hilbert space \(\mathcal {B}^{\sigma }_{2,2}(M;\mathcal {D})\) and suppose that \(\sigma >d/2\). We are interested in the analysis of reconstruction methods based on truncated kernels. To this end, for given data \(f(x_n)\) at some scattered locations \(x_n\in X_N=\{x_1,\dots ,x_N\}\subset M\), we consider trial spaces of the form (cf. also (4))
Note that truncated kernels of the form \({{\varvec{{\phi }}}(b^{-L+1}\sqrt{\mathcal {D}}){\varvec{{K}}}^{(\sigma )}}\) are in general only positive semi-definite, so that the existence of an interpolant from \(\mathcal {L}_{X_N}^{(L)}\) to arbitrary given data is not obvious.
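Interpolation with a truncated kernel can be made concrete in the toy circle model used before (an assumption, not the paper's abstract setting): with \(\mathcal {D}=-\mathrm {d}^2/\mathrm {d}t^2\), a truncated kernel keeping the frequencies \(n\le b^L\) stands in for \(K^{(\sigma ,L)}\). The sketch uses a sharp frequency cutoff rather than the smooth cutoff \({\varvec{{\phi }}}(b^{-L+1}\sqrt{\mathcal {D}})\) of the text, and well-separated nodes, so that the Gramian is positive definite and the interpolant exists.

```python
import numpy as np

# Toy circle model (an assumption): eigenvalues n^2, eigenfunctions 1, cos(nt), sin(nt).
sigma, b, L = 1.5, 2.0, 5
n_max = int(b**L)  # sharp truncation at frequency b^L (a simplification)

def kernel(s, t):
    s, t = np.asarray(s, float)[:, None], np.asarray(t, float)[None, :]
    out = np.ones((s.shape[0], t.shape[1])) / (2.0 * np.pi)
    for n in range(1, n_max + 1):
        # cos(n(s-t)) = cos(ns)cos(nt) + sin(ns)sin(nt)
        out = out + (1.0 + n**2) ** (-sigma) / np.pi * np.cos(n * (s - t))
    return out

rng = np.random.default_rng(1)
# well-separated nodes: equispaced plus a small jitter
x = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False) + 0.1 * rng.standard_normal(12)
target = np.exp(np.cos(x))              # data sampled from a smooth function

G = kernel(x, x)                        # Gramian of the truncated kernel
a = np.linalg.solve(G, target)          # coefficients of the interpolant
resid = np.max(np.abs(G @ a - target))  # interpolation conditions at the nodes
lam_min = np.min(np.linalg.eigvalsh(G))
print(resid, lam_min)
```

Since the truncated space here has dimension \(2\,b^L+1\gg N\), the Gramian stays positive definite; Lemma 9 and Theorem 7 below give the corresponding statement, with the coupling \(L\ge \log _b(c' q_{X_N}^{-1})\), in the general setting.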
Remark 6
Note that \(K^{(\sigma ,L)}\) is the reproducing kernel of \(\left( \varSigma ^{{2}}_{[0,b^{L}]},\left( \cdot ,\cdot \right) _{\mathcal {B}^{\sigma }_{2,2}(M;\mathcal {D})} \right) \).
We follow the general lines of [19] and first derive a lower bound on the truncation index L (depending on the point set \(X_N\)) such that, for arbitrary values, there always exists an interpolant from \(\mathcal {L}_{X_N}^{(L)}\). The following lemma corresponds to [19, Proposition 3.3], but in contrast to the situation considered there, we use smooth truncation functions here. We will use the notation introduced in (76).
Lemma 9
Suppose that \(b\ge \tilde{b}^{4/3}\). There are constants \(C>0\), \(q_0>0\) and \(\kappa >1\) such that, for all N, \(L\in \mathbb {N}\), all \(X_N=\{x_1,\dots ,x_N\}\subset M\) with \(q_{X_N}\le q_0\) and \(L\ge \log _b(Cq_{X_N}^{-1})\), and for all \(\mathbf {a}\in \mathbb {R}^N\), we have
where again \(\Vert \cdot \Vert _{(\mathcal {B}^{\sigma }_{2,2}(M;\mathcal {D}))'}\) denotes the dual norm.
Proof
By Riesz’ representation theorem, we have \(\left\| z \right\| _{\mathcal {B}_{2,2}^\sigma \left( M;\mathcal {D}\right) ^{\prime }}=\left\| {\mathcal {I}}_{\sigma ;\mathbf {a};X_{N}}\right\| _{\mathcal {B}_{2,2}^\sigma \left( M;\mathcal {D}\right) }\) and furthermore \(\left\| z|_{\varSigma _{[0,b^L]}^2}\right\| _{\left( \varSigma _{[0,b^L]}^2\right) ^\prime }=\left\| \mathcal {I}_{\sigma ;\mathbf {a};X_{N}}^{(L)}\right\| _{\mathcal {B}_{2,2}^\sigma \left( M;\mathcal {D}\right) }\). To estimate the Besov-norms, we employ a smoothness shift as in [7]. Precisely, we have (see the proof of Proposition 2)
Note that we have by (34)
and thus there are constants \(c_\sigma \), \(C_\sigma >0\) such that \(c_{\sigma }{\varvec{{k}}}^{(\sigma /2)}(u)\le ({\varvec{{k}}}^{(\sigma )}(u))^{1/2}\le C_{\sigma }{\varvec{{k}}}^{(\sigma /2)}(u)\) for all \(u\in [0,\infty )\). Thus, we can further estimate the left-hand side in (84)
where the notation \(\sim \) indicates that the two norms are equivalent with norm equivalence constants that depend only on \(\sigma \) and the parameters of the space. Similarly,
Therefore, to prove (83), it suffices to show that there is a constant \(\tilde{\kappa }>1\) such that
By Lemma 8, we have (note that \(2\sigma /2>d/2\) and \(L\ge \log _b(c'q_{X_N}^{-1})\))
with \(\gamma \in (0,1)\) from (77), and thus with \(\tilde{\kappa }:=\frac{1}{1-\gamma }\)
i.e., we obtain (86). This concludes the proof. \(\square \)
Following [19], we now invoke an abstract result from [38, Proposition 3.1].
Proposition 5
Let \(\mathcal {Y}\) be a Banach space, \(\mathcal {V} \subset \mathcal {Y}\) a subspace, and \(\mathcal {Z}^{\prime }\) a finite-dimensional subspace of the dual space \(\mathcal {Y}^{\prime }\). If for every \(z^{\prime } \in \mathcal {Z}^{\prime }\) and some \(\hat{\gamma }>1\) independent of \(z^{\prime }\)
then for any \(y \in \mathcal {Y}\), there exists \(v=v(y) \in \mathcal {V}\) such that v(y) interpolates y on \(\mathcal {Z}^{\prime }\), that is, \(z^{\prime }\left( y \right) =z^{\prime }\left( v(y) \right) \) for all \(z^{\prime } \in \mathcal {Z}^{\prime }\). In addition, v(y) approximates y in the sense that
where \({\text {dist}}_\mathcal {Y}\left( y,\mathcal {V}\right) :=\inf _{v \in \mathcal {V}} \left\| y-v \right\| _{\mathcal {Y}}\).
We now apply Proposition 5 with
and use Lemma 9 to obtain the following result.
Theorem 7
Suppose that \(b\ge \tilde{b}^{4/3}\). There are \(q_0>0\) and \(c'>0\) such that, for every N, \(L\in \mathbb {N}\), \(X_N=\{x_1,\dots ,x_N\}\subset M\) with \(q_{X_N}\le q_0\) and \(L\ge \log _b(c'q_{X_N}^{-1})\), and for every \(f\in \mathcal {B}^{\sigma }_{2,2}(M;\mathcal {D})\), there is an \(f_L\in \varSigma _{[0,b^L]}^2\) satisfying
Note that the result states that there always exists an interpolant which is also a quasi-optimal approximant. We can then follow the lines of [19, Theorem 4.1] and obtain that under the assumptions of Theorem 7, there exists a quasi-optimal interpolant from \(\mathcal {L}_{X_N}^{(L)}\). We state the following stability result.
Proposition 6
Suppose that \(b\ge \tilde{b}^{4/3}\). There exists \(C>0\) with the following property: Let \(q_0>0\) and \(c'>0\) be as in Theorem 7. Then, for every N, \(L\in \mathbb {N}\), \(X_N=\{x_1,\dots ,x_N\}\subset M\) with \(q_{X_N}\le q_0\) and \(L\ge \log _b(c'q_{X_N}^{-1})\), and for every \(f\in \mathcal {B}^{\sigma }_{2,2}(M;\mathcal {D})\) there is \(\mathbf {a}=(a_{1},\dots ,a_{N})^{T}\in \mathbb {R}^N\) such that \(\mathcal {I}_{\sigma ;\mathbf {a};X_N}^{(L)}\) satisfies
Proof
The proof follows as in [19, Theorem 4.1], and we recall it only for completeness. By Theorem 7, there exists \(f_L\in \varSigma _{[0,b^L]}^2\) such that (89) holds. In particular, we can view the data \(f(x_n)\) as generated by \(f_L\). Thus, since \(K^{(\sigma ,L)}\) is the reproducing kernel of \(\varSigma _{[0,b^L]}^2\), there exists a kernel-based interpolant \(\mathcal {I}_{\sigma ;\mathbf {a};X_N}^{(L)}\) such that
The assertion then follows by triangle inequality since \(\Vert f-\mathcal {I}_{\sigma ;\mathbf {a};X_N}^{(L)}\Vert _{\mathcal {B}^\sigma _{2,2}(M;\mathcal {D})}\le \Vert f-f_L\Vert _{\mathcal {B}^\sigma _{2,2}(M;\mathcal {D})}+\Vert f_L-\mathcal {I}_{\sigma ;\mathbf {a};X_N}^{(L)}\Vert _{\mathcal {B}^\sigma _{2,2}(M;\mathcal {D})}\) and \(\Vert f_L-\mathcal {I}_{\sigma ;\mathbf {a};X_N}^{(L)}\Vert _{\mathcal {B}^\sigma _{2,2}(M;\mathcal {D})}\le \Vert f_L\Vert _{\mathcal {B}^\sigma _{2,2}(M;\mathcal {D})}\le \Vert f-f_L\Vert _{\mathcal {B}^\sigma _{2,2}(M;\mathcal {D})}+\Vert f\Vert _{\mathcal {B}^\sigma _{2,2}(M;\mathcal {D})}\). \(\square \)
5.2 Stability of Kernel-Based Methods
In this section, we give two important stability properties for the approximation with trial spaces \(\mathcal {L}_{X_N}\). Both results can be seen as corollaries to Lemma 8 but are, in our opinion, interesting in their own right. The first one is a Bernstein estimate on the finite-dimensional trial space \(\mathcal {L}_{X_N}\) in the spirit of [34, 35, 65]. Such estimates are crucial to the analysis of various kinds of unsymmetric reconstruction methods (see, for example, [47]). We continue to use the notation introduced in (76).
Proposition 7
Suppose that \(b\ge \tilde{b}^{4/3}\). There are constants \(q_0>0\) and \(C>0\) such that, for all discrete point sets \(X_{N}=\left\{ x_{1},\dots ,x_{N} \right\} \subset M\) with \(q_{X_N}\le q_0\) and all \(\mathbf {a} \in \mathbb {R}^{N}\), we have
Proof
We choose \(L\ge \log _b(c'q_{X_N}^{-1})\). From (34) and (86), it follows that
On \(\varSigma _{[0,b^L]}^2\), there is an inverse estimate from [10, Theorem 3.13]. Precisely, it holds
Since \(\mathcal {I}_{\sigma ;\mathbf {a},X_N}^{(L)}\in \varSigma _{[0,b^L]}^2\) we obtain, inserting (91) into (90)
This concludes the proof. \(\square \)
The second result concerns lower bounds on the smallest eigenvalue of Gramian matrices \(K^{\left( \sigma \right) }_{X_{N},X_{N}}=\left( K^{(\sigma )}(x_{n},x_{m}) \right) _{n,m=1,\dots ,N}\). Such bounds are of importance in practical considerations since the condition number of the Gramian matrix is mainly controlled by the smallest eigenvalue. Hence, lower bounds on the smallest eigenvalue imply upper bounds on the condition number of the Gramian \(K^{\left( \sigma \right) }_{X_{N},X_{N}}\).
Proposition 8
Let \(\sigma >d/2\). There are constants \(q_0>0\) and \(C>0\) such that for all sets \(X_N=\{x_1,\dots ,x_N\}\subset M\) with \(q_{X_N}\le q_0\), we have
If \(\mu \left( B\left( y,r\right) \right) \le Cr^d\) for all \(y\in M\) and all \(r>0\), then
Proof
We closely follow the lines of the proof of [8, Theorem 4.5]. Note that we get from (85) that we have for all \(\mathbf {a}\in \mathbb {R}^N\)
Then, by Lemmata 4 and 7 with \(p=2\) and \(\ell \sim \log (c'q_{X_N}^{-1})\), we have using \(Cr^d\le \mu ({B\left( y,r\right) })\le C\)
This concludes the proof since
The proof of (93) works analogously, using in (94) the bound
\(\square \)
5.3 Relation to Complexity Estimates in Statistical Learning Theory
At this point, let us recall the concept of a covering number from statistical learning theory [11, Definition 3.1].
Definition 3
Let M be a metric space and \(\eta >0\). The \(\eta \)-covering number \(\mathcal {N}(M,\eta )\) is defined as
For a reproducing kernel Hilbert space \(\mathcal {H}_{K}(M)\), we denote the norm balls by
and we denote by \(j_{K}:\mathcal {H}_{K}(M) \hookrightarrow C(M)\) the embedding into the continuous functions. A result from [64, Theorem 1], which holds for a compact metric space M equipped with a Borel measure \(\mu \), states that for \(0 < \eta \le \frac{R}{2}\)
where \(Y_{n} \subset M\) with \(n \in \mathbb {N}\) is such that
Here, the quantity
is the so-called power function [51, 60]. If unique interpolation with the kernel is possible, the optimal functions \(a_{y}(x) \) can be computed and are given by the cardinal functions. Since the quantity (99) measures the interpolation error, we can invoke our sampling inequality to obtain with Theorem 6, \(\mathcal {H}_{K}= \mathcal {B}_{2,2}^\sigma (M;\mathcal {D})\) and \(K=K^{(\sigma )}\)
Also by (92), we have
We assume quasi-uniformity, i.e., there is a constant \(c<1\) such that \(ch_{Y_{n},M} \le q_{Y_{n}}\le h_{Y_{n},M}\) holds. Here, we use the notation \(a \cong b\) to indicate that there are constants \(c,C>0\) such that \(ca \le b \le Ca\) and similarly for \(\gtrsim \). For any quasi-uniform set of points \(Y_{n}\) such that \(h_{Y_n,M}^{\sigma -d/2}\cong \frac{\eta }{2R}\), we have
using \(\#Y_{n} \cong h_{Y_{n},M}^{-d}\). Hence, we get
Inserting this into the bound (97), for \(h_{Y_{n},M}\) small enough we arrive at (see Footnote 3)
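The dependence of \(\#Y_{n}\) on \(\eta \) used in this computation follows by eliminating the mesh width from the two relations stated above:
```latex
\[
h_{Y_{n},M} \cong \Big(\frac{\eta }{2R}\Big)^{\frac{1}{\sigma -d/2}}
\quad\text{and}\quad
\#Y_{n} \cong h_{Y_{n},M}^{-d}
\;\Longrightarrow\;
\#Y_{n} \cong \Big(\frac{2R}{\eta }\Big)^{\frac{d}{\sigma -d/2}}
= \Big(\frac{2R}{\eta }\Big)^{\frac{2d}{2\sigma -d}}.
\]
```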
6 Regularized Reconstruction
From now on, we assume that \(\mu (M)<\infty \). Since in this case the spectral spaces \(\varSigma _{[0,R]}^{p}\) are equivalent for all \(1\le p\le \infty \), we skip the parameter p. We consider the reconstruction of functions \(f\in \mathcal {B}^{\sigma }_{2,2}(M;\mathcal {D})\), \(\sigma >d/2\), from given data \(y_n\approx f(x_n)\) at discrete points \(X_N=\{x_1,\dots ,x_N\}\subset M\). An approximant \(\hat{g}^{(\mathbf {y};\alpha ;X_{N})}\) is given as a minimizer of the smoothing-type functional from (2), i.e.,
with a regularization parameter \(\alpha \ge 0\). For the ease of notation, we restrict ourselves to the case of exact data
In this section, we follow the general lines of [47, 48, 61] but also take into account the difficulty that the (truncated) kernel might be available only approximately. If the given data are corrupted by some additive noise, one can use the same machinery outlined below to derive error estimates which are explicit in terms of the noise (see [48]). Also, we restrict ourselves to the spline-smoothing regularization but point out that various kinds of additional regularization terms can be treated analogously (see [47, 48, 57]). For some results in this direction using weak data, we refer to [47].
We shall show by a representer theorem that, for L large enough, a minimizer \(\hat{g}^{(\mathbf {y};\alpha ;X_{N})}\) of (2) can be computed by solving a finite linear system which is built from translates of the truncated kernel. Furthermore, a deterministic error analysis will be derived by means of the sampling inequalities proven in Sect. 4. We shall first consider the (special) case of exact spectral projections; the case of numerically computed, approximate eigenfunctions of \(\mathcal {D}\) will be dealt with afterward.
We consider the spectral spaces \(\varSigma _{[0,b^{L}]}\) [see (29)] as reproducing kernel Hilbert spaces. Precisely, we denote the ordered eigenvalues of \(\mathcal {D}\) by \(0<\lambda _1\le \lambda _2\le \dots \) and the associated orthonormal eigenfunctions by \(\varphi _m^{(\ell )}\), \(m=1,\dots , \dim V_\ell \), \(\ell =1,2,\dots \), where \(V_\ell \) denotes the eigenspace to \(\lambda _\ell \). On \(\varSigma _{[0,b^L]}\), we have the norm (recall that supp \({\tilde{\varvec{\psi }}} {(b^{-\ell }\cdot )}\subset [b^{\ell -1}, b^\ell ]\), see Definition 2) with \(\tilde{\varvec{\psi }}_0 := \tilde{\varvec{\varphi }}\)
For \(\sigma >d/2\), the reproducing kernel of \(\varSigma _{[0,b^L]}\) can be expressed as
We are now in the position to give a representer theorem and a priori error estimates for minimizers of (2).
Theorem 8
Let \(\sigma >d/2\). There are constants \(c^\prime \), C, \(h_0>0\) such that, for all \(X_{N}=\left\{ x_{1},\dots ,x_{N} \right\} \subset M\) with \(h_{X_N,M}\le h_0\), all \(L\in \mathbb {N}\) with \(L \ge \log _{b}\left( c^\prime q^{-1}_{X_{N}} \right) \) and all \(f\in \mathcal {B}^{\sigma }_{2,2}(M;\mathcal {D})\), there exists a minimizer \(\hat{g}^{ (\mathbf {y};\alpha ;X_N)}\) of (2) with data (103) such that
Furthermore, there is the a priori estimate
Proof
By Theorem 7, there exists \(f_L\in \varSigma _{[0,b^L]}\) such that
Thus, we can view the data \(y_n=f(x_n)=f_L(x_n)\) to be generated by a function \(f_L\in \varSigma _{[0,b^L]}\). By a classical representer theorem (see, for example, [52]) applied to the reproducing kernel Hilbert space \(\varSigma _{[0,b^L]}\), there is a solution \( \hat{g}^{(\mathbf {y};\alpha ;X_N)}\in \text{ span }\{ K^{(\sigma , L)}(x_1,\cdot ),\dots , K^{(\sigma ,L)}(x_N,\cdot )\}\) to the optimization problem (102). We, furthermore, get the following stability and consistency estimates: First, we have
which shows that
Second, we get similarly
Therefore, we obtain by Theorem 6 the estimate
The fact that the solution of a possibly infinite-dimensional optimization problem is contained in a finite-dimensional linear space is usually called a representer theorem. Thus, (104) is indeed a representer theorem, and hence a minimizer of (2) can be computed by solving, with \(\mathbf {y}=f|_{X_{N}}\in \mathbb {R}^{N}\), the linear system
for the coefficient vector \({\hat{\mathbf {c}}}\) of \(\hat{g}^{(\mathbf {y};\alpha ;X_{N})}(\cdot )=\sum _{n=1}^N\hat{c}_n K^{(\sigma ,L)}(\cdot , x_n)\). Here, an entry of the Gramian matrix \(K_{X_N,X_N}^{( \sigma ,L)}\in \mathbb {R}^{N\times N}\) is given by
To set up this matrix, one needs to compute the values \(\varphi _m^{(\ell )}(x_i)\) and \(\lambda _\ell \).
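For a concrete discrete instance, this assembly can be sketched with \(\mathcal {D}\) taken as the Laplacian of a path graph. The weights \((1+\lambda )^{-\sigma }\) below are an assumption standing in for the Littlewood–Paley weights above (they correspond to the equivalent Bessel-type norm), and the cutoff plays the role of \(b^{L}\):

```python
import numpy as np

# Sketch under assumptions: D is the Laplacian of a path graph (a
# discrete instance of the framework); the weight (1 + lam)**(-sigma)
# stands in for the paper's Littlewood-Paley weights and corresponds to
# the equivalent Bessel-type norm; "cutoff" plays the role of b**L.
n, sigma, cutoff, alpha = 50, 2.0, 3.0, 1e-3
idx = np.arange(n - 1)
A = np.zeros((n, n))
A[idx, idx + 1] = A[idx + 1, idx] = 1.0        # path-graph adjacency
Lap = np.diag(A.sum(axis=1)) - A               # graph Laplacian = D
lam, phi = np.linalg.eigh(Lap)                 # eigenvalues, ON eigenvectors

keep = lam <= cutoff                           # spectral truncation to [0, b^L]
w = (1.0 + lam[keep]) ** (-sigma)
K = (phi[:, keep] * w) @ phi[:, keep].T        # Gramian of the truncated kernel

y = np.sin(np.linspace(0.0, np.pi, n))         # data y = f|_{X_N}
c_hat = np.linalg.solve(K + alpha * np.eye(n), y)   # regularized linear system
g_hat = K @ c_hat                              # approximant at the nodes
```

Since the kernel is assembled from finitely many computed eigenpairs, any error in \(\lambda _\ell \) and \(\varphi _m^{(\ell )}\) propagates entrywise into the Gramian, which is exactly the situation modeled by \(K^{(\sigma ,L)}_\varepsilon \) below.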
Unfortunately, the eigenfunctions \(\varphi _m^{(\ell )}\) and eigenvalues \(\lambda _\ell \) of \(\mathcal {D}\) are typically not available in closed form but can only be computed numerically up to some error. We therefore assume that there are level-dependent errors \(\varepsilon _\ell \ge 0\) such that a corrupted kernel \(K_{\varepsilon }^{(\sigma ,L)}\) is at hand, for which
for all \( n,m=1,\dots , N\), where some prescribed accuracy \(\varepsilon _{\max }\) is respected, i.e.,
Thus, instead of working with the practically unavailable optimal system (107), we compute an approximant
with coefficients \({\hat{\mathbf {c}}}^{\varepsilon } \in \mathbb {R}^{N}\) given as solution of
where \( ({K}^{\left( \sigma ,L,\varepsilon \right) }_{X_N,X_N})_{n,m}:=K^{(\sigma ,L)}_\varepsilon (x_n,x_m)\) for \(n,m=1,\dots ,N\) and with \(\mathbf {y}=f|_{X_{N}}\in \mathbb {R}^{N}\). We obtain the following deterministic error estimate for this approximant.
Theorem 9
There is a constant \(C>0\) such that, for all sets \(X_{N}=\left\{ x_{1},\dots ,x_{N} \right\} \subset M\), all \(L \ge \log _{b}\left( Cq^{-1}_{X_{N}} \right) \), all \(\alpha >N\varepsilon _{\max }\) and every \(f\in \mathcal {B}^{\sigma }_{2,2}(M;\mathcal {D})\), we have for \(\hat{g}_\varepsilon ^{(\mathbf {y};\alpha ;X_N)}\) given by (109) and (110) with data \(\mathbf {y} \in \mathbb {R}^{N}\) given by (103)
Proof
In view of Theorem 8, it suffices to estimate the error \(\left\| \hat{g}^{(\mathbf {y};\alpha ;X_N)}-\hat{g}^{(\mathbf {y};\alpha ;X_N)}_{\varepsilon } \right\| _{L^{\infty }\left( M;d\mu \right) }\). For that, we proceed in two steps and first estimate the error in the coefficients and deduce subsequently the error in the function values. We employ the following classical result of perturbation theory (see, for example, [62]). Suppose that the matrix \(A\in \mathbb {R}^{N\times N}\) has full rank, and let \(\delta A\in \mathbb {R}^{N\times N}\) be such that \(\Vert A^{-1}\Vert _{\ell ^2(N) \rightarrow \ell ^{2}(N)}\Vert \delta A\Vert _{\ell ^2(N)\rightarrow \ell ^{2}(N)}<1\). For a given vector \(\mathbf {b} \in \mathbb {R}^N\setminus \{0\}\) let the vectors \(\mathbf {z}\), \(\delta \mathbf {z}\in \mathbb {R}^N\) be such that \(A\mathbf {z}=\mathbf {b}\) and \((A+\delta A)(\mathbf {z}+\delta \mathbf {z})=\mathbf {b}\). Then,
We apply this result with \(A:=K_{X_N,X_N}^{(\sigma ,L)}+\alpha {{\mathrm{Id}}}\), \(\delta A:=R_\varepsilon \), \(\delta \mathbf {z}= {\hat{\mathbf {c}}}-{\hat{\mathbf {c}}}^{\varepsilon }\) and \(\mathbf {z}= {\hat{\mathbf {c}}}\). If we denote by \(\lambda _{\max }(A)\) and \(\lambda _{\min }(A)\) the largest and smallest eigenvalues of a matrix A, respectively, then
Since, furthermore, \(\left\| R_{\varepsilon }\right\| _{\ell ^2(N)\rightarrow \ell ^{2}(N)} \le N \varepsilon _{\max }\), the condition \(\Vert A^{-1}\Vert _{\ell ^2(N)\rightarrow \ell ^{2}(N)}\Vert \delta A\Vert _{\ell ^2(N)\rightarrow \ell ^{2}(N)}<1\) is satisfied if \(N\varepsilon _{\max }<\alpha \). Hence, by (112) and the definition of \({\hat{\mathbf {c}}}\), we obtain
We now turn to the approximants and note that by the triangle inequality
We estimate the two terms of the right-hand side of (114) separately. First, by Hölder’s inequality and (113), we have
and second,
Putting together (114), (115) and (116), we obtain
and the assertion then follows by Theorem 8 since
This concludes the proof. \(\square \)
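The perturbation argument of this proof can be tested numerically. In the following sketch, the Gramian is replaced by an arbitrary symmetric positive semi-definite toy matrix (an assumption for illustration only); the asserted inequality is the classical bound with \(\Vert A^{-1}\Vert \le \alpha ^{-1}\) and \(\Vert R_\varepsilon \Vert \le N\varepsilon _{\max }\):

```python
import numpy as np

# Numerical check of the coefficient perturbation bound from the proof.
# K is a toy symmetric positive semi-definite matrix (illustrative
# assumption), so lambda_min(K + alpha*Id) >= alpha.
rng = np.random.default_rng(1)
N, alpha, eps_max = 30, 1e-2, 1e-6
B = rng.standard_normal((N, N))
K = B @ B.T / N                              # toy PSD Gramian
y = rng.standard_normal(N)

A = K + alpha * np.eye(N)                    # regularized system matrix
R = eps_max * (2.0 * rng.random((N, N)) - 1.0)
R = (R + R.T) / 2.0                          # perturbation with |R_ij| <= eps_max
c = np.linalg.solve(A, y)                    # exact coefficients
c_eps = np.linalg.solve(A + R, y)            # perturbed coefficients

t = N * eps_max / alpha                      # upper bound on ||A^{-1}|| * ||R||
bound = t / (1.0 - t) * np.linalg.norm(c)    # classical perturbation bound
err = np.linalg.norm(c - c_eps)
```

The condition \(N\varepsilon _{\max }<\alpha \) of Theorem 9 is exactly what makes `t < 1` and hence the bound finite.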
Note that Theorem 9 provides an error estimate which is explicit in the problem parameters \(\varepsilon _{\max }\) and \(\alpha \). We shall now outline how to choose these parameters in order to ensure asymptotic convergence (see Footnote 4).
Corollary 2
Suppose that the assumptions of Theorem 9 are valid and suppose additionally that for \(N \rightarrow \infty \), it holds that \(1/\sqrt{N}\left\| f|_{X_{N}} \right\| _{\ell ^2(X_{N})} \rightarrow \left\| f\right\| _{L^{2}(M;d\mu )}\). If one chooses for \(h_{X_{N},M}\) small enough
then
6.1 Relation to Error Analysis for Statistical Regression Theory
Here, we closely follow [53], see also [29] for recent applications of spherical needlet kernels in learning theory. The problem can be formulated as follows: We are given a probability measure \(\mathbb {P}\) on \(M \times \mathbb {R}\), where \(M\subset \mathbb {R}^{n}\) is supposed to be a compact set, and aim to recover a function \(f:M \rightarrow \mathbb {R}\) from its sampled values \((x_{i},y_{i}) \in M \times \mathbb {R}\) for \(1\le i \le N\) with \(x_{i}\ne x_{j}\) for \(i \ne j\). Moreover, we assume K to be the reproducing kernel of \(B^{\sigma }_{2,2}(M;\mathcal {D})\) for \(\sigma >d/2\). We define the expressions [53, Eqs. 2.1 & 2.2]
which are called least squares error and empirical error. We denote their respective minimizers by
with the cardinal functions \(a_{x_{i}}\) as in the definition of the power function (99). The fact that \(f^{\star }\) is indeed minimizing the least squares problem follows from [11, Proposition 1.8], where for any \(f:M \rightarrow \mathbb {R}\) being square-integrable the decomposition
with \(\mu \) being the marginal measure on M is shown. Furthermore, we define the target function from [53]
with \(B_{K}(R)\) from (96). We point out that neither \(f^{\star }\) nor \(\mathcal {I}_{y,X_{N}}\) is at our disposal: the first is the unknown global solution, and the second is not available since we cannot evaluate the kernel directly. We therefore use the function \(\hat{g}_{\varepsilon }^{(\mathbf {y};\alpha ;X_N)}\) as an approximation for the latter. The aim is to give an upper bound of the error
The latter error is already bounded in Corollary 2 if M and \(\mu \) satisfy all conditions given there. The first term is the usual error in statistical learning theory. It is decomposed into the sampling error and the approximation error, i.e.,
where the first summand, i.e., the content of the first bracket, is called sampling error and the second term is called approximation error (see [53]). Under the assumption that there is a constant \(V>0\) such that \(\left| f(x)-y\right| \le V\), we have the bound [53, Eq. (2.6)]
for all \(\eta >0\), where we used (101) in the last step. For the approximation error, we can use the following result from interpolation theory, [53, Theorem 3.1]: We have that \(B^{\sigma }_{2,2}(M;\mathcal {D})\) is a dense subspace of \(L^{2}(M;d\mu )\) such that \(\Vert f\Vert _{L^{2}(M;d\mu )} \le c \Vert f\Vert _{B^{\sigma }_{2,2}(M;\mathcal {D})}\). For \(0<\theta <1\), we assume \(f \in \left( L^{2}(M;d\mu ), B^{\sigma }_{2,2}(M;\mathcal {D})\right) _{\theta ,\infty }\). Then, we have [53, Theorem 3.1]
By the equivalence of the Besov spaces \(\mathcal {B}^{\sigma }_{2,2}(M;\mathcal {D})\) to the Bessel potential spaces \({\text {dom}}({{\mathrm{Id}}}+\sqrt{\mathcal {D}}^{{\sigma }})\) and [10, Proposition 6.2 & Theorem 3.16], we get \(\left( L^{2}(M;d\mu ), B^{\sigma /\theta }_{2,2}(M;\mathcal {D})\right) _{\theta ,\infty } \approx B^{\sigma }_{2,\infty }(M;\mathcal {D})\). The final error bounds for (117) now stem from an optimization with respect to R, see [53] for an example of such calculations. For the sake of brevity, we do not give the lengthy computations here.
7 Concluding Remarks
We derived an explicit representation of the reproducing kernel for Besov spaces in the abstract framework of metric measure spaces, see Proposition 2. As a fundamental step toward a priori error estimates for reconstruction schemes in such Besov spaces, we proved sampling inequalities, see Theorem 6. Such sampling inequalities quantify the observation that a reconstruction scheme with small residuals has a small global error as long as it is stable in the Besov space norm. In order to design numerically feasible approximation schemes, we discussed the truncation of the infinite series representation of the kernel in Theorem 7. We gave an explicit condition coupling the truncation parameters to the discrete point set which guarantees well-posedness and quasi-optimality of the reconstruction process. Furthermore, sampling inequalities also lead to a priori error estimates for regularized reconstruction schemes, see Theorem 9. The resulting error bounds are explicit in the regularization parameters, the discretization error and the modeling error, and therefore allow for balancing these terms. Finally, we explained how the well-established machinery of error estimates for statistical learning can be applied here.
Notes
For \(p\ne \infty \), the convergence rate is \(\delta ^{{\sigma -}d/r}\) instead of the anticipated rate \(\delta ^{{\sigma -}d(1/r-1/p)_+}\), where \((x)_{+}=\max \{x,0\}\) (cf. [61] for the case of classical Sobolev spaces). This is most likely due to the fact that we work with the global estimates (51) and (52) instead of local estimates on a cover (see also [41, 61]).
There are also lower bounds for the covering number, cf. [11, Theorem 5.21].
For practical considerations, other parameter choices could be more useful. We do not give the details here but leave those considerations to the reader, since we work in a very general framework and hence do not have a model for the numerical costs of realizing \(\varepsilon _{\max }\). In many specific applications, an estimate for these costs is available and can be employed in an exhaustive cost–benefit discussion.
References
R. A. Adams and J. J. F. Fournier, Sobolev Spaces, Academic Press, Oxford (UK), 2003.
R. Arcangéli, M. C. L. di Silanes, and J. J. Torrens, An extension of a bound for functions in Sobolev spaces, with applications to (m,s)-spline interpolation and smoothing, Numer. Math., 107(2) (2007), pp. 181–211.
R. Arcangéli, M. C. L. di Silanes, and J. J. Torrens, Estimates for functions in Sobolev spaces defined on unbounded domains, J. Approx. Theory, 161 (2009), pp. 198 – 212.
R. Arcangéli, M. C. L. di Silanes, and J. J. Torrens, Extension of sampling inequalities to Sobolev semi-norms of fractional order and derivative data, Numer. Math., 121 (2012), pp. 587–608.
L. Boytsov and B. Naidan, Learning to prune in metric and non-metric spaces, in Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States., C. J. C. Burges, L. Bottou, Z. Ghahramani, and K. Q. Weinberger, eds., 2013, pp. 1574–1582.
T. Bozkaya and M. Ozsoyoglu, Distance-based indexing for high-dimensional metric spaces, in Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, SIGMOD ’97, New York, NY, USA, 1997, ACM, pp. 357–368.
S. Chandrasekaran, K. R. Jayaraman, and H. N. Mhaskar, Minimum Sobolev norm interpolation with trigonometric polynomials on the torus, J. Comput. Phys., 249 (2013), pp. 96–112.
S. Chandrasekaran and H. N. Mhaskar, A construction of linear bounded interpolatory operators on the torus, Preprint. https://arxiv.org/pdf/1011.5448.pdf.
T. Coulhon and A. Grigor’yan, Random walks on graphs with regular volume growth, Geom. Funct. Anal., 8 (1998), pp. 656–701.
T. Coulhon, G. Kerkyacharian, and P. Petrushev, Heat kernel generated frames in the setting of Dirichlet spaces, J. Fourier Anal. Appl., 18 (2012), pp. 995–1066.
F. Cucker and D.-X. Zhou, Learning Theory: An Approximation Theory Viewpoint, Cambridge Monographs on Applied and Computational Mathematics, Cambridge University Press, 2007.
P. C. Curtis, \(n\)-parameter families and best approximation, Pac. J. Math., 9 (1959), pp. 1013–1027.
N. Dunford and J. Schwartz, Linear Operators, Part I: General Theory, Interscience Publishers, New York, 1958.
N. Dyn, F. J. Narcowich, and J. D. Ward, Variational principles and Sobolev-type estimates for generalized interpolation on a Riemannian manifold, Constr. Approx., 15 (1999), pp. 175–208.
P. F. Evangelista, M. J. Embrechts, and B. K. Szymanski, Taming the curse of dimensionality in kernels and novelty detection, in Applied Soft Computing Technologies: The Challenge of Complexity, A. Abraham, B. de Baets, M. Köppen, and B. Nickolay, eds., vol. 34 of Advances in Soft Computing, Springer Berlin Heidelberg, 2006, pp. 425–438.
D. Geller and I. Z. Pesenson, Band-limited localized Parseval frames and Besov spaces on compact homogeneous manifolds, J. Geomet. Anal., 21 (2011), pp. 334–371.
A. Globerson and S. Roweis, Metric learning by collapsing classes, Adv. Neural Inf. Process Syst., 18 (2006), pp. 451–458.
M. Gordina, T. Kumagai, L. Saloff-Coste, and K.-T. Sturm, Heat kernels, stochastic processes and functional inequalities, Oberwolfach Reports, 10 (2013), pp. 1359–1443.
M. Griebel, C. Rieger, and B. Zwicknagl, Multiscale approximation and reproducing kernel Hilbert space methods, SIAM J. Numer. Anal., 53 (2015), pp. 852–873.
A. Grigor’yan, Heat kernels on weighted manifolds and applications, Cont. Math., 398 (2006), pp. 93–191.
A. Grigor’yan, Heat Kernel and Analysis on Manifolds, vol. 47 of AMS/IP Studies in Advanced Mathematics, American Mathematical Society, USA, 2009.
T. Hangelbroek, F. J. Narcowich, C. Rieger, and J. D. Ward, An inverse theorem for compact Lipschitz regions in \(\mathbb{R}^d\) using localized kernel bases. Math. Comp., AMS early view version. doi:10.1090/mcom/3256.
T. Hangelbroek, F. J. Narcowich, X. Sun, and J. D. Ward, Kernel approximation on manifolds II: the \(L_{\infty }\) projector, SIAM J. Math. Anal., 43 (2011), pp. 662–684.
K. Jetter, J. Stöckler, and J. D. Ward, Norming sets and scattered data approximation on spheres, in Approximation Theory IX, Vol. II: Computational Aspects, Vanderbilt University Press, 1998, pp. 137 – 144.
P. Jorgensen and F. Tian, Frames and factorization of graph Laplacians, Opuscula Math., 35 (2015), pp. 293–332.
A. Kaenmaki, J. Lehrback, and M. Vuorinen, Dimensions, Whitney covers, and tubular neighborhoods, Indiana Univ. Math. J., 62 (2013), pp. 1861–1889.
E. Keogh and A. Mueen, Curse of dimensionality, in Encyclopedia of Machine Learning, C. Sammut and G. I. Webb, eds., Springer US, 2010, pp. 257–258.
G. Kerkyacharian and P. Petrushev, Heat kernel based decomposition of spaces of distributions in the framework of Dirichlet spaces, Trans. Amer. Math. Soc., 367 (2015), pp. 121–189.
S. Lin, Nonparametric regression using needlet kernels for spherical data. available at arXiv:1502.04168, 2015.
W. Madych, An estimate for multivariate approximation II, J. Approx. Theory, 142 (2006), pp. 116–128.
M. Maggioni and H. N. Mhaskar, Diffusion polynomial frames on metric measure spaces, Appl. Comp. Harm. Anal., 24 (2008), pp. 329 – 353.
J. Mairhuber, On Haar’s theorem concerning Chebysheff problems having unique solutions, Proc. Amer. Math. Soc., 7 (1956), pp. 609–615.
H. N. Mhaskar, A Markov–Bernstein inequality for Gaussian networks, in Trends and Applications in Constructive Approximation, vol. 151 of Internat. Ser. Numer. Math., Birkhäuser, Basel, 2005, pp. 165–180.
H. N. Mhaskar, Eignets for function approximation on manifolds, Appl. Comp. Harm. Anal., 29 (2010), pp. 63 – 87.
H. N. Mhaskar, F. J. Narcowich, J. Prestin, and J. D. Ward, \(\text{L}^{p}\) Bernstein estimates and approximation by spherical basis functions, Math. Comp., 79 (2010), pp. 1647–1679.
F. J. Narcowich, P. Petrushev, and J. D. Ward, Decomposition of Besov and Triebel-Lizorkin spaces on the sphere, J. Funct. Anal., 238 (2006), pp. 530–564.
F. J. Narcowich, X. Sun, J. D. Ward, and H. Wendland, Direct and inverse Sobolev error estimates for scattered data interpolation via spherical basis functions, Found. Comput. Math., 7 (2007), pp. 369–390.
F. J. Narcowich, J. D. Ward, and H. Wendland, Sobolev error estimates and a Bernstein inequality for scattered data interpolation via radial basis functions, Constr. Approx., 24 (2006), pp. 175–186.
F. J. Narcowich, P. Petrushev, and J. D. Ward, Localized Tight Frames on Spheres, SIAM J. Math. Anal., 38 (2006), pp. 574–594.
F. J. Narcowich and J. D. Ward, Scattered-data interpolation on \(\mathbb{R} ^n\): Error estimates for radial basis and band-limited functions, SIAM J. Math. Anal., 36 (2004), pp. 284–300.
F. J. Narcowich, J. D. Ward, and H. Wendland, Sobolev bounds on functions with scattered zeros, with applications to radial basis function surface fitting, Math. Comp., 74 (2005), pp. 743–763.
R. Opfer, Multiscale kernels, Adv. Comp. Math., 25 (2006), pp. 357–380.
R. Opfer, Tight frame expansions of multiscale reproducing kernels in Sobolev spaces, Appl. Comput. Harm. Anal., 20 (2006), pp. 357–374.
I. Pesenson, A sampling theorem on homogeneous manifolds, Trans. Amer. Math. Soc., 352 (2000), pp. 4257–4269.
P. Petrushev and Y. Xu, Decomposition of spaces of distributions induced by Hermite expansions, J. Fourier Anal. Appl., 14 (2008), pp. 371–414.
C. Rieger, Sampling inequalities and applications, PhD thesis, University of Göttingen, 2008. http://hdl.handle.net/11858/00-1735-0000-0006-B3B9-0.
C. Rieger, R. Schaback, and B. Zwicknagl, Sampling and stability, Mathematical Methods for Curves and Surfaces, vol. 5862 of Lecture Notes in Computer Science, M. Dæhlen, M. Floater, T. Lyche, J. L. Merrien, K. Mørken, and L. L. Schumaker, eds., Springer, Berlin, Heidelberg, 2010, pp. 347–369.
C. Rieger and B. Zwicknagl, Deterministic error analysis of support vector machines and related regularized kernel methods, J. Mach. Learn. Res., 10 (2009), pp. 2115–2132.
C. Rieger and B. Zwicknagl, Sampling inequalities for infinitely smooth functions, with applications to interpolation and machine learning, Adv. Comp. Math., 32(1) (2010), pp. 103–129.
R. Schaback and H. Wendland, Inverse and saturation theorems for radial basis function interpolation, Math. Comp., 71 (2002), pp. 669–681.
R. Schaback and H. Wendland, Kernel techniques: From machine learning to meshless methods, Acta Numerica, 15 (2006), pp. 543–639.
B. Schölkopf and A. J. Smola, Learning with kernels - Support Vector Machines, Regularisation, and Beyond, MIT Press, Cambridge, Massachusetts, 2002.
S. Smale and D.-X. Zhou, Estimating the approximation error in learning theory, Anal. Appl., 01 (2003), pp. 17–41.
B. K. Sriperumbudur, A. Gretton, K. Fukumizu, B. Schölkopf, and G. R. G. Lanckriet, Hilbert space embeddings and metrics on probability measures, J. Mach. Learn. Res., 11 (2010), pp. 1517–1561.
H. Triebel, Interpolation theory, function spaces, differential operators, North-Holland Mathematical Library, Amsterdam-New York, 1978.
H. Triebel, Theory of function spaces, vol. 78 of Monographs in Math., Birkhäuser Verlag, Basel, 1983.
F. Utreras, Convergence rates for multivariate smoothing spline functions, J. Approx. Theory, 52 (1988), pp. 1–27.
J. P. Ward, \(L^p\) Bernstein inequalities and inverse theorems for RBF approximation on \(\mathbb{R}^d\), J. Approx. Theory, 164 (2012), pp. 1577–1593.
H. Wendland, Local polynomial reproduction and moving least squares approximation, IMA J. Numer. Anal., 21 (2001), pp. 285–300.
H. Wendland, Scattered Data Approximation, Cambridge Monographs on Applied and Computational Mathematics, Cambridge University Press, Cambridge, 2005.
H. Wendland and C. Rieger, Approximate interpolation with applications to selecting smoothing parameters, Numer. Math., 101 (2005), pp. 729–748.
J. Werner, Numerische Mathematik 1. Lineare und nichtlineare Gleichungssysteme, Interpolation, numerische Integration, Vieweg, Braunschweig-Wiesbaden, 1992.
W. Zhang, X. Xue, Z. Sun, Y. F. Guo, and H. Lu, Optimal dimensionality of metric space for classification, in ICML ’07: Proceedings of the 24th international conference on Machine learning, New York, NY, USA, 2007, ACM Press, pp. 1135–1142.
D.-X. Zhou, The covering number in learning theory, J. of Complexity, 18 (2002), pp. 739 – 767.
B. Zwicknagl, Mathematical analysis of microstructures and low hysteresis shape memory alloys, PhD thesis, University of Bonn, 2011.
Acknowledgements
We are grateful for the comments and suggestions of the anonymous referees. The authors acknowledge support of the Deutsche Forschungsgemeinschaft (DFG) through the Sonderforschungsbereich 1060: The Mathematics of Emergent Effects.
Communicated by Pencho Petrushev.
Appendix
In this appendix, we give an explicit bound on the constant \(\tilde{b}\) of Remark 2 for the Euclidean space \(\mathbb {R}^d\) with \(d\ge 2\), where we closely follow the lines of the proof of the statement in [10, Lemma 3.19]. We recall the general strategy first and then perform the necessary estimates in our setting. For measurable sets \(\varOmega \subset \mathbb {R}^d\), we set \(|\varOmega |:=\mu (\varOmega )\). In this case, (8) and (9) hold with \(\beta =d\), i.e.,
Furthermore,
Consequently,
and in particular, if \(\mathbf {1}\) denotes the characteristic function,
It is shown in [10, Lemma 3.19] that for \(\tau >0\) and \(r\in \mathbb {N}\), we can set \(\tau \sqrt{t}=2^{r}\) such that
holds, where the constant \(c_4:=\frac{e}{2^d\varGamma (d/2+1)}=:ec'\) can be obtained from (118). Hence, to make the lower bound positive, we need to choose \(r\in \mathbb {N}\) large enough such that
Once we have an appropriate \(r\in \mathbb {N}\) at hand, we follow the argument in [10] and set (see [10, (3.44)])
and choose \(\ell >0\) large enough such that
Then, following the proof of [10, Lemma 3.19], we may set \(\tilde{b}:=2^{\ell }\).
Lemma 10
If we choose for \(d\ge 2\), \(r(d)\in \mathbb {N}\) as the smallest integer such that
then (119) holds.
Proof
We determine \(r\in \mathbb {N}\) such that
since then
To determine \(r:=r(d)\) such that (121) holds, we set
Then, \(h_d'(x)=-2\ln (2)\exp (2x\ln (2))+d\ln (2)+1\), and, since \(h_d(x)\rightarrow -\infty \) as \(x \rightarrow \pm \infty \), \(h_d\) has a unique global maximum at
Note that \(h_d(\tilde{x})>0\). Therefore, we look for \(r\ge \tilde{x}\) such that \(h_d(r)<0\), and then (121) follows. We make the ansatz \(r=:s\tilde{x}\) with \(s\ge 1\) and use the abbreviation \(a_d:=d\ln (2)\) and \(b:=2\ln (2)\). Then,
If \(d=2,\dots ,10\), it suffices to choose \(s=6\). If \(d\ge 10\), we set
Now we set
and estimate very roughly as follows: Since \(\ln (\frac{a_d+1}{b})\le \frac{a_d+1}{b}\) and \(b\le 2\frac{a_d+1}{b}\), we have
since by the choice of s,
Note that for \(d\ge 10\), s(d) is monotonically decreasing. Therefore, \(s(d)\le s(10)\le 7\), and with \(r(d)\ge 7\tilde{x}\) the assertion follows. \(\square \)
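The stationary point \(\tilde{x}\) used in this proof can be checked numerically from the stated derivative \(h_d'(x)=-2\ln (2)\exp (2x\ln (2))+d\ln (2)+1\): with the abbreviations \(a_d=d\ln (2)\) and \(b=2\ln (2)\), solving \(h_d'(x)=0\) gives \(\exp (bx)=(a_d+1)/b\). A quick sketch:

```python
import math

# Check of the stationary point of h_d, using only the derivative
# h_d'(x) = -2 ln(2) exp(2 x ln 2) + d ln 2 + 1 stated in the text.
def h_prime(d, x):
    return -2.0 * math.log(2) * math.exp(2.0 * x * math.log(2)) + d * math.log(2) + 1.0

def x_tilde(d):
    # h_d'(x) = 0  <=>  exp(b x) = (a_d + 1) / b  with a_d = d ln 2, b = 2 ln 2
    a_d, b = d * math.log(2), 2.0 * math.log(2)
    return math.log((a_d + 1.0) / b) / b

for d in range(2, 20):
    assert abs(h_prime(d, x_tilde(d))) < 1e-10   # x_tilde is the critical point
```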
We now turn to (121) and use r as obtained in Lemma 10. By (122), it suffices to choose \(\ell >0\) such that
which holds for
In particular, \(\ell \rightarrow 0\) as \(d\rightarrow \infty \), and thus \(\tilde{b}\rightarrow 1\).
Griebel, M., Rieger, C. & Zwicknagl, B. Regularized Kernel-Based Reconstruction in Generalized Besov Spaces. Found Comput Math 18, 459–508 (2018). https://doi.org/10.1007/s10208-017-9346-z
Keywords
- Reproducing kernels
- A priori error analysis
- Generalized Besov spaces
- Feasible reconstruction schemes
- Spline smoothing