A variable selection procedure for depth measures

Original Paper · AStA Advances in Statistical Analysis

Abstract

We herein introduce variable selection procedures based on depth similarity, aimed at identifying a small subset of variables that can better explain the depth assigned to each point in space. Our study is not intended to deal with the case of high-dimensional data. Identifying noisy and dependent variables helps us understand the underlying distribution of a given dataset. The asymptotic behaviour of the proposed methods and numerical aspects concerning the computational burden are studied. Furthermore, simulations and a real data example are analysed.


Acknowledgements

The authors would like to thank the Centro de Cómputos de Alto Rendimiento (CeCAR) for granting the use of computational resources, which allowed us to perform most of the experiments included in this work, and the anonymous reviewers for their careful reading and insightful comments and suggestions, which improved the manuscript.

Author information

Corresponding author

Correspondence to Agustín Alvarez.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was partially supported by Grant PICT 2018-00740 from ANPCyT, Buenos Aires, Argentina, and by the Spanish Agencia Estatal de Investigación (AEI) and Fondo Europeo de Desarrollo Regional (FEDER), Grant CTM2016-79741-R, for the MICROAIPOLAR project.

Appendix

Proof of Theorem 1.

Proof

The convergence \(I_n(k)\buildrel {\mathrm{a.s.}}\over \longrightarrow I_\infty (k)\) stated in the theorem must be understood as follows: almost surely in \(\omega\), there exists \(n_0=n_0(\omega )\) such that \(I_n(k)=I_\infty (k)\) for all \(n\ge n_0\). Denote the objective function of the population definition (1) by \(h(I)\) and the objective function of the estimation (3) by \(h_n(I)\), i.e.

$$\begin{aligned} h(I)&= {\mathbb {E}}[q(D(I))-q(D({\mathbf {X}},P))]^2,\\ h_n(I)&= \frac{1}{n}\sum _{j=1}^n[q(D_{n,j}(I))-q(D({\mathbf {X}}_j,P_n))]^2. \end{aligned}$$
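To fix ideas, here is a minimal computational sketch of the empirical objective \(h_n(I)\); it is not the authors' implementation. It assumes the Mahalanobis depth as a stand-in for the depth function \(D\) and the empirical rank transform for the quantile function \(q\); the names mahalanobis_depth, q_transform and h_n are hypothetical.

```python
import numpy as np
from itertools import combinations
from scipy.stats import rankdata

def mahalanobis_depth(X):
    # Depth of each row of X with respect to the empirical distribution
    # of X: 1 / (1 + squared Mahalanobis distance to the sample mean).
    diff = X - X.mean(axis=0)
    S_inv = np.linalg.pinv(np.cov(X, rowvar=False))
    return 1.0 / (1.0 + np.einsum('ij,jk,ik->i', diff, S_inv, diff))

def q_transform(depths):
    # Empirical quantile transform q: the fraction of sample points
    # whose depth is at most each observed depth.
    return rankdata(depths) / len(depths)

def h_n(X, I):
    # Empirical objective (3): average squared difference between the
    # quantile-transformed depths on the subset I and on all variables.
    q_full = q_transform(mahalanobis_depth(X))
    q_sub = q_transform(mahalanobis_depth(X[:, list(I)]))
    return np.mean((q_sub - q_full) ** 2)

# Example: exhaustive search over all subsets of size k = 2 of p = 5
# variables, mirroring the minimization in definition (3).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
best_I = min(combinations(range(5), 2), key=lambda I: h_n(X, I))
```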

In order to prove the consistency stated in the theorem, since there are finitely many subsets I and the minimizer in (1) is unique, it is enough to prove that for all I

$$\begin{aligned} h_n(I)\buildrel {\mathrm{a.s.}}\over \longrightarrow h(I). \end{aligned}$$
(11)

Denote \(A_n(I)=(\sum _{j=1}^n[q(D({\mathbf {X}}_j[I],P[I]))-q(D({\mathbf {X}}_j,P))]^2)/n\). By the law of large numbers, \(A_n(I)\buildrel {\mathrm{a.s.}}\over \longrightarrow h(I)\); hence, writing \(h_n(I)=A_n(I)+(h_n(I)-A_n(I))\), to prove (11) it suffices to show that \(|h_n(I)-A_n(I)|\buildrel {\mathrm{a.s.}}\over \longrightarrow 0\). After some calculations, we get

$$\begin{aligned} h_n(I)-A_n(I)&=\frac{1}{n}\sum _{j=1}^n[q^2(D_{n,j}(I))-q^2(D({\mathbf {X}}_j[I],P[I]))]\\&\quad +\frac{1}{n}\sum _{j=1}^n[q^2(D({\mathbf {X}}_j,P_n))-q^2(D({\mathbf {X}}_j,P))]\\&\quad +\frac{1}{n}\sum _{j=1}^n 2[q(D({\mathbf {X}}_j[I],P[I]))q(D({\mathbf {X}}_j,P))-q(D_{n,j}(I))q(D({\mathbf {X}}_j,P_n))]\\&=S_1+S_2+S_3, \end{aligned}$$
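To verify the decomposition, write for brevity \(a_j=q(D_{n,j}(I))\), \(b_j=q(D({\mathbf {X}}_j,P_n))\), \(c_j=q(D({\mathbf {X}}_j[I],P[I]))\) and \(d_j=q(D({\mathbf {X}}_j,P))\); expanding both squares termwise gives

$$\begin{aligned} (a_j-b_j)^2-(c_j-d_j)^2=(a_j^2-c_j^2)+(b_j^2-d_j^2)+2(c_jd_j-a_jb_j), \end{aligned}$$

and averaging over \(j\) yields \(S_1+S_2+S_3\).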

where \(S_i\) is the \(i\)th average in the last display. Let us first show that \(S_2\buildrel {\mathrm{a.s.}}\over \longrightarrow 0\). Denoting \(Y_{n,j}(\omega )=D({\mathbf {X}}_j(\omega ),P_n(\omega ))\) and \(Y_j(\omega )=D({\mathbf {X}}_j(\omega ),P)\), we have \(|Y_{n,j}(\omega )-Y_j(\omega )|\le s_n(\omega )\), with the bound independent of \(j\), and therefore \(Y_{n,j}\buildrel {\mathrm{a.s.}}\over \longrightarrow Y_j\). In general, denote by \(F_{Y}\) the cumulative distribution function of a random variable \(Y\). By basic properties of convergence, \(F_{Y_{n,1}}\buildrel {D}\over \longrightarrow F_{Y_1}\). Since \(F_{Y_1}\) is continuous, the convergence is in fact uniform: \(F_{Y_{n,1}}\buildrel {u}\over \longrightarrow F_{Y_1}\). Then, since \(F_{Y_{n,j}}=F_{Y_{n,1}}\) and \(F_{Y_j}=F_{Y_1}\), we have that

$$\begin{aligned} |q(Y_{n,j})-q(Y_j)|&=|F_{Y_{n,1}}(Y_{n,j})-F_{Y_1}(Y_j)|\\&\le |F_{Y_{n,1}}(Y_{n,j})-F_{Y_1}(Y_{n,j})|+|F_{Y_1}(Y_{n,j})-F_{Y_1}(Y_j)|\\&< \epsilon +\epsilon , \end{aligned}$$

where the first \(\epsilon\) in the last display is due to the uniform convergence \(F_{Y_{n,1}}\buildrel {u}\over \longrightarrow F_{Y_1}\) and holds for \(n\ge n_0\), and the second \(\epsilon\) is due to the uniform continuity of \(F_{Y_1}\) together with the almost sure convergence \(Y_{n,j}\buildrel {\mathrm{a.s.}}\over \longrightarrow Y_j\). In fact, for each fixed \(\omega\) for which (5) holds, there exists \(n_1(\omega )\) such that \(n\ge n_1(\omega )\) implies \(|F_{Y_1}(Y_{n,j}(\omega ))-F_{Y_1}(Y_j(\omega ))|<\epsilon\). Then, given \(\epsilon >0\) and \(\omega\) for which (5) holds, taking \(n_2(\omega )=\max \{n_0,n_1(\omega )\}\), if \(n\ge n_2(\omega )\) then \(|q(D({\mathbf {X}}_j,P_n))-q(D({\mathbf {X}}_j,P))|<2\epsilon\). Now, using that the quantiles take values in [0, 1], that the function \(f(x)=x^2\) satisfies \(0\le f^{\prime }(c)\le 2\) for \(0\le c\le 1\), and the mean value theorem, we obtain \(|q^2(D({\mathbf {X}}_j,P_n))-q^2(D({\mathbf {X}}_j,P))|< 4 \epsilon\) for all \(j\) and \(n\ge n_2(\omega )\). The same bound then holds for the average \(S_2\), which proves that \(S_2\buildrel {\mathrm{a.s.}}\over \longrightarrow 0\).
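Equivalently, the mean value theorem can be avoided: for \(a,b\in [0,1]\),

$$\begin{aligned} |a^2-b^2|=(a+b)\,|a-b|\le 2\,|a-b|, \end{aligned}$$

so the bound \(|q(D({\mathbf {X}}_j,P_n))-q(D({\mathbf {X}}_j,P))|<2\epsilon\) directly gives \(|q^2(D({\mathbf {X}}_j,P_n))-q^2(D({\mathbf {X}}_j,P))|<4\epsilon\).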

The proof that \(S_1\buildrel {\mathrm{a.s.}}\over \longrightarrow 0\) runs along the same lines as that for \(S_2\). Using the arguments for \(S_1\) and \(S_2\), it is straightforward to see that \(S_3\buildrel {\mathrm{a.s.}}\over \longrightarrow 0\) as well, and we have proved that \(h_n(I)\buildrel {\mathrm{a.s.}}\over \longrightarrow h(I)\).\(\square\)
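Explicitly, with the shorthand \(a_j,b_j,c_j,d_j\) above and the uniform bounds \(|a_j-c_j|<2\epsilon\) and \(|b_j-d_j|<2\epsilon\) obtained in the proofs for \(S_1\) and \(S_2\), since all four quantities lie in [0, 1],

$$\begin{aligned} |c_jd_j-a_jb_j|\le |c_j|\,|d_j-b_j|+|b_j|\,|c_j-a_j|<4\epsilon , \end{aligned}$$

so each summand of \(S_3\) is bounded by \(8\epsilon\) for \(n\) large enough.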

Proof of Theorem 2.

Proof

Since the cardinality of the subsets I with \(|I|=k\) is finite for every k, it suffices to prove that the empirical correlation \(\rho _n(\mathbf {D}_n(I),\mathbf {D}_n)\) of (4) converges almost surely to \(\rho (D(I),D({\mathbf {X}},P))\) of (2). Recall that \(\mathbf {D}_n=(D({\mathbf {X}}_1,P_n),\ldots ,D({\mathbf {X}}_n,P_n))\) and \(\mathbf {D}_n(I)=(D({\mathbf {X}}_1[I],P_n[I]),\ldots ,D({\mathbf {X}}_n[I],P_n[I]))\). Denote \(\mathbf {D}^*_n=(D({\mathbf {X}}_1,P),\ldots ,D({\mathbf {X}}_n,P))\) and \(\mathbf {D}^*_n(I)=(D({\mathbf {X}}_1[I],P[I]),\ldots ,D({\mathbf {X}}_n[I],P[I]))\). By the triangle inequality, in order to prove \(|\rho _n(\mathbf {D}_n(I),\mathbf {D}_n)-\rho (D(I),D({\mathbf {X}},P))|\buildrel {\mathrm{a.s.}}\over \longrightarrow 0\), it suffices to prove the following two statements:

$$\begin{aligned}&|\rho _n(\mathbf {D}^*_n(I),\mathbf {D}^*_n)-\rho (D(I),D({\mathbf {X}},P))|\buildrel {\mathrm{a.s.}}\over \longrightarrow 0, \end{aligned}$$
(12)
$$\begin{aligned}&|\rho _n(\mathbf {D}_n(I),\mathbf {D}_n)-\rho _n(\mathbf {D}^*_n(I),\mathbf {D}^*_n)|\buildrel {\mathrm{a.s.}}\over \longrightarrow 0. \end{aligned}$$
(13)
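Indeed, by the triangle inequality,

$$\begin{aligned} |\rho _n(\mathbf {D}_n(I),\mathbf {D}_n)-\rho (D(I),D({\mathbf {X}},P))|&\le |\rho _n(\mathbf {D}_n(I),\mathbf {D}_n)-\rho _n(\mathbf {D}^*_n(I),\mathbf {D}^*_n)|\\&\quad +|\rho _n(\mathbf {D}^*_n(I),\mathbf {D}^*_n)-\rho (D(I),D({\mathbf {X}},P))|. \end{aligned}$$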

We will first prove (12). By definition of the empirical correlation, we have that

$$\begin{aligned} \rho _n(\mathbf {D}^*_n(I),\mathbf {D}^*_n)= \frac{\text{ Cov}_n(\mathbf {D}^*_n(I),\mathbf {D}^*_n)}{\sqrt{\text{ Var}_n(\mathbf {D}^*_n(I))\text{ Var}_n(\mathbf {D}^*_n)}}, \end{aligned}$$
(14)

where \(\text{ Cov}_n\) and \(\text{ Var}_n\) are the empirical covariance and variance, respectively. On the other hand,

$$\begin{aligned} \rho (D(I),D({\mathbf {X}},P))=\frac{\text{ Cov }(D(I),D({\mathbf {X}},P))}{\sqrt{\text{ Var }(D(I))\text{ Var }(D({\mathbf {X}},P))}}. \end{aligned}$$
(15)

Using the strong law of large numbers, we get almost sure convergence of both the numerator and the denominator of (14) to the numerator and the denominator of (15), respectively. Since both \(\text{ Var }(D(I))\) and \(\text{ Var }(D({\mathbf {X}},P))\) are nonzero, (12) holds, as desired.
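Explicitly, the strong law of large numbers applies to each moment involved; for instance, with the \(1/n\) normalization of the empirical covariance (using \(1/(n-1)\) changes nothing asymptotically),

$$\begin{aligned} \text{ Cov}_n(\mathbf {D}^*_n(I),\mathbf {D}^*_n)=\frac{1}{n}\sum _{j=1}^n D({\mathbf {X}}_j[I],P[I])\,D({\mathbf {X}}_j,P)-\left( \frac{1}{n}\sum _{j=1}^n D({\mathbf {X}}_j[I],P[I])\right) \left( \frac{1}{n}\sum _{j=1}^n D({\mathbf {X}}_j,P)\right) \buildrel {\mathrm{a.s.}}\over \longrightarrow \text{ Cov }(D(I),D({\mathbf {X}},P)), \end{aligned}$$

where all moments are finite because depths are bounded.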

We now prove (13). By definition we have:

$$\begin{aligned} \rho _n(\mathbf {D}_n(I),\mathbf {D}_n)= \frac{\text{ Cov}_n(\mathbf {D}_n(I),\mathbf {D}_n)}{\sqrt{\text{ Var}_n(\mathbf {D}_n(I))\text{ Var}_n(\mathbf {D}_n)}}. \end{aligned}$$
(16)

Based on the expressions for \(\rho _n(\mathbf {D}^*_n(I),\mathbf {D}^*_n)\) and \(\rho _n(\mathbf {D}_n(I),\mathbf {D}_n)\) given in (14) and (16), respectively, and since the denominator of (14) has a nonzero limit, to prove (13) it suffices to show that the numerators and the denominators of (14) and (16) approach each other almost surely, i.e.,

$$\begin{aligned}&|\text{ Cov}_n(\mathbf {D}^*_n(I),\mathbf {D}^*_n)-\text{ Cov}_n(\mathbf {D}_n(I),\mathbf {D}_n)|\buildrel {\mathrm{a.s.}}\over \longrightarrow 0 \end{aligned}$$
(17)
$$\begin{aligned}&\left|\sqrt{\text{ Var}_n(\mathbf {D}^*_n(I))\text{ Var}_n(\mathbf {D}^*_n)}-\sqrt{\text{ Var}_n(\mathbf {D}_n(I))\text{ Var}_n(\mathbf {D}_n)} \right|\buildrel {\mathrm{a.s.}}\over \longrightarrow 0. \end{aligned}$$
(18)

We will concentrate on proving (17); the proof of (18) is analogous. First note that, since (5) holds, given any \(\varepsilon >0\) there exists \(n_0=n_0(\varepsilon )\) such that, with probability one, \(\Vert \mathbf {D}_n(I)-\mathbf {D}^*_n(I)\Vert _\infty <\varepsilon\) and \(\Vert \mathbf {D}_n-\mathbf {D}^*_n\Vert _\infty <\varepsilon\) for \(n\ge n_0\). Also note that the coordinates of \(\mathbf {D}_n^*(I)\) and \(\mathbf {D}_n^*\) lie in [0, 1], since they are depths. Elementary calculations show that \(|\text{ Cov}_n(\mathbf {D}^*_n(I),\mathbf {D}^*_n)-\text{ Cov}_n(\mathbf {D}_n(I),\mathbf {D}_n)|\le 4\varepsilon +\varepsilon ^2\) for \(n\ge n_0\), from which (17) follows.\(\square\)
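One elementary way to carry out these calculations (a sketch, whose constants may differ slightly from those above): let \(\mathbf {u}_n=\mathbf {D}_n(I)-\mathbf {D}^*_n(I)\) and \(\mathbf {v}_n=\mathbf {D}_n-\mathbf {D}^*_n\), so \(\Vert \mathbf {u}_n\Vert _\infty ,\Vert \mathbf {v}_n\Vert _\infty <\varepsilon\). By bilinearity of the empirical covariance,

$$\begin{aligned} \text{ Cov}_n(\mathbf {D}_n(I),\mathbf {D}_n)-\text{ Cov}_n(\mathbf {D}^*_n(I),\mathbf {D}^*_n)=\text{ Cov}_n(\mathbf {u}_n,\mathbf {D}^*_n)+\text{ Cov}_n(\mathbf {D}^*_n(I),\mathbf {v}_n)+\text{ Cov}_n(\mathbf {u}_n,\mathbf {v}_n), \end{aligned}$$

and, since the coordinates of \(\mathbf {D}^*_n(I)\) and \(\mathbf {D}^*_n\) lie in [0, 1], the three terms are bounded by \(\varepsilon\), \(\varepsilon\) and \(2\varepsilon ^2\), respectively, giving \(2\varepsilon +2\varepsilon ^2\le 4\varepsilon +\varepsilon ^2\) for \(\varepsilon \le 2\).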

Proof of Corollary 1.

Proof

Since the cardinality of the subsets I is finite, it will suffice to prove that \(\frac{1}{n}\sum _{j=1}^n[q(D_{n,j}(I))-q(D({\mathbf {X}}_j,P_n))]^2+\lambda |I|\) converges almost surely to \({\mathbb {E}}[q(D(I))-q(D({\mathbf {X}},P))]^2+\lambda |I|\). This result holds as a consequence of Theorem 1.\(\square\)

Proof of Lemma 1.

Proof

Let \(\lambda _1<\lambda _2\). To prove that \(K^*(\lambda _2)\le K^*(\lambda _1)\), it suffices to show that \(c_k+\lambda _2 k > c_{K^*(\lambda _1)}+\lambda _2 K^*(\lambda _1)\) for every \(k>K^*(\lambda _1)\).

Let \(k>K^*(\lambda _1)\). By the definition of \(K^*(\lambda _1)\), we have \(c_{K^*(\lambda _1)}+\lambda _1 K^*(\lambda _1)\le c_k +\lambda _1 k\). Adding \((\lambda _2-\lambda _1)K^*(\lambda _1)\) to both sides and using \(k>K^*(\lambda _1)\) gives

$$\begin{aligned} c_{K^*(\lambda _1)}+\lambda _2K^*(\lambda _1)&\le c_k +\lambda _1 k +(\lambda _2-\lambda _1)K^*(\lambda _1)\\&< c_k+\lambda _1 k + (\lambda _2-\lambda _1)K^*(\lambda _1)+(\lambda _2-\lambda _1)(k-K^*(\lambda _1))\\&= c_k+\lambda _2k. \end{aligned}$$

\(\square\)

Proof of Lemma 2.

Proof

The proof has two steps. First, we show that \(K^*(\lambda )\le k_0\) whenever \(\lambda \ge \varepsilon\); then, that \(K^*(\lambda )\ge k_0\) whenever \(\lambda < d/(k_0-1)\).

  • Step 1 We will show that \(K^*(\varepsilon )\le k_0\). Consider \(k>k_0\). From H1, we have that \(c_k=c_{k_0}+(c_{k}-c_{k_0})=c_{k_0}-{\tilde{\varepsilon }}(k-k_0)\), with \(0<{\tilde{\varepsilon }}\le \varepsilon\). Hence,

    $$\begin{aligned} c_k+\varepsilon k&= c_{k_0}+\varepsilon k_0 + (\varepsilon -{\tilde{\varepsilon }})(k-k_0)\\ & \ge c_{k_0}+\varepsilon k_0, \end{aligned}$$

    thus we conclude that \(K^*(\varepsilon )\le k_0\). Moreover, from Lemma 1, \(K^*(\lambda )\le k_0\) for every \(\lambda \ge \varepsilon\).

  • Step 2 We prove that if \(\lambda < d/(k_0-1)\), then \(K^*(\lambda )\ge k_0\).

    We will show that if \(k<k_0\), then \(c_k+\lambda k > c_{k_0}+\lambda k_0\).

    Let \(k<k_0\). Then

    $$\begin{aligned} c_k+\lambda k&\ge (c_{k_0}+d)+ \lambda \\ &> c_{k_0}+\lambda (k_0-1)+\lambda \\ &= c_{k_0}+\lambda k_0, \end{aligned}$$

    where the first inequality uses \(c_k\ge c_{k_0}+d\) for \(k<k_0\) (by hypothesis) together with \(k\ge 1\), and the strict inequality holds since \(\lambda < d/(k_0-1)\), i.e., \(d>\lambda (k_0-1)\). \(\square\)
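As a concrete numerical illustration of Lemma 2 (a hypothetical cost sequence, not taken from the paper), suppose \(k_0=3\) with \(c_1=1\), \(c_2=0.6\), \(c_3=0.1\), \(c_4=0.08\), \(c_5=0.06\), so that \(c_k\ge c_{k_0}+d\) for \(k<k_0\) with \(d=0.5\), and the decrements beyond \(k_0\) are at most \(\varepsilon =0.02\). For any \(\lambda \in [\varepsilon ,d/(k_0-1))=[0.02,0.25)\), say \(\lambda =0.1\),

$$\begin{aligned} c_k+\lambda k=1.1,\ 0.8,\ 0.4,\ 0.48,\ 0.56\quad \text{ for } k=1,\ldots ,5, \end{aligned}$$

which is minimized at \(k=k_0=3\), so \(K^*(\lambda )=k_0\), as the lemma predicts.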


Cite this article

Alvarez, A., Svarc, M. A variable selection procedure for depth measures. AStA Adv Stat Anal 105, 247–271 (2021). https://doi.org/10.1007/s10182-021-00391-y
