Abstract
We focus on the minimization of the least squares loss function under a k-sparse constraint encoded by an \(\ell _0\) pseudo-norm. This is a non-convex, non-continuous and NP-hard problem. Recently, for the penalized form (the sum of the least squares loss function and an \(\ell _0\) penalty term), a relaxation has been introduced with strong guarantees on its minimizers. This relaxation is continuous and, among other favorable properties, does not change the global minimizers. The question that drives this paper is the following: can a continuous relaxation of the k-sparse constrained problem be developed following the same idea and the same steps as for the penalized \(\ell _2-\ell _0\) problem? We compute the convex envelope of the constrained problem when the observation matrix is orthogonal and propose a continuous, non-smooth, non-convex relaxation of the k-sparse constrained functional. We establish equivalences between minimizers of the original and the relaxed problems. We compute the subgradient as well as the proximal operator of the new regularization term, and we propose an algorithm that ensures convergence to a critical point of the k-sparse constrained problem. We apply the algorithm to the problem of single-molecule localization microscopy and compare the results with well-known sparse minimization schemes. The results of the proposed algorithm are as good as the state-of-the-art results for the penalized form, while fixing the constraint constant is usually more intuitive than fixing the penalty parameter.
References
Andersson, F., Carlsson, M., Olsson, C.: Convex envelopes for fixed rank approximation. Optim. Lett. 11(8), 1783–1795 (2017)
Bechensteen, A., Blanc-Féraud, L., Aubert, G.: New \(\ell _2-\ell _0\) algorithm for single-molecule localization microscopy. Biomed. Opt. Express 11(2), 1153–1174 (2020)
Beck, A., Eldar, Y.C.: Sparsity constrained nonlinear optimization: optimality conditions and algorithms. SIAM J. Optim. 23(3), 1480–1509 (2013)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009). https://doi.org/10.1137/080716542
Betzig, E., Patterson, G.H., Sougrat, R., Lindwasser, O.W., Olenych, S., Bonifacino, J.S., Davidson, M.W., Lippincott-Schwartz, J., Hess, H.F.: Imaging intracellular fluorescent proteins at nanometer resolution. Science 313(5793), 1642–1645 (2006). https://doi.org/10.1126/science.1127344
Bi, S., Liu, X., Pan, S.: Exact penalty decomposition method for zero-norm minimization based on MPEC formulation. SIAM J. Sci. Comput. 36(4), A1451–A1477 (2014)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1), 459–494 (2014). https://doi.org/10.1007/s10107-013-0701-9
Bourguignon, S., Ninin, J., Carfantan, H., Mongeau, M.: Exact sparse approximation problems via mixed-integer programming: formulations and computational performance. IEEE Trans. Signal Process. 64(6), 1405–1419 (2016)
Breiman, L.: Better subset regression using the nonnegative garrote. Technometrics 37(4), 373–384 (1995). https://doi.org/10.2307/1269730
Burke, J.V., Curtis, F.E., Lewis, A.S., Overton, M.L., Simões, L.E.: Gradient sampling methods for nonsmooth optimization. arXiv preprint arXiv:1804.11003 (2018)
Candes, E.J., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489–509 (2006). https://doi.org/10.1109/TIT.2005.862083
Candes, E.J., Wakin, M.B., Boyd, S.P.: Enhancing sparsity by reweighted \(\ell _1\) minimization. J. Fourier Anal. Appl. 14(5–6), 877–905 (2008)
Carlsson, M.: On convexification/optimization of functionals including an l2-misfit term. arXiv preprint arXiv:1609.09378 (2016)
Carlsson, M.: On convex envelopes and regularization of non-convex functionals without moving global minima. J. Optim. Theory Appl. 183(1), 66–84 (2019)
Chahid, M.: Echantillonnage compressif appliqué à la microscopie de fluorescence et à la microscopie de super résolution. Ph.D. thesis, Bordeaux (2014)
Clarke, F.H.: Optimization and Nonsmooth Analysis, vol. 5. SIAM, Philadelphia (1990)
Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward–backward splitting. Multiscale Model. Simul. 4(4), 1168–1200 (2005)
Gazagnes, S., Soubies, E., Blanc-Féraud, L.: High density molecule localization for super-resolution microscopy using CEL0 based sparse approximation. In: 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), pp. 28–31. IEEE (2017)
Hess, S.T., Girirajan, T.P.K., Mason, M.D.: Ultra-high resolution imaging by fluorescence photoactivation localization microscopy. Biophys. J. 91(11), 4258–4272 (2006). https://doi.org/10.1529/biophysj.106.091116
Larsson, V., Olsson, C.: Convex low rank approximation. Int. J. Comput. Vis. 120(2), 194–214 (2016). https://doi.org/10.1007/s11263-016-0904-7
Li, H., Lin, Z.: Accelerated proximal gradient methods for nonconvex programming. In: Advances in Neural Information Processing Systems, pp. 379–387 (2015)
Lu, Z., Zhang, Y.: Sparse approximation via penalty decomposition methods. SIAM J. Optim. 23(4), 2448–2478 (2013)
Mallat, S.G., Zhang, Z.: Matching pursuits with time–frequency dictionaries. IEEE Trans. Signal Process. 41(12), 3397–3415 (1993). https://doi.org/10.1109/78.258082
Mordukhovich, B.S., Nam, N.M.: An easy path to convex analysis and applications. Synth. Lect. Math. Stat. 6(2), 1–218 (2013)
Nikolova, M.: Relationship between the optimal solutions of least squares regularized with \(\ell _0\)-norm and constrained by k-sparsity. Appl. Comput. Harmonic Anal. 41(1), 237–265 (2016). https://doi.org/10.1016/j.acha.2015.10.010
Pati, Y.C., Rezaiifar, R., Krishnaprasad, P.S.: Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition. In: Proceedings of 27th Asilomar Conference on Signals, Systems and Computers, Vol. 1, pp. 40–44 (1993). https://doi.org/10.1109/ACSSC.1993.342465
Peleg, D., Meir, R.: A bilinear formulation for vector sparsity optimization. Signal Process. 88(2), 375–389 (2008). https://doi.org/10.1016/j.sigpro.2007.08.015
Pilanci, M., Wainwright, M.J., El Ghaoui, L.: Sparse learning via Boolean relaxations. Math. Program. 151(1), 63–87 (2015)
Rockafellar, R.T., Wets, R.J.B.: Variational Analysis, vol. 317. Springer, Berlin (2009)
Rust, M.J., Bates, M., Zhuang, X.: Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (STORM). Nat. Methods 3(10), 793–796 (2006). https://doi.org/10.1038/nmeth929
Sage, D., Kirshner, H., Pengo, T., Stuurman, N., Min, J., Manley, S., Unser, M.: Quantitative evaluation of software packages for single-molecule localization microscopy. Nat. Methods 12(8), 717 (2015)
Sage, D., Pham, T.A., Babcock, H., Lukes, T., Pengo, T., Chao, J., Velmurugan, R., Herbert, A., Agrawal, A., Colabrese, S., et al.: Super-resolution fight club: assessment of 2d and 3d single-molecule localization microscopy software. Nat. Methods 16(5), 387–395 (2019)
Selesnick, I.: Sparse regularization via convex analysis. IEEE Trans. Signal Process. 65(17), 4481–4494 (2017)
Simon, B.: Trace Ideals and Their Applications, vol. 120. American Mathematical Society, Providence (2005)
Soubies, E., Blanc-Féraud, L., Aubert, G.: A continuous exact \(\ell _0\) penalty (CEL0) for least squares regularized problem. SIAM J. Imaging Sci. 8(3), 1607–1639 (2015)
Soubies, E., Blanc-Féraud, L., Aubert, G.: A unified view of exact continuous penalties for \(\ell _2-\ell _0\) minimization. SIAM J. Optim. 27(3), 2034–2060 (2017)
Soussen, C., Idier, J., Brie, D., Duan, J.: From Bernoulli–Gaussian deconvolution to sparse signal restoration. IEEE Trans. Signal Process. 59(10), 4572–4584 (2011)
Tono, K., Takeda, A., Gotoh, J.: Efficient DC algorithm for constrained sparse optimization. arXiv preprint arXiv:1701.08498 (2017)
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Acknowledgements
The authors would like to thank the anonymous reviewers for their detailed comments and suggestions. This work has been supported by the French government, through a financial Ph.D. allocation from MESRI and through the 3IA Côte d’Azur Investments in the Future project managed by the National Research Agency (ANR) with the Reference Number ANR-19-P3IA-0002.
A Appendix
A.1 Preliminary Results for Lemma 1
Proposition 2 (Reminder) Let \(x\in {\mathbb {R}}^N.\) There exists \(j\in {\mathbb {N}}\) such that \(0<j\le k\) and
where the left inequality is strict if \(j\ne 1\), and where \(x^{\downarrow }_0=+\infty \). Furthermore, \(T_k(x)\) is defined as the smallest integer that satisfies the double inequality.
Proof
First, we suppose that (23) is not true for \(j\in \{1,2,\dots , k-1\}\), i.e., either
or
or both. We prove by induction that if (23) is not true \(\forall j \in \{1,2,\dots ,k-1\}\), then (24) is false and (25) is true. We first investigate the case \(j=1\):
The above inequality is obvious, and we can conclude that for \(j=1\), (24) is false, and thus, (25) must be true, i.e.,
We suppose that for some \(j\in \{1,2,\dots ,k-1\}\), (24) is false and (25) is true, and we investigate \(j+1\).
We get (28) since we have supposed that (25) is true for j. Thus, by induction, we conclude that (24) is false and (25) is true \(\forall j \in \{1,2,\dots ,k-1\}\).
Now, we investigate \(j=k\):
We use the fact that (25) is true for \(j=k-1\) to obtain the above inequality. Hence, (24) is false. By definition \(x^\downarrow _0=+\infty \), so (25) is also false. Therefore, \(T_k(x)=k\) satisfies the double inequality in (23).
To conclude, either \(T_k(x)=k\), or there exists \(j\in \{1,2,\dots ,k-1\}\) such that \(T_k(x)=j\). \(\square \)
Definition 4
Let \(P^{(x)}\in {\mathbb {R}}^{N\times N}\) be a permutation matrix such that \(P^{(x)}x=x^{\downarrow }\). The space \({\mathcal {D}}(x)\) is defined as:
\(z \in {\mathcal {D}}(x)\) means \(\langle z,x\rangle =\langle z^{\downarrow },x^{\downarrow }\rangle \).
Remark 3
\({\mathcal {D}}(x)={\mathcal {D}}(|x|)\), since we have \(|x^{\downarrow }|=|x|^\downarrow \).
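For intuition, the membership test defining \({\mathcal {D}}(x)\) is easy to check numerically. The following is a minimal sketch (assuming, consistently with Remark 3, that \(x^{\downarrow }\) reorders x by non-increasing magnitude while keeping signs; the helper names are ours):

```python
import numpy as np

def sort_abs_desc(v):
    # v^downarrow: reorder v so that |v| is non-increasing (signs kept)
    return v[np.argsort(-np.abs(v), kind="stable")]

def in_D(z, x, tol=1e-12):
    # z is in D(x)  iff  <z, x> equals <z^downarrow, x^downarrow>
    return abs(np.dot(z, x) - np.dot(sort_abs_desc(z), sort_abs_desc(x))) <= tol
```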
Proposition 4
Let \((a,b)\in {\mathbb {R}}_{\ge 0}^N\times {\mathbb {R}}_{\ge 0}^N\). Then,
\(\sum _i a_ib_i \le \sum _i a^{\downarrow }_i b^{\downarrow }_i,\)
and the inequality is strict if \(b\notin {\mathcal {D}}(a)\).
Proof
The inequality is proved in [34, Lemma 1.8], but without the strict-inequality statement.
We assume that a is not of the form \(a=t(1,1,\dots ,1)^T\), i.e., there exist \(i\ne j\) such that \(a_i\ne a_j\). Indeed, if \(a=t(1,1,\dots ,1)^T\), then \(b\in {\mathcal {D}}(a)\) and \(\sum _i a_ib_i =\sum _i a^{\downarrow }_i b^{\downarrow }_i\). Moreover, for simplicity and without loss of generality, we suppose \(a=a^{\downarrow }\). We write
As it is obvious that \(\forall \, j=1,\dots ,N\)
and since \(a_{j-1}-a_j\ge 0\, \forall \, j\), we get
The goal of Proposition 4 is to show that the inequality in (32) is strict if \(b\notin {\mathcal {D}}(a)\).
First, we remark that if \(b\notin {\mathcal {D}}(a)\), then there exists \(j_0\in \{2,3,\dots ,N\}\) such that
By contradiction, if (33) is not true, we have \(\forall \, j\in \{2,3,\dots ,N\}\)
and with (31), we get
From (34), we easily obtain \(\forall \, j,\)
which means \(b^{\downarrow }=b\), i.e., \(b\in {\mathcal {D}}(a)\), which contradicts the hypothesis \(b\notin {\mathcal {D}}(a)\). So there exists \(j_0\) such that (33) is true, and if \(a_{j_0-1}\ne a_{j_0}\)
which, with (30), implies
It remains to examine the case where \(a_{j_0-1}=a_{j_0}\). In this case, we claim there exists \(j_1\in \{1,\dots ,j_0-2\}\) such that
or \(j_1\in \{j_0,\dots ,N\}\) such that
If not, with the same proof as before we get
i.e., we have
where \((x_1,x_2)=(b_{j_0-1},b_{j_0})\) or \((b_{j_0},b_{j_0-1})\). The order does not matter since \(a_{j_0-1}=a_{j_0}\). This implies that \(b\in {\mathcal {D}}(a)\), which contradicts the hypothesis. So (35) or (36) holds, and we get, for example,
and if \(a_{j_1-1}-a_{j_1}\ne 0\) we deduce
If \(a_{j_1-1}=a_{j_1}\), we repeat the same argument as above, and we are sure to find an index \(j_w\) such that \(a_{j_w-1}-a_{j_w}\ne 0\), since we have supposed that \(a\ne t(1,1,\dots ,1)^T\). Therefore, (37) always holds, which concludes the proof. \(\square \)
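As a quick numerical illustration of Proposition 4 (a sketch, not part of the proof; it draws nonnegative vectors and checks the rearrangement inequality):

```python
import numpy as np

rng = np.random.default_rng(1)
a = np.sort(rng.random(6))[::-1]     # nonnegative and sorted: a = a^downarrow
b = rng.random(6)                    # nonnegative, arbitrary order
lhs = np.dot(a, b)                   # sum_i a_i * b_i
rhs = np.dot(a, np.sort(b)[::-1])    # sum_i a_i^downarrow * b_i^downarrow
assert lhs <= rhs + 1e-12            # Proposition 4; strict whenever b is not in D(a)
```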
Proposition 5
[38] \(g:{\mathbb {R}}^N\rightarrow {\mathbb {R}}\) defined as \( g(x)=\frac{1}{2}\sum _{i=1}^k x_i^{\downarrow 2}\) is convex. Furthermore, note that \(g(|x|)=g(x)\).
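In code, g can be evaluated directly from this definition (a sketch; since only squared magnitudes enter, the identity \(g(|x|)=g(x)\) holds by construction):

```python
import numpy as np

def g(x, k):
    # g(x) = 0.5 * (sum of the k largest squared magnitudes of x)
    return 0.5 * np.sum(np.sort(np.abs(x))[::-1][:k] ** 2)
```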
Lemma 4
Let \(f_1:{\mathbb {R}}^N\times {\mathbb {R}}^N \rightarrow {\mathbb {R}}\) be defined as
Let us consider the concave problem
Problem (38) has the following optimal arguments
where \({\hat{z}}\) is defined as
We can remark that \({\hat{z}}={\hat{z}}^\downarrow \), and \(T_k(x)\) is defined in Proposition 2. The value of the supremum problem is
Proof
Problem (38) can be written as:
We remark that finding the supremum for \(z^{\downarrow }_i\), \(i>k\), reduces to finding the supremum of the following term, knowing that \(z^{\downarrow }_i\) is upper bounded by \(z^{\downarrow }_{i-1}\):
Fix \(z^{\downarrow }_k\). The sum in (43) is nonnegative and increasing with respect to \(z^{\downarrow }_j\), and the supremum is attained when \(z^{\downarrow }_j\) reaches its upper bound, i.e., \(z^{\downarrow }_j=z^{\downarrow }_{j-1}\) for all \(j>k\) such that \(|x^{\downarrow }_j|\ne 0\). By induction, \(z^{\downarrow }_j=z^{\downarrow }_{k}\) for all \(j>k\) such that \(|x^{\downarrow }_j|\ne 0\). If there exists \(j>k\) with \(|x^{\downarrow }_j|=0\), then \(z^{\downarrow }_j\) is multiplied by zero and can take any value between its lower and upper bounds, i.e., between 0 and \(z^{\downarrow }_k\). Then, obviously, the supremum argument for (43) is
Further, from (42), we observe that for \(i<k\), the optimal argument is
By recursion, we can write this as
It remains to find the value of \(z^{\downarrow }_k\).
Inserting (44) and (46) into (42), we obtain:
To treat the term \(\max (|x^{\downarrow }_i|,z^{\downarrow }_k)\), we introduce \(j^*(k)= \sup _j \{j: z^{\downarrow }_k\le |x^{\downarrow }_j|\}\) , i.e., \(j^*(k)\) is the largest index such that \(|x^{\downarrow }_{j^*(k)}|\ge z^{\downarrow }_k\), and we define \(x^{\downarrow }_0=+\infty \). Therefore, (47) is rewritten as:
(48) is a concave problem, and the optimality condition yields
We define \(\sum _{i=j^*(k)+1}^k 1 =S\). Then, \(j^*(k)=k-S\) and
Furthermore, \(j^*(k)=k-S\) is the largest index such that \(z^{\downarrow }_k\le |x^{\downarrow }_{j}|\); hence \(|x^{\downarrow }_{k-S}|\ge z^{\downarrow }_k> |x^{\downarrow }_{k-S+1}|\). This translates to
which implies \(S=T_k(x)\) (see Proposition 2). Note that if \(j^*(k)=k\) (which is the same as saying \(T_k(x)=1\)), then the right part of the above inequality is not strict.
Now, assume \(|x^{\downarrow }_{j^*(k)}|= z^{\downarrow }_k\). Then, the max function can take either \(z^{\downarrow }_k\) or \(|x^{\downarrow }_{j^*(k)}|\). If it is the latter, then the expression above is correct. In the former case, \(\max (|x^{\downarrow }_{j^*(k)}|,z^{\downarrow }_k)=z^{\downarrow }_k\). We obtain
Furthermore, we use the fact that \(|x^{\downarrow }_{j^*(k)}|= z^{\downarrow }_k\) and \(j^*(k)=k-T_k(x)\), and develop (51) as:
The unique value of \(z^{\downarrow }_k\) is given by (50). \(\square \)
Lemma 5
Let \(x\in {\mathbb {R}}^N\) and let \(f_2:{\mathbb {R}}^N\times {\mathbb {R}}^N \rightarrow {\mathbb {R}}\) be defined as
The following concave supremum problem
is equivalent to
The optimal arguments are such that \({\hat{y}}_i^\downarrow ={{\,\mathrm{sign}\,}}^*(x_i^{\downarrow }){\hat{z}}_i^\downarrow \).
Proof
Let \({\hat{z}}\in {\mathbb {R}}^N_{\ge 0}\) be the argument of the supremum in (57), let \({\hat{y}}\) be such that \({\hat{y}}_i={{\,\mathrm{sign}\,}}^*(x_i){\hat{z}}_i\), and note that \(f_2(y,x)=-g(y)+\langle y,x\rangle \) with g defined as in Proposition 5 in Appendix A.1. First, \(f_2(y,x)\) is concave in y (see Proposition 5). Furthermore, \(-f_2(y,x)\) is coercive in y; thus, a supremum exists. Further, note that \(g({\hat{y}})=g(|{\hat{y}}|)=g({\hat{z}})\). Then, the following sequence of equalities/inequalities completes the proof:
\(\square \)
A.2 Proof of Lemma 1
Proof
Note that a similar problem has been studied in [1]. The authors, however, work with low-rank approximation and therefore do not face the question of how to permute x, since they work with matrices. First, let \({\mathcal {D}}(x)\) be as defined in Definition 4.
We are interested in
and its arguments, with \(f_2\) defined in Lemma 5. From this lemma, we know that we can instead study
Furthermore, from Lemma 4, we know the expression of \(\sup _{z\in {\mathbb {R}}^N_{\ge 0}}f_1(z,|x|)\) and its arguments. We want to show that \(\sup _{z\in {\mathbb {R}}^N_{\ge 0}} f_2(z,|x|)=\sup _{z\in {\mathbb {R}}^N_{\ge 0}} f_1(z,|x|)\), and to find a connection between the arguments of \(f_2\) and \(f_1\).
First, note that
From [34, Lemma 1.8] and Proposition 4, we have that \(\forall (y,x) \in {\mathbb {R}}_{\ge 0}^N\times {\mathbb {R}}_{\ge 0}^N\):
and the inequality is strict if \(y\notin {\mathcal {D}}(x)\), and thus
Since \({\mathcal {D}}(|x|)={\mathcal {D}}(x)\), we have, \(\forall z \in {\mathcal {D}}(x)\), \(f_2(z,|x|)=f_1(z,|x|)\) and:
Using inequalities (58) and (59) and connecting them to (60), we obtain
The supremum of \(f_2(\cdot ,|x|)\) is thus bounded above and below by the same value; hence, we have
The \(\sup _{z\in {\mathbb {R}}^N_{\ge 0}} f_1(z,|x|)\) is known from Lemma 4:
with the optimal arguments:
where \({\hat{z}}\) is such that:
Now we are interested in the optimal arguments of \(f_2\). Let \(P^{(x)}\) be such that \(P^{(x)}x=x^{\downarrow }\). We define \(z^*=P^{(x)^{-1}} {\hat{z}}\). Evidently, \(P^{(x)}z^*={\hat{z}}\), and since \({\hat{z}}\) is sorted by its absolute value, \(P^{(x)}z^*=z^{* \downarrow }\), and thus, \(z^*\in {\mathcal {D}}(x)\). Furthermore, from Lemma 4, \(z^*\) is an optimal argument of \(f_1\).
We then have \(f_2(z^*,|x|)=f_1(z^*,|x|)=\sup _{z\in {\mathbb {R}}^N_{\ge 0}} f_1(z,|x|)\). Therefore, \(z^*\) is an optimal argument of \(f_2\), since (61) shows the equality between the supremum values of \(f_1\) and \(f_2\).
We have shown that there exists \({\hat{z}}\in \mathop {{\mathrm{arg \, sup}}}\limits _{z\in {\mathbb {R}}^N_{\ge 0}} f_1(z,|x|)\) from which we can construct \(z^*\in {\mathcal {D}}(x)\), an optimal argument of \(f_2\). Now, by contradiction, we show that all optimal arguments of \(f_2\) are in \({\mathcal {D}}(x)\). Assume \({\hat{z}} \in \mathop {{\mathrm{arg \, sup}}}\limits _{z\in {\mathbb {R}}^N_{\ge 0}} f_2(z,|x|)\) and \({\hat{z}}\notin {\mathcal {D}}(x)\). We can construct \(z^*\) such that \(z^{* \downarrow }={\hat{z}}^{\downarrow }\) and \(z^*\in {\mathcal {D}}(x)\). We then have
The last equality is due to \(z^*\in {\mathcal {D}}(x)\), and the last inequality is from Proposition 4. Thus, \({\hat{z}}\) is not an optimal argument for \(f_2\), and all optimal arguments of \(f_2\) must be in \({\mathcal {D}}(x)\).
It therefore suffices to study \(\sup _{z\in {\mathbb {R}}^N_{\ge 0}\cap {\mathcal {D}}(x)}f_2(z,|x|)\), and from (60), we can instead study \(f_1\) and construct all supremum arguments of \(f_2\) from those of \(f_1\):
where \({\hat{z}}\) is defined in (64). \(\square \)
A.3 Calculation of the Proximal Operator of \(\zeta (x)\)
As preliminary results, we state and prove the following two lemmas (Lemmas 6 and 7).
Lemma 6
Let \(j:{\mathbb {R}}\rightarrow {\mathbb {R}}\) be a strictly convex and coercive function, let \(w=\mathop {{\mathrm{arg\, min}}}\limits _t j(t)\), and let us suppose that j is symmetric with respect to its minimum, i.e., \(j(w-t)=j(w+t)\, \forall t \in {\mathbb {R}}\). The problem
with a and b positive, has the following solution:
Proof
Since j is symmetric with respect to its minimum, \(j(w+t_1)\le j(w+t_2)\) \(\forall |t_1|\le |t_2|\). Assume that \(0<w\le b\). We can write \(j(b)=j(w+\alpha )\) with \(\alpha > 0\) and \(j(-b)=j(w+\beta )\) with \(\beta <0\). Since \(w>0\), we have \(|\alpha |<|\beta |\), and thus, the minimum is reached at \(t=b\) on the interval [b, a]. Similar reasoning proves the other cases. \(\square \)
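The proof suggests that the feasible set is \(\{t: b\le |t|\le a\}\) with \(b\le a\); under that assumption (ours, since the displayed problem was not reproduced here), the solution reduces to a signed clamp, sketched below:

```python
import numpy as np

def lemma6_argmin(w, a, b):
    # Minimizer of a strictly convex j, symmetric about its minimizer w,
    # over the set {t : b <= |t| <= a} (our reading of the elided constraint).
    # Symmetry reduces the problem to clamping |w| into [b, a] while
    # keeping the sign of w (with sign taken as +1 when w == 0).
    s = 1.0 if w >= 0 else -1.0
    return s * np.clip(abs(w), b, a)
```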
Lemma 7
Let \(g_i:{\mathbb {R}}\rightarrow {\mathbb {R}}\), \(i\in \{1,\dots ,N\}\), be strictly convex and coercive. Let \(w=(w_1,w_2,\dots ,w_N)^T=\mathop {{\mathrm{arg\, min}}}\limits _{t} \sum _i g_i(t_i)\), i.e., \(w_i=\mathop {{\mathrm{arg\, min}}}\limits _{t_i}g_i(t_i)\). Assume that \(|w_1|\ge |w_2|\ge \dots \ge |w_k|\) and \(|w_{k+1}|\ge |w_{k+2}|\ge \dots \ge |w_N|\). Let each \(g_i\) be symmetric with respect to its minimum. Consider the following problem:
The optimal solution is
where \(\tau \in {\mathbb {R}}\) is in \([\min (|w_k|,|w_{k+1}|),\max (|w_k|,|w_{k+1}|)]\) and is the value that minimizes \(\sum g_i(t_i(\tau ))\).
Proof
Note that this proof is inspired by [20, Theorem 2], with some modifications. First, if \(|w_k|\ge |w_{k+1}|\), then w satisfies the constraints in Problem (66), and thus, w is the optimal solution. If \(|w_k|<|w_{k+1}|\), more work is needed. In both cases, since each \(g_i\) is convex and symmetric with respect to its minimum, we can apply Lemma 6 to each \(t_i\), and the candidates can be limited to the following:
This can be rewritten in a shorter form, first in the case \(i\le k\):
This can be proved by induction. In the case \(i=1\), \(w_1\) is the optimal argument if \(|w_1|\ge |t_2|\); otherwise, \({{\,\mathrm{sign}\,}}^*(w_1)|t_2|\) is optimal. Therefore, \(t_1={{\,\mathrm{sign}\,}}^*(w_1)\max (|w_1|,|t_{2}|)\). Assume that this is true for the ith index.
But \(t_{i}={{\,\mathrm{sign}\,}}^*(w_i)\max (|w_{i}|,|t_{i+1}|)\), which yields \(|t_{i}|\ge |w_{i}|\ge |w_{i+1}|\) and thus, the third case of (70) can be ignored.
Now assume for an \(i\le k\) that \(t_i\ne w_i\). This implies that
Since \(|w_i|\) is non-increasing for \(i\le k\), the inequality \(|t_{i+1}|>|w_{i+1}|\) holds. Furthermore, \(|t_{i+1}|= \max (|w_{i+1}|,|t_{i+2}|) = |t_{i+2}|\). By induction, we have
To simplify the notation, write \(|t_k|=\tau \). The lemma is proved by inserting \(\tau \) in place of \(|t_{i+1}|\) and \(|t_k|\) in Eq. (69).
When \(i> k\), a similar induction gives:
and by adopting the notation \(\tau \), we finish the proof. \(\square \)
Remark 4
Note that if w, defined in Lemma 7, is such that \(|w_k|\ge |w_{k+1}|\), then w is the solution of (66).
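The construction of Lemma 7 translates into a short numerical routine. The sketch below assumes \({{\,\mathrm{sign}\,}}^*(0)=1\), that w is ordered as in the lemma, and that the constraint in (66) couples the two blocks through \(|t_k|\ge |t_{k+1}|\); a generic bounded scalar solver stands in for a closed-form choice of \(\tau \):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def sign_star(v):
    # sign with the convention sign*(0) = 1 (our assumption for the paper's sign*)
    return np.where(v >= 0, 1.0, -1.0)

def lemma7_argmin(g_list, w, k):
    # w: unconstrained minimizers, ordered as in Lemma 7,
    # i.e., |w_1| >= ... >= |w_k| and |w_{k+1}| >= ... >= |w_N|.
    w = np.asarray(w, dtype=float)
    if abs(w[k - 1]) >= abs(w[k]):       # Remark 4: w is already feasible
        return w.copy()
    lo, hi = abs(w[k - 1]), abs(w[k])    # tau lies in [|w_k|, |w_{k+1}|] here

    def t_of(tau):
        t = np.empty_like(w)
        t[:k] = sign_star(w[:k]) * np.maximum(np.abs(w[:k]), tau)  # raise first block to tau
        t[k:] = sign_star(w[k:]) * np.minimum(np.abs(w[k:]), tau)  # cap the tail at tau
        return t

    obj = lambda tau: sum(gi(ti) for gi, ti in zip(g_list, t_of(tau)))
    tau = minimize_scalar(obj, bounds=(lo, hi), method="bounded").x
    return t_of(tau)
```

For example, taking \(g_i(t)=(t-w_i)^2\) gives strictly convex, coercive functions that are symmetric about their minimizers \(w_i\), as required by the lemma.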
Lemma 8
Let \(y\in {\mathbb {R}}^N\). Define \(\zeta : {\mathbb {R}}^N\rightarrow {\mathbb {R}}\) as \(\zeta (x){:}{=}-(\frac{\rho -1}{\rho })\sum _{i=k+1}^N(x_i)^{\downarrow 2}\). The proximal operator of \(\zeta \) is such that
If \(|y^{\downarrow }_k|<\rho |y^{\downarrow }_{k+1}|\), then \(\tau \) is a value in the interval \([|y^{\downarrow }_k|,\rho |y^{\downarrow }_{k+1}|]\), and is defined as
where \(n_1\) and \(n_2\) are two groups of indices such that \(\forall \, i \in n_1,\, y^{\downarrow }_i<\tau \) and \(\forall \, i \in n_2,\, \tau \le \rho |y^{\downarrow }_i|\), and where \(\#n_1\) and \(\#n_2\) denote the sizes of \(n_1\) and \(n_2\). To go from \({\text {prox}_{\zeta (\cdot )}}(y)^{\downarrow y}\) to \(\text {prox}_{\zeta (\cdot )}(y)\), we apply the inverse of the permutation that sorts y into \(y^{\downarrow }\).
Note that we search
We define two functions, \(l_1: {\mathbb {R}}^{N}\times {\mathbb {R}}^{N}\rightarrow {\mathbb {R}} \) and \(l_2: {\mathbb {R}}^{N}\times {\mathbb {R}}^{N}\rightarrow {\mathbb {R}}\).
As in Lemma 1, we can establish relations between \(l_1\) and \(l_2\), and \(l_2\) can be solved using Lemma 7. We omit the proof, as it is similar to that of Lemma 1.
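As a sanity check on the closed form above, one can compare it against a brute-force evaluation of the proximal operator computed directly from the definition of \(\zeta \). The sketch below assumes the standard prox objective \(\frac{1}{2}\Vert x-y\Vert ^2+\zeta (x)\) and \(1<\rho <2\) so that this objective is coercive (this reading is ours); it uses multi-start Nelder–Mead, since the objective is nonsmooth and nonconvex:

```python
import numpy as np
from scipy.optimize import minimize

def zeta(x, k, rho):
    # zeta(x) = -((rho - 1)/rho) * sum_{i=k+1}^{N} (x_i^downarrow)^2
    tail = np.sort(np.abs(x))[::-1][k:]
    return -(rho - 1.0) / rho * np.sum(tail ** 2)

def prox_zeta_bruteforce(y, k, rho, n_starts=20, seed=0):
    # Approximates argmin_x 0.5*||x - y||^2 + zeta(x) by multi-start local search
    rng = np.random.default_rng(seed)
    obj = lambda x: 0.5 * np.sum((x - y) ** 2) + zeta(x, k, rho)
    best = None
    for _ in range(n_starts):
        x0 = y + 0.2 * rng.standard_normal(y.shape)
        res = minimize(obj, x0, method="Nelder-Mead")
        if best is None or res.fun < best.fun:
            best = res
    return best.x
```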
A.4 The Algorithm