Self-concordant inclusions: a unified framework for path-following generalized Newton-type algorithms

  • Full Length Paper
  • Series A

Mathematical Programming

Abstract

We study a class of monotone inclusions, called “self-concordant inclusions”, which covers three fundamental convex optimization formulations as special cases. We develop a new generalized Newton-type framework to solve this inclusion. Our framework subsumes three schemes: full-step, damped-step, and path-following methods as specific instances, while allowing one to use inexact computation to form generalized Newton directions. We prove the local quadratic convergence of both the full-step and damped-step algorithms. Then, we propose a new two-phase inexact path-following scheme for solving this monotone inclusion which possesses an \({\mathcal {O}}(\sqrt{\nu }\log (1/\varepsilon ))\) worst-case iteration-complexity to achieve an \(\varepsilon \)-solution, where \(\nu \) is the barrier parameter and \(\varepsilon \) is a desired accuracy. As byproducts, we customize our scheme to solve three convex problems: the convex–concave saddle-point problem, the nonsmooth constrained convex program, and the nonsmooth convex program with linear constraints. We also provide three numerical examples to illustrate our theory and to compare with existing methods.


References

  1. Auslender, A., Teboulle, M., Ben-Tiba, S.: A logarithmic-quadratic proximal method for variational inequalities. Comput. Optim. Appl. 12(1–3), 31–40 (1999)

  2. Bauschke, H.H., Combettes, P.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd edn. Springer, Berlin (2017)

  3. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

  4. Becker, S., Fadili, M.J.: A quasi-Newton proximal splitting method. In: Proceedings of Neural Information Processing Systems Foundation (NIPS) (2012)

  5. Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. Volume 3 of MPS/SIAM Series on Optimization. SIAM, Philadelphia (2001)

  6. Bonnans, J.F.: Local analysis of Newton-type methods for variational inequalities and nonlinear programming. Appl. Math. Optim. 29, 161–186 (1994)

  7. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)

  8. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)

  9. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)

  10. Combettes, P.L., Pesquet, J.-C.: Proximal splitting methods in signal processing. In: Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pp. 185–212. Springer, Berlin (2011)

  11. Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward–backward splitting. Multiscale Model. Simul. 4, 1168–1200 (2005)

  12. De Luca, T., Facchinei, F., Kanzow, C.: A semismooth equation approach to the solution of nonlinear complementarity problems. Math. Program. 75(3), 407–439 (1996)

  13. Dontchev, A.L., Rockafellar, R.T.: Implicit Functions and Solution Mappings: A View from Variational Analysis. Springer, Berlin (2014)

  14. Eckstein, J., Bertsekas, D.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55, 293–318 (1992)

  15. Esser, J.E.: Primal-dual algorithm for convex models and applications to image restoration, registration and nonlocal inpainting. Ph.D. Thesis, University of California, Los Angeles (2010)

  16. Facchinei, F., Pang, J.-S.: Finite-Dimensional Variational Inequalities and Complementarity Problems, vol. 1-2. Springer, Berlin (2003)

  17. Frank, M., Wolfe, P.: An algorithm for quadratic programming. Nav. Res. Logist. Q. 3, 95–110 (1956)

  18. Friedlander, M., Goh, G.: Efficient evaluation of scaled proximal operators. Electron. Trans. Numer. Anal. 46, 1–22 (2017)

  19. Fukushima, M.: Equivalent differentiable optimization problems and descent methods for asymmetric variational inequality problems. Math. Program. 53, 99–110 (1992)

  20. Goldstein, T., Li, M., Yuan, X., Esser, E., Baraniuk, R.: Adaptive primal-dual hybrid gradient methods for saddle-point problems, pp. 1–15 (2015). arXiv:1305.0546v2

  21. Grant, M., Boyd, S., Ye, Y.: Disciplined convex programming. In: Liberti, L., Maculan, N. (eds.) Global Optimization: From Theory to Implementation, Nonconvex Optimization and Its Applications, pp. 155–210. Springer, Berlin (2006)

  22. Hajek, B., Wu, Y., Xu, J.: Achieving exact cluster recovery threshold via semidefinite programming. IEEE Trans. Inf. Theory 62, 2788–2797 (2016)

  23. Jaggi, M.: Revisiting Frank–Wolfe: projection-free sparse convex optimization. In: JMLR W&CP, vol. 28, no. 1, pp. 427–435 (2013)

  24. Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: Advances in Neural Information Processing Systems (NIPS), pp. 315–323 (2013)

  25. Korpelevich, G.M.: An extragradient method for finding saddle-points and for other problems. Èkon. Mat. Metody 12(4), 747–756 (1976)

  26. Kummer, B.: Newton’s method for non-differentiable functions. Adv. Math. Optim. 45, 114–125 (1988)

  27. Löfberg, J.: YALMIP: a toolbox for modeling and optimization in MATLAB. In: Proceedings of the CACSD Conference, Taipei, Taiwan (2004)

  28. Monteiro, R.D.C., Svaiter, B.F.: Iteration-complexity of a Newton proximal extragradient method for monotone variational inequalities and inclusion problems. SIAM J. Optim. 22(3), 914–935 (2012)

  29. Nemirovski, A., Juditsky, A., Lan, G., Shapiro, A.: Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4), 1574–1609 (2009)

  30. Nemirovskii, A.: Prox-method with rate of convergence \({\cal{O}}(1/t)\) for variational inequalities with Lipschitz continuous monotone operators and smooth convex–concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2004)

  31. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Volume 87 of Applied Optimization. Kluwer Academic Publishers, Dordrecht (2004)

  32. Nesterov, Y.: Dual extrapolation and its applications to solving variational inequalities and related problems. Math. Program. 109(2–3), 319–344 (2007)

  33. Nesterov, Y.: Smoothing technique and its applications in semidefinite optimization. Math. Program. 110(2), 245–259 (2007)

  34. Nesterov, Y.: Gradient methods for minimizing composite objective function. Math. Program. 140(1), 125–161 (2013)

  35. Nesterov, Y., Nemirovski, A.: Interior-Point Polynomial Algorithms in Convex Programming. Society for Industrial Mathematics, Philadelphia (1994)

  36. Nesterov, Y., Todd, M.J.: Self-scaled barriers and interior-point methods for convex programming. Math. Oper. Res. 22(1), 1–42 (1997)

  37. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer Series in Operations Research and Financial Engineering, 2nd edn. Springer, New York (2006)

  38. Pang, J.-S.: A B-differentiable equation-based, globally and locally quadratically convergent algorithm for nonlinear programs, complementarity and variational inequality problems. Math. Program. 51(1), 101–131 (1991)

  39. Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Optim. 1(3), 123–231 (2013)

  40. Qi, L., Sun, J.: A nonsmooth version of Newton’s method. Math. Program. 58, 353–367 (1993)

  41. Ralph, D.: Global convergence of damped Newton’s method for nonsmooth equations via the path search. Math. Oper. Res. 19(2), 352–389 (1994)

  42. Robinson, S.M.: Strongly regular generalized equations. Math. Oper. Res. 5(1), 43–62 (1980)

  43. Robinson, S.M.: Newton’s method for a class of nonsmooth functions. Set Valued Var. Anal. 2, 291–305 (1994)

  44. Rockafellar, R.T.: Convex Analysis. Volume 28 of Princeton Mathematics Series. Princeton University Press, Princeton (1970)

  45. Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Springer, Berlin (1997)

  46. Shefi, R., Teboulle, M.: Rate of convergence analysis of decomposition methods based on the proximal method of multipliers for convex minimization. SIAM J. Optim. 24(1), 269–297 (2014)

  47. Solodov, M.V., Svaiter, B.F.: A hybrid approximate extragradient-proximal point algorithm using the enlargement of a maximal monotone operator. Set Valued Var. Anal. 7(4), 323–345 (1999)

  48. Sturm, J.F.: Using SeDuMi 1.02: a Matlab toolbox for optimization over symmetric cones. Optim. Methods Softw. 11–12, 625–653 (1999)

  49. Su, W., Boyd, S., Candes, E.: A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. In: Advances in Neural Information Processing Systems (NIPS), pp. 2510–2518 (2014)

  50. Toh, K.-C., Todd, M.J., Tütüncü, R.H.: On the implementation and usage of SDPT3—a Matlab software package for semidefinite-quadratic-linear programming. Technical Report 4, NUS Singapore (2010)

  51. Tran-Dinh, Q., Kyrillidis, A., Cevher, V.: An inexact proximal path-following algorithm for constrained convex minimization. SIAM J. Optim. 24(4), 1718–1745 (2014)

  52. Tran-Dinh, Q., Kyrillidis, A., Cevher, V.: A single phase proximal path-following framework. Math. Oper. Res. (2018) (accepted)

  53. Tran-Dinh, Q., Necoara, I., Savorgnan, C., Diehl, M.: An inexact perturbed path-following method for Lagrangian decomposition in large-scale separable convex optimization. SIAM J. Optim. 23(1), 95–125 (2013)

  54. Tseng, P.: Applications of splitting algorithm to decomposition in convex programming and variational inequalities. SIAM J. Control Optim. 29, 119–138 (1991)

  55. Tseng, P.: Alternating projection-proximal methods for convex programming and variational inequalities. SIAM J. Optim. 7(4), 951–965 (1997)

  56. Wen, Z., Goldfarb, D., Yin, W.: Alternating direction augmented Lagrangian methods for semidefinite programming. Math. Program. Comput. 2, 203–230 (2010)

  57. Wen, Z., Yin, W., Zhang, Y.: Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm. Math. Program. Comput. 4(4), 333–361 (2012)

  58. Womersley, R.S., Sun, D., Qi, H.: A feasible semismooth asymptotically Newton method for mixed complementarity problems. Math. Program. 94(1), 167–187 (2002)

  59. Wright, S.J.: Applying new optimization algorithms to model predictive control. In: Kantor, J.C., Garcia, C.E., Carnahan, B. (eds.) Fifth International Conference on Chemical Process Control—CPCV, pp. 147–155. American Institute of Chemical Engineers (1996)

  60. Xiu, N., Zhang, J.: Some recent advances in projection-type methods for variational inequalities. J. Comput. Appl. Math. 152(1), 559–585 (2003)

  61. Yamashita, H., Yabe, H., Harada, K.: A primal-dual interior point method for nonlinear semidefinite programming. Math. Program. 135, 89–121 (2012)

  62. Yang, L., Sun, D., Toh, K.-C.: SDPNAL+: a majorized semismooth Newton-CG augmented Lagrangian method for semidefinite programming with nonnegative constraints. Math. Program. Comput. 7(3), 331–366 (2015)

Acknowledgements

This work was supported in part by NSF Grant No. 1619884, USA.

Author information


Correspondence to Quoc Tran-Dinh.

Appendix: the proofs of technical results

This appendix provides the full proofs of all lemmas and theorems in the main text.

1.1 The proof of Lemma 1: the existence and uniqueness of the solution of (2)

Under Assumption A.1, the operator \(t\nabla {F}(\cdot ) + \mathcal {A}(\cdot )\) is maximally monotone for any \(t > 0\). We use [45, Theorem 12.51] to prove the solution existence of (2).

To this end, let \(\varvec{\omega }\ne \varvec{0}\) be chosen from the horizon cone of \(\mathrm {int}\left( \mathcal {Z}\right) \cap \mathrm {dom}(\mathcal {A})\). We need to find \(\mathbf {z}\in \mathrm {int}\left( \mathcal {Z}\right) \cap \mathrm {dom}(\mathcal {A})\) with \(\mathbf {v}\in t \nabla {F}(\mathbf {z}) + \mathcal {A}(\mathbf {z})\) such that \(\langle \mathbf {v}, \varvec{\omega }\rangle > 0\). By assumption, there exists \(\hat{\mathbf {z}}\in \mathrm {int}\left( \mathcal {Z}\right) \cap \mathrm {dom}(\mathcal {A})\) with \(\hat{\mathbf {a}}\in \mathcal {A}(\hat{\mathbf {z}}) \) such that \(\langle \hat{\mathbf {a}}, \varvec{\omega }\rangle >0\).

First, we show that \(\mathbf {z}_\tau = \hat{\mathbf {z}}+ \tau \varvec{\omega }\) belongs to \(\mathrm {int}\left( \mathcal {Z}\right) \cap \mathrm {dom}(\mathcal {A})\) for any \(\tau >0\). To see this, note that the assumption \(\mathrm {int}\left( \mathcal {Z}\right) \cap \mathrm {dom}(\mathcal {A})\ne \emptyset \) implies that \(\mathrm {int}\left( \mathcal {Z}\right) \cap \mathrm {ri} \ \mathrm {dom}(\mathcal {A})\ne \emptyset \), which implies that the closure of \(\mathrm {int}\left( \mathcal {Z}\right) \cap \mathrm {dom}(\mathcal {A})\) is exactly \(\mathcal {Z}\cap \mathrm {cl}\!\left( \mathrm {dom}(\mathcal {A})\right) \). Choose \(\tau '>\tau \); by definition of the horizon cone, \(\mathbf {z}_{\tau '}\) belongs to the closure of \(\mathrm {int}\left( \mathcal {Z}\right) \cap \mathrm {dom}(\mathcal {A})\), so \(\mathbf {z}_{\tau '}\in \mathcal {Z}\) and \(\mathbf {z}_{\tau '}\in \mathrm {cl}\!\left( \mathrm {dom}(\mathcal {A})\right) \). Since \(\mathbf {z}_{\tau }\) is a convex combination of \(\hat{\mathbf {z}}\) and \(\mathbf {z}_{\tau '}\), it belongs to \(\mathrm {int}\left( \mathcal {Z}\right) \cap \mathrm {dom}(\mathcal {A})\), where we use the assumption that \(\mathrm {dom}(\mathcal {A})\) is either closed or open.

Next, for any \(\mathbf {a}_\tau \in \mathcal {A}(\mathbf {z}_\tau )\), we have

$$\begin{aligned} \langle \mathbf {a}_\tau , \varvec{\omega }\rangle = \langle \mathbf {a}_\tau - \hat{\mathbf {a}}, \varvec{\omega }\rangle + \langle \hat{\mathbf {a}}, \varvec{\omega }\rangle =\langle \mathbf {a}_\tau - \hat{\mathbf {a}}, \tau ^{-1}(\mathbf {z}_\tau -\hat{\mathbf {z}})\rangle + \langle \hat{\mathbf {a}},\varvec{\omega }\rangle \ge \langle \hat{\mathbf {a}},\varvec{\omega }\rangle > 0. \end{aligned}$$

On the other hand, \(\langle t \nabla {F}(\mathbf {z}_\tau ), \varvec{\omega }\rangle = \langle t \nabla {F}(\mathbf {z}_\tau ), \tau ^{-1}(\mathbf {z}_\tau -\hat{\mathbf {z}})\rangle \ge -\,\tau ^{-1} t\nu \) by [31, Theorem 4.2.4]. Combining the above two inequalities, we can see that

$$\begin{aligned} \langle t \nabla {F}(\mathbf {z}_\tau ) + \mathbf {a}_\tau , \varvec{\omega }\rangle \ge -\,\tau ^{-1} t\nu + \langle \hat{\mathbf {a}}, \varvec{\omega }\rangle >0 \end{aligned}$$

as long as \(\tau ^{-1}t\nu < \langle \hat{\mathbf {a}},\varvec{\omega }\rangle \). We have thereby verified the condition in [45, Theorem 12.51] needed to guarantee that (2) has a nonempty (and bounded) solution set. Since \(\nabla F\) is strictly monotone, the solution of (2) is unique.

Since \(\mathbf {z}^{\star }_t\) is the solution of (2) and \(\mathbf {z}^{\star }_t \in \mathrm {int}\left( \mathcal {Z}\right) \), we have \(-\,t\nabla {F}(\mathbf {z}^{\star }_t) \in \mathcal {A}(\mathbf {z}^{\star }_t) = \mathcal {A}_{\mathcal {Z}}(\mathbf {z}^{\star }_t)\). Hence, \(\mathrm {dist}_{\mathbf {z}^{\star }_t}(\varvec{0}, \mathcal {A}_{\mathcal {Z}}(\mathbf {z}^{\star }_t)) \le t\left\| \nabla {F}(\mathbf {z}^{\star }_t)\right\| _{\mathbf {z}^{\star }_t}^{*} \le t\sqrt{\nu }\) due to the property of F [31]. Using Definition 4, we have the last conclusion. \(\square \)

1.2 The proof of Lemma 3: approximate solution

First, since \(\bar{\mathbf {z}}_{+}\) is a zero point of \(\widehat{\mathcal {A}}_t(\cdot ;\mathbf {z})\), i.e., \(\varvec{0} \in \widehat{\mathcal {A}}_t(\bar{\mathbf {z}}_{+};\mathbf {z})\), we have \(-\,t\nabla {F}(\mathbf {z}) - t\nabla ^2{F}(\mathbf {z})(\bar{\mathbf {z}}_{+} - \mathbf {z}) \in \mathcal {A}(\bar{\mathbf {z}}_{+})\). Second, since \(\mathbf {z}_{+}\) is a \(\delta \)-solution to (23), by Definition 5, there exists \(\mathbf {e}\) such that \(\mathbf {e}\in t\nabla {F}(\mathbf {z}) + t\nabla ^2{F}(\mathbf {z})(\mathbf {z}_{+} - \mathbf {z}) + \mathcal {A}(\mathbf {z}_{+})\) with \(\Vert \mathbf {e}\Vert _{\mathbf {z}}^{*} \le t\delta \). Combining these two inclusions and using the monotonicity of \(\mathcal {A}\) in Definition 1, we can show that \(\langle t[\nabla {F}(\mathbf {z}) + \nabla ^2{F}(\mathbf {z})(\mathbf {z}_{+} - \mathbf {z}) - \nabla {F}(\mathbf {z}) - \nabla ^2{F}(\mathbf {z})(\bar{\mathbf {z}}_{+} - \mathbf {z})] -\mathbf {e}, \bar{\mathbf {z}}_{+} - \mathbf {z}_{+} \rangle \ge 0\). This inequality leads to

$$\begin{aligned} t\Vert \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\Vert _{\mathbf {z}}^2 \le \langle \mathbf {e}, \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\rangle \le \Vert \mathbf {e}\Vert _{\mathbf {z}}^{*}\Vert \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\Vert _{\mathbf {z}}, \end{aligned}$$
(65)

which implies \(\Vert \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\Vert _{\mathbf {z}} \le t^{-1}\Vert \mathbf {e}\Vert _{\mathbf {z}}^{*}\). Hence, \(\Vert \mathbf {e}\Vert _{\mathbf {z}}^{*}\le t\delta \) implies \(\Vert \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\Vert _{\mathbf {z}} \le \delta \).

Next, since \(\mathbf {z}_{+}\) is a \(\delta \)-approximate solution to (23) at t in the sense of Definition 5, there exists \(\mathbf {e}\in \mathbb {R}^p\) such that

$$\begin{aligned} \mathbf {e}\in t\left[ \nabla {F}(\mathbf {z}) + \nabla ^2{F}(\mathbf {z})(\mathbf {z}_{+} - \mathbf {z})\right] + \mathcal {A}(\mathbf {z}_{+})~~\text {with}~~\left\| \mathbf {e}\right\| _{\mathbf {z}}^{*} \le t\delta . \end{aligned}$$

In addition, we have \(\mathbf {z}_{+} \in \mathrm {int}\left( \mathcal {Z}\right) \) due to Theorem 1, so that \(\mathcal {A}_{\mathcal {Z}}(\mathbf {z}_{+}) = \mathcal {A}(\mathbf {z}_{+})\). Using this relation and the above inclusion, we can show that

$$\begin{aligned} \begin{array}{ll} \mathrm {dist}_{\mathbf {z}}(\varvec{0}, \mathcal {A}_{\mathcal {Z}}(\mathbf {z}_{+})) &{}\le \Vert \mathbf {e}- t\left[ \nabla {F}(\mathbf {z}) + \nabla ^2{F}(\mathbf {z})(\mathbf {z}_{+} - \mathbf {z})\right] \Vert _{\mathbf {z}}^{*} \\ &{}\le \left\| \mathbf {e}\right\| _{\mathbf {z}}^{*} + t\left\| \nabla {F}(\mathbf {z})\right\| _{\mathbf {z}}^{*} + t\Vert \nabla ^2{F}(\mathbf {z})(\mathbf {z}_{+} - \mathbf {z})\Vert _{\mathbf {z}}^{*}\\ &{}\le t \left[ \delta + \sqrt{\nu } + \Vert \nabla ^2{F}(\mathbf {z})(\bar{\mathbf {z}}_{+} - \mathbf {z})\Vert _{\mathbf {z}}^{*} + \Vert \nabla ^2{F}(\mathbf {z})(\bar{\mathbf {z}}_{+} - \mathbf {z}_{+})\Vert _{\mathbf {z}}^{*}\right] \\ &{}\le t \left[ \delta + \sqrt{\nu } + \lambda _{t}(\mathbf {z}) + \left\| \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\right\| _{\mathbf {z}}\right] \\ &{}\le t\left( \sqrt{\nu } + \lambda _{t}(\mathbf {z}) + 2\delta \right) . \end{array} \end{aligned}$$
(66)

Here, we have used \(\left\| \nabla {F}(\mathbf {z})\right\| _{\mathbf {z}}^{*} \le \sqrt{\nu }\), and \(\left\| \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\right\| _{\mathbf {z}} \le \delta \) by the first part of this lemma. Note that if \(\lambda _t(\mathbf {z}) + \delta < 1\), then \(\mathrm {dist}_{\mathbf {z}_{+}}(\varvec{0}, \mathcal {A}_{\mathcal {Z}}(\mathbf {z}_{+})) \le (1-\lambda _t(\mathbf {z}) -\delta )^{-1}\mathrm {dist}_{\mathbf {z}}(\varvec{0}, \mathcal {A}_{\mathcal {Z}}(\mathbf {z}_{+}))\). Combining this inequality and the last estimate, we obtain (25). Finally, if we choose \(t \le (1-\lambda _t(\mathbf {z})-\delta )\left( \sqrt{\nu } + \lambda _{t}(\mathbf {z}) + 2\delta \right) ^{-1}\varepsilon \), then \(\mathrm {dist}_{\mathbf {z}_{+}}(\varvec{0}, \mathcal {A}_{\mathcal {Z}}(\mathbf {z}_{+})) \le \varepsilon \). Hence, \(\mathbf {z}_{+}\) is an \(\varepsilon \)-solution to (1) in the sense of Definition 4. \(\square \)
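
To make this final step concrete, the following sketch evaluates the admissible value of t; it is an illustration only, and the values of \(\nu \), \(\lambda _t(\mathbf {z})\), \(\delta \), and \(\varepsilon \) below are ours, not from the paper.

```python
import math

# Illustrative values (ours, not from the paper): barrier parameter nu,
# proximal-Newton decrement lam = lambda_t(z), inexactness level delta,
# and target accuracy eps.
nu, lam, delta, eps = 100.0, 0.1, 0.01, 1e-6

# Lemma 3: z_plus is an eps-solution of (1) whenever
#   t <= (1 - lam - delta) * (sqrt(nu) + lam + 2*delta)^{-1} * eps.
t_max = (1.0 - lam - delta) * eps / (math.sqrt(nu) + lam + 2.0 * delta)
print(f"t must satisfy t <= {t_max:.3e}")  # roughly eps / sqrt(nu)
```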

1.3 The proof of Theorem 1: a key estimate of generalized Newton-type schemes

First, similar to [2], we can easily show that the following non-expansiveness property holds:

$$\begin{aligned} \Vert \mathcal {P}_{\hat{\mathbf {z}}}(\mathbf {u}; t) - \mathcal {P}_{\hat{\mathbf {z}}}(\mathbf {v}; t)\Vert _{\hat{\mathbf {z}}} \le \Vert \mathbf {u}- \mathbf {v}\Vert _{\hat{\mathbf {z}}},~~~\forall \mathbf {u},\mathbf {v}\in \mathbb {R}^p. \end{aligned}$$
(67)

Note that \(\Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}} \le \Vert \bar{\mathbf {z}}_{+} - \mathbf {z}\Vert _{\mathbf {z}} + \Vert \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\Vert _{\mathbf {z}} = \lambda _{t_{+}}(\mathbf {z}) + \delta (\mathbf {z}) < 1\) by our assumption. This shows that \(\mathbf {z}_{+}\in \mathrm {int}\left( \mathcal {Z}\right) \) due to [31, Theorem 4.1.5 (1)].

Next, we consider the generalized gradient mappings \(G_{\mathbf {z}}(\mathbf {z};t_{+})\) and \(G_{\mathbf {z}_{+}}(\mathbf {z}_{+}; t_{+})\) at \(\mathbf {z}\) and \(\mathbf {z}_{+}\), respectively defined by (20) as follows:

$$\begin{aligned} \begin{array}{ll} G_{\mathbf {z}}(\mathbf {z}; t_{+}) &{}:= \nabla ^2{F}(\mathbf {z})\left( \mathbf {z}- \mathcal {P}_{\mathbf {z}}\left( \mathbf {z}- \nabla ^2{F}(\mathbf {z})^{-1}\nabla {F}(\mathbf {z}); t_{+}\right) \right) , \\ G_{\mathbf {z}_{+}}(\mathbf {z}_{+}; t_{+}) &{}:= \nabla ^2{F}(\mathbf {z}_{+})\left( \mathbf {z}_{+} - \mathcal {P}_{\mathbf {z}_{+}}\left( \mathbf {z}_{+} - \nabla ^2{F}(\mathbf {z}_{+})^{-1}\nabla {F}(\mathbf {z}_{+}); t_{+}\right) \right) . \end{array} \end{aligned}$$
(68)

Let \(r_{\mathbf {z}}(\bar{\mathbf {z}}_{+}) := \nabla {F}(\mathbf {z}) + \nabla ^2{F}(\mathbf {z})(\bar{\mathbf {z}}_{+} - \mathbf {z})\). Then, by using \(\bar{\mathbf {z}}_{+} := \mathcal {P}_{\mathbf {z}}\big (\mathbf {z}- \nabla ^2{F}(\mathbf {z})^{-1}\nabla {F}(\mathbf {z}); t_{+}\big )\) from (26), we can show that

$$\begin{aligned} -r_{\mathbf {z}}(\bar{\mathbf {z}}_{+}) := -\left[ \nabla {F}(\mathbf {z}) + \nabla ^2{F}(\mathbf {z})(\bar{\mathbf {z}}_{+} - \mathbf {z})\right] \in t_{+}^{-1}\mathcal {A}(\bar{\mathbf {z}}_{+}). \end{aligned}$$
(69)

Clearly, we can rewrite (69) as \(\bar{\mathbf {z}}_{+} -\nabla ^2{F}(\mathbf {z}_{+})^{-1}r_{\mathbf {z}}(\bar{\mathbf {z}}_{+}) \in \bar{\mathbf {z}}_{+} + t_{+}^{-1}\nabla ^2{F}(\mathbf {z}_{+})^{-1}\mathcal {A}(\bar{\mathbf {z}}_{+})\). Then, using the definition (16) of \(\mathcal {P}_{\mathbf {z}_{+}}(\cdot ) := \left( \mathbb {I}+ t_{+}^{-1}\nabla ^2{F}(\mathbf {z}_{+})^{-1}\mathcal {A}\right) ^{-1}(\cdot )\), we can derive

$$\begin{aligned} \mathbf {z}_{+} = \mathcal {P}_{\mathbf {z}_{+}}\left( \bar{\mathbf {z}}_{+} -\nabla ^2{F}(\mathbf {z}_{+})^{-1}r_{\mathbf {z}}(\bar{\mathbf {z}}_{+}); t_{+} \right) + (\mathbf {z}_{+} - \bar{\mathbf {z}}_{+}). \end{aligned}$$
(70)

Now, we can estimate \(\lambda _{t_{+}}(\mathbf {z}_{+})\) defined by (21) using (68), (70), (67), and (69) as follows:

$$\begin{aligned} \lambda _{t_{+}}(\mathbf {z}_{+})&:= \Vert G_{\mathbf {z}_{+}}(\mathbf {z}_{+}; t_{+})\Vert ^{*}_{\mathbf {z}_{+}} \overset{(68)}{=} \Vert \mathbf {z}_{+} - \mathcal {P}_{\mathbf {z}_{+}}\left( \mathbf {z}_{+} - \nabla ^2{F}(\mathbf {z}_{+})^{-1}\nabla {F}(\mathbf {z}_{+}); t_{+}\right) \Vert _{\mathbf {z}_{+}} \nonumber \\&\overset{(70)}{{}={}} \Big \Vert \mathcal {P}_{\mathbf {z}_{+}}{}\left( \bar{\mathbf {z}}_{+} - \nabla ^2{F}(\mathbf {z}_{+})^{-1}{} r_{\mathbf {z}}(\bar{\mathbf {z}}_{+}); t_{+} \right) \nonumber \\&\quad - \mathcal {P}_{\mathbf {z}_{+}}{}\left( \mathbf {z}_{+} - \nabla ^2{F}(\mathbf {z}_{+})^{-1}{}\nabla {F}(\mathbf {z}_{+}); t_{+}\right) + (\mathbf {z}_{+} - \bar{\mathbf {z}}_{+}) \Big \Vert _{\mathbf {z}_{+}} \nonumber \\&\le \Big \Vert \mathcal {P}_{\mathbf {z}_{+}}\left( \bar{\mathbf {z}}_{+} -\nabla ^2{F}(\mathbf {z}_{+})^{-1}r_{\mathbf {z}}(\bar{\mathbf {z}}_{+}); t_{+} \right) \nonumber \\&\quad - \mathcal {P}_{\mathbf {z}_{+}}\left( \mathbf {z}_{+} - \nabla ^2{F}(\mathbf {z}_{+})^{-1}\nabla {F}(\mathbf {z}_{+}); t_{+}\right) \Big \Vert _{\mathbf {z}_{+}}\nonumber \\&\quad + \Vert \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\Vert _{\mathbf {z}_{+}} \nonumber \\&\overset{(67)}{\le } \Big \Vert \nabla ^2{F}(\mathbf {z}_{+})^{-1}\left[ \nabla {F}(\mathbf {z}_{+}) - r_{\mathbf {z}}(\bar{\mathbf {z}}_{+})\right] + (\bar{\mathbf {z}}_{+} - \mathbf {z}_{+}) \Big \Vert _{\mathbf {z}_{+}} + \Vert \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\Vert _{\mathbf {z}_{+}} \nonumber \\&\overset{(69)}{=} \Big \Vert \nabla ^2{F}(\mathbf {z}_{+})^{-1}\big [\nabla {F}(\mathbf {z}_{+}) - \nabla {F}(\mathbf {z}) - \nabla ^2{F}(\mathbf {z})(\mathbf {z}_{+} - \mathbf {z}) + (\nabla ^2{F}(\mathbf {z}_{+})\nonumber \\&\quad - \nabla ^2{F}(\mathbf {z}))(\bar{\mathbf {z}}_{+} - \mathbf {z}_{+})\big ]\Big \Vert _{\mathbf {z}_{+}} + \Vert \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\Vert _{\mathbf {z}_{+}} \nonumber \\&\le \Vert \nabla {F}(\mathbf {z}_{+}) - \nabla {F}(\mathbf {z}) - \nabla ^2{F}(\mathbf {z})(\mathbf {z}_{+} - \mathbf {z}) \Vert _{\mathbf {z}_{+}}^{*} \nonumber \\&\quad + \Vert (\nabla ^2{F}(\mathbf {z}_{+}) - \nabla ^2{F}(\mathbf {z}))(\bar{\mathbf {z}}_{+} - \mathbf {z}_{+})\Vert _{\mathbf {z}_{+}}^{*} +\Vert \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\Vert _{\mathbf {z}_{+}} \nonumber \\&\le \tfrac{1}{1 - \Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}}}\Big [ \Vert \nabla {F}(\mathbf {z}_{+}) - \nabla {F}(\mathbf {z}) - \nabla ^2{F}(\mathbf {z})(\mathbf {z}_{+} - \mathbf {z}) \Vert _{\mathbf {z}}^{*} \nonumber \\&\quad + \Vert (\nabla ^2{F}(\mathbf {z}_{+}) - \nabla ^2{F}(\mathbf {z}))(\bar{\mathbf {z}}_{+} - \mathbf {z}_{+})\Vert _{\mathbf {z}}^{*} \Big ] + \tfrac{\Vert \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\Vert _{\mathbf {z}} }{1 - \Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}}}. \end{aligned}$$
(71)

Here, in the last inequality of (71), we have used the fact that \(\Vert \mathbf {w}\Vert _{\mathbf {z}_{+}}^2 = \langle \nabla ^2{F}(\mathbf {z}_{+})\mathbf {w}, \mathbf {w}\rangle \le (1-\Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}})^{-2}\langle \nabla ^2{F}(\mathbf {z})\mathbf {w}, \mathbf {w}\rangle = (1-\Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}})^{-2}\Vert \mathbf {w}\Vert _{\mathbf {z}}^2\) for any \(\mathbf {w}\) and \(\mathbf {z}, \mathbf {z}_{+}\) such that \(\Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}} < 1\), and the analogous fact for the dual norms. Both facts can be derived from [31, Theorem 4.1.6]. The condition \(\Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}} < 1\) is guaranteed since \(\Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}} \le \Vert \mathbf {z}- \bar{\mathbf {z}}_{+}\Vert _{\mathbf {z}} + \Vert \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\Vert _{\mathbf {z}} = \lambda _{t_{+}}(\mathbf {z}) + \delta (\mathbf {z}) < 1\) by our assumption.

Similar to the proof of [31, Theorem 4.1.14], we can show that

$$\begin{aligned} \left\| \nabla {F}(\mathbf {z}_{+}) - \nabla {F}(\mathbf {z}) - \nabla ^2{F}(\mathbf {z})(\mathbf {z}_{+} - \mathbf {z})\right\| _{\mathbf {z}}^{*} \le \frac{\Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}}^2}{1 - \Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}}}. \end{aligned}$$
(72)
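
Although (72) is a standard self-concordance estimate from [31], it can be sanity-checked numerically. The sketch below (an illustration, not part of the proof) tests it for the one-dimensional barrier \(F(x) = -\ln x\), for which \(\nabla {F}(x) = -1/x\), \(\nabla ^2{F}(x) = 1/x^2\), \(\Vert w\Vert _{x} = |w|/x\), and \(\Vert v\Vert _{x}^{*} = x|v|\).

```python
import random

# Numerical sanity check of (72) for F(x) = -log(x) in one dimension.
def bound_72_holds(x, x_plus):
    r = abs(x_plus - x) / x                     # ||z_+ - z||_z, must be < 1
    residual = -1.0/x_plus + 1.0/x - (x_plus - x) / x**2
    lhs = x * abs(residual)                     # dual local norm at z
    return lhs <= r**2 / (1.0 - r) + 1e-12

random.seed(0)
for _ in range(10000):
    x = random.uniform(0.5, 2.0)
    x_plus = x * random.uniform(0.05, 1.95)     # keeps ||x_plus - x||_x < 1
    assert bound_72_holds(x, x_plus)
print("(72) holds on all sampled pairs")
```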

Next, we need to estimate \(B := \Vert (\nabla ^2{F}(\mathbf {z}_{+}) - \nabla ^2{F}(\mathbf {z}))(\bar{\mathbf {z}}_{+} - \mathbf {z}_{+})\Vert _{\mathbf {z}}^{*}\). We define

$$\begin{aligned} \Sigma := \nabla ^2{F}(\mathbf {z})^{-1/2}\left( \nabla ^2{F}(\mathbf {z}_{+}) - \nabla ^2{F}(\mathbf {z})\right) \nabla ^2{F}(\mathbf {z})^{-1/2}. \end{aligned}$$

By [31, Theorem 4.1.6], we can show that

$$\begin{aligned} \Vert \Sigma \Vert\le & {} \max \left\{ 1 - (1 - \Vert \mathbf {z}_{+} -\mathbf {z}\Vert _{\mathbf {z}})^2, (1 - \Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}})^{-2} - 1\right\} \\= & {} \frac{2\Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}} - \Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}}^2}{(1 - \Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}})^2}. \end{aligned}$$

Using this inequality we can estimate B as

$$\begin{aligned} B^2&= (\bar{\mathbf {z}}_{+} - \mathbf {z}_{+})^{\top }\nabla ^2{F}(\mathbf {z})^{1/2}\Sigma ^2\nabla ^2{F}(\mathbf {z})^{1/2}(\bar{\mathbf {z}}_{+} - \mathbf {z}_{+}) \le \Vert \Sigma \Vert ^2\Vert \bar{\mathbf {z}}_{+} - \mathbf {z}_{+}\Vert _{\mathbf {z}}^2\nonumber \\&\le \left( \frac{2\Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}} - \Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}}^2}{(1 - \Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}})^2}\right) ^2\Vert \bar{\mathbf {z}}_{+} - \mathbf {z}_{+}\Vert _{\mathbf {z}}^2, \end{aligned}$$

which implies

$$\begin{aligned} B \le \left( \frac{2\Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}} - \Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}}^2}{(1 - \Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}})^2}\right) \Vert \bar{\mathbf {z}}_{+} - \mathbf {z}_{+}\Vert _{\mathbf {z}}. \end{aligned}$$
(73)

Substituting (72) and (73) into (71) we get

$$\begin{aligned} \lambda _{t_{+}}(\mathbf {z}_{+})&\le \frac{\Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}}^2}{\left( 1 - \Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}}\right) ^2}\nonumber \\&\quad + \frac{\left[ 2\Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}} - \Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}}^2\right] \Vert \bar{\mathbf {z}}_{+} - \mathbf {z}_{+}\Vert _{\mathbf {z}}}{(1 - \Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}})^3} + \frac{\Vert \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\Vert _{\mathbf {z}} }{1 - \Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}}} \nonumber \\&= \frac{\Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}}^2}{\left( 1 - \Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}}\right) ^2} + \frac{\Vert \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\Vert _{\mathbf {z}} }{\left( 1 - \Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}}\right) ^3}. \end{aligned}$$
(74)

Finally, we note that \(\lambda _{t_{+}}(\mathbf {z}) := \Vert G_{\mathbf {z}}(\mathbf {z}; t_{+})\Vert _{\mathbf {z}}^{*} = \Vert \mathbf {z}- \mathcal {P}_{\mathbf {z}}\left( \mathbf {z}- \nabla ^2{F}(\mathbf {z})^{-1}\nabla {F}(\mathbf {z}); t_{+} \right) \Vert _{\mathbf {z}} = \Vert \mathbf {z}- \bar{\mathbf {z}}_{+} \Vert _{\mathbf {z}}\) due to (26). Using the triangle inequality, we have \(\Vert \mathbf {z}_{+}-\mathbf {z}\Vert _{\mathbf {z}} \le \Vert \mathbf {z}- \bar{\mathbf {z}}_{+}\Vert _{\mathbf {z}} + \Vert \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\Vert _{\mathbf {z}} = \lambda _{t_{+}}(\mathbf {z}) + \delta (\mathbf {z}) < 1\). Since the right-hand side of (74) is monotonically increasing with respect to \(\Vert \mathbf {z}_{+}-\mathbf {z}\Vert _{\mathbf {z}}\), substituting the last inequality into (74), we obtain (27). \(\square \)

1.4 The proof of Theorem 2: local quadratic convergence of FGN

We first prove (a). Given a fixed parameter \(t > 0\) sufficiently small, our objective is to find \(\beta \in (0, 1)\) such that if \(\lambda _{t}(\mathbf {z}^k) \le \beta \), then \(\lambda _{t}(\mathbf {z}^{k+1}) \le \beta \). Indeed, using the key estimate (27) with t instead of \(t_{+}\), we can see that to guarantee \(\lambda _{t}(\mathbf {z}^{k+1}) \le \beta \), we require

$$\begin{aligned} \left( \frac{\lambda _{t}(\mathbf {z}^k) + \delta (\mathbf {z}^k)}{1 - \lambda _{t}(\mathbf {z}^k) - \delta (\mathbf {z}^k)}\right) ^2 + \frac{\delta (\mathbf {z}^k)}{\left( 1-\lambda _{t}(\mathbf {z}^k) - \delta (\mathbf {z}^k)\right) ^3} \le \beta . \end{aligned}$$

Since the left-hand side of this inequality is monotonically increasing in \(\lambda _{t}(\mathbf {z}^k)\) and \(\delta (\mathbf {z}^k)\), it suffices to require

$$\begin{aligned} \left( \frac{\beta +\delta }{1 - \beta - \delta }\right) ^2 + \frac{\delta }{(1-\beta - \delta )^3} \le \beta . \end{aligned}$$

Using the identity \(\frac{\beta +\delta }{1-\beta -\delta } = \frac{\beta }{1-\beta } + \frac{\delta }{(1-\beta )(1-\beta -\delta )}\), we can write the last inequality as

$$\begin{aligned} \Big [\frac{2\beta }{(1-\beta )^2(1-\beta -\delta )} + \frac{\delta }{(1-\beta )^2(1-\beta -\delta )^2} + \frac{1}{(1-\beta - \delta )^3}\Big ]\delta \le \beta - \left( \frac{\beta }{1-\beta }\right) ^2. \end{aligned}$$
(75)

Clearly, the left-hand side of (75) is positive if \(0< \delta < 1-\beta \). Hence, we need to choose \(\beta \in (0, 0.5(3-\sqrt{5}))\) so that the right-hand side of (75) is also positive. Now, we choose \(\delta \ge 0\) such that \(\delta \le \beta (1-\beta ) < 1-\beta \). Then, (75) can be overestimated once more by

$$\begin{aligned} \Big (\frac{2\beta ^3 - 5\beta ^2 + 3\beta + 1}{(1-\beta )^4}\Big )\delta \le \beta (1 - 3\beta + \beta ^2), \end{aligned}$$

which implies

$$\begin{aligned} 0 \le \delta \le \frac{\beta (1 - 3\beta + \beta ^2)(1-\beta )^4}{2\beta ^3 - 5\beta ^2 + 3\beta + 1} < \beta (1-\beta ),~~\forall \beta \in \left( 0, 0.5(3-\sqrt{5})\right) . \end{aligned}$$

This inequality suggests that we can choose \(\delta := \frac{\beta (1 - 3\beta + \beta ^2)(1-\beta )^4}{2\beta ^3 - 5\beta ^2 + 3\beta + 1} > 0\). In this case, we also have \(\delta (\mathbf {z}) + \lambda _t(\mathbf {z}) \le \delta + \beta < 1\), which guarantees the condition of Theorem 1. Hence, we can conclude that \(\lambda _t(\mathbf {z}^k) \le \beta \) implies \(\lambda _t(\mathbf {z}^{k+1}) \le \beta \). In other words, \(\left\{ \mathbf {z}^k\right\} \) belongs to \(\mathcal {Q}_{t}(\beta )\).
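
The admissibility of this choice of \(\delta \) is easy to confirm numerically; the sketch below (an independent check, not part of the proof) verifies \(0< \delta (\beta ) < \beta (1-\beta )\) on a grid over \((0, 0.5(3-\sqrt{5}))\).

```python
import math

# Check that delta(beta) = beta*(1 - 3*beta + beta^2)*(1 - beta)^4
#                          / (2*beta^3 - 5*beta^2 + 3*beta + 1)
# satisfies 0 < delta < beta*(1 - beta) on (0, 0.5*(3 - sqrt(5))).
beta_max = 0.5 * (3.0 - math.sqrt(5.0))        # ~ 0.381966
for i in range(1, 1000):
    beta = beta_max * i / 1000.0
    delta = beta * (1 - 3*beta + beta**2) * (1 - beta)**4 \
            / (2*beta**3 - 5*beta**2 + 3*beta + 1)
    assert 0.0 < delta < beta * (1.0 - beta)
print("delta(beta) is admissible on the whole interval")
```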

(b) Next, to guarantee a quadratic convergence, we can choose \(\delta _k\) such that \(\delta (\mathbf {z}^k) \le \delta _k \le \bar{\delta }_k := \frac{\lambda _{t}(\mathbf {z}^k)^2}{1-\lambda _{t}(\mathbf {z}^k)}\). Substituting the upper bound \(\bar{\delta }_k\) of \(\delta (\mathbf {z}^k)\) into (27) we obtain

$$\begin{aligned} \lambda _{t}(\mathbf {z}^{k+1}) \le \left( \frac{2-4\lambda _t(\mathbf {z}^k) + \lambda _t(\mathbf {z}^k)^2}{(1-2\lambda _t(\mathbf {z}^k))^3}\right) \lambda _t(\mathbf {z}^k)^2. \end{aligned}$$

Let us consider the function \(s(r) := \frac{(2 - 4r + r^2)r^2}{(1 - 2r)^3}\) on \([0, 1/2)\). We can check that \(s(r) < r < 1\) for all \(r \in (0, 0.18858]\). Hence, \(\lambda _{t}(\mathbf {z}^{k+1}) < \lambda _{t}(\mathbf {z}^k) < 1\) as long as \(0 < \lambda _{t}(\mathbf {z}^k) \le 0.18858\). This proves the estimate (30).

Now, let us choose some \(\beta \in (0, 1)\) such that \(\lambda _t(\mathbf {z}^k) \le \beta \). Then (30) leads to

$$\begin{aligned} \lambda _t(\mathbf {z}^{k+1}) \le \left( \frac{2-4\beta +\beta ^2}{(1-2\beta )^3}\right) \lambda _t(\mathbf {z}^k)^2 = c\lambda _t(\mathbf {z}^k)^2, \end{aligned}$$

where \(c := \frac{2-4\beta +\beta ^2}{(1-2\beta )^3} > 0\). We need to choose \(\beta \in (0, 1)\) such that \(c\lambda _t(\mathbf {z}^k) < 1\). Since \(\lambda _t(\mathbf {z}^k) \le \beta \), we choose \(\beta \) such that \(c\beta < 1\), which is equivalent to \(9\beta ^3 - 16\beta ^2 + 8\beta - 1 < 0\). If \(\beta \in (0, 0.18858]\), then \(9\beta ^3 - 16\beta ^2 + 8\beta - 1 < 0\). Therefore, the radius of the quadratic convergence region of \(\left\{ \lambda _t(\mathbf {z}^k)\right\} \) is \(r := 0.18858\).

(c) Finally, for any \(\beta \in (0, 0.18858]\), we can write \(c\lambda _t(\mathbf {z}^{k+1}) \le (c\lambda _t(\mathbf {z}^k))^2\). By induction, \(c\lambda _t(\mathbf {z}^k) \le (c\lambda _t(\mathbf {z}^0))^{2^k} \le c^{2^k}\beta ^{2^k} < 1\). We obtain \(\lambda _t(\mathbf {z}^k) \le c^{2^k-1}\beta ^{2^k}\). Let us choose \(\delta _k := \frac{\lambda _t(\mathbf {z}^k)^2}{1-\lambda _t(\mathbf {z}^k)}\). For \(\epsilon \in (0, \beta )\), assume that \(c^{2^k-1}\beta ^{2^k}\le \epsilon \). From Lemma 3, we can choose \(t := (1-\epsilon )(\sqrt{\nu } + \epsilon + 2\epsilon ^2/(1-\epsilon ))^{-1}\varepsilon \). Then, \(\mathbf {z}^k\) is an \(\varepsilon \)-solution of (1). It remains to use the fact that \(c^{2^k-1}\beta ^{2^k}\le \epsilon \) to bound the number of iterations by \(k = {\mathcal {O}}\left( \ln \left( \ln (1/\epsilon )\right) \right) \). \(\square \)
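
As an illustration of the double-logarithmic iteration bound just derived, the sketch below computes the smallest k with \(c^{2^k-1}\beta ^{2^k} \le \epsilon \); the values of \(\beta \) and \(\epsilon \) are ours, for illustration only.

```python
import math

# Smallest k with c^(2^k - 1) * beta^(2^k) <= eps, evaluated in
# log-space for numerical stability (sample values of ours).
beta, eps = 0.15, 1e-12
c = (2 - 4*beta + beta**2) / (1 - 2*beta)**3
assert c * beta < 1          # quadratic convergence regime
k = 0
while (2**k - 1) * math.log(c) + 2**k * math.log(beta) > math.log(eps):
    k += 1
print(f"c*beta = {c*beta:.4f}, k = {k}")  # k grows like log2(log(1/eps))
```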

1.5 The proof of Theorem 3: local quadratic convergence of DGN

(a) Given a fixed parameter \(t > 0\) sufficiently small, it follows from DGN and (70) that

$$\begin{aligned} \begin{array}{ll} \bar{\mathbf {z}}^{k+2} &{}= \mathcal {P}_{\mathbf {z}^{k+1}}\left( \mathbf {z}^{k+1} - \nabla ^2{F}(\mathbf {z}^{k+1})^{-1}\nabla {F}(\mathbf {z}^{k+1}); t\right) , \\ \mathbf {z}^{k+1} &{}= \mathcal {P}_{\mathbf {z}^{k+1}}\left( \bar{\mathbf {z}}^{k+1} - \nabla ^2{F}(\mathbf {z}^{k+1})^{-1}r_{\mathbf {z}^k}(\bar{\mathbf {z}}^{k+1}); t\right) + (\mathbf {z}^{k+1} - \bar{\mathbf {z}}^{k+1}). \end{array} \end{aligned}$$

Hence, using these expressions, the same derivation as for (74) with t instead of \(t_{+}\), and assuming \(\Vert \mathbf {z}^{k+1} - \mathbf {z}^k\Vert _{\mathbf {z}^k}<1\), we can derive

$$\begin{aligned} \Vert \bar{\mathbf {z}}^{k+2} - \mathbf {z}^{k+1}\Vert _{\mathbf {z}^{k+1}} \le \left( \frac{\Vert \mathbf {z}^{k+1} - \mathbf {z}^k\Vert _{\mathbf {z}^k}}{1-\Vert \mathbf {z}^{k+1} - \mathbf {z}^k\Vert _{\mathbf {z}^k}}\right) ^2 + \frac{\Vert \mathbf {z}^{k+1} - \bar{\mathbf {z}}^{k+1}\Vert _{\mathbf {z}^k}}{(1-\Vert \mathbf {z}^{k+1} - \mathbf {z}^k\Vert _{\mathbf {z}^k})^3}. \end{aligned}$$
(76)

Now, let us define \(\tilde{\lambda }_t(\mathbf {z}^k) := \Vert \tilde{\mathbf {z}}^{k+1} - \mathbf {z}^k\Vert _{\mathbf {z}^k}\) and \(\alpha _k := (1 + \tilde{\lambda }_t(\mathbf {z}^k))^{-1}\) as in DGN. From the update \(\mathbf {z}^{k+1} := (1-\alpha _k)\mathbf {z}^k + \alpha _k\tilde{\mathbf {z}}^{k+1}\) of DGN, we have

$$\begin{aligned} \Vert \mathbf {z}^{k+1} - \mathbf {z}^k\Vert _{\mathbf {z}^k}= & {} \alpha _k\Vert \tilde{\mathbf {z}}^{k+1} - \mathbf {z}^k\Vert _{\mathbf {z}^k} = \alpha _k\tilde{\lambda }_t(\mathbf {z}^k),~~~\text {and}\\ \Vert \mathbf {z}^{k+1} - \bar{\mathbf {z}}^{k+1}\Vert _{\mathbf {z}^k}\le & {} \Vert \mathbf {z}^{k+1} - \tilde{\mathbf {z}}^{k+1}\Vert _{\mathbf {z}^k} + \Vert \tilde{\mathbf {z}}^{k+1} - \bar{\mathbf {z}}^{k+1}\Vert _{\mathbf {z}^k} \\= & {} (1-\alpha _k)\Vert \tilde{\mathbf {z}}^{k+1} - \mathbf {z}^k\Vert _{\mathbf {z}^k} + \delta (\mathbf {z}^k) \\= & {} (1-\alpha _k)\tilde{\lambda }_t(\mathbf {z}^k) + \delta (\mathbf {z}^k). \end{aligned}$$

Substituting these expressions into (76) we get

$$\begin{aligned} \Vert \bar{\mathbf {z}}^{k+2} - \mathbf {z}^{k+1}\Vert _{\mathbf {z}^{k+1}}&\le \left( \frac{\alpha _k\tilde{\lambda }_{t}(\mathbf {z}^k)}{1 - \alpha _k\tilde{\lambda }_{t}(\mathbf {z}^k)}\right) ^2 + \frac{\delta (\mathbf {z}^k) + (1-\alpha _k)\tilde{\lambda }_t(\mathbf {z}^k)}{\left( 1 - \alpha _k\tilde{\lambda }_{t}(\mathbf {z}^k)\right) ^3}. \end{aligned}$$

Substituting \(\alpha _k := (1 + \tilde{\lambda }_{t}(\mathbf {z}^k))^{-1}\) into the last inequality and simplifying the result, we get

$$\begin{aligned} \Vert \bar{\mathbf {z}}^{k+2} - \mathbf {z}^{k+1}\Vert _{\mathbf {z}^{k+1}} \le \left( 2 + 2\tilde{\lambda }_{t}(\mathbf {z}^k) + \tilde{\lambda }_{t}(\mathbf {z}^k)^2\right) \tilde{\lambda }_{t}(\mathbf {z}^k)^2 + \left( 1 + \tilde{\lambda }_{t}(\mathbf {z}^k)\right) ^3\delta (\mathbf {z}^k). \end{aligned}$$
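
This simplification is mechanical but easy to get wrong; the following sketch (an independent symbolic check using sympy, not part of the proof) confirms it.

```python
import sympy as sp

# Verify the simplification obtained by substituting alpha = 1/(1 + lam)
# into the damped-step estimate above.
lam, delta = sp.symbols('lam delta', positive=True)
alpha = 1 / (1 + lam)
lhs = (alpha * lam / (1 - alpha * lam))**2 \
      + (delta + (1 - alpha) * lam) / (1 - alpha * lam)**3
rhs = (2 + 2*lam + lam**2) * lam**2 + (1 + lam)**3 * delta
assert sp.simplify(lhs - rhs) == 0
print("substitution verified")
```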

Next, by the triangle inequality, it follows from (68) and the definition of \(\lambda _t(\mathbf {z})\) and \(\tilde{\lambda }_t(\mathbf {z})\) that \(\tilde{\lambda }_t(\mathbf {z}^{k+1}) = \Vert \tilde{\mathbf {z}}^{k+2} - \mathbf {z}^{k+1}\Vert _{\mathbf {z}^{k+1}} \le \Vert \bar{\mathbf {z}}^{k+2} - \mathbf {z}^{k+1}\Vert _{\mathbf {z}^{k+1}} + \Vert \tilde{\mathbf {z}}^{k+2} - \bar{\mathbf {z}}^{k+2}\Vert _{\mathbf {z}^{k+1}} = \Vert \bar{\mathbf {z}}^{k+2} - \mathbf {z}^{k+1}\Vert _{\mathbf {z}^{k+1}} + \delta (\mathbf {z}^{k+1})\). Combining this estimate and the above inequality we get

$$\begin{aligned} \tilde{\lambda }_{t}(\mathbf {z}^{k+1}) \le \left( 2 + 2\tilde{\lambda }_{t}(\mathbf {z}^k) + \tilde{\lambda }_{t}(\mathbf {z}^k)^2\right) \tilde{\lambda }_{t}(\mathbf {z}^k)^2 + \left( 1 + \tilde{\lambda }_{t}(\mathbf {z}^k)\right) ^3\delta (\mathbf {z}^k) + \delta (\mathbf {z}^{k+1}). \end{aligned}$$

If we choose \(\delta (\mathbf {z}^k) \le \delta _k \le \frac{\tilde{\lambda }_{t}(\mathbf {z}^k)^2}{1+ \tilde{\lambda }_{t}(\mathbf {z}^k)}\), then, by induction, \(\delta (\mathbf {z}^{k+1}) \le \delta _{k+1} \le \frac{\tilde{\lambda }_{t}(\mathbf {z}^{k+1})^2}{1+\tilde{\lambda }_{t}(\mathbf {z}^{k+1})}\). Substituting these bounds into the last inequality and simplifying the result, we obtain

$$\begin{aligned} \tilde{\lambda }_{t}(\mathbf {z}^{k+1}) \le \left( \frac{2\tilde{\lambda }_t(\mathbf {z}^k)^2 + 4\tilde{\lambda }_t(\mathbf {z}^k) + 3}{1 - \tilde{\lambda }_t(\mathbf {z}^k)^2\left( 2\tilde{\lambda }_t(\mathbf {z}^k)^2 + 4\tilde{\lambda }_t(\mathbf {z}^k) + 3\right) }\right) \tilde{\lambda }_{t}(\mathbf {z}^k)^2, \end{aligned}$$

which is indeed (31).

From (31), after a few elementary calculations, we can see that \(\tilde{\lambda }_{t}(\mathbf {z}^{k+1}) \le \tilde{\lambda }_t(\mathbf {z}^k)\) if \(\tilde{\lambda }_t(\mathbf {z}^k)(1 + \tilde{\lambda }_t(\mathbf {z}^k))( 2\tilde{\lambda }_t(\mathbf {z}^k)^2 + 4\tilde{\lambda }_t(\mathbf {z}^k) + 3) \le 1\). Note that the function \(s(\tau ) := \tau (1+\tau )(2\tau ^2 + 4\tau +3)\) is increasing on \([0, 0.5(3-\sqrt{5}))\). Solving \(s(\tau ) \le 1\) numerically, we can check that if \(\tilde{\lambda }_t(\mathbf {z}^k) \in [0, 0.21027]\), then \(\tilde{\lambda }_{t}(\mathbf {z}^{k+1}) \le \tilde{\lambda }_t(\mathbf {z}^k)\). Hence, for any \(\beta \in (0, 0.21027]\), if \(\tilde{\lambda }_t(\mathbf {z}^k) \le \beta \) then \(\tilde{\lambda }_t(\mathbf {z}^{k+1}) \le \beta \). In other words, \(\left\{ \mathbf {z}^k\right\} \subset {\varOmega }_t(\beta )\).

We now prove (b). Indeed, if we take any \(\beta \in (0, 0.21027]\), we can show from (31) that

$$\begin{aligned} \tilde{\lambda }_{t}(\mathbf {z}^{k+1}) \le \left( \frac{2\beta ^2 + 4\beta + 3}{1 - \beta ^2\left( 2\beta ^2 + 4\beta + 3\right) }\right) \tilde{\lambda }_{t}(\mathbf {z}^k)^2, \end{aligned}$$

Let \(\bar{c} := \frac{2\beta ^2 + 4\beta + 3}{1 - \beta ^2\left( 2\beta ^2 + 4\beta + 3\right) } > 0\) denote the factor on the right-hand side. To guarantee \(\bar{c}\beta < 1\), we need to choose \(\beta > 0\) such that \(2\beta ^4 + 6\beta ^3 + 7\beta ^2 + 3\beta - 1 < 0\). This condition leads to \(\beta \in (0, 0.21027]\). Hence, for any \(0 < \beta \le 0.21027\), if \(\mathbf {z}^0\in \mathcal {Q}_{t}(\beta )\), then \(\tilde{\lambda }_{t}(\mathbf {z}^{k+1}) \le \bar{c}\tilde{\lambda }_{t}(\mathbf {z}^k)^2 < 1\) and, therefore, \(\big \{\tilde{\lambda }_{t}(\mathbf {z}^k)\big \}\) quadratically converges to zero.
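
The threshold 0.21027 is (approximately) the unique positive root of \(2\beta ^4 + 6\beta ^3 + 7\beta ^2 + 3\beta - 1\); this is also the polynomial behind the condition \(s(\tau ) \le 1\) in part (a). A short bisection (illustration only) recovers it.

```python
# Bisection for the unique positive root of q(beta) = 2*beta^4
# + 6*beta^3 + 7*beta^2 + 3*beta - 1 (q is increasing for beta >= 0).
def q(b):
    return 2*b**4 + 6*b**3 + 7*b**2 + 3*b - 1

lo, hi = 0.0, 0.5            # q(0) = -1 < 0 < q(0.5)
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if q(mid) < 0.0 else (lo, mid)
print(f"root ~ {lo:.5f}")    # ~ 0.21027
```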

(c) Finally, to prove the last conclusion, from (66) we can show that

$$\begin{aligned} \mathrm {dist}_{\mathbf {z}^k}(\varvec{0}, \mathcal {A}_{\mathcal {Z}}(\mathbf {z}^{k+1}))&\le t\delta _k + t\left\| \nabla {F}(\mathbf {z}^k)\right\| _{\mathbf {z}^k}^{*} + t\left\| \mathbf {z}^{k+1} - \mathbf {z}^k\right\| _{\mathbf {z}^k} \\&\le t(\delta _k + \sqrt{\nu } + \alpha _k\tilde{\lambda }_t(\mathbf {z}^k)). \end{aligned}$$

Since \(\tilde{\lambda }_t(\mathbf {z}^k) \le \bar{c}^{2^k-1}\tilde{\lambda }_t(\mathbf {z}^0)^{2^k} \le \bar{c}^{2^k-1}\beta ^{2^k}\), \(\delta _k \le \frac{\tilde{\lambda }_t(\mathbf {z}^k)^2}{1 + \tilde{\lambda }_t(\mathbf {z}^k)}\), and \(\alpha _k\tilde{\lambda }_t(\mathbf {z}^k) = \frac{\tilde{\lambda }_t(\mathbf {z}^k)}{1 + \tilde{\lambda }_t(\mathbf {z}^k)}\), we obtain the last conclusion as a consequence of Lemma 3 with the same proof as in Theorem 2. \(\square \)

1.6 The proof of Lemma 4: the update rule for the penalty parameter

Let us define \(\bar{\mathbf {u}}^k := \mathcal {P}_{\mathbf {z}^k}\left( \mathbf {z}^k - \nabla ^2{F}(\mathbf {z}^k)^{-1}\nabla {F}(\mathbf {z}^k); t_k\right) \). Then, \(\lambda _{t_k}(\mathbf {z}^k)\) defined by (21) becomes \(\lambda _{t_k}(\mathbf {z}^k) := \Vert G_{\mathbf {z}^k}(\mathbf {z}^k; t_k)\Vert _{\mathbf {z}^k}^{*} = \Vert \mathbf {z}^k - \mathcal {P}_{\mathbf {z}^k}\left( \mathbf {z}^k - \nabla ^2{F}(\mathbf {z}^k)^{-1}\nabla {F}(\mathbf {z}^k); t_k\right) \Vert _{\mathbf {z}^k} = \Vert \mathbf {z}^k - \bar{\mathbf {u}}^k \Vert _{\mathbf {z}^k}\). Note that \(\bar{\mathbf {u}}^k = \mathcal {P}_{\mathbf {z}^k}\left( \mathbf {z}^k - \nabla ^2{F}(\mathbf {z}^k)^{-1}\nabla {F}(\mathbf {z}^k); t_k\right) \) leads to

$$\begin{aligned} -\,t_k \left( \nabla {F}(\mathbf {z}^k) + \nabla ^2{F}(\mathbf {z}^k)(\bar{\mathbf {u}}^k - \mathbf {z}^k)\right) \in \mathcal {A}(\bar{\mathbf {u}}^k). \end{aligned}$$

Combining this inclusion and (69) and using the monotonicity of \(\mathcal {A}\), we can derive

$$\begin{aligned} \langle t_{k+1}\left[ \nabla {F}(\mathbf {z}^k) + \nabla ^2{F}(\mathbf {z}^k)(\bar{\mathbf {z}}^{k+1} - \mathbf {z}^k)\right] - t_k\left[ \nabla {F}(\mathbf {z}^k) + \nabla ^2{F}(\mathbf {z}^k)(\bar{\mathbf {u}}^k - \mathbf {z}^k)\right] , \bar{\mathbf {z}}^{k+1} - \bar{\mathbf {u}}^k\rangle \le 0. \end{aligned}$$

By rearranging this expression using \(t_{k+1} := (1-\sigma _{\beta })t_k\) from PFGN, we finally obtain

$$\begin{aligned} \Vert \bar{\mathbf {z}}^{k+1} - \bar{\mathbf {u}}^k\Vert _{\mathbf {z}^k}^2&\le \frac{\sigma _{\beta }}{1 - \sigma _{\beta }}\langle \nabla {F}(\mathbf {z}^k) + \nabla ^2{F}(\mathbf {z}^k)(\bar{\mathbf {u}}^k - \mathbf {z}^k), \bar{\mathbf {z}}^{k+1} - \bar{\mathbf {u}}^k\rangle \nonumber \\&\le \frac{\sigma _{\beta }}{1 - \sigma _{\beta }}\Vert \nabla {F}(\mathbf {z}^k) + \nabla ^2{F}(\mathbf {z}^k)(\bar{\mathbf {u}}^k - \mathbf {z}^k)\Vert _{\mathbf {z}^k}^{*}\Vert \bar{\mathbf {z}}^{k+1} - \bar{\mathbf {u}}^k\Vert _{\mathbf {z}^k}, \end{aligned}$$

where the last inequality follows from the Cauchy–Schwarz inequality. Dividing both sides by \(\Vert \bar{\mathbf {z}}^{k+1} - \bar{\mathbf {u}}^k\Vert _{\mathbf {z}^k}\), this inequality leads to

$$\begin{aligned} \begin{array}{ll} \Vert \bar{\mathbf {z}}^{k+1} - \bar{\mathbf {u}}^k\Vert _{\mathbf {z}^k} &{}\le \frac{\sigma _{\beta }}{1 - \sigma _{\beta }}\Vert \nabla {F}(\mathbf {z}^k) + \nabla ^2{F}(\mathbf {z}^k)(\bar{\mathbf {u}}^k - \mathbf {z}^k)\Vert _{\mathbf {z}^k}^{*}\\ &{}\le \frac{\sigma _{\beta }}{1 - \sigma _{\beta }}\left[ \Vert \nabla {F}(\mathbf {z}^k)\Vert _{\mathbf {z}^k}^{*} + \Vert \nabla ^2{F}(\mathbf {z}^k)(\bar{\mathbf {u}}^k - \mathbf {z}^k)\Vert _{\mathbf {z}^k}^{*} \right] \\ &{}\le \frac{\sigma _{\beta }}{1 - \sigma _{\beta }}\left[ \Vert \nabla {F}(\mathbf {z}^k)\Vert _{\mathbf {z}^k}^{*} + \Vert \bar{\mathbf {u}}^k - \mathbf {z}^k \Vert _{\mathbf {z}^k} \right] . \end{array} \end{aligned}$$

Now, by the triangle inequality, we have \(\Vert \bar{\mathbf {z}}^{k+1} - \mathbf {z}^k\Vert _{\mathbf {z}^k} \le \Vert \bar{\mathbf {z}}^{k+1} - \bar{\mathbf {u}}^k\Vert _{\mathbf {z}^k} + \Vert \bar{\mathbf {u}}^k - \mathbf {z}^k\Vert _{\mathbf {z}^k}\). This inequality is equivalent to \(\lambda _{t_{k+1}}(\mathbf {z}^k) \le \Vert \bar{\mathbf {z}}^{k+1} - \bar{\mathbf {u}}^k\Vert _{\mathbf {z}^k} + \lambda _{t_k}(\mathbf {z}^k)\) due to the definitions \(\lambda _{t_{k+1}}(\mathbf {z}^k) = \Vert \bar{\mathbf {z}}^{k+1} - \mathbf {z}^k\Vert _{\mathbf {z}^k}\) and \(\lambda _{t_k}(\mathbf {z}^k) = \Vert \bar{\mathbf {u}}^k - \mathbf {z}^k\Vert _{\mathbf {z}^k}\). Using the last estimate in the above inequality we get

$$\begin{aligned} \lambda _{t_{k+1}}(\mathbf {z}^k) \le \lambda _{t_k}(\mathbf {z}^k) + \frac{\sigma _{\beta }}{1 - \sigma _{\beta }}\left[ \Vert \nabla {F}(\mathbf {z}^k)\Vert _{\mathbf {z}^k}^{*} + \lambda _{t_k}(\mathbf {z}^k) \right] , \end{aligned}$$

which is (32). The second inequality of (32) follows from the fact that \(\Vert \nabla {F}(\mathbf {z}^k)\Vert _{\mathbf {z}^k}^{*}\le \sqrt{\nu }\).

Let us denote \(\gamma _k := \left( \frac{\sigma _{\beta }}{1-\sigma _{\beta }}\right) \left( \sqrt{\nu } + \lambda _{t_k}(\mathbf {z}^k)\right) \). For a given \(\beta \in (0, 1)\), we now assume that \(\lambda _{t_k}(\mathbf {z}^k) \le \beta \). Then, by using (32) in (27), and the monotonic increase of its right-hand side with respect to \(\lambda _{t_{k+1}}(\mathbf {z}^k)\), we can derive

$$\begin{aligned} \lambda _{t_{k+1}}(\mathbf {z}^{k+1})&\le \left( \frac{\lambda _{t_k}(\mathbf {z}^k) + \vert \gamma _k\vert + \delta _k}{1 - \lambda _{t_k}(\mathbf {z}^k) - \vert \gamma _k\vert - \delta _k}\right) ^2 + \frac{\delta _k}{\left( 1 - \lambda _{t_k}(\mathbf {z}^k) - \vert \gamma _k\vert - \delta _k\right) ^3} \nonumber \\&\le \left( \frac{\beta + \vert \gamma _k\vert + \delta _k}{1 - \beta - \vert \gamma _k\vert - \delta _k}\right) ^2 + \frac{\delta _k}{(1 - \beta - \vert \gamma _k\vert - \delta _k)^3}, \end{aligned}$$

as long as \(\beta + \vert \gamma _k\vert + \delta _k <1\). Let us denote \(\theta _k := \beta + \vert \gamma _k\vert \). By using the identity \(\frac{\beta + \vert \gamma _k\vert + \delta _k}{1 - \beta - \vert \gamma _k\vert -\delta _k} = \frac{\beta + \vert \gamma _k\vert }{1 - \beta - \vert \gamma _k\vert } + \frac{\delta _k}{(1-\theta _k)(1-\theta _k-\delta _k)}\), we can rewrite the last inequality as

$$\begin{aligned} \lambda _{t_{k+1}}(\mathbf {z}^{k+1})&\le \left( \frac{\theta _k}{1 - \theta _k }\right) ^2 \\&\quad + \left[ \frac{2\theta _k }{(1-\theta _k)^2(1-\theta _k -\delta _k)} + \frac{\delta _k}{(1-\theta _k)^2(1-\theta _k -\delta _k)^2} + \frac{1}{(1-\theta _k -\delta _k)^3}\right] \delta _k. \end{aligned}$$

If we choose \(\delta _k\) such that \(0\le \delta _k \le \theta _k(1-\theta _k)<1-\theta _k\), then the above inequality implies

$$\begin{aligned} \lambda _{t_{k+1}}(\mathbf {z}^{k+1})&\le \left( \frac{\theta _k}{1 - \theta _k }\right) ^2 \nonumber \\&\quad + \left[ \frac{2\theta _k(1-\theta _k)^2 + \theta _k(1 - \theta _k) + 1}{(1-\theta _k)^6}\right] \delta _k := \left( \frac{\theta _k}{1 - \theta _k }\right) ^2 + M_k\delta _k. \end{aligned}$$
(77)

Take any \(c\in (0, 1)\), e.g., \(c := 0.95\), and choose \(\delta _k\) such that \(0 \le \delta _k \le \frac{(1-c^2)}{c^2M_k}\left( \frac{\theta _k}{1-\theta _k}\right) ^2\). Hence, in order to guarantee \(\lambda _{t_{k+1}}(\mathbf {z}^{k+1}) \le \beta \), by using (77), we can impose the condition \(\left( \frac{\theta _k}{1 - \theta _k }\right) ^2 + M_k\delta _k \le \frac{1}{c^2}\left( \frac{\theta _k}{1-\theta _k}\right) ^2 \le \beta \), which is equivalent to \(\frac{\theta _k}{1 - \theta _k} \le c\sqrt{\beta }\). This condition leads to \(\theta _k \le \frac{c\sqrt{\beta }}{1+c\sqrt{\beta }}\), and since \(\theta _k = \beta + \vert \gamma _k\vert \), it holds whenever \(\vert \gamma _k\vert \le \frac{c\sqrt{\beta }}{1+c\sqrt{\beta }} - \beta \). Since \(\vert \gamma _k\vert > 0\), we need to choose \(\beta \) such that \(0< \beta < 0.5(1 + 2c^2 - \sqrt{1 +4c^2})\), which makes this upper bound positive.

Next, by the choice of \(\delta _k\), we require \(0 \le \delta _k \le \min \left\{ \frac{(1-c^2)}{c^2M_k}\left( \frac{\theta _k}{1-\theta _k}\right) ^2, \theta _k(1-\theta _k)\right\} \). Using the fact that \(M_k = \frac{2\theta _k(1-\theta _k)^2 + \theta _k(1-\theta _k) + 1}{(1-\theta _k)^6}\) from (77) and \(0 \le \theta _k \le \frac{c\sqrt{\beta }}{1+c\sqrt{\beta }}\), we can show that the condition on \(\delta _k\) holds if we choose

$$\begin{aligned} \delta _k \le \bar{\delta } := \frac{(1-c^2)\beta }{(1+c\sqrt{\beta })^3\left[ 3c\sqrt{\beta } + c^2\beta + (1+c\sqrt{\beta })^3\right] }. \end{aligned}$$

On the other hand, we have \(\vert \gamma _k\vert = \left| \left( \frac{\sigma _{\beta }}{1-\sigma _{\beta }}\right) \left( \sqrt{\nu } + \lambda _{t_k}(\mathbf {z}^k)\right) \right| \le \left( \frac{\sigma _{\beta }}{1-\sigma _{\beta }}\right) \left( \sqrt{\nu } + \beta \right) \). In order to guarantee that \(\vert \gamma _k\vert \le \frac{c\sqrt{\beta }}{1+c\sqrt{\beta }} - \beta \), we use the above estimate to impose a condition \(\left( \frac{\sigma _{\beta }}{1-\sigma _{\beta }}\right) \le \frac{1}{\sqrt{\nu } + \beta }\left( \frac{c\sqrt{\beta }}{1 + c\sqrt{\beta }} - \beta \right) \), which leads to

$$\begin{aligned} \sigma _{\beta } \le \bar{\sigma }_{\beta } := \frac{c\sqrt{\beta } - \beta (1 + c\sqrt{\beta })}{(1+c\sqrt{\beta })\sqrt{\nu } + c\sqrt{\beta }}. \end{aligned}$$

This estimate is exactly the right-hand side of (33). Finally, using (32) and the definition of \(\gamma _k\), we can easily show that \(\lambda _{t_{k+1}}(\mathbf {z}^k) \le \lambda _{t_k}(\mathbf {z}^k) + \left| \gamma _k\right| \le \beta + \left| \gamma _k\right| \equiv \theta _k \le \frac{c\sqrt{\beta }}{1+c\sqrt{\beta }}\).

\(\square \)
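
To give a feel for the magnitudes in (33), the sketch below evaluates \(\bar{\sigma }_{\beta }\) and \(\bar{\delta }\) for sample values; \(c\), \(\beta \), and \(\nu \) are ours, for illustration only.

```python
import math

# Decrease factor sigma_bar from (33) and inexactness level delta_bar,
# for sample values c = 0.95, beta = 0.05, nu = 100 (ours, not from
# the paper).
c, beta, nu = 0.95, 0.05, 100.0
s = c * math.sqrt(beta)                       # shorthand for c*sqrt(beta)
sigma_bar = (s - beta * (1 + s)) / ((1 + s) * math.sqrt(nu) + s)
delta_bar = (1 - c**2) * beta / ((1 + s)**3 * (3*s + c**2*beta + (1 + s)**3))
print(f"sigma_bar = {sigma_bar:.5f}")         # O(1/sqrt(nu)) per iteration
print(f"delta_bar = {delta_bar:.6f}")
```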

1.7 The proof of Theorem 4: the worst-case iteration-complexity of PFGN

By Lemma 3 and \(\lambda _{t_{k+1}}(\mathbf {z}^k) \le \frac{c\sqrt{\beta }}{1 + c\sqrt{\beta }}\), we can see that \(\mathbf {z}^k\) is an \(\varepsilon \)-solution of (1) if \(t_k := M_0^{-1}\varepsilon \), where \(M_0 := \left( 1 - \frac{c\sqrt{\beta }}{1 + c\sqrt{\beta }}\right) ^{-1}\left( \sqrt{\nu } + \frac{c\sqrt{\beta }}{1 + c\sqrt{\beta }} + 2\bar{\delta }_t(\beta )\right) = {\mathcal {O}}(\sqrt{\nu })\).

On the other hand, by induction, it follows from the update rule \(t_{k+1} = (1-\sigma _{\beta })t_k\) of PFGN that \(t_k = (1-\sigma _{\beta })^kt_0\). Hence, \(\mathbf {z}^k\) is an \(\varepsilon \)-solution of (1) if we have \(t_k = (1-\sigma _{\beta })^k t_0 \le \frac{\varepsilon }{M_0}\). This condition leads to \(k\ln (1-\sigma _{\beta }) \le \ln \left( \frac{\varepsilon }{M_0t_0}\right) \), i.e., \(k \ge \frac{\ln (M_0t_0/\varepsilon )}{-\ln (1-\sigma _{\beta })}\). Using the elementary inequality \(\ln (1-\sigma _{\beta }) \le -\,\sigma _{\beta }\), we see that it suffices to take

$$\begin{aligned} k \ge \frac{1}{\bar{\sigma }_{\beta }}\ln \left( \frac{M_0 t_0}{\varepsilon }\right) = \frac{\left( (1+c\sqrt{\beta })\sqrt{\nu } + c\sqrt{\beta }\right) }{c\sqrt{\beta } - \beta (1 + c\sqrt{\beta })}\ln \left( \frac{M_0t_0}{\varepsilon }\right) . \end{aligned}$$

Consequently, the worst-case iteration-complexity of PFGN is \({\mathcal {O}}\left( \sqrt{\nu }\ln \left( \frac{\sqrt{\nu } t_0}{\varepsilon }\right) \right) \).

\(\square \)
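
For illustration, the sketch below turns this bound into a concrete iteration count; the values of \(c\), \(\beta \), \(\nu \), \(t_0\), and \(\varepsilon \) are ours, not from the paper.

```python
import math

# Iteration count k >= (1/sigma_bar) * ln(M0 * t0 / eps) of Theorem 4
# for sample values (ours, not from the paper).
c, beta, nu, t0, eps = 0.95, 0.05, 100.0, 1.0, 1e-8
s = c * math.sqrt(beta)
sigma_bar = (s - beta * (1 + s)) / ((1 + s) * math.sqrt(nu) + s)
delta_bar = (1 - c**2) * beta / ((1 + s)**3 * (3*s + c**2*beta + (1 + s)**3))
r = s / (1 + s)                                     # bound on lambda_{t_{k+1}}(z^k)
M0 = (math.sqrt(nu) + r + 2 * delta_bar) / (1 - r)  # M0 = O(sqrt(nu))
k = math.ceil(math.log(M0 * t0 / eps) / sigma_bar)
print(f"sigma_bar = {sigma_bar:.5f}, M0 = {M0:.2f}, k = {k}")
```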

1.8 The proof of Theorem 5: finding an initial point for PFGN

From (35), if we define \(\nabla {\hat{F}}(\hat{\mathbf {z}}^j) := \nabla {F}(\hat{\mathbf {z}}^j) - t_0^{-1}\tau _{j+1}\hat{\zeta }^0\), then we still have \(\nabla ^2{\hat{F}}(\hat{\mathbf {z}}^j) = \nabla ^2{F}(\hat{\mathbf {z}}^j)\). Hence, the estimate (27) still holds for \(\hat{\lambda }_{\tau }(\hat{\mathbf {z}}^j)\).

Next, if we define \(\bar{\mathbf {v}}^j := \mathcal {P}_{\hat{\mathbf {z}}^j}\left( \hat{\mathbf {z}}^j - \nabla ^2{F}(\hat{\mathbf {z}}^j)^{-1}\left( \nabla {F}(\hat{\mathbf {z}}^j)- \tau _jt_0^{-1}\hat{\zeta }^0\right) ; t_0\right) \), then, by the definition of \(\mathcal {P}_{\hat{\mathbf {z}}^j}\), we have

$$\begin{aligned} -t_0\left[ \nabla ^2{F}(\hat{\mathbf {z}}^j)(\bar{\mathbf {v}}^j - \hat{\mathbf {z}}^j) + \nabla {F}(\hat{\mathbf {z}}^j) - \tau _jt_0^{-1}\hat{\zeta }^0\right] \in \mathcal {A}(\bar{\mathbf {v}}^j). \end{aligned}$$
(78)

Similarly, since \(\bar{\hat{\mathbf {z}}}^{j+1} := \mathcal {P}_{\hat{\mathbf {z}}^j}\left( \hat{\mathbf {z}}^j - \nabla ^2{F}(\hat{\mathbf {z}}^j)^{-1}\left( \nabla {F}(\hat{\mathbf {z}}^j)- \tau _{j+1}t_0^{-1}\hat{\zeta }^0\right) ; t_0\right) \), we have

$$\begin{aligned} -t_0\left[ \nabla ^2{F}(\hat{\mathbf {z}}^j)(\bar{\hat{\mathbf {z}}}^{j+1} - \hat{\mathbf {z}}^j) + \nabla {F}(\hat{\mathbf {z}}^j) - \tau _{j+1}t_0^{-1}\hat{\zeta }^0\right] \in \mathcal {A}(\bar{\hat{\mathbf {z}}}^{j+1}). \end{aligned}$$
(79)

Using (78), (79), and the monotonicity of \(\mathcal {A}\), we have

$$\begin{aligned} t_0\langle \nabla ^2{F}(\hat{\mathbf {z}}^j)(\bar{\hat{\mathbf {z}}}^{j+1} - \bar{\mathbf {v}}^j), \bar{\hat{\mathbf {z}}}^{j+1} - \bar{\mathbf {v}}^j\rangle \le (\tau _j - \tau _{j+1})\langle \hat{\zeta }^0, \bar{\mathbf {v}}^j - \bar{\hat{\mathbf {z}}}^{j+1}\rangle . \end{aligned}$$

Using \(\tau _{j+1} := \tau _j - {\varDelta }_j\), the identity \(\langle \nabla ^2{F}(\hat{\mathbf {z}}^j)\mathbf {u}, \mathbf {u}\rangle = \Vert \mathbf {u}\Vert _{\hat{\mathbf {z}}^j}^2\), and the generalized Cauchy–Schwarz inequality \(\langle \hat{\zeta }^0, \mathbf {u}\rangle \le \Vert \hat{\zeta }^0\Vert _{\hat{\mathbf {z}}^j}^{*}\Vert \mathbf {u}\Vert _{\hat{\mathbf {z}}^j}\), the last inequality leads to

$$\begin{aligned} t_0\left\| \bar{\hat{\mathbf {z}}}^{j+1} - \bar{\mathbf {v}}^j\right\| _{\hat{\mathbf {z}}^j} \le {\varDelta }_j\Vert \hat{\zeta }^0\Vert _{\hat{\mathbf {z}}^j}^{*}. \end{aligned}$$
(80)

Now, similar to the proof of Lemma 4, using (80), we can derive

$$\begin{aligned} \hat{\lambda }_{\tau _{j+1}}(\hat{\mathbf {z}}^{j}) \le \hat{\lambda }_{\tau _{j}}(\hat{\mathbf {z}}^j) + \frac{{\varDelta }_j}{t_0}\Vert \hat{\zeta }^0\Vert _{\hat{\mathbf {z}}^j}^{*}. \end{aligned}$$
(81)

By the same argument as the proof of (33), we can show that with \(\hat{\gamma }_j := \frac{{\varDelta }_j}{t_0}\Vert \hat{\zeta }^0\Vert _{\hat{\mathbf {z}}^j}^{*}\), we have \(\left| \hat{\gamma }_j\right| \le \frac{c\sqrt{\eta }}{1+c\sqrt{\eta }} - \eta \). This shows that \({\varDelta }_j \le \frac{t_0}{\Vert \hat{\zeta }^0\Vert _{\hat{\mathbf {z}}^j}^{*}}\left( \frac{c\sqrt{\eta }}{1+c\sqrt{\eta }} - \eta \right) \), which is the first estimate of (37). The second estimate of (37) can be derived as in Lemma 4 using \(\eta \) instead of \(\beta \).

We prove (38). From (21) and (36), using the triangle inequality, we can upper bound

$$\begin{aligned} \lambda _{t_0}(\mathbf {z}^0)&:= \big \Vert \mathbf {z}^0 - \mathcal {P}_{\mathbf {z}^0}\big (\mathbf {z}^0 - \nabla ^2F(\mathbf {z}^0)^{-1}\nabla {F}(\mathbf {z}^0); t_0\big ) \big \Vert _{\mathbf {z}^0}\\&\overset{\tiny {\mathbf {z}^0 := \hat{\mathbf {z}}^j}}{=} \big \Vert \hat{\mathbf {z}}^j - \mathcal {P}_{\hat{\mathbf {z}}^j}\big (\hat{\mathbf {z}}^j - \nabla ^2F(\hat{\mathbf {z}}^j)^{-1}\nabla {F}(\hat{\mathbf {z}}^j); t_0\big ) \big \Vert _{\hat{\mathbf {z}}^j}\\&\le \Big \Vert \hat{\mathbf {z}}^j - \mathcal {P}_{\hat{\mathbf {z}}^j}\big (\hat{\mathbf {z}}^j - \nabla ^2F(\hat{\mathbf {z}}^j)^{-1}\big (\nabla {F}(\hat{\mathbf {z}}^j) - \tau _jt_0^{-1}\hat{\zeta }^0\big ); t_0\big )\Big \Vert _{\hat{\mathbf {z}}^j}\\&\quad + \Big \Vert \mathcal {P}_{\hat{\mathbf {z}}^j}\big (\hat{\mathbf {z}}^j - \nabla ^2F(\hat{\mathbf {z}}^j)^{-1}\nabla {F}(\hat{\mathbf {z}}^j); t_0\big ) - \mathcal {P}_{\hat{\mathbf {z}}^j}\big (\hat{\mathbf {z}}^j - \nabla ^2F(\hat{\mathbf {z}}^j)^{-1}\big (\nabla {F}(\hat{\mathbf {z}}^j) - \tau _jt_0^{-1}\hat{\zeta }^0\big ); t_0\big ) \Big \Vert _{\hat{\mathbf {z}}^j}\\&\overset{(36),(67)}{\le }\hat{\lambda }_{\tau _j}(\hat{\mathbf {z}}^j) + \big \Vert t_0^{-1}\tau _j\nabla ^2{F}(\hat{\mathbf {z}}^j)^{-1}\hat{\zeta }^0\big \Vert _{\hat{\mathbf {z}}^j}\\&= \hat{\lambda }_{\tau _j}(\hat{\mathbf {z}}^j) + \tau _jt_0^{-1}\Vert \hat{\zeta }^0\Vert _{\hat{\mathbf {z}}^j}^{*}, \end{aligned}$$

which proves the first inequality of (38).

By [31, Corollary 4.2.1], we have \(\Vert \hat{\zeta }^0\Vert _{\hat{\mathbf {z}}^j}^{*} \le \kappa \Vert \hat{\zeta }^0\Vert _{\bar{\mathbf {z}}_F^{\star }}^{*}\), where \(\bar{\mathbf {z}}_F^{\star }\) and \(\kappa \) are given by (15) and below (15), respectively. Hence, \(\bar{{\varDelta }}_{\eta } := \frac{\mu _{\eta }}{\kappa \Vert \hat{\zeta }^0\Vert _{\bar{\mathbf {z}}_F^{\star }}^{*}} \le \bar{{\varDelta }}_{j}\). The second estimate of (38) follows from \(\tau _j := 1 - \sum _{l=0}^{j-1}{\varDelta }_l \le 1 - j\bar{{\varDelta }}_{\eta }\) due to the update rule (35) with \({\varDelta }_l := \bar{{\varDelta }}_{l} \ge \bar{{\varDelta }}_{\eta }\). In order to guarantee \(\lambda _{t_0}(\mathbf {z}^0) \le \beta \), it follows from (38) and the update rule of \(\tau _j\) that

$$\begin{aligned} j \ge \frac{1}{\bar{{\varDelta }}_{\eta }}\left( 1 - \frac{(\beta - \eta )t_0}{\kappa \Vert \hat{\zeta }^0\Vert _{\bar{\mathbf {z}}_F^{\star }}^{*}}\right) . \end{aligned}$$

Finally, substituting \(\bar{{\varDelta }}_{\eta } = \frac{t_0}{\kappa \Vert \hat{\zeta }^0\Vert _{\bar{\mathbf {z}}^{\star }_F}^{*}}\left( \frac{c\sqrt{\eta }}{1+c\sqrt{\eta }} - \eta \right) \) into this estimate and after simplifying the result, we obtain the remaining conclusion of Theorem 5. \(\square \)
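A minimal numerical sketch of these phase-1 quantities follows (all inputs are placeholder assumptions; in particular, `kappa_times_norm` stands for the problem-dependent product \(\kappa \Vert \hat{\zeta }^0\Vert _{\bar{\mathbf {z}}_F^{\star }}^{*}\)):

```python
# Sketch of the Theorem 5 bounds: the step size bar{Delta}_eta and the number
# j of phase-1 iterations needed to reach lambda_{t0}(z^0) <= beta.
# All sample values below are assumptions for illustration.
import math

c, eta, beta, t0 = 0.9, 0.02, 0.05, 1.0
kappa_times_norm = 10.0   # placeholder for kappa * ||hat{zeta}^0||, problem data

se = c * math.sqrt(eta)
mu_eta = t0 * (se / (1 + se) - eta)          # numerator of bar{Delta}_eta
Delta_eta = mu_eta / kappa_times_norm

# Smallest j with 1 - j*Delta_eta <= (beta - eta)*t0 / kappa_times_norm.
j = math.ceil((1.0 / Delta_eta)
              * max(0.0, 1.0 - (beta - eta) * t0 / kappa_times_norm))
print(f"Delta_eta = {Delta_eta:.5f}, phase-1 iterations j = {j}")
```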

1.9 The proof of Theorem 7: primal recovery for (4) in Algorithm 2

By the definition of \(\varphi \), we have \(\varphi (\mathbf {y}) := f^{*}(\mathbf {c}- L^{*}\mathbf {y}) = f^{*}(t^{-1}(\mathbf {c}- L^{*}\mathbf {y})) - \nu \ln (t)\) due to the self-concordant logarithmic homogeneity of f. Using the property of the Legendre transformation \(f^{*}\) of f, we can express this function as

$$\begin{aligned} \varphi (\mathbf {y}) = t^{-1}\max _{\mathbf {x}\in \mathrm {int}\left( \mathcal {K}\right) }\left\{ \langle \mathbf {c}- L^{*}\mathbf {y}, \mathbf {x}\rangle - tf(\mathbf {x}) \right\} - \nu \ln (t). \end{aligned}$$

We show that the point \(\mathbf {x}^{k+1}\) given by (56) solves the above maximization problem with \(\mathbf {y}= \mathbf {y}^{k+1}\) and \(t = t_{k+1}\). We can write down the optimality condition of this maximization problem as

$$\begin{aligned} \mathbf {c}- L^{*}\mathbf {y}^{k+1} - t_{k+1}\nabla {f}(\mathbf {x}^{k+1}) = 0, \end{aligned}$$

which leads to \(\nabla {f}(\mathbf {x}^{k+1}) = t_{k+1}^{-1}(\mathbf {c}- L^{*}\mathbf {y}^{k+1})\). On the other hand, by the well-known property of f [31], we have \(\mathbf {x}^{k+1} = \nabla {f^{*}}(\nabla {f}(\mathbf {x}^{k+1})) = \nabla {f^{*}}\left( t_{k+1}^{-1}(\mathbf {c}- L^{*}\mathbf {y}^{k+1})\right) \in \mathrm {int}\left( \mathcal {K}\right) \).

Now, we prove (57). Note that \(\mathbf {c}- L^{*}\mathbf {y}^{k+1} - t_{k+1}\nabla {f}(\mathbf {x}^{k+1}) = 0\) and \(\Vert \nabla {f}(\mathbf {x})\Vert _{\mathbf {x}}^{*}\le \sqrt{\nu }\) for any \(\mathbf {x}\in \mathrm {int}\left( \mathcal {K}\right) \), which leads to

$$\begin{aligned} \Vert L^{*}\mathbf {y}^{k+1} -\mathbf {c}\Vert _{\mathbf {x}^{k+1}}^{*} = t_{k+1}\Vert \nabla {f}(\mathbf {x}^{k+1})\Vert _{\mathbf {x}^{k+1}}^{*} \le t_{k+1} \sqrt{\nu }. \end{aligned}$$

Since \(t_{k+1} \le \varepsilon \), this estimate leads to the first inequality of (57).
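To make the recovery step concrete, consider the special case \(\mathcal {K}= \mathbb {R}^n_{+}\) with the standard logarithmic barrier \(f(\mathbf {x}) = -\sum _i\ln x_i\), for which \(\nu = n\), \(\nabla f(\mathbf {x}) = -1/\mathbf {x}\), and \(\nabla f^{*}(\mathbf {s}) = -1/\mathbf {s}\) componentwise (for \(\mathbf {s}< 0\)). The sketch below (random placeholder data; an illustration, not the paper's implementation) recovers \(\mathbf {x}\) via (56) and verifies the residual identity used above:

```python
# Primal recovery x = grad f*((c - L^T y)/t) for K = R^n_+ with the log
# barrier f(x) = -sum(log x_i), nu = n.  Here grad f*(s) = -1/s, so
# x = -t / (c - L^T y), valid when c - L^T y < 0.  Data are placeholders.
import numpy as np

rng = np.random.default_rng(0)
n, p, t = 5, 3, 1e-3
L = rng.standard_normal((p, n))
y = rng.standard_normal(p)
c = L.T @ y - rng.uniform(1.0, 2.0, size=n)   # ensures c - L^T y < 0

s = (c - L.T @ y) / t
x = -1.0 / s                                  # x = grad f*(s), strictly positive
assert np.all(x > 0)

# Residual bound ||L^T y - c||_x^* = t * ||grad f(x)||_x^* <= t*sqrt(n);
# for the log barrier, ||v||_x^* = sqrt(sum(v_i^2 * x_i^2)).
res = np.sqrt(np.sum((L.T @ y - c)**2 * x**2))
print(res, t * np.sqrt(n))                    # equal (up to rounding) here
```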

From (24), there exists \(\mathbf {e}^k\in \mathbb {R}^p\) such that \(\mathbf {e}^k \in \nabla {\varphi }(\mathbf {y}^k) + \nabla ^2{\varphi }(\mathbf {y}^k)(\mathbf {y}^{k+1} - \mathbf {y}^k) + t_{k+1}^{-1}\partial {\psi }(\mathbf {y}^{k+1})\) and \(\Vert \mathbf {e}^k \Vert _{\mathbf {y}^{k}}^{*} \le \delta _k\). This condition leads to

$$\begin{aligned} \mathbf {e}^k + \nabla {\varphi }(\mathbf {y}^{k+1}) - \nabla {\varphi }(\mathbf {y}^k) - \nabla ^2{\varphi }(\mathbf {y}^k)(\mathbf {y}^{k+1} - \mathbf {y}^k) \in \nabla {\varphi }(\mathbf {y}^{k+1}) + t_{k+1}^{-1}\partial {\psi }(\mathbf {y}^{k+1}). \end{aligned}$$

Therefore, we have

$$\begin{aligned} \mathrm {dist}_{\mathbf {y}^{k+1}}\Big (0, \nabla {\varphi }(\mathbf {y}^{k+1}) + t_{k+1}^{-1}\partial {\psi }(\mathbf {y}^{k+1})\Big )&\le \big \Vert \mathbf {e}^k + \nabla {\varphi }(\mathbf {y}^{k+1}) - \nabla {\varphi }(\mathbf {y}^k) - \nabla ^2{\varphi }(\mathbf {y}^k)(\mathbf {y}^{k+1} - \mathbf {y}^k)\big \Vert _{\mathbf {y}^{k+1}}^{*}\\&\le \Vert \mathbf {e}^k\Vert _{\mathbf {y}^{k+1}}^{*} + \big \Vert \nabla {\varphi }(\mathbf {y}^{k+1}) - \nabla {\varphi }(\mathbf {y}^k) - \nabla ^2{\varphi }(\mathbf {y}^k)(\mathbf {y}^{k+1} - \mathbf {y}^k)\big \Vert _{\mathbf {y}^{k+1}}^{*}. \end{aligned}$$
(82)

To estimate the right-hand side of this inequality, we define \(M_k := \Vert \nabla {\varphi }(\mathbf {y}^{k+1}) - \nabla {\varphi }(\mathbf {y}^k) - \nabla ^2{\varphi }(\mathbf {y}^k)(\mathbf {y}^{k+1} - \mathbf {y}^k)\Vert _{\mathbf {y}^{k+1}}^{*}\). With the same proof as [31, Theorem 4.1.14], we can show that

$$\begin{aligned} M_k \le \left( 1 - \Vert \mathbf {y}^{k+1} - \mathbf {y}^k\Vert _{\mathbf {y}^k}\right) ^{-2}\Vert \mathbf {y}^{k+1} - \mathbf {y}^k\Vert _{\mathbf {y}^k}^2 \le \frac{\left( \delta (\mathbf {y}^k) + \lambda _{t_{k+1}}(\mathbf {y}^k)\right) ^2}{\left( 1- \lambda _{t_{k+1}}(\mathbf {y}^k) - \delta (\mathbf {y}^k)\right) ^2}. \end{aligned}$$
(83)

Here, we use \(\Vert \mathbf {y}^{k+1} - \mathbf {y}^k\Vert _{\mathbf {y}^k} \le \Vert \mathbf {y}^{k+1} - \bar{\mathbf {y}}^{k+1}\Vert _{\mathbf {y}^k} + \Vert \bar{\mathbf {y}}^{k+1} - \mathbf {y}^k\Vert _{\mathbf {y}^k} = \delta (\mathbf {y}^k) + \lambda _{t_{k+1}}(\mathbf {y}^k)\) by the definitions of \(\lambda _{t_{+}}(\mathbf {y})\) in (21) and of \(\delta (\mathbf {y})\) above (27). Substituting (83) into (82), we get

$$\begin{aligned} \mathrm {dist}_{\mathbf {y}^{k+1}}\left( 0, \nabla {\varphi }(\mathbf {y}^{k+1}) + t_{k+1}^{-1}\partial {\psi }(\mathbf {y}^{k+1})\right) \le \Vert \mathbf {e}^k\Vert _{\mathbf {y}^{k+1}}^{*} + \frac{\left( \delta (\mathbf {y}^k) + \lambda _{t_{k+1}}(\mathbf {y}^k)\right) ^2}{\left( 1- \lambda _{t_{k+1}}(\mathbf {y}^k) - \delta (\mathbf {y}^k)\right) ^2}. \end{aligned}$$
(84)
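The first inequality in (83) is the standard self-concordance bound on the Taylor remainder of the gradient. As a quick sanity check, the sketch below verifies it numerically for the univariate self-concordant function \(-\ln (y)\) (an illustration only, not part of the proof):

```python
# Check: for phi(y) = -log(y), the Taylor remainder of grad phi, measured in
# the dual local norm at y', is bounded by r^2/(1-r)^2 with r = ||y'-y||_y.
import math

def check(y, yp):
    g, gp, H = -1.0 / y, -1.0 / yp, 1.0 / y**2  # gradients and Hessian at y
    r = abs(yp - y) * math.sqrt(H)               # local norm ||y'-y||_y
    rem = gp - g - H * (yp - y)                  # gradient Taylor remainder
    lhs = abs(rem) * yp                          # dual norm at y': |v|/sqrt(phi''(y'))
    assert r < 1 and lhs <= r**2 / (1 - r)**2

for yp in (0.8, 0.95, 1.1, 1.4):
    check(1.0, yp)
```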

Next, it remains to estimate \(\Vert \mathbf {e}^k\Vert _{\mathbf {y}^{k+1}}^{*}\). Indeed, we have

$$\begin{aligned} \Vert \mathbf {e}^k\Vert _{\mathbf {y}^{k+1}}^{*} \le \big (1 - \Vert \mathbf {y}^{k+1} - \mathbf {y}^{k}\Vert _{\mathbf {y}^k}\big )^{-1}\Vert \mathbf {e}^k\Vert _{\mathbf {y}^k}^{*} \le \left( 1- \lambda _{t_{k+1}}(\mathbf {y}^k) - \delta (\mathbf {y}^k)\right) ^{-1}\Vert \mathbf {e}^k\Vert _{\mathbf {y}^k}^{*} \le \frac{\delta _k}{1 - \lambda _{t_{k+1}}(\mathbf {y}^k) - \delta _k}. \end{aligned}$$

Substituting this estimate into (84) and using \(\lambda _{t_{k+1}}(\mathbf {y}^k) \le c\sqrt{\beta }(1+c\sqrt{\beta })^{-1}\) from Lemma 4, we obtain

$$\begin{aligned} \mathrm {dist}_{\mathbf {y}^{k+1}}\left( 0, \nabla {\varphi }(\mathbf {y}^{k+1}) + t_{k+1}^{-1}\partial {\psi }(\mathbf {y}^{k+1})\right) \le \frac{\delta _k(1+c\sqrt{\beta })}{1-\delta _k(1+c\sqrt{\beta })} + \frac{\left( \delta _k(1+c\sqrt{\beta })+c\sqrt{\beta }\right) ^2}{\left( 1 -\delta _k(1+c\sqrt{\beta })\right) ^2}. \end{aligned}$$

Substituting the upper bound \(\bar{\delta } := \frac{(1-c^2)\beta }{(1+c\sqrt{\beta })^3\left[ 3c\sqrt{\beta } + c^2\beta + (1+c\sqrt{\beta })^3\right] }\) on \(\delta _k\) from Lemma 4 into the last estimate and simplifying the result, we get

$$\begin{aligned} \mathrm {dist}_{\mathbf {y}^{k+1}}\left( 0, \nabla {\varphi }(\mathbf {y}^{k+1}) + t_{k+1}^{-1}\partial {\psi }(\mathbf {y}^{k+1})\right) \le \theta (c,\beta ), \end{aligned}$$
(85)

where \(\theta (c,\beta )\) is defined as

$$\begin{aligned} \theta (c,\beta ) :=&\ \frac{(1-c^2)\beta }{(1+c\sqrt{\beta })^2\left[ 3c\sqrt{\beta } + c^2\beta + (1+c\sqrt{\beta })^3\right] -(1-c^2)\beta }\\&+ \left( \frac{(1-c^2)\beta + c\sqrt{\beta }(1+c\sqrt{\beta })^2\left[ 3c\sqrt{\beta } + c^2\beta + (1+c\sqrt{\beta })^3\right] }{(1+c\sqrt{\beta })^2\left[ 3c\sqrt{\beta } + c^2\beta + (1+c\sqrt{\beta })^3\right] -(1-c^2)\beta }\right) ^2. \end{aligned}$$
(86)
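The claim just below that \(\theta (c,\beta ) \le 1\) on the admissible parameter range can also be checked numerically; a minimal sketch (the grid of test values is an assumption chosen for illustration):

```python
# Numerical sanity check of theta(c, beta) <= 1 for c in (0,1) and
# 0 <= beta < 0.5*(1 + 2c^2 - sqrt(1 + 4c^2)), using the definition (86).
import math

def theta(c, beta):
    s = c * math.sqrt(beta)
    bracket = 3*s + c**2*beta + (1 + s)**3
    denom = (1 + s)**2 * bracket - (1 - c**2) * beta
    first = (1 - c**2) * beta / denom
    second = (((1 - c**2) * beta + s * (1 + s)**2 * bracket) / denom) ** 2
    return first + second

for c in (0.1, 0.5, 0.9, 0.99):
    beta_max = 0.5 * (1 + 2*c**2 - math.sqrt(1 + 4*c**2))
    for frac in (0.0, 0.5, 0.99):
        assert theta(c, frac * beta_max) <= 1.0
```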

Using the fact that \(c \in (0, 1)\) and \(0 \le \beta < 0.5(1 + 2c^2 - \sqrt{1 + 4c^2})\), we have \(\theta (c,\beta ) \le 1\). Since \(\nabla {\varphi }(\cdot ) = -L\nabla {f^{*}}( \mathbf {c}-L^{*}(\cdot ) ) = -t_{k+1}^{-1}L\nabla {f^{*}}(t_{k+1}^{-1}(\mathbf {c}-L^{*}(\cdot )))\) due to (48), using (56), we can show that \(\nabla {\varphi }(\mathbf {y}^{k+1}) = -t_{k+1}^{-1}L\mathbf {x}^{k+1}\). Plugging this expression into (85) and noting that \(\partial {\psi }(\cdot ) = \partial {g}^{*}(\cdot ) + \mathbf {b}\), we obtain

$$\begin{aligned} \mathrm {dist}_{\mathbf {y}^{k+1}}\left( L\mathbf {x}^{k+1} - \mathbf {b}, \partial {g^{*}}(\mathbf {y}^{k+1})\right) = \mathrm {dist}_{\mathbf {y}^{k+1}}\left( 0, \mathbf {b}- L\mathbf {x}^{k+1} + \partial {g^{*}}(\mathbf {y}^{k+1})\right) \le t_{k+1}\theta (c,\beta ). \end{aligned}$$

Let \(\mathbf {s}^{k+1} = \pi _{\partial {g^{*}}(\mathbf {y}^{k+1})}(L\mathbf {x}^{k+1} - \mathbf {b})\) be the projection of \(L\mathbf {x}^{k+1} - \mathbf {b}\) onto \(\partial {g^{*}}(\mathbf {y}^{k+1})\). Then, \(\mathbf {s}^{k+1} \in \partial {g^{*}}(\mathbf {y}^{k+1})\), and hence, \(\mathbf {y}^{k+1} \in \partial {g}(\mathbf {s}^{k+1})\), which proves the second relation of (57). Using this relation in the last inequality and the definition of \(\mathbf {s}^{k+1}\), we obtain \(\Vert L\mathbf {x}^{k+1} - \mathbf {b}- \mathbf {s}^{k+1}\Vert _{\mathbf {y}^{k+1}}^{*} \le t_{k+1}\theta (c,\beta )\), which is the third relation of (57). Finally, since \(\theta (c,\beta ) \le 1\) and \(\nu \ge 1\) for any self-concordant barrier, we have \(\max \left\{ \sqrt{\nu }, \theta (c,\beta )\right\} = \sqrt{\nu }\). Using (57), we can conclude that \((\mathbf {x}^k, \mathbf {s}^k)\) is an \(\varepsilon \)-solution of (3) if \(\sqrt{\nu }t_k \le \varepsilon \). \(\square \)
