Self-concordant inclusions: a unified framework for path-following generalized Newton-type algorithms

  • Full Length Paper
  • Series A

Mathematical Programming

Abstract

We study a class of monotone inclusions, called “self-concordant inclusions”, which covers three fundamental convex optimization formulations as special cases. We develop a new generalized Newton-type framework to solve this inclusion. Our framework subsumes three schemes: full-step, damped-step, and path-following methods as specific instances, while allowing one to use inexact computation to form generalized Newton directions. We prove the local quadratic convergence of both the full-step and damped-step algorithms. Then, we propose a new two-phase inexact path-following scheme for solving this monotone inclusion which possesses an \({\mathcal {O}}(\sqrt{\nu }\log (1/\varepsilon ))\) worst-case iteration-complexity to achieve an \(\varepsilon \)-solution, where \(\nu \) is the barrier parameter and \(\varepsilon \) is a desired accuracy. As byproducts, we customize our scheme to solve three convex problems: the convex–concave saddle-point problem, the nonsmooth constrained convex program, and the nonsmooth convex program with linear constraints. We also provide three numerical examples to illustrate our theory and to compare with existing methods.


References

  1. Auslender, A., Teboulle, M., Ben-Tiba, S.: A logarithmic-quadratic proximal method for variational inequalities. Comput. Optim. Appl. 12(1–3), 31–40 (1999)

  2. Bauschke, H.H., Combettes, P.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd edn. Springer, Berlin (2017)

  3. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)

  4. Becker, S., Fadili, M.J.: A quasi-Newton proximal splitting method. In: Proceedings of Neural Information Processing Systems Foundation (NIPS) (2012)

  5. Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. Volume 3 of MPS/SIAM Series on Optimization. SIAM, Philadelphia (2001)

  6. Bonnans, J.F.: Local analysis of Newton-type methods for variational inequalities and nonlinear programming. Appl. Math. Optim. 29, 161–186 (1994)

  7. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)

  8. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)

  9. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)

  10. Combettes, P.L., Pesquet, J.-C.: Proximal splitting methods in signal processing. In: Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pp. 185–212. Springer, Berlin (2011)

  11. Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward–backward splitting. Multiscale Model. Simul. 4, 1168–1200 (2005)

  12. De Luca, T., Facchinei, F., Kanzow, C.: A semismooth equation approach to the solution of nonlinear complementarity problems. Math. Program. 75(3), 407–439 (1996)

  13. Dontchev, A.L., Rockafellar, R.T.: Implicit Functions and Solution Mappings: A View from Variational Analysis. Springer, Berlin (2014)

  14. Eckstein, J., Bertsekas, D.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55, 293–318 (1992)

  15. Esser, J.E.: Primal-dual algorithm for convex models and applications to image restoration, registration and nonlocal inpainting. Ph.D. Thesis, University of California, Los Angeles (2010)

  16. Facchinei, F., Pang, J.-S.: Finite-Dimensional Variational Inequalities and Complementarity Problems, vol. 1-2. Springer, Berlin (2003)

  17. Frank, M., Wolfe, P.: An algorithm for quadratic programming. Nav. Res. Logist. Q. 3, 95–110 (1956)

  18. Friedlander, M., Goh, G.: Efficient evaluation of scaled proximal operators. Electron. Trans. Numer. Anal. 46, 1–22 (2017)

  19. Fukushima, M.: Equivalent differentiable optimization problems and descent methods for asymmetric variational inequality problems. Math. Program. 53, 99–110 (1992)

  20. Goldstein, T., Li, M., Yuan, X., Esser, E., Baraniuk, R.: Adaptive primal-dual hybrid gradient methods for saddle-point problems, pp. 1–15 (2015). arXiv:1305.0546v2

  21. Grant, M., Boyd, S., Ye, Y.: Disciplined convex programming. In: Liberti, L., Maculan, N. (eds.) Global Optimization: From Theory to Implementation, Nonconvex Optimization and Its Applications, pp. 155–210. Springer, Berlin (2006)

  22. Hajek, B., Wu, Y., Xu, J.: Achieving exact cluster recovery threshold via semidefinite programming. IEEE Trans. Inf. Theory 62, 2788–2797 (2016)

  23. Jaggi, M.: Revisiting Frank–Wolfe: projection-free sparse convex optimization. In: JMLR W&CP, vol. 28, no. 1, pp. 427–435 (2013)

  24. Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: Advances in Neural Information Processing Systems (NIPS), pp. 315–323 (2013)

  25. Korpelevich, G.M.: An extragradient method for finding saddle-points and for other problems. Èkon. Mat. Metody 12(4), 747–756 (1976)

  26. Kummer, B.: Newton’s method for non-differentiable functions. Adv. Math. Optim. 45, 114–125 (1988)

  27. Löfberg, J.: YALMIP: a toolbox for modeling and optimization in MATLAB. In: Proceedings of the CACSD Conference, Taipei, Taiwan (2004)

  28. Monteiro, R.D.C., Svaiter, B.F.: Iteration-complexity of a Newton proximal extragradient method for monotone variational inequalities and inclusion problems. SIAM J. Optim. 22(3), 914–935 (2012)

  29. Nemirovski, A., Juditsky, A., Lan, G., Shapiro, A.: Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4), 1574–1609 (2009)

  30. Nemirovskii, A.: Prox-method with rate of convergence \({\cal{O}}(1/t)\) for variational inequalities with Lipschitz continuous monotone operators and smooth convex–concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2004)

  31. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Volume 87 of Applied Optimization. Kluwer Academic Publishers, Dordrecht (2004)

  32. Nesterov, Y.: Dual extrapolation and its applications to solving variational inequalities and related problems. Math. Program. 109(2–3), 319–344 (2007)

  33. Nesterov, Y.: Smoothing technique and its applications in semidefinite optimization. Math. Program. 110(2), 245–259 (2007)

  34. Nesterov, Y.: Gradient methods for minimizing composite objective function. Math. Program. 140(1), 125–161 (2013)

  35. Nesterov, Y., Nemirovski, A.: Interior-Point Polynomial Algorithms in Convex Programming. Society for Industrial Mathematics, Philadelphia (1994)

  36. Nesterov, Y., Todd, M.J.: Self-scaled barriers and interior-point methods for convex programming. Math. Oper. Res. 22(1), 1–42 (1997)

  37. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer Series in Operations Research and Financial Engineering, 2nd edn. Springer, New York (2006)

  38. Pang, J.-S.: A B-differentiable equation-based, globally and locally quadratically convergent algorithm for nonlinear programs, complementarity and variational inequality problems. Math. Program. 51(1), 101–131 (1991)

  39. Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Optim. 1(3), 123–231 (2013)

  40. Qi, L., Sun, J.: A nonsmooth version of Newton’s method. Math. Program. 58, 353–367 (1993)

  41. Ralph, D.: Global convergence of damped Newton’s method for nonsmooth equations via the path search. Math. Oper. Res. 19(2), 352–389 (1994)

  42. Robinson, S.M.: Strongly regular generalized equations. Math. Oper. Res. 5(1), 43–62 (1980)

  43. Robinson, S.M.: Newton’s method for a class of nonsmooth functions. Set Valued Var. Anal. 2, 291–305 (1994)

  44. Rockafellar, R.T.: Convex Analysis. Volume 28 of Princeton Mathematics Series. Princeton University Press, Princeton (1970)

  45. Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Springer, Berlin (1997)

  46. Shefi, R., Teboulle, M.: Rate of convergence analysis of decomposition methods based on the proximal method of multipliers for convex minimization. SIAM J. Optim. 24(1), 269–297 (2014)

  47. Solodov, M.V., Svaiter, B.F.: A hybrid approximate extragradient-proximal point algorithm using the enlargement of a maximal monotone operator. Set Valued Var. Anal. 7(4), 323–345 (1999)

  48. Sturm, J.F.: Using SeDuMi 1.02: a Matlab toolbox for optimization over symmetric cones. Optim. Methods Softw. 11–12, 625–653 (1999)

  49. Su, W., Boyd, S., Candes, E.: A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. In: Advances in Neural Information Processing Systems (NIPS), pp. 2510–2518 (2014)

  50. Toh, K.-C., Todd, M.J., Tütüncü, R.H.: On the implementation and usage of SDPT3—a Matlab software package for semidefinite-quadratic-linear programming. Technical Report 4, NUS Singapore (2010)

  51. Tran-Dinh, Q., Kyrillidis, A., Cevher, V.: An inexact proximal path-following algorithm for constrained convex minimization. SIAM J. Optim. 24(4), 1718–1745 (2014)

  52. Tran-Dinh, Q., Kyrillidis, A., Cevher, V.: A single phase proximal path-following framework. Math. Oper. Res. (2018) (accepted)

  53. Tran-Dinh, Q., Necoara, I., Savorgnan, C., Diehl, M.: An inexact perturbed path-following method for Lagrangian decomposition in large-scale separable convex optimization. SIAM J. Optim. 23(1), 95–125 (2013)

  54. Tseng, P.: Applications of splitting algorithm to decomposition in convex programming and variational inequalities. SIAM J. Control Optim. 29, 119–138 (1991)

  55. Tseng, P.: Alternating projection-proximal methods for convex programming and variational inequalities. SIAM J. Optim. 7(4), 951–965 (1997)

  56. Wen, Z., Goldfarb, D., Yin, W.: Alternating direction augmented Lagrangian methods for semidefinite programming. Math. Program. Comput. 2, 203–230 (2010)

  57. Wen, Z., Yin, W., Zhang, Y.: Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm. Math. Program. Comput. 4(4), 333–361 (2012)

  58. Womersley, R.S., Sun, D., Qi, H.: A feasible semismooth asymptotically Newton method for mixed complementarity problems. Math. Program. 94(1), 167–187 (2002)

  59. Wright, S.J.: Applying new optimization algorithms to model predictive control. In: Kantor, J.C., Garcia, C.E., Carnahan, B. (eds.) Fifth International Conference on Chemical Process Control—CPCV, pp. 147–155. American Institute of Chemical Engineers (1996)

  60. Xiu, N., Zhang, J.: Some recent advances in projection-type methods for variational inequalities. J. Comput. Appl. Math. 152(1), 559–585 (2003)

  61. Yamashita, H., Yabe, H., Harada, K.: A primal-dual interior point method for nonlinear semidefinite programming. Math. Program. 135, 89–121 (2012)

  62. Yang, L., Sun, D., Toh, K.-C.: SDPNAL+: a majorized semismooth Newton-CG augmented Lagrangian method for semidefinite programming with nonnegative constraints. Math. Program. Comput. 7(3), 331–366 (2015)

Acknowledgements

This work was supported in part by NSF Grant No. 1619884, USA.

Author information


Correspondence to Quoc Tran-Dinh.

Appendix: the proofs of technical results

This appendix provides the full proofs of all lemmas and theorems in the main text.

1.1 The proof of Lemma 1: the existence and uniqueness of the solution of (2)

Under Assumption A.1, the operator \(t\nabla {F}(\cdot ) + \mathcal {A}(\cdot )\) is maximally monotone for any \(t > 0\). We use [45, Theorem 12.51] to prove the solution existence of (2).

To this end, let \(\varvec{\omega }\ne \varvec{0}\) be chosen from the horizon cone of \(\mathrm {int}\left( \mathcal {Z}\right) \cap \mathrm {dom}(\mathcal {A})\). We need to find \(\mathbf {z}\in \mathrm {int}\left( \mathcal {Z}\right) \cap \mathrm {dom}(\mathcal {A})\) with \(\mathbf {v}\in t \nabla {F}(\mathbf {z}) + \mathcal {A}(\mathbf {z})\) such that \(\langle \mathbf {v}, \varvec{\omega }\rangle > 0\). By assumption, there exists \(\hat{\mathbf {z}}\in \mathrm {int}\left( \mathcal {Z}\right) \cap \mathrm {dom}(\mathcal {A})\) with \(\hat{\mathbf {a}}\in \mathcal {A}(\hat{\mathbf {z}}) \) such that \(\langle \hat{\mathbf {a}}, \varvec{\omega }\rangle >0\).

First, we show that \(\mathbf {z}_\tau = \hat{\mathbf {z}}+ \tau \varvec{\omega }\) belongs to \(\mathrm {int}\left( \mathcal {Z}\right) \cap \mathrm {dom}(\mathcal {A})\) for any \(\tau >0\). To see this, note that the assumption \(\mathrm {int}\left( \mathcal {Z}\right) \cap \mathrm {dom}(\mathcal {A})\ne \emptyset \) implies that \(\mathrm {int}\left( \mathcal {Z}\right) \cap \mathrm {ri} \ \mathrm {dom}(\mathcal {A})\ne \emptyset \), which implies that the closure of \(\mathrm {int}\left( \mathcal {Z}\right) \cap \mathrm {dom}(\mathcal {A})\) is exactly \(\mathcal {Z}\cap \mathrm {cl}\!\left( \mathrm {dom}(\mathcal {A})\right) \). Choose \(\tau '>\tau \); by definition of the horizon cone, \(\mathbf {z}_{\tau '}\) belongs to the closure of \(\mathrm {int}\left( \mathcal {Z}\right) \cap \mathrm {dom}(\mathcal {A})\), so \(\mathbf {z}_{\tau '}\in \mathcal {Z}\) and \(\mathbf {z}_{\tau '}\in \mathrm {cl}\!\left( \mathrm {dom}(\mathcal {A})\right) \). Since \(\mathbf {z}_{\tau }\) is a convex combination of \(\hat{\mathbf {z}}\) and \(\mathbf {z}_{\tau '}\), it belongs to \(\mathrm {int}\left( \mathcal {Z}\right) \cap \mathrm {dom}(\mathcal {A})\), where we use the assumption that \(\mathrm {dom}(\mathcal {A})\) is either closed or open.

Next, for any \(\mathbf {a}_\tau \in \mathcal {A}(\mathbf {z}_\tau )\), we have

$$\begin{aligned} \langle \mathbf {a}_\tau , \varvec{\omega }\rangle = \langle \mathbf {a}_\tau - \hat{\mathbf {a}}, \varvec{\omega }\rangle + \langle \hat{\mathbf {a}}, \varvec{\omega }\rangle =\langle \mathbf {a}_\tau - \hat{\mathbf {a}}, \tau ^{-1}(\mathbf {z}_\tau -\hat{\mathbf {z}})\rangle + \langle \hat{\mathbf {a}},\varvec{\omega }\rangle \ge \langle \hat{\mathbf {a}},\varvec{\omega }\rangle > 0. \end{aligned}$$

On the other hand, \(\langle t \nabla {F}(\mathbf {z}_\tau ), \varvec{\omega }\rangle = \langle t \nabla {F}(\mathbf {z}_\tau ), \tau ^{-1}(\mathbf {z}_\tau -\hat{\mathbf {z}})\rangle \ge -\,\tau ^{-1} t\nu \) by [31, Theorem 4.2.4]. Combining the above two inequalities, we can see that

$$\begin{aligned} \langle t \nabla {F}(\mathbf {z}_\tau ) + \mathbf {a}_\tau , \varvec{\omega }\rangle \ge -\,\tau ^{-1} t\nu + \langle \hat{\mathbf {a}}, \varvec{\omega }\rangle >0 \end{aligned}$$

as long as \(\tau ^{-1}t\nu < \langle \hat{\mathbf {a}},\varvec{\omega }\rangle \). We have thereby verified the condition in [45, Theorem 12.51] needed to guarantee that (2) has a nonempty (and bounded) solution set. Since \(\nabla F\) is strictly monotone, the solution of (2) is unique.

Since \(\mathbf {z}^{\star }_t\) is the solution of (2) and \(\mathbf {z}^{\star }_t \in \mathrm {int}\left( \mathcal {Z}\right) \), we have \(-\,t\nabla {F}(\mathbf {z}^{\star }_t) \in \mathcal {A}(\mathbf {z}^{\star }_t) = \mathcal {A}_{\mathcal {Z}}(\mathbf {z}^{\star }_t)\). Hence, \(\mathrm {dist}_{\mathbf {z}^{\star }_t}(\varvec{0}, \mathcal {A}_{\mathcal {Z}}(\mathbf {z}^{\star }_t)) \le t\left\| \nabla {F}(\mathbf {z}^{\star }_t)\right\| _{\mathbf {z}^{\star }_t}^{*} \le t\sqrt{\nu }\) due to the property of F [31]. Using Definition 4, we have the last conclusion. \(\square \)

1.2 The proof of Lemma 3: approximate solution

First, since \(\bar{\mathbf {z}}_{+}\) is a zero point of \(\widehat{\mathcal {A}}_t(\cdot ;\mathbf {z})\), i.e., \(\varvec{0} \in \widehat{\mathcal {A}}_t(\bar{\mathbf {z}}_{+};\mathbf {z})\), we have \(-\,t\nabla {F}(\mathbf {z}) - t\nabla ^2{F}(\mathbf {z})(\bar{\mathbf {z}}_{+} - \mathbf {z}) \in \mathcal {A}(\bar{\mathbf {z}}_{+})\). Second, since \(\mathbf {z}_{+}\) is a \(\delta \)-solution to (23), by Definition 5, there exists \(\mathbf {e}\) such that \(\mathbf {e}\in t\nabla {F}(\mathbf {z}) + t\nabla ^2{F}(\mathbf {z})(\mathbf {z}_{+} - \mathbf {z}) + \mathcal {A}(\mathbf {z}_{+})\) with \(\Vert \mathbf {e}\Vert _{\mathbf {z}}^{*} \le t\delta \). Combining these two inclusions and using the monotonicity of \(\mathcal {A}\) in Definition 1, we can show that \(\langle t[\nabla {F}(\mathbf {z}) + \nabla ^2{F}(\mathbf {z})(\mathbf {z}_{+} - \mathbf {z}) - \nabla {F}(\mathbf {z}) - \nabla ^2{F}(\mathbf {z})(\bar{\mathbf {z}}_{+} - \mathbf {z})] -\mathbf {e}, \bar{\mathbf {z}}_{+} - \mathbf {z}_{+} \rangle \ge 0\). This inequality leads to

$$\begin{aligned} t\Vert \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\Vert _{\mathbf {z}}^2 \le \langle \mathbf {e}, \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\rangle \le \Vert \mathbf {e}\Vert _{\mathbf {z}}^{*}\Vert \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\Vert _{\mathbf {z}}, \end{aligned}$$
(65)

which implies \(\Vert \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\Vert _{\mathbf {z}} \le t^{-1}\Vert \mathbf {e}\Vert _{\mathbf {z}}^{*}\). Hence, \(\Vert \mathbf {e}\Vert _{\mathbf {z}}^{*}\le t\delta \) implies \(\Vert \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\Vert _{\mathbf {z}} \le \delta \).

Next, since \(\mathbf {z}_{+}\) is a \(\delta \)-approximate solution to (23) at t in the sense of Definition 5, there exists \(\mathbf {e}\in \mathbb {R}^p\) such that

$$\begin{aligned} \mathbf {e}\in t\left[ \nabla {F}(\mathbf {z}) + \nabla ^2{F}(\mathbf {z})(\mathbf {z}_{+} - \mathbf {z})\right] + \mathcal {A}(\mathbf {z}_{+})~~\text {with}~~\left\| \mathbf {e}\right\| _{\mathbf {z}}^{*} \le t\delta . \end{aligned}$$

In addition, we have \(\mathbf {z}_{+} \in \mathrm {int}\left( \mathcal {Z}\right) \) due to Theorem 1, so that \(\mathcal {A}_{\mathcal {Z}}(\mathbf {z}_{+}) = \mathcal {A}(\mathbf {z}_{+})\). Using this relation and the above inclusion, we can show that

$$\begin{aligned} \begin{array}{ll} \mathrm {dist}_{\mathbf {z}}(\varvec{0}, \mathcal {A}_{\mathcal {Z}}(\mathbf {z}_{+})) &{}\le \Vert \mathbf {e}- t\left[ \nabla {F}(\mathbf {z}) + \nabla ^2{F}(\mathbf {z})(\mathbf {z}_{+} - \mathbf {z})\right] \Vert _{\mathbf {z}}^{*} \\ &{}\le \left\| \mathbf {e}\right\| _{\mathbf {z}}^{*} + t\left\| \nabla {F}(\mathbf {z})\right\| _{\mathbf {z}}^{*} + t\Vert \nabla ^2{F}(\mathbf {z})(\mathbf {z}_{+} - \mathbf {z})\Vert _{\mathbf {z}}^{*}\\ &{}\le t \left[ \delta + \sqrt{\nu } + \Vert \nabla ^2{F}(\mathbf {z})(\bar{\mathbf {z}}_{+} - \mathbf {z})\Vert _{\mathbf {z}}^{*} + \Vert \nabla ^2{F}(\mathbf {z})(\bar{\mathbf {z}}_{+} - \mathbf {z}_{+})\Vert _{\mathbf {z}}^{*}\right] \\ &{}\le t \left[ \delta + \sqrt{\nu } + \lambda _{t}(\mathbf {z}) + \left\| \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\right\| _{\mathbf {z}}\right] \\ &{}\le t\left( \sqrt{\nu } + \lambda _{t}(\mathbf {z}) + 2\delta \right) . \end{array} \end{aligned}$$
(66)

Here, we have used \(\left\| \nabla {F}(\mathbf {z})\right\| _{\mathbf {z}}^{*} \le \sqrt{\nu }\), and \(\left\| \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\right\| _{\mathbf {z}} \le \delta \) by the first part of this lemma. Note that if \(\lambda _t(\mathbf {z}) + \delta < 1\), then \(\mathrm {dist}_{\mathbf {z}_{+}}(\varvec{0}, \mathcal {A}_{\mathcal {Z}}(\mathbf {z}_{+})) \le (1-\lambda _t(\mathbf {z}) -\delta )^{-1}\mathrm {dist}_{\mathbf {z}}(\varvec{0}, \mathcal {A}_{\mathcal {Z}}(\mathbf {z}_{+}))\). Combining this inequality and the last estimate, we obtain (25). Finally, if we choose \(t \le (1-\lambda _t(\mathbf {z})-\delta )\left( \sqrt{\nu } + \lambda _{t}(\mathbf {z}) + 2\delta \right) ^{-1}\varepsilon \), then \(\mathrm {dist}_{\mathbf {z}_{+}}(\varvec{0}, \mathcal {A}_{\mathcal {Z}}(\mathbf {z}_{+})) \le \varepsilon \). Hence, \(\mathbf {z}_{+}\) is an \(\varepsilon \)-solution to (1) in the sense of Definition 4. \(\square \)
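
To make this final step concrete, the following sketch evaluates the admissible value of t; it is an illustration only, and the values of \(\nu \), \(\lambda _t(\mathbf {z})\), \(\delta \), and \(\varepsilon \) below are ours, not from the paper.

```python
import math

# Illustrative values (ours, not from the paper): barrier parameter nu,
# proximal-Newton decrement lam = lambda_t(z), inexactness level delta,
# and target accuracy eps.
nu, lam, delta, eps = 100.0, 0.1, 0.01, 1e-6

# Lemma 3: z_plus is an eps-solution of (1) whenever
#   t <= (1 - lam - delta) * (sqrt(nu) + lam + 2*delta)^{-1} * eps.
t_max = (1.0 - lam - delta) * eps / (math.sqrt(nu) + lam + 2.0 * delta)
print(f"t must satisfy t <= {t_max:.3e}")  # roughly eps / sqrt(nu)
```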

1.3 The proof of Theorem 1: a key estimate of generalized Newton-type schemes

First, similar to [2], we can easily show that the following non-expansiveness property holds:

$$\begin{aligned} \Vert \mathcal {P}_{\hat{\mathbf {z}}}(\mathbf {u}; t) - \mathcal {P}_{\hat{\mathbf {z}}}(\mathbf {v}; t)\Vert _{\hat{\mathbf {z}}} \le \Vert \mathbf {u}- \mathbf {v}\Vert _{\hat{\mathbf {z}}},~~~\forall \mathbf {u},\mathbf {v}\in \mathbb {R}^p. \end{aligned}$$
(67)

Note that \(\Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}} \le \Vert \bar{\mathbf {z}}_{+} - \mathbf {z}\Vert _{\mathbf {z}} + \Vert \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\Vert _{\mathbf {z}} = \lambda _{t_{+}}(\mathbf {z}) + \delta (\mathbf {z}) < 1\) by our assumption. This shows that \(\mathbf {z}_{+}\in \mathrm {int}\left( \mathcal {Z}\right) \) due to [31, Theorem 4.1.5 (1)].

Next, we consider the generalized gradient mappings \(G_{\mathbf {z}}(\mathbf {z};t_{+})\) and \(G_{\mathbf {z}_{+}}(\mathbf {z}_{+}; t_{+})\) at \(\mathbf {z}\) and \(\mathbf {z}_{+}\), respectively defined by (20) as follows:

$$\begin{aligned} \begin{array}{ll} G_{\mathbf {z}}(\mathbf {z}; t_{+}) &{}:= \nabla ^2{F}(\mathbf {z})\left( \mathbf {z}- \mathcal {P}_{\mathbf {z}}\left( \mathbf {z}- \nabla ^2{F}(\mathbf {z})^{-1}\nabla {F}(\mathbf {z}); t_{+}\right) \right) , \\ G_{\mathbf {z}_{+}}(\mathbf {z}_{+}; t_{+}) &{}:= \nabla ^2{F}(\mathbf {z}_{+})\left( \mathbf {z}_{+} - \mathcal {P}_{\mathbf {z}_{+}}\left( \mathbf {z}_{+} - \nabla ^2{F}(\mathbf {z}_{+})^{-1}\nabla {F}(\mathbf {z}_{+}); t_{+}\right) \right) . \end{array} \end{aligned}$$
(68)

Let \(r_{\mathbf {z}}(\bar{\mathbf {z}}_{+}) := \nabla {F}(\mathbf {z}) + \nabla ^2{F}(\mathbf {z})(\bar{\mathbf {z}}_{+} - \mathbf {z})\). Then, by using \(\bar{\mathbf {z}}_{+} := \mathcal {P}_{\mathbf {z}}\big (\mathbf {z}- \nabla ^2{F}(\mathbf {z})^{-1}\nabla {F}(\mathbf {z}); t_{+}\big )\) from (26), we can show that

$$\begin{aligned} -r_{\mathbf {z}}(\bar{\mathbf {z}}_{+}) := -\left[ \nabla {F}(\mathbf {z}) + \nabla ^2{F}(\mathbf {z})(\bar{\mathbf {z}}_{+} - \mathbf {z})\right] \in t_{+}^{-1}\mathcal {A}(\bar{\mathbf {z}}_{+}). \end{aligned}$$
(69)

Clearly, we can rewrite (69) as \(\bar{\mathbf {z}}_{+} -\nabla ^2{F}(\mathbf {z}_{+})^{-1}r_{\mathbf {z}}(\bar{\mathbf {z}}_{+}) \in \bar{\mathbf {z}}_{+} + t_{+}^{-1}\nabla ^2{F}(\mathbf {z}_{+})^{-1}\mathcal {A}(\bar{\mathbf {z}}_{+})\). Then, using the definition (16) of \(\mathcal {P}_{\mathbf {z}_{+}}(\cdot ) := \left( \mathbb {I}+ t_{+}^{-1}\nabla ^2{F}(\mathbf {z}_{+})^{-1}\mathcal {A}\right) ^{-1}(\cdot )\), we can derive

$$\begin{aligned} \mathbf {z}_{+} = \mathcal {P}_{\mathbf {z}_{+}}\left( \bar{\mathbf {z}}_{+} -\nabla ^2{F}(\mathbf {z}_{+})^{-1}r_{\mathbf {z}}(\bar{\mathbf {z}}_{+}); t_{+} \right) + (\mathbf {z}_{+} - \bar{\mathbf {z}}_{+}). \end{aligned}$$
(70)

Now, we can estimate \(\lambda _{t_{+}}(\mathbf {z}_{+})\) defined by (21) using (68), (70), (67), and (69) as follows:

$$\begin{aligned} \lambda _{t_{+}}(\mathbf {z}_{+})&:= \Vert G_{\mathbf {z}_{+}}(\mathbf {z}_{+}; t_{+})\Vert ^{*}_{\mathbf {z}_{+}} \overset{(68)}{=} \Vert \mathbf {z}_{+} - \mathcal {P}_{\mathbf {z}_{+}}\left( \mathbf {z}_{+} - \nabla ^2{F}(\mathbf {z}_{+})^{-1}\nabla {F}(\mathbf {z}_{+}); t_{+}\right) \Vert _{\mathbf {z}_{+}} \nonumber \\&\overset{(70)}{{}={}} \Big \Vert \mathcal {P}_{\mathbf {z}_{+}}{}\left( \bar{\mathbf {z}}_{+} - \nabla ^2{F}(\mathbf {z}_{+})^{-1}{} r_{\mathbf {z}}(\bar{\mathbf {z}}_{+}); t_{+} \right) \nonumber \\&\quad - \mathcal {P}_{\mathbf {z}_{+}}{}\left( \mathbf {z}_{+} - \nabla ^2{F}(\mathbf {z}_{+})^{-1}{}\nabla {F}(\mathbf {z}_{+}); t_{+}\right) + (\mathbf {z}_{+} - \bar{\mathbf {z}}_{+}) \Big \Vert _{\mathbf {z}_{+}} \nonumber \\&\le \Big \Vert \mathcal {P}_{\mathbf {z}_{+}}\left( \bar{\mathbf {z}}_{+} -\nabla ^2{F}(\mathbf {z}_{+})^{-1}r_{\mathbf {z}}(\bar{\mathbf {z}}_{+}); t_{+} \right) \nonumber \\&\quad - \mathcal {P}_{\mathbf {z}_{+}}\left( \mathbf {z}_{+} - \nabla ^2{F}(\mathbf {z}_{+})^{-1}\nabla {F}(\mathbf {z}_{+}); t_{+}\right) \Big \Vert _{\mathbf {z}_{+}}\nonumber \\&\quad + \Vert \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\Vert _{\mathbf {z}_{+}} \nonumber \\&\overset{(67)}{\le } \Big \Vert \nabla ^2{F}(\mathbf {z}_{+})^{-1}\left[ \nabla {F}(\mathbf {z}_{+}) - r_{\mathbf {z}}(\bar{\mathbf {z}}_{+})\right] + (\bar{\mathbf {z}}_{+} - \mathbf {z}_{+}) \Big \Vert _{\mathbf {z}_{+}} + \Vert \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\Vert _{\mathbf {z}_{+}} \nonumber \\&\overset{(69)}{=} \Big \Vert \nabla ^2{F}(\mathbf {z}_{+})^{-1}\big [\nabla {F}(\mathbf {z}_{+}) - \nabla {F}(\mathbf {z}) - \nabla ^2{F}(\mathbf {z})(\mathbf {z}_{+} - \mathbf {z}) + (\nabla ^2{F}(\mathbf {z}_{+})\nonumber \\&\quad - \nabla ^2{F}(\mathbf {z}))(\bar{\mathbf {z}}_{+} - \mathbf {z}_{+})\big ]\Big \Vert _{\mathbf {z}_{+}} + \Vert \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\Vert _{\mathbf {z}_{+}} \nonumber \\&\le \Vert \nabla {F}(\mathbf {z}_{+}) - \nabla {F}(\mathbf {z}) - \nabla ^2{F}(\mathbf {z})(\mathbf {z}_{+} - \mathbf {z}) \Vert _{\mathbf {z}_{+}}^{*} \nonumber \\&\quad + \Vert (\nabla ^2{F}(\mathbf {z}_{+}) - \nabla ^2{F}(\mathbf {z}))(\bar{\mathbf {z}}_{+} - \mathbf {z}_{+})\Vert _{\mathbf {z}_{+}}^{*} +\Vert \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\Vert _{\mathbf {z}_{+}} \nonumber \\&\le \tfrac{1}{1 - \Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}}}\Big [ \Vert \nabla {F}(\mathbf {z}_{+}) - \nabla {F}(\mathbf {z}) - \nabla ^2{F}(\mathbf {z})(\mathbf {z}_{+} - \mathbf {z}) \Vert _{\mathbf {z}}^{*} \nonumber \\&\quad + \Vert (\nabla ^2{F}(\mathbf {z}_{+}) - \nabla ^2{F}(\mathbf {z}))(\bar{\mathbf {z}}_{+} - \mathbf {z}_{+})\Vert _{\mathbf {z}}^{*} \Big ] + \tfrac{\Vert \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\Vert _{\mathbf {z}} }{1 - \Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}}}. \end{aligned}$$
(71)

Here, in the last inequality of (71), we have used the fact that \(\Vert \mathbf {w}\Vert _{\mathbf {z}_{+}}^2 = \langle \nabla ^2{F}(\mathbf {z}_{+})\mathbf {w}, \mathbf {w}\rangle \le (1-\Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}})^{-2}\langle \nabla ^2{F}(\mathbf {z})\mathbf {w}, \mathbf {w}\rangle = (1-\Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}})^{-2}\Vert \mathbf {w}\Vert _{\mathbf {z}}^2\) for any \(\mathbf {w}\) and \(\mathbf {z}, \mathbf {z}_{+}\) such that \(\Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}} < 1\), and the analogous fact for the dual norms. Both facts can be derived from [31, Theorem 4.1.6]. The condition \(\Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}} < 1\) is guaranteed since \(\Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}} \le \Vert \mathbf {z}- \bar{\mathbf {z}}_{+}\Vert _{\mathbf {z}} + \Vert \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\Vert _{\mathbf {z}} = \lambda _{t_{+}}(\mathbf {z}) + \delta (\mathbf {z}) < 1\) by our assumption.

Similar to the proof of [31, Theorem 4.1.14], we can show that

$$\begin{aligned} \left\| \nabla {F}(\mathbf {z}_{+}) - \nabla {F}(\mathbf {z}) - \nabla ^2{F}(\mathbf {z})(\mathbf {z}_{+} - \mathbf {z})\right\| _{\mathbf {z}}^{*} \le \frac{\Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}}^2}{1 - \Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}}}. \end{aligned}$$
(72)
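
Although (72) is a standard self-concordance estimate from [31], it can be sanity-checked numerically. The sketch below (an illustration, not part of the proof) tests it for the one-dimensional barrier \(F(x) = -\ln x\), for which \(\nabla {F}(x) = -1/x\), \(\nabla ^2{F}(x) = 1/x^2\), \(\Vert w\Vert _{x} = |w|/x\), and \(\Vert v\Vert _{x}^{*} = x|v|\).

```python
import random

# Numerical sanity check of (72) for F(x) = -log(x) in one dimension.
def bound_72_holds(x, x_plus):
    r = abs(x_plus - x) / x                     # ||z_+ - z||_z, must be < 1
    residual = -1.0/x_plus + 1.0/x - (x_plus - x) / x**2
    lhs = x * abs(residual)                     # dual local norm at z
    return lhs <= r**2 / (1.0 - r) + 1e-12

random.seed(0)
for _ in range(10000):
    x = random.uniform(0.5, 2.0)
    x_plus = x * random.uniform(0.05, 1.95)     # keeps ||x_plus - x||_x < 1
    assert bound_72_holds(x, x_plus)
print("(72) holds on all sampled pairs")
```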

Next, we need to estimate \(B := \Vert (\nabla ^2{F}(\mathbf {z}_{+}) - \nabla ^2{F}(\mathbf {z}))(\bar{\mathbf {z}}_{+} - \mathbf {z}_{+})\Vert _{\mathbf {z}}^{*}\). We define

$$\begin{aligned} \Sigma := \nabla ^2{F}(\mathbf {z})^{-1/2}\left( \nabla ^2{F}(\mathbf {z}_{+}) - \nabla ^2{F}(\mathbf {z})\right) \nabla ^2{F}(\mathbf {z})^{-1/2}. \end{aligned}$$

By [31, Theorem 4.1.6], we can show that

$$\begin{aligned} \Vert \Sigma \Vert\le & {} \max \left\{ 1 - (1 - \Vert \mathbf {z}_{+} -\mathbf {z}\Vert _{\mathbf {z}})^2, (1 - \Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}})^{-2} - 1\right\} \\= & {} \frac{2\Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}} - \Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}}^2}{(1 - \Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}})^2}. \end{aligned}$$

Using this inequality we can estimate B as

$$\begin{aligned} B^2&= (\bar{\mathbf {z}}_{+} - \mathbf {z}_{+})^{\top }\nabla ^2{F}(\mathbf {z})^{1/2}\Sigma ^2\nabla ^2{F}(\mathbf {z})^{1/2}(\bar{\mathbf {z}}_{+} - \mathbf {z}_{+}) \le \Vert \Sigma \Vert ^2\Vert \bar{\mathbf {z}}_{+} - \mathbf {z}_{+}\Vert _{\mathbf {z}}^2\nonumber \\&\le \left( \frac{2\Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}} - \Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}}^2}{(1 - \Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}})^2}\right) ^2\Vert \bar{\mathbf {z}}_{+} - \mathbf {z}_{+}\Vert _{\mathbf {z}}^2, \end{aligned}$$

which implies

$$\begin{aligned} B \le \left( \frac{2\Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}} - \Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}}^2}{(1 - \Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}})^2}\right) \Vert \bar{\mathbf {z}}_{+} - \mathbf {z}_{+}\Vert _{\mathbf {z}}. \end{aligned}$$
(73)

Substituting (72) and (73) into (71) we get

$$\begin{aligned} \lambda _{t_{+}}(\mathbf {z}_{+})&\le \frac{\Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}}^2}{\left( 1 - \Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}}\right) ^2}\nonumber \\&\quad + \frac{\left[ 2\Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}} - \Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}}^2\right] \Vert \bar{\mathbf {z}}_{+} - \mathbf {z}_{+}\Vert _{\mathbf {z}}}{(1 - \Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}})^3} + \frac{\Vert \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\Vert _{\mathbf {z}} }{1 - \Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}}} \nonumber \\&= \frac{\Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}}^2}{\left( 1 - \Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}}\right) ^2} + \frac{\Vert \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\Vert _{\mathbf {z}} }{\left( 1 - \Vert \mathbf {z}_{+} - \mathbf {z}\Vert _{\mathbf {z}}\right) ^3}. \end{aligned}$$
(74)

Finally, we note that \(\lambda _{t_{+}}(\mathbf {z}) := \Vert G_{\mathbf {z}}(\mathbf {z}; t_{+})\Vert _{\mathbf {z}}^{*} = \Vert \mathbf {z}- \mathcal {P}_{\mathbf {z}}\left( \mathbf {z}- \nabla ^2{F}(\mathbf {z})^{-1}\nabla {F}(\mathbf {z}); t_{+} \right) \Vert _{\mathbf {z}} = \Vert \mathbf {z}- \bar{\mathbf {z}}_{+} \Vert _{\mathbf {z}}\) due to (26). Using the triangle inequality, we have \(\Vert \mathbf {z}_{+}-\mathbf {z}\Vert _{\mathbf {z}} \le \Vert \mathbf {z}- \bar{\mathbf {z}}_{+}\Vert _{\mathbf {z}} + \Vert \mathbf {z}_{+} - \bar{\mathbf {z}}_{+}\Vert _{\mathbf {z}} = \lambda _{t_{+}}(\mathbf {z}) + \delta (\mathbf {z}) < 1\). Since the right-hand side of (74) is monotonically increasing with respect to \(\Vert \mathbf {z}_{+}-\mathbf {z}\Vert _{\mathbf {z}}\), substituting the last inequality into (74), we obtain (27). \(\square \)

1.4 The proof of Theorem 2: local quadratic convergence of FGN

We first prove (a). Given a fixed parameter \(t > 0\) sufficiently small, our objective is to find \(\beta \in (0, 1)\) such that if \(\lambda _{t}(\mathbf {z}^k) \le \beta \), then \(\lambda _{t}(\mathbf {z}^{k+1}) \le \beta \). Indeed, using the key estimate (27) with t instead of \(t_{+}\), we can see that to guarantee \(\lambda _{t}(\mathbf {z}^{k+1}) \le \beta \), we require

$$\begin{aligned} \left( \frac{\lambda _{t}(\mathbf {z}^k) + \delta (\mathbf {z}^k)}{1 - \lambda _{t}(\mathbf {z}^k) - \delta (\mathbf {z}^k)}\right) ^2 + \frac{\delta (\mathbf {z}^k)}{\left( 1-\lambda _{t}(\mathbf {z}^k) - \delta (\mathbf {z}^k)\right) ^3} \le \beta . \end{aligned}$$

Since the left-hand side of this inequality is monotonically increasing in \(\lambda _{t}(\mathbf {z}^k)\) and \(\delta (\mathbf {z}^k)\), it suffices to require

$$\begin{aligned} \left( \frac{\beta +\delta }{1 - \beta - \delta }\right) ^2 + \frac{\delta }{(1-\beta - \delta )^3} \le \beta . \end{aligned}$$

Using the identity \(\frac{\beta +\delta }{1-\beta -\delta } = \frac{\beta }{1-\beta } + \frac{\delta }{(1-\beta )(1-\beta -\delta )}\), we can write the last inequality as

$$\begin{aligned} \Big [\frac{2\beta }{(1-\beta )^2(1-\beta -\delta )} + \frac{\delta }{(1-\beta )^2(1-\beta -\delta )^2} + \frac{1}{(1-\beta - \delta )^3}\Big ]\delta \le \beta - \left( \frac{\beta }{1-\beta }\right) ^2. \end{aligned}$$
(75)

Clearly, the left-hand side of (75) is positive if \(0< \delta < 1-\beta \). Hence, we need to choose \(\beta \in (0, 0.5(3-\sqrt{5}))\) so that the right-hand side of (75) is also positive. Now, we choose \(\delta \ge 0\) such that \(\delta \le \beta (1-\beta ) < 1-\beta \). Then, (75) can be overestimated once more by

$$\begin{aligned} \Big (\frac{2\beta ^3 - 5\beta ^2 + 3\beta + 1}{(1-\beta )^4}\Big )\delta \le \beta (1 - 3\beta + \beta ^2), \end{aligned}$$

which implies

$$\begin{aligned} 0 \le \delta \le \frac{\beta (1 - 3\beta + \beta ^2)(1-\beta )^4}{2\beta ^3 - 5\beta ^2 + 3\beta + 1} < \beta (1-\beta ),~~\forall \beta \in \left( 0, 0.5(3-\sqrt{5})\right) . \end{aligned}$$

This inequality suggests that we can choose \(\delta := \frac{\beta (1 - 3\beta + \beta ^2)(1-\beta )^4}{2\beta ^3 - 5\beta ^2 + 3\beta + 1} > 0\). In this case, we also have \(\delta (\mathbf {z}) + \lambda _t(\mathbf {z}) \le \delta + \beta < 1\), which guarantees the condition of Theorem 1. Hence, we can conclude that \(\lambda _t(\mathbf {z}^k) \le \beta \) implies \(\lambda _t(\mathbf {z}^{k+1}) \le \beta \). In other words, \(\left\{ \mathbf {z}^k\right\} \) belongs to \(\mathcal {Q}_{t}(\beta )\).
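
The admissibility of this choice of \(\delta \) is easy to confirm numerically; the sketch below (an independent check, not part of the proof) verifies \(0< \delta (\beta ) < \beta (1-\beta )\) on a grid over \((0, 0.5(3-\sqrt{5}))\).

```python
import math

# Check that delta(beta) = beta*(1 - 3*beta + beta^2)*(1 - beta)^4
#                          / (2*beta^3 - 5*beta^2 + 3*beta + 1)
# satisfies 0 < delta < beta*(1 - beta) on (0, 0.5*(3 - sqrt(5))).
beta_max = 0.5 * (3.0 - math.sqrt(5.0))        # ~ 0.381966
for i in range(1, 1000):
    beta = beta_max * i / 1000.0
    delta = beta * (1 - 3*beta + beta**2) * (1 - beta)**4 \
            / (2*beta**3 - 5*beta**2 + 3*beta + 1)
    assert 0.0 < delta < beta * (1.0 - beta)
print("delta(beta) is admissible on the whole interval")
```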

(b) Next, to guarantee a quadratic convergence, we can choose \(\delta _k\) such that \(\delta (\mathbf {z}^k) \le \delta _k \le \bar{\delta }_k := \frac{\lambda _{t}(\mathbf {z}^k)^2}{1-\lambda _{t}(\mathbf {z}^k)}\). Substituting the upper bound \(\bar{\delta }_k\) of \(\delta (\mathbf {z}^k)\) into (27) we obtain

$$\begin{aligned} \lambda _{t}(\mathbf {z}^{k+1}) \le \left( \frac{2-4\lambda _t(\mathbf {z}^k) + \lambda _t(\mathbf {z}^k)^2}{(1-2\lambda _t(\mathbf {z}^k))^3}\right) \lambda _t(\mathbf {z}^k)^2. \end{aligned}$$

Let us consider the function \(s(r) := \frac{(2 - 4r + r^2)r^2}{(1 - 2r)^3}\) on \([0, 1/2)\). We can check that \(s(r) < r < 1\) for all \(r \in (0, 0.18858]\). Hence, \(\lambda _{t}(\mathbf {z}^{k+1}) < \lambda _{t}(\mathbf {z}^k) < 1\) as long as \(0 < \lambda _{t}(\mathbf {z}^k) \le 0.18858\). This proves the estimate (30).

Now, let us choose some \(\beta \in (0, 1)\) such that \(\lambda _t(\mathbf {z}^k) \le \beta \). Then (30) leads to

$$\begin{aligned} \lambda _t(\mathbf {z}^{k+1}) \le \left( \frac{2-4\beta +\beta ^2}{(1-2\beta )^3}\right) \lambda _t(\mathbf {z}^k)^2 = c\lambda _t(\mathbf {z}^k)^2, \end{aligned}$$

where \(c := \frac{2-4\beta +\beta ^2}{(1-2\beta )^3} > 0\). We need to choose \(\beta \in (0, 1)\) such that \(c\lambda _t(\mathbf {z}^k) < 1\). Since \(\lambda _t(\mathbf {z}^k) \le \beta \), we choose \(\beta \) such that \(c\beta < 1\), which is equivalent to \(9\beta ^3 - 16\beta ^2 + 8\beta - 1 < 0\). If \(\beta \in (0, 0.18858]\), then \(9\beta ^3 - 16\beta ^2 + 8\beta - 1 < 0\). Therefore, the radius of the quadratic convergence region of \(\left\{ \lambda _t(\mathbf {z}^k)\right\} \) is \(r := 0.18858\).

(c) Finally, for any \(\beta \in (0, 0.18858]\), we can write \(c\lambda _t(\mathbf {z}^{k+1}) \le (c\lambda _t(\mathbf {z}^k))^2\). By induction, \(c\lambda _t(\mathbf {z}^k) \le (c\lambda _t(\mathbf {z}^0))^{2^k} \le c^{2^k}\beta ^{2^k} < 1\). We obtain \(\lambda _t(\mathbf {z}^k) \le c^{2^k-1}\beta ^{2^k}\). Let us choose \(\delta _k := \frac{\lambda _t(\mathbf {z}^k)^2}{1-\lambda _t(\mathbf {z}^k)}\). For \(\epsilon \in (0, \beta )\), assume that \(c^{2^k-1}\beta ^{2^k}\le \epsilon \). From Lemma 3, we can choose \(t := (1-\epsilon )(\sqrt{\nu } + \epsilon + 2\epsilon ^2/(1-\epsilon ))^{-1}\varepsilon \). Then, \(\mathbf {z}^k\) is an \(\varepsilon \)-solution of (1). It remains to use the fact that \(c^{2^k-1}\beta ^{2^k}\le \epsilon \) to bound the number of iterations by \(k = {\mathcal {O}}\left( \ln \left( \ln (1/\epsilon )\right) \right) \). \(\square \)
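
As an illustration of the double-logarithmic iteration bound just derived, the sketch below computes the smallest k with \(c^{2^k-1}\beta ^{2^k} \le \epsilon \); the values of \(\beta \) and \(\epsilon \) are ours, for illustration only.

```python
import math

# Smallest k with c^(2^k - 1) * beta^(2^k) <= eps, evaluated in
# log-space for numerical stability (sample values of ours).
beta, eps = 0.15, 1e-12
c = (2 - 4*beta + beta**2) / (1 - 2*beta)**3
assert c * beta < 1          # quadratic convergence regime
k = 0
while (2**k - 1) * math.log(c) + 2**k * math.log(beta) > math.log(eps):
    k += 1
print(f"c*beta = {c*beta:.4f}, k = {k}")  # k grows like log2(log(1/eps))
```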

1.5 The proof of Theorem 3: local quadratic convergence of DGN

(a) Given a fixed parameter \(t > 0\) sufficiently small, it follows from DGN and (70) that

$$\begin{aligned} \begin{array}{ll} \bar{\mathbf {z}}^{k+2} &{}= \mathcal {P}_{\mathbf {z}^{k+1}}\left( \mathbf {z}^{k+1} - \nabla ^2{F}(\mathbf {z}^{k+1})^{-1}\nabla {F}(\mathbf {z}^{k+1}); t\right) , \\ \mathbf {z}^{k+1} &{}= \mathcal {P}_{\mathbf {z}^{k+1}}\left( \bar{\mathbf {z}}^{k+1} - \nabla ^2{F}(\mathbf {z}^{k+1})^{-1}r_{\mathbf {z}^k}(\bar{\mathbf {z}}^{k+1}); t\right) + (\mathbf {z}^{k+1} - \bar{\mathbf {z}}^{k+1}). \end{array} \end{aligned}$$

Hence, using these expressions, the same derivation as for (74) with t instead of \(t_{+}\), and assuming \(\Vert \mathbf {z}^{k+1} - \mathbf {z}^k\Vert _{\mathbf {z}^k}<1\), we can derive

$$\begin{aligned} \Vert \bar{\mathbf {z}}^{k+2} - \mathbf {z}^{k+1}\Vert _{\mathbf {z}^{k+1}} \le \left( \frac{\Vert \mathbf {z}^{k+1} - \mathbf {z}^k\Vert _{\mathbf {z}^k}}{1-\Vert \mathbf {z}^{k+1} - \mathbf {z}^k\Vert _{\mathbf {z}^k}}\right) ^2 + \frac{\Vert \mathbf {z}^{k+1} - \bar{\mathbf {z}}^{k+1}\Vert _{\mathbf {z}^k}}{(1-\Vert \mathbf {z}^{k+1} - \mathbf {z}^k\Vert _{\mathbf {z}^k})^3}. \end{aligned}$$
(76)

Now, let us define \(\tilde{\lambda }_t(\mathbf {z}^k) := \Vert \tilde{\mathbf {z}}^{k+1} - \mathbf {z}^k\Vert _{\mathbf {z}^k}\) and \(\alpha _k := (1 + \tilde{\lambda }_t(\mathbf {z}^k))^{-1}\) as in DGN. From the update \(\mathbf {z}^{k+1} := (1-\alpha _k)\mathbf {z}^k + \alpha _k\tilde{\mathbf {z}}^{k+1}\) of DGN, we have

$$\begin{aligned} \Vert \mathbf {z}^{k+1} - \mathbf {z}^k\Vert _{\mathbf {z}^k}= & {} \alpha _k\Vert \tilde{\mathbf {z}}^{k+1} - \mathbf {z}^k\Vert _{\mathbf {z}^k} = \alpha _k\tilde{\lambda }_t(\mathbf {z}^k),~~~\text {and}\\ \Vert \mathbf {z}^{k+1} - \bar{\mathbf {z}}^{k+1}\Vert _{\mathbf {z}^k}\le & {} \Vert \mathbf {z}^{k+1} - \tilde{\mathbf {z}}^{k+1}\Vert _{\mathbf {z}^k} + \Vert \tilde{\mathbf {z}}^{k+1} - \bar{\mathbf {z}}^{k+1}\Vert _{\mathbf {z}^k} \\= & {} (1-\alpha _k)\Vert \tilde{\mathbf {z}}^{k+1} - \mathbf {z}^k\Vert _{\mathbf {z}^k} + \delta (\mathbf {z}^k) \\= & {} (1-\alpha _k)\tilde{\lambda }_t(\mathbf {z}^k) + \delta (\mathbf {z}^k). \end{aligned}$$

Substituting these expressions into (76) we get

$$\begin{aligned} \Vert \bar{\mathbf {z}}^{k+2} - \mathbf {z}^{k+1}\Vert _{\mathbf {z}^{k+1}}&\le \left( \frac{\alpha _k\tilde{\lambda }_{t}(\mathbf {z}^k)}{1 - \alpha _k\tilde{\lambda }_{t}(\mathbf {z}^k)}\right) ^2 + \frac{\delta (\mathbf {z}^k) + (1-\alpha _k)\tilde{\lambda }_t(\mathbf {z}^k)}{\left( 1 - \alpha _k\tilde{\lambda }_{t}(\mathbf {z}^k)\right) ^3}. \end{aligned}$$

Substituting \(\alpha _k := (1 + \tilde{\lambda }_{t}(\mathbf {z}^k))^{-1}\) into the last inequality and simplifying the result, we get

$$\begin{aligned} \Vert \bar{\mathbf {z}}^{k+2} - \mathbf {z}^{k+1}\Vert _{\mathbf {z}^{k+1}} \le \left( 2 + 2\tilde{\lambda }_{t}(\mathbf {z}^k) + \tilde{\lambda }_{t}(\mathbf {z}^k)^2\right) \tilde{\lambda }_{t}(\mathbf {z}^k)^2 + \left( 1 + \tilde{\lambda }_{t}(\mathbf {z}^k)\right) ^3\delta (\mathbf {z}^k). \end{aligned}$$
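
This simplification is mechanical but easy to get wrong; the following sketch (an independent symbolic check using sympy, not part of the proof) confirms it.

```python
import sympy as sp

# Verify the simplification obtained by substituting alpha = 1/(1 + lam)
# into the damped-step estimate above.
lam, delta = sp.symbols('lam delta', positive=True)
alpha = 1 / (1 + lam)
lhs = (alpha * lam / (1 - alpha * lam))**2 \
      + (delta + (1 - alpha) * lam) / (1 - alpha * lam)**3
rhs = (2 + 2*lam + lam**2) * lam**2 + (1 + lam)**3 * delta
assert sp.simplify(lhs - rhs) == 0
print("substitution verified")
```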

Next, by the triangle inequality, it follows from (68) and the definition of \(\lambda _t(\mathbf {z})\) and \(\tilde{\lambda }_t(\mathbf {z})\) that \(\tilde{\lambda }_t(\mathbf {z}^{k+1}) = \Vert \tilde{\mathbf {z}}^{k+2} - \mathbf {z}^{k+1}\Vert _{\mathbf {z}^{k+1}} \le \Vert \bar{\mathbf {z}}^{k+2} - \mathbf {z}^{k+1}\Vert _{\mathbf {z}^{k+1}} + \Vert \tilde{\mathbf {z}}^{k+2} - \bar{\mathbf {z}}^{k+2}\Vert _{\mathbf {z}^{k+1}} = \Vert \bar{\mathbf {z}}^{k+2} - \mathbf {z}^{k+1}\Vert _{\mathbf {z}^{k+1}} + \delta (\mathbf {z}^{k+1})\). Combining this estimate and the above inequality we get

$$\begin{aligned} \tilde{\lambda }_{t}(\mathbf {z}^{k+1}) \le \left( 2 + 2\tilde{\lambda }_{t}(\mathbf {z}^k) + \tilde{\lambda }_{t}(\mathbf {z}^k)^2\right) \tilde{\lambda }_{t}(\mathbf {z}^k)^2 + \left( 1 + \tilde{\lambda }_{t}(\mathbf {z}^k)\right) ^3\delta (\mathbf {z}^k) + \delta (\mathbf {z}^{k+1}). \end{aligned}$$

If we choose \(\delta (\mathbf {z}^k) \le \delta _k \le \frac{\tilde{\lambda }_{t}(\mathbf {z}^k)^2}{1+ \tilde{\lambda }_{t}(\mathbf {z}^k)}\), then, by induction, \(\delta (\mathbf {z}^{k+1}) \le \delta _{k+1} \le \frac{\tilde{\lambda }_{t}(\mathbf {z}^{k+1})^2}{1+\tilde{\lambda }_{t}(\mathbf {z}^{k+1})}\). Substituting these bounds into the last inequality and simplifying the result, we obtain

$$\begin{aligned} \tilde{\lambda }_{t}(\mathbf {z}^{k+1}) \le \left( \frac{2\tilde{\lambda }_t(\mathbf {z}^k)^2 + 4\tilde{\lambda }_t(\mathbf {z}^k) + 3}{1 - \tilde{\lambda }_t(\mathbf {z}^k)^2\left( 2\tilde{\lambda }_t(\mathbf {z}^k)^2 + 4\tilde{\lambda }_t(\mathbf {z}^k) + 3\right) }\right) \tilde{\lambda }_{t}(\mathbf {z}^k)^2, \end{aligned}$$

which is indeed (31).

From (31), after a few elementary calculations, we can see that \(\tilde{\lambda }_{t}(\mathbf {z}^{k+1}) \le \tilde{\lambda }_t(\mathbf {z}^k)\) if \(\tilde{\lambda }_t(\mathbf {z}^k)(1 + \tilde{\lambda }_t(\mathbf {z}^k))( 2\tilde{\lambda }_t(\mathbf {z}^k)^2 + 4\tilde{\lambda }_t(\mathbf {z}^k) + 3) \le 1\). Note that the function \(s(\tau ) := \tau (1+\tau )(2\tau ^2 + 4\tau +3)\) is increasing on \([0, 0.5(3-\sqrt{5}))\). Solving \(s(\tau ) \le 1\) numerically, we can check that if \(\tilde{\lambda }_t(\mathbf {z}^k) \in [0, 0.21027]\), then \(\tilde{\lambda }_{t}(\mathbf {z}^{k+1}) \le \tilde{\lambda }_t(\mathbf {z}^k)\). Hence, for any \(\beta \in (0, 0.21027]\), if \(\tilde{\lambda }_t(\mathbf {z}^k) \le \beta \) then \(\tilde{\lambda }_t(\mathbf {z}^{k+1}) \le \beta \). In other words, \(\left\{ \mathbf {z}^k\right\} \subset {\varOmega }_t(\beta )\).

We now prove (b). Indeed, if we take any \(\beta \in (0, 0.21027]\), we can show from (31) that

$$\begin{aligned} \tilde{\lambda }_{t}(\mathbf {z}^{k+1}) \le \left( \frac{2\beta ^2 + 4\beta + 3}{1 - \beta ^2\left( 2\beta ^2 + 4\beta + 3\right) }\right) \tilde{\lambda }_{t}(\mathbf {z}^k)^2, \end{aligned}$$

Let \(\bar{c} := \frac{2\beta ^2 + 4\beta + 3}{1 - \beta ^2\left( 2\beta ^2 + 4\beta + 3\right) } > 0\) denote the factor on the right-hand side. To guarantee \(\bar{c}\beta < 1\), we need to choose \(\beta > 0\) such that \(2\beta ^4 + 6\beta ^3 + 7\beta ^2 + 3\beta - 1 < 0\). This condition leads to \(\beta \in (0, 0.21027]\). Hence, for any \(0 < \beta \le 0.21027\), if \(\mathbf {z}^0\in \mathcal {Q}_{t}(\beta )\), then \(\tilde{\lambda }_{t}(\mathbf {z}^{k+1}) \le \bar{c}\tilde{\lambda }_{t}(\mathbf {z}^k)^2 < 1\) and, therefore, \(\big \{\tilde{\lambda }_{t}(\mathbf {z}^k)\big \}\) quadratically converges to zero.
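
The threshold 0.21027 is (approximately) the unique positive root of \(2\beta ^4 + 6\beta ^3 + 7\beta ^2 + 3\beta - 1\); this is also the polynomial behind the condition \(s(\tau ) \le 1\) in part (a). A short bisection (illustration only) recovers it.

```python
# Bisection for the unique positive root of q(beta) = 2*beta^4
# + 6*beta^3 + 7*beta^2 + 3*beta - 1 (q is increasing for beta >= 0).
def q(b):
    return 2*b**4 + 6*b**3 + 7*b**2 + 3*b - 1

lo, hi = 0.0, 0.5            # q(0) = -1 < 0 < q(0.5)
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if q(mid) < 0.0 else (lo, mid)
print(f"root ~ {lo:.5f}")    # ~ 0.21027
```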

(c) Finally, to prove the last conclusion, from (66) we can show that

$$\begin{aligned} \mathrm {dist}_{\mathbf {z}^k}(\varvec{0}, \mathcal {A}_{\mathcal {Z}}(\mathbf {z}^{k+1}))&\le t\delta _k + t\left\| \nabla {F}(\mathbf {z}^k)\right\| _{\mathbf {z}^k}^{*} + t\left\| \mathbf {z}^{k+1} - \mathbf {z}^k\right\| _{\mathbf {z}^k} \\&\le t(\delta _k + \sqrt{\nu } + \alpha _k\tilde{\lambda }_t(\mathbf {z}^k)). \end{aligned}$$

Since \(\tilde{\lambda }_t(\mathbf {z}^k) \le \bar{c}^{2^k-1}\tilde{\lambda }_t(\mathbf {z}^0)^{2^k} \le \bar{c}^{2^k-1}\beta ^{2^k}\), \(\delta _k \le \frac{\tilde{\lambda }_t(\mathbf {z}^k)^2}{1 + \tilde{\lambda }_t(\mathbf {z}^k)}\), and \(\alpha _k\tilde{\lambda }_t(\mathbf {z}^k) = \frac{\tilde{\lambda }_t(\mathbf {z}^k)}{1 + \tilde{\lambda }_t(\mathbf {z}^k)}\), we obtain the last conclusion as a consequence of Lemma 3 with the same proof as in Theorem 2. \(\square \)

1.6 The proof of Lemma 4: the update rule for the penalty parameter

Let us define \(\bar{\mathbf {u}}^k := \mathcal {P}_{\mathbf {z}^k}\left( \mathbf {z}^k - \nabla ^2{F}(\mathbf {z}^k)^{-1}\nabla {F}(\mathbf {z}^k); t_k\right) \). Then, \(\lambda _{t_k}(\mathbf {z}^k)\) defined by (21) becomes \(\lambda _{t_k}(\mathbf {z}^k) := \Vert G_{\mathbf {z}^k}(\mathbf {z}^k; t_k)\Vert _{\mathbf {z}^k}^{*} = \Vert \mathbf {z}^k - \mathcal {P}_{\mathbf {z}^k}\left( \mathbf {z}^k - \nabla ^2{F}(\mathbf {z}^k)^{-1}\nabla {F}(\mathbf {z}^k); t_k\right) \Vert _{\mathbf {z}^k} = \Vert \mathbf {z}^k - \bar{\mathbf {u}}^k \Vert _{\mathbf {z}^k}\). Note that \(\bar{\mathbf {u}}^k = \mathcal {P}_{\mathbf {z}^k}\left( \mathbf {z}^k - \nabla ^2{F}(\mathbf {z}^k)^{-1}\nabla {F}(\mathbf {z}^k); t_k\right) \) leads to

$$\begin{aligned} -\,t_k \left( \nabla {F}(\mathbf {z}^k) + \nabla ^2{F}(\mathbf {z}^k)(\bar{\mathbf {u}}^k - \mathbf {z}^k)\right) \in \mathcal {A}(\bar{\mathbf {u}}^k). \end{aligned}$$

Combining this inclusion and (69) and using the monotonicity of \(\mathcal {A}\), we can derive

$$\begin{aligned} \langle t_{k+1}\left[ \nabla {F}(\mathbf {z}^k) + \nabla ^2{F}(\mathbf {z}^k)(\bar{\mathbf {z}}^{k+1} - \mathbf {z}^k)\right] - t_k\left[ \nabla {F}(\mathbf {z}^k) + \nabla ^2{F}(\mathbf {z}^k)(\bar{\mathbf {u}}^k - \mathbf {z}^k)\right] , \bar{\mathbf {z}}^{k+1} - \bar{\mathbf {u}}^k\rangle \le 0. \end{aligned}$$

By rearranging this expression using \(t_{k+1} := (1-\sigma _{\beta })t_k\) from PFGN, we finally obtain

$$\begin{aligned} \Vert \bar{\mathbf {z}}^{k+1} - \bar{\mathbf {u}}^k\Vert _{\mathbf {z}^k}^2&\le \frac{\sigma _{\beta }}{1 - \sigma _{\beta }}\langle \nabla {F}(\mathbf {z}^k) + \nabla ^2{F}(\mathbf {z}^k)(\bar{\mathbf {u}}^k - \mathbf {z}^k), \bar{\mathbf {z}}^{k+1} - \bar{\mathbf {u}}^k\rangle \nonumber \\&\le \frac{\sigma _{\beta }}{1 - \sigma _{\beta }}\Vert \nabla {F}(\mathbf {z}^k) + \nabla ^2{F}(\mathbf {z}^k)(\bar{\mathbf {u}}^k - \mathbf {z}^k)\Vert _{\mathbf {z}^k}^{*}\Vert \bar{\mathbf {z}}^{k+1} - \bar{\mathbf {u}}^k\Vert _{\mathbf {z}^k}, \end{aligned}$$

where the last inequality follows from the Cauchy–Schwarz inequality. Dividing both sides by \(\Vert \bar{\mathbf {z}}^{k+1} - \bar{\mathbf {u}}^k\Vert _{\mathbf {z}^k}\), this inequality leads to

$$\begin{aligned} \begin{array}{ll} \Vert \bar{\mathbf {z}}^{k+1} - \bar{\mathbf {u}}^k\Vert _{\mathbf {z}^k} &{}\le \frac{\sigma _{\beta }}{1 - \sigma _{\beta }}\Vert \nabla {F}(\mathbf {z}^k) + \nabla ^2{F}(\mathbf {z}^k)(\bar{\mathbf {u}}^k - \mathbf {z}^k)\Vert _{\mathbf {z}^k}^{*}\\ &{}\le \frac{\sigma _{\beta }}{1 - \sigma _{\beta }}\left[ \Vert \nabla {F}(\mathbf {z}^k)\Vert _{\mathbf {z}^k}^{*} + \Vert \nabla ^2{F}(\mathbf {z}^k)(\bar{\mathbf {u}}^k - \mathbf {z}^k)\Vert _{\mathbf {z}^k}^{*} \right] \\ &{}\le \frac{\sigma _{\beta }}{1 - \sigma _{\beta }}\left[ \Vert \nabla {F}(\mathbf {z}^k)\Vert _{\mathbf {z}^k}^{*} + \Vert \bar{\mathbf {u}}^k - \mathbf {z}^k \Vert _{\mathbf {z}^k} \right] . \end{array} \end{aligned}$$

Now, by the triangle inequality, we have \(\Vert \bar{\mathbf {z}}^{k+1} - \mathbf {z}^k\Vert _{\mathbf {z}^k} \le \Vert \bar{\mathbf {z}}^{k+1} - \bar{\mathbf {u}}^k\Vert _{\mathbf {z}^k} + \Vert \bar{\mathbf {u}}^k - \mathbf {z}^k\Vert _{\mathbf {z}^k}\). This inequality is equivalent to \(\lambda _{t_{k+1}}(\mathbf {z}^k) \le \Vert \bar{\mathbf {z}}^{k+1} - \bar{\mathbf {u}}^k\Vert _{\mathbf {z}^k} + \lambda _{t_k}(\mathbf {z}^k)\) due to the definitions \(\lambda _{t_{k+1}}(\mathbf {z}^k) = \Vert \bar{\mathbf {z}}^{k+1} - \mathbf {z}^k\Vert _{\mathbf {z}^k}\) and \(\lambda _{t_k}(\mathbf {z}^k) = \Vert \bar{\mathbf {u}}^k - \mathbf {z}^k\Vert _{\mathbf {z}^k}\). Using the last estimate in the above inequality we get

$$\begin{aligned} \lambda _{t_{k+1}}(\mathbf {z}^k) \le \lambda _{t_k}(\mathbf {z}^k) + \frac{\sigma _{\beta }}{1 - \sigma _{\beta }}\left[ \Vert \nabla {F}(\mathbf {z}^k)\Vert _{\mathbf {z}^k}^{*} + \lambda _{t_k}(\mathbf {z}^k) \right] , \end{aligned}$$

which is (32). The second inequality of (32) follows from the fact that \(\Vert \nabla {F}(\mathbf {z}^k)\Vert _{\mathbf {z}^k}^{*}\le \sqrt{\nu }\).

Let us denote \(\gamma _k := \left( \frac{\sigma _{\beta }}{1-\sigma _{\beta }}\right) \left( \sqrt{\nu } + \lambda _{t_k}(\mathbf {z}^k)\right) \). For a given \(\beta \in (0, 1)\), we now assume that \(\lambda _{t_k}(\mathbf {z}^k) \le \beta \). Then, by using (32) in (27), and the monotonic increase of its right-hand side with respect to \(\lambda _{t_{k+1}}(\mathbf {z}^k)\), we can derive

$$\begin{aligned} \lambda _{t_{k+1}}(\mathbf {z}^{k+1})&\le \left( \frac{\lambda _{t_k}(\mathbf {z}^k) + \vert \gamma _k\vert + \delta _k}{1 - \lambda _{t_k}(\mathbf {z}^k) - \vert \gamma _k\vert - \delta _k}\right) ^2 + \frac{\delta _k}{\left( 1 - \lambda _{t_k}(\mathbf {z}^k) - \vert \gamma _k\vert - \delta _k\right) ^3} \nonumber \\&\le \left( \frac{\beta + \vert \gamma _k\vert + \delta _k}{1 - \beta - \vert \gamma _k\vert - \delta _k}\right) ^2 + \frac{\delta _k}{(1 - \beta - \vert \gamma _k\vert - \delta _k)^3}, \end{aligned}$$

as long as \(\beta + \vert \gamma _k\vert + \delta _k <1\). Let us denote \(\theta _k := \beta + \vert \gamma _k\vert \). By using the identity \(\frac{\beta + \vert \gamma _k\vert + \delta _k}{1 - \beta - \vert \gamma _k\vert -\delta _k} = \frac{\beta + \vert \gamma _k\vert }{1 - \beta - \vert \gamma _k\vert } + \frac{\delta _k}{(1-\theta _k)(1-\theta _k-\delta _k)}\), we can rewrite the last inequality as

$$\begin{aligned} \lambda _{t_{k+1}}(\mathbf {z}^{k+1})&\le \left( \frac{\theta _k}{1 - \theta _k }\right) ^2 \\&\quad + \left[ \frac{2\theta _k }{(1-\theta _k)^2(1-\theta _k -\delta _k)} + \frac{\delta _k}{(1-\theta _k)^2(1-\theta _k -\delta _k)^2} + \frac{1}{(1-\theta _k -\delta _k)^3}\right] \delta _k. \end{aligned}$$

If we choose \(\delta _k\) such that \(0\le \delta _k \le \theta _k(1-\theta _k)<1-\theta _k\), then the above inequality implies

$$\begin{aligned} \lambda _{t_{k+1}}(\mathbf {z}^{k+1})&\le \left( \frac{\theta _k}{1 - \theta _k }\right) ^2 \nonumber \\&\quad + \left[ \frac{2\theta _k(1-\theta _k)^2 + \theta _k(1 - \theta _k) + 1}{(1-\theta _k)^6}\right] \delta _k := \left( \frac{\theta _k}{1 - \theta _k }\right) ^2 + M_k\delta _k. \end{aligned}$$
(77)

Take any \(c\in (0, 1)\), e.g., \(c := 0.95\), and choose \(\delta _k\) such that \(0 \le \delta _k \le \frac{(1-c^2)}{c^2M_k}\left( \frac{\theta _k}{1-\theta _k}\right) ^2\). Hence, in order to guarantee \(\lambda _{t_{k+1}}(\mathbf {z}^{k+1}) \le \beta \), by using (77), we can impose the condition \(\left( \frac{\theta _k}{1 - \theta _k }\right) ^2 + M_k\delta _k \le \frac{1}{c^2}\left( \frac{\theta _k}{1-\theta _k}\right) ^2 \le \beta \), which is equivalent to \(\frac{\theta _k}{1 - \theta _k} \le c\sqrt{\beta }\). This condition leads to \(\theta _k \le \frac{c\sqrt{\beta }}{1+c\sqrt{\beta }}\), and since \(\theta _k = \beta + \vert \gamma _k\vert \), it holds whenever \(\vert \gamma _k\vert \le \frac{c\sqrt{\beta }}{1+c\sqrt{\beta }} - \beta \). Since \(\vert \gamma _k\vert > 0\), we need to choose \(\beta \) such that \(0< \beta < 0.5(1 + 2c^2 - \sqrt{1 +4c^2})\), which makes this upper bound positive.

Next, by the choice of \(\delta _k\), we require \(0 \le \delta _k \le \min \left\{ \frac{(1-c^2)}{c^2M_k}\left( \frac{\theta _k}{1-\theta _k}\right) ^2, \theta _k(1-\theta _k)\right\} \). Using the fact that \(M_k = \frac{2\theta _k(1-\theta _k)^2 + \theta _k(1-\theta _k) + 1}{(1-\theta _k)^6}\) from (77) and \(0 \le \theta _k \le \frac{c\sqrt{\beta }}{1+c\sqrt{\beta }}\), we can show that the condition on \(\delta _k\) holds if we choose

$$\begin{aligned} \delta _k \le \bar{\delta } := \frac{(1-c^2)\beta }{(1+c\sqrt{\beta })^3\left[ 3c\sqrt{\beta } + c^2\beta + (1+c\sqrt{\beta })^3\right] }. \end{aligned}$$

On the other hand, we have \(\vert \gamma _k\vert = \left| \left( \frac{\sigma _{\beta }}{1-\sigma _{\beta }}\right) \left( \sqrt{\nu } + \lambda _{t_k}(\mathbf {z}^k)\right) \right| \le \left( \frac{\sigma _{\beta }}{1-\sigma _{\beta }}\right) \left( \sqrt{\nu } + \beta \right) \). In order to guarantee that \(\vert \gamma _k\vert \le \frac{c\sqrt{\beta }}{1+c\sqrt{\beta }} - \beta \), we use the above estimate to impose a condition \(\left( \frac{\sigma _{\beta }}{1-\sigma _{\beta }}\right) \le \frac{1}{\sqrt{\nu } + \beta }\left( \frac{c\sqrt{\beta }}{1 + c\sqrt{\beta }} - \beta \right) \), which leads to

$$\begin{aligned} \sigma _{\beta } \le \bar{\sigma }_{\beta } := \frac{c\sqrt{\beta } - \beta (1 + c\sqrt{\beta })}{(1+c\sqrt{\beta })\sqrt{\nu } + c\sqrt{\beta }}. \end{aligned}$$

This estimate is exactly the right-hand side of (33). Finally, using (32) and the definition of \(\gamma _k\), we can easily show that \(\lambda _{t_{k+1}}(\mathbf {z}^k) \le \lambda _{t_k}(\mathbf {z}^k) + \left| \gamma _k\right| \le \beta + \left| \gamma _k\right| \equiv \theta _k \le \frac{c\sqrt{\beta }}{1+c\sqrt{\beta }}\).

\(\square \)
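
To give a feel for the magnitudes in (33), the sketch below evaluates \(\bar{\sigma }_{\beta }\) and \(\bar{\delta }\) for sample values; \(c\), \(\beta \), and \(\nu \) are ours, for illustration only.

```python
import math

# Decrease factor sigma_bar from (33) and inexactness level delta_bar,
# for sample values c = 0.95, beta = 0.05, nu = 100 (ours, not from
# the paper).
c, beta, nu = 0.95, 0.05, 100.0
s = c * math.sqrt(beta)                       # shorthand for c*sqrt(beta)
sigma_bar = (s - beta * (1 + s)) / ((1 + s) * math.sqrt(nu) + s)
delta_bar = (1 - c**2) * beta / ((1 + s)**3 * (3*s + c**2*beta + (1 + s)**3))
print(f"sigma_bar = {sigma_bar:.5f}")         # O(1/sqrt(nu)) per iteration
print(f"delta_bar = {delta_bar:.6f}")
```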

1.7 The proof of Theorem 4: the worst-case iteration-complexity of PFGN

By Lemma 3 and \(\lambda _{t_{k+1}}(\mathbf {z}^k) \le \frac{c\sqrt{\beta }}{1 + c\sqrt{\beta }}\), we can see that \(\mathbf {z}^k\) is an \(\varepsilon \)-solution of (1) if \(t_k := M_0^{-1}\varepsilon \), where \(M_0 := \left( 1 - \frac{c\sqrt{\beta }}{1 + c\sqrt{\beta }}\right) ^{-1}\left( \sqrt{\nu } + \frac{c\sqrt{\beta }}{1 + c\sqrt{\beta }} + 2\bar{\delta }_t(\beta )\right) = {\mathcal {O}}(\sqrt{\nu })\).

On the other hand, by induction, it follows from the update rule \(t_{k+1} = (1-\sigma _{\beta })t_k\) of PFGN that \(t_k = (1-\sigma _{\beta })^kt_0\). Hence, \(\mathbf {z}^k\) is an \(\varepsilon \)-solution of (1) if we have \(t_k = (1-\sigma _{\beta })^k t_0 \le \frac{\varepsilon }{M_0}\). This condition leads to \(k\ln (1-\sigma _{\beta }) \le \ln \left( \frac{\varepsilon }{M_0t_0}\right) \), i.e., \(k \ge \frac{\ln (M_0t_0/\varepsilon )}{-\ln (1-\sigma _{\beta })}\). Using the elementary inequality \(\ln (1-\sigma _{\beta }) \le -\,\sigma _{\beta }\), we see that it suffices to take

$$\begin{aligned} k \ge \frac{1}{\bar{\sigma }_{\beta }}\ln \left( \frac{M_0 t_0}{\varepsilon }\right) = \frac{\left( (1+c\sqrt{\beta })\sqrt{\nu } + c\sqrt{\beta }\right) }{c\sqrt{\beta } - \beta (1 + c\sqrt{\beta })}\ln \left( \frac{M_0t_0}{\varepsilon }\right) . \end{aligned}$$

Consequently, the worst-case iteration-complexity of PFGN is \({\mathcal {O}}\left( \sqrt{\nu }\ln \left( \frac{\sqrt{\nu } t_0}{\varepsilon }\right) \right) \).

\(\square \)
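
For illustration, the sketch below turns this bound into a concrete iteration count; the values of \(c\), \(\beta \), \(\nu \), \(t_0\), and \(\varepsilon \) are ours, not from the paper.

```python
import math

# Iteration count k >= (1/sigma_bar) * ln(M0 * t0 / eps) of Theorem 4
# for sample values (ours, not from the paper).
c, beta, nu, t0, eps = 0.95, 0.05, 100.0, 1.0, 1e-8
s = c * math.sqrt(beta)
sigma_bar = (s - beta * (1 + s)) / ((1 + s) * math.sqrt(nu) + s)
delta_bar = (1 - c**2) * beta / ((1 + s)**3 * (3*s + c**2*beta + (1 + s)**3))
r = s / (1 + s)                                     # bound on lambda_{t_{k+1}}(z^k)
M0 = (math.sqrt(nu) + r + 2 * delta_bar) / (1 - r)  # M0 = O(sqrt(nu))
k = math.ceil(math.log(M0 * t0 / eps) / sigma_bar)
print(f"sigma_bar = {sigma_bar:.5f}, M0 = {M0:.2f}, k = {k}")
```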

1.8 The proof of Theorem 5: finding an initial point for PFGN

From (35), if we define \(\nabla {\hat{F}}(\hat{\mathbf {z}}^j) := \nabla {F}(\hat{\mathbf {z}}^j) - t_0^{-1}\tau _{j+1}\hat{\zeta }^0\), then we still have \(\nabla ^2{\hat{F}}(\hat{\mathbf {z}}^j) = \nabla ^2{F}(\hat{\mathbf {z}}^j)\). Hence, the estimate (27) still holds for \(\hat{\lambda }_{\tau }(\hat{\mathbf {z}}^j)\).

Next, if we define \(\bar{\mathbf {v}}^j := \mathcal {P}_{\hat{\mathbf {z}}^j}\left( \hat{\mathbf {z}}^j - \nabla ^2{F}(\hat{\mathbf {z}}^j)^{-1}\left( \nabla {F}(\hat{\mathbf {z}}^j)- \tau _jt_0^{-1}\hat{\zeta }^0\right) ; t_0\right) \), then, by the definition of \(\mathcal {P}_{\hat{\mathbf {z}}^j}\), we have

$$\begin{aligned} -t_0\left[ \nabla ^2{F}(\hat{\mathbf {z}}^j)(\bar{\mathbf {v}}^j - \hat{\mathbf {z}}^j) + \nabla {F}(\hat{\mathbf {z}}^j) - \tau _jt_0^{-1}\hat{\zeta }^0\right] \in \mathcal {A}(\bar{\mathbf {v}}^j). \end{aligned}$$
(78)

Similarly, since \(\bar{\hat{\mathbf {z}}}^{j+1} := \mathcal {P}_{\hat{\mathbf {z}}^j}\left( \hat{\mathbf {z}}^j - \nabla ^2{F}(\hat{\mathbf {z}}^j)^{-1}\left( \nabla {F}(\hat{\mathbf {z}}^j)- \tau _{j+1}t_0^{-1}\hat{\zeta }^0\right) ; t_0\right) \), we have

$$\begin{aligned} -t_0\left[ \nabla ^2{F}(\hat{\mathbf {z}}^j)(\bar{\hat{\mathbf {z}}}^{j+1} - \hat{\mathbf {z}}^j) + \nabla {F}(\hat{\mathbf {z}}^j) - \tau _{j+1}t_0^{-1}\hat{\zeta }^0\right] \in \mathcal {A}(\bar{\hat{\mathbf {z}}}^{j+1}). \end{aligned}$$
(79)

Using (78), (79), and the monotonicity of \(\mathcal {A}\), we have

$$\begin{aligned} t_0\langle \nabla ^2{F}(\hat{\mathbf {z}}^j)(\bar{\hat{\mathbf {z}}}^{j+1} - \bar{\mathbf {v}}^j), \bar{\hat{\mathbf {z}}}^{j+1} - \bar{\mathbf {v}}^j\rangle \le (\tau _j - \tau _{j+1})\langle \hat{\zeta }^0, \bar{\mathbf {v}}^j - \bar{\hat{\mathbf {z}}}^{j+1}\rangle . \end{aligned}$$

Using \(\tau _{j+1} := \tau _j - {\varDelta }_j\), the identity \(\langle \nabla ^2{F}(\hat{\mathbf {z}}^j)\mathbf {u}, \mathbf {u}\rangle = \Vert \mathbf {u}\Vert _{\hat{\mathbf {z}}^j}^2\), and the generalized Cauchy–Schwarz inequality \(\langle \hat{\zeta }^0, \mathbf {u}\rangle \le \Vert \hat{\zeta }^0\Vert _{\hat{\mathbf {z}}^j}^{*}\Vert \mathbf {u}\Vert _{\hat{\mathbf {z}}^j}\), the last inequality leads to

$$\begin{aligned} t_0\left\| \bar{\hat{\mathbf {z}}}^{j+1} - \bar{\mathbf {v}}^j\right\| _{\hat{\mathbf {z}}^j} \le {\varDelta }_j\Vert \hat{\zeta }^0\Vert _{\hat{\mathbf {z}}^j}^{*}. \end{aligned}$$
(80)

Now, similar to the proof of Lemma 4, using (80), we can derive

$$\begin{aligned} \hat{\lambda }_{\tau _{j+1}}(\hat{\mathbf {z}}^{j}) \le \hat{\lambda }_{\tau _{j}}(\hat{\mathbf {z}}^j) + \frac{{\varDelta }_j}{t_0}\Vert \hat{\zeta }^0\Vert _{\hat{\mathbf {z}}^j}^{*}. \end{aligned}$$
(81)

By the same argument as the proof of (33), we can show that with \(\hat{\gamma }_j := \frac{{\varDelta }_j}{t_0}\Vert \hat{\zeta }^0\Vert _{\hat{\mathbf {z}}^j}^{*}\), we have \(\left| \hat{\gamma }_j\right| \le \frac{c\sqrt{\eta }}{1+c\sqrt{\eta }} - \eta \). This shows that \({\varDelta }_j \le \frac{t_0}{\Vert \hat{\zeta }^0\Vert _{\hat{\mathbf {z}}^j}^{*}}\left( \frac{c\sqrt{\eta }}{1+c\sqrt{\eta }} - \eta \right) \), which is the first estimate of (37). The second estimate of (37) can be derived as in Lemma 4 using \(\eta \) instead of \(\beta \).

We prove (38). From (21) and (36), using the triangle inequality, we can upper bound

$$\begin{aligned} \lambda _{t_0}(\mathbf {z}^0)&:= \big \Vert \mathbf {z}^0 - \mathcal {P}_{\mathbf {z}^0}\big (\mathbf {z}^0 - \nabla ^2F(\mathbf {z}^0)^{-1}\nabla {F}(\mathbf {z}^0); t_0\big ) \big \Vert _{\mathbf {z}^0}\\&\overset{\tiny {\mathbf {z}^0 := \hat{\mathbf {z}}^j}}{=} \big \Vert \hat{\mathbf {z}}^j - \mathcal {P}_{\hat{\mathbf {z}}^j}\big (\hat{\mathbf {z}}^j - \nabla ^2F(\hat{\mathbf {z}}^j)^{-1}\nabla {F}(\hat{\mathbf {z}}^j); t_0\big ) \big \Vert _{\hat{\mathbf {z}}^j}\\&\le \Big \Vert \hat{\mathbf {z}}^j - \mathcal {P}_{\hat{\mathbf {z}}^j}\big (\hat{\mathbf {z}}^j - \nabla ^2F(\hat{\mathbf {z}}^j)^{-1}\big (\nabla {F}(\hat{\mathbf {z}}^j) - \tau _jt_0^{-1}\hat{\zeta }^0\big ); t_0\big )\Big \Vert _{\hat{\mathbf {z}}^j}\\&\quad + \Big \Vert \mathcal {P}_{\hat{\mathbf {z}}^j}\big (\hat{\mathbf {z}}^j - \nabla ^2F(\hat{\mathbf {z}}^j)^{-1}\nabla {F}(\hat{\mathbf {z}}^j); t_0\big ) - \mathcal {P}_{\hat{\mathbf {z}}^j}\big (\hat{\mathbf {z}}^j - \nabla ^2F(\hat{\mathbf {z}}^j)^{-1}\big (\nabla {F}(\hat{\mathbf {z}}^j) - \tau _jt_0^{-1}\hat{\zeta }^0\big ); t_0\big ) \Big \Vert _{\hat{\mathbf {z}}^j}\\&\overset{(36),(67)}{\le }\hat{\lambda }_{\tau _j}(\hat{\mathbf {z}}^j) + \big \Vert t_0^{-1}\tau _j\nabla ^2{F}(\hat{\mathbf {z}}^j)^{-1}\hat{\zeta }^0\big \Vert _{\hat{\mathbf {z}}^j}\\&= \hat{\lambda }_{\tau _j}(\hat{\mathbf {z}}^j) + \tau _jt_0^{-1}\Vert \hat{\zeta }^0\Vert _{\hat{\mathbf {z}}^j}^{*}, \end{aligned}$$

which proves the first inequality of (38).

By [31, Corollary 4.2.1], we have \(\Vert \hat{\zeta }^0\Vert _{\hat{\mathbf {z}}^j}^{*} \le \kappa \Vert \hat{\zeta }^0\Vert _{\bar{\mathbf {z}}_F^{\star }}^{*}\), where \(\bar{\mathbf {z}}_F^{\star }\) and \(\kappa \) are given by (15) and below (15), respectively. Hence, \(\bar{{\varDelta }}_{\eta } := \frac{\mu _{\eta }}{\kappa \Vert \hat{\zeta }^0\Vert _{\bar{\mathbf {z}}_F^{\star }}^{*}} \le \bar{{\varDelta }}_{j}\). The second estimate of (38) follows from \(\tau _j := 1 - \sum _{l=0}^{j-1}{\varDelta }_l \le 1 - j\bar{{\varDelta }}_{\eta }\) due to the update rule (35) with \({\varDelta }_l := \bar{{\varDelta }}_{l} \ge \bar{{\varDelta }}_{\eta }\). In order to guarantee \(\lambda _{t_0}(\mathbf {z}^0) \le \beta \), it follows from (38) and the update rule of \(\tau _j\) that

$$\begin{aligned} j \ge \frac{1}{\bar{{\varDelta }}_{\eta }}\left( 1 - \frac{(\beta - \eta )t_0}{\kappa \Vert \hat{\zeta }^0\Vert _{\bar{\mathbf {z}}_F^{\star }}^{*}}\right) . \end{aligned}$$

Finally, substituting \(\bar{{\varDelta }}_{\eta } = \frac{t_0}{\kappa \Vert \hat{\zeta }^0\Vert _{\bar{\mathbf {z}}^{\star }_F}^{*}}\left( \frac{c\sqrt{\eta }}{1+c\sqrt{\eta }} - \eta \right) \) into this estimate and after simplifying the result, we obtain the remaining conclusion of Theorem 5. \(\square \)
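A minimal numerical sketch of these phase-1 quantities follows (all inputs are placeholder assumptions; in particular, `kappa_times_norm` stands for the problem-dependent product \(\kappa \Vert \hat{\zeta }^0\Vert _{\bar{\mathbf {z}}_F^{\star }}^{*}\)):

```python
# Sketch of the Theorem 5 bounds: the step size bar{Delta}_eta and the number
# j of phase-1 iterations needed to reach lambda_{t0}(z^0) <= beta.
# All sample values below are assumptions for illustration.
import math

c, eta, beta, t0 = 0.9, 0.02, 0.05, 1.0
kappa_times_norm = 10.0   # placeholder for kappa * ||hat{zeta}^0||, problem data

se = c * math.sqrt(eta)
mu_eta = t0 * (se / (1 + se) - eta)          # numerator of bar{Delta}_eta
Delta_eta = mu_eta / kappa_times_norm

# Smallest j with 1 - j*Delta_eta <= (beta - eta)*t0 / kappa_times_norm.
j = math.ceil((1.0 / Delta_eta)
              * max(0.0, 1.0 - (beta - eta) * t0 / kappa_times_norm))
print(f"Delta_eta = {Delta_eta:.5f}, phase-1 iterations j = {j}")
```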

1.9 The proof of Theorem 7: primal recovery for (4) in Algorithm 2

By the definition of \(\varphi \), we have \(\varphi (\mathbf {y}) := f^{*}(\mathbf {c}- L^{*}\mathbf {y}) = f^{*}(t^{-1}(\mathbf {c}- L^{*}\mathbf {y})) - \nu \ln (t)\) due to the self-concordant logarithmic homogeneity of f. Using the property of the Legendre transformation \(f^{*}\) of f, we can express this function as

$$\begin{aligned} \varphi (\mathbf {y}) = t^{-1}\max _{\mathbf {x}\in \mathrm {int}\left( \mathcal {K}\right) }\left\{ \langle \mathbf {c}- L^{*}\mathbf {y}, \mathbf {x}\rangle - tf(\mathbf {x}) \right\} - \nu \ln (t). \end{aligned}$$

We show that the point \(\mathbf {x}^{k+1}\) given by (56) solves the above maximization problem with \(\mathbf {y}= \mathbf {y}^{k+1}\) and \(t = t_{k+1}\). We can write down the optimality condition of this maximization problem as

$$\begin{aligned} \mathbf {c}- L^{*}\mathbf {y}^{k+1} - t_{k+1}\nabla {f}(\mathbf {x}^{k+1}) = 0, \end{aligned}$$

which leads to \(\nabla {f}(\mathbf {x}^{k+1}) = t_{k+1}^{-1}(\mathbf {c}- L^{*}\mathbf {y}^{k+1})\). On the other hand, by the well-known property of f [31], we have \(\mathbf {x}^{k+1} = \nabla {f^{*}}(\nabla {f}(\mathbf {x}^{k+1})) = \nabla {f^{*}}\left( t_{k+1}^{-1}(\mathbf {c}- L^{*}\mathbf {y}^{k+1})\right) \in \mathrm {int}\left( \mathcal {K}\right) \).

Now, we prove (57). Note that \(\mathbf {c}- L^{*}\mathbf {y}^{k+1} - t_{k+1}\nabla {f}(\mathbf {x}^{k+1}) = 0\) and \(\Vert \nabla {f}(\mathbf {x})\Vert _{\mathbf {x}}^{*}\le \sqrt{\nu }\) for any \(\mathbf {x}\in \mathrm {int}\left( \mathcal {K}\right) \), which leads to

$$\begin{aligned} \Vert L^{*}\mathbf {y}^{k+1} -\mathbf {c}\Vert _{\mathbf {x}^{k+1}}^{*} = t_{k+1}\Vert \nabla {f}(\mathbf {x}^{k+1})\Vert _{\mathbf {x}^{k+1}}^{*} \le t_{k+1} \sqrt{\nu }. \end{aligned}$$

Since \(t_{k+1} \le \varepsilon \), this estimate leads to the first inequality of (57).
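To make the recovery step concrete, consider the special case \(\mathcal {K}= \mathbb {R}^n_{+}\) with the standard logarithmic barrier \(f(\mathbf {x}) = -\sum _i\ln x_i\), for which \(\nu = n\), \(\nabla f(\mathbf {x}) = -1/\mathbf {x}\), and \(\nabla f^{*}(\mathbf {s}) = -1/\mathbf {s}\) componentwise (for \(\mathbf {s}< 0\)). The sketch below (random placeholder data; an illustration, not the paper's implementation) recovers \(\mathbf {x}\) via (56) and verifies the residual identity used above:

```python
# Primal recovery x = grad f*((c - L^T y)/t) for K = R^n_+ with the log
# barrier f(x) = -sum(log x_i), nu = n.  Here grad f*(s) = -1/s, so
# x = -t / (c - L^T y), valid when c - L^T y < 0.  Data are placeholders.
import numpy as np

rng = np.random.default_rng(0)
n, p, t = 5, 3, 1e-3
L = rng.standard_normal((p, n))
y = rng.standard_normal(p)
c = L.T @ y - rng.uniform(1.0, 2.0, size=n)   # ensures c - L^T y < 0

s = (c - L.T @ y) / t
x = -1.0 / s                                  # x = grad f*(s), strictly positive
assert np.all(x > 0)

# Residual bound ||L^T y - c||_x^* = t * ||grad f(x)||_x^* <= t*sqrt(n);
# for the log barrier, ||v||_x^* = sqrt(sum(v_i^2 * x_i^2)).
res = np.sqrt(np.sum((L.T @ y - c)**2 * x**2))
print(res, t * np.sqrt(n))                    # equal (up to rounding) here
```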

From (24), there exists \(\mathbf {e}^k\in \mathbb {R}^p\) such that \(\mathbf {e}^k \in \nabla {\varphi }(\mathbf {y}^k) + \nabla ^2{\varphi }(\mathbf {y}^k)(\mathbf {y}^{k+1} - \mathbf {y}^k) + t_{k+1}^{-1}\partial {\psi }(\mathbf {y}^{k+1})\) and \(\Vert \mathbf {e}^k \Vert _{\mathbf {y}^{k}}^{*} \le \delta _k\). This condition leads to

$$\begin{aligned} \mathbf {e}^k + \nabla {\varphi }(\mathbf {y}^{k+1}) - \nabla {\varphi }(\mathbf {y}^k) - \nabla ^2{\varphi }(\mathbf {y}^k)(\mathbf {y}^{k+1} - \mathbf {y}^k) \in \nabla {\varphi }(\mathbf {y}^{k+1}) + t_{k+1}^{-1}\partial {\psi }(\mathbf {y}^{k+1}). \end{aligned}$$

Therefore, we have

$$\begin{aligned} \mathrm {dist}_{\mathbf {y}^{k+1}}\Big (0, \nabla {\varphi }(\mathbf {y}^{k+1}) + t_{k+1}^{-1}\partial {\psi }(\mathbf {y}^{k+1})\Big )&\le \big \Vert \mathbf {e}^k + \nabla {\varphi }(\mathbf {y}^{k+1}) - \nabla {\varphi }(\mathbf {y}^k) - \nabla ^2{\varphi }(\mathbf {y}^k)(\mathbf {y}^{k+1} - \mathbf {y}^k)\big \Vert _{\mathbf {y}^{k+1}}^{*}\\&\le \Vert \mathbf {e}^k\Vert _{\mathbf {y}^{k+1}}^{*} + \big \Vert \nabla {\varphi }(\mathbf {y}^{k+1}) - \nabla {\varphi }(\mathbf {y}^k) - \nabla ^2{\varphi }(\mathbf {y}^k)(\mathbf {y}^{k+1} - \mathbf {y}^k)\big \Vert _{\mathbf {y}^{k+1}}^{*}. \end{aligned}$$
(82)

To estimate the right-hand side of this inequality, we define \(M_k := \Vert \nabla {\varphi }(\mathbf {y}^{k+1}) - \nabla {\varphi }(\mathbf {y}^k) - \nabla ^2{\varphi }(\mathbf {y}^k)(\mathbf {y}^{k+1} - \mathbf {y}^k)\Vert _{\mathbf {y}^{k+1}}^{*}\). With the same proof as [31, Theorem 4.1.14], we can show that

$$\begin{aligned} M_k \le \left( 1 - \Vert \mathbf {y}^{k+1} - \mathbf {y}^k\Vert _{\mathbf {y}^k}\right) ^{-2}\Vert \mathbf {y}^{k+1} - \mathbf {y}^k\Vert _{\mathbf {y}^k}^2 \le \frac{\left( \delta (\mathbf {y}^k) + \lambda _{t_{k+1}}(\mathbf {y}^k)\right) ^2}{\left( 1- \lambda _{t_{k+1}}(\mathbf {y}^k) - \delta (\mathbf {y}^k)\right) ^2}. \end{aligned}$$
(83)

Here, we use \(\Vert \mathbf {y}^{k+1} - \mathbf {y}^k\Vert _{\mathbf {y}^k} \le \Vert \mathbf {y}^{k+1} - \bar{\mathbf {y}}^{k+1}\Vert _{\mathbf {y}^k} + \Vert \bar{\mathbf {y}}^{k+1} - \mathbf {y}^k\Vert _{\mathbf {y}^k} = \delta (\mathbf {y}^k) + \lambda _{t_{k+1}}(\mathbf {y}^k)\) by the definitions of \(\lambda _{t_{+}}(\mathbf {y})\) in (21) and of \(\delta (\mathbf {y})\) above (27). Substituting (83) into (82), we get

$$\begin{aligned} \mathrm {dist}_{\mathbf {y}^{k+1}}\left( 0, \nabla {\varphi }(\mathbf {y}^{k+1}) + t_{k+1}^{-1}\partial {\psi }(\mathbf {y}^{k+1})\right) \le \Vert \mathbf {e}^k\Vert _{\mathbf {y}^{k+1}}^{*} + \frac{\left( \delta (\mathbf {y}^k) + \lambda _{t_{k+1}}(\mathbf {y}^k)\right) ^2}{\left( 1- \lambda _{t_{k+1}}(\mathbf {y}^k) - \delta (\mathbf {y}^k)\right) ^2}. \end{aligned}$$
(84)
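The first inequality in (83) is the standard self-concordance bound on the Taylor remainder of the gradient. As a quick sanity check, the sketch below verifies it numerically for the univariate self-concordant function \(-\ln (y)\) (an illustration only, not part of the proof):

```python
# Check: for phi(y) = -log(y), the Taylor remainder of grad phi, measured in
# the dual local norm at y', is bounded by r^2/(1-r)^2 with r = ||y'-y||_y.
import math

def check(y, yp):
    g, gp, H = -1.0 / y, -1.0 / yp, 1.0 / y**2  # gradients and Hessian at y
    r = abs(yp - y) * math.sqrt(H)               # local norm ||y'-y||_y
    rem = gp - g - H * (yp - y)                  # gradient Taylor remainder
    lhs = abs(rem) * yp                          # dual norm at y': |v|/sqrt(phi''(y'))
    assert r < 1 and lhs <= r**2 / (1 - r)**2

for yp in (0.8, 0.95, 1.1, 1.4):
    check(1.0, yp)
```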

Next, it remains to estimate \(\Vert \mathbf {e}^k\Vert _{\mathbf {y}^{k+1}}^{*}\). Indeed, we have

$$\begin{aligned} \Vert \mathbf {e}^k\Vert _{\mathbf {y}^{k+1}}^{*} \le \big (1 - \Vert \mathbf {y}^{k+1} - \mathbf {y}^{k}\Vert _{\mathbf {y}^k}\big )^{-1}\Vert \mathbf {e}^k\Vert _{\mathbf {y}^k}^{*} \le \left( 1- \lambda _{t_{k+1}}(\mathbf {y}^k) - \delta (\mathbf {y}^k)\right) ^{-1}\Vert \mathbf {e}^k\Vert _{\mathbf {y}^k}^{*} \le \frac{\delta _k}{1 - \lambda _{t_{k+1}}(\mathbf {y}^k) - \delta _k}. \end{aligned}$$

Substituting this estimate into (84) and using \(\lambda _{t_{k+1}}(\mathbf {y}^k) \le c\sqrt{\beta }(1+c\sqrt{\beta })^{-1}\) from Lemma 4, we obtain

$$\begin{aligned} \mathrm {dist}_{\mathbf {y}^{k+1}}\left( 0, \nabla {\varphi }(\mathbf {y}^{k+1}) + t_{k+1}^{-1}\partial {\psi }(\mathbf {y}^{k+1})\right) \le \frac{\delta _k(1+c\sqrt{\beta })}{1-\delta _k(1+c\sqrt{\beta })} + \frac{\left( \delta _k(1+c\sqrt{\beta })+c\sqrt{\beta }\right) ^2}{\left( 1 -\delta _k(1+c\sqrt{\beta })\right) ^2}. \end{aligned}$$

Substituting the upper bound \(\bar{\delta } := \frac{(1-c^2)\beta }{(1+c\sqrt{\beta })^3\left[ 3c\sqrt{\beta } + c^2\beta + (1+c\sqrt{\beta })^3\right] }\) on \(\delta _k\) from Lemma 4 into the last estimate and simplifying the result, we get

$$\begin{aligned} \mathrm {dist}_{\mathbf {y}^{k+1}}\left( 0, \nabla {\varphi }(\mathbf {y}^{k+1}) + t_{k+1}^{-1}\partial {\psi }(\mathbf {y}^{k+1})\right) \le \theta (c,\beta ), \end{aligned}$$
(85)

where \(\theta (c,\beta )\) is defined as

$$\begin{aligned} \theta (c,\beta ) :=&\ \frac{(1-c^2)\beta }{(1+c\sqrt{\beta })^2\left[ 3c\sqrt{\beta } + c^2\beta + (1+c\sqrt{\beta })^3\right] -(1-c^2)\beta }\\&+ \left( \frac{(1-c^2)\beta + c\sqrt{\beta }(1+c\sqrt{\beta })^2\left[ 3c\sqrt{\beta } + c^2\beta + (1+c\sqrt{\beta })^3\right] }{(1+c\sqrt{\beta })^2\left[ 3c\sqrt{\beta } + c^2\beta + (1+c\sqrt{\beta })^3\right] -(1-c^2)\beta }\right) ^2. \end{aligned}$$
(86)
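The claim just below that \(\theta (c,\beta ) \le 1\) on the admissible parameter range can also be checked numerically; a minimal sketch (the grid of test values is an assumption chosen for illustration):

```python
# Numerical sanity check of theta(c, beta) <= 1 for c in (0,1) and
# 0 <= beta < 0.5*(1 + 2c^2 - sqrt(1 + 4c^2)), using the definition (86).
import math

def theta(c, beta):
    s = c * math.sqrt(beta)
    bracket = 3*s + c**2*beta + (1 + s)**3
    denom = (1 + s)**2 * bracket - (1 - c**2) * beta
    first = (1 - c**2) * beta / denom
    second = (((1 - c**2) * beta + s * (1 + s)**2 * bracket) / denom) ** 2
    return first + second

for c in (0.1, 0.5, 0.9, 0.99):
    beta_max = 0.5 * (1 + 2*c**2 - math.sqrt(1 + 4*c**2))
    for frac in (0.0, 0.5, 0.99):
        assert theta(c, frac * beta_max) <= 1.0
```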

Using the fact that \(c \in (0, 1)\) and \(0 \le \beta < 0.5(1 + 2c^2 - \sqrt{1 + 4c^2})\), we have \(\theta (c,\beta ) \le 1\). Since \(\nabla {\varphi }(\cdot ) = -L\nabla {f^{*}}( \mathbf {c}-L^{*}(\cdot ) ) = -t_{k+1}^{-1}L\nabla {f^{*}}(t_{k+1}^{-1}(\mathbf {c}-L^{*}(\cdot )))\) due to (48), using (56), we can show that \(\nabla {\varphi }(\mathbf {y}^{k+1}) = -t_{k+1}^{-1}L\mathbf {x}^{k+1}\). Plugging this expression into (85) and noting that \(\partial {\psi }(\cdot ) = \partial {g}^{*}(\cdot ) + \mathbf {b}\), we obtain

$$\begin{aligned} \mathrm {dist}_{\mathbf {y}^{k+1}}\left( L\mathbf {x}^{k+1} - \mathbf {b}, \partial {g^{*}}(\mathbf {y}^{k+1})\right) = \mathrm {dist}_{\mathbf {y}^{k+1}}\left( 0, \mathbf {b}- L\mathbf {x}^{k+1} + \partial {g^{*}}(\mathbf {y}^{k+1})\right) \le t_{k+1}\theta (c,\beta ). \end{aligned}$$

Let \(\mathbf {s}^{k+1} = \pi _{\partial {g^{*}}(\mathbf {y}^{k+1})}(L\mathbf {x}^{k+1} - \mathbf {b})\) be the projection of \(L\mathbf {x}^{k+1} - \mathbf {b}\) onto \(\partial {g^{*}}(\mathbf {y}^{k+1})\). Then, \(\mathbf {s}^{k+1} \in \partial {g^{*}}(\mathbf {y}^{k+1})\), and hence, \(\mathbf {y}^{k+1} \in \partial {g}(\mathbf {s}^{k+1})\), which proves the second relation of (57). Using this relation in the last inequality and the definition of \(\mathbf {s}^{k+1}\), we obtain \(\Vert L\mathbf {x}^{k+1} - \mathbf {b}- \mathbf {s}^{k+1}\Vert _{\mathbf {y}^{k+1}}^{*} \le t_{k+1}\theta (c,\beta )\), which is the third relation of (57). Finally, since \(\theta (c,\beta ) \le 1\) and \(\nu \ge 1\) for any self-concordant barrier, we have \(\max \left\{ \sqrt{\nu }, \theta (c,\beta )\right\} = \sqrt{\nu }\). Using (57), we can conclude that \((\mathbf {x}^k, \mathbf {s}^k)\) is an \(\varepsilon \)-solution of (3) if \(\sqrt{\nu }t_k \le \varepsilon \). \(\square \)
