Abstract
We consider optimization algorithms that successively minimize simple Taylor-like models of the objective function. Methods of Gauss–Newton type for minimizing the composition of a convex function and a smooth map are common examples. Our main result is an explicit relationship between the step-size of any such algorithm and the slope of the function at a nearby point. Consequently, we (1) show that the step-sizes can be reliably used to terminate the algorithm, (2) prove that as long as the step-sizes tend to zero, every limit point of the iterates is stationary, and (3) show that conditions akin to classical quadratic growth imply that the step-sizes linearly bound the distance of the iterates to the solution set. The latter so-called error bound property is typically used to establish linear (or faster) convergence guarantees. Analogous results hold when the step-size is replaced by the square root of the decrease in the model's value. We conclude the paper with extensions to the setting in which the models are minimized only inexactly.
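To make item (1) concrete, the following minimal Python sketch (our illustration, not an implementation from the paper) runs a generic model-based method and uses the step size \(\Vert x_{k+1}-x_k\Vert \) as the termination test. The instantiation below, a proximal-gradient model for an \(\ell _1\)-regularized least-squares problem, together with the particular matrix and tolerance, is a hypothetical example.

```python
import numpy as np

def model_descent(model_min, x0, tol=1e-8, max_iter=10_000):
    """Iterate x_{k+1} = argmin of a simple model of f around x_k.

    The step size ||x_{k+1} - x_k|| doubles as the stopping test:
    a small step certifies that the slope of f is small at a point
    near the current iterate.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_next = model_min(x)
        if np.linalg.norm(x_next - x) <= tol:  # step-size termination
            return x_next
        x = x_next
    return x

# Hypothetical instantiation: proximal gradient for
# f(x) = 0.5*||Ax - b||^2 + ||x||_1, whose model at x linearizes
# the smooth part and keeps the l1 term exactly.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the gradient

def model_min(x):
    z = x - A.T @ (A @ x - b) / L                          # gradient step
    return np.sign(z) * np.maximum(np.abs(z) - 1 / L, 0)  # l1 prox

x_star = model_descent(model_min, x0=np.zeros(2))
```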
Notes
Since the first version of this work [22], a number of new algorithms building on our viewpoint have been developed. For example, [16] analyze stochastic subgradient methods, [35, 54] consider algorithms for adversarial learning and saddle-point problems, and [49] discuss generic line-search procedures using Taylor-like models built from Bregman divergences.
One such univariate example is \(\min _x f(x)=|\frac{1}{2}x^2+x|\). The prox-linear algorithm for convex composite minimization [23, Algorithm 5.1], initiated to the right of the origin (a minimizer of f), generates a sequence \(x_k\rightarrow 0\) with \(|f'(x_k)|\rightarrow 1\): indeed, \(f(x)=\frac{1}{2}x^2+x\) for \(x>0\), so \(f'(x_k)=x_k+1\rightarrow 1\) even though the iterates converge to a minimizer.
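The following numerical sketch (ours, not from the paper) makes this behavior visible; the closed-form prox-linear subproblem solution via soft-thresholding and the proximal parameter \(\beta =10\) are our choices.

```python
import numpy as np

# f(x) = |c(x)| with c(x) = x**2/2 + x. One prox-linear step from x:
#   x+ = x + argmin_d |c(x) + c'(x)*d| + (beta/2)*d**2,
# and substituting u = c(x) + c'(x)*d reduces the subproblem to the
# proximal operator of |.|, i.e. soft-thresholding.
def prox_linear_step(x, beta=10.0):
    a, b = 0.5 * x**2 + x, x + 1.0                   # c(x), c'(x)
    u = np.sign(a) * max(abs(a) - b**2 / beta, 0.0)  # soft-threshold
    return x + (u - a) / b

x = 1.0
for k in range(30):
    x = prox_linear_step(x)
print(x, abs(x + 1.0))  # x -> 0 while |f'(x)| = x + 1 -> 1
```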
By stationary, we mean that zero is a limiting subgradient of the function at the point.
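For completeness, we recall the standard definition (this recollection is ours, in common variational-analytic notation): the limiting subdifferential of \(f\) at \(\bar{x}\) is
\[
\partial f(\bar{x})=\Big \{\lim _{i\rightarrow \infty } v_i: v_i\in \hat{\partial }f(x_i),\ x_i\rightarrow \bar{x},\ f(x_i)\rightarrow f(\bar{x})\Big \},
\]
where \(\hat{\partial }f(x)\), the Fréchet subdifferential, collects the vectors \(v\) satisfying \(f(y)\ge f(x)+\langle v,y-x\rangle +o(\Vert y-x\Vert )\) as \(y\rightarrow x\). Stationarity of \(\bar{x}\) thus means \(0\in \partial f(\bar{x})\).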
References
Aragón Artacho, F.J., Geoffroy, M.H.: Characterization of metric regularity of subdifferentials. J. Convex Anal. 15(2), 365–380 (2008)
Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137(1–2, Ser. A), 91–129 (2013)
Bai, Y., Duchi, J., Mei, S.: Proximal algorithms for constrained composite optimization, with applications to solving low-rank SDPs (2019). arXiv:1903.00184
Beck, A., Teboulle, M.: A fast iterative shrinkage–thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Bolte, J., Daniilidis, A., Lewis, A.S., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 18(2), 556–572 (2007)
Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362(6), 3319–3363 (2010)
Bolte, J., Nguyen, T.P., Peypouquet, J., Suter, B.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Program. 165(2), 471–507 (2017)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1–2, Ser. A), 459–494 (2014)
Burke, J.V.: Descent methods for composite nondifferentiable optimization problems. Math. Program. 33(3), 260–279 (1985)
Burke, J.V., Ferris, M.C.: A Gauss–Newton method for convex composite optimization. Math. Program. 71(2), 179–194 (1995)
Burke, J.V., Lewis, A.S., Overton, M.L.: A robust gradient sampling algorithm for nonsmooth, nonconvex optimization. SIAM J. Optim. 15(3), 751–779 (2005)
Byrd, R.H., Nocedal, J., Oztoprak, F.: An inexact successive quadratic approximation method for l-1 regularized optimization. Math. Program. 157(2), 375–396 (2016)
Cartis, C., Gould, N.I.M., Toint, P.L.: On the evaluation complexity of composite function minimization with applications to nonconvex nonlinear programming. SIAM J. Optim. 21(4), 1721–1739 (2011)
Charisopoulos, V., Davis, D., Díaz, M., Drusvyatskiy, D.: Composite optimization for robust blind deconvolution (2019). arXiv:1901.01624
Clarke, F.H., Ledyaev, Y.S., Stern, R.J., Wolenski, P.R.: Nonsmooth Analysis and Control Theory. Graduate Texts in Mathematics, vol. 178. Springer, New York (1998)
Davis, D., Drusvyatskiy, D.: Stochastic model-based minimization of weakly convex functions. SIAM J. Optim. 29(1), 207–239 (2019)
De Giorgi, E., Marino, A., Tosques, M.: Problemi di evoluzione in spazi metrici e curve di massima pendenza. Atti Accad. Naz. Lincei Rend. Cl. Sci. Fis. Mat. Natur. 68, 180–187 (1980)
Drusvyatskiy, D.: Slope and geometry in variational mathematics. PhD thesis, Cornell University (2013)
Drusvyatskiy, D., Ioffe, A.D.: Quadratic growth and critical point stability of semi-algebraic functions. Math. Program. 153(2, Ser. A), 635–653 (2015)
Drusvyatskiy, D., Ioffe, A.D., Lewis, A.S.: Curves of descent. SIAM J. Control Optim. 53(1), 114–138 (2015)
Drusvyatskiy, D., Ioffe, A.D., Lewis, A.S.: Transversality and alternating projections for nonconvex sets. Found. Comput. Math. 15(6), 1637–1651 (2015)
Drusvyatskiy, D., Ioffe, A.D., Lewis, A.S.: Nonsmooth optimization using Taylor-like models: error bounds, convergence, and termination criteria (2016). arXiv:1610.03446 (Ver. 1)
Drusvyatskiy, D., Lewis, A.S.: Error bounds, quadratic growth, and linear convergence of proximal methods. Math. Oper. Res. 43(3), 919–948 (2018)
Drusvyatskiy, D., Mordukhovich, B.S., Nghia, T.T.A.: Second-order growth, tilt stability, and metric regularity of the subdifferential. J. Convex Anal. 21(4), 1165–1192 (2014)
Drusvyatskiy, D., Paquette, C.: Efficiency of minimizing compositions of convex functions and smooth maps. Math. Program. (2016). arXiv:1605.00125
Duchi, J.C., Ruan, F.: Solving (most) of a set of quadratic equalities: composite optimization for robust phase retrieval. Inf. Infer. J. IMA 8(3), 471–529 (2018)
Ekeland, I.: On the variational principle. J. Math. Anal. Appl. 47, 324–353 (1974)
Fletcher, R.: A model algorithm for composite nondifferentiable optimization problems. In: Sorensen, D.C., Wets, R.J.B. (eds.) Nondifferential and Variational Techniques in Optimization (Lexington, Ky., 1980). Mathematical Programming Studies, vol. 17, pp. 67–76. Springer, Berlin (1982)
Fletcher, R.: A model algorithm for composite nondifferentiable optimization problems. In: Sorensen, D.C., Wets, R.J.B. (eds.) Nondifferential and Variational Techniques in Optimization, pp. 67–76. Springer, Berlin (1982)
Geiping, J., Moeller, M.: Composite optimization by nonconvex majorization–minimization. SIAM J. Imaging Sci. 11(4), 2494–2528 (2018)
Ghadimi, S., Lan, G.: Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Math. Program. 156(1–2, Ser. A), 59–99 (2016)
Goldstein, A.A.: Optimization of Lipschitz continuous functions. Math. Program. 13(1), 14–22 (1977)
Ioffe, A.D.: Metric regularity and subdifferential calculus. Uspekhi Mat. Nauk 55(3(333)), 103–162 (2000)
Ioffe, A.D.: Variational Analysis of Regular Mappings. Springer Monographs in Mathematics. Springer, Berlin (2017)
Jin, C., Netrapalli, P., Jordan, M.I.: Minmax optimization: stable limit points of gradient descent ascent are locally optimal (2019). arXiv:1902.00618
Klatte, D., Kummer, B.: Nonsmooth Equations in Optimization: Regularity, Calculus, Methods and Applications. Nonconvex Optimization and Its Applications, vol. 60. Kluwer Academic Publishers, Dordrecht (2002)
Kurdyka, K.: On gradients of functions definable in o-minimal structures. Ann. Inst. Fourier (Grenoble) 48(3), 769–783 (1998)
Lewis, A.S., Wright, S.J.: A proximal method for composite minimization. Math. Program. 158, 1–46 (2015)
Luo, Z.-Q., Tseng, P.: Error bounds and convergence analysis of feasible descent methods: a general approach. Ann. Oper. Res. 46/47(1–4), 157–178 (1993). Special issue on degeneracy in optimization problems
Martinet, B.: Régularisation d’inéquations variationnelles par approximations successives. Rev. Française Informat. Recherche Opérationnelle 4(Ser. R–3), 154–158 (1970)
Martinet, B.: Détermination approchée d’un point fixe d’une application pseudo-contractante. Cas de l’application prox. C. R. Acad. Sci. Paris Sér. A-B 274, A163–A165 (1972)
Nesterov, Y., Polyak, B.T.: Cubic regularization of Newton method and its global performance. Math. Program. 108(1, Ser. A), 177–205 (2006)
Nesterov, Y.: A method for solving the convex programming problem with convergence rate \(O(1/k^{2})\). Dokl. Akad. Nauk SSSR 269(3), 543–547 (1983)
Nesterov, Y.: Modified Gauss–Newton scheme with worst case guarantees for global performance. Optim. Methods Softw. 22(3), 469–483 (2007)
Nesterov, Y.: Accelerating the cubic regularization of Newton’s method on convex problems. Math. Program. 112(1, Ser. B), 159–181 (2008)
Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140(1, Ser. B), 125–161 (2013)
Nocedal, J., Wright, S.J.: Numerical Optimization. Springer Series in Operations Research and Financial Engineering, 2nd edn. Springer, New York (2006)
Noll, D., Prot, O., Rondepierre, A.: A proximity control algorithm to minimize nonsmooth and nonconvex functions. Pac. J. Optim. 4(3), 571–604 (2008)
Ochs, P., Fadili, J., Brox, T.: Non-smooth non-convex Bregman minimization: unification and new algorithms. J. Optim. Theory Appl. 181, 1–35 (2017)
Poliquin, R.A., Rockafellar, R.T.: Prox-regular functions in variational analysis. Trans. Am. Math. Soc. 348, 1805–1838 (1996)
Polyak, B.T.: Gradient methods for the minimisation of functionals. USSR Comput. Math. Math. Phys. 3(4), 864–878 (1963)
Powell, M.J.D.: General algorithms for discrete nonlinear approximation calculations. In: Chui, C.K., Schumaker, L.L., Ward, J.D. (eds.) Approximation Theory, IV (College Station, Tex., 1983), pp. 187–218. Academic Press, New York (1983)
Powell, M.J.D.: On the global convergence of trust region algorithms for unconstrained minimization. Math. Program. 29(3), 297–303 (1984)
Rafique, H., Liu, M., Lin, Q., Yang, T.: Non-convex min–max optimization: provable algorithms and applications in machine learning (2018). arXiv:1810.02060
Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14(5), 877–898 (1976)
Rockafellar, R.T.: Proximal subgradients, marginal values, and augmented Lagrangians in nonconvex optimization. Math. Oper. Res. 6(3), 424–436 (1981)
Dontchev, A.L., Rockafellar, R.T.: Implicit Functions and Solution Mappings. Springer Monographs in Mathematics. Springer, Berlin (2009)
Scheinberg, K., Tang, X.: Practical inexact proximal quasi-Newton method with global complexity analysis. Math. Program. (2016)
Wild, S.M.: Solving Derivative-Free Nonlinear Least Squares Problems with POUNDERS. Argonne National Lab, Lemont (2014)
Wright, S.J.: Convergence of an inexact algorithm for composite nonsmooth optimization. IMA J. Numer. Anal. 10(3), 299–321 (1990)
Yuan, Y.: On the superlinear convergence of a trust region algorithm for nonsmooth optimization. Math. Program. 31(3), 269–285 (1985)
Zhang, R., Treiman, J.: Upper-Lipschitz multifunctions and inverse subdifferentials. Nonlinear Anal. 24(2), 273–286 (1995)
Acknowledgements
We thank the two anonymous referees and the Associate Editor for their insightful comments, which have improved the exposition of this work.
Research of Drusvyatskiy was partially supported by the AFOSR YIP award FA9550-15-1-0237. Research of Lewis was supported in part by National Science Foundation Grant DMS-1208338. Research of all three authors was supported in part by the US-Israel Binational Science Foundation Grant 2014241.
Cite this article
Drusvyatskiy, D., Ioffe, A.D. & Lewis, A.S. Nonsmooth optimization using Taylor-like models: error bounds, convergence, and termination criteria. Math. Program. 185, 357–383 (2021). https://doi.org/10.1007/s10107-019-01432-w