Abstract
We consider optimization algorithms that successively minimize simple Taylor-like models of the objective function. Methods of Gauss–Newton type for minimizing the composition of a convex function and a smooth map are common examples. Our main result is an explicit relationship between the step-size of any such algorithm and the slope of the function at a nearby point. Consequently, we (1) show that the step-sizes can be reliably used to terminate the algorithm, (2) prove that as long as the step-sizes tend to zero, every limit point of the iterates is stationary, and (3) show that conditions akin to classical quadratic growth imply that the step-sizes linearly bound the distance of the iterates to the solution set. The latter so-called error bound property is typically used to establish linear (or faster) convergence guarantees. Analogous results hold when the step-size is replaced by the square root of the decrease in the model's value. We conclude the paper with extensions to the setting in which the models are minimized only inexactly.
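To make item (1) concrete, the following minimal Python sketch (our illustration, not an implementation from the paper) runs a generic model-based method and uses the step size \(\Vert x_{k+1}-x_k\Vert \) as the termination test. The instantiation below, a proximal-gradient model for an \(\ell _1\)-regularized least-squares problem, together with the particular matrix and tolerance, is a hypothetical example.

```python
import numpy as np

def model_descent(model_min, x0, tol=1e-8, max_iter=10_000):
    """Iterate x_{k+1} = argmin of a simple model of f around x_k.

    The step size ||x_{k+1} - x_k|| doubles as the stopping test:
    a small step certifies that the slope of f is small at a point
    near the current iterate.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_next = model_min(x)
        if np.linalg.norm(x_next - x) <= tol:  # step-size termination
            return x_next
        x = x_next
    return x

# Hypothetical instantiation: proximal gradient for
# f(x) = 0.5*||Ax - b||^2 + ||x||_1, whose model at x linearizes
# the smooth part and keeps the l1 term exactly.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the gradient

def model_min(x):
    z = x - A.T @ (A @ x - b) / L                          # gradient step
    return np.sign(z) * np.maximum(np.abs(z) - 1 / L, 0)  # l1 prox

x_star = model_descent(model_min, x0=np.zeros(2))
```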
Notes
Since the first version of this work [22], a number of new algorithms building on our viewpoint have been developed. For example, [16] analyze stochastic subgradient methods, [35, 54] consider algorithms for adversarial learning and saddle-point problems, and [49] discuss generic line-search procedures using Taylor-like models built from Bregman divergences.
One such univariate example is \(\min _x f(x)=|\frac{1}{2}x^2+x|\). The prox-linear algorithm for convex composite minimization [23, Algorithm 5.1], initiated to the right of the origin (a minimizer of f), generates a sequence \(x_k\rightarrow 0\) with \(|f'(x_k)|\rightarrow 1\): indeed, \(f(x)=\frac{1}{2}x^2+x\) for \(x>0\), so \(f'(x_k)=x_k+1\rightarrow 1\) even though the iterates converge to a minimizer.
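The following numerical sketch (ours, not from the paper) makes this behavior visible; the closed-form prox-linear subproblem solution via soft-thresholding and the proximal parameter \(\beta =10\) are our choices.

```python
import numpy as np

# f(x) = |c(x)| with c(x) = x**2/2 + x. One prox-linear step from x:
#   x+ = x + argmin_d |c(x) + c'(x)*d| + (beta/2)*d**2,
# and substituting u = c(x) + c'(x)*d reduces the subproblem to the
# proximal operator of |.|, i.e. soft-thresholding.
def prox_linear_step(x, beta=10.0):
    a, b = 0.5 * x**2 + x, x + 1.0                   # c(x), c'(x)
    u = np.sign(a) * max(abs(a) - b**2 / beta, 0.0)  # soft-threshold
    return x + (u - a) / b

x = 1.0
for k in range(30):
    x = prox_linear_step(x)
print(x, abs(x + 1.0))  # x -> 0 while |f'(x)| = x + 1 -> 1
```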
By stationary, we mean that zero is a limiting subgradient of the function at the point.
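For completeness, we recall the standard definition (this recollection is ours, in common variational-analytic notation): the limiting subdifferential of \(f\) at \(\bar{x}\) is
\[
\partial f(\bar{x})=\Big \{\lim _{i\rightarrow \infty } v_i: v_i\in \hat{\partial }f(x_i),\ x_i\rightarrow \bar{x},\ f(x_i)\rightarrow f(\bar{x})\Big \},
\]
where \(\hat{\partial }f(x)\), the Fréchet subdifferential, collects the vectors \(v\) satisfying \(f(y)\ge f(x)+\langle v,y-x\rangle +o(\Vert y-x\Vert )\) as \(y\rightarrow x\). Stationarity of \(\bar{x}\) thus means \(0\in \partial f(\bar{x})\).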
References
Aragón Artacho, F.J., Geoffroy, M.H.: Characterization of metric regularity of subdifferentials. J. Convex Anal. 15(2), 365–380 (2008)
Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137(1–2, Ser. A), 91–129 (2013)
Bai, Y., Duchi, J., Mei, S.: Proximal algorithms for constrained composite optimization, with applications to solving low-rank SDPs (2019). arXiv:1903.00184
Beck, A., Teboulle, M.: A fast iterative shrinkage–thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Bolte, J., Daniilidis, A., Lewis, A.S., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 18(2), 556–572 (2007)
Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362(6), 3319–3363 (2010)
Bolte, J., Nguyen, T.P., Peypouquet, J., Suter, B.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Program. 165(2), 471–507 (2017)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1–2, Ser. A), 459–494 (2014)
Burke, J.V.: Descent methods for composite nondifferentiable optimization problems. Math. Program. 33(3), 260–279 (1985)
Burke, J.V., Ferris, M.C.: A Gauss–Newton method for convex composite optimization. Math. Program. 71(2), 179–194 (1995)
Burke, J.V., Lewis, A.S., Overton, M.L.: A robust gradient sampling algorithm for nonsmooth, nonconvex optimization. SIAM J. Optim. 15(3), 751–779 (2005)
Byrd, R.H., Nocedal, J., Oztoprak, F.: An inexact successive quadratic approximation method for l-1 regularized optimization. Math. Program. 157(2), 375–396 (2016)
Cartis, C., Gould, N.I.M., Toint, P.L.: On the evaluation complexity of composite function minimization with applications to nonconvex nonlinear programming. SIAM J. Optim. 21(4), 1721–1739 (2011)
Charisopoulos, V., Davis, D., Díaz, M., Drusvyatskiy, D.: Composite optimization for robust blind deconvolution (2019). arXiv:1901.01624
Clarke, F.H., Ledyaev, Y.S., Stern, R.J., Wolenski, P.R.: Nonsmooth Analysis and Control Theory. Graduate Texts in Mathematics, vol. 178. Springer, New York (1998)
Davis, D., Drusvyatskiy, D.: Stochastic model-based minimization of weakly convex functions. SIAM J. Optim. 29(1), 207–239 (2019)
De Giorgi, E., Marino, A., Tosques, M.: Problemi di evoluzione in spazi metrici e curve di massima pendenza. Atti Accad. Naz. Lincei Rend. Cl. Sci. Fis. Mat. Natur. 68, 180–187 (1980)
Drusvyatskiy, D.: Slope and geometry in variational mathematics. PhD thesis, Cornell University (2013)
Drusvyatskiy, D., Ioffe, A.D.: Quadratic growth and critical point stability of semi-algebraic functions. Math. Program. 153(2, Ser. A), 635–653 (2015)
Drusvyatskiy, D., Ioffe, A.D., Lewis, A.S.: Curves of descent. SIAM J. Control Optim. 53(1), 114–138 (2015)
Drusvyatskiy, D., Ioffe, A.D., Lewis, A.S.: Transversality and alternating projections for nonconvex sets. Found. Comput. Math. 15(6), 1637–1651 (2015)
Drusvyatskiy, D., Ioffe, A.D., Lewis, A.S.: Nonsmooth optimization using Taylor-like models: error bounds, convergence, and termination criteria (2016). arXiv:1610.03446 (Ver. 1)
Drusvyatskiy, D., Lewis, A.S.: Error bounds, quadratic growth, and linear convergence of proximal methods. Math. Oper. Res. 43(3), 919–948 (2018)
Drusvyatskiy, D., Mordukhovich, B.S., Nghia, T.T.A.: Second-order growth, tilt stability, and metric regularity of the subdifferential. J. Convex Anal. 21(4), 1165–1192 (2014)
Drusvyatskiy, D., Paquette, C.: Efficiency of minimizing compositions of convex functions and smooth maps. Math. Program. (2016). arXiv:1605.00125
Duchi, J.C., Ruan, F.: Solving (most) of a set of quadratic equalities: composite optimization for robust phase retrieval. Inf. Infer. J. IMA 8(3), 471–529 (2018)
Ekeland, I.: On the variational principle. J. Math. Anal. Appl. 47, 324–353 (1974)
Fletcher, R.: A model algorithm for composite nondifferentiable optimization problems. In: Sorensen, D.C., Wets, R.J.B. (eds.) Nondifferential and Variational Techniques in Optimization (Lexington, Ky., 1980). Mathematical Programming Studies, vol. 17, pp. 67–76. Springer, Berlin (1982)
Fletcher, R.: A model algorithm for composite nondifferentiable optimization problems. In: Sorensen, D.C., Wets, R.J.B. (eds.) Nondifferential and Variational Techniques in Optimization, pp. 67–76. Springer, Berlin (1982)
Geiping, J., Moeller, M.: Composite optimization by nonconvex majorization–minimization. SIAM J. Imaging Sci. 11(4), 2494–2528 (2018)
Ghadimi, S., Lan, G.: Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Math. Program. 156(1–2, Ser. A), 59–99 (2016)
Goldstein, A.A.: Optimization of Lipschitz continuous functions. Math. Program. 13(1), 14–22 (1977)
Ioffe, A.D.: Metric regularity and subdifferential calculus. Uspekhi Mat. Nauk 55(3(333)), 103–162 (2000)
Ioffe, A.D.: Variational Analysis of Regular Mappings. Springer Monographs in Mathematics. Springer, Berlin (2017)
Jin, C., Netrapalli, P., Jordan, M.I.: Minmax optimization: stable limit points of gradient descent ascent are locally optimal (2019). arXiv:1902.00618
Klatte, D., Kummer, B.: Nonsmooth Equations in Optimization: Regularity, Calculus, Methods and Applications. Nonconvex Optimization and Its Applications, vol. 60. Kluwer Academic Publishers, Dordrecht (2002)
Kurdyka, K.: On gradients of functions definable in o-minimal structures. Ann. Inst. Fourier (Grenoble) 48(3), 769–783 (1998)
Lewis, A.S., Wright, S.J.: A proximal method for composite minimization. Math. Program. 158, 1–46 (2015)
Luo, Z.-Q., Tseng, P.: Error bounds and convergence analysis of feasible descent methods: a general approach. Ann. Oper. Res. 46/47(1–4), 157–178 (1993). Special issue on degeneracy in optimization problems
Martinet, B.: Régularisation d’inéquations variationnelles par approximations successives. Rev. Française Informat. Recherche Opérationnelle 4(Ser. R–3), 154–158 (1970)
Martinet, B.: Détermination approchée d’un point fixe d’une application pseudo-contractante. Cas de l’application prox. C. R. Acad. Sci. Paris Sér. A-B 274, A163–A165 (1972)
Nesterov, Y., Polyak, B.T.: Cubic regularization of Newton method and its global performance. Math. Program. 108(1, Ser. A), 177–205 (2006)
Nesterov, Y.: A method for solving the convex programming problem with convergence rate \(O(1/k^{2})\). Dokl. Akad. Nauk SSSR 269(3), 543–547 (1983)
Nesterov, Y.: Modified Gauss–Newton scheme with worst case guarantees for global performance. Optim. Methods Softw. 22(3), 469–483 (2007)
Nesterov, Y.: Accelerating the cubic regularization of Newton’s method on convex problems. Math. Program. 112(1, Ser. B), 159–181 (2008)
Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140(1, Ser. B), 125–161 (2013)
Nocedal, J., Wright, S.J.: Numerical Optimization. Springer Series in Operations Research and Financial Engineering, 2nd edn. Springer, New York (2006)
Noll, D., Prot, O., Rondepierre, A.: A proximity control algorithm to minimize nonsmooth and nonconvex functions. Pac. J. Optim. 4(3), 571–604 (2008)
Ochs, P., Fadili, J., Brox, T.: Non-smooth non-convex Bregman minimization: unification and new algorithms. J. Optim. Theory Appl. 181, 1–35 (2017)
Poliquin, R.A., Rockafellar, R.T.: Prox-regular functions in variational analysis. Trans. Am. Math. Soc. 348, 1805–1838 (1996)
Polyak, B.T.: Gradient methods for the minimisation of functionals. USSR Comput. Math. Math. Phys. 3(4), 864–878 (1963)
Powell, M.J.D.: General algorithms for discrete nonlinear approximation calculations. In: Chui, C.K., Schumaker, L.L., Ward, J.D. (eds.) Approximation Theory, IV (College Station, Tex., 1983), pp. 187–218. Academic Press, New York (1983)
Powell, M.J.D.: On the global convergence of trust region algorithms for unconstrained minimization. Math. Program. 29(3), 297–303 (1984)
Rafique, H., Liu, M., Lin, Q., Yang, T.: Non-convex min–max optimization: provable algorithms and applications in machine learning (2018). arXiv:1810.02060
Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14(5), 877–898 (1976)
Rockafellar, R.T.: Proximal subgradients, marginal values, and augmented Lagrangians in nonconvex optimization. Math. Oper. Res. 6(3), 424–436 (1981)
Dontchev, A.L., Rockafellar, R.T.: Implicit Functions and Solution Mappings. Springer Monographs in Mathematics. Springer, Berlin (2009)
Scheinberg, K., Tang, X.: Practical inexact proximal quasi-Newton method with global complexity analysis. Math. Program. (2016)
Wild, S.M.: Solving Derivative-Free Nonlinear Least Squares Problems with POUNDERS. Argonne National Lab, Lemont (2014)
Wright, S.J.: Convergence of an inexact algorithm for composite nonsmooth optimization. IMA J. Numer. Anal. 10(3), 299–321 (1990)
Yuan, Y.: On the superlinear convergence of a trust region algorithm for nonsmooth optimization. Math. Program. 31(3), 269–285 (1985)
Zhang, R., Treiman, J.: Upper-Lipschitz multifunctions and inverse subdifferentials. Nonlinear Anal. 24(2), 273–286 (1995)
Acknowledgements
We thank the two anonymous referees and the Associate Editor for their insightful comments, which have improved the exposition of this work.
Research of Drusvyatskiy was partially supported by the AFOSR YIP award FA9550-15-1-0237. Research of Lewis was supported in part by National Science Foundation Grant DMS-1208338. Research of all three authors was supported in part by the US-Israel Binational Science Foundation Grant 2014241.
Cite this article
Drusvyatskiy, D., Ioffe, A.D. & Lewis, A.S. Nonsmooth optimization using Taylor-like models: error bounds, convergence, and termination criteria. Math. Program. 185, 357–383 (2021). https://doi.org/10.1007/s10107-019-01432-w