Abstract
In this paper, we present a conditional gradient type (CGT) method for solving a class of composite optimization problems in which the objective function consists of a (weakly) smooth term and a (strongly) convex regularization term. While including a strongly convex term in the subproblems of the classical conditional gradient method improves its rate of convergence, the resulting cost per iteration is still lower than that of general proximal type algorithms. More specifically, we present a unified analysis for the CGT method in the sense that it achieves the best known rate of convergence when the weakly smooth term is nonconvex and attains (nearly) optimal complexity if this term turns out to be convex. While implementing the CGT method requires explicit estimates of problem parameters, such as the level of smoothness of the first term in the objective function, we also present a few variants of this method that relax the need for such estimates. Unlike general proximal type parameter-free methods, these variants of the CGT method do not require any additional effort for computing (sub)gradients of the objective function and/or solving extra subproblems at each iteration. We then generalize these methods to the stochastic setting and present a few new complexity results. To the best of our knowledge, this is the first time that such complexity results are presented for solving stochastic weakly smooth nonconvex and (strongly) convex optimization problems.
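To make the idea concrete, the sketch below illustrates a conditional-gradient-type iteration in which the classical linear subproblem is augmented with a strongly convex term while remaining cheap to solve. This is a minimal illustration only: it assumes a Euclidean-ball feasible set, a quadratic regularizer chi(x) = (mu/2)||x||^2, and the standard open-loop step size, none of which are prescribed by the paper.

```python
import numpy as np

# Minimal sketch of a conditional-gradient-type (CGT) iteration: the classical
# linear subproblem is augmented with a strongly convex regularizer chi.
# Here chi(x) = (mu/2)||x||^2 and the feasible set is the Euclidean ball of
# radius R, so the regularized subproblem has a closed-form solution (project
# the scaled negative gradient onto the ball). All choices below are
# illustrative placeholders, not the exact scheme analyzed in the paper.

def cgt_subproblem(grad, mu, R):
    """Solve min_{||y|| <= R}  <grad, y> + (mu/2)||y||^2 in closed form."""
    y = -grad / mu                                   # unconstrained minimizer
    norm_y = np.linalg.norm(y)
    return y if norm_y <= R else (R / norm_y) * y    # project onto the ball

def cgt(grad_f, x0, mu=1.0, R=1.0, iters=100):
    x = x0.copy()
    for k in range(iters):
        y = cgt_subproblem(grad_f(x), mu, R)  # regularized linear-optimization step
        alpha = 2.0 / (k + 2)                 # standard open-loop step size (illustrative)
        x = (1 - alpha) * x + alpha * y       # convex combination keeps x feasible
    return x

if __name__ == "__main__":
    # Toy smooth objective f(x) = 0.5 * ||A x - b||^2 over the unit ball.
    rng = np.random.default_rng(0)
    A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
    grad_f = lambda x: A.T @ (A @ x - b)
    x_final = cgt(grad_f, x0=np.zeros(5))
    print("final objective:", 0.5 * np.linalg.norm(A @ x_final - b) ** 2)
```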
Notes
Reddi et al. [34] was released several months after the first version of this work.
References
Cartis, C., Gould, N.I.M., Toint, P.L.: On the complexity of steepest descent, Newton's and regularized Newton's methods for nonconvex unconstrained optimization. SIAM J. Optim. 20(6), 2833–2852 (2010)
Chapelle, O., Sindhwani, V., Keerthi, S.S.: Optimization techniques for semi-supervised support vector machines. J. Mach. Learn. Res. 9, 203–233 (2008)
Dang, C.D., Lan, G.: Stochastic block mirror descent methods for nonsmooth and stochastic optimization. SIAM J. Optim. 25(2), 856–881 (2015)
Devolder, O., Glineur, F., Nesterov, Y.E.: First-order methods with inexact oracle: the strongly convex case. CORE Discussion Paper 2013/16 (December 2013)
Dunn, J.C.: Rates of convergence for conditional gradient algorithms near singular and nonsingular extremals. SIAM J. Control Optim. 17(2), 674–701 (1979)
Dunn, J.C.: Convergence rates for conditional gradient sequences generated by implicit step length rules. SIAM J. Control Optim. 18(5), 473–487 (1980)
Frank, M., Wolfe, P.: An algorithm for quadratic programming. Nav. Res. Logist. Q. 3, 95–110 (1956)
Garber, D., Hazan, E.: A linearly convergent conditional gradient algorithm with applications to online and stochastic optimization. arXiv e-prints (2013)
Ghadimi, S., Lan, G., Zhang, H.: Generalized uniformly optimal methods for nonlinear programming, manuscript. Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL, 32611, USA (August 2015)
Ghadimi, S., Lan, G., Zhang, H.: Mini-batch stochastic approximation methods for constrained nonconvex stochastic programming. Math. Program. 155, 267–305 (2016)
Ghadimi, S., Lan, G.: Accelerated gradient methods for nonconvex nonlinear and stochastic optimization. Math. Program. 156, 59–99 (2016)
Ghadimi, S., Lan, G.: Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23(4), 2341–2368 (2013)
Ghadimi, S., Lan, G.: Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, II: shrinking procedures and optimal algorithms. SIAM J. Optim. 23, 2061–2089 (2013)
Grandvalet, Y., Bengio, Y.: Semi-supervised learning by entropy minimization. In: Advances in Neural Information Processing Systems (NIPS), vol. 17 (2005)
Guélat, J., Marcotte, P.: Some comments on Wolfe's 'away step'. Math. Program. 35(1), 110–119 (1986)
Harchaoui, Z., Juditsky, A., Nemirovski, A.S.: Conditional gradient algorithms for machine learning. NIPS OPT Workshop (2012)
Ito, M.: New results on subgradient methods for strongly convex optimization problems with a unified analysis. Department of Mathematical and Computing Sciences, Tokyo Institute of Technology, Tokyo, Japan (April 2015)
Jaggi, M.: Revisiting Frank-Wolfe: projection-free sparse convex optimization. In: The 30th International Conference on Machine Learning (2013)
Jensen, T., Jørgensen, J.H., Hansen, P., Jensen, S.: Implementation of an optimal first-order method for strongly convex total variation regularization. BIT Numer. Math. 52, 329–356 (2012)
Jiang, B., Zhang, S.: Iteration bounds for finding the \(\epsilon \)-stationary points for structured nonconvex optimization. arXiv e-prints (2014)
Kakade, S.M., Shalev-Shwartz, S., Tewari, A.: Regularization techniques for learning with matrices. J. Mach. Learn. Res. 13, 1865–1890 (2012)
Lan, G.: The complexity of large-scale convex programming under a linear optimization oracle. Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL, 32611, USA (June 2013). http://www.optimization-online.org
Lan, G.: Bundle-level type methods uniformly optimal for smooth and non-smooth convex optimization. Math. Program. 149(1), 1–45 (2015)
Lan, G., Zhou, Y.: Conditional gradient sliding for convex optimization. SIAM J. Optim. 26, 1379–1409 (2016)
Luss, R., Teboulle, M.: Conditional gradient algorithms for rank one matrix approximations with a sparsity constraint. SIAM Rev. 55, 65–98 (2013)
Mason, L., Baxter, J., Bartlett, P., Frean, M.: Boosting algorithms as gradient descent in function space. Proc. NIPS 12, 512–518 (1999)
Nemirovski, A.S., Yudin, D.: Problem Complexity and Method Efficiency in Optimization. Wiley, New York (1983)
Nemirovskii, A.S., Nesterov, Y.E.: Optimal methods for smooth convex minimization. Zh. Vichisl. Mat. Fiz. 25, 356–369 (1985). (In Russian)
Nesterov, Y.E.: Complexity bounds for primal-dual methods minimizing the model of objective function. Technical Report, CORE Discussion Papers, February (2015)
Nesterov, Y.E.: Universal gradient methods for convex optimization problems. Math. Program. Ser. A (2014). https://doi.org/10.1007/s10107-014-0790-0
Nesterov, Y.E.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer, Boston (2004)
Nesterov, Y.E.: Gradient methods for minimizing composite objective functions. Math. Program. Ser. B 140, 125–161 (2013)
Pshenichnyi, B.N., Danilin, I.M.: Numerical Methods in Extremal Problems. Mir Publishers, Moscow (1978)
Reddi, S.J., Sra, S., Poczos, B., Smola, A.: Stochastic Frank-Wolfe methods for nonconvex optimization. arXiv e-prints (2016)
Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B 67, 301–320 (2005)
Acknowledgements
The author is very grateful to the associate editor and the anonymous referees for their valuable comments, which helped improve the quality and presentation of the paper.
Additional information
This work was done while the author was working at the School of Mathematics of the Institute for Research in Fundamental Sciences (IPM), P.O. Box: 19395-5746, Tehran, Iran, and was supported by a grant from IPM.
Cite this article
Ghadimi, S. Conditional gradient type methods for composite nonlinear and stochastic optimization. Math. Program. 173, 431–464 (2019). https://doi.org/10.1007/s10107-017-1225-5
Keywords
- Iteration complexity
- Nonconvex optimization
- Strongly convex optimization
- Conditional gradient type methods
- Unified methods
- Weakly smooth functions