Abstract
We propose two limited-memory BFGS (L-BFGS) trust-region methods for large-scale optimization with linear equality constraints. The methods are intended for problems where the number of equality constraints is small. By exploiting the structure of the quasi-Newton compact representation, both proposed methods solve the trust-region subproblems nearly exactly, even for large problems. We derive global convergence results for the proposed algorithms and compare their numerical effectiveness and performance on a variety of large-scale problems.
Additional information
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. R. Marcia’s research is partially supported by NSF Grant IIS 1741490. C. Petra also acknowledges support from the LDRD Program of Lawrence Livermore National Laboratory under Projects 16-ERD-025 and 17-SI-005.
J. J. Brust was formerly at University of California Merced, Merced, CA.
Appendix A
Notation
Section 2: Background
\(\mathbf{s}_{k-1} = \mathbf{x}_{k} - \mathbf{x}_{k-1}\)
\(\mathbf{S}_k = [\,\mathbf{s}_{k-l} \ \cdots \ \mathbf{s}_{k-1}\,]\)
\(\mathbf{y}_{k-1} = \nabla f(\mathbf{x}_{k}) - \nabla f(\mathbf{x}_{k-1})\)
\(\mathbf{Y}_k = [\,\mathbf{y}_{k-l} \ \cdots \ \mathbf{y}_{k-1}\,]\)
\(\mathbf{S}_k^T \mathbf{Y}_k = \mathbf{L}_k + \mathbf{T}_k\)
\(\mathbf{D}_k = \text{diag}(\mathbf{S}_k^T \mathbf{Y}_k)\)
\(\mathbf{B}_0^{(k)} = \gamma_k \mathbf{I}_n\)
\(\mathbf{H}_k = \mathbf{B}_k^{-1}\)
\(\gamma_k = \mathbf{y}_{k-1}^T \mathbf{y}_{k-1} / \mathbf{y}_{k-1}^T \mathbf{s}_{k-1}\)
\(\delta_k = 1/\gamma_k\)
\(\mathbf{B}_k = \gamma_k \mathbf{I}_n + \widehat{\boldsymbol{\Psi}}_k \widehat{\boldsymbol{\Xi}}_k \widehat{\boldsymbol{\Psi}}_k^T\)
\(\widehat{\boldsymbol{\Psi}}_k = [\,\mathbf{S}_k \ \ \mathbf{Y}_k\,]\)
\(\mathbf{H}_k = \delta_k \mathbf{I}_n + \widehat{\boldsymbol{\Psi}}_k \widehat{\mathbf{M}}_k \widehat{\boldsymbol{\Psi}}_k^T\)
\(\widehat{\boldsymbol{\Xi}}_k = \gamma_k \left[ \begin{array}{cc} -\mathbf{S}_k^T \mathbf{S}_k & -\mathbf{L}_k \\ -\mathbf{L}_k^T & \gamma_k \mathbf{D}_k \end{array} \right]^{-1}\)
\(\widehat{\mathbf{M}}_k = -\left(\gamma_k^{2} \widehat{\boldsymbol{\Xi}}_k^{-1} + \gamma_k \widehat{\boldsymbol{\Psi}}_k^T \widehat{\boldsymbol{\Psi}}_k\right)^{-1}\)
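The Section 2 formulas can be sanity-checked numerically. The following is a minimal, dependency-free sketch (plain Python lists rather than NumPy), assuming the standard convention that \(\mathbf{L}_k\) is the strictly lower-triangular part of \(\mathbf{S}_k^T \mathbf{Y}_k\). It builds \(\mathbf{B}_k\) from the compact form, compares it with recursive BFGS updates started from \(\mathbf{B}_0 = \gamma_k \mathbf{I}_n\), and checks that \(\mathbf{H}_k = \delta_k \mathbf{I}_n + \widehat{\boldsymbol{\Psi}}_k \widehat{\mathbf{M}}_k \widehat{\boldsymbol{\Psi}}_k^T\) inverts it. The data (the matrix `G`, the pairs in `s_list`) and all helper names are hypothetical illustrations, not taken from the paper.

```python
# Illustrative check with made-up data: compact L-BFGS form vs. recursive BFGS.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, c)) for c in zip(*B)] for row in A]

def T(A):
    return [list(r) for r in zip(*A)]

def inv(A):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(A)
    M = [list(r) + [float(i == j) for j in range(n)] for i, r in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [r[n:] for r in M]

def eye(n, scale=1.0):
    return [[scale if i == j else 0.0 for j in range(n)] for i in range(n)]

def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def maxdiff(A, B):
    return max(abs(x - y) for ra, rb in zip(A, B) for x, y in zip(ra, rb))

n, l = 5, 2
# Hypothetical SPD matrix G; setting y_i = G s_i guarantees s_i^T y_i > 0.
G = [[4.0 if i == j else (1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
s_list = [[1.0, 0.0, 2.0, -1.0, 0.5], [0.0, 1.0, -1.0, 0.5, 2.0]]  # s_{k-2}, s_{k-1}
y_list = [[dot(row, s) for row in G] for s in s_list]

S, Y = T(s_list), T(y_list)                   # n x l, columns oldest to newest
gamma = dot(y_list[-1], y_list[-1]) / dot(y_list[-1], s_list[-1])
delta = 1.0 / gamma
STS, STY = matmul(T(S), S), matmul(T(S), Y)
L = [[STY[i][j] if i > j else 0.0 for j in range(l)] for i in range(l)]  # strictly lower
D = [[STY[i][j] if i == j else 0.0 for j in range(l)] for i in range(l)]

# N = [[-S^T S, -L], [-L^T, gamma*D]], so Xi_hat = gamma * N^{-1}.
N = ([[-STS[i][j] for j in range(l)] + [-L[i][j] for j in range(l)] for i in range(l)]
     + [[-L[j][i] for j in range(l)] + [gamma * D[i][j] for j in range(l)]
        for i in range(l)])
Xi = [[gamma * x for x in r] for r in inv(N)]
Psi = [rs + ry for rs, ry in zip(S, Y)]       # [S_k  Y_k], n x 2l
B_compact = add(eye(n, gamma), matmul(matmul(Psi, Xi), T(Psi)))

# Recursive BFGS updates from B_0 = gamma*I, oldest pair first.
B_rec = eye(n, gamma)
for s, y in zip(s_list, y_list):
    Bs = [dot(row, s) for row in B_rec]
    sBs, ys = dot(s, Bs), dot(y, s)
    B_rec = [[B_rec[i][j] - Bs[i] * Bs[j] / sBs + y[i] * y[j] / ys
              for j in range(n)] for i in range(n)]

# M_hat = -(gamma^2 Xi^{-1} + gamma Psi^T Psi)^{-1}; here gamma^2 Xi^{-1} = gamma*N.
PtP = matmul(T(Psi), Psi)
M_hat = inv([[-gamma * (x + y) for x, y in zip(rn, rp)] for rn, rp in zip(N, PtP)])
H = add(eye(n, delta), matmul(matmul(Psi, M_hat), T(Psi)))

print(maxdiff(B_compact, B_rec))              # should be at rounding level
print(maxdiff(matmul(B_compact, H), eye(n)))  # should be at rounding level
```

Both printed differences should be at the level of floating-point rounding; the sketch uses tiny dense matrices only to make the identities visible, whereas the point of the compact form is that only the \(n \times 2l\) factor \(\widehat{\boldsymbol{\Psi}}_k\) and small \(2l \times 2l\) matrices need to be stored.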
Section 3: Trust-Region Subproblem Solution without an Inequality Constraint
\(\mathbf{K} = \left[ \begin{array}{cc} \mathbf{B}_k & \mathbf{A}^T \\ \mathbf{A} & \mathbf{0} \end{array} \right]\)
\(\boldsymbol{\Omega}_k = (\mathbf{A} \mathbf{B}_k^{-1} \mathbf{A}^T)^{-1}\)
\(\boldsymbol{\Psi}_k = [\,\mathbf{A}^T \ \ \widehat{\boldsymbol{\Psi}}_k\,]\)
\(\mathbf{K}^{-1} = \left[ \begin{array}{cc} \mathbf{B}_k^{-1} - \mathbf{B}_k^{-1}\mathbf{A}^T \boldsymbol{\Omega}_k \mathbf{A} \mathbf{B}_k^{-1} & \mathbf{B}_k^{-1}\mathbf{A}^T \boldsymbol{\Omega}_k \\ (\mathbf{B}_k^{-1}\mathbf{A}^T \boldsymbol{\Omega}_k)^T & -\boldsymbol{\Omega}_k \end{array} \right]\)
\(\mathbf{V}_k = \mathbf{B}_k^{-1} - \mathbf{B}_k^{-1}\mathbf{A}^T \boldsymbol{\Omega}_k \mathbf{A} \mathbf{B}_k^{-1}\)
\(\mathbf{V}_k = \delta_k \mathbf{I}_n + \boldsymbol{\Psi}_k \mathbf{M}_k \boldsymbol{\Psi}_k^T\)
\(\mathbf{W}_k = \mathbf{B}_k^{-1}\mathbf{A}^T \boldsymbol{\Omega}_k\)
\(\mathbf{M}_k = \left[ \begin{array}{cc} -\delta_k^2 \boldsymbol{\Omega}_k & -\delta_k \boldsymbol{\Omega}_k \mathbf{C}_k \\ -\delta_k \mathbf{C}_k^T \boldsymbol{\Omega}_k & \widehat{\mathbf{M}}_k - \mathbf{C}_k^T \boldsymbol{\Omega}_k \mathbf{C}_k \end{array} \right]\)
\(\mathbf{C}_k = \mathbf{A} \widehat{\boldsymbol{\Psi}}_k \widehat{\mathbf{M}}_k\)
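The Section 3 identities can likewise be checked on a small example. This sketch does not build an actual L-BFGS matrix. Instead it assumes a generic \(\mathbf{H}_k = \delta_k \mathbf{I}_n + \widehat{\boldsymbol{\Psi}}_k \widehat{\mathbf{M}}_k \widehat{\boldsymbol{\Psi}}_k^T\) with a made-up symmetric positive definite \(\widehat{\mathbf{M}}_k\), sets \(\mathbf{B}_k = \mathbf{H}_k^{-1}\), and compares the direct expression for \(\mathbf{V}_k\) with the compact form \(\delta_k \mathbf{I}_n + \boldsymbol{\Psi}_k \mathbf{M}_k \boldsymbol{\Psi}_k^T\). It also uses the first block column of \(\mathbf{K}^{-1}\) to solve the equality-constrained system with a hypothetical gradient `g`. All data and helper names are illustrative assumptions.

```python
# Illustrative check with made-up data (n = 5, m = 2 equality constraints).

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, c)) for c in zip(*B)] for row in A]

def T(A):
    return [list(r) for r in zip(*A)]

def inv(A):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(A)
    M = [list(r) + [float(i == j) for j in range(n)] for i, r in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [r[n:] for r in M]

def eye(n, scale=1.0):
    return [[scale if i == j else 0.0 for j in range(n)] for i in range(n)]

def add(A, B, b=1.0):
    return [[x + b * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def maxdiff(A, B):
    return max(abs(x - y) for ra, rb in zip(A, B) for x, y in zip(ra, rb))

n, delta = 5, 0.5
A = [[1.0, 1.0, 1.0, 1.0, 1.0], [1.0, -1.0, 0.0, 2.0, 0.0]]  # hypothetical, rank 2
Psi_hat = [[1.0, 0.0], [0.5, 1.0], [0.0, -1.0], [2.0, 0.3], [-1.0, 1.0]]
M_hat = [[0.3, 0.1], [0.1, 0.2]]      # symmetric positive definite, so H is SPD
m, r = len(A), len(M_hat)
At = T(A)

H = add(eye(n, delta), matmul(matmul(Psi_hat, M_hat), T(Psi_hat)))  # H = B^{-1}
B = inv(H)
Omega = inv(matmul(matmul(A, H), At))         # (A B^{-1} A^T)^{-1}
W = matmul(matmul(H, At), Omega)              # B^{-1} A^T Omega
V = add(H, matmul(W, matmul(A, H)), -1.0)     # B^{-1} - B^{-1} A^T Omega A B^{-1}

# Compact form V = delta*I + Psi M Psi^T with Psi = [A^T  Psi_hat].
C = matmul(matmul(A, Psi_hat), M_hat)         # C = A Psi_hat M_hat, m x r
OC = matmul(Omega, C)
CtOC = matmul(T(C), OC)
M = ([[-delta**2 * Omega[i][j] for j in range(m)] + [-delta * OC[i][j] for j in range(r)]
      for i in range(m)]
     + [[-delta * OC[j][i] for j in range(m)] + [M_hat[i][j] - CtOC[i][j] for j in range(r)]
        for i in range(r)])
Psi = [ra + rp for ra, rp in zip(At, Psi_hat)]
V_compact = add(eye(n, delta), matmul(matmul(Psi, M), T(Psi)))

# First block column of K^{-1}: s = -V g and lambda = -W^T g solve
# B s + A^T lambda = -g and A s = 0 (g is made-up data).
g = [1.0, -2.0, 3.0, 0.5, -1.0]
s = [-sum(V[i][j] * g[j] for j in range(n)) for i in range(n)]
lam = [-sum(W[i][j] * g[i] for i in range(n)) for j in range(m)]
res_stat = [sum(B[i][j] * s[j] for j in range(n))
            + sum(At[i][j] * lam[j] for j in range(m)) + g[i] for i in range(n)]
res_feas = [sum(A[i][j] * s[j] for j in range(n)) for i in range(m)]
print(maxdiff(V, V_compact), max(map(abs, res_stat)), max(map(abs, res_feas)))
```

All three printed quantities should be at rounding level. Note that \(\mathbf{V}_k \mathbf{A}^T = \mathbf{0}\), which is why the step \(-\mathbf{V}_k \mathbf{g}\) automatically satisfies the linear constraints.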
Section 4: Trust-Region Subproblem Solution with an \(\ell _2\)-Norm Inequality Constraint
\(\mathbf{H}_k(\sigma) = (\mathbf{B}_k + \sigma \mathbf{I})^{-1}\)
\(\mathbf{H}_k = \mathbf{H}_k(0)\)
\(\boldsymbol{\Phi}_k(\sigma) = \mathbf{I}_n - \mathbf{A}^T \boldsymbol{\Omega}_k(\sigma) \mathbf{A} \mathbf{H}_k(\sigma)\)
\(\boldsymbol{\Phi}_k = \boldsymbol{\Phi}_k(0)\)
\(\mathbf{H}_k(\sigma) = \frac{1}{\gamma_k + \sigma} \mathbf{I}_n + \widehat{\boldsymbol{\Psi}}_k \widehat{\mathbf{M}}_k(\sigma) \widehat{\boldsymbol{\Psi}}_k^T\)
\(\boldsymbol{\Omega}_k(\sigma) = (\mathbf{A} \mathbf{H}_k(\sigma) \mathbf{A}^T)^{-1}\)
\(\widehat{\mathbf{M}}_k(\sigma) = -\left((\gamma_k + \sigma)^2 \widehat{\boldsymbol{\Xi}}_k^{-1} + (\gamma_k + \sigma) \widehat{\boldsymbol{\Psi}}_k^T \widehat{\boldsymbol{\Psi}}_k\right)^{-1}\)
\(\mathbf{V}_k(\sigma) = \mathbf{H}_k(\sigma) - \mathbf{H}_k(\sigma) \mathbf{A}^T \boldsymbol{\Omega}_k(\sigma) \mathbf{A} \mathbf{H}_k(\sigma)\)
\(\mathbf{V}_k(\sigma) = \mathbf{H}_k(\sigma) \boldsymbol{\Phi}_k(\sigma)\)
\(\mathbf{s}(\sigma) = -\mathbf{H}_k(\sigma) \boldsymbol{\Phi}_k(\sigma) \mathbf{g}_k\)
\(\mathbf{s}'(\sigma) = -\mathbf{H}_k(\sigma) \boldsymbol{\Phi}_k(\sigma) \mathbf{s}(\sigma)\)
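The formulas for \(\mathbf{s}(\sigma)\) and \(\mathbf{s}'(\sigma)\) are what a Newton-type solver for the \(\ell_2\)-constrained subproblem needs. The sketch below, under stated assumptions, (i) checks \(\mathbf{s}'(\sigma)\) against a central finite difference and (ii) runs Newton's method on \(\phi(\sigma) = 1/\|\mathbf{s}(\sigma)\|_2 - 1/\Delta\), the classical Moré-Sorensen choice, to find \(\sigma\) with \(\|\mathbf{s}(\sigma)\|_2 = \Delta\). The matrix `B`, the constraints `A`, the gradient `g`, and the radius `Delta` are hypothetical, and \(\mathbf{H}_k(\sigma)\) is formed by a dense inverse purely for illustration; this is not presented as the paper's exact algorithm or stopping rule.

```python
import math

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, c)) for c in zip(*B)] for row in A]

def T(A):
    return [list(r) for r in zip(*A)]

def inv(A):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(A)
    M = [list(r) + [float(i == j) for j in range(n)] for i, r in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [r[n:] for r in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

n = 5
B = [[4.0 if i == j else (1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]                                   # hypothetical SPD B_k
A = [[1.0, 1.0, 1.0, 1.0, 1.0], [1.0, -1.0, 0.0, 2.0, 0.0]]
g = [1.0, -2.0, 3.0, 0.5, -1.0]
At = T(A)

def step(sigma):
    """Return s(sigma) = -H Phi g and s'(sigma) = -H Phi s(sigma)."""
    H = inv([[B[i][j] + (sigma if i == j else 0.0) for j in range(n)] for i in range(n)])
    Omega = inv(matmul(matmul(A, H), At))
    AtOAH = matmul(matmul(matmul(At, Omega), A), H)      # A^T Omega A H
    Phi = [[float(i == j) - AtOAH[i][j] for j in range(n)] for i in range(n)]
    HPhi = matmul(H, Phi)
    s = [-dot(row, g) for row in HPhi]
    return s, [-dot(row, s) for row in HPhi]

# Central-difference check of s'(sigma) at a sample point.
sig, h = 0.3, 1e-5
_, ds = step(sig)
s_plus, _ = step(sig + h)
s_minus, _ = step(sig - h)
fd_err = max(abs((a - b) / (2 * h) - c) for a, b, c in zip(s_plus, s_minus, ds))

# Newton on phi(sigma) = 1/||s|| - 1/Delta, with phi' = -s^T s' / ||s||^3.
s0, _ = step(0.0)
Delta = 0.5 * norm(s0)        # hypothetical radius that makes the constraint active
sigma = 0.0
for _ in range(50):
    s, ds = step(sigma)
    ns = norm(s)
    if abs(ns - Delta) <= 1e-12 * Delta:
        break
    sigma -= (1.0 / ns - 1.0 / Delta) / (-dot(s, ds) / ns**3)
s, _ = step(sigma)
print(fd_err, sigma, norm(s), Delta)
```

Because \(\mathbf{s}^T\mathbf{s}' = -\mathbf{s}^T \mathbf{V}_k(\sigma)\mathbf{s} < 0\) for a nonzero feasible step, \(\phi'(\sigma) > 0\), and starting from \(\sigma = 0\) with \(\|\mathbf{s}(0)\|_2 > \Delta\) the iterates increase monotonically toward the root. The final step also stays in the null space of \(\mathbf{A}\).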
Section 5: Trust-Region Subproblem Solution with a Shape-Changing Norm Inequality Constraint
\(\mathbf{U}_k = -\boldsymbol{\Psi}_k \mathbf{M}_k \boldsymbol{\Psi}_k^T\)
\(\mathbf{A}^T = \mathbf{Q}_{1} \mathbf{R}_{1}\)
\(\mathbf{Q}_{1} \mathbf{Q}_{1}^T = \mathbf{A}^T (\mathbf{A} \mathbf{A}^T)^{-1} \mathbf{A}\)
\(\mathbf{P} = \mathbf{I}_n - \mathbf{A}^T (\mathbf{A} \mathbf{A}^T)^{-1} \mathbf{A}\)
\(\mathbf{P} \widehat{\boldsymbol{\Psi}}_k = \widehat{\mathbf{Q}}_2 \widehat{\mathbf{R}}_2\)
\(\widehat{\mathbf{V}}_2 \widehat{\boldsymbol{\Lambda}}_k \widehat{\mathbf{V}}_2^T = \widehat{\mathbf{R}}_2 (\mathbf{C}_k^T \boldsymbol{\Omega}_k \mathbf{C}_k - \widehat{\mathbf{M}}_k) \widehat{\mathbf{R}}_2^T\)
\(\mathbf{Q}_{2} = \widehat{\mathbf{Q}}_2 \widehat{\mathbf{V}}_2\)
\(\mathbf{Q} = [\,\mathbf{Q}_{1} \ \mathbf{Q}_{2} \ \mathbf{Q}_{3}\,]\)
\(\mathbf{Q}_{\parallel} = [\,\mathbf{Q}_{1} \ \mathbf{Q}_{2}\,]\)
\(\mathbf{Q}_{\perp} = \mathbf{Q}_{3}\)
\(\mathbf{z} = \left[ \begin{array}{c} \mathbf{z}_{1} \\ \mathbf{z}_{2} \\ \mathbf{z}_{3} \end{array} \right]\)
\(\mathbf{s} = \mathbf{Q} \mathbf{z}\)
\(\mathbf{z}_{\parallel} = \mathbf{z}_{2} = \mathbf{Q}_{2}^T \mathbf{s}\)
\(\mathbf{z}_{\perp} = \mathbf{z}_{3} = \mathbf{Q}_{3}^T \mathbf{s}\)
\(\mathbf{g}_{\parallel} = \mathbf{Q}_{2}^T \mathbf{g}_k\)
\(\mathbf{g}_{\perp} = \mathbf{Q}_{\perp}^T \mathbf{g}_k\)
\(\mathbf{V}_k = \mathbf{Q} \boldsymbol{\Lambda} \mathbf{Q}^T = [\,\mathbf{Q}_{1} \ \mathbf{Q}_{2} \ \mathbf{Q}_{3}\,] \left[ \begin{array}{ccc} \mathbf{0} & & \\ & \delta_k \mathbf{I} - \widehat{\boldsymbol{\Lambda}}_k & \\ & & \delta_k \mathbf{I} \end{array} \right] \left[ \begin{array}{c} \mathbf{Q}_{1}^T \\ \mathbf{Q}_{2}^T \\ \mathbf{Q}_{3}^T \end{array} \right]\)
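The first two factorizations in Section 5 can be illustrated with thin QR decompositions. The sketch below assumes modified Gram-Schmidt as the QR method (any stable QR would do), uses made-up `A` and \(\widehat{\boldsymbol{\Psi}}_k\) columns, and checks that \(\mathbf{A}^T = \mathbf{Q}_1\mathbf{R}_1\), that \(\mathbf{Q}_1\mathbf{Q}_1^T = \mathbf{A}^T(\mathbf{A}\mathbf{A}^T)^{-1}\mathbf{A}\), and that the columns of \(\widehat{\mathbf{Q}}_2\) from \(\mathbf{P}\widehat{\boldsymbol{\Psi}}_k = \widehat{\mathbf{Q}}_2\widehat{\mathbf{R}}_2\) lie in the null space of \(\mathbf{A}\). The small eigendecomposition that yields \(\widehat{\mathbf{V}}_2\) and \(\widehat{\boldsymbol{\Lambda}}_k\) is omitted, and the helper `mgs` is a hypothetical name.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mgs(cols):
    """Thin QR via modified Gram-Schmidt; input and Q are lists of columns."""
    Q, k = [], len(cols)
    R = [[0.0] * k for _ in range(k)]
    for j, v in enumerate(cols):
        v = list(v)
        for i, q in enumerate(Q):
            R[i][j] = dot(q, v)
            v = [a - R[i][j] * b for a, b in zip(v, q)]
        R[j][j] = math.sqrt(dot(v, v))
        Q.append([a / R[j][j] for a in v])
    return Q, R

n = 5
A = [[1.0, 1.0, 1.0, 1.0, 1.0], [1.0, -1.0, 0.0, 2.0, 0.0]]       # hypothetical, m = 2
Psi_hat_cols = [[1.0, 0.5, 0.0, 2.0, -1.0], [0.0, 1.0, -1.0, 0.3, 1.0]]

# A^T = Q1 R1: the columns of A^T are the rows of A.
Q1, R1 = mgs(A)

# P = I - A^T (A A^T)^{-1} A, with the 2x2 inverse written out explicitly.
(p11, p12), (p21, p22) = [[dot(a, b) for b in A] for a in A]
det = p11 * p22 - p12 * p21
AAt_inv = [[p22 / det, -p12 / det], [-p21 / det, p11 / det]]
proj = [[sum(A[p][i] * AAt_inv[p][q] * A[q][j] for p in range(2) for q in range(2))
         for j in range(n)] for i in range(n)]
P = [[float(i == j) - proj[i][j] for j in range(n)] for i in range(n)]

# P Psi_hat = Q2_hat R2_hat: an orthonormal basis inside the null space of A.
Q2_hat, R2_hat = mgs([[dot(P[i], v) for i in range(n)] for v in Psi_hat_cols])

err_qr = max(abs(A[j][i] - sum(Q1[k][i] * R1[k][j] for k in range(2)))
             for i in range(n) for j in range(2))
err_proj = max(abs(sum(q[i] * q[j] for q in Q1) - proj[i][j])
               for i in range(n) for j in range(n))
err_null = max(abs(dot(row, q)) for row in A for q in Q2_hat)
err_orth = max(abs(dot(q1, q2)) for q1 in Q1 for q2 in Q2_hat)
print(err_qr, err_proj, err_null, err_orth)
```

Since the columns of \(\widehat{\mathbf{Q}}_2\) are orthogonal to those of \(\mathbf{Q}_1\), only a basis \(\mathbf{Q}_3\) of the remaining complement is left implicit, which is what allows the \(\delta_k \mathbf{I}\) block of \(\mathbf{V}_k\) to be handled without forming \(\mathbf{Q}_3\).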
About this article
Cite this article
Brust, J.J., Marcia, R.F. & Petra, C.G. Large-scale quasi-Newton trust-region methods with low-dimensional linear equality constraints. Comput Optim Appl 74, 669–701 (2019). https://doi.org/10.1007/s10589-019-00127-4
DOI: https://doi.org/10.1007/s10589-019-00127-4