Abstract
We shed light on the relation between the discrete adjoints of multistep backward differentiation formula (BDF) methods and the solution of the adjoint differential equation. To this end, we develop a functional-analytic framework based on a constrained variational problem and introduce the notion of weak adjoint solutions of ordinary differential equations. We devise a Petrov-Galerkin finite element (FE) interpretation of the BDF method and its discrete adjoint scheme obtained by reverse internal numerical differentiation. We show how the FE approximation of the weak adjoint is computed by the discrete adjoint scheme and prove its convergence in the space of normalized functions of bounded variation. We also show convergence of the discrete adjoints to the classical adjoints on the inner time interval. Finally, we give numerical results for non-adaptive and fully adaptive BDF schemes. The presented framework opens the way to carry over techniques on global error estimation from FE methods to BDF methods.
References
Adams, R., Fournier, J.: Sobolev Spaces, Pure and Applied Mathematics (Amsterdam), vol. 140, 2nd edn. Elsevier/Academic Press, Amsterdam (2003)
Albersmeyer, J., Bock, H.G.: Efficient sensitivity generation for large scale dynamic systems. Technical report, SPP 1253 Preprints, University of Erlangen (2009)
Albersmeyer, J., Bock, H.G.: Sensitivity generation in an adaptive BDF-method. In: Bock, H.G., Kostina, E., Phu, X., Rannacher, R. (eds.) Modeling, Simulation and Optimization of Complex Processes: Proceedings of the International Conference on High Performance Scientific Computing, March 6–10, 2006, Hanoi, Vietnam, pp. 15–24. Springer, Berlin, Heidelberg (2008)
Albersmeyer, J.: Adjoint based algorithms and numerical methods for sensitivity generation and optimization of large scale dynamic systems. Ph.D. thesis, Ruprecht-Karls-Universität Heidelberg (2010)
Alt, H.W.: Lineare Funktionalanalysis, 4th edn. Springer, Berlin (2002)
Berkovitz, L.: Optimal Control Theory, Applied Mathematical Sciences, vol. 12. Springer, New York (1974)
Bock, H.G., Plitt, K.J.: A multiple shooting algorithm for direct solution of optimal control problems. In: Proceedings of the 9th IFAC World Congress, pp. 242–247. Pergamon Press, Budapest (1984)
Bock, H.G.: Numerical treatment of inverse problems in chemical reaction kinetics. In: Ebert, K., Deuflhard, P., Jäger, W. (eds.) Modelling of Chemical Reaction Systems, Springer Series in Chemical Physics, vol. 18, pp. 102–125. Springer, Heidelberg (1981)
Bock, H.G.: Randwertproblemmethoden zur Parameteridentifizierung in Systemen nichtlinearer Differentialgleichungen, Bonner Mathematische Schriften, vol. 183. Universität Bonn, Bonn (1987)
Bock, H.G., Schlöder, J.P., Schulz, V.: Numerik großer differentiell-algebraischer Gleichungen – Simulation und Optimierung. In: Schuler, H. (ed.) Prozeßsimulation, pp. 35–80. VCH Verlagsgesellschaft mbH, Weinheim (1994)
Böttcher, K., Rannacher, R.: Adaptive error control in solving ordinary differential equations by the discontinuous Galerkin method. Preprint 96-53, SFB 359, University of Heidelberg (1996)
Cao, Y., Li, S., Petzold, L.: Adjoint sensitivity analysis for differential-algebraic equations: algorithms and software. J. Comput. Appl. Math. 149, 171–191 (2002)
Cao, Y., Petzold, L.: A posteriori error estimation and global error control for ordinary differential equations by the adjoint method. SIAM J. Sci. Comput. 26, 359–374 (2004)
Eriksson, K., Estep, D., Hansbo, P., Johnson, C.: Introduction to adaptive methods for differential equations. Acta Numer. 4, 105–158 (1995)
Hairer, E., Nørsett, S.: Solving Ordinary Differential Equations I, Springer Series in Computational Mathematics, vol. 8, 2nd edn. Springer, Berlin (1993)
Hartman, P.: Ordinary differential equations, Classics in Applied Mathematics, vol. 38. SIAM, Philadelphia, PA (2002). Corrected reprint of the second (1982) edition [Birkhäuser, Boston, MA; MR0658490 (83e:34002)]
Henrici, P.: Error Propagation for Difference Methods. Robert E. Krieger Publishing Co., Huntington, NY (1970). Reprint of the 1963 edition
Ioffe, A., Tihomirov, V.: Theory of Extremal Problems, Studies in Mathematics and its Applications, vol. 6. North-Holland Publishing Co., Amsterdam (1979)
Johnson, C.: Numerical Solutions of Partial Differential Equations by the Finite Element Method. Cambridge University Press, Cambridge (1987)
Johnson, C.: Error estimates and adaptive time-step control for a class of one-step methods for stiff ordinary differential equations. SIAM J. Numer. Anal. 25(4), 908–926 (1988)
Kirches, C., Wirsching, L., Bock, H., Schlöder, J.: Efficient direct multiple shooting for nonlinear model predictive control on long horizons. J. Process Control 22, 540–550 (2012)
Kolmogorov, A., Fomin, S.: Introductory real analysis. Revised English edition. Translated from the Russian and edited by Richard A. Silverman. Prentice-Hall Inc, Englewood Cliffs (1970)
Lang, J., Verwer, J.: On global error estimation and control for initial value problems. SIAM J. Sci. Comput. 29, 1460–1475 (2007)
Luenberger, D.: Optimization by vector space methods. Wiley Professional Paperback Series. Wiley, New York (1969)
Moon, K.S., Szepessy, A., Tempone, R., Zouraris, G.: Convergence rates for adaptive approximation of ordinary differential equations. Numer. Math. 96, 99–129 (2003)
Natanson, I.: Theorie der Funktionen einer reellen Veränderlichen, Mathematische Lehrbücher und Monographien, I. Abteilung: Mathematische Lehrbücher, Band VI, 4th edn. Akademie-Verlag, Berlin (1975). Translated from the second Russian edition of 1957, edited by Karl Bögel
Sandu, A.: Reverse automatic differentiation of linear multistep methods. In: Bischof, C., Bücker, H., Hovland, P., Naumann, U., Utke, J. (eds.) Advances in Automatic Differentiation. Lecture Notes in Computational Science and Engineering, vol. 64, pp. 1–12. Springer, Berlin (2008)
Shampine, L., Gordon, M.K.: Computer Solution of Ordinary Differential Equations. Freeman, San Francisco (1975)
Shampine, L.: Numerical solution of ordinary differential equations. Chapman & Hall, New York (1994)
Walther, A.: Automatic differentiation of explicit Runge–Kutta methods for optimal control. Comput. Optim. Appl. 36, 83–108 (2007)
Werner, D.: Funktionalanalysis. Springer, Berlin (2000)
Wirsching, L., Bock, H., Diehl, M.: Fast NMPC of a chain of masses connected by springs. In: Proceedings of the 2006 IEEE International Conference on Control Applications (CCA), pp. 591–596 (2006). doi:10.1109/CACSD-CCA-ISIC.2006.4776712
Wloka, J.: Funktionalanalysis und Anwendungen. Walter de Gruyter, Berlin, New York (1971). De Gruyter Lehrbuch
Acknowledgments
The authors express their gratitude to Christian Kirches and Andreas Potschka for valuable discussions on the subject. Scientific support of the DFG-Graduate-School 220 “Heidelberg Graduate School of Mathematical and Computational Methods for the Sciences” is gratefully acknowledged. Funding has been graciously provided by the German Ministry of Education and Research (Grant ID: 03MS649A), and the Helmholtz Association through the SBCancer programme. The research leading to these results has received funding from the European Union Seventh Framework Programme FP7/2007-2013 under grant agreement no. FP7-ICT-2009-4 248940.
Appendices
Appendix 1: Lagrange multipliers in \(L^2({t_\mathrm{s}},{t_\mathrm{f}})^d\)
Recall that functions in \(C^0[{t_\mathrm{s}},{t_\mathrm{f}}]^d\), restricted to the open interval \(({t_\mathrm{s}},{t_\mathrm{f}})\), form a dense subset of the space \(L^2({t_\mathrm{s}},{t_\mathrm{f}})^d\) of all square-integrable (in the Lebesgue sense) functions from \(({t_\mathrm{s}},{t_\mathrm{f}})\) to \(\mathbb{R}^d\). Similarly, recall that the subset \(C^1[{t_\mathrm{s}},{t_\mathrm{f}}]^d\) is dense in the Sobolev space \(H^1({t_\mathrm{s}},{t_\mathrm{f}})^d\) of all \(L^2({t_\mathrm{s}},{t_\mathrm{f}})^d\)-functions with weak derivative in \(L^2({t_\mathrm{s}},{t_\mathrm{f}})^d\) (see [1, Ch.3]). Furthermore, both spaces \(L^2({t_\mathrm{s}},{t_\mathrm{f}})^d\) and \(H^1({t_\mathrm{s}},{t_\mathrm{f}})^d\) are Hilbert spaces.
Solving (5) on \(H^1({t_\mathrm{s}},{t_\mathrm{f}})^d\), the Lagrangian \(\mathcal{L }:H^1({t_\mathrm{s}},{t_\mathrm{f}})^d \times L^2({t_\mathrm{s}},{t_\mathrm{f}})^d \rightarrow \mathbb{R }\) reads
using the \(L^2\)-scalar product and the Lagrange multiplier \(\varvec{\lambda }\). The optimality condition of (5) is based on the Fréchet derivative of \(\mathcal{L }\) at \((\varvec{y},\varvec{\lambda })\) in direction \((\varvec{w},\varvec{\chi })\) which exists due to Fréchet differentiability of \(J\) and [18, Ch.0§0.2.5]
The necessary condition for a stationary point \((\varvec{y},\varvec{\lambda })\in H^1({t_\mathrm{s}},{t_\mathrm{f}})^d \times L^2({t_\mathrm{s}},{t_\mathrm{f}})^d\) of (5) is that \(\mathcal{L }^\prime (\varvec{y},\varvec{\lambda })(\varvec{w},\varvec{\chi })=0\) holds for all directions \((\varvec{w},\varvec{\chi })\in H^1({t_\mathrm{s}},{t_\mathrm{f}})^d \times L^2({t_\mathrm{s}},{t_\mathrm{f}})^d\). Choosing \(\varvec{w}=\varvec{0} \in H^1({t_\mathrm{s}},{t_\mathrm{f}})^d \) and only varying \(\varvec{\chi } \in L^2({t_\mathrm{s}},{t_\mathrm{f}})^d\) the necessary condition reads
which possesses the same unique solution \(\varvec{y}\in C^1[{t_\mathrm{s}},{t_\mathrm{f}}]^d\) as (1). Taking now \(\varvec{\chi }=\varvec{0} \in L^2({t_\mathrm{s}},{t_\mathrm{f}})^d\) and only varying \(\varvec{w} \in H^1({t_\mathrm{s}},{t_\mathrm{f}})^d\) one obtains using integration by parts
which possesses the same solution as (2).
Appendix 2: Duality pairing between \(C^0[{t_\mathrm{s}},{t_\mathrm{f}}]^d\) and \({{\mathrm{NBV}}}[{t_\mathrm{s}},{t_\mathrm{f}}]^d\)
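According to the Riesz Representation Theorem [24, Ch.5§5.5], for every continuous linear functional \(\mathfrak L \) on \(C^0[{t_\mathrm{s}},{t_\mathrm{f}}]\) there exists a unique \(\varPsi \in {{\mathrm{NBV}}}[{t_\mathrm{s}},{t_\mathrm{f}}]\) such that
\[ \mathfrak L (f) = \int _{{t_\mathrm{s}}}^{{t_\mathrm{f}}} f(t)\, \mathrm{d}\varPsi (t) \quad \text{for all } f \in C^0[{t_\mathrm{s}},{t_\mathrm{f}}], \]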
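where the Riemann-Stieltjes integral [26, Ch.VIII§6] is utilized. The Banach space \({{\mathrm{NBV}}}[{t_\mathrm{s}},{t_\mathrm{f}}]\) consists of all normalized functions of bounded variation on \([{t_\mathrm{s}},{t_\mathrm{f}}]\), that is, functions of bounded variation that vanish at \({t_\mathrm{s}}\) and are continuous from the right on \(({t_\mathrm{s}},{t_\mathrm{f}})\). It is equipped with the total variation norm
\[ \left\| \varPsi \right\| _{{{\mathrm{NBV}}}[{t_\mathrm{s}},{t_\mathrm{f}}]} = \sup \sum _{i=1}^{m} \left| \varPsi (t_i) - \varPsi (t_{i-1}) \right| , \]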
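where the supremum is taken over all partitions \({t_\mathrm{s}}=t_0<\dots <t_m={t_\mathrm{f}}\) of \([{t_\mathrm{s}},{t_\mathrm{f}}]\). According to the Riesz Representation Theorem, for each \(\varPsi \) the value of the total variation norm coincides with the value of the dual norm given by
\[ \sup \left\{ \left| \int _{{t_\mathrm{s}}}^{{t_\mathrm{f}}} f(t)\, \mathrm{d}\varPsi (t) \right| \; : \; f \in C^0[{t_\mathrm{s}},{t_\mathrm{f}}],\ \max _{t \in [{t_\mathrm{s}},{t_\mathrm{f}}]} |f(t)| \le 1 \right\} . \]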
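The dual of the finite Cartesian product \(C^0[{t_\mathrm{s}},{t_\mathrm{f}}]^d\) is the finite Cartesian product \({{\mathrm{NBV}}}[{t_\mathrm{s}},{t_\mathrm{f}}]^d\) of the duals with duality pairing
\[ \langle \varPsi , \varvec{f} \rangle = \sum _{i=1}^{d} \int _{{t_\mathrm{s}}}^{{t_\mathrm{f}}} f_i(t)\, \mathrm{d}\varPsi _i(t), \quad \varvec{f} = (f_1,\dots ,f_d) \in C^0[{t_\mathrm{s}},{t_\mathrm{f}}]^d, \]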
and dual norm \( \left\| \varPsi \right\| _{{{\mathrm{NBV}}}[{t_\mathrm{s}},{t_\mathrm{f}}]^d} = \max _{1\le i\le d} \left\| \varPsi _i \right\| _{{{\mathrm{NBV}}}[{t_\mathrm{s}},{t_\mathrm{f}}]}\), see [33, Ch.II§12.1].
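The identification of the dual norm with the total variation norm can be checked numerically in the simplest scalar case. The following is a minimal sketch, not part of the original analysis: it assumes a right-continuous step function \(\varPsi \) on \([0,1]\) with three jumps, and the jump data as well as all function names (`psi`, `pairing_exact`, `pairing_riemann_stieltjes`, `total_variation`, `f_extremal`) are hypothetical choices made for illustration. For such a step function the Riemann-Stieltjes integral reduces to the sum of the integrand values at the jump times weighted by the jump heights, and the total variation is the sum of the absolute jump heights.

```python
import numpy as np

# Hypothetical example data (an assumption for illustration):
# a step function Psi on [ts, tf] = [0, 1] with jumps h_j at times t_j.
ts, tf = 0.0, 1.0
jump_times = np.array([0.25, 0.5, 1.0])
jumps = np.array([1.0, -2.0, 0.5])


def psi(t):
    """Psi(t) = sum of all jumps h_j with t_j <= t (right-continuous, Psi(ts) = 0)."""
    cumulative = np.concatenate(([0.0], np.cumsum(jumps)))
    return cumulative[np.searchsorted(jump_times, t, side="right")]


def pairing_exact(f):
    """Riemann-Stieltjes integral of a continuous f against the step function Psi."""
    return float(np.sum(f(jump_times) * jumps))


def pairing_riemann_stieltjes(f, m=20000):
    """Riemann-Stieltjes sum on a uniform partition, tagged at the right endpoints."""
    grid = np.linspace(ts, tf, m + 1)
    return float(np.sum(f(grid[1:]) * np.diff(psi(grid))))


def total_variation():
    """Total variation of the step function: sum of the absolute jump heights."""
    return float(np.sum(np.abs(jumps)))


def f_extremal(t):
    """Continuous piecewise linear function with max norm 1 and f(t_j) = sign(h_j)."""
    return np.interp(t, jump_times, np.sign(jumps))


if __name__ == "__main__":
    print("Riemann-Stieltjes sum  :", pairing_riemann_stieltjes(np.cos))
    print("sum of f(t_j) * h_j    :", pairing_exact(np.cos))
    print("total variation        :", total_variation())
    print("pairing with f_extremal:", pairing_exact(f_extremal))
```

In this sketch the pairing with `np.cos` stays below the total variation in absolute value, while the piecewise linear function `f_extremal` attains it, which is the mechanism behind the equality of the two norms for step functions.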
Cite this article
Beigel, D., Mommer, M.S., Wirsching, L. et al. Approximation of weak adjoints by reverse automatic differentiation of BDF methods. Numer. Math. 126, 383–412 (2014). https://doi.org/10.1007/s00211-013-0570-4
DOI: https://doi.org/10.1007/s00211-013-0570-4