Approximation of weak adjoints by reverse automatic differentiation of BDF methods

Beigel, Dörte; Mommer, Mario S.; Wirsching, Leonard; Bock, Hans Georg

doi:10.1007/s00211-013-0570-4

Published: 20 July 2013

Volume 126, pages 383–412, (2014)

Numerische Mathematik

Abstract

We shed light on the relation between the discrete adjoints of multistep backward differentiation formula (BDF) methods and the solution of the adjoint differential equation. To this end, we develop a functional-analytic framework based on a constrained variational problem and introduce the notion of weak adjoint solutions of ordinary differential equations. We devise a Petrov-Galerkin finite element (FE) interpretation of the BDF method and its discrete adjoint scheme obtained by reverse internal numerical differentiation. We show how the FE approximation of the weak adjoint is computed by the discrete adjoint scheme and prove its convergence in the space of normalized functions of bounded variation. We also show convergence of the discrete adjoints to the classical adjoints on the inner time interval. Finally, we give numerical results for non-adaptive and fully adaptive BDF schemes. The presented framework opens the way to carry over techniques on global error estimation from FE methods to BDF methods.

References

[1] Adams, R., Fournier, J.: Sobolev Spaces, Pure and Applied Mathematics (Amsterdam), vol. 140, 2nd edn. Elsevier/Academic Press, Amsterdam (2003)
[2] Albersmeyer, J., Bock, H.G.: Efficient sensitivity generation for large scale dynamic systems. Technical report, SPP 1253 Preprints, University of Erlangen (2009)
[3] Albersmeyer, J., Bock, H.G.: Sensitivity generation in an adaptive BDF-method. In: Bock, H.G., Kostina, E., Phu, X., Rannacher, R. (eds.) Modeling, Simulation and Optimization of Complex Processes. Proceedings of the International Conference on High Performance Scientific Computing, March 6–10, 2006, Hanoi, Vietnam, pp. 15–24. Springer, Berlin, Heidelberg (2008)
[4] Albersmeyer, J.: Adjoint based algorithms and numerical methods for sensitivity generation and optimization of large scale dynamic systems. Ph.D. thesis, Ruprecht-Karls-Universität Heidelberg (2010)
[5] Alt, H.W.: Lineare Funktionalanalysis, 4th edn. Springer, Berlin (2002)
[6] Berkovitz, L.: Optimal Control Theory, Applied Mathematical Sciences, vol. 12. Springer, New York (1974)
[7] Bock, H.G., Plitt, K.J.: A multiple shooting algorithm for direct solution of optimal control problems. In: Proceedings of the 9th IFAC World Congress, pp. 242–247. Pergamon Press, Budapest (1984)
[8] Bock, H.G.: Numerical treatment of inverse problems in chemical reaction kinetics. In: Ebert, K., Deuflhard, P., Jäger, W. (eds.) Modelling of Chemical Reaction Systems, Springer Series in Chemical Physics, vol. 18, pp. 102–125. Springer, Heidelberg (1981)
[9] Bock, H.G.: Randwertproblemmethoden zur Parameteridentifizierung in Systemen nichtlinearer Differentialgleichungen, Bonner Mathematische Schriften, vol. 183. Universität Bonn, Bonn (1987)
[10] Bock, H.G., Schlöder, J.P., Schulz, V.: Numerik großer differentiell-algebraischer Gleichungen – Simulation und Optimierung. In: Schuler, H. (ed.) Prozeßsimulation, pp. 35–80. VCH Verlagsgesellschaft mbH, Weinheim (1994)
[11] Böttcher, K., Rannacher, R.: Adaptive error control in solving ordinary differential equations by the discontinuous Galerkin method. Preprint 96-53, SFB 359, University of Heidelberg (1996)
[12] Cao, Y., Li, S., Petzold, L.: Adjoint sensitivity analysis for differential-algebraic equations: algorithms and software. J. Comput. Appl. Math. 149, 171–191 (2002)
[13] Cao, Y., Petzold, L.: A posteriori error estimation and global error control for ordinary differential equations by the adjoint method. SIAM J. Sci. Comput. 26, 359–374 (2004)
[14] Eriksson, K., Estep, D., Hansbo, P., Johnson, C.: Introduction to adaptive methods for differential equations. Acta Numerica 4, 105–158 (1995)
[15] Hairer, E., Nørsett, S.: Solving Ordinary Differential Equations I, Springer Series in Computational Mathematics, vol. 8, 2nd edn. Springer, Berlin (1993)
[16] Hartman, P.: Ordinary Differential Equations, Classics in Applied Mathematics, vol. 38. SIAM, Philadelphia, PA (2002). Corrected reprint of the second (1982) edition [Birkhäuser, Boston, MA; MR0658490 (83e:34002)]
[17] Henrici, P.: Error Propagation for Difference Methods. Robert E. Krieger Publishing Co., Huntington, NY (1970). Reprint of the 1963 edition
[18] Ioffe, A., Tihomirov, V.: Theory of Extremal Problems, Studies in Mathematics and its Applications, vol. 6. North-Holland Publishing Co., Amsterdam (1979)
[19] Johnson, C.: Numerical Solutions of Partial Differential Equations by the Finite Element Method. Cambridge University Press, Cambridge (1987)
[20] Johnson, C.: Error estimates and adaptive time-step control for a class of one-step methods for stiff ordinary differential equations. SIAM J. Numer. Anal. 25(4), 908–926 (1988)
[21] Kirches, C., Wirsching, L., Bock, H., Schlöder, J.: Efficient direct multiple shooting for nonlinear model predictive control on long horizons. J. Process Control 22, 540–550 (2012)
[22] Kolmogorov, A., Fomin, S.: Introductory Real Analysis. Revised English edition. Translated from the Russian and edited by Richard A. Silverman. Prentice-Hall Inc., Englewood Cliffs (1970)
[23] Lang, J., Verwer, J.: On global error estimation and control for initial value problems. SIAM J. Sci. Comput. 29, 1460–1475 (2007)
[24] Luenberger, D.: Optimization by Vector Space Methods. Wiley Professional Paperback Series. Wiley, New York (1969)
[25] Moon, K.S., Szepessy, A., Tempone, R., Zouraris, G.: Convergence rates for adaptive approximation of ordinary differential equations. Numer. Math. 96, 99–129 (2003)
[26] Natanson, I.: Theorie der Funktionen einer reellen Veränderlichen. Akademie-Verlag, Berlin: Übersetzung nach der zweiten russischen Auflage von 1957, Herausgegeben von Karl Bögel, Vierte Auflage, Mathematische Lehrbücher und Monographien, I. Mathematische Lehrbücher, Band VI, Abteilung (1975)
[27] Sandu, A.: Reverse automatic differentiation of linear multistep methods. In: Bischof, C., Bücker, H., Hovland, P., Naumann, U., Utke, J. (eds.) Advances in Automatic Differentiation. Lecture Notes in Computational Science and Engineering, vol. 64, pp. 1–12. Springer, Berlin (2008)
[28] Shampine, L., Gordon, M.K.: Computer Solution of Ordinary Differential Equations. Freeman, San Francisco (1975)
[29] Shampine, L.: Numerical Solution of Ordinary Differential Equations. Chapman & Hall, New York (1994)
[30] Walther, A.: Automatic differentiation of explicit Runge–Kutta methods for optimal control. Comput. Optim. Appl. 36, 83–108 (2007)
[31] Werner, D.: Funktionalanalysis. Springer, Berlin (2000)
[32] Wirsching, L., Bock, H., Diehl, M.: Fast NMPC of a chain of masses connected by springs. In: Proceedings of the 2006 IEEE International Conference on Control Applications (CCA), pp. 591–596 (2006). doi:10.1109/CACSD-CCA-ISIC.2006.4776712
[33] Wloka, J.: Funktionalanalysis und Anwendungen. De Gruyter Lehrbuch. Walter de Gruyter, Berlin, New York (1971)

Acknowledgments

The authors express their gratitude to Christian Kirches and Andreas Potschka for valuable discussions on the subject. Scientific support of the DFG-Graduate-School 220 “Heidelberg Graduate School of Mathematical and Computational Methods for the Sciences” is gratefully acknowledged. Funding has been graciously provided by the German Ministry of Education and Research (Grant ID: 03MS649A) and the Helmholtz Association through the SBCancer programme. The research leading to these results has received funding from the European Union Seventh Framework Programme FP7/2007-2013 under grant agreement no. FP7-ICT-2009-4 248940.

Author information

Authors and Affiliations

Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Im Neuenheimer Feld 368, 69120 Heidelberg, Germany
Dörte Beigel, Mario S. Mommer, Leonard Wirsching & Hans Georg Bock

Corresponding author

Correspondence to Dörte Beigel.

Appendices

Appendix 1: Lagrange multipliers in $L^2({t_\mathrm{s}},{t_\mathrm{f}})^d$

Recall that functions in $C^0[{t_\mathrm{s}},{t_\mathrm{f}}]^d$, restricted to the open interval $({t_\mathrm{s}},{t_\mathrm{f}})$, form a dense subset of the space $L^2({t_\mathrm{s}},{t_\mathrm{f}})^d$ of all quadratically Lebesgue-integrable functions from $({t_\mathrm{s}},{t_\mathrm{f}})$ to $\mathbb{R }^d$. Similarly, recall that the subset $C^1[{t_\mathrm{s}},{t_\mathrm{f}}]^d$ is dense in the Sobolev space $H^1({t_\mathrm{s}},{t_\mathrm{f}})^d$ of all $L^2({t_\mathrm{s}},{t_\mathrm{f}})^d$-functions with weak derivative in $L^2({t_\mathrm{s}},{t_\mathrm{f}})^d$ (see [1, Ch.3]). Furthermore, both spaces $L^2({t_\mathrm{s}},{t_\mathrm{f}})^d$ and $H^1({t_\mathrm{s}},{t_\mathrm{f}})^d$ are Hilbert spaces.

Solving (5) on $H^1({t_\mathrm{s}},{t_\mathrm{f}})^d$, the Lagrangian $\mathcal{L }:H^1({t_\mathrm{s}},{t_\mathrm{f}})^d \times L^2({t_\mathrm{s}},{t_\mathrm{f}})^d \rightarrow \mathbb{R }$ reads

$$\begin{aligned} \mathcal{L }(\varvec{y},\varvec{\lambda }):= J(\varvec{y}({t_\mathrm{f}})) - \int \limits _{t_\mathrm{s}}^{t_\mathrm{f}}\varvec{\lambda }^\intercal (t) \left[ \dot{\varvec{y}}(t)-\varvec{f}(t,\varvec{y}(t))\right] \mathrm{d}t - \varvec{\lambda }^\intercal ({t_\mathrm{s}}) \left[ \varvec{y}({t_\mathrm{s}})-\varvec{y}_\mathrm{s}\right] \end{aligned}$$

using the $L^2$-scalar product and the Lagrange multiplier $\varvec{\lambda }$. The optimality condition of (5) is based on the Fréchet derivative of $\mathcal{L }$ at $(\varvec{y},\varvec{\lambda })$ in direction $(\varvec{w},\varvec{\chi })$, which exists by the Fréchet differentiability of $J$ and [18, Ch.0§0.2.5]:

$$\begin{aligned} \mathcal{L }^\prime (\varvec{y},\varvec{\lambda })(\varvec{w},\varvec{\chi })&\!=\!\left\{ \displaystyle J^\prime (\varvec{y}({t_\mathrm{f}})) \varvec{w}({t_\mathrm{f}}) \!-\! \!\int \limits _{t_\mathrm{s}}^{t_\mathrm{f}}\varvec{\lambda }^\intercal (t) \left[ \! \dot{\varvec{w}}(t)\!-\!\varvec{f}_{\varvec{y}}(t,\varvec{y}(t))\varvec{w}(t) \!\right] \mathrm{d}t \!- \!\varvec{\lambda }^\intercal ({t_\mathrm{s}}) \varvec{w}({t_\mathrm{s}}) \!\right\} \\&\quad + \left\{ \displaystyle - \int \limits _{t_\mathrm{s}}^{t_\mathrm{f}}\varvec{\chi }^\intercal (t) \left[ \dot{\varvec{y}}(t)-\varvec{f}(t,\varvec{y}(t))\right] \mathrm{d}t - \varvec{\chi }^\intercal ({t_\mathrm{s}}) \left[ \varvec{y}({t_\mathrm{s}})-\varvec{y}_\mathrm{s}\right] \right\} . \end{aligned}$$

The necessary condition for a stationary point $(\varvec{y},\varvec{\lambda })\in H^1({t_\mathrm{s}},{t_\mathrm{f}})^d \times L^2({t_\mathrm{s}},{t_\mathrm{f}})^d$ of (5) is that $\mathcal{L }^\prime (\varvec{y},\varvec{\lambda })(\varvec{w},\varvec{\chi })=0$ holds for all directions $(\varvec{w},\varvec{\chi })\in H^1({t_\mathrm{s}},{t_\mathrm{f}})^d \times L^2({t_\mathrm{s}},{t_\mathrm{f}})^d$. Choosing $\varvec{w}=\varvec{0} \in H^1({t_\mathrm{s}},{t_\mathrm{f}})^d$ and varying only $\varvec{\chi } \in L^2({t_\mathrm{s}},{t_\mathrm{f}})^d$, the necessary condition reads

$$\begin{aligned} \mathcal{L }_{\varvec{\lambda }}(\varvec{y},\varvec{\lambda })(\varvec{\chi })=-\int \limits _{t_\mathrm{s}}^{t_\mathrm{f}}\varvec{\chi }^\intercal (t) \left[ \dot{\varvec{y}}(t)-\varvec{f}(t,\varvec{y}(t))\right] \mathrm{d}t - \varvec{\chi }^\intercal ({t_\mathrm{s}}) \left[ \varvec{y}({t_\mathrm{s}})-\varvec{y}_\mathrm{s}\right] = 0,\; \forall \varvec{\chi } \end{aligned} \tag{32}$$

which possesses the same unique solution $\varvec{y}\in C^1[{t_\mathrm{s}},{t_\mathrm{f}}]^d$ as (1). Taking now $\varvec{\chi }=\varvec{0} \in L^2({t_\mathrm{s}},{t_\mathrm{f}})^d$ and varying only $\varvec{w} \in H^1({t_\mathrm{s}},{t_\mathrm{f}})^d$, one obtains, using integration by parts,

$$\begin{aligned} \mathcal{L }_{\varvec{y}}(\varvec{y},\varvec{\lambda })(\varvec{w})\!=\!\left[ J^\prime (\varvec{y} ({t_\mathrm{f}}))\!-\!\varvec{\lambda }^\intercal ({t_\mathrm{f}})\right] \varvec{w}({t_\mathrm{f}})\!-\! \int \limits ^{t_\mathrm{s}}_{t_\mathrm{f}}\left[ \dot{\varvec{\lambda }}(t)\!+\! \varvec{f}^\intercal _{\varvec{y}}(t,\varvec{y}(t)) \varvec{\lambda }(t)\right] ^\intercal \varvec{w}(t) \mathrm{d}t\!=\! 0,\; \forall \varvec{w} \end{aligned}$$

which possesses the same solution as (2).
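Because the last condition must hold for every $\varvec{w}$, it can be read pointwise: the integrand has to vanish, which gives $\dot{\varvec{\lambda }}(t) = -\varvec{f}^\intercal _{\varvec{y}}(t,\varvec{y}(t))\varvec{\lambda }(t)$, and the boundary term gives $\varvec{\lambda }({t_\mathrm{f}}) = J^\prime (\varvec{y}({t_\mathrm{f}}))^\intercal $; presumably this is the terminal-value problem that (2) denotes. The following Python sketch checks this numerically on a hypothetical scalar test problem that is not taken from the paper: $f(t,y)=-y^2$, $J(y)=y$, ${t_\mathrm{s}}=0$, for which the sensitivity of $J(y({t_\mathrm{f}}))$ with respect to $y_\mathrm{s}$ is $1/(1+y_\mathrm{s}{t_\mathrm{f}})^2$ in closed form. The helper names `rk4` and `adjoint_sensitivity`, and the use of a classical Runge-Kutta integrator instead of a BDF method, are assumptions made purely for illustration.

```python
def f(t, y):
    """Right-hand side of the illustrative test ODE y' = -y^2."""
    return -y * y


def f_y(t, y):
    """Partial derivative of f with respect to y."""
    return -2.0 * y


def rk4(rhs, t0, z0, t1, n):
    """Integrate z' = rhs(t, z) from t0 to t1 with n classical RK4 steps.

    The step size is negative when t1 < t0, so the same routine also
    integrates backward in time.
    """
    h = (t1 - t0) / n
    t, z = t0, list(z0)
    for _ in range(n):
        k1 = rhs(t, z)
        k2 = rhs(t + h / 2, [zi + h / 2 * ki for zi, ki in zip(z, k1)])
        k3 = rhs(t + h / 2, [zi + h / 2 * ki for zi, ki in zip(z, k2)])
        k4 = rhs(t + h, [zi + h * ki for zi, ki in zip(z, k3)])
        z = [zi + h / 6 * (a + 2 * b + 2 * c + d)
             for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]
        t += h
    return z


def adjoint_sensitivity(y_s, t_s=0.0, t_f=1.0, n=200):
    """Return (y(t_f), lambda(t_s)) for J(y) = y, i.e. J'(y) = 1."""
    # Forward sweep: solve the initial value problem for y.
    (y_f,) = rk4(lambda t, z: [f(t, z[0])], t_s, [y_s], t_f, n)

    # Backward sweep: integrate y and lambda together from t_f down to t_s,
    # using lambda' = -f_y * lambda with lambda(t_f) = J'(y(t_f)) = 1.
    def rhs(t, z):
        y, lam = z
        return [f(t, y), -f_y(t, y) * lam]

    _, lam_s = rk4(rhs, t_f, [y_f, 1.0], t_s, n)
    return y_f, lam_s


def exact_sensitivity(y_s, t_f=1.0):
    """Closed-form d y(t_f) / d y_s for y' = -y^2 with t_s = 0."""
    return 1.0 / (1.0 + y_s * t_f) ** 2


if __name__ == "__main__":
    y_f, lam_s = adjoint_sensitivity(1.0)
    print(y_f, lam_s, exact_sensitivity(1.0))
```

For $y_\mathrm{s}=1$ and ${t_\mathrm{f}}=1$ the computed $\lambda ({t_\mathrm{s}})$ should be close to the closed-form value $0.25$, which is the sense in which the multiplier at the initial time carries the sensitivity of the objective with respect to the initial value.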

Appendix 2: Duality pairing between $C^0[{t_\mathrm{s}},{t_\mathrm{f}}]^d$ and ${{\mathrm{NBV}}}[{t_\mathrm{s}},{t_\mathrm{f}}]^d$

According to the Riesz Representation Theorem [24, Ch.5§5.5], for every continuous linear functional $\mathfrak L $ on $C^0[{t_\mathrm{s}},{t_\mathrm{f}}]$ there exists a unique $\varPsi \in {{\mathrm{NBV}}}[{t_\mathrm{s}},{t_\mathrm{f}}]$ such that

$$\begin{aligned} \mathfrak L [g] = \left\langle \varPsi ,g \right\rangle _{{{\mathrm{NBV}}}[{t_\mathrm{s}},{t_\mathrm{f}}],C^0[{t_\mathrm{s}},{t_\mathrm{f}}]} = \int \limits _{t_\mathrm{s}}^{t_\mathrm{f}}g(t) \mathrm{d}\varPsi (t), \end{aligned} \tag{33}$$

where the Riemann-Stieltjes integral [26, Ch.VIII§6] is used. The Banach space ${{\mathrm{NBV}}}[{t_\mathrm{s}},{t_\mathrm{f}}]$ consists of all normalized functions of bounded variation on $[{t_\mathrm{s}},{t_\mathrm{f}}]$, that is, functions of bounded variation that vanish at ${t_\mathrm{s}}$ and are continuous from the right on $({t_\mathrm{s}},{t_\mathrm{f}})$. It is equipped with the total variation norm

$$\begin{aligned} \left\| \varPsi \right\| _{{{\mathrm{NBV}}}[{t_\mathrm{s}},{t_\mathrm{f}}]} = \sup \sum _{i=1}^{m} \left| \varPsi (t_i)-\varPsi (t_{i-1}) \right| \end{aligned}$$

where the supremum is taken over all partitions ${t_\mathrm{s}}=t_0<\dots <t_m={t_\mathrm{f}}$ of $[{t_\mathrm{s}},{t_\mathrm{f}}]$. According to the Riesz Representation Theorem, for each $\varPsi $ the value of the total variation norm coincides with the value of the dual norm given by

$$\begin{aligned} \left\| \varPsi \right\| _{{{\mathrm{NBV}}}[{t_\mathrm{s}},{t_\mathrm{f}}]}= \sup _{\left\| g \right\| _{C^0[{t_\mathrm{s}},{t_\mathrm{f}}]}=1} \left| \left\langle \varPsi ,g \right\rangle _{{{\mathrm{NBV}}}[{t_\mathrm{s}},{t_\mathrm{f}}],C^0[{t_\mathrm{s}},{t_\mathrm{f}}]} \right| . \end{aligned}$$
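To make the pairing (33) and the total variation norm concrete, the following Python sketch approximates both by sums over a uniform partition of $[{t_\mathrm{s}},{t_\mathrm{f}}]$. The function names `stieltjes_sum` and `variation_on_partition`, the midpoint tags, and the two integrators $\varPsi $ are illustrative assumptions and are not taken from the paper. A sum over one fixed partition only bounds the total variation from below; it reaches the supremum here because both example functions are monotone.

```python
def partition(t_s, t_f, m):
    """Uniform partition t_s = t_0 < ... < t_m = t_f."""
    return [t_s + (t_f - t_s) * i / m for i in range(m + 1)]


def stieltjes_sum(g, psi, t_s, t_f, m=1000):
    """Riemann-Stieltjes sum of g with respect to psi, using midpoint tags."""
    t = partition(t_s, t_f, m)
    return sum(g(0.5 * (a + b)) * (psi(b) - psi(a)) for a, b in zip(t, t[1:]))


def variation_on_partition(psi, t_s, t_f, m=1000):
    """Sum of |psi(t_i) - psi(t_{i-1})| over the uniform partition.

    This is a lower bound for the total variation of psi.
    """
    t = partition(t_s, t_f, m)
    return sum(abs(psi(b) - psi(a)) for a, b in zip(t, t[1:]))


def smooth_psi(t):
    """Continuous integrator psi(t) = t^2, which vanishes at t_s = 0."""
    return t * t


def step_psi(t):
    """Right-continuous unit jump at t = 0.5, which vanishes at t_s = 0."""
    return 1.0 if t >= 0.5 else 0.0


if __name__ == "__main__":
    g = lambda t: t
    # For psi(t) = t^2 the pairing equals the integral of 2 t^2 over [0, 1].
    print(stieltjes_sum(g, smooth_psi, 0.0, 1.0))
    # For the unit jump the pairing approximates the point value g(0.5).
    print(stieltjes_sum(g, step_psi, 0.0, 1.0))
    print(variation_on_partition(smooth_psi, 0.0, 1.0),
          variation_on_partition(step_psi, 0.0, 1.0))
```

The step-function case shows why a space of functions of bounded variation is a natural home for multipliers with jumps: a unit jump of $\varPsi $ at a point $\tau $ acts on $g$ as the point evaluation $g(\tau )$ and has total variation one.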

The dual of the finite Cartesian product $C^0[{t_\mathrm{s}},{t_\mathrm{f}}]^d$ is the finite Cartesian product ${{\mathrm{NBV}}}[{t_\mathrm{s}},{t_\mathrm{f}}]^d$ of the duals with duality pairing

$$\begin{aligned} \left\langle \varvec{\varPsi },\varvec{g} \right\rangle _{{{{\mathrm{NBV}}}}^d,\left( C^0\right) ^d} = \sum _{i=1}^d \left\langle \varPsi _i,g_i \right\rangle _{{{\mathrm{NBV}}},C^0} = \sum _{i=1}^d \int \limits _{t_\mathrm{s}}^{t_\mathrm{f}}g_i(t) \mathrm{d}\varPsi _i(t) =: \int \limits _{t_\mathrm{s}}^{t_\mathrm{f}}\varvec{g}(t) \mathrm{d}\varvec{\varPsi }(t) \end{aligned}$$

and dual norm $ \left\| \varvec{\varPsi } \right\| _{{{\mathrm{NBV}}}[{t_\mathrm{s}},{t_\mathrm{f}}]^d} = \max _{1\le i\le d} \left\| \varPsi _i \right\| _{{{\mathrm{NBV}}}[{t_\mathrm{s}},{t_\mathrm{f}}]}$; see [33, Ch.II§12.1].

About this article

Cite this article

Beigel, D., Mommer, M.S., Wirsching, L. et al. Approximation of weak adjoints by reverse automatic differentiation of BDF methods. Numer. Math. 126, 383–412 (2014). https://doi.org/10.1007/s00211-013-0570-4

Received: 14 July 2011
Revised: 02 October 2012
Published: 20 July 2013
Issue Date: March 2014
DOI: https://doi.org/10.1007/s00211-013-0570-4
