Abstract
On the space of probability densities, we extend the Wasserstein geodesics to the case of higher-order interpolation such as cubic spline interpolation. After presenting the natural extension of cubic splines to the Wasserstein space, we propose a simpler approach based on the relaxation of the variational problem on the path space. We explore two different numerical approaches, one based on multimarginal optimal transport and entropic regularization and the other based on semi-discrete optimal transport.
Notes
We mainly use the term Wasserstein metric to denote the metric tensor associated with the Wasserstein distance on the space of probability measures. This viewpoint was highlighted and used in particular by Otto in [25].
References
Jean-David Benamou, Guillaume Carlier, Marco Cuturi, Luca Nenna, and Gabriel Peyré. Iterative Bregman projections for regularized transportation problems. SIAM Journal on Scientific Computing, 37(2):A1111–A1138, 2015.
Jean-David Benamou, Guillaume Carlier, and Luca Nenna. A Numerical Method to solve Optimal Transport Problems with Coulomb Cost. working paper or preprint, May 2015.
Jean-David Benamou, Guillaume Carlier, and Luca Nenna. Generalized incompressible flows, multi-marginal transport and Sinkhorn algorithm. working paper or preprint, October 2017.
Geir Bogfjellmo, Klas Modin, and Olivier Verdier. A Numerical Algorithm for C2-splines on Symmetric Spaces. arXiv e-prints, page arXiv:1703.09589, Mar 2017.
Yann Brenier. The least action principle and the related concept of generalized flows for incompressible perfect fluids. Journal of the American Mathematical Society, 2(2):225–255, 1989.
M. Camarinha, F. Silva Leite, and P. Crouch. Splines of class \(\mathcal {C}^k\) on non-Euclidean spaces. IMA Journal of Mathematical Control & Information, 12:399–410, 1995.
L. Chizat, B. Schmitzer, G. Peyré, and F.-X. Vialard. An Interpolating Distance between Optimal Transport and Fisher-Rao. Found. Comp. Math., 2016.
P. Crouch and F. Silva Leite. The dynamic interpolation problem: On Riemannian manifolds, Lie groups and symmetric spaces. Journal of Dynamical & Control Systems, 1:177–202, 1995.
Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems, pages 2292–2300, 2013.
Alfred Galichon and Bernard Salanié. Matching with Trade-offs: Revealed Preferences over Competing Characteristics. working paper or preprint, April 2010.
F. Gay-Balmaz, D. D. Holm, D. M. Meier, T. S. Ratiu, and F.-X. Vialard. Invariant Higher-Order Variational Problems. Communications in Mathematical Physics, 309:413–458, January 2012.
F. Gay-Balmaz, D. D. Holm, D. M. Meier, T. S. Ratiu, and F.-X. Vialard. Invariant Higher-Order Variational Problems II. Journal of NonLinear Science, 22:553–597, August 2012.
B. Heeren, M. Rumpf, and B. Wirth. Variational time discretization of Riemannian splines. ArXiv e-prints, November 2017.
Jean-David Benamou, Thomas Gallouët, and François-Xavier Vialard. Second order models for optimal transport and cubic splines on the Wasserstein space. Preprint arXiv:1801.04144, 2018.
B. Khesin and R. Wendt. The geometry of infinite-dimensional groups, volume 51. Springer, Berlin, 2008.
Young-Heon Kim and Brendan Pass. A general condition for monge solutions in the multi-marginal optimal transport problem. SIAM Journal on Mathematical Analysis, 46(2):1538–1550, 2014.
Bruno Lévy. https://members.loria.fr/Bruno.Levy/GEOGRAM/vorpaview.html.
Bruno Lévy. A numerical algorithm for L2 semi-discrete optimal transport in 3D. ESAIM: M2AN, 49(6):1693–1715, 2015.
J. Lott. Some geometric calculations on Wasserstein space. Communications in Mathematical Physics, 277(2):423–437, 2008.
Quentin Mérigot. https://github.com/mrgt/PyMongeAmpere.
Quentin Mérigot. A multiscale approach to optimal transport. Computer Graphics Forum, 30 (5):1583–1592, 2011.
Quentin Mérigot and Jean-Marie Mirebeau. Minimal geodesics along volume preserving maps, through semi-discrete optimal transport. arXiv preprint arXiv:1505.03306, 2015.
Quentin Mérigot and Jean-Marie Mirebeau. Minimal geodesics along volume-preserving maps, through semidiscrete optimal transport. SIAM J. Numer. Anal., 54(6):3465–3492, 2016.
L. Noakes, G. Heinzinger, and B. Paden. Cubic splines on curved spaces. IMA Journal of Mathematical Control & Information, 6:465–473, 1989.
F. Otto. The geometry of dissipative evolution equations: The porous medium equation. Communications in Partial Differential Equations, 26(1-2):101–174, 2001.
Brendan Pass. Multi-marginal optimal transport: Theory and applications. ESAIM: M2AN, 49(6):1771–1790, 2015.
F. Santambrogio. Optimal transport for applied mathematicians. Progress in Nonlinear Differential Equations and their applications, 87, 2015.
Nikhil Singh, François-Xavier Vialard, and Marc Niethammer. Splines for diffeomorphisms. Medical Image Analysis, 25(1):56–71, 2015.
R. Sinkhorn. Diagonal equivalence to matrices with prescribed row and column sums. Amer. Math. Monthly, 74:402–405, 1967.
R. Tahraoui and F.-X. Vialard. Riemannian cubics on the group of diffeomorphisms and the Fisher-Rao metric. ArXiv e-prints, June 2016.
Alain Trouvé and François-Xavier Vialard. A second-order model for time-dependent data interpolation: Splines on shape spaces. In Proceedings of Miccai workshop, STIA, Beijing, 2010.
F.-X. Vialard and A. Trouvé. Shape Splines and Stochastic Shape Evolutions: A Second Order Point of View. Quart. Appl. Math., 2012.
Cédric Villani. Optimal transport: old and new, volume 338. Springer, Berlin, 2008.
Yongxin Chen, Giovanni Conforti, and Tryphon T. Georgiou. Measure-valued spline curves: An optimal transport viewpoint. Preprint arXiv:1801.03186, 2018.
Additional information
Hans Zanna Munthe-Kaas.
Appendices
Proof of Theorem 1
The proof is a rewriting of the proof of [27, Theorem 1.33] in the case where the initial and final spaces do not have the same dimension. In particular, we prove that transport plans concentrated on the graph of a map \(T : {\mathbb R}^d \rightarrow {\mathbb R}^p \) are dense in the set of transport plans on \({\mathbb R}^d \times {\mathbb R}^p\) and deduce, taking \(p= (n-1)d\), that for any continuous cost the multimarginal Kantorovich problem is the relaxation of the multimarginal Monge problem.
Theorem 2
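Let \(M={\mathbb R}^d\) and \(c:M^n \rightarrow {\mathbb R}\) be a continuous cost function. Let \((\rho _i)_{i\in \{1,\ldots ,n\}}\) be n probability measures on M. We define the Monge problem \((M_c)\) as
\[ \inf \int _M c\left( x,T_2(x),\ldots ,T_n(x)\right) \,\mathrm {d}\rho _1(x) \]
over the set of maps \(\left\{ (T_i)_{i=2,\ldots ,n} \, : \, (T_i)_*\rho _1 = \rho _i, \; i =2,\ldots ,n\right\} \). The Kantorovich problem \((K_c)\) is defined by
\[ \inf \int _{M^n} c(x_1,\ldots ,x_n)\,\mathrm {d}\pi (x_1,\ldots ,x_n) \]
over the set of plans \(\varPi = \left\{ \pi \in \mathcal {P}(M^n) \, : \, (p_i)_*\pi = \rho _i, \; i =1,\ldots ,n\right\} \), where \(p_i\) is the projection onto the \(i^{\text {th}}\) factor. Then, if all \((\rho _i)_{i\in \{1,\ldots ,n\}}\) have compact support and \(\rho _1\) is atomless, there holds \((M_c)=(K_c)\).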
In order to prove Theorem 2 we first remark that [27, Corollary 1.29 and Theorem 1.32] have their multimarginal counterpart.
Lemma 2
Let \(\mu \in \mathcal {P}({\mathbb R}^d)\) be an atomless measure and \(\nu \in \mathcal {P}({\mathbb R}^p)\). Then there exists a transport map \(T: {\mathbb R}^d \rightarrow {\mathbb R}^p\) such that \(T_* \mu =\nu \).
Proof of Lemma 2
Let \(\sigma _d : {\mathbb R}^d \rightarrow {\mathbb R}\) (resp. \(\sigma _p : {\mathbb R}^p \rightarrow {\mathbb R}\)) be an injective Borel map with Borel inverse (see for instance [27, Lemma 1.28] for a very simple proof of existence in this case). Since \(\mu \) is atomless, \(({\sigma _d})_*\mu \) is also atomless. Let \(t: {\mathbb R}\rightarrow {\mathbb R}\) be the optimal transport map from \(({\sigma _d})_*\mu \) to \( ({\sigma _{p}})_*\nu \) for the quadratic cost, so that \(t_*\left( ({\sigma _d})_*\mu \right) = \left( {\sigma _{p}}\right) _*\nu \). Thus \(T= \sigma _{p}^{-1} \circ t \circ \sigma _d\) is a map pushing forward \(\mu \) to \(\nu \). \(\square \)
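The one-dimensional step of this proof can be illustrated numerically. The following is a minimal sketch, assuming a toy pair of measures that does not come from the paper (\(\mu \) uniform on [0, 1], \(\nu \) exponential with rate 1) and hypothetical helper names: on the real line, the optimal map for the quadratic cost from an atomless measure is the monotone rearrangement \(t = F_\nu ^{-1} \circ F_\mu \).

```python
import numpy as np

def monotone_map(cdf_mu, quantile_nu):
    """Return t = F_nu^{-1} o F_mu, the monotone rearrangement on the real line."""
    return lambda x: quantile_nu(cdf_mu(x))

# Assumed toy example (not from the paper):
# mu is uniform on [0, 1], nu is exponential with rate 1.
cdf_uniform = lambda x: np.clip(x, 0.0, 1.0)   # F_mu
quantile_exp = lambda u: -np.log1p(-u)         # F_nu^{-1}
t = monotone_map(cdf_uniform, quantile_exp)

# Push forward a midpoint quadrature of mu and inspect the image.
n = 100_000
x = (np.arange(n) + 0.5) / n
y = t(x)
```

Here `y.mean()` approximates the mean of \(\nu \) (which equals 1 for this choice), and `y` is increasing, as expected from a monotone map.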
Theorem 3
With the notation of Theorem 2, if the supports of all \(\rho _i\) are included in a compact domain, then the set of plans \(\varPi _T\) induced by a transport map is dense, for the weak topology, in the set of plans \(\varPi \) whenever \(\rho _1\) is atomless.
Remark 7
Theorem 3 is in fact very general: one can for instance take M and N to be merely Polish spaces. Then there exist invertible Borel maps from M (resp. N) to [0, 1]. This is enough to obtain Lemma 2. Then one just needs to consider a uniformly small partition of \(\varOmega \) to prove the density Theorem 3.
Proof of Theorem 3
Again the proof is based on [27, Theorem 1.32]. In particular, the strategy of the proof is to approximate a transport plan by transport maps defined on small sets on which the measure is preserved.
We consider a compact domain \(\varOmega = \varOmega _d \times \varOmega _p \subset {\mathbb R}^d \times {\mathbb R}^p\) and \(\pi \in \mathcal {P}(\varOmega _d \times \varOmega _p)\) such that \((p_{{\mathbb R}^d})_*(\pi )=\mu \) is atomless; we write \(\nu = (p_{{\mathbb R}^p})_*(\pi )\). For any m, fix a partition of \(\varOmega _d \) (resp. \(\varOmega _p\)) into disjoint sets \(K_{i,m}\) (resp. \(L_{j,m}\)) with diameter smaller than 1 / 2m. Then the sets \(C_{i,j,m} = K_{i,m}\times L_{j,m} \) form a partition of \(\varOmega \) into sets with diameter smaller than 1 / m. Let \(\pi _{i,m}\) be the restriction of \(\pi \) to \(K_{i,m}\times \varOmega _p\), and set \(\mu _{i,m} = (p_{{\mathbb R}^d})_*(\pi _{i,m})\) and \(\nu _{i,m} = (p_{{\mathbb R}^p})_*(\pi _{i,m})\). Since \(\mu \) is atomless, \(\mu _{i,m}=\mu _{|K_{i,m}} \) is also atomless and, thanks to Lemma 2, there exists \(t_{i,m}\) such that \((t_{i,m})_* \mu _{i,m}=\nu _{i,m}\). By definition
\[ (\mathrm {Id},t_{m})_*(\mu )(C_{i,j,m}) = \nu _{i,m}(L_{j,m}) = \pi (C_{i,j,m}), \qquad \text {(A.1)} \]
where \(t_m\) is defined on \(\varOmega _d\) by \( (t_m)_{|K_{i,m}}=t_{i,m}\). In particular \((t_m)_*(\mu )=\nu \). Equation (A.1) and the definition of the partition sets \(C_{i,j,m}\) imply that \((\mathrm {Id},t_{m})_*(\mu )\) converges weakly toward \(\pi \) as \(m \rightarrow +\infty \) (they give the same mass to any set of the partition). See [27, Theorem 1.31] for instance. To finish the proof, let us remark that we can set \(p=d(n-1)\); then \(\mu =\rho _1\) is atomless and \(t_m: {\mathbb R}^d \rightarrow {\mathbb R}^{d(n-1)}\) defines \((t_{2,m},\dots ,t_{n,m})\). \(\square \)
Proof of Theorem 2
The continuity of the cost c and the density Theorem 3 imply that \((M_c)\le (K_c)\). Since the converse inequality \((K_c)\le (M_c)\) always holds, because every admissible family of maps induces an admissible plan, we have \((M_c)= (K_c)\).
Remark 8
Theorem 1 is a consequence of Theorem 2, since both the Monge and the Kantorovich problems (Definitions 1 and 2) reduce to problems on \(M^n\) with the spline cost, which is continuous (see Corollaries 2 and 3).
Entropic Regularisation and Sinkhorn
1.1 Entropic Regularization and Sinkhorn Algorithm
The linear programming problems (5.7–5.10) are extremely costly to solve numerically. A natural strategy, which has received a lot of attention recently following the pioneering works [9, 10], is to approximate these problems by strictly convex ones by adding an entropic penalization. It has been used with good results on a number of multimarginal optimal transport problems [1,2,3]. Here is a rapid and simplified description; see the references above for more details.
The regularized problem is
It is strictly convex. Denoting by \(u^k_{\alpha _{j_k}, \beta _{j_k}}\) the Lagrange multipliers of the k constraints (5.10), we obtain the optimality conditions:
where
Equation (B.2) characterizes the optimal tensor as a scaling of the kernel K depending on the dual unknown \(U^k\). Inserting this factorization into the constraints (5.10), the dual problem takes the form of the set of equations ( \(\forall k \in [1,n] \))
The Sinkhorn algorithm simply amounts to performing a Gauss–Seidel-type iterative resolution of the system (B.3) and therefore consists in computing the sums on the right-hand side and then performing the (grid) pointwise division.
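The Gauss–Seidel sweep described above can be sketched on a toy problem. The following is a minimal illustration, not the implementation used in the paper: it assumes only three marginals on a common grid, stores the kernel as a dense array (no tensorization), and uses hypothetical names (`multimarginal_sinkhorn`, `eps`, `n_iter`). Each scaling vector is updated in turn as its prescribed marginal divided pointwise by the sum of the kernel against the other, most recently updated, scalings.

```python
import numpy as np

def multimarginal_sinkhorn(cost, marginals, eps=0.5, n_iter=1000):
    """Entropic three-marginal transport by Gauss-Seidel (Sinkhorn) sweeps.

    cost      : array of shape (m, m, m), a generic cost tensor (assumption)
    marginals : list of three positive 1-D arrays of length m, same total mass
    eps       : entropic regularization parameter
    """
    K = np.exp(-cost / eps)                      # Gibbs kernel
    U = [np.ones_like(r) for r in marginals]     # scaling vectors
    for _ in range(n_iter):
        # Each update uses the latest values of the other scalings.
        U[0] = marginals[0] / np.einsum("ijk,j,k->i", K, U[1], U[2])
        U[1] = marginals[1] / np.einsum("ijk,i,k->j", K, U[0], U[2])
        U[2] = marginals[2] / np.einsum("ijk,i,j->k", K, U[0], U[1])
    # The regularized plan is a diagonal scaling of the kernel.
    gamma = np.einsum("ijk,i,j,k->ijk", K, U[0], U[1], U[2])
    return gamma, U
```

As a usage example, one may take a grid `x = np.linspace(0, 1, 6)` and the squared second difference `(x[:, None, None] - 2 * x[None, :, None] + x[None, None, :]) ** 2` as a stand-in for an acceleration-type cost; at convergence the three sums of `gamma` over pairs of axes match the prescribed marginals.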
1.2 Implementation
In dimension 2, each unknown \(U^k\) has dimension \(N_x^2\). The cost of one full Gauss–Seidel cycle, i.e. one Sinkhorn iteration on all unknowns, will therefore be \(n \times N_x^2 \times \) the cost of computing the tensor matrix products in the denominator of (B.3). Remember that n is the number of time steps with constraints and N the total number of time steps. The given tensor kernel \(K_{a,b}\) is a priori a large \(N \times N_x \times N_x\) tensor with indices \( {a,b} = {\alpha _1,\dots ,\alpha _N, \beta _1,\dots ,\beta _N}\). It can, however, advantageously be tensorized both along dimensions and along margins. First, using (5.4–5.8), we see that the kernel is the product of smaller tensors
Moreover, as we chose to work on a Cartesian grid at all time steps, \(K^0\) tensorizes again into
Finally, our large kernel \(K_{a,b}\) can be represented as the product of \(2\,(N-2)\) identical tensors of size \(N_x \times N_x \times N_x\). Assuming a cubic cost \(n^3\) for the multiplication of two \((n \times n)\) matrices, we see that our algorithm is of order \(O(N \, N_x^4)\) in dimension 2.
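The benefit of tensorizing along dimensions on a Cartesian grid can be checked on a simpler kernel. The following is a minimal sketch, assuming a two-dimensional Gaussian kernel rather than the spline kernel of the paper, with hypothetical helper names: since the kernel is a Kronecker product of one-dimensional kernels, applying it to an \(N_x \times N_x\) array requires only two matrix products of size \(N_x \times N_x\), instead of one product with a dense \(N_x^2 \times N_x^2\) matrix.

```python
import numpy as np

def gaussian_kernel_1d(x, eps):
    """One-dimensional Gibbs kernel exp(-|x_i - x_j|^2 / eps)."""
    d = x[:, None] - x[None, :]
    return np.exp(-d ** 2 / eps)

def apply_separable(K1, V):
    """Apply the 2-D kernel K1 (x) K1 to V using two small matrix products."""
    return K1 @ V @ K1.T

def apply_dense(K1, V):
    """Reference version: build the full (Nx^2, Nx^2) kernel explicitly."""
    K2 = np.kron(K1, K1)
    return (K2 @ V.ravel()).reshape(V.shape)

# Small grid so that the dense reference remains cheap.
x = np.linspace(0.0, 1.0, 12)
K1 = gaussian_kernel_1d(x, eps=0.1)
V = np.random.default_rng(0).uniform(size=(12, 12))
same = np.allclose(apply_separable(K1, V), apply_dense(K1, V))
```

Both functions return the same array; only the separable version scales to the grid sizes used in practice.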
Cite this article
Benamou, JD., Gallouët, T.O. & Vialard, FX. Second-Order Models for Optimal Transport and Cubic Splines on the Wasserstein Space. Found Comput Math 19, 1113–1143 (2019). https://doi.org/10.1007/s10208-019-09425-z