Abstract
Wasserstein balls, which contain all probability measures within a pre-specified Wasserstein distance to a reference measure, have recently enjoyed wide popularity in the distributionally robust optimization and machine learning communities to formulate and solve data-driven optimization problems with rigorous statistical guarantees. In this technical note we prove that the Wasserstein ball is weakly compact under mild conditions, and we offer necessary and sufficient conditions for the existence of optimal solutions. We also characterize the sparsity of solutions if the Wasserstein ball is centred at a discrete reference measure. In comparison with the existing literature, which has proved similar results under different conditions, our proofs are self-contained and shorter, yet mathematically rigorous, and our necessary and sufficient conditions for the existence of optimal solutions are easily verifiable in practice.
Notes
We are grateful to Lorenzo Dello Schiavo, who communicated this result to us.
References
Aliprantis, C.D., Border, K.C.: Infinite Dimensional Analysis: A Hitchhiker’s Guide, 3rd edn. Springer, New York (2006)
Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein generative adversarial networks. In: Proceedings of the 34th International Conference on Machine Learning, pp. 214–223 (2017)
Billingsley, P.: Convergence of Probability Measures, 2nd edn. Wiley, New York (1999)
Billingsley, P.: Probability and Measure, 3rd edn. Wiley, New York (1995)
Blanchet, J., Murthy, K.: Quantifying distributional model risk via optimal transport. Math. Oper. Res. 44(2), 565–600 (2019)
Bogachev, V.I.: Measure Theory, vol. II. Springer, New York (2007)
Clément, P., Desch, W.: Wasserstein metric and subordination. Studia Math. 189(1), 35–52 (2008)
Dello Schiavo, L.: Heat equation on metric measure spaces. Master’s thesis, Sapienza University of Rome (2015)
Dudley, R.M.: Real Analysis and Probability. Wadsworth & Brooks/Cole, New York (1989)
Frogner, C., Zhang, C., Mobahi, H., Araya, M., Poggio, T.A.: Learning with a Wasserstein loss. Adv. Neural Inf. Process. Syst. 28, 2053–2061 (2015)
Gao, R., Kleywegt, A.J.: Distributionally robust stochastic optimization with Wasserstein distance. arXiv preprint arXiv:1604.02199 (2016)
Gibbs, A.L., Su, F.E.: On choosing and bounding probability metrics. Int. Stat. Rev. 70(3), 419–435 (2002)
Ho, N., Nguyen, X.L., Yurochkin, M., Bui, H.H., Huynh, V., Phung, D.: Multilevel clustering via Wasserstein means. In: Proceedings of the 34th International Conference on Machine Learning, pp. 1501–1509 (2017)
Kuhn, D., Esfahani, P.M., Nguyen, V.A., Shafieezadeh-Abadeh, S.: Wasserstein distributionally robust optimization: theory and applications in machine learning. In: Operations Research & Management Science in the Age of Analytics, pp. 130–166. INFORMS (2019)
Esfahani, P.M., Kuhn, D.: Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Math. Program. 171(1–2), 115–166 (2018)
Nguyen, V.A., Shafieezadeh-Abadeh, S., Yue, M.-C., Kuhn, D., Wiesemann, W.: Calculating optimistic likelihoods using (geodesically) convex optimization. Adv. Neural Inf. Process. Syst. 32 (2019)
Nguyen, V.A., Shafieezadeh-Abadeh, S., Yue, M.-C., Kuhn, D., Wiesemann, W.: Optimistic distributionally robust optimization for nonparametric likelihood approximation. Adv. Neural Inf. Process. Syst. 32 (2019)
Owhadi, H., Scovel, C.: Extreme points of a ball about a measure with finite support. Commun. Math. Sci. 15(1), 77–96 (2017)
Pflug, G., Wozabal, D.: Ambiguity in portfolio selection. Quant. Finance 7(4), 435–442 (2007)
Pichler, A., Xu, H.: Quantitative stability analysis for minimax distributionally robust risk optimization. Math. Program., available online (2018)
Pinelis, I.: On the extreme points of moments sets. Math. Methods Oper. Res. 83(3), 325–349 (2016)
Shapiro, A., Dentcheva, D., Ruszczynski, A.: Lectures on Stochastic Programming. SIAM, Philadelphia (2009)
Villani, C.: Optimal Transport: Old and New, vol. 338. Springer, New York (2008)
Wozabal, D.: A framework for optimization under ambiguity. Ann. Oper. Res. 193(1), 21–47 (2012)
Yue, M.-C., Kuhn, D., Wiesemann, W.: On linear optimization over Wasserstein balls. arXiv preprint arXiv:2004.07162 (2021)
Zhao, C., Guan, Y.: Data-driven risk-averse stochastic optimization with Wasserstein metric. Oper. Res. Lett. 46(2), 262–267 (2018)
Acknowledgements
The authors gratefully acknowledge funding from the Swiss National Science Foundation under Grant BSCGI0_157733, the UK’s Engineering and Physical Sciences Research Council under Grant EP/R045518/1 and the Hong Kong Research Grants Council under Grant 25302420.
Appendices
Appendix A: Auxiliary measure-theoretic results
We review some well-known facts from measure theory that we use to prove our results. We first recall a connection between the notions of tightness and weak sequential compactness of collections of probability measures.
Definition 1
A collection \({\mathcal {S}} \subseteq {\mathcal {P}} (X)\) of probability measures is tight if for any \(\epsilon >0\), there exists a compact subset \(B \subseteq X\) such that \(\mu (X \setminus B) \le \epsilon \) for all \(\mu \in {\mathcal {S}}\).
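To build intuition, here is a standard illustration (our own, not taken from the paper): on \(X = {\mathbb {R}}\), a uniform second-moment bound forces tightness via Markov’s inequality, since for the compact set \(B = [-R, R]\),

```latex
\int_X x^2 \, \mathrm{d}\mu(x) \le C \;\; \forall \mu \in \mathcal{S}
\quad\Longrightarrow\quad
\mu\bigl(X \setminus [-R, R]\bigr)
\;\le\; \frac{1}{R^2} \int_X x^2 \, \mathrm{d}\mu(x)
\;\le\; \frac{C}{R^2}
\;\le\; \epsilon
\quad \text{whenever } R \ge \sqrt{C/\epsilon}.
```

By contrast, the family \(\{\delta _k\}_{k \in {\mathbb {N}}}\) of Dirac measures at the positive integers is not tight: every compact \(B \subseteq {\mathbb {R}}\) is bounded, so \(\delta _k({\mathbb {R}} \setminus B) = 1\) for all sufficiently large k.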
Definition 2
A sequence \(\{\mu ^k\}_k \subseteq {\mathcal {P}} (X)\) of probability measures converges weakly to \(\mu ^\infty \in {\mathcal {P}} (X)\) if for any bounded and continuous function g on X, we have
$$\begin{aligned} \lim _{k \rightarrow \infty } \int _X g(x) \, \mathrm {d} \mu ^k (x) = \int _X g(x) \, \mathrm {d} \mu ^\infty (x). \end{aligned}$$
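As a purely numerical illustration (not part of the paper), the measures \(\mu ^k = N(0, 1/k)\) converge weakly to the Dirac measure \(\delta _0\); for the bounded continuous test function \(g(x) = e^{-x^2}\), a Monte Carlo sketch shows the integrals \(\int _X g \, \mathrm {d} \mu ^k\) approaching \(g(0) = 1\). All names below are illustrative choices.

```python
import numpy as np

def integral_estimate(g, scale, n_samples=200_000, seed=0):
    """Monte Carlo estimate of the integral of g against N(0, scale^2)."""
    rng = np.random.default_rng(seed)
    return float(g(rng.normal(0.0, scale, n_samples)).mean())

g = lambda x: np.exp(-x ** 2)  # bounded and continuous on R, with g(0) = 1

# mu^k = N(0, 1/k): the integrals increase towards g(0) = 1 as k grows,
# matching the closed form E[g] = 1 / sqrt(1 + 2/k) for this Gaussian family.
estimates = [integral_estimate(g, 1.0 / np.sqrt(k)) for k in (1, 10, 100, 1000)]
```

For k = 1 the closed-form value is \(1/\sqrt{3} \approx 0.577\), and the estimates increase towards 1 as k grows, in line with the definition of weak convergence.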
Definition 3
A collection \({\mathcal {S}} \subseteq {\mathcal {P}} (X)\) of probability measures is weakly sequentially compact if every sequence in \({\mathcal {S}}\) has a subsequence that converges weakly to an element of \({\mathcal {S}}\).
The concepts of tightness and weak sequential compactness are connected by Prokhorov’s Theorem; see, for example, [3, Theorem 5.1].
Theorem 5
(Prokhorov’s Theorem) A collection \({\mathcal {S}} \subseteq {\mathcal {P}} (X)\) of probability measures is tight if and only if the closure of \({\mathcal {S}}\) is weakly sequentially compact in \({\mathcal {P}}(X)\).
Note that since the space \({\mathcal {P}}(X)\) is metrizable, sequential compactness and compactness of subsets of \({\mathcal {P}}(X)\) are equivalent to each other.
The following lemma, which is excerpted from the Portmanteau Theorem (see for example [4, Problem 29.1(c)]), provides a useful characterization of weak convergence.
Lemma 3
A sequence \(\{\mu ^k \}_k \subseteq {\mathcal {P}} (X)\) of probability measures converges weakly to \(\mu ^\infty \in {\mathcal {P}} (X)\) if and only if for any upper bounded and upper semi-continuous function g on X, we have
$$\begin{aligned} \limsup _{k \rightarrow \infty } \int _X g(x) \, \mathrm {d} \mu ^k (x) \le \int _X g(x) \, \mathrm {d} \mu ^\infty (x). \end{aligned}$$
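A standard example (our own, not from the paper) shows why the characterization involves an inequality rather than an equality: take \(\mu ^k = \delta _{1/k}\), which converges weakly to \(\mu ^\infty = \delta _0\), and let \(g = \mathbb {1}_{(-\infty , 0]}\), which is bounded and upper semi-continuous. Then

```latex
\limsup_{k \to \infty} \int_X g \, \mathrm{d}\mu^k
  \;=\; \limsup_{k \to \infty} g(1/k)
  \;=\; 0
  \;<\; 1
  \;=\; g(0)
  \;=\; \int_X g \, \mathrm{d}\mu^\infty ,
```

so the inequality can be strict even along a weakly convergent sequence.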
Appendix B: Basic feasible solutions in infinite-dimensional linear programming
It is well known that if a finite-dimensional linear program with m equality constraints has an optimal solution, then it also has an optimal basic feasible solution with at most m non-zero entries. An infinite-dimensional analogue of this fact is proved in [21, Corollary 5 and Proposition 6(v)]. To state this result, let Z be a topological space, let \({\mathcal {M}}_+ (Z)\) be the set of non-negative finite Borel measures supported on Z, let \(\psi , \phi _1,\dots ,\phi _m : Z \rightarrow {\mathbb {R}}\) be Borel functions, and let \(v \in {\mathbb {R}}^m\). Consider now the optimization problem
$$\begin{aligned} \sup _{\gamma \in {\mathcal {M}}_+ (Z)} \left\{ \, \int _Z \psi (z) \, \mathrm {d} \gamma (z) \; : \; \int _Z \phi _i (z) \, \mathrm {d} \gamma (z) = v_i \;\; \forall i = 1,\dots ,m \right\} \end{aligned}$$
(11)
and denote by \({\mathcal {F}}\) the feasible region of (11) and by \(\mathrm {ext}({\mathcal {F}})\) the set of extreme points of \({\mathcal {F}}\).
Proposition 1
Suppose that for all \(\gamma \in {\mathcal {F}}\), at least one of the integrals \(\int _Z [\psi (z)]_+ \, \mathrm {d} \gamma (z)\) and \(\int _Z [-\psi (z)]_+ \, \mathrm {d} \gamma (z)\) is finite and that \(\int _Z |\phi _i| (z) \, \mathrm {d}\gamma (z) <\infty \) for all \(i = 1,\dots ,m\). If
$$\begin{aligned} \sup _{\gamma \in {\mathcal {F}}} \int _Z \psi (z) \, \mathrm {d} \gamma (z) = \sup _{\gamma \in \mathrm {ext}({\mathcal {F}})} \int _Z \psi (z) \, \mathrm {d} \gamma (z), \end{aligned}$$
(12)
then it holds that
$$\begin{aligned} \sup _{\gamma \in {\mathcal {F}}} \int _Z \psi (z) \, \mathrm {d} \gamma (z) = \sup _{\gamma \in {\mathcal {F}} \cap {\mathcal {D}}_m (Z)} \int _Z \psi (z) \, \mathrm {d} \gamma (z), \end{aligned}$$
where \({\mathcal {D}}_m (Z)\) is the set of non-negative discrete measures supported on at most m points in Z. Furthermore, if \({\mathcal {F}}\subseteq {\mathcal {P}}(Z)\) and Z is Hausdorff, then the condition (12) is satisfied.
We note that the conclusion of Proposition 1 cannot readily be drawn from the Richter-Rogosinski theorem [22, Theorem 7.32]. Indeed, in our context the Richter-Rogosinski theorem would only ensure the existence of a non-negative discrete measure \(\gamma ^\star \) that is supported on at most \(m + 1\) (instead of m) points since \(\gamma ^\star \) would have to satisfy \(m + 1\) moment conditions: the m moment constraints of problem (11) as well as the additional constraint that \(\gamma ^\star \) attains the optimal objective value of problem (11).
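As a finite-dimensional sanity check (our own illustration, not from the paper), discretizing Z to a grid turns problem (11) into an ordinary LP, and a vertex (basic) optimal solution then carries at most m non-zero weights, mirroring Proposition 1. The grid, \(\psi \), and moment data below are arbitrary illustrative choices, and the sketch assumes SciPy’s linprog is available.

```python
import numpy as np
from scipy.optimize import linprog

# Discretized instance of (11): maximize the integral of psi against a
# non-negative measure gamma on a grid standing in for Z, subject to
# m = 2 moment constraints (total mass 1 and mean 0.5).
# linprog minimizes, hence the sign flip on the objective.
z = np.linspace(0.0, 1.0, 101)          # grid standing in for Z
psi = np.sin(3.0 * z)                   # objective integrand
A_eq = np.vstack([np.ones_like(z), z])  # phi_1(z) = 1, phi_2(z) = z
b_eq = np.array([1.0, 0.5])             # v = (1, 0.5)

res = linprog(-psi, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
support_size = int((res.x > 1e-8).sum())
# A vertex solution has at most m = 2 atoms. Here sin(3z) is concave on
# [0, 1], so by Jensen's inequality all mass sits at the grid point z = 0.5
# and the optimal value is sin(1.5).
```

Any basic optimal solution returned by the solver is supported on at most m = 2 grid points, matching the sparsity guarantee of Proposition 1 in this toy setting.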
Cite this article
Yue, M.-C., Kuhn, D. & Wiesemann, W. On linear optimization over Wasserstein balls. Math. Program. 195, 1107–1122 (2022). https://doi.org/10.1007/s10107-021-01673-8