Abstract
Wasserstein balls, which contain all probability measures within a pre-specified Wasserstein distance to a reference measure, have recently enjoyed wide popularity in the distributionally robust optimization and machine learning communities to formulate and solve data-driven optimization problems with rigorous statistical guarantees. In this technical note we prove that the Wasserstein ball is weakly compact under mild conditions, and we offer necessary and sufficient conditions for the existence of optimal solutions. We also characterize the sparsity of solutions if the Wasserstein ball is centred at a discrete reference measure. In comparison with the existing literature, which has proved similar results under different conditions, our proofs are self-contained and shorter, yet mathematically rigorous, and our necessary and sufficient conditions for the existence of optimal solutions are easily verifiable in practice.
Notes
We are grateful to Lorenzo Dello Schiavo, who communicated this result to us.
References
Aliprantis, C.D., Border, K.C.: Infinite Dimensional Analysis: A Hitchhiker’s Guide, 3rd edn. Springer, New York (2006)
Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein generative adversarial networks. In: Proceedings of the 34th International Conference on Machine Learning, pp. 214–223 (2017)
Billingsley, P.: Convergence of Probability Measures, 2nd edn. Wiley, New York (1999)
Billingsley, P.: Probability and Measure, 3rd edn. Wiley, New York (1995)
Blanchet, J., Murthy, K.: Quantifying distributional model risk via optimal transport. Math. Oper. Res. 44(2), 565–600 (2019)
Bogachev, V.I.: Measure Theory, vol. II. Springer, New York (2007)
Clément, P., Desch, W.: Wasserstein metric and subordination. Studia Math. 189(1), 35–52 (2008)
Dello Schiavo, L.: Heat equation on metric measure spaces. Master’s thesis, Sapienza University of Rome (2015)
Dudley, R.M.: Real Analysis and Probability. Wadsworth & Brooks/Cole, New York (1989)
Frogner, C., Zhang, C., Mobahi, H., Araya, M., Poggio, T.A.: Learning with a Wasserstein loss. Adv. Neural Inf. Process. Syst. 28, 2053–2061 (2015)
Gao, R., Kleywegt, A.J.: Distributionally robust stochastic optimization with Wasserstein distance. arXiv preprint arXiv:1604.02199 (2016)
Gibbs, A.L., Su, F.E.: On choosing and bounding probability metrics. Int. Stat. Rev. 70(3), 419–435 (2002)
Ho, N., Nguyen, X.L., Yurochkin, M., Bui, H.H., Huynh, V., Phung, D.: Multilevel clustering via Wasserstein means. In: Proceedings of the 34th International Conference on Machine Learning, pp. 1501–1509 (2017)
Kuhn, D., Esfahani, P.M., Nguyen, V.A., Shafieezadeh-Abadeh, S.: Wasserstein distributionally robust optimization: theory and applications in machine learning. In: Operations Research & Management Science in the Age of Analytics, pp. 130–166. INFORMS (2019)
Esfahani, P.M., Kuhn, D.: Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Math. Program. 171(1–2), 115–166 (2018)
Nguyen, V.A., Shafieezadeh-Abadeh, S., Yue, M.-C., Kuhn, D., Wiesemann, W.: Calculating optimistic likelihoods using (geodesically) convex optimization. Adv. Neural Inf. Process. Syst. 32 (2019)
Nguyen, V.A., Shafieezadeh-Abadeh, S., Yue, M.-C., Kuhn, D., Wiesemann, W.: Optimistic distributionally robust optimization for nonparametric likelihood approximation. Adv. Neural Inf. Process. Syst. 32 (2019)
Owhadi, H., Scovel, C.: Extreme points of a ball about a measure with finite support. Commun. Math. Sci. 15(1), 77–96 (2017)
Pflug, G., Wozabal, D.: Ambiguity in portfolio selection. Quant. Finance 7(4), 435–442 (2007)
Pichler, A., Xu, H.: Quantitative stability analysis for minimax distributionally robust risk optimization. Math. Program., available online (2018)
Pinelis, I.: On the extreme points of moments sets. Math. Methods Oper. Res. 83(3), 325–349 (2016)
Shapiro, A., Dentcheva, D., Ruszczynski, A.: Lectures on Stochastic Programming. SIAM, Philadelphia (2009)
Villani, C.: Optimal Transport: Old and New, vol. 338. Springer, New York (2008)
Wozabal, D.: A framework for optimization under ambiguity. Ann. Oper. Res. 193(1), 21–47 (2012)
Yue, M.-C., Kuhn, D., Wiesemann, W.: On linear optimization over Wasserstein balls. arXiv preprint arXiv:2004.07162 (2021)
Zhao, C., Guan, Y.: Data-driven risk-averse stochastic optimization with Wasserstein metric. Oper. Res. Lett. 46(2), 262–267 (2018)
Acknowledgements
The authors gratefully acknowledge funding from the Swiss National Science Foundation under Grant BSCGI0_157733, the UK’s Engineering and Physical Sciences Research Council under Grant EP/R045518/1 and the Hong Kong Research Grants Council under Grant 25302420.
Appendices
Appendix A: Auxiliary measure-theoretic results
We review some well-known facts from measure theory that we use to prove our results. We first recall a connection between the notions of tightness and weak sequential compactness of collections of probability measures.
Definition 1
A collection \({\mathcal {S}} \subseteq {\mathcal {P}} (X)\) of probability measures is tight if for any \(\epsilon >0\), there exists a compact subset \(B \subseteq X\) such that \(\mu (X \setminus B) \le \epsilon \) for all \(\mu \in {\mathcal {S}}\).
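To build intuition, here is a standard illustration (our own, not taken from the paper): on \(X = {\mathbb {R}}\), a uniform second-moment bound forces tightness via Markov’s inequality, since for the compact set \(B = [-R, R]\),

```latex
\int_X x^2 \, \mathrm{d}\mu(x) \le C \;\; \forall \mu \in \mathcal{S}
\quad\Longrightarrow\quad
\mu\bigl(X \setminus [-R, R]\bigr)
\;\le\; \frac{1}{R^2} \int_X x^2 \, \mathrm{d}\mu(x)
\;\le\; \frac{C}{R^2}
\;\le\; \epsilon
\quad \text{whenever } R \ge \sqrt{C/\epsilon}.
```

By contrast, the family \(\{\delta _k\}_{k \in {\mathbb {N}}}\) of Dirac measures at the positive integers is not tight: every compact \(B \subseteq {\mathbb {R}}\) is bounded, so \(\delta _k({\mathbb {R}} \setminus B) = 1\) for all sufficiently large k.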
Definition 2
A sequence \(\{\mu ^k\}_k \subseteq {\mathcal {P}} (X)\) of probability measures converges weakly to \(\mu ^\infty \in {\mathcal {P}} (X)\) if for any bounded and continuous function g on X, we have
$$\begin{aligned} \lim _{k \rightarrow \infty } \int _X g(x) \, \mathrm {d} \mu ^k (x) = \int _X g(x) \, \mathrm {d} \mu ^\infty (x). \end{aligned}$$
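As a purely numerical illustration (not part of the paper), the measures \(\mu ^k = N(0, 1/k)\) converge weakly to the Dirac measure \(\delta _0\); for the bounded continuous test function \(g(x) = e^{-x^2}\), a Monte Carlo sketch shows the integrals \(\int _X g \, \mathrm {d} \mu ^k\) approaching \(g(0) = 1\). All names below are illustrative choices.

```python
import numpy as np

def integral_estimate(g, scale, n_samples=200_000, seed=0):
    """Monte Carlo estimate of the integral of g against N(0, scale^2)."""
    rng = np.random.default_rng(seed)
    return float(g(rng.normal(0.0, scale, n_samples)).mean())

g = lambda x: np.exp(-x ** 2)  # bounded and continuous on R, with g(0) = 1

# mu^k = N(0, 1/k): the integrals increase towards g(0) = 1 as k grows,
# matching the closed form E[g] = 1 / sqrt(1 + 2/k) for this Gaussian family.
estimates = [integral_estimate(g, 1.0 / np.sqrt(k)) for k in (1, 10, 100, 1000)]
```

For k = 1 the closed-form value is \(1/\sqrt{3} \approx 0.577\), and the estimates increase towards 1 as k grows, in line with the definition of weak convergence.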
Definition 3
A collection \({\mathcal {S}} \subseteq {\mathcal {P}} (X)\) of probability measures is weakly sequentially compact if every sequence in \({\mathcal {S}}\) has a subsequence that converges weakly to an element of \({\mathcal {S}}\).
The concepts of tightness and weak sequential compactness are connected by Prokhorov’s Theorem; see, for example, [3, Theorem 5.1].
Theorem 5
(Prokhorov’s Theorem) A collection \({\mathcal {S}} \subseteq {\mathcal {P}} (X)\) of probability measures is tight if and only if the closure of \({\mathcal {S}}\) is weakly sequentially compact in \({\mathcal {P}}(X)\).
Note that since the space \({\mathcal {P}}(X)\) is metrizable, sequential compactness and compactness of subsets of \({\mathcal {P}}(X)\) are equivalent to each other.
The following lemma, which is excerpted from the Portmanteau Theorem (see for example [4, Problem 29.1(c)]), provides a useful characterization of weak convergence.
Lemma 3
A sequence \(\{\mu ^k \}_k \subseteq {\mathcal {P}} (X)\) of probability measures converges weakly to \(\mu ^\infty \in {\mathcal {P}} (X)\) if and only if for any upper bounded and upper semi-continuous function g on X, we have
$$\begin{aligned} \limsup _{k \rightarrow \infty } \int _X g(x) \, \mathrm {d} \mu ^k (x) \le \int _X g(x) \, \mathrm {d} \mu ^\infty (x). \end{aligned}$$
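A standard example (our own, not from the paper) shows why the characterization involves an inequality rather than an equality: take \(\mu ^k = \delta _{1/k}\), which converges weakly to \(\mu ^\infty = \delta _0\), and let \(g = \mathbb {1}_{(-\infty , 0]}\), which is bounded and upper semi-continuous. Then

```latex
\limsup_{k \to \infty} \int_X g \, \mathrm{d}\mu^k
  \;=\; \limsup_{k \to \infty} g(1/k)
  \;=\; 0
  \;<\; 1
  \;=\; g(0)
  \;=\; \int_X g \, \mathrm{d}\mu^\infty ,
```

so the inequality can be strict even along a weakly convergent sequence.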
Appendix B: Basic feasible solutions in infinite-dimensional linear programming
It is well known that if a finite-dimensional linear program with m equality constraints has an optimal solution, then it also has an optimal basic feasible solution with at most m non-zero entries. An infinite-dimensional analogue of this fact is proved in [21, Corollary 5 and Proposition 6(v)]. To state this result, let Z be a topological space, let \({\mathcal {M}}_+ (Z)\) be the set of non-negative finite Borel measures supported on Z, let \(\psi , \phi _1,\dots ,\phi _m : Z \rightarrow {\mathbb {R}}\) be Borel functions, and let \(v \in {\mathbb {R}}^m\). Consider now the optimization problem
$$\begin{aligned} \sup _{\gamma \in {\mathcal {M}}_+ (Z)} \left\{ \, \int _Z \psi (z) \, \mathrm {d} \gamma (z) \; : \; \int _Z \phi _i (z) \, \mathrm {d} \gamma (z) = v_i \;\; \forall i = 1,\dots ,m \right\} \end{aligned}$$
(11)
and denote by \({\mathcal {F}}\) the feasible region of (11) and by \(\mathrm {ext}({\mathcal {F}})\) the set of extreme points of \({\mathcal {F}}\).
Proposition 1
Suppose that for all \(\gamma \in {\mathcal {F}}\), at least one of the integrals \(\int _Z [\psi (z)]_+ \, \mathrm {d} \gamma (z)\) and \(\int _Z [-\psi (z)]_+ \, \mathrm {d} \gamma (z)\) is finite and that \(\int _Z |\phi _i| (z) \, \mathrm {d}\gamma (z) <\infty \) for all \(i = 1,\dots ,m\). If
$$\begin{aligned} \sup _{\gamma \in {\mathcal {F}}} \int _Z \psi (z) \, \mathrm {d} \gamma (z) = \sup _{\gamma \in \mathrm {ext}({\mathcal {F}})} \int _Z \psi (z) \, \mathrm {d} \gamma (z), \end{aligned}$$
(12)
then it holds that
$$\begin{aligned} \sup _{\gamma \in {\mathcal {F}}} \int _Z \psi (z) \, \mathrm {d} \gamma (z) = \sup _{\gamma \in {\mathcal {F}} \cap {\mathcal {D}}_m (Z)} \int _Z \psi (z) \, \mathrm {d} \gamma (z), \end{aligned}$$
where \({\mathcal {D}}_m (Z)\) is the set of non-negative discrete measures supported on at most m points in Z. Furthermore, if \({\mathcal {F}}\subseteq {\mathcal {P}}(Z)\) and Z is Hausdorff, then the condition (12) is satisfied.
We note that the conclusion of Proposition 1 cannot readily be drawn from the Richter-Rogosinski theorem [22, Theorem 7.32]. Indeed, in our context the Richter-Rogosinski theorem would only ensure the existence of a non-negative discrete measure \(\gamma ^\star \) that is supported on at most \(m + 1\) (instead of m) points since \(\gamma ^\star \) would have to satisfy \(m + 1\) moment conditions: the m moment constraints of problem (11) as well as the additional constraint that \(\gamma ^\star \) attains the optimal objective value of problem (11).
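As a finite-dimensional sanity check (our own illustration, not from the paper), discretizing Z to a grid turns problem (11) into an ordinary LP, and a vertex (basic) optimal solution then carries at most m non-zero weights, mirroring Proposition 1. The grid, \(\psi \), and moment data below are arbitrary illustrative choices, and the sketch assumes SciPy’s linprog is available.

```python
import numpy as np
from scipy.optimize import linprog

# Discretized instance of (11): maximize the integral of psi against a
# non-negative measure gamma on a grid standing in for Z, subject to
# m = 2 moment constraints (total mass 1 and mean 0.5).
# linprog minimizes, hence the sign flip on the objective.
z = np.linspace(0.0, 1.0, 101)          # grid standing in for Z
psi = np.sin(3.0 * z)                   # objective integrand
A_eq = np.vstack([np.ones_like(z), z])  # phi_1(z) = 1, phi_2(z) = z
b_eq = np.array([1.0, 0.5])             # v = (1, 0.5)

res = linprog(-psi, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
support_size = int((res.x > 1e-8).sum())
# A vertex solution has at most m = 2 atoms. Here sin(3z) is concave on
# [0, 1], so by Jensen's inequality all mass sits at the grid point z = 0.5
# and the optimal value is sin(1.5).
```

Any basic optimal solution returned by the solver is supported on at most m = 2 grid points, matching the sparsity guarantee of Proposition 1 in this toy setting.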
Cite this article
Yue, M.-C., Kuhn, D. & Wiesemann, W. On linear optimization over Wasserstein balls. Math. Program. 195, 1107–1122 (2022). https://doi.org/10.1007/s10107-021-01673-8