Abstract
Approximating a divergence between two probability distributions from their samples is a fundamental challenge in the statistics, information theory, and machine learning communities, because a divergence estimator can be used for various purposes such as two-sample homogeneity testing, change-point detection, and class-balance estimation. Furthermore, an approximator of the divergence between a joint distribution and the product of its marginals can be used for independence testing, which has a wide range of applications including feature selection and extraction, clustering, object matching, independent component analysis (ICA), and causality learning. In this chapter, we review recent advances in direct divergence approximation that follow the general inference principle advocated by Vladimir Vapnik: one should not solve a more general problem as an intermediate step. More specifically, direct divergence approximation avoids separately estimating the two probability distributions when approximating a divergence between them. We cover direct approximators of the Kullback–Leibler (KL) divergence, the Pearson (PE) divergence, the relative PE (rPE) divergence, and the L2-distance. Despite the overwhelming popularity of the KL divergence, we argue that the latter approximators are more useful in practice due to their computational efficiency, high numerical stability, and superior robustness against outliers.
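To make the direct approach concrete, the following minimal sketch (in Python with NumPy) approximates the PE divergence between two sample sets by fitting the density ratio p(x)/q(x) with a ridge-regularized least-squares objective, in the spirit of the uLSIF-style estimators reviewed in the chapter. The Gaussian kernel width, regularization strength, and number of basis centers below are illustrative assumptions rather than the chapter's prescribed settings; in practice they would be chosen by cross-validation.

import numpy as np

def pe_divergence(xp, xq, sigma=1.0, lam=0.1, n_basis=100):
    """Estimate PE(p||q) = (1/2) E_q[(p/q - 1)^2] from samples xp ~ p, xq ~ q.

    The ratio r(x) = p(x)/q(x) is modeled as a linear combination of
    Gaussian kernels centered at a subset of the p-samples.
    """
    rng = np.random.default_rng(0)
    idx = rng.choice(len(xp), size=min(n_basis, len(xp)), replace=False)
    centers = xp[idx]

    def design(x):
        # Gaussian kernel basis functions evaluated at each row of x
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2 * sigma ** 2))

    phi_p, phi_q = design(xp), design(xq)
    H = phi_q.T @ phi_q / len(xq)   # empirical E_q[k(x) k(x)^T]
    h = phi_p.mean(axis=0)          # empirical E_p[k(x)]
    # Analytic ridge solution: no separate density estimation is needed
    alpha = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    # Plug-in PE estimate: -(1/2) alpha^T H alpha + h^T alpha - 1/2
    return -0.5 * alpha @ H @ alpha + alpha @ h - 0.5

# Toy usage: two unit-variance Gaussians with shifted means
rng = np.random.default_rng(1)
xp = rng.normal(0.0, 1.0, size=(500, 1))
xq = rng.normal(0.5, 1.0, size=(500, 1))
print(f"Estimated PE divergence: {pe_divergence(xp, xq):.3f}")

Because the ratio is fitted directly, neither p nor q is estimated as an intermediate step, and the regularized linear system admits a closed-form solution, which illustrates the computational efficiency and numerical stability noted above.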
Acknowledgements
The author acknowledges support from the JST PRESTO program, KAKENHI 25700022, the FIRST program, and AOARD.
Cite this chapter
Sugiyama, M. (2013). Direct Approximation of Divergences Between Probability Distributions. In: Schölkopf, B., Luo, Z., Vovk, V. (eds.) Empirical Inference. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41136-6_23