Abstract
A lattice Gaussian distribution of given mean and covariance matrix is the discrete distribution supported on a lattice that maximizes Shannon’s entropy under these mean and covariance constraints. Lattice Gaussian distributions find applications in cryptography and in machine learning. The set of Gaussian distributions on a given lattice can be handled as a discrete exponential family whose partition function is related to the Riemann theta function. In this paper, we first report a formula for the Kullback–Leibler divergence between two lattice Gaussian distributions and then show how to efficiently approximate it numerically, either via Rényi’s \(\alpha \)-divergences or via the projective \(\gamma \)-divergences. We illustrate how to use the Kullback–Leibler divergence to calculate the Chernoff information on the dually flat structure of the manifold of lattice Gaussian distributions.
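The abstract's setting can be illustrated numerically in the simplest case of the integer lattice \(\mathbb {Z}\): the partition function is then a Jacobi theta series, which the sketch below (not taken from the paper; truncation radius, parameter values, and function names are illustrative assumptions) approximates by summing over a finite window, and the Kullback–Leibler divergence is computed by direct summation over that window.

```python
import math

def discrete_gaussian_pmf(mu, sigma, support):
    """Probability mass function of a 1-D lattice Gaussian on the integers,
    normalized over a truncated support. The normalizer Z is a truncated
    Jacobi theta series (the partition function of the exponential family)."""
    w = [math.exp(-((k - mu) ** 2) / (2.0 * sigma ** 2)) for k in support]
    Z = sum(w)  # truncated theta-series partition function
    return [wi / Z for wi in w]

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) by direct summation,
    skipping zero-probability atoms of p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Truncation radius chosen large enough that the tail mass is negligible
# for standard deviations of order 1 (an illustrative choice).
support = range(-50, 51)
p = discrete_gaussian_pmf(0.0, 1.0, support)
q = discrete_gaussian_pmf(0.5, 1.2, support)
print(kl_divergence(p, q))  # strictly positive since p != q
```

In higher dimensions the same truncation idea applies, but the sum runs over a box in \(\mathbb {Z}^d\) and the normalizer becomes a truncated Riemann theta function.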
Notes
Definition: n univariate functions \(f_1(x),\ldots , f_n(x)\) are said to be linearly dependent if there exist n constants \(c_1,\ldots , c_n\), not all zero, such that \(\sum _{i=1}^n c_i f_i(x)=0\) for all x in an interval \(I\subset \mathbb {R}\). Otherwise, the functions are said to be linearly independent.
Acknowledgements
We thank the reviewers for the constructive and helpful suggestions on this paper.
Cite this article
Nielsen, F. The Kullback–Leibler Divergence Between Lattice Gaussian Distributions. J Indian Inst Sci 102, 1177–1188 (2022). https://doi.org/10.1007/s41745-021-00279-5