Noisy Monte Carlo: convergence of Markov chains with approximate transition kernels

Alquier, P.; Friel, N.; Everitt, R.; Boland, A.

doi:10.1007/s11222-014-9521-x

Published: 10 December 2014

Volume 26, pages 29–47, (2016)

Statistics and Computing

P. Alquier¹,
N. Friel²,
R. Everitt³ &
A. Boland²


Abstract

Monte Carlo algorithms often aim to draw from a distribution $\pi $ by simulating a Markov chain with transition kernel $P$ such that $\pi $ is invariant under $P$. However, there are many situations in which it is impractical or impossible to draw from the transition kernel $P$. This is the case, for instance, with massive datasets, where it is prohibitively expensive to calculate the likelihood, and with intractable likelihood models arising from, for example, Gibbs random fields, such as those found in spatial statistics and network analysis. A natural approach in these cases is to replace $P$ by an approximation $\hat{P}$. Using theory from the stability of Markov chains, we explore a variety of situations where it is possible to quantify how ‘close’ the chain given by the transition kernel $\hat{P}$ is to the chain given by $P$. We apply these results to several examples from spatial statistics and network analysis.

References

Ahn, S., Korattikara, A., Welling, M.: Bayesian posterior sampling via stochastic gradient Fisher scoring. In: Proceedings of the 29th International Conference on Machine Learning (2012)
Andrieu, C., Roberts, G.: The pseudo-marginal approach for efficient Monte-Carlo computations. Ann. Stat. 37(2), 697–725 (2009)
Andrieu, C., Vihola, M.: Convergence properties of pseudo-marginal Markov chain Monte Carlo algorithms. Preprint arXiv:1210.1484 (2012)
Bardenet, R., Doucet, A., Holmes, C.: Towards scaling up Markov chain Monte Carlo: an adaptive subsampling approach. In: Proceedings of the 31st International Conference on Machine Learning (2014)
Beaumont, M.A.: Estimation of population growth or decline in genetically monitored populations. Genetics 164, 1139–1160 (2003)
Besag, J.E.: Spatial interaction and the statistical analysis of lattice systems. J. R. Stat. Soc. Ser. B 36, 192–236 (1974)
Bottou, L., Bousquet, O.: The tradeoffs of large-scale learning. In: Sra, S., Nowozin, S., Wright, S.J. (eds.) Optimization for Machine Learning, pp. 351–368. MIT Press, Cambridge (2011)
Bühlmann, P., Van de Geer, S.: Statistics for High-Dimensional Data. Springer, Berlin (2011)
Caimo, A., Friel, N.: Bayesian inference for exponential random graph models. Soc. Netw. 33, 41–55 (2011)
Dalalyan, A., Tsybakov, A.B.: Sparse regression learning by aggregation and Langevin. J. Comput. Syst. Sci. 78(5), 1423–1443 (2012)
Ferré, D., Hervé, L., Ledoux, J.: Regular perturbation of $V$-geometrically ergodic Markov chains. J. Appl. Probab. 50(1), 184–194 (2013)
Friel, N., Pettitt, A.N.: Likelihood estimation and inference for the autologistic model. J. Comput. Graph. Stat. 13, 232–246 (2004)
Friel, N., Pettitt, A.N., Reeves, R., Wit, E.: Bayesian inference in hidden Markov random fields for binary data defined on large lattices. J. Comput. Graph. Stat. 18, 243–261 (2009)
Friel, N., Rue, H.: Recursive computing and simulation-free inference for general factorizable models. Biometrika 94, 661–672 (2007)
Geman, S., Geman, D.: Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 6, 721–741 (1984)
Gilks, W., Roberts, G., George, E.: Adaptive direction sampling. Statistician 43, 179–189 (1994)
Girolami, M., Calderhead, B.: Riemann manifold Langevin and Hamiltonian Monte Carlo methods (with discussion). J. R. Stat. Soc. Ser. B 73, 123–214 (2011)
Golub, G., Van Loan, C.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore (1996)
Kartashov, N.V.: Strong Stable Markov Chains. VSP, Utrecht (1996)
Korattikara, A., Chen, Y., Welling, M.: Austerity in MCMC land: cutting the Metropolis–Hastings budget. In: Proceedings of the 31st International Conference on Machine Learning, pp. 681–688 (2014)
Liang, F., Jin, I.-H.: An auxiliary variables Metropolis–Hastings algorithm for sampling from distributions with intractable normalizing constants. Technical report (2011)
Marin, J.-M., Pudlo, P., Robert, C.P., Ryder, R.J.: Approximate Bayesian computational methods. Stat. Comput. 22(6), 1167–1180 (2012)
Meyn, S., Tweedie, R.L.: Markov Chains and Stochastic Stability. Cambridge University Press, Cambridge (1993)
Mitrophanov, A.Y.: Sensitivity and convergence of uniformly ergodic Markov chains. J. Appl. Probab. 42, 1003–1014 (2005)
Møller, J., Pettitt, A.N., Reeves, R., Berthelsen, K.K.: An efficient Markov chain Monte-Carlo method for distributions with intractable normalizing constants. Biometrika 93, 451–458 (2006)
Murray, I., Ghahramani, Z., MacKay, D.: MCMC for doubly-intractable distributions. In: Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence UAI06, AUAI Press, Arlington, Virginia (2006)
Nicholls, G.K., Fox, C., Watt, A.M.: Coupled MCMC with a randomized acceptance probability. Preprint arXiv:1205.6857 (2012)
Propp, J., Wilson, D.: Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Struct. Algorithms 9, 223–252 (1996)
Reeves, R., Pettitt, A.N.: Efficient recursions for general factorisable models. Biometrika 91, 751–757 (2004)
Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400–407 (1951)
Roberts, G.O., Stramer, O.: Langevin diffusions and Metropolis–Hastings algorithms. Methodol. Comput. Appl. Probab. 4, 337–357 (2002)
Roberts, G.O., Tweedie, R.L.: Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli 2(4), 341–363 (1996a)
Roberts, G.O., Tweedie, R.L.: Geometric convergence and central limit theorems for multidimensional Hastings and Metropolis algorithms. Biometrika 83(1), 95–110 (1996b)
Robins, G., Pattison, P., Kalish, Y., Lusher, D.: An introduction to exponential random graph models for social networks. Soc. Netw. 29(2), 169–348 (2007)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58(1), 267–288 (1996)
Valiant, L.: A theory of the learnable. Commun. ACM 27(11), 1134–1142 (1984)
Welling, M., Teh, Y.W.: Bayesian learning via stochastic gradient Langevin dynamics. In: Proceedings of the 28th International Conference on Machine Learning, pp. 681–688 (2011)

Acknowledgments

The Insight Centre for Data Analytics is supported by Science Foundation Ireland under Grant Number SFI/12/RC/2289. Nial Friel’s research was also supported by a Science Foundation Ireland grant: 12/IP/1424.

Author information

Authors and Affiliations

ENSAE, Paris, France
P. Alquier
School of Mathematical Sciences and Insight: The National Center for Data Analytics, University College Dublin, Dublin, Ireland
N. Friel & A. Boland
Department of Mathematics and Statistics, University of Reading, Reading, UK
R. Everitt

Corresponding author

Correspondence to N. Friel.

Appendix: Proofs

Proof of Corollary 2.3

We apply Theorem 2.1. First, note that we have

$$\begin{aligned} P(\theta ,\mathrm{d}\theta ')&= \delta _{\theta }(\mathrm{d}\theta ') \left[ 1-\int \mathrm{d}t\; h(t|\theta ) \min \left( 1,\alpha (\theta ,t)\right) \right] \\&+\, h(\theta '|\theta ) \min \left( 1,\alpha (\theta ,\theta ')\right) \end{aligned}$$

and

$$\begin{aligned} \hat{P}(\theta ,\mathrm{d}\theta ')&= \delta _{\theta }(\mathrm{d}\theta ')\, \Biggl [1-\iint \mathrm{d}t\; \mathrm{d}y'\; h(t|\theta ) F_{t}(y') \min \left( 1,\hat{\alpha }(\theta ,t,y')\right) \Biggr ]\\&+ \int \mathrm{d}y'F_{\theta '}(y') \Bigl [ h(\theta '|\theta ) \min \left( 1,\hat{\alpha }(\theta ,\theta ',y')\right) \Bigr ]. \end{aligned}$$

So we can write

$$\begin{aligned}&(P-\hat{P})(\theta ,\mathrm{d}\theta ')\\&\quad = \delta _{\theta }(\mathrm{d}\theta ') \iint \mathrm{d}t\; \mathrm{d}y'\; h(t|\theta ) F_{t}(y') \Bigl [ \min \left( 1,\hat{\alpha }(\theta ,t,y')\right) \\&\quad - \min \left( 1,\alpha (\theta ,t)\right) \Bigr ]+\int \mathrm{d}y'\; F_{\theta '}(y') \Bigl [ h(\theta '|\theta ) \min \left( 1,\alpha (\theta ,\theta ')\right) \\&\quad -\, h(\theta '|\theta ) \min \left( 1,\hat{\alpha }(\theta ,\theta ',y')\right) \Bigr ] \end{aligned}$$

and, finally, bounding each of the two terms below by the same double integral (after renaming $t$ as $\theta '$), and then using $|\min (1,a)-\min (1,b)|\le |a-b|$ together with the assumption $\mathbb {E}_{y'\sim F_{\theta '}}|\hat{\alpha }(\theta ,\theta ',y')-\alpha (\theta ,\theta ')|\le \delta (\theta ,\theta ')$ of Corollary 2.3,

$$\begin{aligned} \Vert P-\hat{P}\Vert&= \frac{1}{2}\sup _{\theta } \int |P-\hat{P}|(\theta ,\mathrm{d}\theta ')\\&= \frac{1}{2}\sup _{\theta } \Biggl \{ \Biggl | \iint \mathrm{d}t\; \mathrm{d}y'\; h(t|\theta ) F_{t}(y') \Bigl [ \min \left( 1,\hat{\alpha }(\theta ,t,y')\right) -\min \left( 1,\alpha (\theta ,t)\right) \Bigr ] \Biggr |\\&\quad +\, \int \mathrm{d}\theta '\; \Biggl | \int \mathrm{d}y'\; F_{\theta '}(y') h(\theta '|\theta ) \Bigl [ \min \left( 1,\alpha (\theta ,\theta ')\right) - \min \left( 1,\hat{\alpha }(\theta ,\theta ',y')\right) \Bigr ] \Biggr |\Biggr \}\\&\le \sup _{\theta } \iint \mathrm{d}y'\; \mathrm{d}\theta '\; F_{\theta '}(y') h(\theta '|\theta ) \Bigl | \min \left( 1,\alpha (\theta ,\theta ')\right) - \min \left( 1,\hat{\alpha }(\theta ,\theta ',y')\right) \Bigr |\\&= \sup _{\theta } \int \mathrm{d}\theta '\; h(\theta '|\theta ) \int \mathrm{d}y'\; F_{\theta '}(y') \Bigl | \min (1,\alpha (\theta ,\theta ')) - \min (1,\hat{\alpha }(\theta ,\theta ',y')) \Bigr |\\&\le \sup _{\theta } \int \mathrm{d}\theta '\; h(\theta '|\theta ) \delta (\theta ,\theta '). \end{aligned}$$

$\square $
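The kernels $P$ and $\hat{P}$ written out in this proof can be made concrete with a small simulation. The Python sketch below runs an exact and a noisy Metropolis–Hastings chain on a hypothetical toy model, chosen only because its normalising constant is known: $n$ independent binary sites with $q_\theta(y)=\exp(\theta\, s(y))$, where $s(y)$ counts the ones, so that $Z(\theta)=(1+e^{\theta})^n$. The noisy chain replaces $Z(\theta)/Z(\theta')$ in $\alpha(\theta,\theta')$ by the average of $q_\theta(y'_i)/q_{\theta'}(y'_i)$ over $N$ auxiliary draws $y'_i\sim f(\cdot|\theta')$, which is one way of building an $\hat{\alpha}(\theta,\theta',y')$ of the kind analysed here. The standard normal prior, the Gaussian random-walk proposal $h$ and all function names are assumptions made for illustration.

```python
import math
import random


def log_prior(theta):
    # Assumed prior: standard normal (up to an additive constant).
    return -0.5 * theta * theta


def log_z(theta, n_sites):
    # Closed-form log normalising constant of the toy model: n * log(1 + e^theta).
    return n_sites * math.log1p(math.exp(theta))


def sample_stat(theta, n_sites, rng):
    # Exact draw of s(y') for y' ~ f(.|theta): each site is 1 w.p. e^theta / (1 + e^theta).
    p = 1.0 / (1.0 + math.exp(-theta))
    return sum(rng.random() < p for _ in range(n_sites))


def run_chain(s_obs, n_sites, n_iter, n_aux, step, rng, noisy):
    """Random-walk MH targeting pi(theta | y); noisy=True uses an estimated Z-ratio."""
    theta, trace = 0.0, []
    for _ in range(n_iter):
        prop = theta + step * rng.gauss(0.0, 1.0)  # symmetric h: the h-ratio cancels
        log_a = log_prior(prop) - log_prior(theta) + (prop - theta) * s_obs
        if noisy:
            # Estimate Z(theta)/Z(prop) by averaging q_theta(y')/q_prop(y'), y'_i ~ f(.|prop).
            w = sum(math.exp((theta - prop) * sample_stat(prop, n_sites, rng))
                    for _ in range(n_aux)) / n_aux
            log_a += math.log(w)
        else:
            log_a += log_z(theta, n_sites) - log_z(prop, n_sites)
        if rng.random() < math.exp(min(0.0, log_a)):
            theta = prop
        trace.append(theta)
    return trace


rng = random.Random(1)
exact = run_chain(s_obs=12, n_sites=20, n_iter=4000, n_aux=20, step=0.5, rng=rng, noisy=False)
noisy = run_chain(s_obs=12, n_sites=20, n_iter=4000, n_aux=20, step=0.5, rng=rng, noisy=True)
mean_exact = sum(exact[1000:]) / 3000
mean_noisy = sum(noisy[1000:]) / 3000
print(f"posterior mean, exact kernel P:     {mean_exact:.3f}")
print(f"posterior mean, noisy kernel P-hat: {mean_noisy:.3f}")
```

With a moderate number of auxiliary draws the two chains should give similar posterior summaries; decreasing `n_aux` makes $\hat{\alpha}$ noisier and can move $\hat{P}$ further from $P$.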

Proof of Lemma 1

We again use Theorem 2.1. Note that

Now, note that

where $X\sim \mathcal {N}(0,I)$ and . Then:

$$\begin{aligned} \mathbb {E}\Biggl |&1 - \exp \left( a^T X - \frac{\Vert a\Vert ^2 }{2} \right) \Biggr | \\&= \exp \left( - \frac{\Vert a\Vert ^2}{2} \right) \mathbb {E}\Biggl | \exp \left( a^T X \right) - \exp \left( \frac{\Vert a\Vert ^2 }{2} \right) \Biggr |\\&= \exp \left( - \frac{\Vert a\Vert ^2 }{2} \right) \mathbb {E}\Biggl | \exp \left( a^T X \right) - \mathbb {E}\left[ \exp \left( a^T X \right) \right] \Biggr |\\&\le \exp \left( - \frac{\Vert a\Vert ^2 }{2} \right) \sqrt{\mathrm{Var}[\exp \left( a^T X \right) ]}\\&=\exp \left( - \frac{\Vert a\Vert ^2 }{2} \right) \sqrt{\mathbb {E}\left[ \exp \left( 2 a^T X \right) \right] -\mathbb {E}\left[ \exp \left( a^T X \right) \right] ^2}\\&= \exp \left( - \frac{\Vert a\Vert ^2 }{2} \right) \sqrt{\exp (2 \Vert a\Vert ^2) - \exp (\Vert a\Vert ^2 )} \\&= \sqrt{\exp (\Vert a\Vert ^2) -1 }. \end{aligned}$$

So finally,

$\square $
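The moment bound at the heart of this proof is easy to check numerically. The Python sketch below estimates $\mathbb{E}|1-\exp(a^TX-\Vert a\Vert^2/2)|$, with $X\sim\mathcal{N}(0,I)$, by Monte Carlo for a few arbitrary vectors $a$ and compares it with $\sqrt{\exp(\Vert a\Vert^2)-1}$. For reference it also prints the closed form $2\,\mathrm{erf}(\Vert a\Vert/(2\sqrt{2}))$ of the left-hand side, which follows from a standard log-normal computation and is not part of the original proof; the function names are hypothetical.

```python
import math
import random


def mc_abs_moment(a, n_samples, rng):
    # Monte Carlo estimate of E|1 - exp(a^T X - ||a||^2 / 2)| with X ~ N(0, I).
    sq_norm = sum(ai * ai for ai in a)
    total = 0.0
    for _ in range(n_samples):
        dot = sum(ai * rng.gauss(0.0, 1.0) for ai in a)  # a^T X ~ N(0, ||a||^2)
        total += abs(1.0 - math.exp(dot - 0.5 * sq_norm))
    return total / n_samples


def closed_form(a):
    # Exact value 2 erf(||a|| / (2 sqrt 2)) of the same expectation (log-normal computation).
    return 2.0 * math.erf(math.sqrt(sum(ai * ai for ai in a)) / (2.0 * math.sqrt(2.0)))


def lemma1_bound(a):
    # Right-hand side sqrt(exp(||a||^2) - 1) obtained in the proof.
    return math.sqrt(math.expm1(sum(ai * ai for ai in a)))


rng = random.Random(0)
results = []
for a in ([0.1, 0.2], [0.5, -0.3, 0.2], [1.0, 0.5]):
    row = (mc_abs_moment(a, 20000, rng), closed_form(a), lemma1_bound(a))
    results.append(row)
    print(f"a={a}: MC {row[0]:.4f}, exact {row[1]:.4f}, bound {row[2]:.4f}")
```

The gap between the exact value and the bound widens as $\Vert a\Vert$ grows, which reflects the looseness of the variance step $\mathbb{E}|W-\mathbb{E}W|\le\sqrt{\mathrm{Var}\,W}$ for large $\Vert a\Vert$.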

Proof of Lemma 2

We only have to check that

$$\begin{aligned}&\mathbb {E}_{y'\sim F_{\theta '}} \left| \hat{\alpha }(\theta ,\theta ',y')-\alpha (\theta ,\theta ')\right| \\&\quad = \int \mathrm{d}y'\; F_{\theta '}(y') \Bigl | \alpha (\theta ,\theta ') - \hat{\alpha }(\theta ,\theta ',y') \Bigr |\\&\quad =\frac{h(\theta |\theta ')\pi (\theta ')q_{\theta '}(y)}{h(\theta '|\theta )\pi (\theta )q_{\theta }(y)}\\&\quad \quad \times \,\,\mathbb {E}_{y_1',\dots ,y_N'\sim f(\cdot |\theta ')} \left| \frac{1}{N}\sum _{i=1}^{N} \frac{q_{\theta }(y_i')}{q_{\theta '}(y_i')} - \frac{Z(\theta )}{Z(\theta ')} \right| \\&\quad \le \frac{1}{\sqrt{N}} \frac{h(\theta |\theta ')\pi (\theta ')q_{\theta '}(y)}{h(\theta '|\theta )\pi (\theta )q_{\theta }(y)} \sqrt{\mathrm{Var}_{y_1 '\sim f(y_1 '|\theta ')} \left( \frac{q_{\theta }(y_1')}{q_{\theta '}(y_1')} \right) }. \end{aligned}$$

The last step holds because each term $q_{\theta }(y_i')/q_{\theta '}(y_i')$ has expectation $Z(\theta )/Z(\theta ')$ under $f(\cdot |\theta ')$, so the Cauchy–Schwarz inequality $\mathbb {E}|\bar{W}-\mathbb {E}\bar{W}|\le \sqrt{\mathrm{Var}(\bar{W})}$ applies to the average $\bar{W}$ of $N$ i.i.d. terms.

$\square $
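The key quantity in this proof is the importance-sampling estimate of $Z(\theta)/Z(\theta')$ and its $1/\sqrt{N}$ error. The Python sketch below uses a hypothetical toy model as a test bed ($n$ independent binary sites, $q_\theta(y)=\exp(\theta\, s(y))$ with $s(y)$ the number of ones, so $Z(\theta)=(1+e^\theta)^n$), where both the ratio and the variance of a single weight have closed forms, and compares the average absolute error of the estimator over repeated runs with $\sqrt{\mathrm{Var}/N}$. The parameter values and function names are illustrative assumptions.

```python
import math
import random


def exact_ratio(theta, theta_p, n_bits):
    # Z(theta) / Z(theta') for the toy model Z(theta) = (1 + e^theta)^n.
    return ((1.0 + math.exp(theta)) / (1.0 + math.exp(theta_p))) ** n_bits


def weight_variance(theta, theta_p, n_bits):
    # Variance of one weight exp((theta - theta') s(y')) with y' ~ f(.|theta').
    def mgf(c):
        # E[exp(c s(y'))] for s(y') ~ Binomial(n_bits, e^theta' / (1 + e^theta')).
        return ((1.0 + math.exp(theta_p + c)) / (1.0 + math.exp(theta_p))) ** n_bits

    c = theta - theta_p
    return mgf(2.0 * c) - mgf(c) ** 2


def ratio_estimate(theta, theta_p, n_bits, n_aux, rng):
    # (1/N) sum_i q_theta(y'_i) / q_theta'(y'_i) with exact draws y'_i ~ f(.|theta').
    p = 1.0 / (1.0 + math.exp(-theta_p))
    total = 0.0
    for _ in range(n_aux):
        s = sum(rng.random() < p for _ in range(n_bits))
        total += math.exp((theta - theta_p) * s)
    return total / n_aux


rng = random.Random(0)
theta, theta_p, n_bits, n_aux = 0.3, 0.5, 10, 50
target = exact_ratio(theta, theta_p, n_bits)
errors = [abs(ratio_estimate(theta, theta_p, n_bits, n_aux, rng) - target) for _ in range(200)]
mean_abs_err = sum(errors) / len(errors)
bound = math.sqrt(weight_variance(theta, theta_p, n_bits) / n_aux)
print(f"mean |error| = {mean_abs_err:.4f}, sqrt(Var/N) = {bound:.4f}")
```

Multiplying `n_aux` by four should roughly halve both printed numbers, which is the $1/\sqrt{N}$ rate appearing in the bound.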

Proof of Theorem 3.1

Under the assumptions of Theorem 3.1, note that (4) leads to

$$\begin{aligned} \alpha (\theta _n,\theta ') = \frac{\pi (\theta ')q_{\theta '}(y) Z(\theta _n)}{\pi (\theta _n)q_{\theta _n}(y)Z(\theta ') } \frac{h(\theta _n|\theta ')}{h(\theta '|\theta _n)} \ge \frac{1}{c_{\pi }^2 c_{h}^2 \mathcal {K}^4}. \end{aligned} \tag{10}$$

Let us consider any measurable subset $B$ of $\varTheta $ and $\theta \in \varTheta $. We have

$$\begin{aligned} P(\theta ,B)&= \int _{B} \delta _{\theta }(\mathrm{d}\theta ') \left[ 1-\int \mathrm{d}t\; h(t|\theta ) \min \left( 1,\alpha (\theta ,t)\right) \right] \\&\quad +\int _B \mathrm{d}\theta '\; h(\theta '|\theta ) \min \left( 1,\alpha (\theta ,\theta ')\right) \\&\ge \int _B \mathrm{d}\theta '\; h(\theta '|\theta ) \min \left( 1,\alpha (\theta ,\theta ')\right) \\&\ge \frac{1}{c_{\pi }^2 c_{h}^2 \mathcal {K}^4} \int _B \mathrm{d}\theta '\; h(\theta '|\theta ) \text { thanks to~(10)}\\&\ge \frac{1}{c_{\pi }^2 c_{h}^3 \mathcal {K}^4} \int _B \mathrm{d}\theta '. \end{aligned}$$

This proves that $\varTheta $ is a small set for the Lebesgue measure on $\varTheta $ (multiplied by the constant $1/(c_{\pi }^2 c_{h}^3 \mathcal {K}^4)$). According to Theorem 16.0.2, page 394, in Meyn and Tweedie (1993), this implies that:

$$\begin{aligned} \sup _{\theta } \Vert \delta _{\theta } P^n - \pi (\cdot |y) \Vert \le C \rho ^n \end{aligned}$$

where

$$\begin{aligned} C = 2 \text { and } \rho = 1 - \frac{1}{c_{\pi }^3 c_{h}^3 \mathcal {K}^4} \end{aligned}$$

(note that, by definition, $\mathcal {K},c_\pi ,c_h>1$ so we necessarily have $0<\rho <1$). So, Condition (H1) in Lemma 2.3 is satisfied.

Moreover,

$$\begin{aligned} \delta (\theta ,\theta ')&= \frac{h(\theta |\theta ')\pi (\theta ')q_{\theta '}(y)}{h(\theta '|\theta )\pi (\theta )q_{\theta }(y)} \sqrt{\mathrm{Var}_{y '\sim f(y '|\theta ')} \left( \frac{q_{\theta }(y')}{q_{\theta '}(y')} \right) } \\&\le c_h^2 c_{\pi }^2 \frac{q_{\theta '}(y)}{q_{\theta }(y)} \sqrt{\mathbb {E}_{y '\sim f(y '|\theta ')} \left[ \left( \frac{q_{\theta }(y')}{q_{\theta '}(y')} \right) ^2\right] } \le c_h^2 c_{\pi }^2 \mathcal {K}^4. \end{aligned}$$

So, Condition (H2) in Lemma 2.3 is satisfied. We can therefore apply this lemma to obtain

$$\begin{aligned} \sup _{\theta _0\in \varTheta } \Vert \delta _{\theta _0} P^n - \delta _{\theta _0} \hat{P}^n \Vert \le \frac{\mathcal {C}}{\sqrt{N}} \end{aligned}$$

with

$$\begin{aligned} \mathcal {C} = c_\pi ^2 c_h^2 \mathcal {K}^4 \left( \lambda + \frac{C\rho ^{\lambda }}{1-\rho } \right) \end{aligned}$$

with $\lambda =\left\lceil \frac{\log (1/C)}{\log (\rho )} \right\rceil $. $\square $
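Every constant in this proof is explicit, so the final bound can be evaluated directly. The Python sketch below computes $\rho$, $\lambda$ and $\mathcal{C}$ exactly as displayed, with $C=2$; the function name `theorem31_constants` and the sample values of $c_\pi$, $c_h$ and $\mathcal{K}$ are assumptions made for illustration.

```python
import math


def theorem31_constants(c_pi, c_h, k_cal):
    # C = 2, rho and lambda as displayed in the proof; returns (rho, lam, calC).
    if min(c_pi, c_h, k_cal) <= 1.0:
        raise ValueError("the proof assumes c_pi, c_h and K are all > 1")
    big_c = 2.0
    rho = 1.0 - 1.0 / (c_pi ** 3 * c_h ** 3 * k_cal ** 4)
    lam = math.ceil(math.log(1.0 / big_c) / math.log(rho))
    cal_c = c_pi ** 2 * c_h ** 2 * k_cal ** 4 * (lam + big_c * rho ** lam / (1.0 - rho))
    return rho, lam, cal_c


rho, lam, cal_c = theorem31_constants(c_pi=1.1, c_h=1.2, k_cal=1.05)
print(f"rho = {rho:.4f}, lambda = {lam}, calC = {cal_c:.3f}")
for n_aux in (10, 100, 1000, 10000):
    print(f"N={n_aux:>6}: bound calC / sqrt(N) = {cal_c / math.sqrt(n_aux):.3f}")
```

By construction $\lambda$ is the smallest integer with $C\rho^{\lambda}\le 1$. Since the total variation distance used here never exceeds 1, the bound $\mathcal{C}/\sqrt{N}$ only becomes informative once $N>\mathcal{C}^2$.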

Proof of Lemma 3

Note that

So we have to find an upper bound, uniformly over $\theta $, for

$$\begin{aligned}&D\,{:=}\, \mathbb {E}_{y'\sim F_{\theta _n}} \Biggl \{ \exp \Biggl [ \frac{\sigma ^2}{2}\Biggl \Vert \Sigma ^{\frac{1}{2}}\Biggl (\frac{1}{N}\sum _{i=1}^{N} s(y'_i)\\&\quad \quad - \mathbb {E}_{y'\sim f_{\theta }}[s(y')] \Biggr )\Biggr \Vert ^2 \Biggr ] -1 \Biggr \}. \end{aligned}$$

Let us put $V:=\frac{1}{N}\sum _{i=1}^N V^{(i)} := \frac{1}{N}\sum _{i=1}^{N} \Sigma ^{\frac{1}{2}} \{ s(y'_i) - \mathbb {E}_{y'\sim f_{\theta }}[s(y')]\}$ and denote by $V_j$ ($j=1,\dots ,k$) the coordinates of $V$, and by $V_j^{(i)}$ ($j=1,\dots ,k$) the coordinates of $V^{(i)}$. We have

$$\begin{aligned} D&= \mathbb {E} \left\{ \exp \left[ \frac{1}{2}\sum _{j=1}^k V_j^2 \right] -1 \right\} \\&= \mathbb {E} \left\{ \exp \left[ \frac{1}{k}\sum _{j=1}^k \frac{k}{2} V_j^2 \right] -1 \right\} \\&\le \frac{1}{k}\sum _{j=1}^k \mathbb {E} \left\{ \exp \left[ \frac{k}{2} V_j^2 \right] -1 \right\} . \end{aligned}$$

The last inequality follows from the convexity of the exponential (Jensen's inequality). Now, note that $V_j=\frac{1}{N}\sum _{i=1}^{N} V_j^{(i)}$ with $-\mathcal {S} \Vert \Sigma \Vert \le V_j^{(i)} \le \mathcal {S} \Vert \Sigma \Vert $, so Hoeffding’s inequality ensures, for any $t\ge 0$,

$$\begin{aligned} \mathbb {P} \left( \left| \sqrt{N} V_j \right| \ge t \right) \le 2 \exp \left[ - \frac{t^2}{2 \mathcal {S}^2 \Vert \Sigma \Vert ^2 } \right] . \end{aligned}$$

As a consequence, for any $\tau >0$,

$$\begin{aligned} \mathbb {E} \exp&\left[ \frac{k}{2} V_j^2 \right] = \mathbb {E} \exp \left[ \frac{k}{2 N} \left( \sqrt{N}V_j\right) ^2 \right] \\&= \mathbb {E} \exp \left[ \frac{k}{2 N} \left( \sqrt{N} V_j\right) ^2 \mathbf {1}_{|\sqrt{N} V_j|\le \tau } \right] \\&\quad + \mathbb {E} \exp \left[ \frac{k}{2 N} \left( \sqrt{N}V_j\right) ^2 \mathbf {1}_{|\sqrt{N} V_j|> \tau } \right] \\&= \exp \left( \frac{k \tau ^2}{2N} \right) \\&\quad + \int _{\tau }^{\infty } \exp \left( \frac{k}{2N} x^2 \right) \mathbb {P}\left( \left| \sqrt{N} V_j\right| \ge x\right) \mathrm{d} x \\&\le \exp \left( \frac{k \tau ^2}{2N} \right) \\&\quad + 2 \int _{\tau }^{\infty } \exp \left[ \left( \frac{k}{2N} - \frac{1}{2\mathcal {S}^2\Vert \Sigma \Vert ^2} \right) x^2 \right] \mathrm{d} x \\&=\exp \left( \frac{k \tau ^2}{2N} \right) \\&\quad + 2 \sqrt{\frac{2\pi }{\frac{1}{\mathcal {S}^2\Vert \Sigma \Vert ^2}-\frac{2 k}{N}}}\\&\quad \times \mathbb {P}\left( |\mathcal {N}| > \tau \sqrt{\frac{1}{\frac{1}{\mathcal {S}^2\Vert \Sigma \Vert ^2}-\frac{2k}{N}}}\right) \\&\le \exp \left( \frac{k \tau ^2}{2N} \right) \\&\quad + 2 \sqrt{\frac{2\pi }{\frac{1}{\mathcal {S}^2 \Vert \Sigma \Vert ^2}-\frac{2k}{N}}} \exp \left[ -\frac{\tau ^2}{ \left( \frac{2}{\mathcal {S}^2 \Vert \Sigma \Vert ^2}-\frac{4k}{N}\right) } \right] \\&\le \exp \left( \frac{k \tau ^2}{2N} \right) \\&\quad + 2 \sqrt{\frac{2\pi }{\frac{1}{\mathcal {S}^2 \Vert \Sigma \Vert ^2} -\frac{2k}{N}}} \exp \left[ - \frac{\tau ^2\mathcal {S}^2 \Vert \Sigma \Vert ^2}{2}\right] \end{aligned}$$

where $\mathcal {N}\sim \mathcal {N}(0,1)$. Now, we assume that $N>4 k \mathcal {S}^2 \Vert \Sigma \Vert ^2$. This leads to $\frac{1}{\mathcal {S}^2\Vert \Sigma \Vert ^2}-\frac{2k}{N} >\frac{1}{2\mathcal {S}^2 \Vert \Sigma \Vert ^2}$. This simplifies the bound to

$$\begin{aligned} \mathbb {E} \exp \left[ \frac{k}{2} V_j^2 \right]&\le \exp \left( \frac{k \tau ^2}{2N} \right) \\&+\,\, 4\sqrt{\pi } \mathcal {S} \Vert \Sigma \Vert \exp \left[ -\frac{\tau ^2\mathcal {S}^2 \Vert \Sigma \Vert ^2}{2} \right] . \end{aligned}$$

Finally, we put $\tau =\sqrt{\log (N/k)/(2\mathcal {S}^2 \Vert \Sigma \Vert ^2)}$ to get

$$\begin{aligned} \mathbb {E} \exp \left[ \frac{k}{2} V_j^2 \right] \le \exp \left( \frac{k \log \left( \frac{N}{k}\right) }{4 \mathcal {S}^2 \Vert \Sigma \Vert ^2 N} \right) + \frac{4k\sqrt{\pi } \mathcal {S}\Vert \Sigma \Vert }{N}. \end{aligned}$$

It follows that

$$\begin{aligned} D\le \exp \left( \frac{k \log (N)}{4 \mathcal {S}^2 \Vert \Sigma \Vert ^2 N} \right) -1 +\frac{4k \sqrt{\pi } \mathcal {S} \Vert \Sigma \Vert }{N}. \end{aligned}$$

This ends the proof. $\square $
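Two ingredients of this proof can be sanity-checked numerically. The Python sketch below first draws i.i.d. variables uniformly on $[-a,a]$ (an assumed stand-in for the bounded coordinates $V_j^{(i)}$, with $a$ playing the role of $\mathcal{S}\Vert\Sigma\Vert$) and compares the empirical frequency of $|\sqrt{N}\,\bar{V}|\ge t$ with the Hoeffding bound $2\exp(-t^2/(2a^2))$ used above. It then tabulates the final bound on $D$ exactly as stated, for illustrative values of $k$, $\mathcal{S}$ and $\Vert\Sigma\Vert$, rejecting inputs that violate the standing assumption $N>4k\mathcal{S}^2\Vert\Sigma\Vert^2$. Function names are hypothetical.

```python
import math
import random


def hoeffding_check(a, n, t, n_rep, rng):
    # Frequency of |sqrt(n) * mean| >= t for i.i.d. Uniform[-a, a] draws, and the Hoeffding bound.
    hits = 0
    for _ in range(n_rep):
        mean = sum(rng.uniform(-a, a) for _ in range(n)) / n
        hits += abs(math.sqrt(n) * mean) >= t
    return hits / n_rep, 2.0 * math.exp(-t * t / (2.0 * a * a))


def lemma3_bound(n_aux, k, s_cal, sigma_norm):
    # Final bound on D as stated; requires N > 4 k S^2 ||Sigma||^2.
    scale = (s_cal * sigma_norm) ** 2
    if n_aux <= 4 * k * scale:
        raise ValueError("the proof requires N > 4 k S^2 ||Sigma||^2")
    first = math.expm1(k * math.log(n_aux) / (4.0 * scale * n_aux))  # exp(...) - 1
    second = 4.0 * k * math.sqrt(math.pi) * s_cal * sigma_norm / n_aux
    return first + second


rng = random.Random(0)
rows = []
for t in (0.5, 1.5, 3.0):
    freq, bound = hoeffding_check(a=1.5, n=50, t=t, n_rep=2000, rng=rng)
    rows.append((t, freq, bound))
    print(f"t={t}: empirical {freq:.4f} <= Hoeffding {bound:.4f}")

sizes = (10**2, 10**3, 10**4, 10**5, 10**6)
values = [lemma3_bound(n, k=2, s_cal=1.0, sigma_norm=0.5) for n in sizes]
for n, v in zip(sizes, values):
    print(f"N={n:>7}: D <= {v:.2e}")
```

The tabulated bound decays roughly like $\log(N)/N$, which is what drives $\Vert P_{\Sigma}-\hat{P}_{\Sigma}\Vert\to 0$ in the next proof.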

Proof of Lemma 3.2

We check all the conditions of Theorem 2.2. First, from Lemma 3, we know that $\Vert P_{\Sigma }-\hat{P}_{\Sigma }\Vert \le \sqrt{\delta /2}\rightarrow 0$ as $N\rightarrow \infty $. Next, we have to find the function $V$. Note that here:

$$\begin{aligned} \nabla \log \pi (\theta |y)&= \nabla \log \pi (\theta ) + s(y)-\mathbb {E}_{y|\theta }[s(y)]\\&= - \frac{\theta }{s^2} + s(y)-\mathbb {E}_{y|\theta }[s(y)]\\&\asymp - \frac{\theta }{s^2}. \end{aligned}$$

Then, according to Theorem 3.1, page 352, in Roberts and Tweedie (1996a) (and its proof), we know that for $\Sigma <s^2$, for some positive numbers $a$ and $b$, and for $V(\theta )=a\theta $ when $\theta \ge 0$ and $V(\theta )=-b\theta $ when $\theta <0$, there exist $0<\delta <1$, $L>0$ and an interval $I$ with

$$\begin{aligned} \int V(\theta ) P_{\Sigma } (\theta _0,\mathrm{d}\theta ) \le \delta V(\theta _0) + L\mathbf {1}_{I}(\theta _0), \end{aligned}$$

and so $P_{\Sigma }$ is geometrically ergodic with function $V$. We calculate

and:

So,

$$\begin{aligned} \int V(\theta ) \hat{P}_{\Sigma } (\theta _0,\mathrm{d}\theta )&\le \int V(\theta ) P_{\Sigma } (\theta _0,\mathrm{d}\theta ) + 2\mathcal {S} \max (a,b) \\&\le \delta V(\theta _0) + [L + 2\mathcal {S} \max (a,b)]. \end{aligned}$$

So all the assumptions of Theorem 2.2 are satisfied, and we can conclude that $ \Vert \pi _{\Sigma }-\pi _{\Sigma ,N}\Vert \xrightarrow [N\rightarrow \infty ]{} 0$. $\square $

About this article

Cite this article

Alquier, P., Friel, N., Everitt, R. et al. Noisy Monte Carlo: convergence of Markov chains with approximate transition kernels. Stat Comput 26, 29–47 (2016). https://doi.org/10.1007/s11222-014-9521-x

Received: 18 September 2014
Accepted: 04 October 2014
Published: 10 December 2014
Issue Date: January 2016
DOI: https://doi.org/10.1007/s11222-014-9521-x
