# Probabilistic Inference Using Markov Chain Monte Carlo Methods, Parts 3 & 4

These notes are based on the gnofai tutorial that covers Radford Neal’s MCMC review (pdf). Sanmi’s writeup is here. This post is mainly to index my scanned notes with keywords, so I can search for them.

We started off with a review of the motivation behind this tutorial series. What do we want? $E[f]$, the expectation of some function $f$ under a distribution. Why? Because the quantities we care about, like $P(y|x)$, can be written as expectations (e.g., as $E[y|x]$).

$E[f] = \int f(x)p(x)\,dx$, but we can't evaluate this integral directly. So instead draw samples $z_i \sim p(x)$; we want $\frac{1}{n} \sum_i f(z_i) \rightarrow E[f]$ as $n \rightarrow \infty$.
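A minimal sketch of this Monte Carlo estimator (my example, not from the tutorial): take $f(x) = x^2$ and $p = N(0, 1)$, so the true value of $E[f]$ is $\mathrm{Var}(x) = 1$.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_estimate(f, sampler, n):
    """Average f over n draws from the sampler: (1/n) * sum f(z_i)."""
    z = sampler(n)
    return f(z).mean()

# E[x^2] under N(0, 1) is 1; the sample average converges to it.
est = mc_estimate(lambda x: x**2, lambda n: rng.standard_normal(n), 100_000)
```

With 100,000 draws the estimate lands within a few hundredths of the true value.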

This is where Markov chains come in: we construct a chain whose distributions satisfy $p_n(x) \rightarrow p(x)$; if our MC is ergodic, we'll converge to $p(x)$ regardless of our starting distribution, $p_0$. That is, we want $p$ to be invariant under the transition: $p(x') = \sum_x p(x)T(x, x')$, or in matrix form, $p = pT$.
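To make the convergence claim concrete, here's a toy two-state chain (my own example, with made-up transition probabilities): iterating $p_{n+1} = p_n T$ from two different starting distributions reaches the same invariant $p$ with $p = pT$.

```python
import numpy as np

# A small ergodic chain; rows of T are transition distributions.
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def iterate(p0, T, steps=200):
    """Apply p <- p T repeatedly, i.e. compute p_n = p_0 T^n."""
    p = np.asarray(p0, dtype=float)
    for _ in range(steps):
        p = p @ T
    return p

# Two very different starting states converge to the same distribution.
p_a = iterate([1.0, 0.0], T)
p_b = iterate([0.0, 1.0], T)
```

For this $T$ the invariant distribution works out to $(2/3, 1/3)$, and both runs reach it.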

The practical concerns Sanmi mentioned were the work per transition, the time to converge, and the number of steps we need between draws to get (approximately) IID samples from our distribution once we've converged.

Then there was a brief review of Gibbs sampling, to motivate the Metropolis algorithm. Recall that in Gibbs sampling, our local transitions, $B_k$, sample from the conditionals, $p(x_k \mid x_{\backslash k})$. With Gibbs, we fix all but one variable, sample from / change that variable, and iterate, but this requires being able to sample from the conditionals. Sometimes we can't do that, and that's when we want to use Metropolis, which is what today's notes are about.
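Since my part-4 notes didn't survive (see below), here's a minimal random-walk Metropolis sketch of my own, not from the tutorial: it only needs the target density up to a constant, which is exactly why it works when Gibbs doesn't. The target here is an unnormalized standard normal.

```python
import numpy as np

rng = np.random.default_rng(1)

def metropolis(log_f, x0, n, step=1.0):
    """Random-walk Metropolis: log_f is the log of an unnormalized density."""
    x = x0
    samples = np.empty(n)
    for i in range(n):
        proposal = x + step * rng.standard_normal()
        # Accept with probability min(1, f(x') / f(x)); only the ratio
        # matters, so the normalizing constant of f is never needed.
        if np.log(rng.uniform()) < log_f(proposal) - log_f(x):
            x = proposal
        samples[i] = x
    return samples

# log f(x) = -x^2/2, i.e. N(0, 1) up to its normalizer.
samples = metropolis(lambda x: -0.5 * x**2, 0.0, 50_000)
```

The chain's samples should have mean near 0 and variance near 1, matching the target.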

Unfortunately, the explosive decompression of a pen destroyed my notes for part 4, but as always, Sanmi has a nice writeup.

# Probabilistic Inference Using Markov Chain Monte Carlo Methods, Part 2

These notes are based on the gnofai tutorial that covers Radford Neal’s MCMC review (pdf).

Sanmi was out of town this week, so I can’t rely on his writeup for keywords to index my notes this time around.

We covered the basics of Markov chains this week: every finite chain has at least one (not necessarily unique) invariant distribution.

Detailed balance (time reversibility, $\pi(x)T(x,x') = \pi(x')T(x',x)$) and ergodicity (convergence to a fixed point, $\pi = \pi T$, independent of the starting distribution, $p_0$) were defined. We showed that detailed balance implies a fixed point, but that the converse isn't true, and we found conditions for a homogeneous (time-invariant transitions) Markov process to be ergodic, as well as how to generalize to a non-homogeneous process (let $T = T_0 T_1 \dots T_{d-1}$).
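A quick numerical check of the "detailed balance implies a fixed point" direction (my own three-state example, built with a symmetric proposal and a Metropolis-style acceptance so the chain is reversible by construction):

```python
import numpy as np

# Target invariant distribution over three states.
pi = np.array([0.5, 0.3, 0.2])

# Symmetric proposal + min(1, pi_j/pi_i) acceptance gives a reversible T.
prop = np.full((3, 3), 1 / 3)
T = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        if i != j:
            T[i, j] = prop[i, j] * min(1.0, pi[j] / pi[i])
    T[i, i] = 1.0 - T[i].sum()  # leftover probability stays put
```

The matrix of flows $\pi(x)T(x,x')$ comes out symmetric (that's detailed balance), and $\pi T = \pi$ follows.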

There was a brief aside where we related the eigenvectors of $T$ to properties of the Markov chain; there was a comment from Meghana that the properties as written weren't quite right, so we should see the review for details.

We then covered Gibbs sampling, which is useful when we have a joint distribution that we want to sample from and the conditionals have simple parametric forms. The result is that we reduce a multivariate sampling problem to univariate ones with nice forms that we can sample directly. And even when a conditional doesn't have a nice form, we can fall back on rejection sampling.
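A minimal sketch of that rejection-sampling fallback (my example, not from the tutorial): the target is proportional to $f(x) = x(1-x)$ on $[0,1]$ (a Beta(2, 2) up to its normalizer), with a Uniform(0, 1) proposal and envelope $M = \max f = 0.25$.

```python
import numpy as np

rng = np.random.default_rng(2)

def rejection_sample(f, M, n):
    """Draw n samples from a density proportional to f on [0, 1]."""
    out = []
    while len(out) < n:
        x = rng.uniform()      # draw from the proposal g = Uniform(0, 1)
        u = rng.uniform()
        if u < f(x) / M:       # accept with probability f(x) / (M * g(x))
            out.append(x)
    return np.array(out)

draws = rejection_sample(lambda x: x * (1 - x), 0.25, 20_000)
```

Note we never needed $f$'s normalizing constant. The accepted draws match Beta(2, 2): mean 0.5, variance 0.05.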

To do Gibbs, we fix everything but some $x_k$, sample a new value for $x_k$ from its conditional, and iterate over all $k$. Since the sweep repeats after $n$ steps, this is homogeneous w.r.t. $T = B_1 B_2 \dots B_n$. We proved that we do actually converge to a fixed point; intuitively, it's because only the $k$th variable changes, so the marginals of the other variables don't change, and the $k$th is drawn from exactly its conditional.
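The sweep above can be sketched for a bivariate normal with correlation $\rho$ (my own example, chosen because each conditional $p(x_k \mid x_{\backslash k})$ is a univariate normal, i.e. the "simple parametric form" case):

```python
import numpy as np

rng = np.random.default_rng(3)
rho = 0.8

def gibbs(n):
    """Alternate draws from each full conditional of a bivariate N(0, I-ish)."""
    x1, x2 = 0.0, 0.0
    samples = np.empty((n, 2))
    for i in range(n):
        # x1 | x2 ~ N(rho * x2, 1 - rho^2), and symmetrically for x2.
        x1 = rho * x2 + np.sqrt(1 - rho**2) * rng.standard_normal()
        x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.standard_normal()
        samples[i] = (x1, x2)
    return samples

samples = gibbs(50_000)
```

The chain's empirical correlation recovers $\rho$, and each coordinate is (marginally) standard normal.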

We then covered a simple example of a latent class model with $n$ observed variables that were correlated with some hidden variable.