Tuesday, September 1, 2015

Equivalence of Shannon Entropy and Logarithm of Posterior Partition function (under Laplace Approximation)

Now, the integral of the likelihood times the prior (the un-normalized posterior) over parameter space is called the evidence:

\begin{eqnarray}
P(D) = Z &=& \int d\vec{\theta} \mathcal{L}(D \vert \vec{\theta})P(\vec{\theta})
\end{eqnarray}
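To make this concrete, here's a minimal numerical sketch (my own toy example, not part of the derivation): a single parameter $\theta$ with a Gaussian prior and a Gaussian likelihood, for which the evidence integral can be brute-forced by quadrature. All names here (`data`, `prior`, `unnorm_log_post`) are illustrative choices, not anything from the post.

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

# Hypothetical toy problem: one parameter theta, a Gaussian prior,
# and Gaussian measurement noise on each data point.
rng = np.random.default_rng(0)
sigma_noise = 0.5
data = rng.normal(1.5, sigma_noise, size=20)   # simulated data set D
prior = norm(loc=0.0, scale=2.0)               # P(theta)

def unnorm_log_post(theta):
    """log[ L(D | theta) P(theta) ]: the un-normalized log posterior."""
    return (np.sum(norm.logpdf(data, loc=theta, scale=sigma_noise))
            + prior.logpdf(theta))

# Z = P(D): integrate the likelihood times the prior over parameter space.
Z_quad, _ = integrate.quad(lambda t: np.exp(unnorm_log_post(t)), -10.0, 10.0)
print("log Z (quadrature):", np.log(Z_quad))
```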

If we approximate the un-normalized posterior as a Gaussian -- which I've recently learned from Christopher Bishop is called the Laplace approximation -- by expanding its logarithm to second order about the maximum $\vec{\theta}_{\mathrm{MAP}}$, then we get:

\begin{eqnarray}
P(D \vert \vec{\theta})P(\vec{\theta}) &\approx & P(D \vert \vec{\theta}_{\mathrm{MAP}})P(\vec{\theta}_{\mathrm{MAP}})\mathrm{exp}\left[-\frac{1}{2}(\vec{\theta}-\vec{\theta}_{\mathrm{MAP}})_i\left(F_{ij}+\Sigma_{ij}^{-1}\right)(\vec{\theta}-\vec{\theta}_{\mathrm{MAP}})_j \right]
\end{eqnarray}
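Continuing the toy sketch from above, the Laplace approximation amounts to locating $\vec{\theta}_{\mathrm{MAP}}$ and the curvature $F_{ij}+\Sigma_{ij}^{-1}$ of the negative log posterior at that point. In one dimension the curvature is just a scalar precision; here it's estimated with a simple finite difference (again, my own sketch, names are illustrative).

```python
from scipy import optimize

# Maximize the log posterior (minimize its negative) to find theta_MAP.
res = optimize.minimize_scalar(lambda t: -unnorm_log_post(t))
theta_map = res.x

# Curvature A = F + 1/Sigma: negative second derivative of the log posterior
# at theta_MAP, estimated by a central finite difference.
eps = 1e-4
A = -(unnorm_log_post(theta_map + eps) - 2.0 * unnorm_log_post(theta_map)
      + unnorm_log_post(theta_map - eps)) / eps**2
print("theta_MAP =", theta_map, "   precision A =", A)
```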

We already know what the integral of this Gaussian over parameter space will be, using the standard result $\int d^D\Delta \, e^{-\frac{1}{2}\Delta_i A_{ij} \Delta_j} = (2\pi)^{D/2}\vert A \vert^{-1/2}$ (here $D$ is the dimension of $\vec{\theta}$):

\begin{eqnarray}
\log Z &\approx & \log\left[ P(D \vert \vec{\theta}_{\mathrm{MAP}})P(\vec{\theta}_{\mathrm{MAP}})\right] + \frac{D}{2}\log(2\pi)-\frac{1}{2}\log \left( \vert F_{ij}+\Sigma_{ij}^{-1} \vert\right)
\end{eqnarray}
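With $\vec{\theta}_{\mathrm{MAP}}$ and the precision in hand, the Laplace estimate of $\log Z$ for the toy model is just the formula above. Since the toy likelihood and prior are both Gaussian, the Laplace approximation is actually exact there, so it should agree with the quadrature result to numerical precision.

```python
# Laplace estimate of the log evidence for the one-dimensional toy model:
# log Z ~ log[L(D|theta_MAP) P(theta_MAP)] + (D/2) log(2*pi) - (1/2) log|A|.
D_dim = 1   # dimension of parameter space (the D of the formulas in the text)
logZ_laplace = (unnorm_log_post(theta_map)
                + 0.5 * D_dim * np.log(2.0 * np.pi)
                - 0.5 * np.log(A))
print("log Z (Laplace):   ", logZ_laplace)
print("log Z (quadrature):", np.log(Z_quad))
```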

Comparing this expression for $\log Z$ with our measured entropy of the posterior, we see that the two differ by just a constant:

\begin{eqnarray}
H\left[ P(\vec{\theta} \vert D) \right] &=&  \frac{D}{2}\log(2\pi)-\frac{1}{2}\log \left( \vert F_{ij}+\Sigma_{ij}^{-1} \vert\right) + \frac{D}{2}
\end{eqnarray}
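The same quantities give the entropy of the Gaussian-approximated posterior directly. Continuing the toy sketch:

```python
# Entropy of a D-dimensional Gaussian with precision matrix A:
# H = (D/2) log(2*pi) - (1/2) log|A| + D/2.
H_gauss = 0.5 * D_dim * np.log(2.0 * np.pi) - 0.5 * np.log(A) + 0.5 * D_dim
print("H[P(theta|D)] (Gaussian approximation):", H_gauss)
```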

This is another example of why a statistical mechanics interpretation of $Z$, the normalization of the posterior, is right on point. Its logarithm -- up to an additive constant, which can be thrown away -- is equal to the entropy of our distribution, a common piece of wisdom in statistical mechanics. So, in conclusion, under the Laplace approximation -- writing our posterior as a Gaussian by expanding its logarithm about the maximum $\vec{\theta}_{\mathrm{MAP}}$ and keeping the curvature $F_{ij}+\Sigma_{ij}^{-1}$ -- we get:

\begin{eqnarray}
\log Z &=& H\left[P(\vec{\theta} \vert D )\right] + \mathrm{const}
\end{eqnarray}

where the constant is $\log\left[ P(D \vert \vec{\theta}_{\mathrm{MAP}})P(\vec{\theta}_{\mathrm{MAP}})\right] - \frac{D}{2}$, which does not depend on the curvature of the posterior.
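As a last numerical check on the toy model, the difference between the Laplace $\log Z$ and the Gaussian entropy should reproduce exactly that constant:

```python
# The difference log Z - H should equal the constant
# log[L(D|theta_MAP) P(theta_MAP)] - D/2 quoted above.
const = unnorm_log_post(theta_map) - 0.5 * D_dim
print("log Z - H :", logZ_laplace - H_gauss)
print("expected  :", const)
```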
