Thursday, October 8, 2015

Note On "Information Gain" and the Fisher Information

In my last post, http://rspeare.blogspot.com/2015/08/the-fisher-matrix-and-volume-collapse_31.html, I talked about "volume" collapse in parameter space due to some data, $\vec{x}$. I'd like to relate this to information gain, which can be defined quite simply as:

\begin{eqnarray}
H[p(\vec{\theta})] - H[p(\vec{\theta} \vert \vec{x})] &=& IG(\vec{\theta} \vert \vec{x})
\end{eqnarray}
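
As a quick one-dimensional sanity check (my own example, not from the original post): if the prior and the posterior on a single parameter $\theta$ are both Gaussian, with variances $\sigma_0^2$ and $\sigma_1^2$ respectively, then

\begin{eqnarray}
IG(\theta \vert \vec{x}) &=& \frac{1}{2}\log(2\pi e \sigma_0^2) - \frac{1}{2}\log(2\pi e \sigma_1^2) = \frac{1}{2}\log \left( \frac{\sigma_0^2}{\sigma_1^2} \right)
\end{eqnarray}

so the gain is just half the log of the factor by which the variance shrinks.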

Now, using Bayes' rule (and dropping the $\theta$-independent evidence $p(\vec{x})$), we can rewrite the second term in the definition above:

\begin{eqnarray}
H[p(\vec{\theta})] - H[\mathcal{L}(\vec{x} \vert \vec{\theta}) p(\vec{\theta})] &=& IG(\vec{\theta} \vert \vec{x})
\end{eqnarray}

And using the addition property of entropy, we can write:

\begin{eqnarray}
IG(\vec{\theta} \vert \vec{x}) &=& - H[\mathcal{L}(\vec{x} \vert \vec{\theta})]
\end{eqnarray}

But, with the Fisher information matrix,

\begin{eqnarray}
\mathbf{F}_{ij} &=& \left\langle -\frac{\partial^2 \log \mathcal{L}(\vec{x} \vert \vec{\theta})}{\partial \theta_i \partial \theta_j} \right\rangle
\end{eqnarray}
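
To make the definition concrete, here is a minimal numerical sketch (my own toy example, not from the post): for $n$ i.i.d. draws from $N(\theta, \sigma^2)$ with known $\sigma$, the analytic Fisher information is $F = n/\sigma^2$, and a finite-difference estimate of the averaged negative curvature of the log-likelihood recovers it.

```python
import numpy as np

# A minimal sketch (toy example): estimate the 1-D Fisher information for
# n i.i.d. draws from N(theta, sigma^2) with known sigma.
# The analytic answer is F = n / sigma^2.
rng = np.random.default_rng(0)
n, sigma, theta_true = 50, 2.0, 1.0

def log_like(theta, x):
    # log L(x | theta), dropping the theta-independent Gaussian normalization
    return -0.5 * np.sum((x - theta) ** 2) / sigma**2

def second_deriv(f, theta, h=1e-4):
    # central finite difference for d^2 f / d theta^2
    return (f(theta + h) - 2.0 * f(theta) + f(theta - h)) / h**2

# average the negative curvature over simulated data sets (for this particular
# model the curvature is data-independent, so the average is trivially exact)
curvatures = []
for _ in range(200):
    x = rng.normal(theta_true, sigma, size=n)
    curvatures.append(-second_deriv(lambda t: log_like(t, x), theta_true))

print("numerical F:", np.mean(curvatures))  # ~ 12.5
print("analytic  F:", n / sigma**2)         # 12.5
```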

we can estimate the covariance of the likelihood in parameter space -- approximately $\mathbf{F}^{-1}$ -- and therefore its entropy, if we use the Laplace approximation and treat the likelihood as a Gaussian in parameter space:

\begin{eqnarray}
H[\mathcal{L}] &=& \frac{d}{2}\log(2\pi e) + \frac{1}{2}\log \left( \vert \mathbf{F} \vert^{-1} \right)
\end{eqnarray}
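
In code, getting that entropy from a Fisher matrix is just a (stable) log-determinant away. A small sketch with an arbitrary, made-up $2 \times 2$ Fisher matrix:

```python
import numpy as np

# Entropy of the Laplace (Gaussian) approximation:
#   H = (d/2) log(2*pi*e) + (1/2) log det(F^{-1})
# F below is an arbitrary 2x2 Fisher matrix used only for illustration.
F = np.array([[12.5, 2.0],
              [2.0,  8.0]])
d = F.shape[0]
sign, logdetF = np.linalg.slogdet(F)  # numerically stable log|F|
H = 0.5 * d * np.log(2 * np.pi * np.e) - 0.5 * logdetF
print(H)  # entropy in nats; a larger |F| means a smaller entropy
```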

This means that our information gain on $\vec{\theta}$ from an experiment $\vec{x}$ grows with the logarithm of the determinant of the Fisher matrix, up to an additive constant:

\begin{eqnarray}
IG(\vec{\theta} \vert \vec{x} ) &\sim & \frac{1}{2} \log \left( \vert \mathbf{F} \vert \right)
\end{eqnarray}

And so, we now see intuitively why this is \textbf{called} the Fisher information. Our ``volume'' collapse on the variables of interest $\vec{\theta}$, given our experiment, is:

\begin{eqnarray}
e^{IG(\vec{\theta} \vert \vec{x})} & \sim & \vert \mathbf{F} \vert^{1/2}
\end{eqnarray}
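
For the conjugate-Gaussian toy model from the sketch above (prior $N(0, \tau^2)$, known noise $\sigma$), the exact posterior is available, so one can check directly that $e^{IG}$ grows like $\vert \mathbf{F} \vert^{1/2}$ once the data dominate the prior:

```python
import numpy as np

# Conjugate-Gaussian check (toy example): prior N(0, tau^2) on theta, n
# observations with known noise sigma.  The exact posterior variance is
# (1/tau^2 + n/sigma^2)^{-1} and the Fisher information is F = n/sigma^2,
# so e^{IG} should scale like |F|^{1/2} once F dominates the prior term.
tau, sigma = 10.0, 2.0
for n in (10, 100, 1000):
    F = n / sigma**2
    post_var = 1.0 / (1.0 / tau**2 + F)
    IG = 0.5 * np.log(tau**2 / post_var)    # exact entropy difference, in nats
    print(n, np.exp(IG), np.sqrt(F) * tau)  # the last two columns nearly agree
```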



