Tuesday, June 24, 2014

Conditioning Multinormals

In the following, all equals signs will really mean "proportional to," since we're going to work with probability densities and, for the time being, don't want to worry about normalization.

Let $\mathbf{x}$ and $\mathbf{y}$ be random vectors drawn from the following multinormal:

\begin{eqnarray}
P(\mathbf{x},\mathbf{y}) &\sim& \mathrm{N}\left( \left[\begin{array}{c} \mu_x \\ \mu_y \end{array}\right], \left[\begin{array}{cc} A & C \\ C^T & B \end{array}\right]^{-1} \right)\\
&\sim& e^{-\frac{1}{2} \left( \left[\begin{array}{cc} x-\mu_x & y - \mu_y \end{array}\right] \cdot \left[\begin{array}{cc} A & C \\ C^T & B \end{array}\right]\cdot \left[\begin{array}{c} x-\mu_x \\ y-\mu_y \end{array}\right] \right)}
\end{eqnarray}

The matrix in the exponent is the inverse of the covariance matrix (the precision matrix), written in block form. Roughly speaking, the off-diagonal block $C$ cross-correlates our vectors $\mathbf{x}$ and $\mathbf{y}$, while $A$ and $B$ "auto"-correlate within the vectors $\mathbf{x}$ and $\mathbf{y}$. We need not have $\mathbf{x}$ and $\mathbf{y}$ be of the same dimension.
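For concreteness, here is a minimal NumPy sketch (the dimensions, seed, and variable names are my own choices, not from the post) that assembles a joint precision matrix out of blocks $A$, $B$, $C$ of different sizes and inverts it to get the covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# x and y deliberately have different dimensions.
nx, ny = 3, 2

# Build a random symmetric positive-definite precision matrix for (x, y),
# then read off the blocks A (nx x nx), C (nx x ny), and B (ny x ny).
L = rng.normal(size=(nx + ny, nx + ny))
precision = L @ L.T + (nx + ny) * np.eye(nx + ny)
A = precision[:nx, :nx]
C = precision[:nx, nx:]
B = precision[nx:, nx:]

# The covariance matrix of the joint Gaussian is the inverse of the precision matrix.
covariance = np.linalg.inv(precision)
print(covariance.shape)  # (5, 5)
```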

To find the inverse of a block matrix like the one above, we write it as

\begin{equation}
\left[\begin{array}{cc} P & Q \\ R & S \end{array}\right]
\end{equation}

and require:

\begin{eqnarray}
\left[\begin{array}{cc} P & Q \\ R & S \end{array}\right]\left[\begin{array}{cc} X_1 & X_2 \\ X_3 & X_4 \end{array}\right] &=& \left[\begin{array}{cc} I & 0 \\ 0 & I \end{array}\right]
\end{eqnarray}

which results in the following equations for the blocks:

\begin{eqnarray}
PX_1+QX_3 &=& I \\
PX_2 &=& - Q X_4 \\
RX_1 &=& - S X_3 \\
RX_2+SX_4 &=& I
\end{eqnarray}

with the solutions:

\begin{eqnarray}
X_1 &=& P^{-1}+P^{-1}QMRP^{-1}\\
X_2 &=& -P^{-1}QM\\
X_3 &=& -MRP^{-1} \\
X_4 &=& M \\
M &=& (S-RP^{-1}Q)^{-1}
\end{eqnarray}

and

\begin{eqnarray}
X_1 &=& N\\
X_2 &=& -NQS^{-1}\\
X_3 &=& -S^{-1}RN \\
X_4 &=& S^{-1}+S^{-1}RNQS^{-1} \\
N &=& (P-QS^{-1}R)^{-1}
\end{eqnarray}
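Both sets of formulas are easy to check numerically. Here is a minimal sketch (the random blocks, sizes, and seed are my own choices) that compares them against a direct matrix inverse:

```python
import numpy as np

rng = np.random.default_rng(1)
n_p, n_s = 3, 2  # sizes of the P and S blocks

# Random blocks are almost surely invertible; the diagonal shift just keeps
# P, S, and the full matrix well-conditioned.
P = rng.normal(size=(n_p, n_p)) + n_p * np.eye(n_p)
Q = rng.normal(size=(n_p, n_s))
R = rng.normal(size=(n_s, n_p))
S = rng.normal(size=(n_s, n_s)) + n_s * np.eye(n_s)
full = np.block([[P, Q], [R, S]])

inv = np.linalg.inv

# First solution, built around M = (S - R P^{-1} Q)^{-1}.
M = inv(S - R @ inv(P) @ Q)
first = np.block([
    [inv(P) + inv(P) @ Q @ M @ R @ inv(P), -inv(P) @ Q @ M],
    [-M @ R @ inv(P),                      M],
])

# Second solution, built around N = (P - Q S^{-1} R)^{-1}.
N = inv(P - Q @ inv(S) @ R)
second = np.block([
    [N,               -N @ Q @ inv(S)],
    [-inv(S) @ R @ N,  inv(S) + inv(S) @ R @ N @ Q @ inv(S)],
])

print(np.allclose(first, np.linalg.inv(full)))   # True
print(np.allclose(second, np.linalg.inv(full)))  # True
```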

Now, let's say we are interested in the conditional multinormal

\begin{eqnarray}
P(\mathbf{x} \vert \mathbf{y}) &\sim&\ ?
\end{eqnarray}

We expect ?, above, to be another Gaussian, which will be uniquely determined by its mean and covariance, the first two cumulants. Let's write down the conditional expectation value using Bayes' Theorem:

\begin{eqnarray}
\langle\mathbf{x} \vert \mathbf{y}\rangle &=& \int \mathbf{x}\, P(\mathbf{x}\vert \mathbf{y})\, d\mathbf{x} \\
&=& \int \mathbf{x}\, \frac{P(\mathbf{x},\mathbf{y})}{P(\mathbf{y})}\, d\mathbf{x}
\end{eqnarray}

And now, to expand things to higher order in $\mathbf{x}$, we can create a conditional moment generating function in an auxiliary vector $\mathbf{\lambda}$:

\begin{eqnarray}
\langle e^{\mathbf{\lambda}\cdot(\mathbf{x} \vert \mathbf{y})} \rangle &=& \int d\mathbf{x} \left(e^{\mathbf{\lambda}\cdot (\mathbf{x} \vert \mathbf{y})}\right)\ \ \frac{e^{-\frac{1}{2} \left( \left[\begin{array}{cc} x-\mu_x & y-\mu_y \end{array}\right] \cdot \left[\begin{array}{cc} A & C \\ C^T & B \end{array}\right]\cdot \left[\begin{array}{c} x-\mu_x \\ y-\mu_y \end{array}\right] \right)} }{e^{-\frac{1}{2} \left( \left[\begin{array}{cc} x-\mu_x & y-\mu_y \end{array}\right] \cdot \left[\begin{array}{cc} 0 & 0 \\ 0 & B \end{array}\right]\cdot \left[\begin{array}{c} x-\mu_x \\ y-\mu_y \end{array}\right] \right)}}
\end{eqnarray}

Notice that the denominator depends only on $\mathbf{y}$. Strictly speaking, the marginal $P(\mathbf{y})$ is a Gaussian with precision matrix $B - C^T A^{-1} C$ rather than $B$, but since we are ignoring normalization, any factor depending only on $\mathbf{y}$ can be absorbed into it, and dividing by $e^{-\frac{1}{2}(y-\mu_y)^T B (y-\mu_y)}$ is exactly what removes the $B$ block from the exponent. Let's rewrite $\mathbf{x}^\prime = \mathbf{x}-\mathbf{\mu}_x$ and $\mathbf{y}^\prime = \mathbf{y}-\mathbf{\mu}_y$ to simplify things. Let's also get rid of the conditional notation $\vert \mathbf{y}$ in the exponent of the integrand to clean things up and make the conditioning implicit:

\begin{eqnarray}
\langle e^{\mathbf{\lambda}\cdot(\mathbf{x} \vert \mathbf{y})} \rangle &=& \int d\mathbf{x} \left(e^{\mathbf{\lambda}\cdot (\mathbf{x^\prime}+\mathbf{\mu}_x )}\right)\ \ e^{-\frac{1}{2} \left( \left[\begin{array}{cc} x^\prime & y^\prime \end{array}\right] \cdot \left[\begin{array}{cc} A & C \\ C^T & 0 \end{array}\right]\cdot \left[\begin{array}{c} x^\prime \\ y^\prime \end{array}\right] \right)} \\
\langle e^{\mathbf{\lambda}\cdot(\mathbf{x} \vert \mathbf{y})} \rangle &=& \int d\mathbf{x} \left(e^{\mathbf{\lambda}\cdot (\mathbf{x^\prime}+\mathbf{\mu}_x )}\right)\ \ e^{-\frac{1}{2} \left( \left[\begin{array}{cc} x^\prime & y^\prime \end{array}\right] \cdot \left[\begin{array}{c} Ax^\prime+Cy^\prime \\ C^Tx^\prime \end{array}\right] \right)} \\
\langle e^{\mathbf{\lambda}\cdot(\mathbf{x} \vert \mathbf{y})} \rangle &=& e^{\lambda \cdot \mathbf{\mu}_x }\int d\mathbf{x} \left(e^{\mathbf{\lambda}\cdot \mathbf{x^\prime}}\right)\ \ e^{-\frac{1}{2} x^\prime_i A_{ij}x^\prime_j - C_{ij} x^\prime_i y^\prime_j }
\end{eqnarray}

Now we can simplify things a little bit by writing $ik_i = \lambda_i-C_{ij}y^\prime_j$:
\begin{eqnarray}
\langle e^{\mathbf{\lambda}\cdot(\mathbf{x} \vert \mathbf{y})} \rangle &=& e^{\lambda \cdot \mathbf{\mu}_x }\int d\mathbf{x} \left(e^{(\lambda_i-C_{ij}y^\prime_j)x^\prime_i}\right)\ \ e^{-\frac{1}{2}\left( x^\prime_i A_{ij}x^\prime_j \right)}\\
\langle e^{\mathbf{\lambda}\cdot(\mathbf{x} \vert \mathbf{y})} \rangle &=& e^{\lambda \cdot \mathbf{\mu}_x }\int d\mathbf{x}\ e^{ik_i x^\prime_i}\ \ e^{-\frac{1}{2}\left( x^\prime_i A_{ij}x^\prime_j \right)}
\end{eqnarray}

And now we have a multidimensional Fourier transform of a Gaussian, which is another Gaussian.
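To be explicit, the identity being used is the standard multidimensional Gaussian integral (valid for symmetric, positive-definite $A$; the $(2\pi)^{n/2}\vert A\vert^{-1/2}$ prefactor is yet more normalization we are ignoring, and the formula extends to the complex $\mathbf{k}$ that our substitution requires):

\begin{equation}
\int d\mathbf{x}\ e^{i k_i x_i}\ e^{-\frac{1}{2}x_i A_{ij} x_j} = (2\pi)^{n/2}\,\vert A \vert^{-1/2}\ e^{-\frac{1}{2}k_i A^{-1}_{ij} k_j}
\end{equation}

Applying it, and then substituting back $ik_i = \lambda_i - C_{il}y^\prime_l$ (note that $(ik_i)(ik_j) = -k_i k_j$, so the sign of the quadratic form flips), we get: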

\begin{eqnarray}
\langle e^{\mathbf{\lambda}\cdot(\mathbf{x} \vert \mathbf{y})} \rangle &=& e^{\lambda \cdot \mathbf{\mu}_x }e^{-\frac{1}{2}k_i A^{-1}_{ij}k_j}\\
\langle e^{\mathbf{\lambda}\cdot(\mathbf{x} \vert \mathbf{y})} \rangle &=& e^{\lambda \cdot \mathbf{\mu}_x +\frac{1}{2}(\lambda_i-C_{il}y^\prime_l)A^{-1}_{ij}(\lambda_j-C_{jm}y^\prime_m)}
\end{eqnarray}

Expanding the square and collecting like terms in $\lambda$, we've got:

\begin{eqnarray}
\langle e^{\mathbf{\lambda}\cdot(\mathbf{x} \vert \mathbf{y})} \rangle &=& e^{\lambda_i \left[\mu^{(x)}_i  - A^{-1}_{ij} C_{jm}(y_m-\mu^{(y)}_m)\right]\ +\ \frac{1}{2}\lambda_i A^{-1}_{ij}\lambda_j\ +\ \frac{1}{2}(y_l-\mu^{(y)}_l)C_{il}A^{-1}_{ij}C_{jm}(y_m-\mu^{(y)}_m)}
\end{eqnarray}

And now we can read off the first two conditional cumulants by differentiating the logarithm of this expression and setting $\mathbf{\lambda}=0$. The conditional mean gets shifted according to the observed $\mathbf{y}$, while the conditional covariance does not depend on $\mathbf{y}$ at all:
\begin{eqnarray}
\left.\frac{\partial}{\partial \lambda_i}\log\langle e^{\mathbf{\lambda}\cdot(\mathbf{x} \vert \mathbf{y})} \rangle\right\vert_{\mathbf{\lambda}=0} &=& \langle x_i \vert \mathbf{y}\rangle = \mu^{(x)}_i  - A^{-1}_{ij} C_{jm}(y_m-\mu^{(y)}_m) \\
\left.\frac{\partial^2}{\partial \lambda_i \partial \lambda_j}\log\langle e^{\mathbf{\lambda}\cdot(\mathbf{x} \vert \mathbf{y})} \rangle\right\vert_{\mathbf{\lambda}=0} &=& \mathrm{Cov}(x_i, x_j \vert \mathbf{y}) = A^{-1}_{ij}
\end{eqnarray}

The moment generating function also carries a leftover factor that depends only on $\mathbf{y}$,
\begin{eqnarray}
f(\mathbf{y}) &=& \mathrm{Exp}\left[\frac{1}{2}(y_l-\mu^{(y)}_l)C_{il}A^{-1}_{ij}C_{jm}(y_m-\mu^{(y)}_m)\right],
\end{eqnarray}
which is just another piece of the normalization we have been throwing out all along; since it is independent of $\mathbf{\lambda}$, it drops out when we differentiate the logarithm.

So now, writing the conditional probability density function $P(\mathbf{x} \vert \mathbf{y})$ as a Gaussian with this mean and covariance, we have:

\begin{eqnarray}
P(\mathbf{x} \vert \mathbf{y}) &=& \mathrm{N}\left[\ \mathbf{\mu}_x  - A^{-1} C\,(\mathbf{y}-\mathbf{\mu}_y),\ \ A^{-1}\ \right]
\end{eqnarray}

Note that the conditional covariance $A^{-1}$ is the inverse of the upper-left block of the precision matrix, which is not the same as the upper-left block of the covariance matrix. Using the block-inverse formulas above with $P=A$, $Q=C$, $R=C^T$, $S=B$, the covariance blocks are $\Sigma_{xx}=(A-CB^{-1}C^T)^{-1}$, $\Sigma_{yy}=(B-C^TA^{-1}C)^{-1}$ and $\Sigma_{xy}=-A^{-1}C\,\Sigma_{yy}$, in terms of which the result takes the familiar textbook form

\begin{eqnarray}
P(\mathbf{x} \vert \mathbf{y}) &=& \mathrm{N}\left[\ \mathbf{\mu}_x + \Sigma_{xy}\Sigma_{yy}^{-1}(\mathbf{y}-\mathbf{\mu}_y),\ \ \Sigma_{xx}-\Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}\ \right]
\end{eqnarray}
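As a quick numerical sanity check, here is a sketch (the block names follow the post; the dimensions, seed, and observed $\mathbf{y}$ are my own arbitrary choices) verifying that the precision-block form agrees with the covariance-block form:

```python
import numpy as np

rng = np.random.default_rng(2)
nx, ny = 3, 2

# Joint precision matrix with blocks A, C, B as in the post.
L = rng.normal(size=(nx + ny, nx + ny))
precision = L @ L.T + (nx + ny) * np.eye(nx + ny)
A, C, B = precision[:nx, :nx], precision[:nx, nx:], precision[nx:, nx:]

mu_x = rng.normal(size=nx)
mu_y = rng.normal(size=ny)
y = rng.normal(size=ny)  # the observed value we condition on

# Conditional moments in precision-block form (the result above).
cond_mean = mu_x - np.linalg.inv(A) @ C @ (y - mu_y)
cond_cov = np.linalg.inv(A)

# The same moments from the covariance blocks Sigma_xx, Sigma_xy, Sigma_yy.
Sigma = np.linalg.inv(precision)
Sxx, Sxy, Syy = Sigma[:nx, :nx], Sigma[:nx, nx:], Sigma[nx:, nx:]
cond_mean_cov_form = mu_x + Sxy @ np.linalg.inv(Syy) @ (y - mu_y)
cond_cov_cov_form = Sxx - Sxy @ np.linalg.inv(Syy) @ Sxy.T

print(np.allclose(cond_mean, cond_mean_cov_form))  # True
print(np.allclose(cond_cov, cond_cov_cov_form))    # True
```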



