Saturday, February 22, 2014

Bivariate Normal Distribution

Because I have had a great deal of trouble finding information on the Bivariate Normal distribution -- or at least, information that is easy for me to understand -- I'd like to take a moment to write down my understanding of the concept, which stems mainly from the Characteristic function.

Working with results from last time, on the central limit theorem, we see that we can describe a probability density function by its Fourier transform, the characteristic function

\begin{eqnarray}
P(\mathbf{x}) &=& \frac{1}{(2\pi)^{d}}\int d^dk\, e^{i\mathbf{k}\cdot \mathbf{x}-i\mathbf{k}\cdot \mathbf{c_1}-\frac{1}{2}\mathbf{k}\cdot \mathbf{c_2}\cdot\mathbf{k}+\dots }
\end{eqnarray}

Where, for now, I will ignore higher order terms corresponding to non-Gaussianity. If we center about the mean -- absorbing our first cumulant, $\mathbf{c_1}$, into $\mathbf{x}$ -- we find that we have a simple Fourier transform of a Gaussian, which is itself a Gaussian.

\begin{eqnarray}
P(\mathbf{x}) &=& \frac{1}{(2\pi)^{d}}\int d^dk\, e^{i\mathbf{k}\cdot \mathbf{x}-\frac{1}{2}\mathbf{k}\cdot \mathbf{c_2}\cdot\mathbf{k}}\\
P(\mathbf{x}) &=& \frac{1}{(2\pi)^{d/2}\vert\mathbf{c_2}\vert^{1/2}}e^{-\frac{1}{2}\mathbf{x}\cdot \mathbf{c_2^{-1}}\cdot \mathbf{x}}
\end{eqnarray}

Where $\mathbf{c_2}$ is the covariance matrix -- a rank (0,2) tensor -- which appears contracted with $\mathbf{k}$ in the exponent of the characteristic function.

If we are working with two random variables, $x_1,x_2$, we see that our $k$ integration takes place over two dimensions, so we are left with
\begin{eqnarray}
P(x_1,x_2) &=& \frac{1}{(2\pi)^2}\int dk_1 dk_2 e^{i(k_1 x_1+k_2 x_2)}e^{-\frac{1}{2}\left( c_{11}k_1^2 +2c_{12}k_1k_2 +c_{22}k_2^2 \right)}
\end{eqnarray}
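As a sanity check on this expression (and on the normalization convention), here is a minimal numerical sketch in Python that performs the $k$ integral on a grid and compares against scipy's bivariate normal density. The particular $\sigma_1$, $\sigma_2$, $\rho$ and the evaluation point are arbitrary choices of mine, nothing special.

    # Numerically invert the bivariate Gaussian characteristic function and
    # compare with scipy's bivariate normal pdf at one (arbitrary) point.
    import numpy as np
    from scipy.stats import multivariate_normal

    sigma1, sigma2, rho = 1.0, 2.0, 0.6
    c11, c22 = sigma1**2, sigma2**2
    c12 = rho * sigma1 * sigma2
    x1, x2 = 0.3, -0.5

    # k-space grid; the integrand decays like a Gaussian, so a modest box suffices
    k = np.linspace(-8, 8, 801)
    k1, k2 = np.meshgrid(k, k, indexing="ij")
    integrand = np.exp(1j * (k1 * x1 + k2 * x2)) * \
                np.exp(-0.5 * (c11 * k1**2 + 2 * c12 * k1 * k2 + c22 * k2**2))

    dk = k[1] - k[0]
    p_numeric = np.real(np.trapz(np.trapz(integrand, dx=dk, axis=1), dx=dk)) / (2 * np.pi)**2
    p_scipy = multivariate_normal(mean=[0, 0], cov=[[c11, c12], [c12, c22]]).pdf([x1, x2])
    print(p_numeric, p_scipy)   # the two numbers agree to several decimal places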

Notice that our characteristic function has a simple form, which we can now play with in order to construct estimators for the correlation coefficient $\rho$ commonly defined as

\begin{eqnarray}
\rho &=& \frac{c_{12}}{\sigma_1 \sigma_2}
\end{eqnarray}

Let's take a look. First note that $c_{11}=\sigma_1^2$ and $c_{22}=\sigma_2^2$. These are the second cumulants of $x_1$ and $x_2$ respectively -- their variances. Re-writing our bivariate normal with these definitions, and assuming zero means (so that the first cumulants vanish), we find

\begin{eqnarray}
\mathbf{c_2^{-1}}&=& \frac{1}{1-\rho^2}\left(\begin{array}{cc}
 \frac{1}{\sigma_1^2} & -\frac{\rho}{\sigma_1 \sigma_2} \\
 -\frac{\rho}{\sigma_1 \sigma_2} & \frac{1}{\sigma_2^2} \\
\end{array}\right)\\
\vert \mathbf{c_2} \vert &=& \sigma_1^2 \sigma_2^2 (1-\rho^2)
\end{eqnarray}

\begin{eqnarray}
P(x_1,x_2) &=& \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1-\rho^2}}\,\mathrm{exp}\left[-\frac{1}{2(1-\rho^2)}\left( \frac{x_1^2}{\sigma_1^2} -2\rho \frac{x_1x_2}{\sigma_1 \sigma_2} +\frac{x_2^2}{\sigma_2^2} \right)\right]
\end{eqnarray}

This last, nasty equation is the typical bivariate normal seen in the literahchurr (literature in snoot-spell). But how the heck do we get an estimator for the covariance?! We know that

\begin{eqnarray}
\sigma_1^2 &=& \langle x_1^2 \rangle-\langle x_1 \rangle^2 = c_{11} \\
&=& -\frac{\partial^2 P}{\partial k_1^2}\vert_{(k_1,k_2)=(0,0)}+\left(\frac{\partial P}{\partial k_1}\vert_{(k_1,k_2)=(0,0)}\right)^2
\end{eqnarray}

and

\begin{eqnarray}
\sigma_2^2 &=& \langle x_2^2 \rangle-\langle x_2 \rangle^2 = c_{22} \\
&=& -\frac{\partial^2 P}{\partial k_2^2}\vert_{(k_1,k_2)=(0,0)}+\left(\frac{\partial P}{\partial k_2}\vert_{(k_1,k_2)=(0,0)}\right)^2
\end{eqnarray}

The second term in both expansions -- the first derivative of P(k) squared -- vanishes, since we are assuming zero mean.

Let us show this trick of differentiating the characteristic function in order to get cumulant estimators explicitly, and then apply it to the covariance $c_{12}$ -- and so the correlation coefficient $\rho$.

\begin{eqnarray}
P(k_1,k_2) &=& \int dx_1 dx_2\, e^{-i(k_1 x_1 + k_2 x_2)}P(x_1,x_2) = e^{-\frac{1}{2}\left( c_{11}k_1^2 + 2 c_{12}k_1 k_2 + c_{22} k_2^2 \right)}
\end{eqnarray}

Differentiating with respect to $k_1$ twice

\begin{eqnarray}
\frac{\partial P}{\partial k_1} &=& \int dx_1 dx_2\, (-ix_1)\, e^{-i(k_1 x_1 + k_2 x_2)}P(x_1,x_2) = -\left( c_{11}k_1+ c_{12}k_2 \right) e^{-\frac{1}{2}\left( c_{11}k_1^2 + 2 c_{12}k_1 k_2 + c_{22} k_2^2 \right)}\\
\frac{\partial^2 P}{\partial k_1^2}&=& \int dx_1 dx_2\, (-x_1^2)\, e^{-i(k_1 x_1 + k_2 x_2)}P(x_1,x_2) = \left[\left( c_{11}k_1+ c_{12}k_2 \right)^2-c_{11}\right] e^{-\frac{1}{2}\left( c_{11}k_1^2 + 2 c_{12}k_1 k_2 + c_{22} k_2^2 \right)}
\end{eqnarray}

and setting $k_1=k_2=0$, we find

\begin{eqnarray}
\frac{\partial^2 P}{\partial k_1^2}\vert_{(k_1,k_2)=(0,0)}&=& \int dx_1 dx_2\, (-x_1^2)\, P(x_1,x_2) = -c_{11} \\
c_{11} &=& \langle x_1^2 \rangle
\end{eqnarray}

Now for the covariance, we can take the derivative with respect to $k_1$, then $k_2$, to get,

\begin{eqnarray}
\frac{\partial P}{\partial k_1} &=& \int dx_1 dx_2\, (-ix_1)\, e^{-i(k_1 x_1 + k_2 x_2)}P(x_1,x_2) = -\left( c_{11}k_1+ c_{12}k_2 \right) e^{-\frac{1}{2}\left( c_{11}k_1^2 + 2 c_{12}k_1 k_2 + c_{22} k_2^2 \right)}\\
\frac{\partial^2 P}{\partial k_1 \partial k_2} &=& \int dx_1 dx_2\, (-x_1x_2)\, e^{-i(k_1 x_1 + k_2 x_2)}P(x_1,x_2) = \left[\left( c_{11}k_1+ c_{12}k_2 \right)\left( c_{12}k_1+ c_{22}k_2 \right)-c_{12}\right] e^{-\frac{1}{2}\left( c_{11}k_1^2 + 2 c_{12}k_1 k_2 + c_{22} k_2^2 \right)}\\
\frac{\partial^2 P}{\partial k_1 \partial k_2}\vert_{(k_1,k_2)=(0,0)} &=& \int dx_1 dx_2\, (-x_1x_2)\, P(x_1,x_2) = -c_{12} \\
c_{12} &=& \langle x_1 x_2 \rangle \\
\rho &=& \frac{c_{12}}{\sigma_1 \sigma_2}\\
\rho &=& \frac{\langle x_1 x_2 \rangle}{\left(\langle x_1^2 \rangle \langle x_2^2 \rangle\right)^{1/2}}\\
\end{eqnarray}

And voila! We have found an estimator -- without taking into account bias -- for the correlation coefficient!
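Here is a minimal Monte Carlo sketch of that estimator (the true parameters below are arbitrary choices of mine), compared against numpy's built-in correlation:

    # Sample a zero-mean bivariate normal and estimate rho via
    # <x1 x2> / sqrt(<x1^2><x2^2>).
    import numpy as np

    rng = np.random.default_rng(0)
    sigma1, sigma2, rho_true = 1.0, 3.0, 0.7
    cov = [[sigma1**2, rho_true * sigma1 * sigma2],
           [rho_true * sigma1 * sigma2, sigma2**2]]

    x = rng.multivariate_normal([0.0, 0.0], cov, size=100_000)
    x1, x2 = x[:, 0], x[:, 1]

    rho_hat = np.mean(x1 * x2) / np.sqrt(np.mean(x1**2) * np.mean(x2**2))
    print(rho_hat, np.corrcoef(x1, x2)[0, 1])   # both close to 0.7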



Tuesday, February 18, 2014

Sturm Liouville Problems and the Action

Note to self: the following three problems are equivalent (thanks to the notes of Dr. Robert Hunt, Cambridge University, http://www.damtp.cam.ac.uk/user/reh10/lectures/)

-------

1) Find the eigenvalues and eigenfunctions of the following Sturm-Liouville problem
\begin{eqnarray}
-\frac{d}{dx}\left[p(x)y^\prime \right]+q(x)y &=& \lambda w(x) y
\end{eqnarray}

for $a<x<b$ where neither $p$ nor $w$ vanish.

2) Find the functions for which the following functional is stationary

\begin{eqnarray}
F[y] &=& \int_a^b (p(y^\prime)^2+qy^2)dx
\end{eqnarray}

subject to the constraint

\begin{eqnarray}
G[y] &=& \int_a^b wy^2 dx =1
\end{eqnarray}

If the constraint is satisfied, then the stationary values of $F[y]$ are the eigenvalues of the Sturm-Liouville problem, and the ground state -- the lowest eigenfunction -- corresponds to the smallest eigenvalue.

3) Find the functions $y(x)$ for which the functional

\begin{eqnarray}
\Lambda[y] &=& \frac{F[y]}{G[y]}
\end{eqnarray}

is stationary; the stationary values of $\Lambda[y]$ are then the eigenvalues of the first problem -- without any need to normalize $G[y]$.

It can be seen from the functionals $F[y]$ and $G[y]$ that if $w$, $p$, and $q$ are all positive functions of $x$, then the eigenvalues will be greater than zero; that is, the Sturm-Liouville operator is positive definite.
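To see the positive-definiteness claim numerically, here is a minimal finite-difference sketch; the particular positive $p$, $q$, $w$ and the Dirichlet boundary conditions $y(a)=y(b)=0$ are arbitrary choices of mine. All of the generalized eigenvalues come out positive.

    # Discretize -d/dx[p y'] + q y = lambda w y on (0, 1) with y(0) = y(1) = 0
    # and check that every eigenvalue is positive when p, q, w > 0.
    import numpy as np
    from scipy.linalg import eigh

    n = 400
    x = np.linspace(0.0, 1.0, n + 2)          # grid including the endpoints
    h = x[1] - x[0]
    xi = x[1:-1]                              # interior points

    p = lambda s: 1.0 + s**2
    q = lambda s: 1.0 + s
    w = lambda s: 2.0 + np.sin(np.pi * s)

    p_half = p(x[:-1] + h / 2)                # p at the midpoints
    A = np.diag((p_half[:-1] + p_half[1:]) / h**2 + q(xi))   # symmetric tridiagonal
    A += np.diag(-p_half[1:-1] / h**2, k=1)
    A += np.diag(-p_half[1:-1] / h**2, k=-1)
    W = np.diag(w(xi))

    lam = eigh(A, W, eigvals_only=True)       # generalized eigenvalues
    print(lam[:5], np.all(lam > 0))           # smallest eigenvalues, and True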

-----------------

Using these facts, I was playing around with the one-dimensional Schrodinger equation, and recast it as a Sturm-Liouville problem,

\begin{eqnarray}
\frac{-\hbar^2}{2m}\psi^{\prime \prime} + V(x)\psi &=& i\hbar \frac{\partial \psi}{\partial t}\\
\end{eqnarray}

If one assumes separability, so that $i\hbar \frac{\partial \psi}{\partial t}=E_n \psi$, then

\begin{eqnarray}
\frac{-\hbar^2}{2m}\psi^{\prime \prime} + V(x)\psi &=& \lambda \psi
\end{eqnarray}

so that we can write $w=1$, $q(x)=V(x)$, $p(x)=\frac{\hbar^2}{2m}$, and now write the functional to be minimized -- assuming $\psi$ is complex:

\begin{eqnarray}
F[\psi] &=& \int \left( \frac{\hbar^2}{2m}\frac{\partial \psi}{\partial x}\frac{\partial \psi^\star}{\partial x} + V(x) \psi^\star \psi \right)dx \\
F[\psi] &=& \lambda \\
\int \left( -T+V \right)dx&=& \lambda \\
\int \left( \mathcal{L} \right)dx&=& -\lambda
\end{eqnarray}

This is just the action! I don't know if I made some awful assumptions here, but if we take $F[\psi]=-S[\psi]=-\lambda$, we can now write the time dependence of our `separable' equation as

\begin{eqnarray}
\psi &=& C_1 e^{\frac{iS[\psi]}{\hbar}t}
\end{eqnarray}

Redundant, but interesting. Note also that the Sturm-Liouville construction takes into account normalization -- which I didn't mention before -- since

\begin{eqnarray}
G[\psi] &=& \int \psi \psi^\star dx = 1 \\
\Rightarrow F[\psi] &=& \lambda
\end{eqnarray}

-------

In problem construction (3), one can make guesses at the solution function -- or eigenfunction -- $y$ of the problem, and find successive values of $\lambda$. For $p$, $q$, $w$ all greater than zero, the lowest value of $\lambda$ wins, and so one reduces the Sturm-Liouville problem to a minimization problem. This is called the Rayleigh-Ritz method.
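Here is a minimal Rayleigh-Ritz sketch for the harmonic oscillator written in the Sturm-Liouville form above ($p=\hbar^2/2m$, $q=V$, $w=1$), with $\hbar=m=\omega=1$ and a one-parameter Gaussian trial family -- both arbitrary choices of mine. The minimum of $\Lambda[\psi]$ lands on the exact ground-state energy of $1/2$.

    # Rayleigh-Ritz for -1/2 psi'' + 1/2 x^2 psi = lambda psi:
    # minimize F[psi]/G[psi] over trial functions psi_a(x) = exp(-a x^2).
    import numpy as np
    from scipy.optimize import minimize_scalar

    x = np.linspace(-10, 10, 4001)
    V = 0.5 * x**2                            # q(x) = V(x); p = 1/2; w = 1

    def rayleigh_quotient(a):
        psi = np.exp(-a * x**2)               # real trial function
        dpsi = np.gradient(psi, x)
        F = np.trapz(0.5 * dpsi**2 + V * psi**2, x)
        G = np.trapz(psi**2, x)
        return F / G

    res = minimize_scalar(rayleigh_quotient, bounds=(0.05, 5.0), method="bounded")
    print(res.x, res.fun)                     # best width ~ 0.5, energy ~ 0.5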


Monday, February 17, 2014

Multi-Dimensional Central Limit Theorem

Let us begin with the definition of a normalized probability distribution:

\begin{eqnarray}
\int_{-\infty}^{\infty} P(x) dx &=& 1
\end{eqnarray}

and a moment,

\begin{eqnarray}
\int_{-\infty}^{\infty} x^n P(x) dx &=& m_n
\end{eqnarray}

Notice that $m_1$ is the `mean', or average, or `center of mass' of the probability distribution. The second moment, $m_2$, is intimately related to the variance, or width, of our distribution $P(x)$.

We can define the moment generating function as

\begin{eqnarray}
\int e^{kx} P(x) dx &=& M(k)
\end{eqnarray}

Where now, Taylor expanding our exponential integrand, we see that the $n^{\rm th}$ derivative of $M(k)$ with respect to $k$, evaluated at $k=0$, gives the respective moments:

\begin{eqnarray}
M(k) &=& \int e^{kx} P(x) dx\\
&=& \int \Sigma_n \frac{(kx)^n}{n!} P(x) dx \\
\frac{\partial^n M}{\partial k^n}\vert_{k=0} &=& \int x^n P(x) dx \\
\frac{\partial^n M}{\partial k^n}\vert_{k=0} &=& m_n
\end{eqnarray}
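As a quick sanity check of this machinery, take a unit-rate exponential distribution, whose moment generating function is $M(k)=1/(1-k)$; its $n^{\rm th}$ moment should come out to $n!$. A small symbolic sketch (the choice of distribution is arbitrary):

    # Derivatives of M(k) = 1/(1 - k) at k = 0 reproduce the moments n! of a
    # unit-rate exponential distribution.
    import sympy as sp
    import math

    k = sp.symbols('k')
    M = 1 / (1 - k)
    for n in range(1, 5):
        m_n = sp.diff(M, k, n).subs(k, 0)
        print(n, m_n, math.factorial(n))      # the two values agree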

We can also define the moment generating function as the Laplace transform of $P(x)$, which leaves us with an added sign convention in the definition of our moments,

\begin{eqnarray}
M^\prime (k) &=& \int e^{-kx}P(x) dx \\
(-1)^n \frac{\partial^n M^\prime}{\partial k^n}\vert_{k=0} &=& m_n
\end{eqnarray}


Now we are in a position to define the Fourier transform of our probability distribution, called the characteristic function,

\begin{eqnarray}
M^\prime(ik) = \phi(k) &=& \int e^{-ikx}P(x) dx
\end{eqnarray}

We can simply think of $\phi(k)$ as the Fourier transform of $P(x)$, and note that $\phi(0) =1$ always -- as required by our normalization condition above.

Writing the inverse Fourier transform, we can now write $P(x)$ in terms of $k-$space components,

\begin{eqnarray}
P(x) &=& \frac{1}{2\pi}\int e^{ikx} \phi(k) dk \\
 &=& \frac{1}{2\pi}\int e^{ikx} e^{\log\left(\phi(k)\right)}dk
\end{eqnarray}

Where I have used the $\frac{1}{2 \pi}$ for the inverse Fourier normalization convention (not the same as the QM convention!).

I would now like to define cumulants, which are the corresponding derivatives of the $\psi(k)=\log\left(\phi(k)\right)$ function seen in the exponential argument of the above equation. First, notice that under the Fourier transform, multiplication in x-space is differentiation in k-space, i.e.

\begin{eqnarray}
\mathcal{F}\left[ xP(x) \right] &=& i\frac{\partial \phi}{\partial k}
\end{eqnarray}

and so we find that the various moments can be defined by the partial derivatives of the characteristic function (not much different from what we did before with the moment generating function $M^\prime$, under a simple change of variable):

\begin{eqnarray}
m_n &=& (i)^n \frac{\partial^n \phi}{\partial k^n}\vert_{k=0}
\end{eqnarray}

Using our $\psi(k)$ function from before, we can now define the cumulants

\begin{eqnarray}
c_n &=& (i)^n \frac{\partial^n \psi}{\partial k^n}\vert_{k=0}
\end{eqnarray}

which can be constructed out of our initial moments,

\begin{eqnarray}
c_1 &=& m_1 \\
c_2 &=& m_2-m_1^2\\
c_3 &=& m_3-3m_2m_1+2m_1^3\\
c_4 &=& m_4-4m_3m_1-3m_2^2+12m_2m_1^2-6m_1^4
\end{eqnarray}

Notice that the first cumulant is our standard mean, or average, and that the second cumulant is our definition of variance: the expectation value of the squared difference from the mean (i.e. $c_1=\bar{x}$ and $c_2 =\sigma^2=E[(x-\bar{x})^2]$). These cumulants are very good descriptors of a statistical distribution because they add under convolution. Let's take a closer look.
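These relations can be checked symbolically by expanding the logarithm of the moment series (working with the $M(k)$ convention, where the factors of $i$ drop out); a small sympy sketch:

    # Expand psi(k) = log(M(k)) as a series in k and read off the cumulants
    # in terms of the moments m1..m4.
    import sympy as sp

    k, m1, m2, m3, m4 = sp.symbols('k m1 m2 m3 m4')
    M = 1 + m1*k + m2*k**2/2 + m3*k**3/6 + m4*k**4/24   # truncated moment series

    psi = sp.series(sp.log(M), k, 0, 5).removeO()       # cumulant generating function
    for n in range(1, 5):
        print(n, sp.expand(psi.coeff(k, n) * sp.factorial(n)))
    # c1 = m1
    # c2 = m2 - m1**2
    # c3 = m3 - 3*m1*m2 + 2*m1**3
    # c4 = m4 - 4*m1*m3 - 3*m2**2 + 12*m1**2*m2 - 6*m1**4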

Under the Fourier transform, convolution in x-space is multiplication in k-space -- see the convolution theorem -- and so cumulant-generating functions, and hence cumulants, add under convolution. In probability theory, when one adds two statistically independent variables $x$ and $y$ in order to create a new variable $z=x+y$, the probability distribution that describes $z$ is the convolution of the two former probability distributions: $P(z) = \left(P_x \star P_y\right)(z)$. Let's see what such an addition would do in Fourier space,

\begin{eqnarray}
P(z) &=& \left(P_x \star P_y\right)(z) \\
&=& \mathcal{F}^{-1}\left[ \phi_x(k)\phi_y(k) \right]\\
&=& \frac{1}{2\pi}\int e^{i k z} \phi_x(k)\phi_y(k) dk \\
&=& \frac{1}{2\pi}\int e^{i k z} e^{\log\left(\phi_x(k)\right)}e^{\log \left(\phi_y(k)\right)}dk \\
&=& \frac{1}{2\pi}\int e^{i k z} e^{\psi_x(k)+\psi_y(k)}dk \\
\end{eqnarray}

Expanding both cumulant-generating functions as a taylor series, and collecting like powers of $k$, we find

\begin{eqnarray}
P(z) &=& \frac{1}{2\pi}\int e^{i k z} e^{-ic_{1x}k-c_{2x}k^2/2+ic_{3x}\frac{k^3}{3!}+\dots}e^{-ic_{1y}k-c_{2y}k^2/2+ic_{3y}\frac{k^3}{3!}+\dots}dk \\
&=& \frac{1}{2\pi}\int e^{i k z} e^{-i(c_{1x}+c_{1y})k-(c_{2x}+c_{2y})k^2/2+i(c_{3x}+c_{3y})\frac{k^3}{3!}+\dots}dk \\
\end{eqnarray}

Notice that $c_{1z}= c_{1x}+c_{1y}$ and $c_{2z}= c_{2x}+c_{2y}$, or the mean is the sum of the two former means, and the variance is the sum of the two former variances. Very cool!

Under what are called independent and identically distributed (iid) processes, such as random walks, we have a new variable $z=\Sigma_i x_i$, which is a sum of independent variables that are all described by the exact same probability density function. In this case, our expectation value -- or first moment -- for the variable $z$ becomes $m_{1z}=N\bar{x}$, where $N$ is the number of `steps' or `trials' taken. Similarly we find that the variance of the sum of random variables is $\sigma^2_z = N \sigma^2_x$: the variance grows as $N$, and the standard deviation grows as $\sqrt{N}$.
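A quick numerical illustration with a uniform step distribution (an arbitrary choice of mine): the mean and the variance of the sum both scale like $N$, so the standard deviation scales like $\sqrt{N}$.

    # Sum N iid uniform(-1, 2) steps and compare the mean and variance of the
    # sum with N times the single-step mean and variance.
    import numpy as np

    rng = np.random.default_rng(1)
    steps = rng.uniform(-1, 2, size=(20_000, 400))   # 20000 walks, up to N = 400 steps
    mean_x, var_x = 0.5, 9 / 12                      # single-step mean and variance

    for N in (10, 100, 400):
        z = steps[:, :N].sum(axis=1)
        print(N, z.mean() / (N * mean_x), z.var() / (N * var_x))   # ratios -> 1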

The central limit theorem depends intimately upon this addition of cumulants under convolution, and uses the inverse properties of expansion and dilation under the Fourier transform to truncate the Taylor expansion of our effective cumulant generating functions. Let's take a look at an `iid' process, or $P(z)$, where $z=\Sigma x_i$.

\begin{eqnarray}
P(z) &=& \frac{1}{2\pi} \int e^{ikz}\Pi_{i=1}^N \phi_i(k) dk \\
P(z) &=& \frac{1}{2\pi} \int e^{ikz}e^{\Sigma_i \psi_i(k)} dk \\
P(z) &=& \frac{1}{2\pi} \int e^{ikz}e^{-i\left(\Sigma_i c_{1i}\right)k}e^{-\left(\Sigma_i c_{2i}\right)\frac{k^2}{2}}\cdots dk
\end{eqnarray}

As we convolve more and more probability density functions, our variances will add, and we will end up with an extremely wide density function. In fourier space, this corresponds to an extremely narrow characteristic function, and allows us to assume negligible value of $\phi(k)$ at high $k$-values. Thus, we can write

\begin{eqnarray}
\psi(k) &=& -ic_1 k - c_2\frac{k^2}{2}+ic_3\frac{k^3}{3!}+c_4\frac{k^4}{4!}+\dots
\end{eqnarray}

as

\begin{eqnarray}
\psi(k) &\approx & -ic_1 k - c_2\frac{k^2}{2}
\end{eqnarray}

Taking this into account -- and noting that the $c_1$ and $c_2$ are in fact sums of former means and variances under convolution -- we can now write $P(z)$ as,

\begin{eqnarray}
P(z) &=& \frac{1}{2 \pi} \int e^{ikz}e^{-ic_1 k}e^{-c_2 \frac{k^2}{2}}dk
\end{eqnarray}

Setting $z^\prime = z-c_1$ -- that is, centering about the mean -- we can absorb the first exponential and write

\begin{eqnarray}
P(z^\prime) &=& \frac{1}{2 \pi} \int e^{ikz^\prime}e^{-c_2 \frac{k^2}{2}}dk
\end{eqnarray}


This is the Fourier transform of a Gaussian, which is itself a Gaussian,

\begin{eqnarray}
P(z^\prime) &=& \frac{1}{\sqrt{2 \pi c_2}}e^{\frac{-z^{\prime 2}}{2c_2}}\\
P(z) &=& \frac{1}{\sqrt{2 \pi \sigma^2}}e^{\frac{-(z-c_1)^2}{2\sigma^2}}\\
P(z) &=& \frac{1}{\sqrt{2 \pi \sigma^2}}e^{\frac{-(z-\bar{z})^2}{2\sigma^2}}
\end{eqnarray}

Whew! Notice that our mean and variance in this final equation are simply the sums of the former means and variances of the independent variables $x_i$ that created $z$ ($z = \Sigma_i x_{i}$; $c_1 = \Sigma_i c_{1i}$; $c_2 = \Sigma_i c_{2i}$). The central limit theorem does not depend on each of these variables -- perhaps $N$ of them -- being identical, i.e. described by the same probability density function; they need only be independent.
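Here is a small numpy sketch of that statement, summing independent but deliberately non-identical variables (alternating exponential and uniform draws -- arbitrary choices of mine) and checking that the mean and variance of the sum are the sums of the individual cumulants, while the skewness heads toward zero:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    n_samples, N = 20_000, 300

    x = np.empty((n_samples, N))
    x[:, 0::2] = rng.exponential(2.0, size=(n_samples, (N + 1) // 2))   # mean 2, var 4
    x[:, 1::2] = rng.uniform(0, 1, size=(n_samples, N // 2))            # mean 1/2, var 1/12
    z = x.sum(axis=1)

    c1 = 2.0 * ((N + 1) // 2) + 0.5 * (N // 2)        # sum of the means
    c2 = 4.0 * ((N + 1) // 2) + (1 / 12) * (N // 2)   # sum of the variances

    print(z.mean() / c1, z.var() / c2)                # both ratios close to 1
    print(stats.skew(z))                              # close to 0: z is nearly Gaussian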

Now, the extension to multiple dimensions is not too bad; we now deal with random vectors rather than random scalars. Let's represent these vectors with boldface, and label each independent vector with a superscript. We can now write

\begin{eqnarray}
\mathbf{z} &=& \Sigma_j \mathbf{x^j}
\end{eqnarray}

Where the superscript denotes the $j^{\rm th}$ independent random vector. Similarly, our $k$ space is now multidimensional, and so we write the forward Fourier transform of our probability density as

\begin{eqnarray}
\phi(\mathbf{k}) &=& \int e^{-i\mathbf{k}\cdot \mathbf{x}}P(\mathbf{x})d^d\mathbf{x} \\
\phi(\mathbf{k}) &=& \int e^{-i k_i x_i}P(\mathbf{x})d^d\mathbf{x}
\end{eqnarray}

The cumulants and the moments are now rank-$n$ tensors, as seen from the following

\begin{eqnarray}
\frac{\partial^n \phi}{\partial \mathbf{k}_i \partial \mathbf{k}_j \cdots \partial \mathbf{k}_\gamma}\vert_{\mathbf{k}=0} &=& (-i)^n \mathbf{m}_{ij\cdots \gamma} \\
\frac{\partial^n \psi}{\partial \mathbf{k}_i \partial \mathbf{k}_j \cdots \partial \mathbf{k}_\gamma}\vert_{\mathbf{k}=0} &=& (-i)^n \mathbf{c}_{ij\cdots \gamma}
\end{eqnarray}
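In particular, the second cumulant tensor is just the covariance matrix, which can be estimated directly from samples; a quick numerical sketch (the particular covariance matrix below is an arbitrary choice of mine):

    # Estimate the second cumulant tensor <x_i x_j> - <x_i><x_j> from samples
    # and compare against numpy's covariance routine.
    import numpy as np

    rng = np.random.default_rng(3)
    true_cov = np.array([[2.0, 0.6, 0.0],
                         [0.6, 1.0, -0.3],
                         [0.0, -0.3, 0.5]])
    x = rng.multivariate_normal(mean=[1.0, -2.0, 0.5], cov=true_cov, size=200_000)

    xc = x - x.mean(axis=0)                           # center about the mean
    c_ij = np.einsum('ni,nj->ij', xc, xc) / len(x)

    print(np.round(c_ij, 3))                          # close to true_cov
    print(np.allclose(c_ij, np.cov(x, rowvar=False, bias=True)))   # True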

We now use the same approximation scheme as before, convolving an absurd number of multi-dimensional probability density functions $P(\mathbf{x}^j)$, which becomes a product in Fourier space -- and thus an addition of cumulant generating functions

\begin{eqnarray}
P(\mathbf{z}) &=& \frac{1}{(2\pi)^d}\int e^{i\mathbf{k}\cdot \mathbf{z}}e^{\Sigma \psi_i(\mathbf{k})}d^d\mathbf{k}\\
P(\mathbf{z}) &=& \frac{1}{(2\pi)^d}\int e^{i\mathbf{k}\cdot \mathbf{z}}e^{-i \mathbf{c}_i \mathbf{k}_i}e^{-\frac{1}{2}\mathbf{k}_i \mathbf{c}_{ij} \mathbf{k}_j}e^{\frac{i}{3!} \mathbf{c}_{ijk} \mathbf{k}_i \mathbf{k}_j \mathbf{k}_k}\cdots d^d\mathbf{k}\\
\end{eqnarray}

Truncating our Taylor expansion and centering about the multidimensional mean $\mathbf{z}_i^\prime = \mathbf{z}_i - \mathbf{c}_i $,

\begin{eqnarray}
P(\mathbf{z}^\prime) &=& \frac{1}{(2\pi)^d}\int e^{i\mathbf{k}\cdot \mathbf{z}^\prime}e^{-\frac{1}{2}\mathbf{k}_i \mathbf{c}_{ij} \mathbf{k}_j}d^d\mathbf{k}\\
\end{eqnarray}

We can rescale coordinates by writing

\begin{eqnarray}
\mathbf{w} &=& \sqrt{\mathbf{c}_{ij}}\mathbf{k}_j
\end{eqnarray}

where the square root of a matrix can be written in terms of its diagonalization by a unitary matrix $\mathbf{S}$ (and its inverse) and the diagonal matrix of eigenvalues $\mathbf{\Lambda}_{ij} = \delta_{ij}\lambda_i$,

\begin{eqnarray}
\mathbf{c} &=& \mathbf{S}^{-1} \mathbf{\Lambda}\mathbf{S} \\
\mathbf{c}_{ij} &=& \mathbf{S}^{-1}_{il} \lambda_l \delta_{lm} \mathbf{S}_{mj} \\
\left(\sqrt{\mathbf{c}}\right)_{ij} &=& \mathbf{S}^{-1}_{il} \sqrt{\lambda_l} \delta_{lm} \mathbf{S}_{mj}
\end{eqnarray}

The determinant of this square-root matrix will be the product of the $\sqrt{\lambda_j}$ -- that is, $\vert\mathbf{c}_{ij}\vert^{1/2}$ -- which is also the factor by which the change of variables dilates volumes; so we can now write our infinitesimal volume element in $d$-dimensional $\mathbf{k}$ space using this determinant,

\begin{eqnarray}
d^d \mathbf{w} &=& \vert \mathbf{c}_{ij} \vert^{1/2} d^d\mathbf{k}
\end{eqnarray}

Now rewriting our integral, we have a multi-dimensional transform of a Gaussian, which is another Gaussian

\begin{eqnarray}
P(\mathbf{z}^\prime) &=& \frac{1}{(2\pi)^d} \int e^{i \mathbf{w}_i \left(\sqrt{\mathbf{c}}\right)^{-1}_{ij} \mathbf{z}_j^\prime}e^{-\frac{\mathbf{w}_i  \mathbf{w}_i}{2}}\frac{d^d\mathbf{w}}{\vert \mathbf{c}_{ij} \vert^{1/2}}\\
P(\mathbf{z}^\prime)  &=& \frac{1}{(2\pi)^{d/2} \vert \mathbf{c}_{ij} \vert^{1/2}} \mathrm{exp}\left(-\frac{1}{2}\left(\sqrt{\mathbf{c}}\right)^{-1}_{li}\mathbf{z^\prime}_i \left(\sqrt{\mathbf{c}}\right)^{-1}_{l\gamma}\mathbf{z^\prime}_\gamma\right)
\end{eqnarray}

Whew! The argument in the exponential -- namely, those nasty contractions with the inverse square-root matrices -- can be simplified, yielding a single inverse covariance matrix.

\begin{eqnarray}
P(\mathbf{z}^\prime)  &=& \frac{1}{(2\pi)^{d/2} \vert \mathbf{c}_{ij} \vert^{1/2}} \mathrm{exp}\left(-\frac{1}{2}\mathbf{z^\prime}_l \left(\mathbf{c}^{-1}\right)_{l \gamma}\mathbf{z^\prime}_\gamma\right)
\end{eqnarray}
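As a final sanity check, here is a small numpy sketch of the multi-dimensional statement: sums of iid, skewed, correlated two-dimensional vectors acquire a mean vector and covariance matrix equal to $N$ times the single-step cumulants, while the higher cumulants (here, the skewness of each component) shrink away. The particular step construction below is an arbitrary choice of mine.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    n_samples, N = 20_000, 200

    # one step is (e0, 0.5*e0 + e1) with e0, e1 independent unit exponentials,
    # so its mean is (1, 1.5) and its covariance is [[1, 0.5], [0.5, 1.25]]
    s0 = rng.exponential(1.0, size=(n_samples, N)).sum(axis=1)
    s1 = rng.exponential(1.0, size=(n_samples, N)).sum(axis=1)
    z = np.column_stack([s0, 0.5 * s0 + s1])

    c1_step = np.array([1.0, 1.5])
    c2_step = np.array([[1.0, 0.5], [0.5, 1.25]])

    print(z.mean(axis=0) / (N * c1_step))             # ratios close to 1
    print(np.cov(z, rowvar=False) / (N * c2_step))    # ratios close to 1
    print(stats.skew(z, axis=0))                      # small, shrinking with N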