Thursday, November 6, 2014

Linear Stochastic ODE's lead to Gaussianity at Asymptotic times, even if Noise function has non-zero higher order Cumulants

So a few friends of mine are working on Stochastic ODE's and their connection to path integrals. After dorking out about this for a few moments, I'm able to make some ``baby'' statements about the problem. If you consider a sequence of random numbers:

\begin{eqnarray}
\left \lbrace \mathbf{X}_i \right \rbrace_{i=1}^n
\end{eqnarray}

which is determined by the following difference equation:

\begin{eqnarray}
d\mathbf{X}_i = \mathbf{X}_{i+1} - \mathbf{X}_i &=& a_i+\mathbf{W}_i
\end{eqnarray}

subject to the initial condition $\mathbf{X}_0=0$

You can express the solution as a sum of two sums -- one deterministic and one random.

\begin{eqnarray}
\mathbf{X}_n &=& \sum_{i=0}^{n-1} a_i + \sum_{i=0}^{n-1} \mathbf{W}_i
\end{eqnarray}

Where I have boldfaced all random variables. The $a_i$ are an ordinary (deterministic) sequence of real numbers -- perhaps they all take the same value $a$ -- and $\mathbf{W}_i$ is a noise variable, or some random forcing function. If the $a_i$ are all equal to $a$, the solution after $n$ steps will be

\begin{eqnarray}
\mathbf{X}_n &=& n a+ \sum_{i=0}^{n-1} \mathbf{W}_i
\end{eqnarray}

Now, if $\mathbf{W}_i$ is drawn from some probability distribution at every single step $i$, we know that at asymptotic times, $n \to \infty$, subject to certain conditions on the probability density of $\mathbf{W}_i$, our distribution on $\mathbf{X}_n$ will converge to a Gaussian. This is very cool, and does not even require the $\mathbf{W}_i$ to be identically distributed. We simply say that if
\begin{eqnarray}
\mathbf{W_i} & \sim & N(0,\sigma^2) \ \ \forall i
\end{eqnarray}

then,
\begin{eqnarray}
\mathbf{X}_n &\sim & na+ N(0, n \sigma^2)
\end{eqnarray}

Where $N(0,\sigma^2)$ stands for a normal distribution with zero mean and variance $\sigma^2$. Note that this is simply a consequence of the addition of cumulants under convolution -- which is what you do when you add random variables.

\begin{eqnarray}
Z&=&X+Y \\
X & \sim & N(c_1, c_2)\\
Y & \sim & N(c_1^\prime, c_2^\prime )\\
Z & \sim & N(c_1+c_1^\prime, c_2+ c_2^\prime )
\end{eqnarray}

So our cumulants add, and the central limit theorem hinges upon this: since our characteristic function -- the Fourier transform of our probability distribution -- is bounded above by one (1), convolving two distributions in real space means multiplying in frequency space, which makes the characteristic function of our resulting variable $Z$ -- which is very much like an average -- thinner and thinner and thinner... Meaning that you can truncate the characteristic function's cumulant generating function $\psi$ at order $k^2$, leading to a Gaussian.

This means that any sum of random variables -- even if they are not identically distributed, although they must be independent in order to convolve -- and even if those variables have non-zero higher order cumulants, like skewness $c_3$ or kurtosis $c_4$, will give you a Gaussian in the $n \to \infty$ limit. This is the central limit theorem, the distributional cousin of the law of large numbers.

So why do we care in this Stochastic ODE case? It means that under linear dynamics, at asymptotic times, we converge to a Gaussian distribution on $\mathbf{X}$, even if our noise function itself has very strange properties, like non-zero higher order cumulants. This is very strange indeed, and comes from the fact that the system is linear, i.e. we are adding random variables together.
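Here is a quick numerical sketch of that claim (mine, not part of the original argument; the drift value $a=0.1$ is made up): take a strongly skewed noise term -- a centered exponential -- and watch the higher cumulants of $\mathbf{X}_n$ wash out as $n$ grows.

\begin{verbatim}
# sum skewed, non-Gaussian increments and watch the skewness and excess
# kurtosis of X_n shrink as n grows
import numpy as np

rng = np.random.default_rng(0)
a = 0.1                         # deterministic drift per step (made-up value)
n_paths = 50_000

for n in [10, 100, 1000]:
    # centered exponential noise: mean 0, variance 1, skewness 2, excess kurtosis 6
    X = np.full(n_paths, n * a)
    for _ in range(n):
        X += rng.exponential(1.0, size=n_paths) - 1.0
    Xc = X - X.mean()
    skew = np.mean(Xc**3) / np.std(X)**3
    kurt = np.mean(Xc**4) / np.std(X)**4 - 3.0
    print(f"n = {n:4d}   skewness = {skew:+.3f}   excess kurtosis = {kurt:+.3f}")
# the skewness falls like 2/sqrt(n) and the excess kurtosis like 6/n, so X_n
# looks more and more Gaussian even though each W_i is far from Gaussian
\end{verbatim}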

Under non-linear evolution, it can be shown using Perturbation theory that non-zero third and higher order moments are created, but showing this in the stochastic framework is a bit difficult...

It is easy to show, however, that an equation like:

\begin{eqnarray}
L_0 \delta &=& \delta^2
\end{eqnarray}

where $L_0$ is some linear differential operator, can be expanded in a power series in a small parameter $\lambda$:
\begin{eqnarray}
L_0 \delta &=& \lambda \delta^2\\
\delta &=& \sum_{i=0}^\infty \lambda^i \delta_i
\end{eqnarray}

So we have, to each order:

\begin{eqnarray}
\lambda^0 : L_0 \delta_0 &=& 0
\end{eqnarray}

which is our linear solution. Then, at first order in $\lambda$, we have:
\begin{eqnarray}
\lambda^1: L_0 \delta_1 &=& \delta_0^2
\end{eqnarray}

Now we find that if we take the connected third moment, or the third cumulant, we get a nonzero value:

\begin{eqnarray}
\langle \delta^3 \rangle &=& \langle \delta_0 ^3 \rangle +3\lambda \langle \delta_0^2 \delta_1 \rangle + \dots
\end{eqnarray}

If $\delta_0$ is Gaussian distributed -- as we found it would be for a linearly evolved system at asymptotic times, or if we simply assume Gaussian initial conditions -- then we know that $\langle \delta_0^3 \rangle =0$. The leading order term, however, will not be zero, because it goes like $ \sim \langle \delta_0^4 \rangle$, which under Wick's theorem/Gaussian statistics can be built out of second cumulants. So we see that non-linearity generates non-zero skewness and kurtosis, and other higher order things, at late times.
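A toy Monte Carlo version of that statement (mine; the values of $\sigma$ and $\lambda$ are arbitrary, and I take $\delta_1 = \delta_0^2 - \sigma^2$, i.e. mean-subtracted, so that $\langle\delta\rangle=0$):

\begin{verbatim}
# start with a Gaussian delta_0, add the quadratic correction lambda*delta_1,
# and watch a non-zero third cumulant (skewness) appear at O(lambda)
import numpy as np

rng = np.random.default_rng(1)
sigma, lam = 1.0, 0.05                     # hypothetical variance and coupling
d0 = rng.normal(0.0, sigma, size=5_000_000)

d = d0 + lam * (d0**2 - sigma**2)          # mean-subtracted so that <delta> = 0
third_cumulant = np.mean(d**3)             # connected, since <delta> = 0
print("measured  <delta^3> =", third_cumulant)
# leading order: 3*lam*<d0^2 (d0^2 - sigma^2)> = 3*lam*(3-1)*sigma^4 = 6*lam*sigma^4
print("predicted <delta^3> ~", 6 * lam * sigma**4)
\end{verbatim}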

The key to connecting this with Stochastic ODE's lies in the fact that we are not adding random variables anymore but multiplying them, and this is a very peculiar type of convolution, which in general does not do a simple addition of cumulants. I will have to look more into this.

Note: The lognormal distribution is the convergent distribution for a product of random variables, since the log of the product is the sum of the logs. So perhaps it could be shown that some non-linear Stochastic ODE's go to a lognormal (which I believe is already a common concept on Wall street, for estimating the dynamics of stock prices).

Monday, October 20, 2014

The Scalar field Propagator and its Asymptotic Expansion


If we only took ``half'' of the commutator, above, we would get an expression that looks like:

\begin{eqnarray}
\langle 0 \vert \phi(x)\phi(y) \vert 0\rangle &=& \int_0^\infty \frac{dk}{(2\pi)^2}\frac{k}{2\omega_k}\frac{i}{\xi}e^{-i\omega_k \tau}\left(e^{ik\xi}-e^{-ik\xi}\right)
\end{eqnarray}

Now this integral is even in $k$, since we can rewrite the two exponential terms as:

\begin{eqnarray}
&=& \int_0^\infty \frac{dk}{(2\pi)^2}\frac{1}{2\omega_k}\frac{i}{\xi}e^{-i\omega_k \tau}\left(ke^{ik\xi}-ke^{-ik\xi}\right)
\end{eqnarray}

We therefore extend the integration to $(-\infty, \infty)$, and write:

\begin{eqnarray}
\langle 0 \vert \phi(x)\phi(y) \vert 0\rangle &=& \frac{i}{\xi}  \int_{-\infty}^\infty \frac{dk}{(2\pi)^2}\frac{k}{2\omega_k}e^{-i\omega_k \tau+ik\xi}
\end{eqnarray}

And now we have this strange integral to deal with. The first thing to do is choose a case: either $\xi=0$ or $\tau=0$, since we can always find a proper Lorentz boost to accomplish this. If we look at the space-like separated case, we have


\begin{eqnarray}
\langle 0 \vert \phi(x)\phi(y) \vert 0\rangle &=& \frac{i}{\xi}  \int_{-\infty}^\infty \frac{dk}{(2\pi)^2}\frac{k}{2 \sqrt{k^2+m^2}}e^{ik\xi}
\end{eqnarray}

which has a pesky square root function, and therefore branch points at $im$ and $-im$. We choose our branch cut to lie in the negative imaginary regime, and make a transformation $k^\prime=-ik$, such that:

\begin{eqnarray}
\langle 0 \vert \phi(x)\phi(y) \vert 0\rangle &=& \frac{i}{\xi}  \int_{i\infty}^{-i\infty} \frac{dk}{(2\pi)^2}\frac{-k}{2 \sqrt{m^2-k^2}}e^{-k\xi}\\
 &=& \frac{i}{\xi}  \int_{-i\infty}^{i\infty} \frac{dk}{(2\pi)^2}\frac{k}{2 \sqrt{m^2-k^2}}e^{-k\xi}
\end{eqnarray}

Now we notice that if we make a ``keyhole contour'' that goes along the imaginary axis in this rotated space, goes out along $k\to \infty$ and then wraps around the branch cut, then we can say that such a closed contour integral will yield zero, since it contains no singularities.

\begin{eqnarray}
\oint_{C_r + C_i + C_b} &=& 0
\end{eqnarray}

Where the $C$'s stand for rounded arcs, the imaginary and branch axis contours, respectively. We can claim immediately that the arcs will yield zero as we go out to infinity, since the real part of $k$ damps our integrand exponentially. The next step is to notice that our branch cut function can be written as:
\begin{eqnarray}
\sqrt{m^2-k^2}&=& e^{\frac{1}{2}\log\vert m^2-k^2\vert +\frac{i}{2}\phi }\\
&=& \sqrt{\vert m^2-k^2 \vert}\, e^{i\phi/2}
\end{eqnarray}

Where the angle $\phi$, in this rotated space, takes on the values $0<\phi<2\pi$. This means that above the branch cut, on the top side of the real axis, we will get a positive sign in our branched function. But below the real axis we will get a negative sign, due to the phase $e^{i \frac{2\pi}{2}}$ (with some delta in there so that we're not exactly on the branch cut!). So we can equate the imaginary axis integration and the branch cut integration:

\begin{eqnarray}
\frac{i}{\xi}  \int_{-i\infty}^{i\infty} \frac{dk}{(2\pi)^2}\frac{k}{2 \sqrt{m^2-k^2}}e^{-k\xi}
 &=& - \int_m^\infty \frac{dk}{(2\pi)^2}\frac{k}{\sqrt{k^2-m^2}}e^{-k\xi}
\end{eqnarray}

So we may write

\begin{eqnarray}
\langle 0\vert \phi(x)\phi(y) \vert 0\rangle &=& - \int_m^\infty \frac{dk}{(2\pi)^2}\frac{k}{\sqrt{k^2-m^2}}e^{-k\xi}
\end{eqnarray}

Up to some possibly dropped negative sign. The important thing is that now, we may take the $\xi \to \infty$ limit, and note that the asymptotic contribution to this integral comes from the minimum value of $k$, and so we get

\begin{eqnarray}
\lim_{\vert x-y \vert \to \infty}\langle 0\vert \phi(x)\phi(y) \vert 0\rangle &\sim & e^{-m\vert \vec{x}-\vec{y}\vert}
\end{eqnarray}
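Just to convince myself of that fall-off, here is a small numerical check (mine, with the $(2\pi)^2$ and $1/\xi$ prefactors dropped, and $m=1$ chosen arbitrarily). Substituting $k = m\cosh u$ tames the square-root singularity, and the resulting integral is the standard representation of $m K_1(m\xi)$, which decays like $e^{-m\xi}$:

\begin{verbatim}
# check that the branch-cut integral  int_m^inf dk k e^{-k xi}/sqrt(k^2-m^2)
# equals m*K_1(m*xi) and hence falls off like e^{-m*xi}
import numpy as np
from scipy.integrate import quad
from scipy.special import k1

m = 1.0
for xi in [1.0, 5.0, 10.0, 20.0]:
    # k = m*cosh(u) turns the integrand into m*cosh(u)*exp(-m*xi*cosh(u))
    integrand = lambda u: m * np.cosh(u) * np.exp(-m * xi * np.cosh(u))
    val, _ = quad(integrand, 0.0, np.inf)
    print(f"xi={xi:5.1f}  integral={val:.3e}  m*K1(m*xi)={m*k1(m*xi):.3e}  "
          f"ratio to exp(-m*xi)={val/np.exp(-m*xi):.3f}")
# the ratio to e^{-m xi} varies only slowly (roughly like 1/sqrt(xi)),
# confirming the leading e^{-m|x-y|} behaviour
\end{verbatim}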

Similarly, for the time-like separation case, we can do the same yakkety-yak, rotating our complex integration space and then wrapping around a fitting branch cut, in order to write:
\begin{eqnarray}
\langle 0\vert \phi(x)\phi(y) \vert 0\rangle &=& - \int_m^\infty \frac{dk}{(2\pi)^2}\frac{k}{\sqrt{m^2-k^2}}e^{i\sqrt{m^2-k^2}\tau}
\end{eqnarray}

Except in this case we have not really helped ourselves, as this integral needs some sort of regularization, or infinitesimal offset of the parameter $m$ in order to converge, since, $k=0$ corresponds to the point of stationary phase in the asymptotic $\tau \to \infty$ limit.

We could also have just Taylor expanded our function $\sqrt{k^2+m^2}$ about $k=0$, without all the branch and rotation nonsense, to write:

\begin{eqnarray}
\langle 0 \vert \phi(x)\phi(y) \vert 0\rangle &=& \int_{-\infty}^\infty \frac{dk}{(2\pi)^2}\frac{k^2}{2 \sqrt{k^2+m^2}}e^{-i\sqrt{k^2+m^2} \tau}\\
&\approx & e^{-im\tau} \times \dots
\end{eqnarray}

The rest of the expression can be written as a series of expectation values of a 1-D gaussian integral, with variance $\sigma^2 = \frac{m}{\tau}$.

So we find that the naive scalar propagator does not vanish for space-like separation, but its commutator does. We also find that for time-like separation we get an oscillatory result, which, not coincidentally, for the commutator will result in something like

\begin{eqnarray}
\langle 0 \vert [\phi(x),\phi(y)] \vert 0\rangle &\sim & e^{-im\tau}-e^{+im\tau}
\end{eqnarray}
an interference  between the two measurements. 

Preserving Causality for Field Operators

It is important to examine the quantity:

\begin{eqnarray}
\langle 0 \vert \phi(x)\phi(y)\vert 0 \rangle
\end{eqnarray}

Or, two field operators at different points in spacetime, acting on the vacuum. In particular, for a causal theory, we would like all space-like separated operators to commute, i.e.:

\begin{eqnarray}
[\mathcal{O}(x),\mathcal{O}(y)] &=& 0 \ \mathrm{if} \  \eta_{\mu \nu}(x-y)^\mu (x-y)^\nu <0
\end{eqnarray}

Or, two space-like separated observables cannot affect each other. Let us examine the commutator for our mode-expanded fields:

\begin{eqnarray*}
\phi(x) &=& \sum_k \frac{1}{\sqrt{2L^3 \omega_k}}\left(a_ke^{-ikx}+a^*_k e^{ikx}\right)\\
\left[\phi (x), \phi (y)\right] &=& \sum_{k,p} \frac{1}{2L^3} \frac{1}{\sqrt{\omega_k \omega_p}}\left[a_ke^{-ikx}+a^*_k e^{ikx},\ a_pe^{-ipy}+a^*_p e^{ipy}\right]\\
&=& \sum_{k,p} \frac{1}{2L^3} \frac{1}{\sqrt{\omega_k \omega_p}}\left(\left[a_k,a^*_p \right]e^{-ikx+ipy}+\left[a^*_k,a_p \right]e^{+ikx-ipy}\right)\\
&=& \sum_{k,p} \frac{1}{2L^3} \frac{1}{\sqrt{\omega_k \omega_p}}\delta_{k,p}\left(e^{-ikx+ipy}-e^{+ikx-ipy}\right)\\
&=& \sum_{k} \frac{1}{2L^3} \frac{1}{\omega_k}\left(e^{-ik(x-y)}-e^{+ik(x-y)}\right)
\end{eqnarray*}

We can now re-write this difference in space time coordinates as:

\begin{eqnarray}
x-y &=& \langle \tau, \vec{\xi} \rangle
\end{eqnarray}

and expand out the four vector contractions:

\begin{eqnarray}
\left[\phi(x),\phi(y) \right]&=& \sum_{k} \frac{1}{2L^3} \frac{1}{\omega_k}\left(e^{-i\omega_k \tau}e^{ik\xi}-e^{+i\omega_k\tau}e^{-ik\xi}\right)
\end{eqnarray}

Turning our sum into an integral now, we write:

\begin{eqnarray}
\sum_k \frac{1}{V} \to \int \frac{d^3k}{(2\pi)^3}\\
\left[\phi(x),\phi(y) \right]&=& \int \frac{d^3k}{(2\pi)^3} \frac{1}{2 \omega_k}\left(e^{-i\omega_k \tau}e^{ik\xi}-e^{+i\omega_k\tau}e^{-ik\xi}\right)
\end{eqnarray}

Integration over our angular coordinates is trivial, since we can set $\vec{\xi}$ along the $k_z$ axis and do the azimuthal integral immediately.

\begin{eqnarray}
\left[\phi(x),\phi(y) \right]&=& \int_0^\infty \int_{1}^{-1} \frac{k^2 dk du }{(2\pi)^2} \frac{1}{2 \omega_k}\left(e^{-i\omega_k \tau}e^{ik\xi u}-e^{+i\omega_k\tau}e^{-ik\xi u}\right)
\end{eqnarray}

Notice we have set $u=\cos(\psi)$, where $\psi$ is the angle from $k_z$, above. Integrating over $u$ gives us the usual sine-over-argument (spherical Bessel) kernels:

\begin{eqnarray}
\left[\phi(x),\phi(y) \right]&=& \int_0^\infty \frac{k^2 dk }{(2\pi)^2} \frac{1}{2 \omega_k}\left(\frac{e^{-i\omega_k \tau}e^{ik\xi u}}{ik\xi}-\frac{e^{+i\omega_k\tau}e^{-ik\xi u}}{-ik\xi} \right) \vert^{-1}_{1}\\
\left[\phi(x),\phi(y) \right]&=& \frac{-i}{\xi}\int_0^\infty \frac{k dk }{(2\pi)^2} \frac{1}{2 \omega_k}\left(e^{-i\omega_k \tau}e^{ik\xi u}+e^{+i\omega_k\tau}e^{-ik\xi u}\right) \vert^{-1}_{1}
\end{eqnarray}

Taking a long hard look at the above equation, one might be able to see that we're going to get zero if $\tau=0$, since the evaluations at the endpoints of $u$ will cancel. We can also expand out the endpoints of $u$ to get:

\begin{eqnarray}
\left[\phi(x),\phi(y) \right]&=& \frac{1}{\xi}\int_0^\infty \frac{dk}{2\pi^2} \frac{k}{\omega_k}\sin(k\xi)\sin(\sqrt{k^2+m^2}\tau)
\end{eqnarray}

Now we see immediately that the $\tau \to 0$ limit gives us a vanishing commutator, and the $\xi \to 0$ limit gives us something that is finite, like

\begin{eqnarray}
\left[\phi(x),\phi(y) \right]&=& \int_0^\infty \frac{dk}{2\pi^2} \frac{k^2}{\sqrt{k^2+m^2}}\sin(\sqrt{k^2+m^2}\tau)
\end{eqnarray}

This will give an asymptotic solution of the form $e^{imt}$, which can be shown by stationary phase arguments. And we find, therefore, that the commutator vanishes for space-like separation but not for time-like separation.

It is important to note that, during this entire discussion, we could have framed the commutator problem as the difference of two propagators, where the propagator is the amplitude for a particle to be created at $y$ and destroyed at $x$:

\begin{eqnarray}
\langle 0 \vert \phi(x) \phi(y)\vert 0 \rangle &=& D_{xy}
\end{eqnarray}

and therefore our commutator looks like:

\begin{eqnarray}
[\phi(x),\phi(y)] &=& D_{xy}-D_{yx}=0 \ \forall \ \eta_{\mu \nu}(x-y)^\mu (x-y)^\nu <0
\end{eqnarray}

David Tong, in his online notes, says we can ``wrap words around this''. When the events of creation and annihilation are space-like separated, we can ``re-order'' the events by a proper Lorentz boost.  This means that the two amplitudes for the processes $x\to y$ and $y \to x$ cancel. For a complex scalar field, this turns into a statement about the amplitude of a particle and an anti-particle travelling from $x \to y$, which once again cancel for space-like separation.

Friday, August 22, 2014

Canonical Transformation and the word "Symplectic"

There is this very frustrating thing in Lagrangian and Hamiltonian Mechanics, called a "canonical transformation", which supposedly simplifies the equations of motion and sets up all sorts of higher order analysis like Action Angle variables, the Hamilton-Jacobi equation, and canonical perturbation theory.

The trouble is, it's hard to get a feel for these things. The basic building block of Hamiltonian mechanics is the equation[s] of motion:

\begin{eqnarray}
\frac{\partial H}{\partial q_i} &=& -\dot{p_i} \\
\frac{\partial H}{\partial p_i} &=& \dot{q_i}
\end{eqnarray}

We can write this more succinctly using a phase space vector, which I will call "z":

\begin{eqnarray}
\mathbf{z} &=& \left( q_1,q_2, \dots, q_n, p_1,p_2, \dots, p_n \right)
\end{eqnarray}

So z is a $2n$-dimensional vector in our $2n$-dimensional phase space. We can now write the equations of motion using a strange matrix, $\Omega$:

\begin{eqnarray}
\Omega &=& \left(\begin{array}{cc}
0 & I_n \\ -I_n & 0
\end{array} \right)
\end{eqnarray}

Where $I_n$ represents the $n$-dimensional identity matrix, with ones along the diagonal, and the zeros represent $n$-by-$n$ zero matrices. This $\Omega$ is what we call a block matrix, in that it can be decomposed into the four "blocks" I have written above. We could also write it as a direct product of two matrices, the $n$-by-$n$ identity and one of the Pauli spin matrices (which is also a 90 degree rotation in the plane):

\begin{eqnarray}
i \sigma_2 &=&  \left(\begin{array}{cc}
0 & 1 \\ -1 & 0
\end{array} \right)\\
\Omega &=& i \sigma_2 \otimes I_n
\end{eqnarray}

Now, we can write Hamilton's equation of motion in the following form:

\begin{eqnarray}
\frac{dz_i}{dt} &=& \Omega_{ij} \frac{\partial H}{\partial z_j}
\end{eqnarray}

Another way to use this tidy notation is in the Poisson bracket:

\begin{eqnarray}
\left[A_i,B_j \right] &=& \sum_k \left(\frac{\partial A_i}{\partial q_k}\frac{\partial B_j}{\partial p_k}- \frac{\partial A_i}{\partial p_k}\frac{\partial B_j}{\partial q_k}\right) \\
&=& \frac{\partial A_i}{\partial z_k} \Omega_{km} \frac{\partial B_j}{\partial z_m}
\end{eqnarray}

and one finds that if we compute the Poisson bracket of some quantity $A_i$ with the Hamiltonian, we get the time derivative of $A_i$ along the flow, via the chain rule:
\begin{eqnarray}
\left[A_i,H \right] &=& \frac{\partial A_i}{\partial z_k} \Omega_{km} \frac{\partial H}{\partial z_m}\\
\left[A_i,H \right] &=& \frac{\partial A_i}{\partial z_k} \frac{d z_k}{dt}
\end{eqnarray}
and so we find, if our vector-valued function of interest $A_i$ is dependent upon $q,p,t$ or $z,t$, we can write our total time derivative as:

\begin{eqnarray}
\frac{dA_i}{dt} &=& \left[A_i,H\right] + \frac{\partial A_i}{\partial t}
\end{eqnarray}

The Hamiltonian is thus called the "generator" of time translation. Let's say $A_i$ does not depend explicitly on $t$ -- in the quantum mechanical regime of things we would say it is a Schrodinger operator. We could essentially translate the operator -- or in this case the function $A_i$ -- forward in time by Taylor expansion:

\begin{eqnarray}
A_i(t) &=& A_i(0) + \frac{dA_i}{dt}\vert_{t_0}(t-t_0)+ \frac{d^2A_i}{dt^2}\vert_{t_0}\frac{(t-t_0)^2}{2!}+\dots
\end{eqnarray}

But this can be accomplished by repeatedly taking the commutator with H!

\begin{eqnarray}
A_i(t) &=& A_i(0) + \frac{dA_i}{dt}\vert_{t_0}(t-t_0)+ \frac{d^2A_i}{dt^2}\vert_{t_0}\frac{(t-t_0)^2}{2!}+\dots \\
A_i(t) &=& A_i(0) + (t-t_0) \left[A_i(0),H \right]+ \frac{(t-t_0)^2}{2!} \left[\left[A_i(0),H \right], H\right]+ \frac{(t-t_0)^3}{3!} \left[\left[\left[A_i(0),H \right], H\right],H \right] +\dots \\
&=& e^{\left[ \ast , H\right](t-t_0)}A_i(0)
\end{eqnarray}
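A small sympy sketch of that exponentiated bracket (mine; the unit-mass, unit-frequency harmonic oscillator $H = (p^2+q^2)/2$ is just a convenient toy): summing the nested Poisson brackets term by term reproduces $q(t) = q\cos t + p\sin t$.

\begin{verbatim}
# verify that e^{[*,H] t} q reproduces the harmonic oscillator solution
import sympy as sp

q, p, t = sp.symbols('q p t')
H = (p**2 + q**2) / 2

def pbracket(A, B):
    # classical Poisson bracket [A, B] = dA/dq dB/dp - dA/dp dB/dq
    return sp.diff(A, q) * sp.diff(B, p) - sp.diff(A, p) * sp.diff(B, q)

series, term = sp.Integer(0), q
for k in range(8):                      # first 8 terms of the exponential series
    series += term * t**k / sp.factorial(k)
    term = pbracket(term, H)

exact = q * sp.cos(t) + p * sp.sin(t)
print(sp.simplify(series - sp.series(exact, t, 0, 8).removeO()))   # -> 0
\end{verbatim}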

This is an incredibly close parallel to the Baker-Hausdorff lemma in quantum mechanics, which essentially makes time-dependent operators -- in the Heisenberg picture -- by repeatedly taking commutators on "both sides" of an operator. If we promote the Hamiltonian to be an operator, then we write:

\begin{eqnarray}
A_i(t) &=& e^{\frac{iH(t-t_0)}{\hbar}}A_i(0)e^{\frac{-iH(t-t_0)}{\hbar}}
\end{eqnarray}

where the commutators are no longer in the classical sense, but in the ``operator'' sense.

---------------------------------------------------------------------------------------------------------------------------------

So, why do we care about all these commutators and things? Well, a simple reason is that if we are to have a "valid" canonical transformation, we must show that the Hamiltonian Equations of motion remain untarnished.  Let's look at our nice form of the EOM again:

\begin{eqnarray}
\frac{dz_i}{dt} &=& \Omega_{ij} \frac{\partial H}{\partial z_j}
\end{eqnarray}

we can re-write this with a Poisson bracket

\begin{eqnarray}
\frac{dz_i}{dt} &=& \left[z_i,H \right] \\
&=& \frac{\partial z_i}{\partial z_k}\Omega_{km} \frac{\partial H}{\partial z_m}\\
dz_i &=& \frac{\partial z_i}{\partial z_k}\Omega_{km} \frac{\partial H}{\partial z_m}dt
\end{eqnarray}

Now let us transform into some new coordinate system $\mathbf{y}=\left(Q_1,Q_2,\dots, Q_n, P_1,P_2,\dots P_n \right)$. We find that all of the $dz$'s can be written as:

\begin{eqnarray}
dz_i &=& \frac{\partial z_i}{\partial y_j} dy_j \\
dz_i &=& J_{ij}^{-1} dy_j
\end{eqnarray}
The matrix we have used above is simply the standard Jacobian, $\mathbf{J}_{ij}=\frac{\partial y_i}{\partial z_j}$. Remember $\mathbf{J}^{-1}\mathbf{J}=\mathbf{I}$. Now we re-write our EOM in the y-coordinates:

\begin{eqnarray}
dz_i &=& \frac{\partial z_i}{\partial z_k}\Omega_{km} \frac{\partial H}{\partial z_m}dt \\
dz_i &=& \delta_{ik}\Omega_{km} \frac{\partial H}{\partial z_m}dt \\
J_{ij}^{-1} dy_j &=& \Omega_{im} J_{am}\frac{\partial H}{\partial y_a}dt \\
\end{eqnarray}

Multiplying both sides by $J_{ki}$ we get:
\begin{eqnarray}
dy_k &=& J_{ki} \Omega_{im} J_{am}\frac{\partial H}{\partial y_a}dt \\
\frac{dy_k}{dt} &=& J_{ki} \Omega_{im} J_{am}\frac{\partial H}{\partial y_a} \\
\end{eqnarray}

Now, we say this final equation is valid if it reproduces the standard equations of motion:

\begin{eqnarray}
\frac{dy_k}{dt} &=& \left[ y_k, H \right] = \Omega_{ka}\frac{\partial H}{\partial y_a}
\end{eqnarray}

Which will only be true if this Jacobian transformation preserves the structure of our original $\Omega$ matrix:

\begin{eqnarray}
\Omega_{ka} &=& J_{ki}\Omega_{im}J_{am}\\
\mathbf{\Omega} &=& \mathbf{J}\mathbf{\Omega}\mathbf{J}^T
\end{eqnarray}

Such a transformation $q,p \to Q,P$ is called "symplectic" or ``canonical'' (equivalently, one can write the condition as $\mathbf{J}^T\mathbf{\Omega}\mathbf{J}=\mathbf{\Omega}$), which in my mental dictionary means that it preserves the structure of this matrix $\Omega$ and thus the Poisson brackets/fundamental commutation relations:

\begin{eqnarray}
\left[ z_i, z_j \right] &=& \Omega_{ij} \\
\left[ y_i, y_j \right] &=& \Omega_{ij}
\end{eqnarray}

Just like the Lorentz boosts leave the Minkowski metric $\eta$ invariant. This set of linear transformations $\mathbf{J}$ can be thought of as a representation of the symplectic ``group'', which are continuously connected to the identity operation.
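A quick numpy check of the condition we just derived (mine; the rotation angle and the phase-space dimension are arbitrary choices): a rotation acting on the $(q_i, p_i)$ pairs preserves $\Omega$, while a plain rescaling does not.

\begin{verbatim}
# verify J Omega J^T = Omega for a simple linear canonical transformation
import numpy as np

n = 3
Omega = np.block([[np.zeros((n, n)), np.eye(n)],
                  [-np.eye(n),       np.zeros((n, n))]])

theta = 0.7                       # arbitrary rotation angle
J = np.block([[np.cos(theta) * np.eye(n),  np.sin(theta) * np.eye(n)],
              [-np.sin(theta) * np.eye(n), np.cos(theta) * np.eye(n)]])

print(np.allclose(J @ Omega @ J.T, Omega))     # True: the rotation is canonical

# counter-example: the rescaling Q = 2q, P = 2p is NOT canonical
J_bad = 2.0 * np.eye(2 * n)
print(np.allclose(J_bad @ Omega @ J_bad.T, Omega))   # False
\end{verbatim}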
---------------------------------------------------------------------------------------------------------------------------------

Now one way to define these canonical transformations is to add a total time derivative to the lagrangian:

\begin{eqnarray}
L^\prime(q,Q,t) &=& L(q,\dot{q},t) - \frac{dF(q,Q,t)}{dt}
\end{eqnarray}

Such a "generator" of the canonical transformation is called type 1, because it exchanges Q for $\dot{q}$. We allow ourselves to add this total time derivative to the Lagrangian, because Hamilton's principle states that we are only interested in minimizing the action through variation:

\begin{eqnarray}
S &=& \int L dt \\
S^\prime &=& \int \left(L - \frac{dF}{dt}\right)dt=S+\mathrm{constant} \\
\delta S &=& \int \left( \frac{\partial L}{\partial q}-\frac{d}{dt}\frac{\partial L}{\partial \dot{q}}\right)\delta q dt \\
\delta S &=& \delta S^\prime
\end{eqnarray}

so we don't care about adding total time derivatives. (Notice that I have not allowed $F$ to be a function of the generalized velocity, $\dot{q}$; this is because, when varying the action, any dependence upon $\dot{q}$ would leave non-zero boundary terms after integration by parts, so we need to be careful here! In field theory, we find that adding a total derivative $\partial_\mu X^\mu$ to the lagrangian results in the same action as well, so perhaps this can also be thought of as a type 1 canonical transformation...)

Pounding through the same equations of motion, we find that, if we want our new Lagrangian to only depend upon q,Q and t, we require:

\begin{eqnarray}
L^\prime(q,Q,t) &=& L - \frac{\partial F}{\partial t}- \frac{\partial F}{\partial q}\dot{q}- \frac{\partial F}{\partial Q}\dot{Q}\\
\frac{\partial L^\prime}{\partial \dot{q}} =0 &\implies &\frac{\partial L}{\partial \dot{q}}=p=\frac{\partial F}{\partial q}\\
\end{eqnarray}

and we make the definition of a new momentum variable
\begin{eqnarray}
P=\frac{\partial L^\prime}{\partial \dot{Q}}=-\frac{\partial F}{\partial Q}
\end{eqnarray}

 With these two definitions in hand, we have essentially defined our new phase space vector $\mathbf{y}_i$. So, we can check out the NEW fundamental commutation relations

\begin{eqnarray}
\left[ Q,P \right] &=& \frac{\partial Q}{\partial q}\frac{\partial P}{\partial p}-\frac{\partial Q}{\partial p}\frac{\partial P}{\partial q} \\
\left[ Q, P \right] &=& -\frac{\partial Q}{\partial q}\frac{\partial^2 F}{\partial p \partial Q}+\frac{\partial Q}{\partial p}\frac{\partial^2 F}{\partial Q\partial q} \\
&=& \frac{\partial Q}{\partial p}\frac{\partial p}{\partial Q} \\
&=& 1
\end{eqnarray}

Trivially, we expect $\left[Q,Q \right]=\left[P,P \right]=0$, and so it all works out. Further generators of the canonical transformation can be created using the legendre transform.
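As a tiny sanity check (mine; the generator $F = qQ$ is just the simplest hypothetical type-1 example): read off $p$ and $P$ from $F$, solve for the new pair, and verify $[Q,P]=1$.

\begin{verbatim}
# type-1 generating function F(q,Q) = q*Q: p = dF/dq, P = -dF/dQ
import sympy as sp

q, p, Q, P = sp.symbols('q p Q P')
F = q * Q

# solve the implicit definitions for (Q, P) in terms of (q, p)
sol = sp.solve([sp.Eq(p, sp.diff(F, q)), sp.Eq(P, -sp.diff(F, Q))],
               [Q, P], dict=True)[0]
Qnew, Pnew = sol[Q], sol[P]        # Q = p, P = -q: coordinates and momenta swap

bracket = sp.diff(Qnew, q) * sp.diff(Pnew, p) - sp.diff(Qnew, p) * sp.diff(Pnew, q)
print(Qnew, Pnew, sp.simplify(bracket))   # p  -q  1
\end{verbatim}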


Thursday, July 24, 2014

1-D Wave Equation Green's Function

As I am trying to gain some fluency with Green's functions, and their use in non-relativistic PDE's, I thought it'd be a good exercise to work out some expressions -- or "Magic Rules" -- for the Green's function of the wave equation:

\begin{eqnarray}
\left(\frac{\partial^2}{\partial t^2}-c_s^2 \frac{\partial^2}{\partial x_i \partial x_i}\right)\mathbf{\psi}(\vec{x},t)&=& 0
\end{eqnarray}

Let's assume that $c_s$ is our "speed of sound" in the medium, and that $\psi$ is some scalar function of space and time. If we assume that $\psi \to 0$ as we go to $\vert x \vert \to \infty$ -- the boundaries of our space of interest -- we are well equipped to perform a fourier transform on this PDE:

\begin{eqnarray}
\left(\frac{\partial^2}{\partial t^2}+c_s^2 k^2\right)\tilde{\psi}(\vec{k},t)&=& 0
\end{eqnarray}

We can now laplace transform in the time variable, swapping $t$ for $\omega$, and write

\begin{eqnarray}
\left(\omega^2+c_s^2 k^2\right)\tilde{\psi}(\vec{k},\omega)&=& \omega \tilde{\psi}(\vec{k},0)+ \tilde{\psi}^\prime(\vec{k},0)
\end{eqnarray}

We can see now that our Differential equation has been rendered algebraic in this "doubly" transformed space. We invert the PDE and write

\begin{eqnarray}
\tilde{\psi}(\vec{k},\omega)&=& \frac{ \omega \tilde{\psi_0}(k) + \tilde{\psi_0^{\prime}}(k)}{\left(\omega^2+c_s^2 k^2\right)}
\end{eqnarray}

We can immediately see, once we invert the laplace transform, that our initial conditions are built into the homogeneous solution:

\begin{eqnarray}
\tilde{\psi}(\vec{k},t)&=& \oint_{\gamma-i \infty}^{\gamma+i \infty}\frac{ \omega \tilde{\psi_0}(k)e^{\omega t}}{\left(\omega^2+c_s^2 k^2\right)}\frac{d\omega}{2\pi i}+\oint_{\gamma-i \infty}^{\gamma+i \infty}\frac{ \tilde{\psi_0^{\prime}}(k)e^{\omega t}}{\left(\omega^2+c_s^2 k^2\right)}\frac{d\omega}{2\pi i}
\end{eqnarray}

Both contour integrals have identical poles: $\omega = \pm i c_s \vert \vec{k} \vert $, and so we use the residue theorem to write down the inverted PDE in k-space:

\begin{eqnarray}
\tilde{\psi}(\vec{k},t) &=& \tilde{\psi_0}(k) \cos \left(c_s k t \right)+ \frac{\tilde{\psi_0^{\prime}}(k)\sin \left(c_s k t \right)}{c_s k}
\end{eqnarray}

Which is incredibly simple but difficult to transform back into x-space. First of all, notice that we are multiplying our initial conditions -- or spectra -- with two separate kernels, sine-over-k and cosine. In the x-space, this means we will be "convolving" our initial conditions with these two separate kernels -- or the x-space inverse fourier transforms -- in order to create our homogeneous solution.

Furthermore, remember that our time-transform variable $\omega$ had poles that were only dependent upon the modulus of $\vec{k}$, meaning that all of the $k$'s you see above should be read as $\vert \vec{k} \vert$. These kernels exhibit isotropy, which makes sense since the wave equation is a spatially isotropic and homogeneous differential equation to begin with -- the laplacian operator has no preference in direction or orientation.

But, such radial functions of $k$ will prove difficult to inverse fourier transform in higher and higher dimensions of $x$.

If we start out in 1-D, we know that the cosine will turn into a pair of Dirac delta distributions centered at $x=\pm c_s t$, and the sine-over-$k$ term will turn into a top hat function. And so we are left with

\begin{eqnarray}
\psi(x,t)^{1D} &=& \frac{\psi_0(x-c_s t)+\psi_0(x+c_s t)}{2}+\frac{\Pi(c_s t)}{c_s}\ast \psi^{\prime}_0(x) \\
\psi(x,t)^{1D} &=& \frac{\psi_0(x-c_s t)+\psi_0(x+c_s t)}{2}+\int_{x-c_s t}^{x+c_s t} \frac{\psi^{\prime}_0(x^\prime)}{2 c_s} dx^\prime
\end{eqnarray}

This is a wave front uniformly expanding with speed $c_s$, independent of whether the problem is set up with Neumann or Dirichlet boundary conditions. And, it seems I have stumbled upon -- by the Laplace transform -- d'Alembert's formula for the Cauchy problem.
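A small numerical sketch of the formula (mine; the Gaussian initial displacement with zero initial velocity is just a convenient test case): evaluate the d'Alembert solution and check that it satisfies the wave equation up to finite-difference error.

\begin{verbatim}
# d'Alembert solution for psi_0 Gaussian, psi'_0 = 0, checked against the PDE
import numpy as np

cs = 1.0
psi0 = lambda x: np.exp(-x**2)                 # hypothetical initial condition

def psi(x, t):
    # half of the initial pulse moves left, half moves right
    return 0.5 * (psi0(x - cs * t) + psi0(x + cs * t))

x = np.linspace(-10, 10, 2001)
t, dx, dt = 2.0, x[1] - x[0], 1e-3

# second derivatives by central differences
psi_tt = (psi(x, t + dt) - 2 * psi(x, t) + psi(x, t - dt)) / dt**2
psi_xx = (psi(x + dx, t) - 2 * psi(x, t) + psi(x - dx, t)) / dx**2

print("max |psi_tt - cs^2 psi_xx| =", np.max(np.abs(psi_tt - cs**2 * psi_xx)))
# small (~1e-4), i.e. zero up to discretization error
\end{verbatim}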

----------------------------------------------------------------------
Now if we move into 2-D, we find that the we need to regularize the inverse fourier transforms, which I will leave for a later post...

Wednesday, July 9, 2014

Importance Sampling

So, last week I was learning a great deal about something called the VEGAS algorithm, which was first presented by Peter LePage in the late 70's. It's a very clever way of going about importance sampling, which is a Monte Carlo method for integration.

To start, let's say we would like to evaluate the integral:

\begin{eqnarray}
I &=& \int f(\mathbf{x})d^n x
\end{eqnarray}

Where $\mathbf{x}$ is an n-dimensional vector and we are integrating over n-dimensional space, with jacobian $d^n x$. Let's say that this function, our integrand, $f(\mathbf{x})$, is purely intractable by analytic means -- it is impossible to integrate with pen and paper. Luckily, we can estimate this integral by other (Monte Carlo) means.

A naive strategy would be to construct a uniform probability density function (PDF) on the integration region, and sample randomly from this PDF to get the independent variable $\mathbf{x}$ and the corresponding integrand value $f(\mathbf{x})$. This would essentially be obtaining an expectation value. Let's take a look in 1-D:

\begin{eqnarray}
I &=& \int_a^b f(x) dx \\
g(x) &=& \frac{1}{(b-a)} \\
\eta &\sim & g(x) \\
\langle f(x) \rangle_{g(x)} &=& \int_a^b f(x) g(x) dx = \frac{1}{b-a}I
\end{eqnarray}

As the number of our random samples $\eta_i$ gets large, we can use the law of large numbers to say that our discrete mean closes in on the "true" mean, which, in this case, is the value of the integral $I$ divided by $(b-a)$:

\begin{eqnarray}
\langle f(x) \rangle_g &=& \lim_{N \to \infty} \sum_i^N \frac{f(\eta_i)}{N} \ \ \ \ \ \eta \sim g(x) \\
&=& \frac{1}{b-a} \int f(x) dx
\end{eqnarray}

So that's pretty cool! We have estimated an integral by taking a "strange type of mean", drawing random variables from some arbitrary -- well, in this case uniform -- PDF $g(x)$. One way to take care of our strange factor of $\frac{1}{b-a}$ above is to "normalize":

\begin{eqnarray}
I &=& \int f(x) dx = \int \frac{f(x)}{g(x)}g(x)dx
\end{eqnarray}

We see that the above is just our old integral $I$, but we find that, in numerical terms, the last term on the right is equivalent to finding the expectation value of $f/g$ "under" $g(x)$, which means:

\begin{eqnarray}
I &=& \int \frac{f(x)}{g(x)}g(x)dx = \langle \frac{f(x)}{g(x)} \rangle_g \\
&=& \lim_{N\to \infty} \sum_i^N \frac{1}{N}\frac{f(\eta_i)}{g(\eta_i)} \ \ \ \ \ \eta \sim g(x)
\end{eqnarray}

So our random samples $\eta_i$ are drawn from the PDF $g(x)$, and we are searching for the mean value of this \textbf{ratio}, $\frac{f}{g}$. Interesting trick right?
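Here is a minimal numpy version of the trick (mine; the integrand $e^{-x}\sin^2 x$ on $[0,5]$ is just a made-up test case with a known answer):

\begin{verbatim}
# draw eta ~ g (uniform here), average the ratio f/g, compare with the exact answer
import numpy as np

f = lambda x: np.exp(-x) * np.sin(x)**2        # hypothetical integrand on [0, 5]
a, b, N = 0.0, 5.0, 100_000

rng = np.random.default_rng(2)
eta = rng.uniform(a, b, size=N)                # samples from g = 1/(b-a)
g = 1.0 / (b - a)

I_hat = np.mean(f(eta) / g)                    # <f/g>_g
print("Monte Carlo estimate:", I_hat)          # exact value is ~0.397
\end{verbatim}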

Now our next step in this numerical procedure is to estimate the variance in our "integral estimator", which is what we will call our sum of the ratios, above:

\begin{eqnarray}
\hat{I} &=& \sum_i^N \frac{1}{N}\frac{f(\eta_i)}{g(\eta_i)} \ \ \ \ \ \eta \sim g(x) \\
\lim_{N\to \infty} \hat{I} &=& I
\end{eqnarray}

What is $\sigma^2_I$? We construct our estimator of the variance in the usual way: the mean of the square minus the square of the mean:

\begin{eqnarray}
\sigma^2_I &=& \langle (\hat{I}-I)^2 \rangle = \langle \hat{I}^2 \rangle - \langle \hat{I} \rangle^2 \\
&=& \langle \left(\frac{f(x)}{g(x)}\right)^2 \rangle_g - \langle \frac{f(x)}{g(x)}\rangle_g^2 \\
&=& \int \left(\frac{f(x)}{g(x)}\right)^2  g(x) dx - \left( \int f(x) dx\right)^2\\
&=& \int \frac{\vert f(x) \vert^2}{g(x)^2} g(x) dx - \left( \int f(x) dx\right)^2\\
\end{eqnarray}

Numerically, this final equation can be represented by another discrete sum, which we will call our estimate of the variance:

\begin{eqnarray}
\hat{\sigma}^2_I &=& \frac{ \sum_i^N \left(\frac{f(\eta_i)}{g(\eta_i)} - \hat{I}\right)^2}{N^2}  \ \ \ \ \ \ \ \eta \sim g(x)
\end{eqnarray}

Now since we are subtracting an estimate of the mean and not the true mean, we find that the above estimator of the variance is biased, by a factor of $\frac{N-1}{N}$, so we should rewrite the denominator above as $N(N-1)$, as we are used to.

-----------------------------------------------------------------------------------------------------------

So what choice of the probability density $g(x)$ minimizes the variance in our integral estimate? Intuitively, we can expect that if $g(x)$ closely matches the behaviour of $f(x)$ over the whole domain of integration, the ratio $f/g$ will remain nearly constant, and we expect very small variance. Let's take a functional derivative of our expression for the variance $\sigma^2_I$, with a lagrange multiplier $\lambda$ in place to maintain the constraint that $g(x)$ is normalized as a probability density function:

\begin{eqnarray}
\sigma^2_I &=& \int \frac{\vert f(x)\vert^2}{g(x)}dx - (\int f(x) dx)^2 + \lambda \left(\int g(x) dx -1 \right) \\
\frac{\delta \sigma^2_I}{\delta g}&=&  \int \left[ \frac{-\vert f(x)\vert^2}{g(x)^2}+\lambda \right] dx
\end{eqnarray}

We see that this derivative above will be zero only when the integrand is zero for all x, and so we have

\begin{eqnarray}
g(x) &=& \lambda \vert f(x) \vert
\end{eqnarray}

Since we have already assumed $g(x)$ to be a PDF, we need not place the absolute value sign on the LHS, above, because PDF's are positive definite (in order to make their cumulative PDF's  $G(x)=\int_{-\infty}^x g(x^\prime)dx^\prime $ monotonically increasing). Plugging in our normalization constraint, once again, we find:

\begin{eqnarray}
\int g(x) dx &=& \lambda \int \vert f(x) \vert dx \\
1 &=& \lambda \int \vert f(x) \vert dx \\
\frac{1}{\int \vert f(x) \vert dx} &=& \lambda \\
\end{eqnarray}

and so the optimum PDF is

\begin{eqnarray}
g(x) &=& \frac{\vert f(x) \vert}{\int \vert f(x) \vert dx}
\end{eqnarray}

or, is proportional to the absolute value of our integrand -- with some normalization constant -- at every $x$. This is pretty cool! One can find, by taking a similar functional derivative in N-dimensions, that if $g(\mathbf{x})$, the multidimensional PDF, is separable across the components of our vector $\mathbf{x}$

\begin{eqnarray}
g(\mathbf{x}) &=& \Pi_i^N g_i (x_i)
\end{eqnarray}

then the optimum $g(\mathbf{x})$ takes the following form:

\begin{eqnarray}
g_j (x_j) &=& \frac{\left[ \int \frac{\vert f(x) \vert^2}{\Pi_{i \ne j} g_i (x_i) } d^{n-1}x\right]^{1/2}}{\int dx_j \left[ \int \frac{\vert f(x) \vert^2}{\Pi_{i \ne j} g_i (x_i) } d^{n-1}x\right]^{1/2}}
\end{eqnarray}
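Before getting to how VEGAS builds such a $g$, here is a quick 1-D numpy illustration of the payoff (mine; the sharply peaked Gaussian integrand and the proposal width are arbitrary choices): a $g$ that roughly tracks $\vert f \vert$ gives a noticeably smaller statistical error than a flat $g$ for the same number of samples, and a $g$ exactly proportional to $\vert f \vert$ would drive the variance to zero, as derived above.

\begin{verbatim}
# compare a flat proposal with one that roughly matches |f|
import numpy as np

rng = np.random.default_rng(3)
f = lambda x: np.exp(-100.0 * (x - 0.5)**2)    # peaked integrand on [0, 1]
# exact answer: sqrt(pi/100) ~ 0.17725 (tails outside [0, 1] are negligible)
N = 10_000

# (1) uniform sampling on [0, 1]:  g = 1
x_u = rng.uniform(0.0, 1.0, size=N)
w_u = f(x_u) / 1.0

# (2) importance sampling from g ~ Normal(0.5, 0.1^2), which roughly matches |f|
x_g = rng.normal(0.5, 0.1, size=N)
g = lambda x: np.exp(-(x - 0.5)**2 / (2 * 0.1**2)) / np.sqrt(2 * np.pi * 0.1**2)
w_g = f(x_g) / g(x_g)

for name, w in [("uniform", w_u), ("importance", w_g)]:
    print(f"{name:10s}  I_hat = {w.mean():.5f}   sigma_I = {w.std()/np.sqrt(N):.5f}")
\end{verbatim}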

-----------------------------------------------------------------------------------------------------------

So how does VEGAS go about approximating these "optimum" multidimensional distributions? By changing the sampling grid. LePage did a very clever thing when he constructed an algorithm that initially samples some integrand uniformly, essentially adopting the following PDF:

\begin{eqnarray}
g_0(\mathbf{x}) &=& \frac{1}{V_n}
\end{eqnarray}

Where $V_n$, is the n-dimensional volume of the integration region. But then, once these samples are taken, he perturbs the grid spacings -- keeping the number of samples constant -- in proportion with the absolute value of the integrand, $f(\mathbf{x})$. What this essentially does is match $g(\mathbf{x})$'s peaks to $f(\mathbf{x})$'s peaks, be they high or low, and over successive sampling rounds -- let's say M of them -- the "grid", or $g(\mathbf{x})$, gets perfectly attuned to the characteristics of $f(\mathbf{x})$. 

Monday, June 30, 2014

The Power Spectrum under Linear Turbulence

In an attempt to square up my understanding of nonlinear fluid dynamics and the connection between Green's functions and two-point correlation functions, I have been playing with the following, very general equation

\begin{eqnarray}
L^0 u_\alpha(\vec{x},t) &=& f_\alpha (\vec{x},t)+ M_{\alpha \beta \gamma}(\vec{x},t) \int u_\beta(\vec{q},t)\, u_\gamma(\vec{x}-\vec{q},t) d^3q
\end{eqnarray}

Where, in the above, $L^0$ is a linear differential operator. The diffusion equation would be a good example:

\begin{eqnarray}
L^0 \sim \frac{\partial}{\partial t} - \nu \vec{\nabla}^2
\end{eqnarray}

By linear, I mean that $L^0$ is not a function of our function of interest $u_\alpha(\vec{x},t)$, which we are trying to solve/predict. I have given $u$ a subscript $\alpha$ to keep track of four components: the three cartesian velocities and the density. This fully describes fluid motion.

Let's call the  quantity $f_\alpha$ the "noise" or "stirring" term, that excites the systems over space and time. The final term in the above equation is our non-linear "mixing" function, but more on that later.

We can solve the linear system

\begin{eqnarray}
L^0 u_\alpha(\vec{x},t) &=& f_\alpha (\vec{x},t)
\end{eqnarray}

by just writing our function of interest $u_\alpha (\vec{x},t)$ in terms of what's called a linear Green's function. The Green's function has the following properties:

\begin{eqnarray}
L^0 G^0(\vec{x},t \vert \vec{x}^\prime, t^\prime) &=& \delta(\vec{x}-\vec{x}^\prime)\delta(t-t^\prime) \\
&=& 0 \ \ \ \mathrm{if} \ \ t<t^\prime
\end{eqnarray}

and can be built out of something called the propagator, which has some slightly different characteristics:

\begin{eqnarray}
L^0K^0(\vec{x},t \vert \vec{x}^\prime, t^\prime) &=& 0 \ \ \ \mathrm{if} \ \ t>t^\prime \\
K^0(\vec{x},t \vert \vec{x}^\prime, t^\prime) &=&  \delta(\vec{x}-\vec{x}^\prime) \ \ \ \mathrm{for \ \ equal \ \ times}
\end{eqnarray}

If we take a look at our linear system in fourier space, we will see that the differential operator -- transforming $\vec{x}$, not $t$ -- becomes an algebraic function of $\vec{k}$ and $\partial_t$:

\begin{eqnarray}
\tilde{L}^0(\vec{k},\partial_t) u_\alpha(\vec{k},t) &=& f_\alpha (\vec{k},t)
\end{eqnarray}

We would like to construct our linear solution to this system using the Green's functions. Now we know, from above that $L^0$ applied to our Green's function yields

\begin{eqnarray}
\tilde{L}^0 G^0(\vec{k},t \vert \vec{k}^\prime, t^\prime) &=& 1 \cdot \delta(t-t^\prime)
\end{eqnarray}

Where I have taken the fourier transform of the spatial dirac delta function from before, which is unity. In a translationally -- or momentum conserving -- system, we expect all fourier transforms of functions to carry with them an extra dirac term:

\begin{eqnarray}
\tilde{L}^0 G^0(\vec{k},t \vert \vec{k}^\prime, t^\prime) \delta(k+k^\prime) &=& 1 \cdot \delta(t-t^\prime)
\end{eqnarray}

And so we find

\begin{eqnarray}
\tilde{L}^0 G^0(\vec{k},t \vert -\vec{k}, t^\prime) &=& 1 \cdot \delta(t-t^\prime)
\end{eqnarray}

Convolving the Green's function with our source function in time, now, will yield our linear solution

\begin{eqnarray}
\tilde{L}^0 \left( G^0(\vec{k},t \vert -\vec{k}, t^\prime)\star f_\alpha(\vec{k}) \right) &=&  f_\alpha(\vec{k},t) \\
u_\alpha^0(\vec{k},t) &=& \int G^0(\vec{k},t \vert -\vec{k}, t^\prime)f_\alpha(\vec{k},t^\prime) dt^\prime
\end{eqnarray}

This immediately shows that the fourier decomposition of our linear system does not "mix modes" over time. We have no integral over $d^3k$ written above, and so we can claim, under linear forces, that the power spectrum of our system should retain the same shape.
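Here is a small numerical sketch of that statement (mine; a scalar toy with the diffusion operator, a flat forcing spectrum $W(k)=1$, and arbitrary values of $\nu$, $k$ and the time step): every Fourier mode obeys its own independent Langevin equation, $\dot{u}_k = -\nu k^2 u_k + f_k(t)$, so we can evolve each $k$ separately and compare $\langle \vert u_k\vert^2\rangle$ with the Green's function prediction $W(k)(1-e^{-2\nu k^2 t})/(2\nu k^2)$.

\begin{verbatim}
# forced diffusion, mode by mode (no coupling between different k)
import numpy as np

rng = np.random.default_rng(5)
nu, dt, n_steps, n_real = 0.1, 1e-3, 5000, 2000
k = np.array([1.0, 2.0, 4.0])
W = 1.0                                   # flat (white) forcing spectrum

u = np.zeros((n_real, k.size))
for _ in range(n_steps):                  # Euler-Maruyama step for every mode
    noise = rng.normal(0.0, np.sqrt(W * dt), size=u.shape)
    u += -nu * k**2 * u * dt + noise

t = n_steps * dt
P_sim = (u**2).mean(axis=0)
P_theory = W * (1.0 - np.exp(-2.0 * nu * k**2 * t)) / (2.0 * nu * k**2)
print("simulated P(k,t):", np.round(P_sim, 3))
print("predicted P(k,t):", np.round(P_theory, 3))
\end{verbatim}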

Let's look into this more explicitly. If we take the ensemble average of two of our vector functions of interest in k-space:

\begin{eqnarray*}
\langle u_\alpha^0(\vec{k_1},t_1) u_\beta^0(\vec{k_2},t_2)  \rangle &=& \int \int G^0(\vec{k_1},t_1 \vert -\vec{k_1}, t_1^\prime) G^0(\vec{k_2},t_2 \vert -\vec{k_2}, t_2^\prime) \langle f_\alpha(\vec{k_1},t_1^\prime) f_\beta(\vec{k_2},t_2^\prime)  \rangle dt_1^\prime dt_2^\prime
\end{eqnarray*}

Now, if we assume the following isotropic, Gaussian form for our noise function (recalling that we've already assumed a translationally invariant system)

\begin{eqnarray}
\langle f_\alpha(\vec{k_1},t_1^\prime) f_\beta(\vec{k_2},t_2^\prime)  \rangle &=& \left(\delta_{\alpha \beta} - \frac{k_\alpha k_\beta}{k^2}\right) W(\vert k \vert) \delta(t_1^\prime - t_2^\prime) \delta(\vec{k_1} + \vec{k_2})
\end{eqnarray}

Then our expectation value for the second moment, or the pair correlation tensor, above, simplifies immensely:

\begin{eqnarray*}
\langle u_\alpha^0(\vec{k_1},t_1) u_\beta^0(\vec{k_2},t_2)  \rangle &=& D_{\alpha \beta} \int  W(\vert \mathbf{k} \vert) G^0\left(\mathbf{k},t_1 \vert -\mathbf{k},t^\prime \right)G^0\left(\mathbf{k},t_2 \vert -\mathbf{k},t^\prime \right) \ dt^\prime
\end{eqnarray*}

Now I find this very interesting, because if we correlate two modes at equal times, $t_1 = t_2$, and set $\alpha=\beta=0$, we've got the power spectrum:

\begin{eqnarray*}
P(k,t)\delta(k+k^\prime) &=& \langle u_0^0(\vec{k},t) u_0^0(\vec{k^\prime},t)  \rangle \\
&=& \langle u_0^0(\vec{k},t) u_0^0(\vec{-k},t)  \rangle \\
&=& \int_{t_0}^t  W(\vert \mathbf{k} \vert) \vert \hat{G}^0(\mathbf{k},t ,t^\prime) \vert^2 dt^\prime
\end{eqnarray*}

If we take the limit as $t \to t_0$, then we find that integral goes to zero "width" and we are left with just the value of our integrand, which, given the properties of the propagator, should be $W(k)$. Thus our Power spectrum under no time-evolution yields

\begin{eqnarray}
P(k,t_0) &=& W(k)
\end{eqnarray}

As one would expect. If $W(k)$ is time independent, then we would expect this function to represent the "seeds" for fluid perturbations -- or in cosmology, for gravitational growth!

Now the following could be perhaps spurious, but perhaps the propagator is the transfer function I am so used to seeing in Cosmology:

\begin{eqnarray*}
P(k) &=& T(k)^2 P(k,t_0)\\
&=&  P(k,t_0) \int_{t_0}^t  \vert \hat{G}^0(\mathbf{k},t ,t^\prime) \vert^2 dt^\prime \\
T(k)^2 &=&  \int_{t_0}^t  \vert \hat{G}^0(\mathbf{k},t ,t^\prime) \vert^2 dt^\prime
\end{eqnarray*}

The key here is that the time dependence in the above integral has been "shucked" out, by separating time -- or redshift -- into the linear growth factor $a(t)$.

Tuesday, June 24, 2014

Conditioning Multinormals

In the following, all equals signs will really mean "proportional" as we're going to work with probability densities and for the time being, don't want to worry about normalization.

Let $\mathbf{x}$ and $\mathbf{y}$ be random vectors drawn from the following multinormal:

\begin{eqnarray}
P(\mathbf{x},\mathbf{y}) &\sim& \mathrm{N}\left( \left[\begin{array}{c} \mu_x \\ \mu_y \end{array}\right], \left[\begin{array}{cc} A & C \\ C^T & B \end{array}\right]^{-1} \right)\\
&\sim& e^{-\frac{1}{2} \left( \left[\begin{array}{cc} x-\mu_x & y - \mu_y \end{array}\right] \cdot \left[\begin{array}{cc} A & C \\ C^T & B \end{array}\right]\cdot \left[\begin{array}{c} x-\mu_x \\ y-\mu_y \end{array}\right] \right)}
\end{eqnarray}

The precision matrix above (the inverse of the covariance) is quite complicated, but we can loosely think of the $C$ block as cross-correlating our vectors $\mathbf{x}$ and $\mathbf{y}$, and the $A$ and $B$ blocks as "auto"-correlating within the vectors $\mathbf{x}$ and $\mathbf{y}$. We have set this up as a $2\times 2$ block matrix. We need not have $\mathbf{x}$ and $\mathbf{y}$ be of the same dimension.

To find the inverse of such a block matrix,

\begin{equation}
\left[\begin{array}{cc} P & Q \\ R & S \end{array}\right]
\end{equation}

we write:

\begin{eqnarray}
\left[\begin{array}{cc} P & Q \\ R & S \end{array}\right]\left[\begin{array}{cc} X_1 & X_2 \\ X_3 & X_4 \end{array}\right] &=& \left[\begin{array}{cc} I & 0 \\ 0 & I \end{array}\right]
\end{eqnarray}

Which results in the following linear equations:

\begin{eqnarray}
PX_1+QX_3 &=& I \\
PX_2 &=& - Q X_4 \\
RX_1 &=& - S X_3 \\
RX_2+SX_4 &=& I
\end{eqnarray}

with the solutions:

\begin{eqnarray}
X_1 &=& P^{-1}+P^{-1}QMRP^{-1}\\
X_2 &=& -P^{-1}QM\\
X_3 &=& -MRP^{-1} \\
X_4 &=& M \\
M &=& (S-RP^{-1}Q)^{-1}
\end{eqnarray}

and

\begin{eqnarray}
X_1 &=& N\\
X_2 &=& -NQS^{-1}\\
X_3 &=& -S^{-1}RN \\
X_4 &=& S^{-1}+S^{-1}RNQS^{-1} \\
N &=& (P-QS^{-1}R)^{-1}
\end{eqnarray}

Now, let's say we are interested in the conditional multinormal

\begin{eqnarray}
(x \vert y) \sim ?
\end{eqnarray}

We expect ?, above, to be another Gaussian, which will be uniquely determined by its mean and variance -- the first two cumulants. Let's write down the conditional expectation value using Bayes' Theorem:

\begin{eqnarray}
\langle\mathbf{x} \vert \mathbf{y}\rangle &=& \int \mathbf{x} P(x\vert y) d\mathbf{x} \\
&=& \int \mathbf{x} \frac{P(\mathbf{x},\mathbf{y})}{P(\mathbf{y})} d\mathbf{x}
\end{eqnarray}

And now, to expand things to higher order in $\mathbf{x}$, we can create a conditional moment generating function:

\begin{eqnarray}
\langle e^{(\mathbf{x} \vert \mathbf{y})} \rangle &=& \int d\mathbf{x} \left(e^{\mathbf{\lambda}\cdot (\mathbf{x} \vert \mathbf{y})}\right)\ \ \frac{e^{-\frac{1}{2} \left( \left[\begin{array}{cc} x-\mu_x & y-\mu_y \end{array}\right] \cdot \left[\begin{array}{cc} A & C \\ C^T & B \end{array}\right]\cdot \left[\begin{array}{c} x-\mu_x \\ y-\mu_y \end{array}\right] \right)} }{e^{-\frac{1}{2} \left( \left[\begin{array}{cc} x-\mu_x & y-\mu_y \end{array}\right] \cdot \left[\begin{array}{cc} 0 & 0 \\ 0 & B \end{array}\right]\cdot \left[\begin{array}{c} x-\mu_x \\ y-\mu_y \end{array}\right] \right)}}
\end{eqnarray}

Notice how the denominator is just the Probability density on $\mathbf{y}$. Let's rewrite $\mathbf{x}^\prime = \mathbf{x}-\mathbf{\mu}_x$ and $\mathbf{y}^\prime = \mathbf{y}-\mathbf{\mu}_y$ to simplify things. Let's also get rid of the conditional notation $\vert y $ in the exponent to clean things up and make conditioning implicit:

\begin{eqnarray}
\langle e^{(\mathbf{x} \vert \mathbf{y})} \rangle &=& \int d\mathbf{x} \left(e^{\mathbf{\lambda}\cdot (\mathbf{x^\prime}+\mathbf{\mu}_x )}\right)\ \ e^{-\frac{1}{2} \left( \left[\begin{array}{cc} x^\prime & y^\prime \end{array}\right] \cdot \left[\begin{array}{cc} A & C \\ C^T & 0 \end{array}\right]\cdot \left[\begin{array}{c} x^\prime \\ y^\prime \end{array}\right] \right)} \\
\langle e^{(\mathbf{x} \vert \mathbf{y})} \rangle &=& \int d\mathbf{x} \left(e^{\mathbf{\lambda}\cdot (\mathbf{x^\prime}+\mathbf{\mu}_x )}\right)\ \ e^{-\frac{1}{2} \left( \left[\begin{array}{cc} x^\prime & y^\prime \end{array}\right] \cdot \left[\begin{array}{c} Ax^\prime+Cy^\prime \\ C^Tx^\prime \end{array}\right] \right)} \\
\langle e^{(\mathbf{x} \vert \mathbf{y})} \rangle &=& e^{\lambda \cdot \mathbf{\mu}_x }\int d\mathbf{x} \left(e^{\mathbf{\lambda}\cdot \mathbf{x^\prime}}\right)\ \ e^{-\frac{1}{2} x^\prime_i A_{ij}x^\prime_j - C_{ij} x^\prime_i y^\prime_j }
\end{eqnarray}

Now we can simplify things a little bit, by writing $ik_i = \lambda_i-C_{ij}y^\prime_j$
\begin{eqnarray}
\langle e^{(\mathbf{x} \vert \mathbf{y})} \rangle &=& e^{\lambda \cdot \mathbf{\mu}_x }\int d\mathbf{x} \left(e^{(\lambda_i-C_{ij}y^\prime_j)x^\prime_i}\right)\ \ e^{-\frac{1}{2}\left( x^\prime_i A_{ij}x^\prime_j \right)}\\
\langle e^{(\mathbf{x} \vert \mathbf{y})} \rangle &=& e^{\lambda \cdot \mathbf{\mu}_x }\int d\mathbf{x} e^{ik_i x^\prime_i}\ \ e^{-\frac{1}{2}\left( x^\prime_i A_{ij}x^\prime_j \right)}
\end{eqnarray}

And now we have a multidimensional Fourier transform of a Gaussian, (which is another Gaussian)

\begin{eqnarray}
\langle e^{(\mathbf{x} \vert \mathbf{y})} \rangle &=& e^{\lambda \cdot \mathbf{\mu}_x }e^{-\frac{1}{2}k_i A^{-1}_{ij}k_j}\\
\langle e^{(\mathbf{x} \vert \mathbf{y})} \rangle &=& e^{\lambda \cdot \mathbf{\mu}_x +\frac{1}{2}(\lambda_i-C_{il}y^\prime_l)A^{-1}_{ij}(\lambda_j-C_{jm}y^\prime_m)}\\
\end{eqnarray}

Foiling out and collecting like terms in $\lambda $, we've got:

\begin{eqnarray}
\langle e^{(\mathbf{x} \vert \mathbf{y})} \rangle &=& e^{\lambda_i \left[\mathbf{\mu}^{(x)}_i  -A^{-1}_{ij} C_{jm}(y_m-\mu^{(y)}_m)\right]+\left[\frac{1}{2}\lambda_i A^{-1}_{ij}\lambda_j\right] \ \ + \ \ \frac{1}{2}(y_l-\mu^{(y)}_l)C_{il}A^{-1}_{ij}C_{jm}(y_m-\mu^{(y)}_m)} \\
\end{eqnarray}

And now we can read off the conditional mean and variance by differentiating with respect to $\lambda$ and setting $\lambda=0$:
\begin{eqnarray}
\frac{\partial}{\partial \lambda_i}\langle e^{(\mathbf{x} \vert \mathbf{y})} \rangle \Big\vert_{\mathbf{\lambda}=0} &=&\langle x_i \rangle = \mathbf{\mu}^{(x)}_i  - A^{-1}_{ij} C_{jm}(y_m-\mu^{(y)}_m) \\
\frac{\partial^2}{\partial \lambda_i \partial \lambda_j}\langle e^{(\mathbf{x} \vert \mathbf{y})} \rangle \Big\vert_{\mathbf{\lambda}=0} - \langle x_i \rangle \langle x_j \rangle &=& \langle x_i x_j \rangle - \langle x_i \rangle \langle x_j \rangle = A^{-1}_{ij}
\end{eqnarray}

Each of these expectation values also carries an overall factor that depends only on $\mathbf{y}$, which is essentially another normalization factor -- it is absorbed into the normalization of $P(\mathbf{y})$ that we threw out at the start:
\begin{eqnarray}
f(y) &=& \mathrm{Exp}\left[\frac{1}{2}(y_l-\mu^{(y)}_l)C_{il}A^{-1}_{ij}C_{jm}(y_m-\mu^{(y)}_m)\right]
\end{eqnarray}

So now, if we write the conditional Probability density function $\mathbf{P(x \vert y)}$, we have:

\begin{eqnarray}
P(\mathbf{x} \vert \mathbf{y}) &=& N\left[ \mathbf{\mu}^{(x)}_i  - A^{-1}_{ij} C_{jm}(y_m-\mu^{(y)}_m), \ A^{-1}_{ij} \right]
\end{eqnarray}

In terms of the blocks of the covariance matrix itself (the inverse of our precision matrix, written $\Sigma_{xx}, \Sigma_{xy}, \Sigma_{yy}$), the block-inversion formulas above give $A^{-1} = \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}$, the familiar form of the conditional covariance.
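A quick numpy check of these formulas (mine; the dimensions, the random covariance matrix and the conditioning value $\mathbf{y}$ are all made up): build a joint Gaussian, form the precision blocks $A, B, C$, and compare the precision-block expressions against the covariance-block ones.

\begin{verbatim}
# conditional mean/covariance from the precision blocks vs the covariance blocks
import numpy as np

rng = np.random.default_rng(6)
nx, ny = 2, 3
M = rng.normal(size=(nx + ny, nx + ny))
Sigma = M @ M.T + (nx + ny) * np.eye(nx + ny)       # a valid covariance matrix
mu = rng.normal(size=nx + ny)

Lam = np.linalg.inv(Sigma)                          # precision matrix
A, C, B = Lam[:nx, :nx], Lam[:nx, nx:], Lam[nx:, nx:]
Sxx, Sxy, Syy = Sigma[:nx, :nx], Sigma[:nx, nx:], Sigma[nx:, nx:]

y = rng.normal(size=ny)                             # arbitrary value to condition on

# formulas from the generating-function calculation above
mean_prec = mu[:nx] - np.linalg.inv(A) @ C @ (y - mu[nx:])
cov_prec = np.linalg.inv(A)

# textbook covariance-block form
mean_cov = mu[:nx] + Sxy @ np.linalg.solve(Syy, y - mu[nx:])
cov_cov = Sxx - Sxy @ np.linalg.solve(Syy, Sxy.T)

print(np.allclose(mean_prec, mean_cov), np.allclose(cov_prec, cov_cov))  # True True
\end{verbatim}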




Thursday, May 22, 2014

The Student t- distribution

Let's say we are given some data, a series of scalar values $x_i$ with $i$ running from 1 to $n$. The Student t-statistic is defined as

\begin{eqnarray}
t=\frac{\bar{x}-\mu}{\sqrt{s^2/n}}
\end{eqnarray}

Now we can write $s^2$ as our unbiased estimator for the variance,

\begin{eqnarray}
s^2 &=& \sum_i^n \frac{(x_i-\bar{x})^2}{n-1}
\end{eqnarray}

Let's play with the t statistic representation a little more,

\begin{eqnarray}
t=\frac{(\frac{\bar{x}-\mu}{\sigma})\sqrt{n}}{\sqrt{\frac{s^2}{\sigma^2}}}
\end{eqnarray}

We can now see that the numerator is a random variable with Expectation value 0, and if we think about things a little bit, assuming that each and every data point is drawn from a Gaussian distribution:

\begin{eqnarray}
x_i \sim N(\mu,\sigma)
\end{eqnarray}
Then the addition of all of our $x_i$'s -- in order to create the mean -- will be a convolution of all of our Gaussians, and thus an addition of our second cumulants, the variances:

\begin{eqnarray}
n\bar{x}=\sum_i^n x_i \sim N(\mu,\sigma) \star N(\mu,\sigma) \star N(\mu,\sigma) \cdots &=& N(n\mu,\sqrt{n}\sigma)
\end{eqnarray}

This is of course the same idea as the width of a random walker: we see that with $n$ steps, the variance of our end probability density grows linearly in $n$ (and the width like $\sqrt{n}$). Now subtracting the mean is just centering our PDF:

\begin{eqnarray}
\bar{x}-\mu=\sum_i^n \frac{x_i-\mu}{n} &\sim& N\left(0,\frac{\sigma}{\sqrt{n}}\right)
\end{eqnarray}
And so we find that, if we scale by our theoretical standard deviation and by $\sqrt{n}$, the random variable in the numerator follows a particularly tractable normal distribution, with zero mean and unit variance:

\begin{eqnarray}
X=\frac{(\bar{x}-\mu)\sqrt{n}}{\sigma}&\sim& N(0,1)
\end{eqnarray}

Now for the denominator. Without the square root sign we have:

\begin{eqnarray}
\frac{s^2}{\sigma^2}&=& \sum_i^n \frac{(x_i-\mu)^2}{(n-1)\sigma^2}
\end{eqnarray}

And we can see immediately this will be the sum of the square of our former variables,

\begin{eqnarray}
x_i-\mu &\sim& N(0,\sigma) \\
\frac{x_i-\mu}{\sigma} &\sim& N(0,1)\\
\frac{(x_i-\mu)^2}{\sigma^2} &\sim& \frac{N(0,1)(x^2)}{x}=g_{1/2,1/2}(x^2)\\
\frac{(x_i-\bar{x})^2}{(n-1)\sigma^2} &\sim& \frac{N(0,1)(x^2)}{x}=g_{1/2,1/2}(x^2)
\end{eqnarray}

Where the final PDF written is the standard Gamma density
\begin{eqnarray}
N(0,1) &=& \frac{1}{\sqrt{2\pi}}e^{-x^2/2}\\
g_{\alpha,\nu}(x)&=& \frac{1}{\Gamma(\nu)}\alpha^\nu x^{\nu-1}e^{-\alpha x}\\
g_{1/2,1/2}(x^2) &=& \frac{1}{\sqrt{2\pi}}(x^2)^{-1/2}e^{-\frac{x^2}{2}}
\end{eqnarray}

Which is the required PDF for the square of our Gaussian random variable with unit variance. Now in order to convert to the square root of the denominator variable, let's call it $Y$, we need to multiply by the Jacobian of the change of variables:

\begin{eqnarray}
\frac{(x_i-\bar{x})^2}{(n-1)\sigma^2}=\frac{s^2}{\sigma^2} &\sim & g_{1/2,n/2}(x^2) \\
Y=\sqrt{\frac{s^2}{\sigma^2}} &\sim & 2Y g_{1/2,n/2}(Y^2)
\end{eqnarray}

And so we see that

\begin{eqnarray}
T=\frac{X}{Y}
\end{eqnarray}
where $X$ and $Y$ each have their own probability density functions. The best way to combine these PDFs is to constrain $X$ and then integrate over all possible values of $Y$. For example, the cumulative distribution function for $t$ should be:

\begin{eqnarray}
X &\sim & f\\
Y &\sim & g\\
T &\sim & P(t)\\
\Phi(t)=P(T \leq t)&=& \int_0^\infty g(y)\left[\int_{-\infty}^{ty} f(x) dx\right] dy \\
P(t) = \frac{d\Phi}{dt} &=& \int_0^\infty y f(ty)g(y) dy
\end{eqnarray}

Plugging in our probability densities from before, we have:

\begin{eqnarray}
f(ty) &=& \frac{1}{\sqrt{2\pi}}\mathrm{Exp}\left[-\frac{t^2y^2}{2}\right]\\
g(y) &=& \frac{y^{n-1}}{\Gamma(\frac{n}{2})}\frac{1}{\sqrt{2^{n-2}}}\mathrm{Exp}\left(-\frac{y^2}{2}\right)\\
P(t) &=& \int_0^\infty y \frac{1}{\sqrt{2\pi}}\mathrm{Exp}\left[-\frac{t^2y^2}{2}\right]\frac{y^{n-1}}{\Gamma(\frac{n}{2})}\frac{1}{\sqrt{2^{n-2}}}\mathrm{Exp}\left(-\frac{y^2}{2}\right) dy\\
&=&\frac{1}{\sqrt{2^{n-1}}}\frac{1}{\Gamma(\frac{1}{2})\Gamma(\frac{n}{2})} \int_0^\infty y^{n} \mathrm{Exp}\left[-\frac{y^2}{2}\left(1+t^2\right)\right]dy\\
\end{eqnarray}

Now making the tedious variable change

\begin{eqnarray}
s= \frac{y^2}{2}\left(1+t^2\right)
\end{eqnarray}

we find
\begin{eqnarray}
y^{n}dy=\frac{2^{\frac{n-1}{2}}s^{\frac{n+1}{2}-1}}{\left(1+t^2\right)^{\frac{n+1}{2}}}ds
\end{eqnarray}

and so P(t) reduces to a Gamma integral:
\begin{eqnarray}
P(t)&=&\frac{1}{\sqrt{2^{n-1}}}\frac{1}{\Gamma(\frac{1}{2})\Gamma(\frac{n}{2})} \int_0^\infty y^{n} \mathrm{Exp}\left[-\frac{y^2}{2}\left(1+t^2\right)\right]dy\\
&=&\frac{\Gamma(\frac{n+1}{2})}{\Gamma(\frac{1}{2})\Gamma(\frac{n}{2})}\frac{1}{\left(1+t^2\right)^{\frac{n+1}{2}}}\\
&=&\frac{1}{B(\frac{1}{2},\frac{n}{2})}\frac{1}{\left(1+t^2\right)^{\frac{n+1}{2}}}\\
\end{eqnarray}

I'm a bit off from the wikipedia article on the student-t here, but a good exercise in combining PDFs. 
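A quick Monte Carlo check of the density derived above (mine; $n=5$ is an arbitrary choice): with $X \sim N(0,1)$ and $Y^2$ a chi-square with $n$ degrees of freedom -- the un-rescaled combination used in the sloppy treatment above -- the ratio $T = X/Y$ does follow $P(t) = (1+t^2)^{-(n+1)/2}/B(\frac{1}{2},\frac{n}{2})$; it only becomes the standard Student-t after rescaling $t \to t\sqrt{n}$.

\begin{verbatim}
# histogram T = X/Y against the derived density
import numpy as np
from scipy.special import beta

rng = np.random.default_rng(7)
n, N = 5, 1_000_000
X = rng.normal(size=N)
Y = np.sqrt(rng.chisquare(n, size=N))
T = X / Y

P = lambda t: (1.0 + t**2) ** (-(n + 1) / 2) / beta(0.5, n / 2)

bins = np.linspace(-3, 3, 13)
hist, _ = np.histogram(T, bins=bins, density=True)
centers = 0.5 * (bins[:-1] + bins[1:])
for c, h in zip(centers, hist):
    print(f"t = {c:+.2f}   MC density = {h:.4f}   derived P(t) = {P(c):.4f}")
\end{verbatim}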

Friday, May 16, 2014

A first scrawl at connected and disconnected Moments

Building upon the last post, I finally realize the difference between connected moments and disconnected moments: It has to do with conservation of momentum in the Feynman diagrams we have been using to represent the Gram-Charlier expansion. (I've recently realized the story becomes more complicated in Statmech, and connected moments are quickly defined to be the cumulants, but more on that next time.)

The correlations between localized excitations in the field are given by derivatives of the generating function:

\begin{eqnarray}
\langle q_{i_1}\dots q_{i_l} \rangle &=& \frac{\partial}{\partial J_{i_1}}\cdots \frac{\partial}{\partial J_{i_l}} \log\left(Z(\mathbf{J})\right)
\end{eqnarray}

But we found, from before, that the generating function was actually built out of a sum of Green's functions, which are basically correlators in momentum space:

\begin{eqnarray}
Z(\mathbf{J}) &=& \sum_{l=0}^\infty J_{i_1}\cdots J_{i_l} G_{i_1 \cdots i_l} \\
G_{ij} &=& \langle q_i q_j \rangle = \int d^Nq \left(q_i q_j\right)e^{\mathbf{J}\cdot \mathbf{q} - \mathbf{q}\cdot \mathbf{A} \cdot \mathbf{q}-\frac{\lambda}{4!} \mathbf{q}^4}
\end{eqnarray}

The above is written for a random anharmonic field -- thus the $\frac{\lambda}{4!} q^4$ term -- and we see that this is just our two-point Green's function from before. If we expand this integral out in powers of $\lambda$, we will get our one loop and two loop terms:

\begin{eqnarray}
G_{ij} &=& \langle q_i q_j \rangle = \int d^Nq \left(q_i q_j\right) \left(1-\frac{\lambda}{4!}\sum_n (q_n)^4+\frac{\lambda^2}{2!\,4!4!}\sum_m \sum_n (q_n)^4(q_m)^4 + \dots \right)e^{-\frac{1}{2}\mathbf{q}\cdot \mathbf{A} \cdot \mathbf{q}}
\end{eqnarray}

We see by Wick contraction this leads to terms like:

\begin{eqnarray}
G_{ij} &=& \langle q_i q_j \rangle - \sum_n \frac{\lambda}{4!} \langle q_i q_j q_n q_n q_n q_n\rangle +  \sum_m \sum_n \frac{\lambda^2}{2!\,4!4!} \langle q_i q_j q_n q_n q_n q_n q_m q_m q_m q_m\rangle + \dots \\
&=&  \mathbf{A}^{-1}_{ij} - \frac{\lambda}{4!}\sum_n\left((4\cdot 3)\,\mathbf{A}^{-1}_{in}\mathbf{A}^{-1}_{nj}\mathbf{A}^{-1}_{nn}+3\,\mathbf{A}^{-1}_{ij}\mathbf{A}^{-1}_{nn}\mathbf{A}^{-1}_{nn} \right) + \sum_m \sum_n \frac{\lambda^2}{2!\,4! 4!}\left( \dots \right)+\dots
\end{eqnarray}
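
Before trusting the pairing counts, it's worth checking the underlying Wick contraction numerically (a sketch with an arbitrary $3\times 3$ covariance $\mathbf{C}=\mathbf{A}^{-1}$ and arbitrarily chosen indices $i,j,n$; Isserlis' theorem gives $\langle q_i q_j q_n q_n \rangle = \mathbf{C}_{ij}\mathbf{C}_{nn} + 2\,\mathbf{C}_{in}\mathbf{C}_{jn}$, which is exactly the disconnected-plus-connected split above):

# Numerical check of Wick / Isserlis pairing for a zero-mean Gaussian vector,
# with covariance C = A^{-1}. The covariance and the indices are arbitrary choices.
import numpy as np

rng = np.random.default_rng(1)
L = rng.normal(size=(3, 3))
C = L @ L.T + 3 * np.eye(3)            # a positive-definite covariance, C = A^{-1}

q = rng.multivariate_normal(np.zeros(3), C, size=2_000_000)
i, j, n = 0, 1, 2

lhs = np.mean(q[:, i] * q[:, j] * q[:, n] * q[:, n])       # <q_i q_j q_n q_n>
rhs = C[i, j] * C[n, n] + 2 * C[i, n] * C[j, n]            # disconnected + connected pairings
print(f"Monte Carlo: {lhs:.4f}   Wick pairing: {rhs:.4f}")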

The disconnected terms are any that contain
\begin{eqnarray}
\mathbf{A}^{-1}_{ij}
\end{eqnarray}

in the n and m summations.  The word "connected" in momentum space means that the sum of our q's must be equal to zero. Or, that the ingoing and outgoing momenta must sum to zero. (Disconnected diagrams correspond to "vacuum fluctuations" where a random variable or particle appears out of nowhere, and then disappears at some later point in time.) This restriction on our available terms in the two-point Green's function can be written as:

\begin{eqnarray}
G_{ij\ \ \rm{connected}} &=& \langle q_i q_j \rangle \delta(q_i+q_j) + \sum_n\frac{\lambda}{4!} \langle q_i q_j q_n q_n q_n q_n\rangle \delta(q_i+q_j+q_n) \\ &&+ \sum_m \sum_n \frac{\lambda^2}{4!4!} \langle q_i q_j q_n q_n q_n q_n q_m q_m q_m q_m\rangle \delta(q_i+q_j+q_n+q_m) + \dots
\end{eqnarray}

And this discrete sum looks a lot like the one-loop "integrations" I have been working with for the power spectrum and bispectrum! Trouble is, in standard perturbation theory, we are using an explicit recursion relation to expand the random variable -- the overdensity -- in powers of the scale factor, not through some coupling constant $\lambda$.

 The anharmonic term may be written slightly incorrectly above, but it is difficult to represent the potential of a self-interacting field without a textbook. I'll have to look up the approximation.

-------------------------------------------------------------------------------------------------------------------------
                    Translational invariance of correlation functions and "connectedness"
-------------------------------------------------------------------------------------------------------------------------

It is also interesting to note that translationally invariant correlation functions, such as the power spectrum and bispectrum, carry a natural Dirac delta function of the form

\begin{eqnarray}
G_{ij} &=& \langle q_i q_j \rangle \delta(q_i +q_j) \\
G_{ijk} &=& \langle q_i q_j q_k \rangle \delta(q_i +q_j+q_k) ,
\end{eqnarray}

written above. Does this mean that translational invariance automatically requires connected diagrams? After reading about this in some statmech textbooks, it seems a translationally invariant system already identifies the connected moments with the cumulants, through a graphical expansion of the partition function. Will write more about this later....

Sunday, April 27, 2014

Loop Diagrams and the Gram-Charlier Expansion

In response to the former question, whether an arbitrary statistical distribution -- with non-zero kurtosis and skewness and higher order cumulants -- can be represented in terms of loop diagrams, the answer is yes!

After looking at Zee's "Child Problem" in "Quantum Field Theory in a Nutshell", we see the following multidimensional integral:

\begin{eqnarray}
Z(J) &=&\int d^Nq\, \mathrm{Exp}\left(\frac{-\vec{q}\cdot \mathbf{A} \cdot \vec{q}}{2}+\vec{J}\cdot\vec{q}-\frac{\lambda}{4!}q^4 \right)
\end{eqnarray}

Which can be described in terms of a Green's function series expansion in powers of J:

\begin{eqnarray}
Z(J) &=& \sum_{s=0}^N \frac{1}{s!}J_{i_1}J_{i_2}\cdots J_{i_s} G_{i_1i_2\cdots i_s} \\
G_{i\cdots j} &=& \int d^Nq \left(q_i \cdots q_j \right)\mathrm{Exp}\left(\frac{-\vec{q}\cdot \mathbf{A} \cdot \vec{q}}{2}-\frac{\lambda}{4!}q^4 \right) = \langle q_i \cdots q_j \rangle
\end{eqnarray}

Now G is a rank 's' tensor, and we interpret J physically, as an excitation of the system, but in statistics, this is just an offset to the total random vector q. The matrix A is the covariance matrix, or our multidimensional second cumulant -- rank 2 tensor -- from before. We can immediately write down the two point Green's function for this expansion to zeroth and first loop order:

\begin{eqnarray}
G_{ij} &=& \mathbf{A}^{-1}_{ij}-\lambda \left( \sum_n  \mathbf{A}^{-1}_{in} \mathbf{A}^{-1}_{jn} \mathbf{A}^{-1}_{nn} \right)
\end{eqnarray}

Which, in statistics-speak reads:

\begin{eqnarray}
G_{ij} &=& \underline{\underline{c_2}}_{ij}-\lambda \left( \sum_n \underline{\underline{c_2}}_{in}\underline{\underline{c_2}}_{jn}\underline{\underline{c_2}}_{nn} \right)
\end{eqnarray}

Those underlines are messy, but we now see that we are taking into account cross correlation between two variables -- labeled by subscripts i and j -- as well as "one-loop" terms which allow for correlation with some intermediate variable "n", and even accounting for "n"'s correlation with itself. I think this is the correct interpretation, and if one were to allow even more expansion, we'd have

\begin{eqnarray}
G_{ij} &=& \mathbf{A}^{-1}_{ij}-\lambda \left( \sum_n  \mathbf{A}^{-1}_{in} \mathbf{A}^{-1}_{jn} \mathbf{A}^{-1}_{nn} +\mathbf{A}^{-1}_{ij}\mathbf{A}^{-1}_{nn}\mathbf{A}^{-1}_{nn}\right)
\end{eqnarray}


for the two-point Green's function to one-loop order. This third term corresponds to a "disconnected" diagram, since the point -- or random variable -- $n$ is in no way correlated with the initial "source" points $i$ and $j$.
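
As a sanity check on this one-loop counting, here is the simplest possible case, $N=1$, done numerically (a sketch under my own conventions: weight $e^{-aq^2/2-\lambda q^4/4!}$ and averages normalized by $Z$, so the disconnected $\mathbf{A}^{-1}_{ij}\mathbf{A}^{-1}_{nn}\mathbf{A}^{-1}_{nn}$ piece cancels against the denominator and the surviving first-order shift is $\langle q^2 \rangle \approx 1/a - \frac{4\cdot 3}{4!}\frac{\lambda}{a^3} = 1/a - \lambda/(2a^3)$):

# One-loop check in zero dimensions (N = 1): compare the normalized <q^2> under
# the weight exp(-a q^2/2 - lam q^4/4!) with the first-order result 1/a - lam/(2 a^3).
# The 1/2 in the quadratic term and the normalization by Z are my own conventions here.
import numpy as np
from scipy import integrate

a, lam = 1.3, 0.05

weight = lambda q: np.exp(-0.5 * a * q**2 - lam * q**4 / 24.0)
Z, _ = integrate.quad(weight, -np.inf, np.inf)
num, _ = integrate.quad(lambda q: q**2 * weight(q), -np.inf, np.inf)

exact = num / Z
one_loop = 1.0 / a - lam / (2.0 * a**3)
print(f"numerical <q^2> = {exact:.6f},  one-loop estimate = {one_loop:.6f}")

For small $\lambda$ the two numbers should differ only at order $\lambda^2$, i.e. by the two-loop terms we have not written down.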

Now to write the total generating function, we would have something like

\begin{eqnarray}
Z(J) &=& \left(\frac{(2\pi)^N}{\vert \mathbf{A}\vert }\right)^{1/2}\left[1+\frac{1}{2}\vec{J}\cdot \mathbf{A}^{-1}\cdot \vec{J} -\lambda \sum_{ij} J_i \left(\sum_n  \mathbf{A}^{-1}_{in} \mathbf{A}^{-1}_{jn} \mathbf{A}^{-1}_{nn} +\mathbf{A}^{-1}_{ij}\mathbf{A}^{-1}_{nn}\mathbf{A}^{-1}_{nn}\right) J_j + \dots \right]
\end{eqnarray}

It's a bit unclear where to go from here, but these Green's functions certainly correspond to expectation values of a random vector. The question is, when one introduces anharmonicity into the field -- for instance a non-zero fourth cumulant -- what happens to our expectation values? And how do we represent those new 2-point and 4-point Green's functions diagrammatically?

Measure for the Path Integral

Quick Reminder to self:

The measure for the path integral, normally written

\begin{eqnarray}
\langle q_F \vert e^{-iHT}\vert q_i \rangle &=& \int Dq e^{\frac{i}{\hbar}\int L(q,\dot{q},t)dt}
\end{eqnarray}

is

\begin{eqnarray}
\int Dq &=& \lim_{N \to \infty}\left( \frac{m}{2\pi i \hbar\, \delta t} \right)^{N/2} \int \cdots \int dq_1 dq_2 \cdots dq_N
\end{eqnarray}

Wednesday, April 23, 2014

PT and the Gram-Charlier

Is it possible to represent the Gram-Charlier expansion in terms of loop diagrams? Something to look into...

Monday, April 21, 2014

ANOVA and the "Temperature" of a data set

Typically, with analysis of variance, one chooses to weight all data points -- or deviations from our model -- equally in the chi-squared statistic, assuming

\begin{eqnarray}
\chi^2 &=& \sum_i^N \frac{(x_i-f(x_i))^2}{\sigma^2_i}\\
\sigma_i &=& \sigma \ \ \forall \ i
\end{eqnarray}

Which yields a very simple posterior distribution:

\begin{eqnarray}
P(\mu, \sigma^2 \vert D) &=& \frac{\left(2\pi \sigma^2\right)^{-N/2}e^{-\chi^2/2}P(\mu,\sigma^2)}{Z}
\end{eqnarray}

Where $Z$ is our awful, marginalized normalizing factor from before. We can write our posterior distribution much like a Boltzmann distribution from statistical mechanics, and associate our chi-squared statistic -- or deviation from our model $f(x)$ -- with an 'Energy':
\begin{eqnarray}
P(\mu, \sigma^2 \vert D) &=& \frac{e^{-\beta E(\mu,\sigma^2)}}{Z} \\
Z &=& \int \int e^{-\beta E(\mu,\sigma^2)} d\mu d\sigma^2 \\
\beta E(\mu,\sigma^2) &=& \sum_i^N \frac{(x_i-f(x_i))^2}{2\sigma^2}
\end{eqnarray}

In this case the variance of each and every data point y_i -- the denominator in our chi-squared expression above -- acts much like the "temperature" of our distribution. We can make things even clearer by putting in the chemical potential, or the energy associated with adding/replacing a member of our dataset:

\begin{eqnarray}
\mu_c &=& \mathrm{chemical}\ \ \mathrm{potential}\\
P(\mu, \sigma^2 \vert D) &=& \frac{e^{N\beta\mu_c-\beta E}}{Z} \\
\beta \mu_c &=& \log\left(\frac{1}{\sqrt{2\pi \sigma^2}} \right)
\end{eqnarray}

Which manifests itself as our extra normalizing factor for adding an extra data point. Exactly like the chemical potential $\mu_c$'s function in the Bose-Einstein distribution! (Pardon the collision of notation with the mean $\mu$.)
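
Here is a toy version of the idea (a sketch with simulated Gaussian data, the model $f(x)=\mu$, and a flat prior; the grid sum stands in for the partition function $Z$):

# Toy posterior with chi-squared/2 playing the role of beta*E: simulated data,
# model f(x) = mu, flat prior; the grid sum is a discrete stand-in for Z.
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(loc=3.0, scale=1.5, size=50)     # simulated "measurements"

mu = np.linspace(2.0, 4.0, 201)[:, None]           # grid over the mean
sigma = np.linspace(0.5, 3.0, 201)[None, :]        # grid over the noise scale

beta_E = ((data[None, None, :] - mu[..., None]) ** 2).sum(axis=-1) / (2 * sigma**2)
log_weight = -0.5 * data.size * np.log(2 * np.pi * sigma**2) - beta_E   # log "Boltzmann" weight

weight = np.exp(log_weight - log_weight.max())
posterior = weight / weight.sum()                  # normalize by the "partition function"

i, j = np.unravel_index(posterior.argmax(), posterior.shape)
print("MAP (mu, sigma) =", mu[i, 0], sigma[0, j])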

Tuesday, April 15, 2014

Least Chi-Squared -- or Maximum Log-Likelihood -- Estimation of Parameters


Let's say you're given some data, a list of $x$ and $y$ values, which seem to be related by a model. One example is a linear relationship with some scatter, where

\begin{eqnarray}
y_i \approx m*x_i+b
\end{eqnarray}

Where $m$ is a best fit slope and $b$ a vertical offset. Since this is not a perfect relationship, we must introduce an error term into our model:

\begin{eqnarray}
y_i &=& m*x_i+b + e_i\\
y_i &=& f(x_i) + e_i \\
f(x) &=& mx+b
\end{eqnarray}

$f(x_i)$, in this instance, represents our model, or our "perceived" relationship between $x$ and $y$ values. Another example is when the $y_i$ values are just centered around some single value -- i.e. a random variable. Then we have:

\begin{eqnarray}
y_i &=& f(x_i) + e_i \\
f(x) &=& \mu
\end{eqnarray}

For a random variable $Y$ centered about zero, we might say $f(x)=0$. This is STILL a model!

But the point being, if we want to make a "best-guess" at the parameters within our model -- such as $m$ and $b$ above, or $\mu$ -- we are going to have to come up with some way of describing the error terms, $e_i$.

A reasonable assumption is that those error terms are drawn from some Probability density; let's say a Gaussian.

\begin{eqnarray}
e_i &\sim & \frac{1}{\sqrt{2\pi \sigma^2}}e^{-\frac{e_i^2}{2\sigma^2}}
\end{eqnarray}

We can isolate the error term by subtracting the model $f(x_i)$ from our "output" value $y_i$. For a random variable centered about a true value $\mu$, i.e. $f(x)=\mu$, this results in:

\begin{eqnarray}
e_i &=& y_i - f(x_i) \\
P(y_i \vert \mu, \sigma^2) &=& \frac{1}{\sqrt{2\pi \sigma^2}}e^{-\frac{\left(y_i - f(x_i)\right)^2}{2\sigma^2}}
\end{eqnarray}

Now, this is the probability of a single data point-pair, $x_i,y_i$, assuming our model $f(x)$ and an error term described by this Gaussian distribution. If we were to count up all $N$ data point pairs and calculate the probability of observing the whole set $D:\lbrace x_i, y_i \rbrace$, we would take the product of the individual probabilities, assuming the errors are independent:

\begin{eqnarray}
P(D| \mu, \sigma^2) &=& \prod_i P(y_i \vert \mu, \sigma^2)\\
&=& (2\pi \sigma^2)^{-\frac{N}{2}} e^{-\sum_i \frac{\left(y_i - f(x_i)\right)^2}{2\sigma^2}}
\end{eqnarray}

Notice that this conditional probability has the same structure as our $P(D\vert H_i) $ term from Bayes Theorem, before. Our "hypothesis" in this case, is the mean and variance; or, the parameters of our model. We can construct posterior probability densities on those parameters in the usual way:

\begin{eqnarray}
P(H_i \vert D) &=& \frac{P(D \vert H_i)P(H_i)}{P(D)}\\
&=& \frac{P(D \vert H_i)P(H_i)}{\sum_i P(D \vert H_i)P(H_i)}\\
P(\mu, \sigma^2 \vert D) &=& \frac{P(D \vert \mu, \sigma^2)P(\mu, \sigma^2)}{\int \int P(D \vert \mu, \sigma^2)P(\mu, \sigma^2)d\mu d\sigma^2}
\end{eqnarray}

The denominator in these terms is simply a normalizing factor. If one wants to construct the posterior on a single parameter, say $\mu$, we merely perform what's called marginalization:

\begin{eqnarray}
P(\mu \vert D) &=& \int P(\mu, \sigma^2 \vert D) d\sigma^2
\end{eqnarray}

This equation reads "the probability of some mean $\mu$, given the data $D$, an array of x and y values".

Ok, so why do we care? Well, in Bayes Theorem above, we implicitly used something called a Likelihood function, which is the Likelihood of seeing the data, given some model and its parameters:
\begin{eqnarray}
\mathcal{L}(D \vert \mu, \sigma^2)=P(D \vert \mu, \sigma^2) &=& (2\pi \sigma^2)^{-\frac{N}{2}} e^{-\sum_i \frac{\left(y_i - f(x_i)\right)^2}{2\sigma^2}}
\end{eqnarray}

Now the term in our exponential above can be simplified. Defining the $\chi^2$ statistic for our data:

\begin{eqnarray}
\chi^2 &=& \sum_i \frac{\left(y_i - f(x_i)\right)^2}{\sigma^2}\\
\mathcal{L}(D \vert \mu, \sigma^2)&=& (2\pi \sigma^2)^{-\frac{N}{2}} e^{-\frac{\chi^2}{2}}
\end{eqnarray}

Pretty cool right? So, maximizing Likelihood is minimizing $\chi^2$. We can take the log of this Likelihood function to write things in a more illuminating fashion:

\begin{eqnarray}
\log\left[\mathcal{L}(D \vert \mu, \sigma^2)\right]&=& -\frac{N}{2}\log (2\pi \sigma^2)-\frac{\chi^2}{2}
\end{eqnarray}

Taking the derivative with respect to $\mu$ and setting it equal to zero yields a critical value of $\mu$ -- let's call it $\hat{\mu}$ -- at which the likelihood is maximized:

\begin{eqnarray}
\frac{\partial}{\partial \mu}\log\left[\mathcal{L}(D \vert \mu, \sigma^2)\right] &=&-\frac{\partial}{\partial \mu}\chi^2/2\\
0&=&\frac{\sum_i x_i - N\hat{\mu}}{\sigma^2}\\
\hat{\mu} &=& \sum_i \frac{x_i}{N}
\end{eqnarray}

our standard "average". This is our estimator of the mean.
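
It's easy to check this against a brute-force maximization (a sketch with simulated data and $\sigma$ held fixed at its true value; the negative log-likelihood is handed to scipy's bounded scalar minimizer):

# Check that maximizing the log-likelihood (equivalently, minimizing chi^2)
# reproduces the sample mean; data are simulated, sigma is held fixed.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
x = rng.normal(loc=1.7, scale=0.8, size=200)
sigma = 0.8

neg_log_like = lambda mu: 0.5 * x.size * np.log(2 * np.pi * sigma**2) \
                          + np.sum((x - mu) ** 2) / (2 * sigma**2)

res = minimize_scalar(neg_log_like, bounds=(0.0, 3.0), method="bounded")
print("numerical mu-hat:", res.x, "   sample mean:", x.mean())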

Now for an estimator of the variance -- or, in our older terms, the value of $\sigma$ for which Likelihood is maximized:

\begin{eqnarray}
\frac{\partial}{\partial \sigma}\log\left[\mathcal{L}(D \vert \mu, \sigma^2)\right] &=& 0\\
\frac{N}{\hat{\sigma}}&=&\frac{\sum_i (x_i-\mu)^2}{\hat{\sigma}^3}\\
\hat{\sigma^2} &=&\frac{\sum_i (x_i-\mu)^2}{N}
\end{eqnarray}

Interesting, our standard sum of squares. Notice however, that we've used our "true" mean $\mu$ in the above equation, not our estimator $\hat{\mu}$. In order to account for this, one can write

\begin{eqnarray}
\hat{\sigma^2} &=&\frac{\sum_i (x_i-\mu)^2}{N}\\
N\hat{\sigma^2} &=&\sum_i (x_i-\mu)^2\\
&=&\sum_i (x_i-\hat{\mu})^2+N(\hat{\mu}-\mu)^2\\
&=& Ns^2+N\left(\sum_i \frac{x_i-\mu}{N}\right)^2\\
N\hat{\sigma^2} &\approx& Ns^2+N\sum_i \frac{(x_i-\mu)^2}{N^2}\\
N\hat{\sigma^2} &\approx& Ns^2+\hat{\sigma^2}\\
\frac{(N-1)}{N}\hat{\sigma^2} &\approx& s^2
\end{eqnarray}

Where, in the middle step, the cross terms $(x_i-\mu)(x_j-\mu)$ with $i \neq j$ have been dropped, since they average to zero over many data sets. We are left with a new estimator for the variance, which requires no "true" information:
\begin{eqnarray}
s^2&=& \frac{\sum_i (x_i-\hat{\mu})^2}{N}
\end{eqnarray}

If we want to best emulate the "true information estimator", we had best divide by $N-1$ instead of $N$. Another way to say this is that $s^2$ is a biased estimator: its expectation value, or average over an infinite number of data sets, is not $\sigma^2$ but in fact:

\begin{eqnarray}
\langle s^2 \rangle &=& \frac{N-1}{N}\sigma^2
\end{eqnarray}
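
The bias is easy to see in simulation (a sketch drawing many synthetic data sets of size $N$; dividing by $N$ should land near $(N-1)\sigma^2/N$, while dividing by $N-1$ should recover $\sigma^2$ on average):

# Demonstrate the bias of s^2 = sum (x_i - mu_hat)^2 / N over many simulated data sets.
import numpy as np

rng = np.random.default_rng(4)
N, trials, sigma = 5, 200_000, 2.0

x = rng.normal(0.0, sigma, size=(trials, N))
s2 = np.var(x, axis=1)                      # divides by N, uses the sample mean mu_hat
s2_corrected = np.var(x, axis=1, ddof=1)    # Bessel-corrected, divides by N - 1

print("sigma^2:", sigma**2)
print("<s^2> (divide by N):  ", s2.mean(), " vs (N-1)/N * sigma^2 =", (N - 1) / N * sigma**2)
print("<s^2> (divide by N-1):", s2_corrected.mean())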

To review, let us write our Posterior distribution one more time:

\begin{eqnarray}
P(H_i \vert D) &=& \frac{P(D \vert H_i)P(H_i)}{\sum_i P(D \vert H_i)P(H_i)}\\
P(\mu, \sigma^2 \vert D) &=& \frac{\mathcal{L}(D \vert \mu, \sigma^2)P(\mu, \sigma^2)}{\int \int P(D \vert \mu, \sigma^2)P(\mu, \sigma^2)d\mu d\sigma^2}\\
P(\mu, \sigma^2 \vert D) &=& \frac{(2\pi \sigma^2)^{-\frac{N}{2}} e^{-\frac{\chi^2}{2}}P(\mu, \sigma^2)}{\int \int P(D \vert \mu, \sigma^2)P(\mu, \sigma^2)d\mu d\sigma^2}
\end{eqnarray}

The factor $P(\mu,\sigma^2)$ on the right hand side is called a prior, and encodes our a priori information about the parameters $\mu$ and $\sigma^2$. What's nice about Bayes' construction is that the posterior from one data set can be used as the prior for another -- $D$ just goes into more initial information. Perhaps we should have written that prior as

\begin{eqnarray}
P(\mu,\sigma^2 \vert \mathrm{old\ dataset})
\end{eqnarray}

Stirling's Approximation

To take a quick detour, let us examine the following definition of the factorial:

\begin{eqnarray}
N! &=& \int_0^\infty x^N e^{-x}dx
\end{eqnarray}

One way to prove this is to write

\begin{eqnarray}
I(a) &=& \int_0^\infty e^{-ax}dx = \frac{1}{a}
\end{eqnarray}

and take the derivative underneath the integral sign, to write:

\begin{eqnarray}
I^\prime(a) &=& \frac{\partial}{\partial a} \int_0^\infty e^{-ax}dx \\
&=& \int_0^\infty (-x) e^{-ax}dx \\
&=& \frac{-1}{a^2}
\end{eqnarray}

and more generally,

\begin{eqnarray}
\frac{\partial^n I(a)}{\partial a^n} &=& \int_0^\infty (-x)^n e^{-ax}dx = \frac{(-1)^n n!}{a^{n+1}}
\end{eqnarray}

Setting $a=1$ we find

\begin{eqnarray}
\Gamma[n+1] &=& \int_0^\infty x^n e^{-x}dx =  n!
\end{eqnarray}

Now let's examine this integral in the limit $n \to \infty$. We can take the factor $x^n$ up into the exponential and write the full exponent as $f(x)$:

\begin{eqnarray}
n! &=& \int_0^\infty e^{-x+n \log x}dx\\
&=& \int_0^\infty e^{f(x)}dx\\
f(x) &=& -x + n \log x
\end{eqnarray}

Now $f(x)$ is an absurdly large -- or high-valued -- function for large $n$, and so we can approximate this integral as only "counting" contributions around the maximum of $f(x)$. We find the position of this maximum in the normal way:

\begin{eqnarray}
f^\prime &=& -1 + \frac{n}{x} = 0 \\
x_0 &=& n
\end{eqnarray}

Taking a look at our second derivative
\begin{eqnarray}
f^{\prime \prime}\vert_{x_0} &=& -\frac{n}{x^2} = -\frac{1}{n} < 0
\end{eqnarray}

we see that $x_0$ is the position of a true maximum. Expanding out our $f(x)$ with a Taylor approximation:

\begin{eqnarray}
n! &\approx& \int_0^\infty e^{f(x_0)+f^\prime(x) \vert_{x_0}(x-x_0)+f^{\prime \prime}(x) \vert_{x_0}\frac{(x-x_0)^2}{2}}dx\\
\end{eqnarray}

We see that the first derivative term is zero by construction, and we are left with a constant times a Gaussian,

\begin{eqnarray}
n! &\approx& e^{-n+n\log n}\int_0^\infty e^{-\frac{(x-x_0)^2}{2n}}dx\\
&\approx& n^n e^{-n}\int_0^\infty e^{-\frac{(x-n)^2}{2n}}dx
\end{eqnarray}

Now this integral is tricky, because we are taking the limit as $n \to \infty$, which means that, essentially, the middle of our Gaussian distribution is far afield from $x=0$. Since the integral of any Gaussian $e^{-x^2/(2\sigma^2)}$ is $\sqrt{2 \pi \sigma^2}$, we can approximate the integral above by the "full" $-\infty < x < \infty$ integration, because the center of the distribution, $x_0=n$, is far to the positive side of zero. This yields, with $\sigma^2=n$:

\begin{eqnarray}
n! &\approx& n^n e^{-n}\sqrt{2\pi n}
\end{eqnarray}

Which is Stirling's approximation!
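
A quick numerical comparison (a sketch using scipy's log-gamma to dodge overflow) shows how good the approximation already is for modest $n$:

# Compare log n! with the Stirling approximation log(n^n e^-n sqrt(2 pi n)).
import numpy as np
from scipy.special import gammaln

for n in [5, 10, 50, 100]:
    exact = gammaln(n + 1)                                # log n!
    stirling = n * np.log(n) - n + 0.5 * np.log(2 * np.pi * n)
    print(f"n = {n:3d}: log n! = {exact:10.4f}, Stirling = {stirling:10.4f}, "
          f"rel. error = {(exact - stirling) / exact:.2e}")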

Now if we use this approximation to examine the binomial distribution in the same limit:

\begin{eqnarray}
P(k;N) &=& \frac{N!}{k! (N-k)!}p^k (1-p)^{N-k}
\end{eqnarray}

Wednesday, March 12, 2014

Propagator for the free particle

As an exercise, I'd like to write up something that I agonized over for 48 hours -- the propagator for a free particle in Quantum mechanics.

We begin with the following function, which I will call $U$.

\begin{eqnarray}
U &=& \langle x_f, t_f \vert x_i, t_i \rangle \\
&=& \int dx_1 dx_2 \dots dx_{n-1}  \langle x_f, t_f \vert e^{\frac{-i\hat{H}\tau}{\hbar}}\vert x_{n-1} \rangle  \langle x_{n-1} \vert e^{\frac{-i\hat{H}\tau}{\hbar}}\vert x_{n-2} \rangle \cdots  \langle x_{1} \vert e^{\frac{-i\hat{H}\tau}{\hbar}}\vert x_i, t_i \rangle
\end{eqnarray}

$U$ has a peculiar property, because if we take the inner product of our final state with any wavefunction $\vert \psi \rangle$, we find

\begin{eqnarray}
\langle x_f,t_f \vert \psi \rangle &=&  \int \langle x_f, t_f \vert x_i, t_i \rangle \langle x_i, t_i \vert \psi \rangle dx_i \\
&=&  \int U \langle x_i, t_i \vert \psi \rangle dx_i \\
\psi_f &=& \int U \psi_i dx_i
\end{eqnarray}

Our function $U$ acts like a Green's function on our initial state, $\psi_i$. In the integral above, we have partitioned the transition between states into $n$ steps, propagating forward in time with the operator $e^{\frac{-i \hat{H}\tau}{\hbar}}$

If we examine a single transition between the $x_{j+1}$ state and the $x_j$, we find that we can represent our wave functions in terms of their fourier transforms, or in terms of the momentum eigenbasis:

\begin{eqnarray}
\psi(x) = \langle x \vert \psi \rangle &=& \int \langle x \vert p \rangle \langle p \vert \psi \rangle dp \\
&=& \int \frac{1}{\sqrt{2\pi \hbar}}e^{ipx/\hbar} \phi(p) dp \\
\Rightarrow && \langle x \vert p \rangle = \frac{1}{\sqrt{2\pi \hbar}}e^{ipx/\hbar} \\
&& \langle p \vert x \rangle = \frac{1}{\sqrt{2\pi \hbar}}e^{-ipx/\hbar}
\end{eqnarray}

So rewriting our integral in terms of these bases,

\begin{eqnarray}
\langle x_{j+1} \vert e^{\frac{-i}{\hbar}\hat{H}\tau} \vert x_j \rangle &=& \int  \langle x_{j+1} \vert q \rangle \langle q \vert e^{\frac{-i}{\hbar}\hat{H}\tau}  \vert p \rangle \langle p \vert x_j \rangle dp dq \\
&=& \int  \frac{e^{\frac{iqx_{j+1}}{\hbar}}}{\sqrt{2\pi \hbar}} \langle q \vert e^{\frac{-i}{\hbar}\hat{H}\tau}  \vert p \rangle \langle p \vert x_j \rangle dp dq \\
&=& \int  \frac{e^{\frac{iqx_{j+1}}{\hbar}}}{\sqrt{2\pi \hbar}} \langle q \vert e^{\frac{-i}{\hbar}\hat{H}\tau} \vert p \rangle \frac{e^{\frac{-ipx_{j}}{\hbar}}}{\sqrt{2\pi \hbar}} dp dq \\
&=& \int  \frac{e^{\frac{iqx_{j+1}}{\hbar}}}{\sqrt{2\pi \hbar}} \langle q \vert \left( 1 - \frac{i}{\hbar}\frac{p^2}{2m}\tau-\frac{i}{\hbar}V(x)\tau +\dots \right) \vert p \rangle \frac{e^{\frac{-ipx_{j}}{\hbar}}}{\sqrt{2\pi \hbar}} dp dq \\
&=& \int  \frac{e^{\frac{iqx_{j+1}}{\hbar}}}{\sqrt{2\pi \hbar}} \left( 1 - \frac{i}{\hbar}\frac{p^2}{2m}\tau-\frac{i}{\hbar}V(x)\tau + \dots \right)  \frac{e^{\frac{-ipx_{j}}{\hbar}}}{\sqrt{2\pi \hbar}}\delta(p-q) dp dq \\
&=& \int  \frac{e^{\frac{ip(x_{j+1}-x_j)}{\hbar}}}{2\pi \hbar}\, e^{-\frac{i}{\hbar}\frac{p^2}{2m}\tau-\frac{i}{\hbar}V(x)\tau}   dp \\
&=& \int  \frac{e^{\frac{ip(x_{j+1}-x_j)}{\hbar}}}{2\pi \hbar}  e^{-\frac{i}{\hbar}\frac{p^2}{2m}\tau}dp\  e^{-\frac{i}{\hbar}V(x)\tau}
\end{eqnarray}
We recognize the integral over $p$ to be a Gaussian integral, or the Fourier transform of a Gaussian. The result is

\begin{eqnarray}
\langle x_{j+1} \vert e^{\frac{-i}{\hbar}\hat{H}\tau} \vert x_j \rangle &=& \sqrt{\frac{-im}{2 \pi \hbar \tau}} e^{\frac{i}{\hbar}\left(\frac{m}{2 \tau}(x_{j+1}-x_j)^2-V(x)\tau \right)}
\end{eqnarray}

Multiplying all of our transition amplitudes and integrating over the various possible $x_j$ variables, we find the Path integral formulation of Quantum Mechanics!

\begin{eqnarray}
U &=& \left(\frac{-im}{2 \pi \hbar \tau}\right)^{N/2} \int \cdots \int \left( \prod_{j=1}^N dx_j \right) e^{\frac{i}{\hbar}\sum_{j=1}^N \left(\frac{m}{2 \tau}(x_{j+1}-x_j)^2-V(x)\tau \right)}
\end{eqnarray}

If we take the limit as $N \to \infty$, then we get our action in the exponential argument, unlike the discrete quantity we currently have,
\begin{eqnarray}
\lim_{N\to \infty} U &=& \lim_{N\to \infty} \left(\frac{-im}{2 \pi \hbar \tau}\right)^{N/2} \int \cdots \int \left( \prod_{j=1}^N dx_j \right) e^{\frac{i}{\hbar} \int L(x,\dot{x},t) dt}
\end{eqnarray}

(I think what I have called $U$ is in fact the propagator. Also, the factor $-im$ in the normalization is nothing to worry about: since $-i = 1/i$, the prefactor $\sqrt{-im/2\pi\hbar\tau}$ is the same as the usual $\sqrt{m/2\pi i\hbar\tau}$.)

For a free particle, $V(x)=0$ and so we have for our inner product:

\begin{eqnarray}
U &=& \left(\frac{-im}{2 \pi \hbar \tau}\right)^{N/2} \int \cdots \int \left( \prod_{j=1}^N dx_j \right) e^{\frac{i}{\hbar}\sum_{j=1}^N \left(\frac{m}{2 \tau}(x_{j+1}-x_j)^2\right)}
\end{eqnarray}

Examining the first few terms of this nasty integral, we see that we have

\begin{eqnarray}
\int dx_1 e^{\frac{mi}{2\hbar \tau}\left(x_0^2 - 2x_0x_1+ 2x_1^2 -2x_1x_2 + 2x_2^2 -2x_2x_3+ \dots +x_n^2\right)}
\end{eqnarray}

Integrating over each of the $n-1$ intermediate coordinates $x_1, \dots, x_{n-1}$ in succession yields

\begin{eqnarray}
\left(\frac{2\pi \hbar \tau}{-im}\right)^{\frac{n-1}{2}}\frac{1}{\sqrt{n}}\,e^{\frac{im}{2\hbar n\tau}\left(x_{n}-x_0\right)^2}
\end{eqnarray}

Notice that the factor in front of this Gaussian-like function is almost the perfect inverse of our normalization factor for the path integral earlier -- we are left with one power of the normalization and a factor of $1/\sqrt{n}$, and the $n$ combines with $\tau$ to create our total time interval, $T=n\tau$, as you'll see in a minute. So our final expression for $U$ becomes

\begin{eqnarray}
U =  \langle x_f, t_f \vert x_i, t_i \rangle &=& \sqrt{\frac{-im}{2\pi \hbar T}}\,\mathrm{exp}\left[\frac{im}{2\hbar T}\left(x_f-x_0\right)^2 \right]
\end{eqnarray}

So this expression gives the amplitude -- not quite the probability -- for transitioning from one state $\vert x_0, t_0 \rangle$ to another $\langle x_f, t_f \vert$. It is Gaussian in form, although with an imaginary "variance" $i\hbar T/m$, and I am still not sure how best to interpret the complex phase in the exponential argument.
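
As a numerical sanity check on the result (a sketch with $\hbar=m=1$, an initial Gaussian packet, and the standard-sign kernel $\sqrt{m/2\pi i\hbar T}\, e^{im(x-x^\prime)^2/2\hbar T}$), convolving the initial state with $U$ should agree with spectral (FFT) evolution of the free Schrodinger equation:

# Evolve a Gaussian packet with the free-particle propagator and compare with FFT evolution.
# Units hbar = m = 1; the kernel below uses the standard sign convention.
import numpy as np

N, L_box, T = 2048, 80.0, 1.0
x = np.linspace(-L_box / 2, L_box / 2, N, endpoint=False)
dx = x[1] - x[0]

psi0 = (1 / np.pi) ** 0.25 * np.exp(-x**2 / 2)        # normalized Gaussian packet

# (1) Spectral evolution: multiply each plane wave by exp(-i k^2 T / 2)
k = 2 * np.pi * np.fft.fftfreq(N, d=dx)
psi_fft = np.fft.ifft(np.fft.fft(psi0) * np.exp(-1j * k**2 * T / 2))

# (2) Propagator evolution: psi(x, T) = int K(x, x'; T) psi0(x') dx'
K = np.sqrt(1 / (2j * np.pi * T)) * np.exp(1j * (x[:, None] - x[None, :]) ** 2 / (2 * T))
psi_prop = K @ psi0 * dx

center = np.abs(x) < 10                                # compare away from the grid edges
print("max |difference| near the center:",
      np.max(np.abs(psi_fft[center] - psi_prop[center])))

The two evolutions should agree up to small quadrature and boundary errors, which is a decent check on both the prefactor and the phase.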

Monday, March 10, 2014

Cosmology: Adiabatic Expansion of a Gas

For a quick reminder, an adiabatic process is one where there is no heat exchange between a system and its surroundings. This happens either because the process happens so quickly that the system "has no chance" to exchange heat, or the system is thermally insulated. Taking a look at the first law of thermodynamics for a gas,

\begin{eqnarray}
dU &=& dQ + dW \\
dU &=& -pdV
\end{eqnarray}

For an expanding universe, we can write the energy of our gas as the energy density times the volume, where the volume is proportional to the scale factor cubed:

\begin{eqnarray}
d(\rho a^3) &=& -p d(a^3)
\end{eqnarray}

Putting in our equation of state $p=w \rho$, we find

\begin{eqnarray}
d(\rho a^3) &=& -w \rho d(a^3)\\
d\rho a^3 + \rho d(a^3) &=& -w \rho d(a^3)\\
d\rho a^3 &=& -(w+1) \rho d(a^3)\\
\frac{d\rho}{\rho} &=& -(w+1)\frac{d(a^3)}{a^3}
\end{eqnarray}

Integrating both sides, we get

\begin{eqnarray}
\rho &=& \rho_0 a^{-3(w+1)}
\end{eqnarray}

Self-Fourier Transforms

Just spent a long, long time working with the quantum harmonic oscillator, with the equation of motion -- Schrodinger's time-independent equation:

\begin{eqnarray}
-\frac{\hbar^2}{2m}\nabla^2 \psi + \frac{m \omega^2 x^2}{2}\psi &=& E\psi
\end{eqnarray}

Typical ways to solve this involve power series or algebraic factoring of the equation of motion, but I was trying to formulate the problem as either a Sturm-Liouville differential equation or as an invertible equation of motion with well-defined Green's function.

From the Green's function perspective, we find that this equation under Fourier transform yields

\begin{eqnarray}
\frac{\hbar^2}{2m}k^2 \overline{\psi} - \frac{m \omega^2}{2}\nabla_k^2 \overline{\psi} &=& E\overline{\psi}
\end{eqnarray}

Which leads to problems in defining $G(k)$, due to the derivative on the left hand side, but has the exact same structure -- aside from coefficients -- as the EOM in $x$-space. The equation of motion is "self-Fourier", and so we might expect the solution $\psi$ to be self-Fourier as well.

Well-known self-Fourier functions are the Gaussian and the Dirac comb, but I came upon an interesting article that pointed out that if one uses an even superposition of a function $g$ and its transform $\overline{g}$, that superposition will be self-Fourier. For example

\begin{eqnarray}
f &=& g(x) + g(-x) +  \overline{g(x)} + \overline{g(-x)} \\
\overline{f} &=&  \overline{g(k)} + \overline{g(-k)}+g(k) + g(-k)
\end{eqnarray}

Candidates would be

\begin{eqnarray}
f &=& 1 + \delta(x) \\
f &=& \Pi(x) + Sinc(x) \\
f &=& \Lambda (x) + Sinc(x)^2 \\
f &=& e^{-|x|}+ \frac{1}{1+4\pi^2x^2}
\end{eqnarray}

Pretty cool, right? And the only sensible one of these self-Fourier functions, in the context of our harmonic oscillator, is the Gaussian.
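
And here is a quick numerical check of the Gaussian's self-Fourier property (a sketch approximating the unitary transform $F(k)=\frac{1}{\sqrt{2\pi}}\int f(x) e^{-ikx}dx$ by simple quadrature on a grid):

# Check that a Gaussian is (numerically) its own Fourier transform under the unitary convention.
import numpy as np

x = np.linspace(-20, 20, 4001)
f = np.exp(-x**2 / 2)

k = np.linspace(-5, 5, 101)
F = np.array([np.trapz(f * np.exp(-1j * kk * x), x) for kk in k]) / np.sqrt(2 * np.pi)

print("max |F(k) - f(k)|:", np.max(np.abs(F - np.exp(-k**2 / 2))))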

There is a way to transform our equation of motion into a simple Sturm-Liouville, but it still escapes me...

Saturday, February 22, 2014

Bivariate Normal Distribution

Because I have had a great deal of trouble finding information on the Bivariate Normal distribution -- or at least, information that is easy for me to understand -- I'd like to take a moment to write down my understanding of the concept, which stems mainly from the Characteristic function.

Working with results from last time, on the central limit theorem, we see that we can describe a probability density function by its Fourier transform, the characteristic function

\begin{eqnarray}
P(\mathbf{x}) &=& \frac{1}{(2\pi)^{d/2}}\int d^dk\, e^{i\mathbf{k}\cdot \mathbf{x}-i\mathbf{k}\cdot \mathbf{c_1}-\frac{1}{2}\mathbf{k}\cdot \mathbf{c_2}\cdot\mathbf{k}+\dots }
\end{eqnarray}

Where, for now, I will ignore higher order terms corresponding to non-Gaussianity. If we zero about the mean -- absorbing our first cumulant, $\mathbf{c_1}$, into $\mathbf{x}$ -- we find that we have a simple Fourier transform of a Gaussian, which is itself a Gaussian.

\begin{eqnarray}
P(\mathbf{x}) &=& \frac{1}{(2\pi)^{d/2}}\int d^dk\, e^{i\mathbf{k}\cdot \mathbf{x}-\frac{1}{2}\mathbf{k}\cdot \mathbf{c_2}\cdot\mathbf{k}}\\
P(\mathbf{x}) &=& \frac{1}{(2\pi)^{d/2}}\frac{1}{\sqrt{\vert\mathbf{c_2}\vert}}e^{-\frac{1}{2}\mathbf{x}\cdot \mathbf{c_2^{-1}}\cdot \mathbf{x}}
\end{eqnarray}

Where $\mathbf{c_2}$ is the covariance matrix -- our multidimensional second cumulant, a rank-2 tensor -- which appears contracted with $\mathbf{k}$ in the characteristic function.

If we are working with two random variables, $x_1,x_2$, we see that our $k$ integration takes place over two dimensions, so we are left with
\begin{eqnarray}
P(x_1,x_2) &=& \frac{1}{2\pi}\int dk_1 dk_2 e^{i(k_1 x_1+k_2 x_2)}e^{-\frac{1}{2}\left( c_{11}k_1^2 +2c_{12}k_1k_2 +c_{22}k_2^2 \right)}
\end{eqnarray}

Notice that our characteristic function has a simple form, which we can now play with in order to construct estimators for the correlation coefficient $\rho$ commonly defined as

\begin{eqnarray}
\rho &=& \frac{c_{12}}{\sigma_1 \sigma_2}
\end{eqnarray}

Let's take a look. First note that $c_{11}=\sigma_1^2$ and $c_{22}=\sigma_2^2$. These are the second cumulants of $x_1$ and $x_2$ respectively: their variances with respect to themselves. Re-writing our bivariate normal with these definitions, and assuming zero means (first cumulants equal to zero), we find

\begin{eqnarray}
\mathbf{c_2^{-1}}&=& \frac{1}{1-\rho^2}\left(\begin{array}{cc}
 \frac{1}{\sigma_1^2} & -\frac{\rho}{\sigma_1 \sigma_2} \\
 -\frac{\rho}{\sigma_1 \sigma_2} & \frac{1}{\sigma_2^2} \\
\end{array}\right)\\
\vert \mathbf{c_2} \vert &=& \sigma_1^2 \sigma_2^2 (1-\rho^2)
\end{eqnarray}

\begin{eqnarray}
P(x_1,x_2) &=& \frac{1}{2\pi}\frac{1}{\sigma_1 \sigma_2 \sqrt{1-\rho^2}}\,\mathrm{Exp}\left[-\frac{1}{2(1-\rho^2)}\left( \frac{x_1^2}{\sigma_1^2} -2\rho \frac{x_1x_2}{\sigma_1 \sigma_2} +\frac{x_2^2}{\sigma_2^2} \right)\right]
\end{eqnarray}

This last, nasty equation is the typical bivariate normal seen in the literahchurr. (literature in snoot-spell). But how the heck to get an estimator for the covariance?!?! We know that

\begin{eqnarray}
\sigma_1^2 &=& \langle x_1^2 \rangle-\langle x_1 \rangle^2 = c_{11} \\
&=& -\frac{\partial^2 P}{\partial k_1^2}\vert_{(k_1,k_2)=(0,0)}+\left(\frac{\partial P}{\partial k_1}\vert_{(k_1,k_2)=(0,0)}\right)^2
\end{eqnarray}

and

\begin{eqnarray}
\sigma_2^2 &=& \langle x_2^2 \rangle-\langle x_2 \rangle^2 = c_{22} \\
&=& -\frac{\partial^2 P}{\partial k_2^2}\vert_{(k_1,k_2)=(0,0)}+\left(\frac{\partial P}{\partial k_2}\vert_{(k_1,k_2)=(0,0)}\right)^2
\end{eqnarray}

The second term in both expansions -- the first derivative of P(k) squared -- is unnecessary, since we are assuming zero mean.

Let us show this trick of differentiating the characteristic function in order to get cumulant estimators explicitly, and then apply it to the correlation coefficient $\rho$:

\begin{eqnarray}
P(k_1,k_2) &=& \int dx_1 dx_2\, e^{-i(k_1 x_1 + k_2 x_2)}P(x_1,x_2) = e^{-\frac{1}{2}\left( c_{11}k_1^2 + 2 c_{12}k_1 k_2 + c_{22} k_2^2 \right)}
\end{eqnarray}

Differentiating with respect to $k_1$ twice

\begin{eqnarray}
\frac{\partial P}{\partial k_1} &=& \int dx_1 dx_2 \left(-ix_1\right) e^{-i(k_1 x_1 + k_2 x_2)}P(x_1,x_2) = -\left( c_{11}k_1+ c_{12}k_2 \right) e^{-\frac{1}{2}\left( c_{11}k_1^2 + 2 c_{12}k_1 k_2 + c_{22} k_2^2 \right)}\\
\frac{\partial^2 P}{\partial k_1^2}&=& \int dx_1 dx_2 \left(-x_1^2\right) e^{-i(k_1 x_1 + k_2 x_2)}P(x_1,x_2) = \left( \left(c_{11}k_1+ c_{12}k_2\right)^2-c_{11}\right) e^{-\frac{1}{2}\left( c_{11}k_1^2 + 2 c_{12}k_1 k_2 + c_{22} k_2^2 \right)}
\end{eqnarray}

and setting $k_1=k_2=0$, we find

\begin{eqnarray}
\frac{\partial^2 P}{\partial k_1^2}\Big\vert_{(k_1,k_2)=(0,0)}&=& \int dx_1 dx_2 \left(-x_1^2\right) P(x_1,x_2) = -c_{11} \\
c_{11} &=& \langle x_1^2 \rangle
\end{eqnarray}

Now for the covariance, we can take the derivative with respect to $k_1$, then $k_2$, to get,

\begin{eqnarray}
\frac{\partial P}{\partial k_1} &=& \int dx_1 dx_2 \left(-ix_1\right) e^{-i(k_1 x_1 + k_2 x_2)}P(x_1,x_2) = -\left( c_{11}k_1+ c_{12}k_2 \right) e^{-\frac{1}{2}\left( c_{11}k_1^2 + 2 c_{12}k_1 k_2 + c_{22} k_2^2 \right)}\\
\frac{\partial^2 P}{\partial k_1 \partial k_2} &=& \int dx_1 dx_2 \left(-x_1x_2\right) e^{-i(k_1 x_1 + k_2 x_2)}P(x_1,x_2) = \left( \left(c_{11}k_1+ c_{12}k_2\right)\left(c_{12}k_1+ c_{22}k_2\right)-c_{12} \right) e^{-\frac{1}{2}\left( c_{11}k_1^2 + 2 c_{12}k_1 k_2 + c_{22} k_2^2 \right)}\\
\frac{\partial^2 P}{\partial k_1 \partial k_2}\Big\vert_{(k_1,k_2)=(0,0)} &=& \int dx_1 dx_2 \left(-x_1x_2\right) P(x_1,x_2) = -c_{12} \\
c_{12} &=& \langle x_1 x_2 \rangle \\
\rho &=& \frac{c_{12}}{\sigma_1 \sigma_2}\\
\rho &=& \frac{\langle x_1 x_2 \rangle}{\left(\langle x_1^2 \rangle \langle x_2^2 \rangle\right)^{1/2}}
\end{eqnarray}

And voila! We have found the estimator -- without taking into account bias -- of the correlation coefficient!
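
The estimator is easy to exercise numerically (a sketch with simulated zero-mean correlated Gaussian pairs; the moment formula is compared against numpy's built-in corrcoef):

# Estimate rho = <x1 x2> / sqrt(<x1^2><x2^2>) from zero-mean correlated Gaussian samples.
import numpy as np

rng = np.random.default_rng(5)
rho_true, sigma1, sigma2 = 0.6, 1.0, 2.5
cov = [[sigma1**2, rho_true * sigma1 * sigma2],
       [rho_true * sigma1 * sigma2, sigma2**2]]

x1, x2 = rng.multivariate_normal([0.0, 0.0], cov, size=1_000_000).T

rho_hat = np.mean(x1 * x2) / np.sqrt(np.mean(x1**2) * np.mean(x2**2))
print("moment estimator:", rho_hat, "   numpy corrcoef:", np.corrcoef(x1, x2)[0, 1])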