Thursday, May 22, 2014

The Student t-distribution

Let's say we are given some data, a series of scalar values $x_i$, with $i$ running from 1 to $n$. The Student t-statistic is defined as

\begin{eqnarray}
t=\frac{\bar{x}-\mu}{\sqrt{s^2/n}}
\end{eqnarray}

Now we can write $s^2$ as our unbiased estimator for the variance,

\begin{eqnarray}
s^2 &=& \sum_i^n \frac{(x_i-\bar{x})^2}{n-1}
\end{eqnarray}
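As a quick numerical sanity check (with invented data, and assuming numpy and scipy are on hand), the statistic above matches what scipy's one-sample t-test computes:

```python
import numpy as np
from scipy import stats

# Toy data and a hypothesized mean -- both invented for illustration
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=30)
mu = 2.0

n = x.size
xbar = x.mean()
s2 = np.sum((x - xbar) ** 2) / (n - 1)   # unbiased variance estimator
t = (xbar - mu) / np.sqrt(s2 / n)        # the t-statistic above

# scipy's one-sample t-test computes the same statistic
t_scipy = stats.ttest_1samp(x, popmean=mu).statistic
print(t, t_scipy)
```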

Let's play with the t statistic representation a little more,

\begin{eqnarray}
t=\frac{(\frac{\bar{x}-\mu}{\sigma})\sqrt{n}}{\sqrt{\frac{s^2}{\sigma^2}}}
\end{eqnarray}

We can now see that the numerator is a random variable with expectation value zero. If we assume that each and every data point is drawn from the same Gaussian distribution:

\begin{eqnarray}
x_i \sim N(\mu,\sigma)
\end{eqnarray}
Then the sum of all of our $x_i$'s -- used to create the mean -- will be distributed as the convolution of all of our Gaussians, and so the second cumulants, the variances, add:

\begin{eqnarray}
n\bar{x}=\sum_i^n x_i \sim N(\mu,\sigma) \star N(\mu,\sigma) \star N(\mu,\sigma) \cdots &=& N(n\mu,\sqrt{n}\sigma)
\end{eqnarray}

This is of course the same idea as the width of a random walker: after n steps, the variance of the end probability density grows linearly in n, so the width grows like $\sqrt{n}$. Now subtracting the mean just centers our PDF:

\begin{eqnarray}
\bar{x}-\mu=\sum_i^n \frac{x_i-\mu}{n} &\sim& N\left(0,\frac{\sigma}{\sqrt{n}}\right)
\end{eqnarray}
And so we find that, after scaling by the theoretical standard deviation, the random variable in the numerator follows a particularly tractable distribution: the standard normal, with zero mean and unit variance

\begin{eqnarray}
X=\frac{(\bar{x}-\mu)\sqrt{n}}{\sigma}&\sim& N(0,1)
\end{eqnarray}
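This scaling is easy to check by simulation -- drawing many samples of size n and standardizing the sample mean this way should give something with zero mean and unit width (illustrative values of $\mu$, $\sigma$, $n$, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 3.0, 2.0, 10, 50_000   # arbitrary illustration values

# Draw `reps` independent samples of size n, form X = (xbar - mu) sqrt(n) / sigma
samples = rng.normal(mu, sigma, size=(reps, n))
X = (samples.mean(axis=1) - mu) * np.sqrt(n) / sigma

print(X.mean(), X.std())   # should be near 0 and 1
```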

Now for the denominator. Without the square root sign we have:

\begin{eqnarray}
\frac{s^2}{\sigma^2}&=& \sum_i^n \frac{(x_i-\bar{x})^2}{(n-1)\sigma^2}
\end{eqnarray}

And we can see immediately that this will be a sum of the squares of our former variables,

\begin{eqnarray}
x_i-\mu &\sim& N(0,\sigma) \\
\frac{x_i-\mu}{\sigma} &\sim& N(0,1)\\
\frac{(x_i-\mu)^2}{\sigma^2} &\sim& \frac{N(0,1)(x^2)}{x}=g_{1/2,1/2}(x^2)\\
\frac{(x_i-\bar{x})^2}{(n-1)\sigma^2} &\sim& \frac{N(0,1)(x^2)}{x}=g_{1/2,1/2}(x^2)
\end{eqnarray}

Where the final PDF written is the standard Gamma density
\begin{eqnarray}
N(0,1) &=& \frac{1}{\sqrt{2\pi}}e^{-x^2/2}\\
g_{\alpha,\nu}(x)&=& \frac{1}{\Gamma(\nu)}\alpha^\nu x^{\nu-1}e^{-\alpha x}\\
g_{1/2,1/2}(x^2) &=& \frac{1}{\sqrt{2\pi}}(x^2)^{-1/2}e^{-\frac{x^2}{2}}
\end{eqnarray}
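As a sanity check, this $g_{1/2,1/2}$ density (as a function of $u=x^2$) is exactly the chi-squared density with one degree of freedom, which scipy can confirm (scipy parameterizes the Gamma by shape and scale, with scale the inverse of our rate $\alpha$):

```python
import numpy as np
from scipy import stats

u = np.linspace(0.1, 8.0, 50)

# Standard Gamma density g_{1/2,1/2}(u), written out by hand
g = (1 / np.sqrt(2 * np.pi)) * u ** (-0.5) * np.exp(-u / 2)

# Same thing via scipy: Gamma(shape=1/2, rate=1/2) == chi-squared with 1 dof
g_gamma = stats.gamma.pdf(u, a=0.5, scale=2.0)   # scale = 1/rate
g_chi2 = stats.chi2.pdf(u, df=1)

print(np.max(np.abs(g - g_gamma)), np.max(np.abs(g - g_chi2)))
```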

Which is the required PDF for the square of our Gaussian random variable with unit variance. Summing n independent $g_{1/2,1/2}$ variables gives a $g_{1/2,n/2}$, since the shape parameters of independent Gamma variables with the same rate add under convolution. Now, to convert to the square root of the denominator variable -- call it Y -- we multiply by the Jacobian of the change of variables:

\begin{eqnarray}
\frac{s^2}{\sigma^2}=\sum_i^n \frac{(x_i-\bar{x})^2}{(n-1)\sigma^2} &\sim & g_{1/2,n/2}(x^2) \\
Y=\sqrt{\frac{s^2}{\sigma^2}} &\sim & 2Y g_{1/2,n/2}(Y^2)
\end{eqnarray}

And so we see that

\begin{eqnarray}
T=\frac{X}{Y}
\end{eqnarray}
where X and Y each have their own probability density functions. The best way to combine these PDFs is to condition on Y and then integrate over all of its possible values. For example, the cumulative distribution function for t should be:

\begin{eqnarray}
X &\sim & f\\
Y &\sim & g\\
T &\sim & P(t)\\
\Phi(t)=P(T \leq t)&=& \int_{-\infty}^t \int_0^\infty y\, g(y)f(t^\prime y)\, dy\, dt^\prime \\
P(t) &=& \int_0^\infty y f(ty)g(y) dy
\end{eqnarray}
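This ratio-density formula can be tested on a case with a known closed form: the ratio of a standard normal to an independent half-normal (the absolute value of a second standard normal) is standard Cauchy, with density $1/(\pi(1+t^2))$. A numerical sketch, assuming scipy:

```python
import numpy as np
from scipy import integrate

def ratio_density(t, f, g):
    """Density of T = X/Y at t, for X ~ f and positive Y ~ g."""
    val, _ = integrate.quad(lambda y: y * f(t * y) * g(y), 0, np.inf)
    return val

f = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)      # standard normal
g = lambda y: 2 * np.exp(-y**2 / 2) / np.sqrt(2 * np.pi)  # half-normal, y > 0

ts = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
numeric = np.array([ratio_density(t, f, g) for t in ts])
cauchy = 1 / (np.pi * (1 + ts**2))   # known Cauchy answer
print(numeric, cauchy)
```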

Plugging in our probability densities from before, we have:

\begin{eqnarray}
f(ty) &=& \frac{1}{\sqrt{2\pi}}\mathrm{Exp}\left[-\frac{t^2y^2}{2}\right]\\
g(y) &=& \frac{y^{n-1}}{\Gamma(\frac{n}{2})}\frac{1}{\sqrt{2^{n-2}}}\mathrm{Exp}\left(-\frac{y^2}{2}\right)\\
P(t) &=& \int_0^\infty y\, \frac{1}{\sqrt{2\pi}}\mathrm{Exp}\left[-\frac{t^2y^2}{2}\right]\frac{y^{n-1}}{\Gamma(\frac{n}{2})}\frac{1}{\sqrt{2^{n-2}}}\mathrm{Exp}\left(-\frac{y^2}{2}\right) dy\\
&=&\frac{1}{\sqrt{2^{n-1}}}\frac{1}{\Gamma(\frac{1}{2})\Gamma(\frac{n}{2})} \int_0^\infty y^{n} \mathrm{Exp}\left[-\frac{t^2y^2}{2}-\frac{y^2}{2}\right]dy\\
&=&\frac{1}{\sqrt{2^{n-1}}}\frac{1}{\Gamma(\frac{1}{2})\Gamma(\frac{n}{2})} \int_0^\infty y^{n} \mathrm{Exp}\left[-\frac{y^2}{2}(1+t^2)\right]dy
\end{eqnarray}

Now making the tedious variable change

\begin{eqnarray}
s= \frac{y^2}{2}(1+t^2)
\end{eqnarray}

we find
\begin{eqnarray}
y^{n}dy=\frac{2^{\frac{n-1}{2}}s^{\frac{n+1}{2}-1}}{\left(1+t^2\right)^{\frac{n+1}{2}}}ds
\end{eqnarray}

and so P(t) reduces to a Gamma integral:
\begin{eqnarray}
P(t)&=&\frac{1}{\sqrt{2^{n-1}}}\frac{1}{\Gamma(\frac{1}{2})\Gamma(\frac{n}{2})}\frac{2^{\frac{n-1}{2}}}{\left(1+t^2\right)^{\frac{n+1}{2}}} \int_0^\infty s^{\frac{n+1}{2}-1} \mathrm{Exp}\left[-s\right]ds\\
&=&\frac{\Gamma(\frac{n+1}{2})}{\Gamma(\frac{1}{2})\Gamma(\frac{n}{2})}\frac{1}{\left(1+t^2\right)^{\frac{n+1}{2}}}\\
&=&\frac{1}{B(\frac{1}{2},\frac{n}{2})}\frac{1}{\left(1+t^2\right)^{\frac{n+1}{2}}}
\end{eqnarray}
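Whatever the algebra, the ratio itself is easy to simulate: with $X$ standard normal and $Y$ the square root of a chi-squared variable with n degrees of freedom, $X/Y$ is by construction a Student-t variable with n degrees of freedom, scaled by $1/\sqrt{n}$. That gives an independent check (illustrative n, assuming numpy and scipy):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps = 5, 200_000   # arbitrary illustration values

X = rng.standard_normal(reps)
Y = np.sqrt(rng.chisquare(df=n, size=reps))
T = X / Y   # the ratio studied above

# X / sqrt(chi2_n) equals (1/sqrt(n)) * (Student-t with n dof),
# so P(T <= t0) should match the t CDF evaluated at t0*sqrt(n)
t0 = 0.5
empirical = np.mean(T <= t0)
exact = stats.t.cdf(t0 * np.sqrt(n), df=n)
print(empirical, exact)
```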

This is still a bit off from the Wikipedia Student-t density, which has factors of $1/\sqrt{\nu}$ and $(1+t^2/\nu)$ with $\nu=n-1$ degrees of freedom; the gap comes from glossing over the $1/(n-1)$ scaling in $s^2$ and the correlations introduced by using $\bar{x}$ in place of $\mu$. Still, a good exercise in combining PDFs.

Friday, May 16, 2014

A first scrawl at connected and disconnected moments

Building upon the last post, I finally realize the difference between connected and disconnected moments: it has to do with conservation of momentum in the Feynman diagrams we have been using to represent the Gram-Charlier expansion. (I've recently realized the story becomes more complicated in statistical mechanics, where connected moments are quickly defined to be the cumulants, but more on that next time.)

The correlations between localized excitations in the field are given by derivatives of the logarithm of the generating function:

\begin{eqnarray}
\langle q_{i_1}\dots q_{i_l} \rangle &=& \frac{\partial}{\partial J_{i_1}}\cdots \frac{\partial}{\partial J_{i_l}} \log\left(Z(\mathbf{J})\right)\Big|_{\mathbf{J}=0}
\end{eqnarray}
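Concretely, these derivatives of $\log Z$ are the cumulants, and the low-order cumulants are just the moments with the disconnected pieces subtracted off. A quick numerical check (assuming numpy) using an exponential random variable, whose cumulants for unit scale are $\kappa_m=(m-1)!$, so $\kappa_2=1$ and $\kappa_3=2$:

```python
import numpy as np

rng = np.random.default_rng(3)
q = rng.exponential(scale=1.0, size=1_000_000)

# Raw (disconnected) moments
m1, m2, m3 = q.mean(), np.mean(q**2), np.mean(q**3)

# Connected moments / cumulants: subtract the disconnected pieces
k2 = m2 - m1**2                      # variance, exact value 1
k3 = m3 - 3 * m2 * m1 + 2 * m1**3    # third cumulant, exact value 2
print(k2, k3)
```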

But we found, from before, that the generating function was actually built out of a sum of Green's functions, which are basically correlators in momentum space:

\begin{eqnarray}
Z(\mathbf{J}) &=& \sum_{l=0}^\infty J_{i_1}\cdots J_{i_l} G_{i_1 \cdots i_l} \\
G_{ij} &=& \langle q_i q_j \rangle = \int d^Nq \left(q_i q_j\right)e^{- \mathbf{q}\cdot \mathbf{A} \cdot \mathbf{q}-\frac{\lambda}{4!} \sum_n q_n^4}
\end{eqnarray}

The above is written for a random anharmonic field -- hence the $\lambda q^4$ term -- and we see that this is just our two-point Green's function from before. If we expand this integral in powers of $\lambda$, we get our one-loop and two-loop terms:

\begin{eqnarray}
G_{ij} &=& \langle q_i q_j \rangle = \int d^Nq \left(q_i q_j\right) \left(1-\frac{\lambda}{4!}\sum_n (q_n)^4+\frac{\lambda^2}{2!\,(4!)^2}\sum_m \sum_n (q_n)^4(q_m)^4 + \dots \right)e^{- \mathbf{q}\cdot \mathbf{A} \cdot \mathbf{q}}
\end{eqnarray}
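A zero-dimensional toy version ($N=1$) of this expansion can be checked directly: for the weight $e^{-aq^2-\lambda q^4/4!}$, the first-order Wick contractions give (if I've done the contractions right) $\langle q^2\rangle \approx 1/(2a) - \lambda/(16a^3)$, which should agree with brute-force quadrature for small $\lambda$. A sketch with illustrative values of $a$ and $\lambda$, assuming scipy:

```python
import numpy as np
from scipy import integrate

a, lam = 1.0, 0.1   # illustrative values; the expansion only holds for small lam

weight = lambda q: np.exp(-a * q**2 - lam * q**4 / 24)

# "Exact" <q^2> by direct numerical integration
num, _ = integrate.quad(lambda q: q**2 * weight(q), -np.inf, np.inf)
den, _ = integrate.quad(weight, -np.inf, np.inf)
exact = num / den

# First-order Wick-contraction result: 1/(2a) - lam/(16 a^3)
perturbative = 1 / (2 * a) - lam / (16 * a**3)
print(exact, perturbative)
```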

We see by Wick contraction this leads to terms like:

\begin{eqnarray}
G_{ij} &=& \langle q_i q_j \rangle - \sum_n \frac{\lambda}{4!} \langle q_i q_j q_n q_n q_n q_n\rangle +  \sum_m \sum_n \frac{\lambda^2}{2!\,(4!)^2} \langle q_i q_j q_n q_n q_n q_n q_m q_m q_m q_m\rangle + \dots \\
&=&  \mathbf{A}^{-1}_{ij} - \frac{\lambda}{4!}\sum_n\left(12\,\mathbf{A}^{-1}_{in}\mathbf{A}^{-1}_{nj}\mathbf{A}^{-1}_{nn}+3\,\mathbf{A}^{-1}_{ij}\mathbf{A}^{-1}_{nn}\mathbf{A}^{-1}_{nn} \right) + \frac{\lambda^2}{2!\,(4!)^2}\sum_m \sum_n \left( \dots \right)+\dots
\end{eqnarray}

The disconnected terms are any that contain
\begin{eqnarray}
\mathbf{A}^{-1}_{ij}
\end{eqnarray}

in the n and m summations. The word "connected" in momentum space means that the sum of our q's must equal zero -- that is, the ingoing and outgoing momenta must sum to zero. (Disconnected diagrams correspond to "vacuum fluctuations," where a random variable or particle appears out of nowhere and then disappears at some later point in time.) This restriction on the available terms in the two-point Green's function can be written as:

\begin{eqnarray}
G_{ij\ \ \rm{connected}} &=& \langle q_i q_j \rangle \delta(q_i+q_j) - \sum_n\frac{\lambda}{4!} \langle q_i q_j q_n q_n q_n q_n\rangle \delta(q_i+q_j+q_n) \\ &&+ \sum_m \sum_n \frac{\lambda^2}{2!\,(4!)^2} \langle q_i q_j q_n q_n q_n q_n q_m q_m q_m q_m\rangle \delta(q_i+q_j+q_n+q_m) + \dots
\end{eqnarray}

And this discrete sum looks a lot like the one-loop "integrations" I have been working with for the power spectrum and bispectrum! The trouble is that in standard perturbation theory we use an explicit recursion relation to expand the random variable -- the overdensity -- in powers of the scale factor, not in some coupling constant $\lambda$.

 The anharmonic term may be written slightly incorrectly above, but it is difficult to represent the potential of a self-interacting field without a textbook. I'll have to look up the approximation.

-------------------------------------------------------------------------------------------------------------------------
                    Translational invariance of correlation functions and "connectedness"
-------------------------------------------------------------------------------------------------------------------------

It is also interesting to note that translationally invariant correlation functions, such as the power spectrum and bispectrum, carry a natural Dirac delta function of the form

\begin{eqnarray}
G_{ij} &=& \langle q_i q_j \rangle \delta(q_i +q_j) \\
G_{ijk} &=& \langle q_i q_j q_k \rangle \delta(q_i +q_j+q_k) ,
\end{eqnarray}

written above. Does this mean that translationally invariant correlations automatically require connected diagrams? After reading about this in some statistical mechanics textbooks, it seems a translationally invariant system already defines the connected moments as the cumulants, via a graphical expansion of the partition function. Will write more about this later....