Thursday, July 24, 2014

1-D Wave Equation Green's Function

As I am trying to gain some fluency with Green's functions and their use in non-relativistic PDEs, I thought it'd be a good exercise to work out some expressions -- or "Magic Rules" -- for the Green's function of the wave equation:

\begin{eqnarray}
\left(\frac{\partial^2}{\partial t^2}-c_s^2 \frac{\partial^2}{\partial x_i \partial x_i}\right)\psi(\vec{x},t)&=& 0
\end{eqnarray}

Let's assume that $c_s$ is our "speed of sound" in the medium, and that $\psi$ is some scalar function of space and time. If we assume that $\psi \to 0$ as $\vert x \vert \to \infty$, or at the boundaries of our space of interest, we are well equipped to perform a Fourier transform on this PDE:

\begin{eqnarray}
\left(\frac{\partial^2}{\partial t^2}+c_s^2 k^2\right)\tilde{\psi}(\vec{k},t)&=& 0
\end{eqnarray}

We can now Laplace transform in the time variable, swapping $t$ for $\omega$, and write

\begin{eqnarray}
\left(\omega^2+c_s^2 k^2\right)\tilde{\psi}(\vec{k},\omega)&=& \omega \tilde{\psi}(\vec{k},0)+ \tilde{\psi}^{\prime}(\vec{k},0)
\end{eqnarray}

We can see now that our differential equation has been rendered algebraic in this "doubly" transformed space. We solve for $\tilde{\psi}$ and write

\begin{eqnarray}
\tilde{\psi}(\vec{k},\omega)&=& \frac{ \omega \tilde{\psi_0}(k) + \tilde{\psi_0^{\prime}}(k)}{\left(\omega^2+c_s^2 k^2\right)}
\end{eqnarray}

We can immediately see, once we invert the Laplace transform, that our initial conditions are built into the homogeneous solution:

\begin{eqnarray}
\tilde{\psi}(\vec{k},t)&=& \int_{\gamma-i \infty}^{\gamma+i \infty}\frac{ \omega \tilde{\psi_0}(k)e^{\omega t}}{\left(\omega^2+c_s^2 k^2\right)}\frac{d\omega}{2\pi i}+\int_{\gamma-i \infty}^{\gamma+i \infty}\frac{ \tilde{\psi_0^{\prime}}(k)e^{\omega t}}{\left(\omega^2+c_s^2 k^2\right)}\frac{d\omega}{2\pi i}
\end{eqnarray}

Both contour integrals have identical poles, $\omega = \pm i c_s \vert \vec{k} \vert$, and so, closing the Bromwich contour to the left for $t>0$, we use the residue theorem to write down the solution in k-space:

\begin{eqnarray}
\tilde{\psi}(\vec{k},t) &=& \tilde{\psi_0}(k) \cos \left(c_s k t \right)+ \frac{\tilde{\psi_0^{\prime}}(k)\sin \left(c_s k t \right)}{c_s k}
\end{eqnarray}
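As a quick sanity check (my own, not part of the original derivation), SymPy reproduces these two inverse Laplace transforms, with $a$ standing in for $c_s \vert \vec{k} \vert$:

\begin{verbatim}
import sympy as sp

# s plays the role of the Laplace variable (omega above),
# and a stands in for c_s * |k|.
s, t, a = sp.symbols('s t a', positive=True)

cos_term = sp.inverse_laplace_transform(s / (s**2 + a**2), s, t)
sin_term = sp.inverse_laplace_transform(1 / (s**2 + a**2), s, t)

print(cos_term)  # expect cos(a*t), possibly times a Heaviside step in t
print(sin_term)  # expect sin(a*t)/a, possibly times a Heaviside step in t
\end{verbatim}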

This k-space solution is incredibly simple, but difficult to transform back into x-space. First, notice that we are multiplying our initial conditions -- or spectra -- by two separate kernels, sine-over-k and cosine. In x-space, this means we will be convolving our initial conditions with the inverse Fourier transforms of these two kernels in order to create our homogeneous solution.

Furthermore, remember that our time-transform variable $\omega$ had poles that depended only on the modulus of $\vec{k}$, meaning that all of the $k$'s you see above should be read as $\vert \vec{k} \vert$. These kernels are isotropic, which makes sense since the wave equation is a spatially isotropic and homogeneous differential equation to begin with -- the Laplacian operator has no preference in direction or orientation.

But such radial functions of $k$ will prove difficult to inverse Fourier transform as the dimension of $x$ grows.

If we start out with 1-D, we know that the cosine will turn into an evenly spaced pair of Dirac delta distributions, and the sine-over-k term will turn into a top-hat function of width $2 c_s t$. And so we are left with

\begin{eqnarray}
\psi(x,t)^{1D} &=& \frac{\psi_0(x-c_s t)+\psi_0(x+c_s t)}{2}+\frac{1}{2 c_s}\Pi\left(\frac{x}{2 c_s t}\right)\ast \psi^{\prime}_0(x) \\
\psi(x,t)^{1D} &=& \frac{\psi_0(x-c_s t)+\psi_0(x+c_s t)}{2}+\int_{x-c_s t}^{x+c_s t} \frac{\psi^{\prime}_0(x^\prime)}{2 c_s} dx^\prime
\end{eqnarray}

This is a wave front uniformly expanding with speed $c_s$, independent of whether the problem is set up with Neumann or Dirichlet boundary conditions. And, it seems I have stumbled upon -- by way of the Laplace transform -- d'Alembert's formula for the Cauchy problem.
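Just to see the formula in action, here is a small NumPy sketch (my own illustration, with an arbitrary Gaussian pulse and zero initial velocity) that evaluates the d'Alembert solution above:

\begin{verbatim}
import numpy as np

c_s = 1.0                          # wave speed
x = np.linspace(-10.0, 10.0, 2001)

def psi0(x):
    """Initial displacement: an arbitrary Gaussian pulse centered at x = 0."""
    return np.exp(-x**2)

def psi0_prime(x):
    """Initial time derivative of psi; zero for a 'plucked' initial condition."""
    return np.zeros_like(x)

def dalembert(x, t, n_quad=400):
    """Evaluate the 1-D d'Alembert solution at positions x and time t."""
    left_right = 0.5 * (psi0(x - c_s * t) + psi0(x + c_s * t))
    # Midpoint-rule integral of the initial velocity over [x - c_s*t, x + c_s*t]:
    s = (np.arange(n_quad) + 0.5) / n_quad
    xp = (x[:, None] - c_s * t) + 2.0 * c_s * t * s[None, :]
    dxp = 2.0 * c_s * t / n_quad
    velocity_term = np.sum(psi0_prime(xp), axis=1) * dxp / (2.0 * c_s)
    return left_right + velocity_term

snapshot = dalembert(x, t=3.0)   # two half-height pulses, centered at x = -3 and x = +3
\end{verbatim}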

----------------------------------------------------------------------
Now if we move into 2-D, we find that we need to regularize the inverse Fourier transforms, which I will leave for a later post...

Wednesday, July 9, 2014

Importance Sampling

So, last week I was learning a great deal about something called the VEGAS algorithm, which was first presented by G. Peter Lepage in the late 1970s. It's a very clever way of going about importance sampling, which is a Monte Carlo method for integration.

To start, let's say we would like to evaluate the integral:

\begin{eqnarray}
I &=& \int f(\mathbf{x})d^n x
\end{eqnarray}

Where $\mathbf{x}$ is an n-dimensional vector and we are integrating over n-dimensional space, with volume element $d^n x$. Let's say that this function, our integrand $f(\mathbf{x})$, is intractable by analytic means -- it is impossible to integrate with pen and paper. Luckily, we can estimate this integral by other (Monte Carlo) means.

A naive strategy would be to construct a uniform probability density function (PDF) on the integration region, and sample randomly from this PDF to get the independent variable $\mathbf{x}$ and the corresponding integrand value $f(\mathbf{x})$. This essentially amounts to computing an expectation value. Let's take a look in 1-D:

\begin{eqnarray}
I &=& \int_a^b f(x) dx \\
g(x) &=& \frac{1}{(b-a)} \ \ \ \ \ \eta \sim g(x) \\
\langle f(x) \rangle_{g(x)} &=& \int_a^b f(x) g(x) dx = \frac{1}{b-a}I
\end{eqnarray}

As the number of our random samples, $N$, gets large, we can use the law of large numbers to assume that our discrete mean closes in on the "true" mean, which, in this case, is the value of the integral $I$ divided by $(b-a)$:

\begin{eqnarray}
\langle f(x) \rangle_g &=& \lim_{N \to \infty} \sum_i^N \frac{f(\eta_i)}{N} \ \ \ \ \ \eta \sim g(x) \\
&=& \frac{1}{b-a} \int_a^b f(x) dx
\end{eqnarray}
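For instance, a minimal NumPy sketch of this uniform-sampling estimate (my own example, with an arbitrary test integrand) might look like:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """Arbitrary test integrand; its exact integral over [0, 1] is 1 - cos(1) ~ 0.4597."""
    return np.sin(x)

a, b, N = 0.0, 1.0, 100_000
eta = rng.uniform(a, b, size=N)      # eta ~ g(x) = 1/(b - a)
I_hat = (b - a) * np.mean(f(eta))    # I = (b - a) <f>_g
print(I_hat)                         # close to 0.4597
\end{verbatim}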

So that's pretty cool! We have estimated an integral by taking a "strange type of mean", drawing random variables from some arbitrary -- well, in this case uniform -- PDF $g(x)$. One way to take care of our strange factor of $b-a$ above is to "normalize":

\begin{eqnarray}
I &=& \int f(x) dx = \int \frac{f(x)}{g(x)}g(x)dx
\end{eqnarray}

We see that the above is just our old integral $I$, but numerically the last term on the right is equivalent to finding the expectation value of $f/g$ "under" $g(x)$, which means:

\begin{eqnarray}
I &=& \int \frac{f(x)}{g(x)}g(x)dx = \langle \frac{f(x)}{g(x)} \rangle_g \\
&=& \lim_{N\to \infty} \sum_i^N \frac{1}{N}\frac{f(\eta_i)}{g(\eta_i)} \ \ \ \ \ \eta \sim g(x)
\end{eqnarray}

So our random samples $\eta_i$ are drawn from the PDF $g(x)$, and we are searching for the mean value of the \textbf{ratio} $\frac{f}{g}$. Interesting trick, right?
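As a concrete (and entirely hypothetical) illustration of this ratio trick, here is a sketch that draws from a non-uniform proposal; the unit-rate exponential is just one convenient choice of $g$:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    """Integrand on [0, inf); its exact integral is sqrt(pi)/2 ~ 0.8862."""
    return np.exp(-x**2)

def g(x):
    """Proposal PDF: unit-rate exponential on [0, inf)."""
    return np.exp(-x)

N = 100_000
eta = rng.exponential(scale=1.0, size=N)   # eta ~ g(x)
I_hat = np.mean(f(eta) / g(eta))           # <f/g>_g, an estimate of the integral of f
print(I_hat)                               # close to sqrt(pi)/2 ~ 0.8862
\end{verbatim}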

Now our next step in this numerical procedure is to estimate the variance of our "integral estimator", which is what we will call our sum of the ratios, above:

\begin{eqnarray}
\hat{I} &=& \sum_i^N \frac{1}{N}\frac{f(\eta_i)}{g(\eta_i)} \ \ \ \ \ \eta \sim g(x) \\
\lim_{N\to \infty} \hat{I} &=& I
\end{eqnarray}

What is $\sigma^2_I$? Since $\hat{I}$ is the mean of $N$ independent draws of the ratio $f/g$, its variance is the single-draw variance divided by $N$, and we construct that single-draw variance in the usual way: the mean of the square minus the square of the mean:

\begin{eqnarray}
\sigma^2_I &=& \langle (\hat{I}-I)^2 \rangle = \langle \hat{I}^2 \rangle - \langle \hat{I} \rangle^2 \\
&=& \frac{1}{N}\left[\langle \left(\frac{f(x)}{g(x)}\right)^2 \rangle_g - \langle \frac{f(x)}{g(x)}\rangle_g^2\right] \\
&=& \frac{1}{N}\left[\int \left(\frac{f(x)}{g(x)}\right)^2  g(x) dx - \left( \int f(x) dx\right)^2\right] \\
&=& \frac{1}{N}\left[\int \frac{\vert f(x) \vert^2}{g(x)} dx - \left( \int f(x) dx\right)^2\right]
\end{eqnarray}

Numerically, this final equation can be represented by another discrete sum, which we will call our estimate of the variance:

\begin{eqnarray}
\hat{\sigma}^2_I &=& \frac{ \sum_i^N \frac{\vert f(\eta_i) \vert^2}{g(\eta_i)^2}- \frac{1}{N}\left( \sum_i^N \frac{f(\eta_i)}{g(\eta_i)}\right)^2}{N^2}  \ \ \ \ \ \ \ \eta \sim g(x)
\end{eqnarray}

Now since we are subtracting an estimate of the mean and not the true mean, we find that the above estimator of the variance is biased by a factor of $\frac{N-1}{N}$, so we should rewrite the $N^2$ in our denominator above as $N(N-1)$, as we are used to.
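Continuing the toy example above (same test integrand and exponential proposal, both my own choices), the estimate and its error bar come out like this:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    return np.exp(-x**2)          # same test integrand as above

def g(x):
    return np.exp(-x)             # same unit-rate exponential proposal as above

N = 100_000
eta = rng.exponential(scale=1.0, size=N)
w = f(eta) / g(eta)               # the ratios f(eta_i) / g(eta_i)

I_hat = np.sum(w) / N
# Variance of I_hat, with the unbiased N(N-1) denominator discussed above:
var_hat = (np.sum(w**2) - np.sum(w)**2 / N) / (N * (N - 1))
print(I_hat, np.sqrt(var_hat))    # the estimate and its one-sigma error
\end{verbatim}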

-----------------------------------------------------------------------------------------------------------

So what choice of the probability density $g(x)$ minimizes the variance of our integral estimate? Intuitively, we can expect that if $g(x)$ closely matches the behaviour of $f(x)$ over the whole domain of integration, the ratio $f/g$ will remain nearly constant, and we expect very small variance. Let's take a functional derivative of our expression for the variance $\sigma^2_I$ (dropping the overall factor of $1/N$, which does not affect the minimization), with a Lagrange multiplier $\lambda$ in place to maintain the constraint that $g(x)$ is normalized as a probability density function:

\begin{eqnarray}
\sigma^2_I &\propto& \int \frac{\vert f(x)\vert^2}{g(x)}dx - \left(\int f(x) dx\right)^2 + \lambda \left(\int g(x) dx -1 \right) \\
\delta \sigma^2_I &=&  \int \left[ -\frac{\vert f(x)\vert^2}{g(x)^2}+\lambda \right] \delta g(x)\, dx
\end{eqnarray}

We see that this variation will vanish for arbitrary $\delta g$ only when the bracketed integrand is zero for all $x$; solving for $g$ and absorbing constants into $\lambda$, we have

\begin{eqnarray}
g(x) &=& \lambda \vert f(x) \vert
\end{eqnarray}

Since we have already assumed $g(x)$ to be a PDF, we need not place an absolute value sign on the LHS above, because PDFs are non-negative (which is what makes their cumulative distribution functions $G(x)=\int_{-\infty}^x g(x^\prime)dx^\prime$ monotonically non-decreasing). Plugging in our normalization constraint once again, we find:

\begin{eqnarray}
\int g(x) dx &=& \lambda \int \vert f(x) \vert dx \\
1 &=& \lambda \int \vert f(x) \vert dx \\
\frac{1}{\int \vert f(x) \vert dx} &=& \lambda \\
\end{eqnarray}

and so the optimum PDF is

\begin{eqnarray}
g(x) &=& \frac{\vert f(x) \vert}{\int \vert f(x) \vert dx}
\end{eqnarray}

or, it is proportional to the absolute value of our integrand -- with some normalization constant -- at every $x$. This is pretty cool! One can find, by taking a similar functional derivative in $n$ dimensions, that if the multidimensional PDF $g(\mathbf{x})$ is separable across the components of our vector $\mathbf{x}$,

\begin{eqnarray}
g(\mathbf{x}) &=& \prod_i^n g_i (x_i)
\end{eqnarray}

then the optimum $g(\mathbf{x})$ takes the following form:

\begin{eqnarray}
g_j (x_j) &=& \frac{\left[ \int \frac{\vert f(\mathbf{x}) \vert^2}{\prod_{i \ne j} g_i (x_i) } d^{n-1}x\right]^{1/2}}{\int dx_j \left[ \int \frac{\vert f(\mathbf{x}) \vert^2}{\prod_{i \ne j} g_i (x_i) } d^{n-1}x\right]^{1/2}}
\end{eqnarray}
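Coming back to 1-D for a moment, here is a small sketch (my own illustration, using SciPy's beta function) of how dramatic the variance reduction is when the proposal is exactly proportional to $\vert f \vert$; the integrand is chosen so that its normalized absolute value is a Beta(3, 4) density, which NumPy can sample directly:

\begin{verbatim}
import numpy as np
from scipy.special import beta as beta_fn

rng = np.random.default_rng(2)
N = 50_000

def f(x):
    """A peaked integrand on [0, 1]; its exact integral is B(3, 4) = 1/60."""
    return x**2 * (1 - x)**3

# Uniform sampling: g(x) = 1 on [0, 1]
eta_u = rng.uniform(0.0, 1.0, size=N)
w_u = f(eta_u)

# "Optimal" sampling: g(x) proportional to |f(x)|, i.e. the Beta(3, 4) PDF
eta_b = rng.beta(3.0, 4.0, size=N)
g_b = eta_b**2 * (1 - eta_b)**3 / beta_fn(3.0, 4.0)
w_b = f(eta_b) / g_b                 # a constant ratio, so (nearly) zero variance

for w in (w_u, w_b):
    I_hat = np.mean(w)
    err = np.sqrt(np.var(w, ddof=1) / N)   # equivalent to the N(N-1) formula above
    print(I_hat, err)                      # both near 1/60; the second error is ~0
\end{verbatim}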

-----------------------------------------------------------------------------------------------------------

So how does VEGAS go about approximating these "optimum" multidimensional distributions? By changing the sampling grid. Lepage did a very clever thing when he constructed an algorithm that initially samples the integrand uniformly, essentially adopting the following PDF:

\begin{eqnarray}
g_0(\mathbf{x}) &=& \frac{1}{V_n}
\end{eqnarray}

Where $V_n$ is the n-dimensional volume of the integration region. But then, once these samples are taken, he adjusts the grid spacings -- keeping the number of grid subdivisions (and samples per round) constant -- in proportion to the absolute value of the integrand, $f(\mathbf{x})$. What this essentially does is match $g(\mathbf{x})$'s peaks to $f(\mathbf{x})$'s peaks, be they high or low, and over successive sampling rounds -- let's say M of them -- the "grid", or $g(\mathbf{x})$, becomes increasingly attuned to the characteristics of $f(\mathbf{x})$.
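To give a flavor of that rebinning step, here is a heavily simplified 1-D toy version of a VEGAS-style grid update (my own sketch, not Lepage's actual algorithm, which also smooths and damps the adjustment): each round samples uniformly within the current bins, accumulates $\vert f \vert$ per bin, and then moves the bin edges so that every bin holds roughly the same share of the accumulated weight.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(3)

def f(x):
    """A sharply peaked test integrand on [0, 1]; exact integral ~ sqrt(pi)/10."""
    return np.exp(-100.0 * (x - 0.5)**2)

def toy_vegas_1d(f, a=0.0, b=1.0, n_bins=50, n_samples=2000, n_rounds=5):
    """Adapt the bin edges toward |f| over several rounds, then return the last estimate."""
    edges = np.linspace(a, b, n_bins + 1)
    for _ in range(n_rounds):
        widths = np.diff(edges)
        # Pick a bin uniformly, then a point uniformly inside that bin:
        bins = rng.integers(0, n_bins, size=n_samples)
        x = edges[bins] + rng.uniform(0.0, 1.0, size=n_samples) * widths[bins]
        # The sampling PDF is 1 / (n_bins * width of the bin the point landed in):
        g = 1.0 / (n_bins * widths[bins])
        I_hat = np.mean(f(x) / g)
        # Accumulate |f| per bin, then rebuild the edges so each new bin holds
        # an equal share of the accumulated weight (the adaptation step):
        weight = np.bincount(bins, weights=np.abs(f(x)) * widths[bins], minlength=n_bins)
        cumulative = np.concatenate(([0.0], np.cumsum(weight)))
        edges = np.interp(np.linspace(0.0, cumulative[-1], n_bins + 1), cumulative, edges)
    return I_hat

print(toy_vegas_1d(f))   # compare with the exact value, sqrt(pi)/10 ~ 0.17725
\end{verbatim}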