The post What is the Central Limit Theorem? appeared first on Ben's Planet.

*Consider a sequence $x_1, \dots, x_n$ of independent, identically distributed random variables with mean $\mu$ and variance $\sigma^2$. Their empirical mean is defined by*

$$

\bar{x} = \frac{1}{n} (x_1 + \dots + x_n)

$$

*Normalizing it to a random variable with an expectation value of zero in the way*

$$

z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}

$$

*the probability distribution $p(z) \to N(0,1)$ for large $n$ (“convergence in distribution”). Equivalently, we may write $\mathrm{CDF}(z) \to \Phi(z)$ for all $z \in \mathbb{R}$, with $\Phi$ the cumulative distribution function of the standard normal.*

The marvel of the central limit theorem is that a normal distribution emerges from adding up many random variables that themselves don’t have to be normally distributed.

However, they need to be independent (see the previous article on random variables) and ‘identically distributed,’ i.e., drawn from the same probability distribution, with the same, well-defined mean $\mu$ and variance $\sigma^2$.

Let’s first try to understand why the limiting distribution of $z$ has to have mean zero and variance one:

First, since the expectation value of a random variable is linear,

$$

\left\langle \frac{x_1 + \dots + x_n}{n} \right\rangle
= \frac{\left\langle x_1 \right\rangle + \dots + \left\langle x_n \right\rangle}{n}
= \frac{n \left\langle x_1 \right\rangle}{n} = \left\langle x_1 \right\rangle = \mu

$$

the expectation value of the empirical mean equals the mean of the individual distributions, and thus the expectation value of $z$ equals zero. For statistical samples, one says that the empirical mean tends toward the population mean.

Second, for independent random variables, variances add up, reminiscent of the Pythagorean theorem:

$$

\text{Var}(aX + bY) = a^2 \text{Var}(X) + b^2 \text{Var}(Y)

$$

and thus

$$

\text{Var}\left(\frac{x_1 + \dots + x_n}{n}\right) = \frac{\text{Var}(x_1 + \dots + x_n)}{n^2} = \frac{n \, \text{Var}(x_1)}{n^2} = \frac{\text{Var}(x_1)}{n}

$$

Therefore, dividing by $\frac{\sigma}{\sqrt{n}}$ ensures that $z$ has variance one.
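This can be verified numerically. Here is a small sketch using NumPy; the uniform distribution, sample sizes, and seed are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 1_000, 5_000
mu, sigma = 0.5, np.sqrt(1 / 12)  # mean and standard deviation of Uniform(0, 1)

# draw `trials` independent samples of size n and form the empirical means
x = rng.uniform(0, 1, size=(trials, n))
xbar = x.mean(axis=1)

# normalize exactly as in the text: z = (xbar - mu) / (sigma / sqrt(n))
z = (xbar - mu) / (sigma / np.sqrt(n))

print(z.mean(), z.var())  # close to 0 and 1, respectively
```

A histogram of `z` would closely trace the standard normal density, even though the underlying samples are uniform.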

The real mystery that needs explanation then is: Why is it a normal distribution at all?

Whereas formal proofs are rather technical, we briefly sketch two ideas:

**Elementary Explanation**

Intuitively, the central limit theorem states that the centered and scaled sum $\frac{(x_1 + \dots + x_n) - n\mu}{\sqrt{n}}$ converges to a normal distribution for $n \to \infty$, independent of the details of the individual $x_i$.

It does not matter if we mix or replace some of the random variables, as long as they have the same first and second moments. In particular, if we replace the $x_i$ one by one with normally distributed variables of the same mean and variance, the stability of the Gaussian guarantees that the sum of two Gaussians is again a Gaussian.

Along these lines, one can show that the limiting distribution depends only on the first and second moments of the individual $x_i$.

**The Maximum Entropy Principle**

Following the principle of maximum entropy, the distribution governing $z$ should be the one with maximum entropy consistent with the constraints of fixed mean and variance, and that distribution turns out to be the Gaussian.

This can be shown by employing the calculus of variations.
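A sketch of that variational argument: maximize the differential entropy subject to normalization and fixed first and second moments, introducing Lagrange multipliers $\lambda_0, \lambda_1, \lambda_2$ for the three constraints.

```latex
S[p] = -\int p(x)\,\ln p(x)\,dx \;\to\; \max
\quad\text{subject to}\quad
\int p\,dx = 1, \quad \int x\,p\,dx = \mu, \quad \int (x-\mu)^2\,p\,dx = \sigma^2

% stationarity of the Lagrangian with respect to p(x):
\frac{\delta}{\delta p}\left[S - \lambda_0 \!\int\! p - \lambda_1 \!\int\! x\,p - \lambda_2 \!\int\! (x-\mu)^2 p\right]
= -\ln p(x) - 1 - \lambda_0 - \lambda_1 x - \lambda_2 (x-\mu)^2 = 0

\Rightarrow\quad p(x) \propto e^{-\lambda_1 x - \lambda_2 (x-\mu)^2}
\quad\xrightarrow{\;\text{fix the } \lambda_i\;}\quad
p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}
```

The exponent is at most quadratic in $x$, so fixing the multipliers via the constraints yields exactly the normal distribution $N(\mu, \sigma^2)$.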

The central limit theorem is closely related to the law of large numbers, stating that the empirical mean $\frac{x_1 + \dots + x_n}{n}$ of identically distributed, independent random variables converges to $\mu$ for $n \to \infty$.

Similarly, the central limit theorem states that the distribution of the centered expression $\frac{(x_1 + \dots + x_n) - n\mu}{\sqrt{n}}$ depends only on $\mu$ and $\sigma^2$ for $n \to \infty$.

The normal distribution is just one example of a stable distribution: adding two Gaussians yields again a Gaussian.

If either $\mu$ or $\sigma^2$ (or both) are infinite, the central limit theorem breaks down. However, there may still be a limiting distribution that is not normal: these are generally called stable distributions.

P. Lévy, A. Khintchine, and others determined in the 1930s the specifics of when a stable distribution exists and what form it takes (see, for example, the book by Paul and Baschnagel).

Wolfgang Paul and Jörg Baschnagel, Stochastic Processes, 2nd edition, Springer, 2013

math.stackexchange.com/questions/12983/intuition-about-the-central-limit-theorem

Intuitive visualization: www.cantorsparadise.com/the-central-limit-theorem-why-is-it-so-2ae93edf6e8


The post What are Finite Difference Schemes? appeared first on Ben's Planet.

The method derives directly from the mathematical definition of a derivative. For example, if you start with the first-order derivative of some function $f(x)$ as the following limit:

$$

\frac{df}{dx}(x) = f^\prime(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}

$$

Then $f^\prime(x)$ can be approximated by the fraction on the right-hand side, choosing an appropriately small $h$.

$$

f^\prime(x) = \frac{1}{h} (f(x+h) - f(x)) + \mathcal{O}(h)

$$

Since we don’t take the limit, the difference $f(x+h) - f(x)$ between the function values remains finite, and therefore the approach is called the “Finite Difference Method.”

The example above is a so-called “forward finite difference” since one of the function values is advanced by $h$. The corresponding numerical finite difference scheme is called the “Euler method.” A variation of this is the “backward finite difference.”

$$

f^\prime(x) = \frac{1}{h} (f(x) - f(x-h)) + \mathcal{O}(h)

$$

Both schemes allow for solving differential equations rather quickly but might suffer from numerical instabilities (such as violation of energy conservation) and larger numerical errors, on the order of $\mathcal{O}(h)$.
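The first-order error is easy to observe numerically. In the following small Python sketch (test function and step sizes chosen arbitrarily), halving $h$ roughly halves the error of both one-sided differences:

```python
import numpy as np

f, x = np.sin, 1.0
exact = np.cos(x)  # the analytic derivative of sin

for h in (1e-2, 5e-3, 2.5e-3):
    fwd = (f(x + h) - f(x)) / h   # forward difference
    bwd = (f(x) - f(x - h)) / h   # backward difference
    print(f"h={h:.4f}  fwd error={abs(fwd - exact):.2e}  bwd error={abs(bwd - exact):.2e}")
```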

Therefore, in practice, slightly more advanced finite difference schemes are employed, such as the central difference scheme

$$

f^\prime(x) = \frac{1}{\tilde{h}} (f(x+\tilde{h}/2) - f(x-\tilde{h}/2)) + \mathcal{O}(\tilde{h}^2)

$$

or, by choosing $h = \frac{\tilde{h}}{2}$, we obtain

$$

f^\prime(x) = \frac{1}{2h} (f(x+h) - f(x-h)) + \mathcal{O}(h^2)

$$

Central differences, despite being more elaborate, have the great advantage of being symmetric in $\pm h$ and giving a more accurate approximation, with the numerical error being of order $\mathcal{O}(h^2)$.
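Again a quick numerical check (with the same kind of arbitrary test function): with central differences, halving $h$ reduces the error by roughly a factor of four, as expected for an $\mathcal{O}(h^2)$ scheme:

```python
import numpy as np

f, x = np.sin, 1.0
exact = np.cos(x)

# central-difference errors for two step sizes, the second half the first
errors = [abs((f(x + h) - f(x - h)) / (2 * h) - exact) for h in (1e-2, 5e-3)]
print(errors, errors[0] / errors[1])  # error ratio close to 4
```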

What about second-order derivatives? We can approximate those from our first-order approximation via

$$

\begin{aligned}

\frac{d^2f}{dx^2}(x) &= f^{\prime\prime}(x) = \frac{d}{dx}\frac{df}{dx} \approx \frac{d}{dx} \left( \frac{f(x+h) - f(x-h)}{2h} \right) \\

&\approx \frac{1}{2h} \left( \frac{f(x+2h) - f(x)}{2h} - \frac{f(x) - f(x-2h)}{2h} \right) \\ &= \frac{1}{(2h)^2}\left(f(x+2h) - 2f(x) + f(x-2h) \right)

\end{aligned}

$$

or, alternatively, by redefining $h$ as above,

$$

f^{\prime\prime}(x) = \frac{1}{h^2} \left(f(x+h) - 2f(x) + f(x-h) \right) + \mathcal{O}(h^2)

$$
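A one-line check of the three-point stencil (the test function is an arbitrary choice):

```python
import numpy as np

f, x, h = np.sin, 1.0, 1e-3
second = (f(x + h) - 2 * f(x) + f(x - h)) / h**2  # three-point stencil
print(second, -np.sin(x))  # the exact second derivative of sin is -sin
```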

In the same way, mixed derivatives are handled:

$$

\frac{\partial^2 f}{\partial x \partial y} \approx \frac{1}{2h} \left(\left(\frac{\partial f}{\partial y}\right)_{x+h,\, y} - \left(\frac{\partial f}{\partial y}\right)_{x-h,\, y}\right)

$$

with

$$

\left(\frac{\partial f}{\partial y}\right)_{x+h, y} \approx \frac{f(x+h, y+h) - f(x+h, y-h)}{2 h}

$$

$$

\left(\frac{\partial f}{\partial y}\right)_{x-h, y} \approx \frac{f(x-h, y+h) - f(x-h, y-h)}{2 h}

$$

finally resulting in

$$

\begin{aligned}

\frac{\partial^2 f}{\partial x \partial y} \approx & \frac{1}{4h^2} (f(x+h, y+h) - f(x+h, y-h) \\

& - f(x-h, y+h) + f(x-h, y-h))\end{aligned}

$$
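The four-point cross stencil can be checked on a separable test function (chosen arbitrarily), whose mixed derivative is known analytically:

```python
import numpy as np

def f(x, y):
    return np.sin(x) * np.cos(y)

x, y, h = 1.0, 0.5, 1e-3
# four-point stencil for the mixed second derivative
mixed = (f(x + h, y + h) - f(x + h, y - h)
         - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
exact = -np.cos(x) * np.sin(y)  # analytic d^2 f / dx dy
print(mixed, exact)
```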

Last but not least, let’s get a bit fancy and construct a central differences scheme in two dimensions for affine coordinates. These are described by a pair of covariant basis vectors $\vec{e}_1$, $\vec{e}_2$ forming the metric tensor

$$

\hat{g}_{i,j} = \vec{e}_i \cdot \vec{e}_j

$$

which governs all distances in affine coordinates. Its inverse, the dual metric tensor $\hat{g}^{i,j} = (\hat{g}_{i,j})^{-1}$, is employed for the actual coordinate transformation

$$

\begin{pmatrix} \xi^1 \\ \xi^2 \end{pmatrix} = \hat{g}^{\alpha\beta} \cdot \begin{pmatrix} x \\ y \end{pmatrix}

$$

The first-order derivative, given by the gradient in affine coordinates, is analogous to the one in Cartesian coordinates

$$

\nabla f(\xi_1, \xi_2) = \frac{1}{2 h} \begin{pmatrix} f(\xi_1+h,\xi_2) - f(\xi_1-h,\xi_2) \\ f(\xi_1,\xi_2+h) - f(\xi_1,\xi_2-h)\end{pmatrix}

$$

For the Laplacian, the dual metric tensor also needs to be taken into account:

$$

\begin{aligned}

\Delta f &= \nabla_i \nabla^i f = g^{i,j} \nabla_i \nabla_j f \\

&= g^{1,1} \frac{\partial^2 f}{\partial \xi_1 \partial \xi_1} +g^{1,2} \frac{\partial^2 f}{\partial \xi_1 \partial \xi_2} +g^{2,1} \frac{\partial^2 f}{\partial \xi_2 \partial \xi_1} +g^{2,2} \frac{\partial^2 f}{\partial \xi_2 \partial \xi_2}

\end{aligned}

$$

resulting in

$$

\begin{aligned}

h^2 \cdot \Delta f &= g^{1,1} (f(\xi_1+h, \xi_2) - 2 f(\xi_1, \xi_2) + f(\xi_1-h, \xi_2)) \\

& + g^{2,2} (f(\xi_1, \xi_2+h) - 2 f(\xi_1, \xi_2) + f(\xi_1, \xi_2-h)) \\

& + \frac{g^{1,2}}{2} (f(\xi_1+h, \xi_2+h) - f(\xi_1+h, \xi_2-h) \\

& - f(\xi_1-h, \xi_2+h) + f(\xi_1-h, \xi_2-h))\end{aligned}

$$

where we have used $g^{1,2} = g^{2,1}$.
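The affine stencil can be verified on a quadratic test function, for which the scheme is exact up to rounding. The metric components below are arbitrary example values with $g^{1,2} = g^{2,1}$:

```python
g11, g22, g12 = 1.0, 1.0, 0.3  # example dual metric components, g12 = g21

def f(u, v):
    # a quadratic test function with d11 f = 2, d12 f = 1, d22 f = 0
    return u**2 + u * v

u, v, h = 0.7, -0.2, 1e-2
lap = (g11 * (f(u + h, v) - 2 * f(u, v) + f(u - h, v))
       + g22 * (f(u, v + h) - 2 * f(u, v) + f(u, v - h))
       + 0.5 * g12 * (f(u + h, v + h) - f(u + h, v - h)
                      - f(u - h, v + h) + f(u - h, v - h))) / h**2
exact = 2 * g11 + 2 * g12  # g^{ij} d_i d_j f for this quadratic
print(lap, exact)
```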



The post What means Quantum? appeared first on Ben's Planet.

Actually, quantization might refer to different yet related concepts, depending on whom you ask and which level of understanding you assume. So we’ll start with the most tangible one, the notion of physical quanta as particles. Next, we continue at a slightly more abstract level with the quantization of physical quantities such as energy, and finally, we’ll have a glimpse at the process of canonical quantization. Let’s dive in!

Quantum physics is the science of atomic and sub-atomic particles at about the length scale of one-millionth of a hair’s diameter. While some of these particles are formed by smaller, more fundamental particles, like a proton is made up of quarks, some of these particles are – to the best of our current knowledge – non-divisible and thus constitute the quanta of matter: the smallest, indivisible building blocks of our universe, such as quarks, electrons or photons.

On this level, quantization means that matter and light only come in certain portions and that larger portions are always a multiple of these single portions. Think about a laser beam consisting of $n$ photons. From Einstein’s explanation of the photoelectric effect, we know that a single photon has an energy $E$ proportional to its frequency $f$ by $E = h \cdot f$. And the entire beam will have an energy of $E_{\mathrm{tot}} = n \cdot h \cdot f$.

So the energy of this laser beam is quantized, as only multiples of a single photon’s energy will occur, never one-half of it and neither twenty-three thirds. It is basically the same as the quantization of paper money: there is a banknote for one dollar, five dollars, or ten dollars, but there are no three-dollar banknotes.
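To put numbers on this, here is a tiny Python example; the wavelength and pulse energy are illustrative values, not taken from the text:

```python
h = 6.626e-34        # Planck constant in J*s
c = 2.998e8          # speed of light in m/s

wavelength = 532e-9  # a green laser, for example
f = c / wavelength   # photon frequency
E_photon = h * f     # Einstein's relation E = h*f

pulse_energy = 1e-3  # a 1 mJ laser pulse
n_photons = pulse_energy / E_photon
print(E_photon, n_photons)  # about 3.7e-19 J per photon, about 2.7e15 photons
```

The pulse always contains a whole number of photons, so its total energy is an integer multiple of `E_photon`, which is exactly the quantization described above.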

We are now going one step further and assume you have heard about the Schrödinger equation:

$$

\hat{H} \, \psi(x) = E \cdot \psi(x)

$$

In the previous section, the energy of that laser beam was quantized because the beam itself consisted of a discrete number of particles. Yet the energy of a single photon could take any value, depending only on its frequency. That was the case because these photons were free particles, i.e., they were free to float around in space, going anywhere.

But what if we now consider a particle of mass $m$ being subject to a potential $V(\vec{r})$, e.g., placed inside a finite potential well? If the particle’s energy is greater than the potential well’s depth $V_0$, we get scattering states, and the particle’s energy still can assume any value: we get a continuous energy spectrum.

The whole story changes if our particle has less energy and is bound by the potential. To figure out the particle’s energy, you would a) solve Schrödinger’s equation:

$$

\hat{H} \, \psi = \left( \frac{-\hbar^2}{2m} \frac{\partial^2}{\partial x^2} + V \right) \psi(x) = E \cdot \psi(x)

$$

and b) take into account the boundary conditions the particle’s wave function needs to fulfill when hitting the well’s boundary. But the point is that these boundary conditions can only be fulfilled for certain discrete energies.

The example gets more memorable once you consider an infinite potential well, where the particle’s wave function has to assume exactly zero value at the well’s boundary and beyond. (That well is infinitely high, so the particle can’t be at the well’s boundary or even go beyond – quantum tunneling is only possible for potential barriers of finite height and width.)

The boundary conditions for the wave function are analogous to the string of a guitar being fixed at both ends and thus having standing waves as its eigenmodes. Only modes that leave both ends fixed comply with the boundary conditions and thus occur as eigenmodes or particle wave functions. And therefore, only the energy of those modes will pop up in the energy spectrum, forming a set of discrete energy values.
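For the infinite well, this standing-wave condition leads to the textbook result $E_n = \frac{n^2 \pi^2 \hbar^2}{2 m L^2}$. A short Python sketch, with an illustrative 1 nm well for an electron:

```python
import numpy as np

hbar = 1.0545718e-34  # reduced Planck constant in J*s
m = 9.109e-31         # electron mass in kg
L = 1e-9              # well width: 1 nm (illustrative)

n = np.arange(1, 5)
E = n**2 * np.pi**2 * hbar**2 / (2 * m * L**2)  # discrete energy levels
print(E / 1.602e-19)  # in eV; the levels grow as n^2
```

Note the discreteness: between $E_1$ and $E_2 = 4 E_1$ there are simply no allowed energies.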

A similar story can be told about the energy levels of a hydrogen atom: In the simplest, semi-classical picture, an ‘electron wave’ with de Broglie wavelength $\lambda = \frac{h}{p}$ orbits the central proton. As the electron wave goes around, it eventually reaches the starting point of its motion and begins to interfere with itself. Thus, only those electron waves survive that form standing waves, i.e., whose orbits cover a multiple of the de Broglie wavelength. And only the energies of those electron waves show up as discrete levels in the energy spectrum.

So it’s all about boundary conditions. If a potential constrains a particle’s motion, it will feature discrete energy levels. Besides energy, other quantities such as spin, angular momentum, or parity can also be quantized. It is characteristic of quantum mechanics that, under some circumstances, physical quantities assume only discrete values or are multiples of some smallest, indivisible quantum portion.

Finally, we are going one more step further and assume you have heard about quantum mechanical operators, which we will label by a hat, like $\hat{p}$ for the momentum operator. In quantum mechanics, physical observables such as position, momentum, or energy are described by Hermitian operators. Their eigenvalues constitute the possible measurement results of the respective observable, and the absolute value squared of a state’s projection onto an eigenstate tells us the likelihood that the corresponding eigenvalue is measured.

On an abstract level, quantization now refers to the process of canonical quantization, i.e., making the transition from classical physics to quantum physics. Or, more specifically, constructing a quantum (field) theory out of a classical theory by replacing classical variables like position $x$ or momentum $p$ with quantum mechanical operators $\hat{x}$ or $\hat{p}$, while keeping the formal structure of the theory. So what does ‘keeping the formal structure’ mean?

In classical physics, a system is governed by a Hamiltonian $H(q,p)$, which depends on the position $q$ and the canonical momentum $p$. The relation between these two quantities is manifested by the so-called Poisson bracket

$$

\left\{ A, B \right\} = \frac{\partial A}{\partial q} \frac{\partial B}{\partial p} - \frac{\partial A}{\partial p} \frac{\partial B}{\partial q}

$$

capturing the canonical (also called symplectic) structure of the theory by $\{q, p\} = 1$. ‘Preserving the structure’ now means that, in analogy to the Poisson bracket, one introduces the so-called commutator for quantum operators $\hat{A}, \hat{B}$

$$

\left[ \hat{A}, \hat{B} \right] = \hat{A} \hat{B} – \hat{B} \hat{A}

$$

which checks if you are allowed to interchange these two operators. If so, i.e. $\hat{A} \hat{B} = \hat{B} \hat{A}$, the commutator assumes a value of zero. This has major implications for the relation between the two operators $\hat{A}, \hat{B}$ representing physical observables, for example that measuring the value of $\hat{A}$ does not influence the measurement of the value of $\hat{B}$.

Vice versa, a non-zero commutator implies that these two operators may not be interchanged, which is the case, for example, for the position-momentum-commutator

$$

\left[ \hat{x}, \hat{p} \right] = i \hbar

$$

This tells us that position and momentum of a particle cannot be measured exactly simultaneously (the measurement of either influences the measurement of the other) as expressed by Heisenberg’s uncertainty relation.
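The canonical commutator can even be checked numerically by representing $\hat{x}$ and $\hat{p}$ as matrices on a position grid, with $\hat{p} = -i\hbar\, d/dx$ discretized by central differences. The grid parameters and test function are arbitrary, and we work in units where $\hbar = 1$:

```python
import numpy as np

hbar = 1.0                           # units with hbar = 1
N, h = 400, 0.05
xg = (np.arange(N) - N // 2) * h     # position grid

X = np.diag(xg)                                  # position operator
D = (np.diag(np.ones(N - 1), 1)
     - np.diag(np.ones(N - 1), -1)) / (2 * h)    # central-difference d/dx
P = -1j * hbar * D                               # momentum operator

psi = np.exp(-xg**2)                 # a Gaussian test wave function
lhs = (X @ P - P @ X) @ psi          # [x, p] applied to psi
rhs = 1j * hbar * psi                # expected i*hbar*psi

interior = slice(5, N - 5)           # stay away from the grid boundary
print(np.max(np.abs(lhs[interior] - rhs[interior])))  # small residual
```

On a finite grid the identity holds only approximately and only away from the boundary, but the residual shrinks as $h^2$, mirroring the exact operator relation $[\hat{x}, \hat{p}] = i\hbar$.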

To cut a long story short: Canonical quantization is the process of going from classical physics to quantum physics by replacing classical variables (i.e., numbers) with operators (i.e., linear mappings) and Poisson brackets with commutators:

$$

x \to \hat{x}, \quad p \to \hat{p}

$$

$$

\{x,p\} = 1 \to \frac{1}{i \hbar}[\hat{x},\hat{p}] = 1

$$

This preserves the familiar canonical structure of classical mechanics while allowing for all those genuinely quantum-mechanical effects. However, this mapping is not unique in the sense that not all combinations of $x$ and $p$ can be mapped exactly to their quantum analogs (problems arise for polynomials of degree four and higher), and also the way this mapping is performed in detail can be chosen according to different ‘quantization schemes.’ While quantization usually refers to canonical quantization, there are also alternative approaches such as path integral quantization and more exotic ones.

Quantization might refer to something coming in fundamental, indivisible portions, like a particle or a discrete energy spectrum, or the process of going from classical physics to quantum physics called canonical quantization.


