# The Azimuth Project Blog - network theory (part 5)

This page is a blog article in progress, written by John Baez. To see the final polished article, go to the Azimuth Blog.

Last time we saw clues that stochastic Petri nets are a lot like quantum field theory, but with probabilities replacing amplitudes. There’s a powerful analogy at work here, which can help us a lot. So, this time I want to make that analogy precise.

But first, let me quickly sketch why it could be worthwhile.

#### A Poisson process

Consider this stochastic Petri net with rate constant $r$:

It describes rabbits roaming around randomly and getting caught by a trap. In any short time $\Delta t$ there’s a chance of about $r \Delta t$ of a rabbit getting caught. There’s also a chance of two or more rabbits getting caught, but this becomes negligible by comparison as $\Delta t \to 0$. Moreover, the chance of a rabbit getting caught during this interval of time is independent of what happens before or afterwards. This sort of process is called a Poisson process.

Problem. - Suppose we start out knowing for sure there are no rabbits in our trap. What’s the probability of having caught $n$ rabbits at time $t$?

At any time there will be some probability of having caught $n$ rabbits; let’s call this probability $\psi_n(t)$, or $\psi_n$ for short. Borrowing a trick from quantum theory, we can summarize all these probabilities in a single power series:

$\psi = \sum_{n=0}^\infty \psi_n \, x^n$

In quantum theory we use this trick when talking about collections of photons rather than rabbits, but then the numbers $\psi_n$ are complex ‘amplitudes’ rather than probabilities. Let’s use this trick to write down and solve the master equation

$\frac{d}{d t} \psi(t) = H \psi(t)$

describing how the probability of having caught any given number of rabbits changes with time.

What’s the operator $H$? Well, in quantum theory we describe the creation of photons using a certain operator on power series called the creation operator:

$a^\dagger \psi = x \psi$

We can try to apply this to our rabbits. If at some time we’re 100% sure we have $n$ rabbits, we have

$\psi = x^n$

so applying the creation operator gives

$a^\dagger \psi = x^{n+1}$

One more rabbit! That’s good. So, an obvious wild guess is

$H = r a^\dagger$

where $r$ is the rate at which we’re catching rabbits. Let’s see how well this guess works.

If you know how to exponentiate operators, you know to solve this equation:

$\frac{d}{d t} \psi(t) = H \psi(t)$

It’s easy:

$\psi(t) = \mathrm{exp}(t H) \psi(0)$

Since we start out knowing there are no rabbits in our trap, we have

$\psi(0) = 1$

so with our guess for $H$ we get

$\psi(t) = \mathrm{exp}(r t a^\dagger) 1$

But $a^\dagger$ is the operator of multiplication by $x$, so $\mathrm{exp}(r t a^\dagger)$ is multiplication by $e^{t r x}$, and

$\psi(t) = e^{r t x} = \sum_{n = 0}^\infty \frac{(r t)^n}{n!} \, x^n$

So, if our guess is right, the probability of having caught $n$ rabbits at time $t$ is

$\frac{(r t)^n}{n!}$

Unfortunately, this can’t be right, because these probabilities don’t sum to 1! Instead their sum is

$\sum_{n=0}^\infty \frac{(r t)^n}{n!} = e^{r t}$

We can try to wriggle out of the mess we’re in by dividing our answer by this fudge factor. It sounds like a desperate measure, but we’ve got to try something!

This amounts to guessing that the probability of having caught $n$ rabbits by time $t$ is

$\frac{(r t)^n}{n!} \, e^{-r t }$

And this is right! This is called the Poisson distribution: it’s famous for being precisely the answer to the problem we’re facing.

So on the one hand our wild guess about $H$ was wrong, but on the other hand it was not so far off. We can fix it as follows:

$H = r (a^\dagger - 1)$

The extra $-1$ gives us the fudge factor we need.

So, a wild guess corrected by an ad hoc procedure seems to have worked! But what’s really going on?

What’s really going on is that $a^\dagger$, or any multiple of this, is not a legitimate Hamiltonian for a master equation: if we define a time evolution operator $\exp(t H)$ using a Hamiltonian like this, probabilities won’t sum to 1. But $a^\dagger - 1$ is okay. So, we need to think about which Hamiltonians are okay.

In quantum theory, self-adjoint Hamiltonians are okay. But in probability theory, we need some other kind of Hamiltonian. Let’s figure it out.

#### Probability versus quantum theory

Suppose we have a system of any kind: physical, chemical, biological, economic, whatever. The system can be in different states. In the simplest sort of model, we say there’s some set $X$ of states, and say that at any moment in time the system is definitely in one of these states. But I want to compare two other options:

• In a probabilistic model, we may instead say that the system has a probability $\psi(x)$ of being in any state $x \in X$. These probabilities are nonnegative real numbers with

$\sum_{x \in X} \psi(x) = 1$

• In a quantum model, we may instead say that the system has an amplitude $\psi(x)$ of being in any state $x \in X$. These amplitudes are complex numbers with

$\sum_{x \in X} | \psi(x) |^2 = 1$

Probabilities and amplitudes are similar yet strangely different. Of course given an amplitude we can get a probability by taking its absolute value and squaring it. This is a vital bridge from quantum theory to probability theory. Today, however, I don’t want to focus on the bridges, but rather the parallels between these theories.

We often want to replace the sums above by integrals. For that we need to replace our set $X$ by a measure space, which is a set equipped with enough structure that you can integrate real or complex functions defined on it. Well, at least you can integrate so-called ‘integrable’ functions—but I’ll neglect all issues of analytical rigor here. Then:

• In a probabilistic model, the system has a probability distribution $\psi : X \to \mathbb{R}$, which obeys $\psi \ge 0$ and

$\int_X \psi(x) \, d x = 1$

• In a quantum model, the system has a wavefunction $\psi : X \to \mathbb{C}$, which obeys

$\int_X | \psi(x) |^2 \, d x= 1$

In probability theory, we integrate $\psi$ over a set $S \subset X$ to find out the probability that our systems state is in this set. In quantum theory we integrate $|\psi|^2$ over the set to answer the same question.

We don’t need to think about sums over sets and integrals over measure spaces separately: there’s a way to make any set $X$ into a measure space such that by definition,

$\int_X \psi(x) \, d x = \sum_{x \in X} \psi(x)$

In short, integrals are more general than sums! So, I’ll mainly talk about integrals, until the very end.

In probability theory, we want our probability distributions to be vectors in some vector space. Ditto for wave functions in quantum theory! So, we make up some vector spaces:

• In probability theory, the probability distribution $\psi$ is a vector in the space

$L^1(X) = \{ \psi: X \to \mathbb{C} \; : \; \int_X |\psi(x)| \, d x < \infty \}$

• In quantum theory, the wavefunction $\psi$ is a vector in the space

$L^2(X) = \{ \psi: X \to \mathbb{C} \; : \; \int_X |\psi(x)|^2 \, d x < \infty \}$

You may wonder why I defined $L^1(X)$ to consist of complex functions when probability distributions are real. I’m just struggling to make the analogy seem as strong as possible. In fact probability distributions are not just real but nonnegative. We need to say this somewhere… but we can, if we like, start by saying they’re complex-valued functions, but then whisper that they must in fact be nonnegative (and thus real). It’s not the most elegant solution, but that’s what I’ll do for now.

Now:

• The main thing we can do with elements of $L^1(X)$, besides what we can do with vectors in any vector space, is integrate one. This gives a linear map:

$\int : L^1(X) \to \mathbb{C}$

• The main thing we can with elements of $L^2(X)$, besides the besides the things we can do with vectors in any vector space, is take the inner product of two:

$\langle \psi, \phi \rangle = \int_X \overline{\psi}(x) \phi(x) \, d x$

This gives a map that’s linear in one slot and conjugate-linear in the other:

$\langle - , - \rangle : L^1(X) \to \mathbb{C}$

First came probability theory with $L^1(X)$; then came quantum theory with $L^2(X)$. Naive extrapolation would say it’s about time for someone to invent an even more bizarre theory of reality based on $L^3(X).$ In this, you’d have to integrate the product of three wavefunctions to get a number! The math of Lp spaces is already well-developed, so give it a try if you want. I’ll stick to $L^1$ and $L^2$ today.

#### Stochastic versus unitary operators

Now let’s think about time evolution:

• In probability theory, the passage of time is described by a map sending probability distributions to probability distributions. This is described using a stochastic operator

$U : L^1(X) \to L^1(X)$

meaning a linear operator such that

$\int U \psi = \int \psi$

and

$\psi \ge 0 \quad \implies \quad U \psi \ge 0$

• In quantum theory the passage of time is described by a map sending wavefunction to wavefunctions. This is described using an isometry

$U : L^2(X) \to L^2(X)$

meaning a linear operator such that

$\langle U \psi , U \phi \rangle = \langle \psi , \phi \rangle$

In quantum theory we usually want time evolution to be reversible, so we focus on isometries that have inverses: these are called unitary operators. In probability theory we often consider stochastic operators that are not invertible.

#### Infinitesimal stochastic versus self-adjoint operators

Sometimes it’s nice to think of time coming in discrete steps. But in theories where we treat time as continuous, to describe time evolution we usually need to solve a differential equation. This is true in both probability theory and quantum theory:

• In probability theory we often describe time evolution using a differential equation called the master equation:

$\frac{d}{d t} \psi(t) = H \psi(t)$

whose solution is

$\psi(t) = \exp(t H)\psi(0)$

• In quantum theory we often describe time evolution using a differential equation called Schrödinger’s equation:

$i \frac{d}{d t} \psi(t) = H \psi(t)$

whose solution is

$\psi(t) = \exp(-i t H)\psi(0)$

In fact the appearance of $i$ in the quantum case is purely conventional; we could drop it to make the analogy better, but then we’d have to work with ‘skew-adjoint’ operators instead of self-adjoint ones in what follows.

Let’s guess what properties an operator $H$ should have to make $\exp(-i t H)$ unitary for all $t$. We start by assuming it’s an isometry:

$\langle \exp(-i t H) \psi, \exp(-i t H) \phi \rangle = \langle \psi, \phi \rangle$

Then we differentiate this with respect to $t$ and set $t = 0$, getting

$\langle -i H \psi, \phi \rangle + \langle \psi, -i H \phi \rangle = 0$

or in other words

$\langle H \psi, \phi \rangle = \langle \psi, H \phi \rangle$

Physicists call an operator obeying this condition self-adjoint. Mathematicians know there’s more to it, but today is not the day to discuss such subtleties, intriguing though they are. All that matters now is that there is, indeed, a correspondence between self-adjoint operators and well-behaved ‘one-parameter unitary groups’ $\exp(-i t H)$. This is called Stone’s Theorem.

But now let’s copy this argument to guess what properties an operator $H$ must have to make $\exp(t H)$ stochastic. We start by assuming $\exp(t H)$ is stochastic, so

$\int \exp(t H) \psi = \int \psi$

and

$\psi \ge 0 \quad \implies \quad \exp(t H) \psi \ge 0$

We can differentiate the first equation with respect to $t$ and set $t = 0$, getting

$\int H \psi = 0$

for all $\psi$.

But what about the second condition,

$\psi \ge 0 \quad \implies \quad \exp(t H) \psi \ge 0?$

It seems easier to deal with this in the special case when integrals over $X$ reduce to sums. So let’s suppose that happens… and let’s start by seeing what the first condition says in this case.

In this case, $L^1(X)$ has a basis of ‘Kronecker delta functions’: The Kronecker delta function $\delta_i$ vanishes everywhere except at one point $i \in X$, where it equals 1. Using this basis, we can write any operator on $L^1(X)$ as a matrix.

As a warmup, let’s see what it means for an operator

$U: L^1(X) \to L^1(X)$

to be stochastic in this case. We’ll take the conditions

$\int U \psi = \int \psi$

and

$\psi \ge 0 \quad \implies \quad U \psi \ge 0$

and rewrite them using matrices. For both, it’s enough to consider the case where $\psi$is a Kronecker delta, say $\delta_j$.

In these terms, the first condition says

$\sum_{i \in X} U_{i j} = 1$

for each column $j$. The second says

$U_{i j} \ge 0$

for all $i, j$.

So, in this case, a stochastic operator is just a square matrix where each column sums to 1 and all the entries are nonnegative. (Such matrices are often called left stochastic.)

Next, let’s see what we need for an operator $H$to have the property that $\exp(t H)$is stochastic for all $t \ge 0$. It’s enough to assume $t$ is very small, which lets us use the approximation

$\exp(t H) = 1 + t H + \cdots$

and work to first order in $t$. Saying that each column of this matrix sums to 1 then amounts to

$\sum_{i \in X} \delta_{i j} + t H_{i j} + \cdots = 1$

which requires

$\sum_{i \in X} H_{i j} = 0$

Saying that each entry is nonnegative amounts to

$\delta_{i j} + t H_{i j} + \cdots \ge 0$

When $i = j$ this will be automatic when $t$ is small enough, so the meat of this condition is

$H_{i j} \ge 0 \; \mathrm{if} \; i \ne j$

So, let’s say a matrix $H$ is infinitesimal stochastic if its columns sum to zero and its off-diagonal entries are nonnegative.

I don’t love this terminology: do you know a better one? There should be some standard term. People here say they’ve seen the term ‘stochastic Hamiltonian’, but then a stochastic Hamiltonian is not a stochastic operator, which makes me nervous. The idea behind my term is that any infintesimal stochastic operator should be the infinitesimal generator of a stochastic process.

In other words, when we get the details straightened out, any 1-parameter family of stochastic operators

$U(t) : L^1(X) \to L^1(X) \qquad t \ge 0$

obeying

$U(0) = I$
$U(t) U(s) = U(t+s)$

and continuity:

$t_i \to t \quad \implies \quad U(t_i) \psi \to U(t)\psi$

should be of the form

$U(t) = \exp(t H)$

for a unique ‘infinitesimal stochastic operator’ $H$.

When $X$ is a finite set, this is true—and an infinitesimal stochastic operator is just a square matrix whose columns sum to zero and whose off-diagonal entries are nonzero. But do you know a theorem characterizing infinitesimal stochastic operators for general measure spaces $X$? Someone must have worked it out.

Luckily, for our work on stochastic Petri nets, we only need to understand the case where $X$ is a countable set and our integrals are really just sums. This should be almost like the case where $X$ is a finite set—but we’ll need to take care that all our sums converge.

#### The moral

Now we can see why a Hamiltonian like $a^\dagger$ is no good, while $a^\dagger - 1$ is good. (I’ll ignore the rate constant $r$ since it’s irrelevant here.) The first one is not infinitesimal stochastic, while the second one is!

In this example, our set of states is the natural numbers:

$X = \mathbb{N}$

The probability distribution

$\psi : \mathbb{N} \to \mathbb{C}$

says the probability of having any number of rabbits in the trap.

The creation operator is not infinitesimal stochastic: in fact, it’s stochastic! Why? Well, when we apply the creation operator, what was the probability of having $n$ rabbits now becomes the probability of having $n+1$ rabbits. So, the probabilities remain nonnegative, and their sum over all $n$ is unchanged. Those two conditions are all we need for a stochastic operator.

To see this in a more complicated way, suppose we reinterpret $\psi$ as a power series, using our trick from quantum field theory:

$\psi = \sum_{n = 0}^\infty \psi_n x^n$

Then the total probability, which we’re abstractly calling $\int \psi$, is just the sum of the probabilities $\psi_n$:

$\int \psi = \sum_{n=0}^\infty \psi_n$

On the other hand

$a^\dagger \psi = \sum_{n = 0}^\infty \psi_n x^{n+1}$

so the creation operator is stochastic:

$\int a^\dagger \psi = \int \psi$

and

$\psi \ge 0 \; \implies \; a^\dagger \psi \ge 0$

Precisely by virtue of being stochastic, the creation operator fails to be infinitesimally stochastic:

$\int a^\dagger \psi \ne 0$

So, it’s a bad Hamiltonian for our stochastic Petri net.

On the other hand, $a^\dagger - 1$ is infinitesimally stochastic. Its off-diagonal entries are nonnegative, and

$\int (a^\dagger - 1) \psi = 0$

precisely because

$\int a^\dagger \psi = \int \psi$

You may be thinking: all this fancy math just to understand a single stochastic Petri net, the simplest one of all! But next time I’ll explain a general recipe which will let you write down the Hamiltonian for any stochastic Petri net. All the lessons we’ve so painfully learned today will make this much easier. And pondering the analogy between probability theory and quantum theory will also be good for our bigger project of unifying the applications of network diagrams in different subjects.

category: blog