Blog - network theory (part 5)

This page is a blog article in progress, written by John Baez. To see the final polished article, go to the Azimuth Blog.

Last time we saw clues that stochastic Petri nets are a lot like quantum field theory, but with probabilities replacing amplitudes. There’s a powerful analogy at work here, which can help us a lot. So, this time I want to make that analogy precise.

But first, let me quickly sketch why it could be worthwhile.

A Poisson process

Consider this stochastic Petri net with rate constant $r$:

It describes rabbits roaming around randomly and getting caught by a trap. In any short time $\Delta t$ there’s a chance of about $r \Delta t$ of a rabbit getting caught. There’s also a chance of two or more rabbits getting caught, but this becomes negligible by comparison as $\Delta t \to 0$. Moreover, the chance of a rabbit getting caught during this interval of time is independent of what happens before or afterwards. This sort of process is called a Poisson process.
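If you like to experiment, here’s a quick way to get a feel for this process by simulation. This is a rough sketch in Python (the rate, time horizon, and step size are just illustrative choices of mine): chop time into short steps, and catch a rabbit in each step with probability about $r \Delta t$.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

r, t, dt = 2.0, 1.5, 0.01      # rate constant, final time, short time step
trials = 50_000                # number of independent runs of the process

steps = int(t / dt)
# In each interval of length dt, a rabbit is caught with probability ~ r*dt.
caught = rng.random((trials, steps)) < r * dt
counts = caught.sum(axis=1)    # total rabbits caught by time t, in each run

for n in range(6):
    print(n, (counts == n).mean())   # empirical P(n rabbits caught by time t)
```

We’ll derive the exact answer below; the frequencies printed here should approach it as $\Delta t \to 0$ and the number of runs grows.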

Problem. Suppose we start out knowing for sure there are no rabbits in our trap. What’s the probability of having caught $n$ rabbits at time $t$?

At any time there will be some probability of having caught $n$ rabbits; let’s call this probability $\psi_n(t)$, or $\psi_n$ for short. Borrowing a trick from quantum theory, we can summarize all these probabilities in a single power series:

$$\psi = \sum_{n=0}^\infty \psi_n \, x^n$$

In quantum theory we use this trick when talking about collections of photons rather than rabbits, but then the numbers $\psi_n$ are complex ‘amplitudes’ rather than probabilities. Let’s use this trick to write down and solve the master equation

$$\frac{d}{d t} \psi(t) = H \psi(t)$$

describing how the probability of having caught any given number of rabbits changes with time.

What’s the operator $H$? Well, in quantum theory we describe the creation of photons using a certain operator on power series called the creation operator:

$$a^\dagger \psi = x \psi$$

We can try to apply this to our rabbits. If at some time we’re 100% sure we have $n$ rabbits, we have

$$\psi = x^n$$

so applying the creation operator gives

$$a^\dagger \psi = x^{n+1}$$

One more rabbit! That’s good. So, an obvious wild guess is

$$H = r a^\dagger$$

where $r$ is the rate at which we’re catching rabbits. Let’s see how well this guess works.

If you know how to exponentiate operators, you know how to solve this equation:

$$\frac{d}{d t} \psi(t) = H \psi(t)$$

It’s easy:

$$\psi(t) = \exp(t H) \psi(0)$$

Since we start out knowing there are no rabbits in our trap, we have

$$\psi(0) = 1$$

so with our guess for $H$ we get

$$\psi(t) = \exp(r t a^\dagger) \, 1$$

But $a^\dagger$ is the operator of multiplication by $x$, so $\exp(r t a^\dagger)$ is multiplication by $e^{r t x}$, and

$$\psi(t) = e^{r t x} = \sum_{n = 0}^\infty \frac{(r t)^n}{n!} \, x^n$$

So, if our guess is right, the probability of having caught $n$ rabbits at time $t$ is

$$\frac{(r t)^n}{n!}$$

Unfortunately, this can’t be right, because these probabilities don’t sum to 1! Instead their sum is

$$\sum_{n=0}^\infty \frac{(r t)^n}{n!} = e^{r t}$$

We can try to wriggle out of the mess we’re in by dividing our answer by this fudge factor. It sounds like a desperate measure, but we’ve got to try something!

This amounts to guessing that the probability of having caught $n$ rabbits by time $t$ is

$$\frac{(r t)^n}{n!} \, e^{-r t}$$

And this is right! This is called the Poisson distribution: it’s famous for being precisely the answer to the problem we’re facing.

So on the one hand our wild guess about $H$ was wrong, but on the other hand it was not so far off. We can fix it as follows:

$$H = r (a^\dagger - 1)$$

The extra $-1$ gives us the fudge factor we need.
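As a sanity check, we can test this numerically by truncating the creation operator to a finite matrix acting on power series of degree at most $N$. Here’s a sketch (the cutoff $N$, the rate, and the time are my own illustrative choices; the truncation loses a little probability near degree $N$, but for $r t$ small compared to $N$ the effect is negligible):

```python
import numpy as np
from scipy.linalg import expm
from scipy.stats import poisson

N = 50                               # truncate power series at degree N
# Creation operator = multiplication by x: shifts coefficient of x^n to x^(n+1).
a_dag = np.diag(np.ones(N), -1)      # (N+1) x (N+1) shift matrix

r, t = 2.0, 1.5
H = r * (a_dag - np.eye(N + 1))      # H = r(a† - 1)

psi0 = np.zeros(N + 1)
psi0[0] = 1.0                        # no rabbits at time zero: ψ(0) = 1 = x^0

psi_t = expm(t * H) @ psi0
print(np.allclose(psi_t[:10], poisson.pmf(np.arange(10), r * t)))  # True
```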

So, a wild guess corrected by an ad hoc procedure seems to have worked! But what’s really going on?

What’s really going on is that $a^\dagger$, or any multiple of this, is not a legitimate Hamiltonian for a master equation: if we define a time evolution operator $\exp(t H)$ using a Hamiltonian like this, probabilities won’t sum to 1. But $a^\dagger - 1$ is okay. So, we need to think about which Hamiltonians are okay.

In quantum theory, self-adjoint Hamiltonians are okay. But in probability theory, we need some other kind of Hamiltonian. Let’s figure it out.

Probability versus quantum theory

Suppose we have a system of any kind: physical, chemical, biological, economic, whatever. The system can be in different states. In the simplest sort of model, we say there’s some set $X$ of states, and say that at any moment in time the system is definitely in one of these states. But I want to compare two other options:

• In a probabilistic model, we may instead say that the system has a probability $\psi(x)$ of being in any state $x \in X$. These probabilities are nonnegative real numbers with

$$\sum_{x \in X} \psi(x) = 1$$

• In a quantum model, we may instead say that the system has an amplitude $\psi(x)$ of being in any state $x \in X$. These amplitudes are complex numbers with

$$\sum_{x \in X} | \psi(x) |^2 = 1$$

Probabilities and amplitudes are similar yet strangely different. Of course, given an amplitude we can get a probability by taking its absolute value and squaring it. This is a vital bridge from quantum theory to probability theory. Today, however, I don’t want to focus on the bridges, but rather on the parallels between these theories.
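To make the parallel vivid, here’s a tiny sketch (with made-up numbers) contrasting the two normalizations applied to the same list of raw weights:

```python
import numpy as np

w = np.array([3.0, 1.0, 4.0, 1.0, 5.0])       # arbitrary nonnegative weights

prob = w / w.sum()                             # probabilities: entries sum to 1
amp = w / np.sqrt(np.sum(np.abs(w) ** 2))      # amplitudes: squares sum to 1

print(prob.sum())                # 1.0
print(np.sum(np.abs(amp) ** 2))  # 1.0
```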

We often want to replace the sums above by integrals. For that we need to replace our set $X$ by a measure space, which is a set equipped with enough structure that you can integrate real or complex functions defined on it. Well, at least you can integrate so-called ‘integrable’ functions—but I’ll neglect all issues of analytical rigor here. Then:

• In a probabilistic model, the system has a probability distribution $\psi : X \to \mathbb{R}$, which obeys $\psi \ge 0$ and

$$\int_X \psi(x) \, d x = 1$$

• In a quantum model, the system has a wavefunction $\psi : X \to \mathbb{C}$, which obeys

$$\int_X | \psi(x) |^2 \, d x = 1$$

In probability theory, we integrate $\psi$ over a set $S \subset X$ to find out the probability that our system’s state is in this set. In quantum theory we integrate $|\psi|^2$ over the set to answer the same question.

We don’t need to think about sums over sets and integrals over measure spaces separately: there’s a way to make any set $X$ into a measure space (using ‘counting measure’) such that, by definition,

$$\int_X \psi(x) \, d x = \sum_{x \in X} \psi(x)$$

In short, integrals are more general than sums! So, I’ll mainly talk about integrals, until the very end.

In probability theory, we want our probability distributions to be vectors in some vector space. Ditto for wavefunctions in quantum theory! So, we make up some vector spaces:

• In probability theory, the probability distribution $\psi$ is a vector in the space

$$L^1(X) = \{ \psi : X \to \mathbb{C} \; : \; \int_X |\psi(x)| \, d x < \infty \}$$

• In quantum theory, the wavefunction $\psi$ is a vector in the space

$$L^2(X) = \{ \psi : X \to \mathbb{C} \; : \; \int_X |\psi(x)|^2 \, d x < \infty \}$$

You may wonder why I defined $L^1(X)$ to consist of complex functions when probability distributions are real. I’m just struggling to make the analogy seem as strong as possible. In fact probability distributions are not just real but nonnegative. We need to say this somewhere… but we can, if we like, start by saying they’re complex-valued functions, and then whisper that they must in fact be nonnegative (and thus real). It’s not the most elegant solution, but that’s what I’ll do for now.

Now:

• The main thing we can do with elements of $L^1(X)$, besides what we can do with vectors in any vector space, is integrate one. This gives a linear map:

$$\int : L^1(X) \to \mathbb{C}$$

• The main thing we can do with elements of $L^2(X)$, besides the things we can do with vectors in any vector space, is take the inner product of two:

$$\langle \psi, \phi \rangle = \int_X \overline{\psi}(x) \phi(x) \, d x$$

This gives a map that’s linear in one slot and conjugate-linear in the other:

$$\langle - , - \rangle : L^2(X) \times L^2(X) \to \mathbb{C}$$

First came probability theory with $L^1(X)$; then came quantum theory with $L^2(X)$. Naive extrapolation would say it’s about time for someone to invent an even more bizarre theory of reality based on $L^3(X)$. In this, you’d have to integrate the product of three wavefunctions to get a number! The math of $L^p$ spaces is already well-developed, so give it a try if you want. I’ll stick to $L^1$ and $L^2$ today.

Stochastic versus unitary operators

Now let’s think about time evolution:

• In probability theory, the passage of time is described by a map sending probability distributions to probability distributions. We describe this using a stochastic operator

$$U : L^1(X) \to L^1(X)$$

meaning a linear operator such that

$$\int U \psi = \int \psi$$

and

$$\psi \ge 0 \quad \implies \quad U \psi \ge 0$$

• In quantum theory, the passage of time is described by a map sending wavefunctions to wavefunctions. We describe this using an isometry

$$U : L^2(X) \to L^2(X)$$

meaning a linear operator such that

$$\langle U \psi , U \phi \rangle = \langle \psi , \phi \rangle$$

In quantum theory we usually want time evolution to be reversible, so we focus on isometries that have inverses: these are called unitary operators. In probability theory we often consider stochastic operators that are not invertible.

Infinitesimal stochastic versus self-adjoint operators

Sometimes it’s nice to think of time coming in discrete steps. But in theories where we treat time as continuous, to describe time evolution we usually need to solve a differential equation. This is true in both probability theory and quantum theory:

• In probability theory we often describe time evolution using a differential equation called the master equation:

$$\frac{d}{d t} \psi(t) = H \psi(t)$$

whose solution is

$$\psi(t) = \exp(t H) \psi(0)$$

• In quantum theory we often describe time evolution using a differential equation called Schrödinger’s equation:

$$i \frac{d}{d t} \psi(t) = H \psi(t)$$

whose solution is

$$\psi(t) = \exp(-i t H) \psi(0)$$

In fact the appearance of $i$ in the quantum case is purely conventional; we could drop it to make the analogy better, but then we’d have to work with ‘skew-adjoint’ operators instead of self-adjoint ones in what follows.

Let’s guess what properties an operator $H$ should have to make $\exp(-i t H)$ unitary for all $t$. We start by assuming it’s an isometry:

$$\langle \exp(-i t H) \psi, \exp(-i t H) \phi \rangle = \langle \psi, \phi \rangle$$

Then we differentiate this with respect to $t$ and set $t = 0$, getting

$$\langle -i H \psi, \phi \rangle + \langle \psi, -i H \phi \rangle = 0$$

or in other words

$$\langle H \psi, \phi \rangle = \langle \psi, H \phi \rangle$$

Physicists call an operator obeying this condition self-adjoint. Mathematicians know there’s more to it, but today is not the day to discuss such subtleties, intriguing though they are. All that matters now is that there is, indeed, a correspondence between self-adjoint operators and well-behaved ‘one-parameter unitary groups’ $\exp(-i t H)$. This is called Stone’s Theorem.
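Here’s a numerical illustration of one direction of this correspondence, a sketch using a randomly generated matrix: symmetrizing makes $H$ self-adjoint, and then $\exp(-i t H)$ comes out unitary.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(seed=1)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (A + A.conj().T) / 2          # self-adjoint: H equals its conjugate transpose

t = 0.7
U = expm(-1j * t * H)             # time evolution operator
print(np.allclose(U.conj().T @ U, np.eye(4)))   # True: U†U = I, so U is unitary
```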

But now let’s copy this argument to guess what properties an operator $H$ must have to make $\exp(t H)$ stochastic. We start by assuming $\exp(t H)$ is stochastic, so

$$\int \exp(t H) \psi = \int \psi$$

and

$$\psi \ge 0 \quad \implies \quad \exp(t H) \psi \ge 0$$

We can differentiate the first equation with respect to $t$ and set $t = 0$, getting

$$\int H \psi = 0$$

for all $\psi$.

But what about the second condition,

$$\psi \ge 0 \quad \implies \quad \exp(t H) \psi \ge 0 \, ?$$

It seems easier to deal with this in the special case when integrals over $X$ reduce to sums. So let’s suppose that happens… and let’s start by seeing what the first condition says in this case.

In this case, $L^1(X)$ has a basis of ‘Kronecker delta functions’: the Kronecker delta function $\delta_i$ vanishes everywhere except at one point $i \in X$, where it equals 1. Using this basis, we can write any operator on $L^1(X)$ as a matrix.

As a warmup, let’s see what it means for an operator

$$U : L^1(X) \to L^1(X)$$

to be stochastic in this case. We’ll take the conditions

$$\int U \psi = \int \psi$$

and

$$\psi \ge 0 \quad \implies \quad U \psi \ge 0$$

and rewrite them using matrices. For both, it’s enough to consider the case where $\psi$ is a Kronecker delta, say $\delta_j$.

In these terms, the first condition says

$$\sum_{i \in X} U_{i j} = 1$$

for each column $j$. The second says

$$U_{i j} \ge 0$$

for all $i, j$.

So, in this case, a stochastic operator is just a square matrix where each column sums to 1 and all the entries are nonnegative. (Such matrices are often called left stochastic.)
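In code, the check is straightforward. A sketch (the example matrix and distribution are made up):

```python
import numpy as np

def is_left_stochastic(U, tol=1e-10):
    """Each column sums to 1 and every entry is nonnegative."""
    return bool(np.all(U >= -tol) and np.allclose(U.sum(axis=0), 1.0))

U = np.array([[0.9, 0.2],
              [0.1, 0.8]])
print(is_left_stochastic(U))      # True

psi = np.array([0.3, 0.7])        # a probability distribution
print((U @ psi).sum())            # 1.0: total probability is preserved
```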

Next, let’s see what we need for an operator $H$ to have the property that $\exp(t H)$ is stochastic for all $t \ge 0$. It’s enough to assume $t$ is very small, which lets us use the approximation

$$\exp(t H) = 1 + t H + \cdots$$

and work to first order in $t$. Saying that each column of this matrix sums to 1 then amounts to

$$\sum_{i \in X} \left( \delta_{i j} + t H_{i j} + \cdots \right) = 1$$

which requires

$$\sum_{i \in X} H_{i j} = 0$$

Saying that each entry is nonnegative amounts to

$$\delta_{i j} + t H_{i j} + \cdots \ge 0$$

When $i = j$ this will be automatic when $t$ is small enough, so the meat of this condition is

$$H_{i j} \ge 0 \; \mathrm{if} \; i \ne j$$

So, let’s say a matrix $H$ is infinitesimal stochastic if its columns sum to zero and its off-diagonal entries are nonnegative.
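Here’s a matching sketch of that condition, together with a numerical check that exponentiating such a matrix really does give stochastic operators for all $t \ge 0$ (the example generator is made up):

```python
import numpy as np
from scipy.linalg import expm

def is_infinitesimal_stochastic(H, tol=1e-10):
    """Columns sum to zero and off-diagonal entries are nonnegative."""
    off_diag = H - np.diag(np.diag(H))
    return bool(np.allclose(H.sum(axis=0), 0.0) and np.all(off_diag >= -tol))

H = np.array([[-2.0,  1.0],
              [ 2.0, -1.0]])
print(is_infinitesimal_stochastic(H))    # True

for t in (0.1, 1.0, 10.0):
    U = expm(t * H)
    # Each exp(tH) should be stochastic: nonnegative, with columns summing to 1.
    print(np.all(U >= -1e-12), np.allclose(U.sum(axis=0), 1.0))
```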

I don’t love this terminology: do you know a better one? There should be some standard term. People here say they’ve seen the term ‘stochastic Hamiltonian’, but then a stochastic Hamiltonian is not a stochastic operator, which makes me nervous. The idea behind my term is that any infinitesimal stochastic operator should be the infinitesimal generator of a stochastic process.

In other words, when we get the details straightened out, any 1-parameter family of stochastic operators

$$U(t) : L^1(X) \to L^1(X), \qquad t \ge 0$$

obeying

$$U(0) = I$$

$$U(t) U(s) = U(t+s)$$

and continuity:

$$t_i \to t \quad \implies \quad U(t_i) \psi \to U(t) \psi$$

should be of the form

$$U(t) = \exp(t H)$$

for a unique ‘infinitesimal stochastic operator’ $H$.

When $X$ is a finite set, this is true—and an infinitesimal stochastic operator is just a square matrix whose columns sum to zero and whose off-diagonal entries are nonnegative. But do you know a theorem characterizing infinitesimal stochastic operators for general measure spaces $X$? Someone must have worked it out.

Luckily, for our work on stochastic Petri nets, we only need to understand the case where $X$ is a countable set and our integrals are really just sums. This should be almost like the case where $X$ is a finite set—but we’ll need to take care that all our sums converge.

The moral

Now we can see why a Hamiltonian like $a^\dagger$ is no good, while $a^\dagger - 1$ is good. (I’ll ignore the rate constant $r$, since it’s irrelevant here.) The first one is not infinitesimal stochastic, while the second one is!

In this example, our set of states is the natural numbers:

$$X = \mathbb{N}$$

The probability distribution

$$\psi : \mathbb{N} \to \mathbb{C}$$

gives the probability of having any given number of rabbits in the trap.

The creation operator is not infinitesimal stochastic: in fact, it’s stochastic! Why? Well, when we apply the creation operator, what was the probability of having $n$ rabbits now becomes the probability of having $n+1$ rabbits. So, the probabilities remain nonnegative, and their sum over all $n$ is unchanged. Those two conditions are all we need for a stochastic operator.
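In the truncated matrix picture from earlier, this is easy to see, with one caveat: the cutoff at degree $N$ spoils the last column, so the sketch below only checks the others.

```python
import numpy as np

N = 5
a_dag = np.diag(np.ones(N), -1)   # shifts the coefficient of x^n to x^(n+1)

# All entries are nonnegative, and every column except the last (a truncation
# artifact) sums to 1: the creation operator is stochastic. Its columns don't
# sum to 0, so it is not infinitesimal stochastic.
print(np.all(a_dag >= 0))         # True
print(a_dag.sum(axis=0)[:-1])     # [1. 1. 1. 1. 1.]
```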

To see this in a more complicated way, suppose we reinterpret $\psi$ as a power series, using our trick from quantum field theory:

$$\psi = \sum_{n = 0}^\infty \psi_n x^n$$

Then the total probability, which we’re abstractly calling $\int \psi$, is just the sum of the probabilities $\psi_n$:

$$\int \psi = \sum_{n=0}^\infty \psi_n$$

On the other hand,

$$a^\dagger \psi = \sum_{n = 0}^\infty \psi_n x^{n+1}$$

so the creation operator is stochastic:

$$\int a^\dagger \psi = \int \psi$$

and

$$\psi \ge 0 \quad \implies \quad a^\dagger \psi \ge 0$$

Precisely by virtue of being stochastic, the creation operator fails to be infinitesimal stochastic:

$$\int a^\dagger \psi \ne 0$$

So, it’s a bad Hamiltonian for our stochastic Petri net.

On the other hand, $a^\dagger - 1$ is infinitesimal stochastic. Its off-diagonal entries are nonnegative, and

$$\int (a^\dagger - 1) \psi = 0$$

precisely because

$$\int a^\dagger \psi = \int \psi$$

You may be thinking: all this fancy math just to understand a single stochastic Petri net, the simplest one of all! But next time I’ll explain a general recipe which will let you write down the Hamiltonian for any stochastic Petri net. All the lessons we’ve so painfully learned today will make this much easier. And pondering the analogy between probability theory and quantum theory will also be good for our bigger project of unifying the applications of network diagrams in different subjects.
