The Azimuth Project
Blog - stationary stability in finite populations

This page is a blog article in progress, written by Marc Harper. To discuss this article while it is being written, go to the Azimuth Forum.

guest post by Marc Harper

A while back, in the article Relative entropy minimization in evolutionary dynamics, we looked at extensions of the information geometry / evolutionary game theory story to more general time-scales, incentives, and geometries. Today we’ll see how to make this all work in finite populations!

Let’s recall the basic idea from last time, which John also described in his information geometry series. The main theorem is this: when there’s an evolutionarily stable state for a given fitness landscape, the relative entropy between the stable state and the population distribution decreases along the population trajectories as they converge to the stable state. In short, relative entropy is a Lyapunov function. This is a nice way to look at the action of a population under natural selection, and it has interesting analogies to Bayesian inference.

The replicator equation is a nice model from an intuitive viewpoint, and it’s mathematically elegant. But it has some drawbacks when it comes to modeling real populations. One major issue is that the replicator equation implicitly assumes that the population proportions of each type are differentiable functions of time, obeying a differential equation. This only makes sense in the limit of large populations. Other closely related models, such as the Lotka-Volterra model, focus on the number of individuals of each type (e.g. predators and prey) instead of the proportion. But they often assume that the number of individuals is a differentiable function of time, and a population of 3.5 isn’t very realistic either.

Real populations of replicating entities are not infinitely large; in fact they are often relatively small and of course have whole numbers of each type, at least for large biological replicators (like animals). They take up space and only so many can interact meaningfully. There are quite a few models of evolution that handle finite populations and some predate the replicator equation. Models with more realistic assumptions typically have to leave the realm of derivatives and differential equations behind, which means that the analysis of such models is more difficult, but the behaviors of the models are often much more interesting. Hopefully by the end of this post, you’ll see how all of these diagrams fit together:

One of the best-known finite population models is the Moran process, which is a Markov chain on a finite population. This is the quintessential birth-death process. For a moment consider a population of just two types AA and BB. The state of the population is given by a pair of nonnegative integers (a,b)(a,b) with a+b=Na+b=N, the total number of replicators in the population, and aa and bb the number of individuals of type AA and BB respectively. Though it may artificial to fix the population size NN, this often turns out not to be that big of a deal, and you can assume the population is at its carrying capacity to make the assumption realistic. (Lots of people study populations that can change size and that have replicators spatially distributed say on a graph, but we’ll assume they can all interact with each whenever they want for now).

A Markov model works by transitioning from state to state in each round of the process, so we need to define the transitions probabilities to complete the model. Let’s put a fitness landscape on the population, given by two functions f Af_A and f Bf_B of the population state (a,b)(a,b). Now we choose an individual to reproduce proportionally to fitness, e.g. we choose an AA individual to reproduce with probability

af Aaf A+bf B, \frac{a f_A}{a f_A + b f_B},

since there are aa individuals of type AA and they each have fitness f Af_A. This is analogous to the ratio of fitness to mean fitness from the discrete replicator equation, since

af Aaf A+bf B=aNf AaNf A+bNf Bx if i(x)f(x)¯, \frac{a f_A}{a f_A + b f_B} = \frac{\frac{a}{N} f_A}{\frac{a}{N} f_A + \frac{b}{N} f_B} \to \frac{x_i f_i(x)}{\overline{f(x)}},

and the discrete replicator equation is typically similar to the continuous replicator equation (this can be made precise), so the Moran process captures the idea of natural selection in a similar way. Actually there is a way to recover the replicator equation from the Moran process in large populations—details at the end!

We’ll assume that the fitnesses are nonnegative and that the total fitness (the denominator) is never zero; if that seems artificial, some people prefer to transform the fitness landscape by e βf(x)e^{\beta f(x)}, which gives a ratio reminiscent of the Boltzmann or Fermi distribution from statistical physics, with the parameter β\beta playing the role of intensity of selection rather than inverse temperature. This is sometimes called Fermi selection.

That takes care of the birth part. The death part is easier: we just choose an individual at random (uniformly) to be replaced. Now we can form the transition probabilities of moving between population states, for instance the probability of moving from state (a,b)(a,b) to (a+1,b1)(a+1, b-1) is given by the product of the birth and death probabilities (since they are independent)

T a a+1=af Aaf A+bf BbN, T_a^{a+1} = \frac{a f_A}{a f_A + b f_B} \frac{b}{N},

since we have to chose a replicator of type AA to reproduce and one of type BB to be replaced. Similarly for (a,b)(a,b) to (a1,b+1)(a-1, b+1) (switch all the ‘a’s and ‘b’s), and we can write the probability of staying in the state (a,Na)(a, N-a) as

T a a=1T a a+1T a a1T_a^{a} = 1 - T_{a}^{a+1} - T_{a}^{a-1}

Since we only replace one individual at a time, this covers all the possible transitions, and keeps the population constant.

We’d like to analyze this model and many people have come up with clever ways to do so, computing quantities like fixation probabilities (also known as absorption probabilities), indicating the chance that the population will end up with one type completely dominating, i.e. in state (0,N)(0, N) or (N,0)(N,0). If we assume that the fitness of type AA is constant and simply equal to 1, and the fitness of type BB is r1r \neq 1, we can calculate the probability that a single mutant of type BB will take over a population of type AA using standard Markov chain methods:

ρ=1r 11r N \rho = \frac{1 - r^{-1}}{1 - r^{-N}}

For neutral relative fitness (r=1r=1), ρ=1/N\rho = 1/N, which is the probability a neutral mutant invades by drift alone since selection is neutral. Since the two boundary states (0,N)(0, N) or (N,0)(N,0) are absorbing (no transitions out), in the long run every population ends up in one of these two states, i.e. the population is homogeneous. (This is the formulation referred to by Matteo Smerlak in The mathematical origins of irreversibility.)

That’s a bit different flavor of result than what we discussed previously, since we had stable states where both types were present, and now that’s impossible, and a bit disappointing. We need to make the population model a bit more complex to have more interesting behaviors, and we can do this in a very nice way by adding the effects of mutation. At the time of reproduction, we’ll allow either type to mutate into the other with probability μ\mu. This changes the transition probabilities to something like

T a a+1=a(1μ)f A+bμf Baf A+bf BbN T_a^{a+1} = \frac{a (1-\mu) f_A + b \mu f_B}{a f_A + b f_B} \frac{b}{N}

Now the process never stops wiggling around, but it does have something known as a stationary distribution, which gives the probability that the population is in any given state in the long run.

For populations with more than two types the basic ideas are the same, but there are more neighboring states that the population could move to, and many more states in the Markov process. One can also use more complicated mutation matrices, but this setup is good enough to typically guarantee that no one species completely takes over. For interesting behaviors, typically μ=1/N\mu = 1/N is a good choice (there’s some biological evidence that mutation rates are typically inversely proportional to genome size).

Without mutation, once the population reached (0,N)(0,N) or (N,0)(N,0), it stayed there. Now the population bounces between states, either because of drift, selection, or mutation. Based on our stability theorems for evolutionarily stable states, it’s reasonable to hope that for small mutation rates and larger populations (less drift), the population should spend most of its time near the evolutionarily stable state. This can be measured by the stationary distribution which gives the long run probabilities of a process being in a given state.

Previous work by Claussen and Traulsen:

suggested that the stationary distribution is at least sometimes maximal around evolutionarily stable states. Specifically, they showed that for a very similar model with fitness landscape given by

(f A f b)=(1 2 2 1)(a b)\left(\begin{array}{c} f_A \\ f_b \end{array}\right) = \left(\begin{array}{cc} 1 & 2\\ 2&1 \end{array}\right) \left(\begin{array}{c} a\\ b \end{array}\right)

the stationary state is essentially a binomial distribution centered at (N/2,N/2)(N/2, N/2).

So the stationary distribution in some sense measures the relative stability of a state (the process is more likely to be in it) as well as possibly capturing evolutionary stability when it is maximal! Unfortunately, stationary distributions can be very difficult to compute for arbitrary Markov chains. While it can be computed for the Markov process described above without mutation (only non-zero on the two absorbing states) and in the case studied by Claussen and Traulsen, there’s no general analytic formula for the process with mutation, nor for more than two types, because the processes are not reversible. Since we can’t compute the stationary distribution analytically, we’ll have to find another way to show that the local maxima of the stationary distribution are “evolutionarily stable”. We can approximate the stationary distribution fairly easily with a computer, so it’s easy to plot the results for just about any landscape and reasonable population size (e.g. N100N \approx 100).

It turns out that we can use a relative entropy minimization approach, just like for the continuous replicator equation! But how? We lack some essential ingredients such as deterministic and differentiable trajectories. Here’s what we do:

  • We show that the local maxima and minima of the stationary distribution satisfy a complex balance criterion
  • We then show that these states minimize an expected relative entropy
  • This will mean that the current state and the expected next state are “close”
  • Lastly we show that these states satisfy an analogous definition of evolutionary stability (now incorporating mutation).

The relative entropy allows us to measure how close the current state is to the expected next state, which captures the idea of stability in another way. This ports the relative minimization Lyapunov result to some more realistic Markov chain models. The only downside is that we’ll assume the populations are “sufficiently large”, but in practice for populations of three types, N=20N=20 is typically enough for common fitness landscapes (there are lots of examples here for N=80N=80, which are prettier than the smaller populations). The reason for this is that the population state (a,b)(a,b) needs enough “resolution” (a/N,b/N)(a/N, b/N) to get sufficiently close to the stable state, which is not necessarily a ratio of integers. If you allow some wiggle room, smaller populations are still typically pretty close.

Evolutionarily stable states are closely related to Nash equilibria, which have a nice intuitive description in traditional game theory as “states that no player has an incentive to deviate from”. But in evolutionary game theory, we don’t use a game matrix to compute e.g. maximum payoff strategies, rather the game matrix defines a fitness landscape which then determines how natural selection unfolds.

We’re going to see this idea again in a moment, and to help get there let’s introduce an function called an incentive that encodes how a fitness landscape is used for selection. One way is to simply replace the quantities af A(a,b)a f_A(a,b) and bf B(a,b)b f_B(a,b) in the fitness-proportionate selection ratio above, which now becomes (for two population types):

φ A(a,b)φ A(a,b)+φ B(a,b) \frac{\varphi_A(a,b)}{\varphi_A(a,b) + \varphi_B(a,b)}

Here φ A(a,b)\varphi_A(a,b) and φ B(a,b)\varphi_B(a,b) are the incentive function components that determine how the fitness landscape is used for natural selection (if at all). We have seen two examples above, φ A(a,b)=af A(a,b)\varphi_A(a,b) = a f_A(a, b) for the Moran process and fitness-proportionate selection, and φ A(a,b)=ae βf A(a,b)\varphi_A(a,b) = a e^{\beta f_A(a, b)} for an alternative that incorporates a strength of selection term β\beta, preventing division by zero for fitness landscapes defined by zero-sum game matrices, such as a rock-paper-scissors game. Using an incentive function also simplifies the transition probabilities and results as we move to populations of more than two types. Introducing mutation, we can describe the ratio for incentive-proportion selection with mutation for the ii-th population type when the population is in state x=(a,b,)/Nx=(a,b,\ldots) / N as

p i(x)= k=1 nφ k(x)M ik k=1 nφ k(x) p_i(x) = \frac{\sum_{k=1}^{n}{\varphi_k(x) M_{i k} }}{\sum_{k=1}^{n}{\varphi_k(x)}}

for some matrix of mutation probabilities MM. This is just the probability that we get a new individual of the ii-th type (by birth and/or mutation). A common choice for the mutation matrix is to use a single mutation probability μ\mu and spread it out over all the types, such as letting M ij=μ/(n1)M_{ij} = \mu / (n-1) and M ii=1μM_{ii} = 1 - \mu.

Now we are ready to define the expected next state for the population and see how it captures a notion of stability. For a given state population xx in a multitype population, using xx to indicate the normalized population state (a,b,)/N(a,b,\ldots) / N, consider all the neighboring states yy that the population could move to in one step of the process (one birth-death cycle). These neighboring states are the result of increasing a population type by one (birth) and decreasing another by one (death, possibly the same type), of course excluding cases on the boundary where the number of individuals of any type drops below zero or rises above NN. Now we can define the expected next state as the sum of neighboring states weighted by the transition probabilities

E(x)= yyT x yE(x) = \sum_{y}{y T_x^{y}}

with transitions given by

T x y=p i(x)x jT_{x}^{y} = p_{i}(x) x_{j}

for states yy that differ in 1/N1/N at the ii-th coordinate and 1/N-1/N at jj-th coordinate from xx. Here x jx_j is just the probability of the random death of an individual of the jj-th type, so the transition probabilities are still just birth (with mutation) and death as for the Moran process we started with.

Skipping some straight-forward algebraic manipulations, we can show that

E(x)= yyT x y=N1Nx+1Np(x).E(x) = \sum_{y}{y T_x^{y}} = \frac{N-1}{N}x + \frac{1}{N}p(x).

Then it’s easy to see that E(x)=xE(x) = x if and only if x=p(x)x = p(x), and for small mutation probability μ\mu (and the mutation matrix above), and that x=p(x)x = p(x) if and only if x i=φ i(x)x_i = \varphi_i(x). So we have a nice description of “stability” in terms of fixed points of the expected next state function and the incentive function x=E(x)=p(x)=φ(x)x = E(x) = p(x) = \varphi(x), and we’ve gotten back to “no one has an incentive to deviate”. More precisely, for the Moran process φ i(x)=x if i(x)\varphi_i(x) = x_i f_i(x) and we get back f i(x)=f j(x)f_i(x) = f_j(x) for every type. So we take x=φ(x)x = \varphi(x) as our analogous condition to an evolutionarily stable state, though it’s just the “no motion” part and not also the “stable” part. That’s what we need the stationary distribution for!

To turn this into a useful number that measures stability, we use the relative entropy of the expected next state and the current state, in analogy with the Lyapunov theorem for the replicator equation. The relative entropy

D(x,y)= ix ilog(x i)y ilog(x i)D(x, y) = \sum_i x_i \log(x_i) - y_i \log(x_i)

has the really nice property that D(x,y)=0D(x,y) = 0 if and only if x=yx = y, so we can use the expected relative entropy D(E(x),x)D(E(x), x) as a measure of how close to stable any particular state is! (It’s actually the relative entropy of the expected next state and the current state rather than the expect relative entropy). Here the expected next state takes the place of the “evolutionarily stable state” in the result described last time for the replicator equation. Finally, we need to show that the maxima (and minima) of of the stationary distribution are these fixed points by showing that these states minimize the expected relative entropy.

Seeing that local maxima and minima of the stationary distribution minimize the expected relative entropy is a more involved, so let’s just sketch the details. In general, these Markov processes are not reversible, so they don’t satisfy the detailed-balance condition, but the stationary probabilities do satisfy something called the global balance condition, which says that for the stationary distribution ss we have that

s x xT x y= ys yT y xs_x \sum_{x}{T_x^{y}} = \sum_{y}{s_y T_y^{x}}

When the stationary distribution is at a local maximum (or minimum), we can show essentially that this implies (up to an ϵ\epsilon, for a large enough population) that

xT x y= yT y x\sum_{x}{T_x^{y}} = \sum_{y}{T_y^{x}}

a sort of probability inflow-outflow equation, which is very similar to the condition of complex balanced equilibrium described by Manoj Gopalkrishnan in this Azimuth post. With some algebraic manipulation,we can show that these states have E(x)=xE(x)=x.

Now let’s look again at the figures from the start. The first shows the vector field of the replicator equation:

You can see rest points at the center, on the center of each boundary edge, and on the corner points. The center point is evolutionarily stable, the center points of the boundary are semi-stable (but stable when the population is restricted to a boundary simplex), and the corner points are unstable.

This one shows the stationary distribution for a finite population model with a Fermi incentive on the same landscape, for a population of size 80:

A fixed population size gives a partitioning of the simplex, and each triangle of the partition is colored by the value of the stationary distribution. So you can see that there are local maxima in the center and on the centers of the triangle boundary edges. In this case, the size of the mutation probability determines how much of the stationary distribution is concentrated on the center of the simplex.

This shows one-half of the Euclidean distance squared between the current state and the expected next state:

And finally, this shows the same thing but with the relative entropy as the ‘distance function’:

As you can see, the Euclidean distance is locally minimal at each of the local maxima and minima of the stationary distribution (including the corners); the relative entropy is only guaranteed so on the interior states (because the relative entropy doesn’t play nicely with the boundary, and unlike the replicator equation, the Markov process can jump on and off the boundary). It turns out that the qq-relative entropies for qq between 0 and 1 also work just fine, but for the large population limit (the replicator dynamic), the relative entropy is the somehow the right choice for the replicator equation (has the derivative that easily gives Lyapunov stability), which is due to the connections between relative entropy and Fisher information in the information geometry of the simplex. The Euclidean distance is the q=0q=0 case and the relative entropy is q=1q=1.

As it turns out, something very similar holds for another popular finite population model, the Wright-Fisher process! This model is more complicated, so if you are interested in the details, check out our paper, which has many nice examples and figures. We also define a process that bridges the gap between the atomic nature of the Moran process and the generational nature of the Wright-Fisher process, and prove the general result for that model.

Finally, let’s see how the Moran process relates back to the replicator equation (see also the appendix in this paper), and how we recover the stability theory of the replicator equation. We can use the transition probabilities of the Moran process to define a stochastic differential equation (called a Langevin equation) with drift and diffusion terms that are essentially (for populations with two types:

Drift(x)=T +(x)T (x)Drift(x) = T^{+}(x) - T^{-}(x)
Diffusion(x)=T +(x)+T (x)NDiffusion(x) = \sqrt{\frac{T^{+}(x) + T^{-}(x)}{N}}

As the population size gets larger, the diffusion term drops out, and the stochastic differential equation becomes essentially the replicator equation. For the stationary distribution, the variance (e.g. for the binomial example above) also has an inverse dependence on NN, so the distribution limits to a delta-function that is zero except for at the evolutionarily stable state!

What about the relative entropy? Loosely speaking, as the population size gets larger, the iteration of the expected next state also becomes deterministic. Then the evolutionarily stable states is a fixed point of the expected next state function, and the expected relative entropy is essentially the same as the ordinary relative entropy, at least in a neighborhood of the evolutionarily stable state. This is good enough to establish local stability.

Earlier I said both the local maxima and minima minimize the expected relative entropy. Dash and I haven’t proven that the local maxima always correspond to evolutionarily stable states (and the minima to unstable states). That’s because the generalization of evolutionarily stable state we use is really just a “no motion” condition, and isn’t strong enough to imply stability in a neighborhood for the deterministic replicator equation. So for now we are calling the local maxima stationary stable states.

We’ve also tried a similar approach to populations evolving on networks, which is a popular topic in evolutionary graph theory, and the results are encouraging! But there are many more ‘states’ in such a process, since the configuration of the network has to be taken into account, and whether the population is clustered together or not. See the end of our paper for an interesting example of a population on a cycle.