The Bak-Sneppen model of evolution

Examples - Biophysics

Last edited: October 6th 2020.

This notebook is somewhat different from our normal content. The problem at hand and the numerics are quite simple. The aim of the notebook is to demonstrate the Julia language, a language that has emerged in the last few years. Julia has to some degree already cemented itself in the scientific community, and will most likely continue to expand in the coming years. Despite the many wonders of Python, the biggest challenge is that it can be slow. Using NumPy is extremely fast, as behind the scenes it is highly optimized compiled code and takes advantage of well-known libraries like BLAS and LAPACK. However, sometimes it might be very cumbersome or not possible to utilize NumPy, and one is forced to use Python-loops. In these scenarios other languages might be more suitable. Many communities, both in academia and in industry, use C++ for it's speed and powerful features, though it can be both tedious to develop and simply too complex. Also, without the proper knowledge one might very well write slow code in C++ as well. Fortran is also a popular alternative, especially in the scientific community. With the first version being released more than 60 years ago, it is one of the older languages still widely used in computational physics. It is cherished for its speed, thought it might be intimidating and confusing for new developers. Hybrid variants, like using Fortran only for the inner loop, wrapping it in Python, is one possible technique which you can read more about here [1] and used in a Monte Carlo simulation here [2].

Julia is a fairly young language, with huge potential. It aims at taking the middle ground between Python on one side, and Fortran and C++ on the other. In this notebook we will implement the Bak-Sneppen model of evolution in Julia.

Most of the code here will be readily understood for someone used to Python. We are in the process of writing an introductory notebook to Julia, which we hope to finish soon, which will be linked to here when finished.

The model

The Bak-Sneppen model[3] is simple. It gives us some basic insights into the process of evolution. We have $N$ species. Each species is attributed a fitness factor, a random number between 0.0 and 1.0. The species are distributed on a ring, i.e. a line with periodic boundary conditions. The idea of the model is to apply only a very simple set of rules to this system, and see how it evolves. There are two steps in the simulation:

  1. Find the weakest species, the one with the lowest fitness factor.
  2. Give that species and it's two neighbors new random fitness factors.

At first these two rules might seem arbitrary and unmotivated, but consider the following: For fit species, a change in their fitness factor (a mutation), will on average be a negative influence. Mutated offspring of fit species, are likely to not be beneficial for that species, and will thus die. Less fit species however, are in danger of going extinct. Their mutated offspring will have a larger chance of survival than their parents; we allow the original species to die out, and their mutated offspring to survive. We assign a new (random) fitness factor on the mutated offspring. This motivates rule 1 (and the first part of rule 2). The species in the ecosystem are not isolated. Perturbations to some part of an ecosystem could affect the entire ecosystem, even species that do not directly interact at all. We will here restrict the interaction to nearest neighbors - we ignore indirect effects. Imagine that a particular breed of rabbit at some point mutate and become smaller and faster. For the gray fox and lynx preying on the rabbit, this threatens their access to food - the rabbit is too fast and agile to catch. The small gray fox discovers however, that it is small enough to dig into the rabbit's hole, and so is mostly unaffected by the change. The lynx however, is not so lucky. It is too big to fit in the rabbit hole, and struggles to find enough food. This example shows how the mutation of one species, can affect the neighboring species in differently. In our model, we account for this by changing the fitness factor of the two closest neighbors in the ecosystem. The value of the fitness factor of the two closest neighbors is assigned at random, meaning that the effect of their mutated neighbor can be be both negative or positive. This motivates rule 2.


The problem simply consists of a for-loop changing the species. This type of problems, iterative steps where each step is dependent on the previous one, is generally difficult to implement with NumPy, at least directly. Since we will end up writing some loop-structure, a fast language will be preferential to Python - in our case Julia. Comparing the implementation shown here to a similar implementation in Python, we measured the Julia-implementation to be about twice as fast as Python.

The only programmatic challenge to be taken care of is the periodic boundary conditions. Our species will be represented as a one dimensional vector with numbers between 0.0 and 1.0, the fitness factors. The boundary conditions only plays a role when finding the neighbors. A naive, but functional, approach is to for each step check if one is at the boundary, and then execute special code for those cases. We use however a common technique inspired from graph theory. Each element of the vector, node $i$, has two neighbors, one right and one left, usually node $i+1$ and $i-1$. We will create two vectors, one with right indices and one with left indices. The element at position $i$ in our right index-vector, will have the value $i+1$, except at the boundary. Similarly for index left. This allows for the writing of very clean code. 1

Index arrays

The figure shows the arrays ir and il which at index i gives the index of the right and left element.

As the simulation iterates through time, we will for each iteration store the minimum fitness factor in that iteration and also the age of each species. We can interpret age as years passed since its conception. The age is calculated by increasing the age by one for each round, and setting it to zero when a species is "mutated".

In [18]:
N = 1000  # Number of species.
t_end = 100000  # End time of simulation.
t_space = 1:t_end  # An iterable for the time steps. Note that Julia is 1-indexed.

# Create the species' fitness factor.
# rand(N) returns a vector of size N with random elements between 0.0 and 1.0.
species = rand(N)

# Fix boundary conditions
# ir and il are right indices and left indices respectively
# ir[i] gives the right index at position i, ususally i+1.
# `collect` converts a range, for example 2:N, into an array.
# It serves the same purpose as doing `list(range(2, N))` in Python.
ir = collect(2:N+1)
il = collect(0:N-1)
ir[N] = 1  # Right of rightmost is leftmost in a ring
il[1] = N  # Left of leftmost is rightmost

Simulate the species over the given time interval.
Return the age of each species at each time step and 
the minimum fitness factor at each time step.
function simulate!(species, t_space)
    age = zeros(N, t_end)
    mins = zeros(t_end)
    for t in t_space[2:end]
        age[:, t] = age[:, t-1] .+ 1
        # We want to find the weakest species.
        # `findmin` returns the minimum value and the index of that value.
        mins[t], index_min = findmin(species)
        species[index_min] = rand()
        species[ir[index_min]] = rand()  # Right
        species[il[index_min]] = rand()  # Left
        age[index_min, t] = 0
        age[ir[index_min], t] = 0  # Right
        age[il[index_min], t] = 0  # Left
    return age, mins
In [19]:
age, mins = simulate!(species, t_space);
# We add a semicolon so that Jupter does not print the values of age and mins.

Julia supports several backends for plotting. Here we will use PyPlot, which is simply an interface to Python's Matplotlib. For anyone familiar with Matplotlib the following code should readily understood.

In [6]:
using PyPlot
# There is a lot happening in this first line. Let's break it down.
# The `[:, 1:100:end]` resembles Python's slicing syntax, but there are slight differences.
# The first colon simply means every row, just like Python.
# `1:100:end` follow the syntax `START:STEP:END`, so it means
# 'from first column to last column, take every 100th column'.
# `transpose` obviously transposes the matrix, this is done only
# to have the the axis where we think they make the most sense,
# ie. time on second axis.
# `vmax` is the same as in matplotlib. It is included to
# give better contrast in the figure, by setting all ages over
# `vmax` to the same color, bringing out the details in
# rapidly changing areas.
plt.imshow(transpose(age[:, 1:100:end]), vmax=30000)
plt.ylabel("Age [# iterations]")
plt.title("The age of species (x-axis) through the simulation.")

The figure above shows the age of the species over time. Notice the clustering of species, where spatially close species tend to survive together over a long time. One can think of the clusters as stable sub-ecosystems. Evolutions tend to be localized both in time and space. If an important species in a a stable sub-ecosystem suddenly mutates, the effect of this single mutation ripples through the entire sub-ecosystem, leading to mutation in every species inhabiting that sub-ecosystem. This sequence of mass-evolutions, are referred to as avalanches, a phenomenon that is much studied in the field of statistical physics.

We will now look at the minimum fitness factor in the ecosystem as a function of time. We see that it quickly increases and then tapers off to an asymptotic stable value, roughly equal to 0.6. This explains the localization of the evolutions that we see in the age-plot. When a mutation happens, the probability that the three new values generated are lower than any other value is quite high, at least after some time when the minimum value is high.

Now, admittedly, the following code is quite dense. It would be possible to write this segment in a way that would be more understandable to those unfamiliar with Julia. However, this solution is shorter, more efficient, and ultimately we hope that it may convey some of power in Julia. Now, what happens in this one line of code? We wish to see how the minimum values evolve over time. The naive approach of simply plotting the the values of mins would unfortunately convey nothing, as there is too much statistical noise. We will take a more refined approach. Partition the time-axis and take the maximum value of mins in each partition. Or, said in a different manner, group the values of mins into "baskets" of N values each, and take the maximum in each basket. Then plot these values. Julia's "Iteration utilities" offers the iterator Iterators.partition. It does exactly what we just described, given some vector (or iterable) divide it into baskets, each with some given number of elements. Thus Iterators.partition(mins, 500) returns an iterator. Each element in the iterator, is a vector with 500 elements, taken from mins. We then simply take the maximum value of that vector. An educated guess was made to arrive at the number 500. The number itself does not carry any physical meaning; we need some value large enough to filter out the noise, but small enough to not remove the data we are interested in.

In [17]:
basket_size = 500
plt.plot([maximum(a) for a in Iterators.partition(mins, basket_size)])
plt.ylabel("Minimum fitness factor")
# Since we collected our results in baskets, the numbers on the time-axis are wrong.
# We either have to rescale them or remove them.
# Since we are mainly interested in the qualitative behavior, we remove the numbers.

Now, we admit that both the model and the analysis of the model in this notebook is quite much simpler than what we usually do in our notebooks. The goal of this notebook is however simply to illustrate that Julia is a viable option for writing modern efficient numerical computations. We hope that you found this notebook somewhat inspiring, and that hopefully it might serve as a kind introduction to Julia.


1 It is not straightforward to predict if this will give faster execution times or not. In this particular example, this is actually slower than using a conditional approach, ie. executing special code inside an if-statement if one is at the boundary. This was found by simply implementing both methods and measuring the execution times. In general, predicting which is faster is difficult, and if efficiency is at the essence, experimentation with different implementations is advised. The method illustrated here does however offer much cleaner code, which is increasingly important in more complex problems.</span>