In this set of lecture notes, we'll briefly study an elementary model of epidemic spread on networks. Networks are a natural representation of the patterns of human contact. A so-called contact network is a network in which nodes are individuals and edges represent "contact events" between individuals, such as a meeting or conversation. If one has access to a contact network, one can then combine that information with a model of how a disease spreads from individual to individual in order to simulate how a disease might spread on that population.
In these notes, we'll implement and visualize a very simple model of epidemics on networks, using NetworkX and our good friend, linear algebra.
Epidemic simulations can be quite computationally intensive, and for models on large networks, one generally needs some combination of:
The SIR model is one of many compartmental models of epidemic spread. It has three compartments:
Schematically, we can write
$$S \rightarrow I \rightarrow R$$to indicate the "flow" of individuals between compartments: susceptible individuals become infected, infected individuals eventually recover.
The SIR model is often written in continuous-time form as a system of ordinary differential equations. It can, however, be difficult to incorporate network structure into continuous time models. Instead, we're going to use a discrete-time model today, with the following logic:
i
is exposed to their neighbors.j
of i
, we'll flip a weighted coin with weight p
. If any of these coins come up heads, then i
becomes Infected.k
flips a weighted coin with weight q
. If this coin comes up heads, then the Infected individual k
becomes Recovered.We are going to implement this model as a class, which holds a NetworkX graph, as well as its adjacency matrix $\mathbf{A}$ encoding connections between nodes. We'll be able to avoid for-loops, for the most part, by using Numpy operations.
import networkx as nx
import numpy as np
from matplotlib import pyplot as plt
import pandas as pd
In Python, the right approach to complex simulations is often object-oriented. With this in mind, we're going to build a class whose instance variables represent system states and structures. These include the graph and adjacency matrix; a log of the current states of each node, and so on. We'll then define methods for:
We'll represent each of the three disease categories by integer labels: 0
means Susceptible, 1
means Infected, and 2
means Recovered.
class SIR:
def __init__(self, G):
self.G = G
self.A = nx.adjacency_matrix(G)
# number of nodes
self.n = self.A.shape[0]
# initial disease state
# 0 corresponds to susceptible
self.x = np.zeros(self.n)
# for logging history
self.t = 0
self.t_infected = np.zeros(self.n)
self.t_infected[:] = np.nan
def seed_epidemic(self):
# randomly infect a single individual
ix = np.random.randint(self.n)
self.x[ix] = 1
self.t_infected[ix] = 0
def timestep(self, infection_rate, recovery_rate):
infected = self.x == 1
# RECOVERY STEP: each Infected individual flips a coin
# with weight equal to recovery rate.
# If heads, they become Recovered.
self.x[infected] += (np.random.rand(sum(infected)) < recovery_rate)
# INFECTION STEP
# need to recalculate Infected since some of them have recovered
# compute number of exposures
infected = self.x == 1
# number of exposed individuals connected to each node
# matrix multiplication is magic!
exposures = self.A @ infected
# each Susceptible flips a number of coins
# equal to their number of exposures
# with weight equal to the infection rate
susceptible = self.x == 0
# probability of at least one "heads"
p = 1 - (1 - infection_rate)**exposures
# flip the coins
new_infections = (np.random.rand(len(p)) < p)*susceptible
# log the infections
self.x[new_infections] = 1
self.t_infected[new_infections] = self.t
# advance time by one step
self.t += 1
def summary_stats(self):
susceptible = (self.x == 0).mean()
infected = (self.x == 1).mean()
recovered = (self.x == 2).mean()
return np.array([susceptible, infected, recovered])
def simulate(self, timesteps, **kwargs):
# empty initial trajectories
X = np.zeros((timesteps, 3))
# initial condition
self.seed_epidemic()
# perform simulation
for i in range(timesteps):
self.timestep(**kwargs)
X[i] = self.summary_stats()
return X
def show_network_history(self, layout):
label_mapper = dict(zip(list(self.G.nodes()), self.t_infected))
nx.draw(G,
layout,
node_color = self.t_infected,
with_labels = True,
labels = label_mapper,
cmap=plt.cm.jet_r)
plt.gca().set(title = "Spread History -- Timestep Infected")
Oooph, that's a lot of methods! Let's use our class to perform some simulations. For now, we'll just arbitrarily choose infection and recovery rates.
To begin with, let's simulate an epidemic on the Karate Club graph:
G = nx.karate_club_graph()
epi = SIR(G)
X = epi.simulate(50, infection_rate = 0.1, recovery_rate = 0.1)
plt.plot(X[:,0], label = "S")
plt.plot(X[:,1], label = "I")
plt.plot(X[:,2], label = "R")
plt.legend()
<matplotlib.legend.Legend at 0x7f95c1087910>
This figure shows the characteristic structure of an SIR model. The I curve is perhaps most interesting, as this shows the number of active cases of disease at any given moment. Generally speaking:
Because this is a small network, we can also take a look at the history of how this disease spread on the network.
layout = nx.fruchterman_reingold_layout(G)
epi.show_network_history(layout)
Here, red nodes were infected early in the process, while blue nodes were infected late in the simulation. Nodes labeled nan
were not infected in this simulation.
We can simulate epidemics on much larger graphs, although this can take some time. Let's go back to the Facebook graph from a previous lecture.
url = "https://raw.githubusercontent.com/PhilChodrow/PIC16B/master/datasets/facebook_combined.txt"
facebook = pd.read_csv(url,
delimiter = " ",
names = ["source", "target"])
G = nx.from_pandas_edgelist(facebook)
epi = SIR(G)
X = epi.simulate(50, infection_rate = 0.05, recovery_rate = 0.05)
plt.plot(X[:,0], label = "S")
plt.plot(X[:,1], label = "I")
plt.plot(X[:,2], label = "R")
plt.legend()
<matplotlib.legend.Legend at 0x7f95c5040f40>
What determines whether or not an epidemic will break out in a population? The answer to this question is Very Complicated, but there is a useful mathematical fairytale we often tell, which can give some useful intuition.
To start, let's suppose that we live in a very simple world in which each Infected individual always infects, on average, $R_0$ other individuals. In such a world, if we start with a single infected individual, then in timestep $t$ there will be approximately $R_0^t$ individuals infected. If $R_0 > 1$ then this corresponds to exponential growth, while if $R_0 < 1$ then this corresponds to exponential decay. $R_0$ is often referred to as the basic reproductive number of the infection, although many modern epidemiologists deny that there really is a single such number that can be measured independent of various external contexts.
Let's estimate $R_0$ in a network in which each node has, on average, 10 neighbors. A simple way to create such a graph is via the nx.erdos_renyi_graph()
function, which creates a completely randomized graph in which every node is equally likely to be connected to any other node.
n = 200
G = nx.erdos_renyi_graph(n, 10/n)
c = np.mean([d for i, d in nx.degree(G)]) # mean degree
c
10.08
Let $\beta$ be the infection rate, and $\nu$ be the recovery rate. Suppose that node $i$ becomes Infected, and most of $i$'s neighbors are still Susceptible. In each timestep, the expected number of neighbors that $i$ will infect is roughly $c\beta$, where $c$ is the mean degree (number of neighbors of $i$). On the other hand, there are in expectation $\frac{1}{\nu}$ timesteps until $i$ recovers. So, a very casual estimate of $R_0$ in this model is
$$R_0 \approx c\frac{\beta}{\nu}$$The condition $R_0 > 1$ is the same as the condition $\beta > \frac{\nu}{c}$. So, we'll do an experiment in which we vary the infection rate, while keeping the recovery rate and mean degree constant. We can then see what happens as we cross the epidemic threshold.
infection_rates = np.linspace(0.0, 0.05, 11)
recovery_rate = 0.1
fig, ax = plt.subplots(1)
threshold = recovery_rate/c
ax.plot([threshold, threshold], [0, 1], label = "Epidemic Threshold")
# critical value of infection rate is around 0.01
for ir in infection_rates:
for i in range(10):
epi = SIR(G)
X = epi.simulate(100, infection_rate = ir, recovery_rate = recovery_rate)
ax.scatter([ir], [2*X[:,1:].mean()], color = "black", alpha = 0.2)
ax.set(xlabel = "Infection Rate", ylabel = "% of population infected")
ax.legend()
<matplotlib.legend.Legend at 0x7f95c11aa0d0>
As we cross the epidemic threshold, the behavior of the system qualitatively changes. Prior to the threshold, close to 0% of the population is Infected. After we cross the threshold, however, there are two possibilities. First, the disease may die out, with essentially no infections. On the other hand, the disease might also take off, infecting a large fraction of the population.
Well, this was a pretty clean set of results. Is epidemiology easy?
Of course it's not! In fact, while the above is a useful set of experiments to help us build our understanding, nothing that we did above is unproblematic:
One model that places special emphasis on the use of network methods in the modeling of epidemics is GLEAM, a multi-institution model operated jointly by Northeastern University and the ISI Foundation in Italy.
One of the nice features of mathematical models they suggests some natural ways to think about interventions, like vaccination, mask-wearing, or social distancing.
While modern models used by practicing epidemiologists are much more complex than what we described here, many of the major ideas are still in use.
For (much) more on mathematical epidemiology, you may be interested in consulting this textbook.