Breakthrough infections and Bayes' Theorem

Application of conditional probabilities to vaccine efficiencies.

  • author: Konrad Wölms
  • categories: [python, jupyter, covid, bayes]

Vaccine efficiencies and breakthrough infections

As part of the vaccination efforts during the Covid pandemic, vaccine efficiency and breakthrough infections have gotten a lot of attention. In this post we'll outline how Bayes' theorem can help to understand these concepts and how they relate to each other. For that purpose we'll first cast the concepts into a probabilistic framework with the following definitions:

  • $I$ event of somebody being infected
  • $V$ event of somebody being vaccinated
  • $e$ efficiency of vaccination
  • $P(I| \text{not } V)$ probability of a non-vaccinated person getting infected
  • $P(I|V)$ probability of a vaccinated person getting infected
  • $P(I|V) = (1-e)P(I|\text{not }V)$ relationship between vaccination efficiency and probabilities
  • $P(V|I)$ probability of an infected person being vaccinated, i.e. the chance that an infection is a breakthrough infection

With these definitions in place, all we need is Bayes' theorem.

Bayes' Theorem

$$ P(A|B) = \frac {P(B|A)P(A)} {P(B|A)P(A) + P(B| \text{not }A)P(\text{not }A)} $$

We can now apply this to breakthrough infections. For this purpose we set $A = V$ and $B = I$. That way we get

$$ P(V|I) = \frac {P(I|V)P(V)} {P(I|V)P(V) + P(I| \text{not }V)P(\text{not }V)} .$$

Plugging in the above definitions we can write this as

$$\begin{aligned} P(V|I) &= \frac {(1-e)P(I|\text{not }V)P(V)} {(1-e)P(I|\text{not }V)P(V) + P(I| \text{not }V)P(\text{not }V)} \\ &= \frac {(1-e)P(V)} {(1-e)P(V) + P(\text{not }V)} \\ &= \frac {(1-e)q} {(1-e)q + (1-q)} \end{aligned}.$$

Here we set $P(V) = q$, the probability of a person being vaccinated, which can be set to the percentage of people in a population being vaccinated.
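
As a small numerical check of the simplification step, the following sketch evaluates $P(V|I)$ once with the baseline risk $P(I|\text{not }V)$ kept explicit and once with the simplified formula. The function names and all numbers (90% efficiency, 80% vaccinated, the two baseline risks) are hypothetical values chosen for illustration, not data.

```python
def breakthrough_share_full(e, q, p_inf_unvacc):
    """P(V|I) from Bayes' theorem, with the baseline risk P(I|not V) kept explicit."""
    p_inf_vacc = (1 - e) * p_inf_unvacc  # P(I|V) = (1 - e) * P(I|not V)
    return p_inf_vacc * q / (p_inf_vacc * q + p_inf_unvacc * (1 - q))


def breakthrough_share(e, q):
    """Simplified form in which P(I|not V) has cancelled."""
    return (1 - e) * q / ((1 - e) * q + (1 - q))


# Hypothetical values: 90% efficiency, 80% vaccinated, two different baseline risks.
for p_inf_unvacc in (0.01, 0.2):
    print(breakthrough_share_full(0.9, 0.8, p_inf_unvacc), breakthrough_share(0.9, 0.8))
```

Both columns should agree up to floating point error, which reflects that the baseline infection risk drops out of the result.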

The above formula can be used to estimate the average efficiency of vaccinations given the percentage of people that were vaccinated and the percentage of breakthrough infections. To this end we replace the probability of a breakthrough infection with the percentage of observed breakthrough infections and for brevity call that variable $b = P(V|I)$. Plugging this into the above equation and solving for $e$ yields

$$e = 1 - \frac {(1-q)b} {(1-b)q} $$
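
For readers who want the intermediate steps, one way to arrive at this result (a sketch using only the symbols $e$, $q$ and $b$ defined above, and assuming $q > 0$ and $b < 1$) is:

$$\begin{aligned} b &= \frac {(1-e)q} {(1-e)q + (1-q)} \\ b\,(1-e)q + b\,(1-q) &= (1-e)q \\ b\,(1-q) &= (1-e)q\,(1-b) \\ 1-e &= \frac {(1-q)b} {(1-b)q} \end{aligned}$$
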
In [1]:
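def avg_vaccine_efficiency(pct_vaccinated, pct_breakthroughs):
    """Average vaccine efficiency e in percent; both inputs are percentages."""
    # e = 1 - (1 - q) * b / ((1 - b) * q), with q and b given in percent
    return 100 * (
        1
        - (1 - pct_vaccinated / 100)
        * pct_breakthroughs
        / ((1 - pct_breakthroughs / 100) * pct_vaccinated)
    )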
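
A simple sanity check of this function, assuming a hypothetical vaccine with 90% efficiency and an 80% vaccination rate: computing the expected breakthrough percentage from the formula for $P(V|I)$ and feeding it back in should recover the efficiency. The function is repeated so that the snippet runs on its own; the scenario values are illustrative assumptions, not data.

```python
def avg_vaccine_efficiency(pct_vaccinated, pct_breakthroughs):
    return 100 * (
        1
        - (1 - pct_vaccinated / 100)
        * pct_breakthroughs
        / ((1 - pct_breakthroughs / 100) * pct_vaccinated)
    )


# Hypothetical scenario (as fractions): efficiency e and vaccinated share q.
e, q = 0.9, 0.8

# Formula for b = P(V|I), expressed as a percentage.
pct_breakthroughs = 100 * (1 - e) * q / ((1 - e) * q + (1 - q))

recovered = avg_vaccine_efficiency(100 * q, pct_breakthroughs)
print(round(recovered, 6))  # should be very close to 90
```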

Warning: This is not a sophisticated epidemiological model but simply serves to illustrate the relationship to Bayes' theorem (which will also be important for more realistic models). This model should only be used to build some rough intuition about the relationship and by itself is not sufficient for decision making.

Some limitations of the presented model:

  • unreported cases of infections and breakthroughs
  • fraudulent vaccinations
  • multiple infection scenarios
  • different virus variants throughout time
  • ...

Simple interpretation

One direct observation about this formula is its symmetry: the numerator $(1-q)b$ turns into the denominator $(1-b)q$ when $q$ and $b$ are swapped. This implies that when $q=b$ the fraction equals 1 and the efficiency of the vaccination is 0! This makes intuitive sense: when the chance of an infected person being vaccinated is the same as the chance of an arbitrary person being vaccinated, vaccinations don't seem to help.

This also means that as long as the share of breakthrough infections is below the vaccination rate, the estimated efficiency is positive, i.e. the vaccines have a real effect!
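
The sign behaviour can be checked numerically. The sketch below assumes a hypothetical vaccination rate of 60% and evaluates the formula for breakthrough shares below, equal to, and above that rate; the function is repeated so that the snippet is self-contained.

```python
def avg_vaccine_efficiency(pct_vaccinated, pct_breakthroughs):
    return 100 * (
        1
        - (1 - pct_vaccinated / 100)
        * pct_breakthroughs
        / ((1 - pct_breakthroughs / 100) * pct_vaccinated)
    )


pct_vaccinated = 60  # hypothetical vaccination rate in percent

# Breakthrough shares below, equal to, and above the vaccination rate.
for pct_breakthroughs in (30, 60, 80):
    efficiency = avg_vaccine_efficiency(pct_vaccinated, pct_breakthroughs)
    print(pct_breakthroughs, round(efficiency, 1))
```

The estimate is positive for 30%, zero for 60% and negative for 80%. A negative value would mean that, within this simple model, vaccinated people are infected more often than unvaccinated ones.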

Application to German data

The German RKI published data (page 21) on breakthrough infections and percentages of vaccinated people at the beginning of November 2021.

In [2]:
# hide

import numpy as np
import pandas as pd

index = [
    "percentage breakthrough infections",
    "percentage of the population that is fully vaccinated",
]
columns = pd.MultiIndex.from_tuples(
    [
        ("12-17", "05-43"),
        ("12-17", "40-43"),
        ("18-59", "05-43"),
        ("18-59", "40-43"),
        (">= 60", "05-40"),
        (">= 60", "40-43"),
    ],
    names=["age", "calendar weeks (2021)"],
)

rki_data = pd.DataFrame(
    # each row needs six values; the vaccination rate is only known for the 05-43 period
    [
        (1.5, 4.2, 12.4, 39.7, 18.9, 60.5),
        (38.7, np.nan, 71.9, np.nan, 84.9, np.nan),
    ],
    index=index,
    columns=columns,
)
In [3]:
rki_data
Out[3]:
age                                                      12-17         18-59         >= 60
calendar weeks (2021)                                    05-43  40-43  05-43  40-43  05-40  40-43
percentage breakthrough infections                         1.5    4.2   12.4   39.7   18.9   60.5
percentage of the population that is fully vaccinated     38.7    NaN   71.9    NaN   84.9    NaN

As the vaccination numbers were mostly stagnating later during 2021, we'll use the vaccination percentages from weeks 05-43 to approximate the ones from weeks 40-43 (which might be a little higher). We then use the formula derived above to calculate the average vaccine efficiency.

In [9]:
(
    # fill missing vaccination rates from the preceding row (same age group)
    rki_data.T.ffill()
    .assign(
        average_vaccine_effectiveness_pct=lambda df: avg_vaccine_efficiency(
            df["percentage of the population that is fully vaccinated"],
            df["percentage breakthrough infections"],
        )
    )
    .round(1)
    .T
)
Out[9]:
age                                                      12-17         18-59         >= 60
calendar weeks (2021)                                    05-43  40-43  05-43  40-43  05-40  40-43
percentage breakthrough infections                         1.5    4.2   12.4   39.7   18.9   60.5
percentage of the population that is fully vaccinated     38.7   38.7   71.9   71.9   84.9   84.9
average_vaccine_effectiveness_pct                         97.6   93.1   94.5   74.3   95.9   72.8

One can see that when the vaccinated share of the population is high, a high share of breakthrough infections does not contradict a high efficiency. The reason why many people don't find this intuitive is that non-linear relationships are not very intuitive in general. As such it helps to visualize the relationship.
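
To make this concrete, the sketch below keeps the efficiency fixed at a hypothetical 90% and computes the expected share of breakthrough infections for several vaccination rates, using the formula for $P(V|I)$ derived earlier. The helper name `expected_breakthrough_pct` and the chosen rates are illustrative assumptions.

```python
def expected_breakthrough_pct(pct_efficiency, pct_vaccinated):
    """Expected percentage of infections that occur among vaccinated people."""
    e = pct_efficiency / 100
    q = pct_vaccinated / 100
    return 100 * (1 - e) * q / ((1 - e) * q + (1 - q))


# Hypothetical 90% efficiency, evaluated at increasing vaccination rates.
for pct_vaccinated in (50, 70, 90, 95):
    b = expected_breakthrough_pct(90, pct_vaccinated)
    print(f"{pct_vaccinated}% vaccinated -> {b:.1f}% breakthrough infections")
```

With the same 90% efficiency, the breakthrough share rises from about 9% at a 50% vaccination rate to about 66% at a 95% vaccination rate.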

The next graphic shows the dependency between the percentage of breakthrough infections and the effectiveness of the vaccination for various overall percentages of vaccination.

Plots of the breakthrough/efficiency relationship

In [5]:
# hide
def add_rate(data, rate):
    new_data = data.copy()
    column_name = f"{rate}"
    new_data[column_name] = new_data["pct breakthroughs"].map(
        lambda x: avg_vaccine_efficiency(rate, x) if x <= rate else np.nan
    )
    return new_data


def add_rates(data, rates):
    new_data = data.copy()
    for rate in rates:
        new_data = new_data.pipe(add_rate, rate=rate)
    return new_data


def melt_rates(data):
    rate_cols = [col for col in data.columns if col != "pct breakthroughs"]
    return data.melt(  # use the argument, not the global `x`
        id_vars=["pct breakthroughs"],
        value_vars=rate_cols,
        value_name="vaccine efficiency",
        var_name="pct vaccinated",
    )


bt = np.linspace(1, 99, 4 * 98 + 1)
x = pd.DataFrame(bt, columns=["pct breakthroughs"]).pipe(
    add_rates, rates=[10, 20, 30, 40, 50, 60, 70, 80, 85, 90, 95]
)
In [7]:
# hide-input
import altair as alt

alt.Chart(x.pipe(melt_rates)).mark_line().encode(
    x="pct breakthroughs",
    y="vaccine efficiency",
    color="pct vaccinated",
).properties(width=600, height=300)
Out[7]:

We can clearly see that the relationship becomes more non-linear the higher the percentage of vaccinated people. As explained in our earlier interpretation, the vaccination rate can easily be identified by looking at where a particular curve crosses 0.
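
One way to quantify this non-linearity (a sketch that follows from the formula for $e$ above, treating $q$ as fixed) is to differentiate with respect to $b$:

$$ \frac {\partial e} {\partial b} = -\frac {1-q} {q} \cdot \frac {1} {(1-b)^2} $$

For small $b$ the slope is close to $-\frac{1-q}{q}$, which is nearly flat when $q$ is close to 1. At the zero crossing $b = q$ the slope equals $-\frac{1}{q(1-q)}$, which grows in magnitude as $q$ approaches 1. The curves for high vaccination rates therefore start out flat and then drop steeply towards zero.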

Summary

We illustrated how Bayes' theorem can shed some light on the relationship between breakthrough infections and vaccine efficiency.

It is important to realize that this relationship becomes highly non-linear at high vaccination rates, even for effective vaccines.
