Application of conditional probabilities to vaccine efficiencies.

- toc: true
- branch: master
- badges: true
- comments: true
- author: Konrad Wölms
- categories: [python, jupyter, covid, bayes]

As part of the vaccination efforts during the Covid pandemic, vaccine efficiency and breakthrough infections have received a lot of attention. In this post we'll outline how Bayes' theorem helps to understand these concepts and how they relate. To that end, we first cast them into a probabilistic framework with the following definitions:

- $I$ event of somebody being infected
- $V$ event of somebody being vaccinated
- $e$ efficiency of vaccination
- $P(I| \text{not } V)$ probability of a non-vaccinated person getting infected
- $P(I|V)$ probability of a vaccinated person getting infected
- $P(I|V) = (1-e)P(I|\text{not }V)$ relationship between vaccination efficiency and probabilities
- $P(V|I)$ probability of an infected person being vaccinated, i.e. chance of a breakthrough infection

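With these definitions in place, all we need is Bayes' theorem. For two events $A$ and $B$ with $P(B) > 0$ it states

$$ P(A|B) = \frac {P(B|A)P(A)} {P(B)} = \frac {P(B|A)P(A)} {P(B|A)P(A) + P(B| \text{not }A)P(\text{not }A)} .$$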

We can now apply this to breakthrough infections. For this purpose we set $A = V$ and $B = I$. That way we get

$$ P(V|I) = \frac {P(I|V)P(V)} {P(I|V)P(V) + P(I| \text{not }V)P(\text{not }V)} .$$

Plugging in the above definitions, we can write this as

$$\begin{aligned} P(V|I) &= \frac {(1-e)P(I|\text{not }V)P(V)} {(1-e)P(I|\text{not }V)P(V) + P(I| \text{not }V)P(\text{not }V)} \\ &= \frac {(1-e)P(V)} {(1-e)P(V) + P(\text{not }V)} \\ &= \frac {(1-e)q} {(1-e)q + (1-q)} . \end{aligned}$$

Here we set $P(V) = q$, the probability of a person being vaccinated, which can be estimated by the share of the population that is vaccinated.
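The cancellation of $P(I|\text{not }V)$ in the derivation above can be checked numerically. The following is a minimal sketch, not part of the original notebook: the function names and the example values for $e$, $q$ and the infection risk are assumptions chosen purely for illustration.

```python
def breakthrough_share_full(e, q, risk_unvaccinated):
    """P(V|I) from the full Bayes formula.

    e: vaccine efficiency, q: P(V), risk_unvaccinated: P(I | not V),
    all given as fractions between 0 and 1 (illustrative inputs).
    """
    risk_vaccinated = (1 - e) * risk_unvaccinated        # P(I|V)
    numerator = risk_vaccinated * q                      # P(I|V) P(V)
    evidence = numerator + risk_unvaccinated * (1 - q)   # P(I)
    return numerator / evidence


def breakthrough_share(e, q):
    """P(V|I) from the simplified formula, without P(I | not V)."""
    return (1 - e) * q / ((1 - e) * q + (1 - q))


# The result should not depend on the assumed infection risk.
for risk in (0.01, 0.05, 0.4):
    print(risk, breakthrough_share_full(0.9, 0.8, risk), breakthrough_share(0.9, 0.8))
```

Whatever value is assumed for the infection risk of unvaccinated people, both functions agree up to floating-point rounding, which is why the final formula only depends on $e$ and $q$.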

The above formula can be used to estimate the average efficiency of vaccinations given the percentage of people who were vaccinated and the percentage of breakthrough infections. To this end we replace the probability of a breakthrough infection with the observed percentage of breakthrough infections and, for brevity, call that variable $b = P(V|I)$. Plugging this into the above equation and solving for $e$ yields

$$e = 1 - \frac {(1-q)b} {(1-b)q} $$
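For readers who want to check the algebra, the intermediate steps can be written out as follows, starting from $b = \frac{(1-e)q}{(1-e)q + (1-q)}$ and assuming $0 < q < 1$ and $b < 1$ so that no division by zero occurs:

$$\begin{aligned} b \left[ (1-e)q + (1-q) \right] &= (1-e)q \\ b(1-q) &= (1-e)q(1-b) \\ 1-e &= \frac {(1-q)b} {(1-b)q} \end{aligned}$$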

In [1]:

```
def avg_vaccine_efficiency(pct_vaccinated, pct_breakthroughs):
    """Return e = 1 - (1 - q) b / ((1 - b) q) in percent; inputs are percentages."""
    return 100 * (
        1
        - (1 - pct_vaccinated / 100)
        * pct_breakthroughs
        / ((1 - pct_breakthroughs / 100) * pct_vaccinated)
    )
```

Warning: This is not a sophisticated epidemiological model; it simply serves to illustrate the relationship to Bayes' theorem (which will also be important for more realistic models). It should only be used to build some rough intuition about the relationship and is not, by itself, sufficient for decision making.

Some limitations of the presented model:

- unreported cases of infections and breakthroughs
- fraudulent vaccinations
- multiple infection scenarios
- different virus variants throughout time
- ...

One direct observation about this formula is the symmetry between $q$ and $b$ in the numerator and denominator: swapping them turns the fraction into its reciprocal, so for $q=b$ the fraction equals 1 and the effectiveness of the vaccination is 0! This makes intuitive sense: when the chance of an infected person being vaccinated is the same as the chance of an arbitrary person being vaccinated, vaccinations don't seem to help.

This also means that as long as the rate of breakthrough infections is below the vaccination rate, the estimated efficiency is positive, i.e. the vaccines have a real effect!
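The three regimes ($b < q$, $b = q$, $b > q$) can be made concrete with a few numbers. This is a small sketch under stated assumptions: `efficiency_from_shares` is a hypothetical helper that takes fractions rather than percentages, and the values of $q$ and $b$ are made up for illustration.

```python
def efficiency_from_shares(q, b):
    """e = 1 - (1 - q) b / ((1 - b) q), with q and b as fractions in (0, 1)."""
    return 1 - (1 - q) * b / ((1 - b) * q)


q = 0.7  # assumed share of the population that is vaccinated
for b in (0.3, 0.7, 0.9):  # assumed shares of breakthrough infections
    e = efficiency_from_shares(q, b)
    print(f"q={q:.1f}, b={b:.1f} -> e={e:+.3f}")
```

For $b < q$ the estimate is positive, for $b = q$ it is zero, and for $b > q$ it turns negative, which in this simple model would mean that vaccinated people are infected more often than unvaccinated ones.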

The German Robert Koch Institute (RKI) published data (page 21) on breakthrough infections and vaccination rates at the beginning of November 2021.

In [2]:

```
# hide
import numpy as np
import pandas as pd

index = [
    "percentage breakthrough infections",
    "percentage of the population that is fully vaccinated",
]
columns = pd.MultiIndex.from_tuples(
    [
        ("12-17", "05-43"),
        ("12-17", "40-43"),
        ("18-59", "05-43"),
        ("18-59", "40-43"),
        (">= 60", "05-40"),
        (">= 60", "40-43"),
    ],
    names=["age", "calendar weeks (2021)"],
)
# Each row needs one value per column; vaccination rates are only
# given once per age group, so the second period is left as NaN.
rki_data = pd.DataFrame(
    [
        (1.5, 4.2, 12.4, 39.7, 18.9, 60.5),
        (38.7, np.nan, 71.9, np.nan, 84.9, np.nan),
    ],
    index=index,
    columns=columns,
)
```

In [3]:

```
rki_data
```

Out[3]:

age | 12-17 | 12-17 | 18-59 | 18-59 | >= 60 | >= 60 |
---|---|---|---|---|---|---|
calendar weeks (2021) | 05-43 | 40-43 | 05-43 | 40-43 | 05-40 | 40-43 |
percentage breakthrough infections | 1.5 | 4.2 | 12.4 | 39.7 | 18.9 | 60.5 |
percentage of the population that is fully vaccinated | 38.7 | NaN | 71.9 | NaN | 84.9 | NaN |

In [9]:

```
(
    # Reuse each age group's vaccination rate for its second period
    rki_data.T.ffill()
    .assign(
        average_vaccine_effectiveness_pct=lambda df: avg_vaccine_efficiency(
            df["percentage of the population that is fully vaccinated"],
            df["percentage breakthrough infections"],
        )
    )
    .round(1)
    .T
)
```

Out[9]:

age | 12-17 | 12-17 | 18-59 | 18-59 | >= 60 | >= 60 |
---|---|---|---|---|---|---|
calendar weeks (2021) | 05-43 | 40-43 | 05-43 | 40-43 | 05-40 | 40-43 |
percentage breakthrough infections | 1.5 | 4.2 | 12.4 | 39.7 | 18.9 | 60.5 |
percentage of the population that is fully vaccinated | 38.7 | 38.7 | 71.9 | 71.9 | 84.9 | 84.9 |
average_vaccine_effectiveness_pct | 97.6 | 93.1 | 94.5 | 74.3 | 95.9 | 72.8 |

One can see that when the vaccinated share of the population is high, a large share of breakthrough infections does not contradict a high efficiency. Many people find this counterintuitive because non-linear relationships are hard to grasp in general. As such it helps to visualize the relationship.
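Counting expected infections in a made-up population can also help build this intuition. The sketch below is only an illustration: the population size, the efficiency of 90% and the infection risk of 10% for unvaccinated people are assumed values, and `expected_infections` is a hypothetical helper that is not part of the analysis above.

```python
def expected_infections(population, q, e, risk_unvaccinated):
    """Expected infection counts (vaccinated, unvaccinated) in a population.

    q: vaccinated share, e: vaccine efficiency, risk_unvaccinated: P(I | not V),
    all as fractions between 0 and 1 (illustrative inputs).
    """
    vaccinated = population * q
    unvaccinated = population * (1 - q)
    infected_vaccinated = vaccinated * (1 - e) * risk_unvaccinated
    infected_unvaccinated = unvaccinated * risk_unvaccinated
    return infected_vaccinated, infected_unvaccinated


for q in (0.5, 0.9, 0.99):
    vacc, unvacc = expected_infections(100_000, q, e=0.9, risk_unvaccinated=0.1)
    share = vacc / (vacc + unvacc)
    print(
        f"vaccinated {q:.0%}: {vacc:.0f} vs {unvacc:.0f} infections, "
        f"breakthrough share {share:.1%}"
    )
```

With the same assumed efficiency of 90%, the breakthrough share rises from roughly 9% to roughly 47% and then to roughly 91% simply because the group of vaccinated people becomes much larger than the group of unvaccinated people.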

The next graphic shows the dependency between the percentage of breakthrough infections and the effectiveness of the vaccination for various overall percentages of vaccination.

In [5]:

```
# hide
def add_rate(data, rate):
    """Add an efficiency column for one vaccination rate (in percent)."""
    new_data = data.copy()
    column_name = f"{rate}"
    # Efficiencies are only kept where breakthroughs <= vaccination rate
    new_data[column_name] = new_data["pct breakthroughs"].map(
        lambda x: avg_vaccine_efficiency(rate, x) if x <= rate else np.nan
    )
    return new_data


def add_rates(data, rates):
    """Add one efficiency column per vaccination rate."""
    new_data = data.copy()
    for rate in rates:
        new_data = new_data.pipe(add_rate, rate=rate)
    return new_data


def melt_rates(data):
    """Reshape to long format for plotting."""
    rate_cols = [col for col in data.columns if col != "pct breakthroughs"]
    # Use the function argument here, not the global DataFrame `x`
    return data.melt(
        id_vars=["pct breakthroughs"],
        value_vars=rate_cols,
        value_name="vaccine efficiency",
        var_name="pct vaccinated",
    )


bt = np.linspace(1, 99, 4 * 98 + 1)
x = pd.DataFrame(bt, columns=["pct breakthroughs"]).pipe(
    add_rates, rates=[10, 20, 30, 40, 50, 60, 70, 80, 85, 90, 95]
)
```

In [7]:

```
# hide-input
import altair as alt
alt.Chart(x.pipe(melt_rates)).mark_line().encode(
x="pct breakthroughs",
y="vaccine efficiency",
color="pct vaccinated",
).properties(width=600, height=300)
```

Out[7]:

We illustrated how Bayes' theorem can shed some light on the relationship between breakthrough infections and vaccine efficiency.

It is important to realize that this relationship becomes highly non-linear at high vaccination rates, even for effective vaccines.
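One way to make the non-linearity explicit, as a side calculation based on the formula $e = 1 - \frac{(1-q)b}{(1-b)q}$, is to look at how quickly the estimated efficiency changes with the share of breakthrough infections at a fixed vaccination rate $q$:

$$ \frac{\partial e}{\partial b} = - \frac {1-q} {q\,(1-b)^2} $$

For high $q$ the factor $\frac{1-q}{q}$ is small, so the curve is nearly flat for small and moderate $b$, while the factor $\frac{1}{(1-b)^2}$ makes it drop steeply as $b$ approaches 1.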
