# Herd immunity

Simulations investigating herd immunity with heterogeneous infectiousness

Jonatan Pallesen
05-07-2020

## Main take-aways

• There is no reason to think that the requirement for herd immunity is defined by $$(R_0 - 1) / R_0$$.
• If you consider that some people tend to be around larger groups of people, they will both be infected earlier and, if they become immune, reduce the spread of the disease by a larger amount.
• Thus the minimum possible threshold for herd immunity could plausibly be significantly lower than the claimed 70-80%.
• The path to immunity matters. If infection spreads fast, it is possible to overshoot the herd immunity threshold.
• Thus, the fact that some towns have 60%+ infected does not mean that herd immunity cannot be reached at much lower levels in places with lower momentum.

## Introduction

How does herd immunity work, exactly? The basic theory is that if $$R_0 = 3$$, then the herd immunity threshold is 66%. This means that if immunity in a population is above this threshold, the disease will not spread if reintroduced, while if immunity is below it, it will. The argument is that if $$R_0 = 3$$, then every infectious person infects 3 other people; but if 2 or more of these are immune, then each infected person will infect at most one other.
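The classical threshold is a one-line computation; here is a minimal Python sketch (the function name is mine):

```python
def herd_immunity_threshold(r0):
    """Classical herd immunity threshold: the fraction immune needed so
    that each case infects at most one other person on average."""
    return (r0 - 1) / r0
```

For $$R_0 = 3$$ this gives 2/3, i.e. the 66% quoted above.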

However, I think this argument is based on an inaccurate model, and it is possible that the real herd immunity threshold is significantly lower. I will look at this with some simulations.

I do this in the simplest way with the fewest assumptions:

• A population of 1 million
• SIR model: Each individual is either susceptible, infectious, or immune.
• Each person has an infectivity factor, iif
• In each iteration, each infected person spreads the virus to iif other random people. Following this, they become immune
• Infection targets who are not immune become infected for the next iteration
• The risk of an individual being infected is proportional to that person's iif
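A minimal Python sketch of one iteration of this update rule (the function and array names are my own; the document's own implementation is in R):

```python
import numpy as np

def run_one_iteration(iif, infected, immune, rng):
    """One simulation step: each infected person generates iif infection
    attempts; attempts land on people with probability proportional to
    their iif; people who are hit become infected unless immune, and the
    previously infected become immune."""
    n_attempts = int(round(iif[infected].sum()))
    targets = rng.choice(len(iif), size=n_attempts, replace=True,
                         p=iif / iif.sum())
    immune = immune | infected
    hit = np.zeros(len(iif), dtype=bool)
    hit[targets] = True
    return hit & ~immune, immune
```

Because attempts are sampled with replacement, duplicate hits on the same person count only once, which is one source of the saturation effects discussed below.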

I plot a number of random runs to see the variation.

[Note: These simulations are clearly simplistic. However, I think simple models can show some simple points.]

Imports:


library(pacman)

p_load(tidyverse, magrittr, glue, zeallot, ggeasy, pander)

panderOptions('round', 2)

set.seed(0)


## Standard assumptions

First, let us look at the standard assumption that everyone has the same iif:


n <- 10**6

df <- tibble(id = 1:n, r = 3, infected = F, immune = F)

stat_table <- function(df){
  tribble(~variable, ~value,
    "n", df %>% nrow(),
    "mean", mean(df$r),
    "median", median(df$r),
    "min", min(df$r),
    "max", max(df$r),
    "infections by top 20% most infectious",
      df %>% arrange(-r) %>% head(0.2 * n) %$% sum(r) / sum(df$r)
  )
}

stat_table(df)

| variable | value |
|---|---|
| n | 1e+06 |
| mean | 3 |
| median | 3 |
| min | 3 |
| max | 3 |
| infections by top 20% most infectious | 0.2 |

Here is the development in a population with 70% immunity, in which new infections are introduced:


plot_it <- function(t){
  t %>% rowid_to_column("i") %>%
    pivot_longer(-i) %>%
    mutate(value = value / n) %>%
    ggplot(aes(x = i, y = value, group = name)) +
    geom_line(color = "turquoise4") +
    theme_light() +
    labs(y = "", x = "") +
    scale_y_continuous(labels = scales::percent, limits = c(0, 1))
}

run_one_iteration <- function(df){
  # The sum of infectiousness of the infected
  n_newinf <- round(df %>% filter(infected) %$% sum(r))
  # These new infections hit people with a chance determined by their infectiousness
  newinf <- sample(df$id, n_newinf, replace = T, prob = df$r)
  # After one iteration, the people hit by an infection become infected if they are not immune,
  # and the previously infected people become immune
  df %>% mutate(
    immune = immune | infected,
    infected = id %in% newinf & !immune
  )
}

run_n_iterations <- function(n, df, l){
  for (i in 1:n){
    df <- run_one_iteration(df)
    l <- c(l, df %>% filter(infected | immune) %>% nrow())
  }
  list(df, l)
}

one_run_orig <- function(df, n_iter, n_newinf = 500){
  new_inf <- sample(df %>% filter(!immune) %>% pull(id), n_newinf)
  df %<>% mutate(infected = id %in% new_inf)
  l <- df %>% filter(infected | immune) %>% nrow()
  c(df, l) %<-% run_n_iterations(n_iter, df, l)
  l
}

runit <- function(df){
  n_iter <- 30
  glue("r{1:2}") %>% map_dfc(~tibble(!!.x := one_run_orig(df, n_iter))) %>%
    plot_it()
}

df %>% mutate(immune = id <= 0.7 * n) %>% runit()


We can see that the disease does not start spreading again, and we have herd immunity, as predicted by the theory. Now let's see what happens if we have 50% immunity and infection is reintroduced:


df %>% mutate(immune = id < 0.5 * n) %>% runit()


Here the immunity level is not high enough: infection is reintroduced and continues until the immunity threshold is reached. In fact, it overshoots. The theoretical immunity threshold is 66%, but the runs reach more than 75% infected. This is due to infection momentum: if infection is widespread at one point in time, the number of infected people can exceed the threshold. We can see this even more clearly if we look at the spread in a fully susceptible population:


df %>% runit()


Here we reach infection levels of close to 100%, much higher than the immunity threshold. This is an important mechanism to remember when people say we should aim for herd immunity: the path matters.
## Heterogeneous infectiousness

In the model above, we assume that every person spreads the virus to 3 new people. But this assumption is clearly not correct. Some people meet a lot of other people, shake hands, go to crowded parties or take public transport, while others are at the other end of the spectrum. Crucially, if we take this into account, the herd immunity threshold changes. This is important, since the public debate assumes that 70% or more is required for herd immunity, but these simulations will show that there is good reason to think this may not be the case.

It is of course impossible to know what the real distribution of infectiousness looks like. I will use a gamma distribution with two conditions:

• It follows the Pareto rule: the 20% most infectious are responsible for 80% of infections
• There is a lower bound on iif at $$1/10$$ of the mean, so that we don't have people who are practically never at risk of being infected


mean_r <- 3
cv <- 2.39

r = qgamma(seq(0.5 / n, 1 - 0.5 / n, by = 1 / n),
  sh = 1 / cv^2,
  rate = (1/(mean_r * 0.9)) / cv^2) + mean_r / 10

df <- tibble(id = 1:n, r = r, infected = F, immune = F, r_orig = r)

stat_table(df)


| variable | value |
|---|---|
| n | 1e+06 |
| mean | 3 |
| median | 0.49 |
| min | 0.3 |
| max | 167.1 |
| infections by top 20% most infectious | 0.8 |


ggplot(df, aes(x = r)) +
  geom_histogram(binwidth = 1, fill = "turquoise4", color = "black") +
  theme_classic() +
  labs(x = "Number of people infected by person x")


This looks reasonable to me. Most people infect few others, but there is a tail of highly infectious people. The mean infectiousness is still 3, as before. In the following simulations, people's infectiousness varies, together with their risk of being infected. With this iif distribution, the disease spreads in a fully susceptible population like this:


one_run_base <- function(df, params){
  start_inf <- sample(df %>% filter(!immune) %>% pull(id), params$n_startinf)
  df %<>% mutate(infected = id %in% start_inf)
  l <- df %>% filter(infected | immune) %>% nrow()
  c(df, l) %<-% run_n_iterations(params$end, df, l)
  l
}

params <- tibble(end = 30, n_startinf = 20)

a <- glue("r{1:10}") %>% map_dfc(~tibble(!!.x := one_run_base(df, params)))

a %>% plot_it()


Here we get to the herd immunity threshold at about 55%, which is lower than the predicted 2/3. But note that this is with momentum. If the momentum is slowed, herd immunity could be achieved at even lower percentages, as shown in the plot below. The shaded area is a period where the R rate is reduced to below 1, and the dashed line is the point where 500 new infections are introduced into the population.


one_run <- function(df, params){
  start_inf <- sample(df %>% filter(!immune) %>% pull(id), params$n_startinf)
  df %<>% mutate(infected = id %in% start_inf)
  l <- df %>% filter(infected | immune) %>% nrow()
  c(df, l) %<-% run_n_iterations(params$start_lockdown, df, l)
  df %<>% mutate(r = r_orig / 5)
  c(df, l) %<-% run_n_iterations(params$end_lockdown - params$start_lockdown, df, l)
  df %<>% mutate(r = r_orig)
  c(df, l) %<-% run_n_iterations(params$reinfection - params$end_lockdown, df, l)
  new_inf <- sample(df %>% filter(!immune) %>% pull(id), 500)
  df %<>% mutate(infected = id %in% new_inf)
  c(df, l) %<-% run_n_iterations(params$end - params$reinfection, df, l)
  l
}

params <- tibble(start_lockdown = 4, end_lockdown = 8, reinfection = 20, end = 30, n_startinf = 20)

b <- glue("r{1:10}") %>% map_dfc(~tibble(!!.x := one_run(df, params)))

b %>% plot_it() +
  annotate("rect", fill = "turquoise4", alpha = 0.1,
           xmin = params$start_lockdown, xmax = params$end_lockdown,
           ymin = -Inf, ymax = Inf) +
  geom_vline(xintercept = params$reinfection, linetype = "dashed", color = "lightcoral")
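As a cross-check on the iif distribution, here is a Python sketch drawing from the same shifted gamma (same cv = 2.39, mean 3, and floor at 1/10 of the mean; sampled values will differ slightly from the quantile-based R construction):

```python
import numpy as np

rng = np.random.default_rng(0)
mean_r, cv, n = 3.0, 2.39, 1_000_000

# Gamma with mean 0.9 * mean_r and coefficient of variation cv,
# shifted up by mean_r / 10 so nobody is essentially risk-free.
shape = 1 / cv**2
scale = (mean_r * 0.9) / shape
r = rng.gamma(shape, scale, size=n) + mean_r / 10

# Share of total infectiousness carried by the 20% most infectious;
# the mean should be ~3 and the share close to 0.8, as in the table above.
top20_share = np.sort(r)[::-1][: n // 5].sum() / r.sum()
```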


It thus seems that there is no good reason to expect a herd immunity threshold at the level defined by $$(R_0 - 1) / R_0$$, when a slightly more realistic model can have a much lower threshold.

Intuitively, the reason that a model with more realistic heterogeneity has a lower herd immunity threshold is that the largest spreaders also tend, on average, to be infected earlier. And if a large portion of the people with the largest iif are immune, herd immunity can be reached with fewer people immune in total.
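This intuition can be quantified in a back-of-the-envelope Python sketch (my construction, not code from the post). Under proportional-to-iif targeting, the expected number of onward attempts produced per infection attempt is the iif-weighted mean of the target's own iif over susceptibles; immunizing a fraction of the population weighted by iif lowers this far more than immunizing the same fraction at random:

```python
import numpy as np

rng = np.random.default_rng(0)
n, mean_r, cv = 200_000, 3.0, 2.39
shape = 1 / cv**2
iif = rng.gamma(shape, (mean_r * 0.9) / shape, size=n) + mean_r / 10

def growth_factor(iif, immune):
    """Expected onward attempts per attempt: an attempt hits person k with
    probability iif_k / sum(iif); if k is susceptible, k later makes
    iif_k attempts of their own."""
    p = iif / iif.sum()
    return (p * iif * ~immune).sum()

frac = 0.4
random_immune = np.zeros(n, dtype=bool)
random_immune[rng.choice(n, int(frac * n), replace=False)] = True

# Immunity acquired through infection is biased toward high-iif people
weighted_immune = np.zeros(n, dtype=bool)
weighted_immune[rng.choice(n, int(frac * n), replace=False,
                           p=iif / iif.sum())] = True

g_rand = growth_factor(iif, random_immune)
g_wt = growth_factor(iif, weighted_immune)
```

With 40% immune in each scenario, the iif-weighted immune set suppresses transmission much more than the random one, which is exactly why infection-acquired immunity reaches the threshold sooner.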

Many say that the only point of lockdowns is to lower the pressure on hospitals. But these models indicate the possibility that, if lockdowns are timed right, they could also significantly lower the total number of people infected.


a %>% rowid_to_column("i") %>%
  pivot_longer(-i) %>%
  mutate(momentum = "full") %>%
  bind_rows(
    b %>% rowid_to_column("i") %>%
      pivot_longer(-i) %>%
      mutate(momentum = "lowered")
  ) %>%
  mutate(value = value / n) %>%
  ggplot(aes(x = i, y = value, group = interaction(name, momentum), color = momentum)) +
  geom_line() +
  theme_light() +
  labs(y = "", x = "") +
  scale_y_continuous(labels = scales::percent, limits = c(0, 1))


Some people argue that the threshold cannot be less than 60%, since some Italian cities have at least that level of infection. However, we can see here that it depends on the momentum: the same model with the same parameters can end up at 55% vs 30%, depending on whether the momentum is lowered.

The timing of lockdowns (or other initiatives that reduce the mean R) is crucial. If a lockdown ends too early, it is possible to overshoot the minimum required herd immunity threshold:


one_run <- function(df, params){
  start_inf <- sample(df %>% filter(!immune) %>% pull(id), params$n_startinf)
  df %<>% mutate(infected = id %in% start_inf)
  l <- df %>% filter(infected | immune) %>% nrow()
  c(df, l) %<-% run_n_iterations(params$start_lockdown, df, l)
  df %<>% mutate(r = r_orig / 5)
  c(df, l) %<-% run_n_iterations(params$end_lockdown - params$start_lockdown, df, l)
  df %<>% mutate(r = r_orig)
  c(df, l) %<-% run_n_iterations(params$end - params$end_lockdown, df, l)
  l
}

params <- tibble(start_lockdown = 2, end_lockdown = 6, end = 30, n_startinf = 20)

a <- glue("r{1:10}") %>% map_dfc(~tibble(!!.x := one_run(df, params)))

a %>% plot_it() +
  annotate("rect", fill = "turquoise4", alpha = 0.1,
           xmin = params$start_lockdown, xmax = params$end_lockdown,
           ymin = -Inf, ymax = Inf)


And the same is the case if the lockdown starts too late:


one_run <- function(df, params){
  start_inf <- sample(df %>% filter(!immune) %>% pull(id), params$n_startinf)
  df %<>% mutate(infected = id %in% start_inf)
  l <- df %>% filter(infected | immune) %>% nrow()
  c(df, l) %<-% run_n_iterations(params$start_lockdown, df, l)
  df %<>% mutate(r = r_orig / 5)
  c(df, l) %<-% run_n_iterations(params$end_lockdown - params$start_lockdown, df, l)
  df %<>% mutate(r = r_orig)
  c(df, l) %<-% run_n_iterations(params$end - params$end_lockdown, df, l)
  l
}

params <- tibble(start_lockdown = 5, end_lockdown = 10, end = 30, n_startinf = 25)

a <- glue("r{1:10}") %>% map_dfc(~tibble(!!.x := one_run(df, params)))

a %>% plot_it() +
  annotate("rect", fill = "turquoise4", alpha = 0.1,
           xmin = params$start_lockdown, xmax = params$end_lockdown,
           ymin = -Inf, ymax = Inf)


## The Swine flu

In the Swine flu pandemic, it was estimated that $$R_0 = 1.5$$ and that 24% were eventually infected. How does this fit with the model here?

The claim is that this fits with a herd immunity threshold of around $$(R_0 - 1) / R_0$$, since $$(1.5 - 1) / 1.5$$ = 33%, which is reasonably close to 24%. (And there may have been some lingering immunity left from the 1968 pandemic, which would further lower the herd immunity threshold.)

But if we assume a model where everyone has the same iif, momentum would actually carry us far past 24%.
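The overshoot under homogeneous mixing can also be seen from the classical final-size relation for an SIR model, $$z = 1 - e^{-R_0 z}$$, solved here by fixed-point iteration in a short Python sketch (mine, not from the post):

```python
from math import exp

def final_size(r0, iters=200):
    """Fixed-point iteration for the homogeneous-SIR final size z,
    the solution of z = 1 - exp(-r0 * z)."""
    z = 0.5
    for _ in range(iters):
        z = 1 - exp(-r0 * z)
    return z
```

For $$R_0 = 1.5$$ this gives roughly 58% eventually infected, far above the observed 24%.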


n <- 10**6

df_std <- tibble(id = 1:n, r = 1.5, infected = F, immune = F)



df_std %>% runit()


In contrast, the model with heterogeneous iif fits slightly better:


mean_r <- 1.5

cv <- 2.39

r = qgamma(seq(0.5 / n, 1 - 0.5 / n, by = 1 / n),
sh = 1 / cv^2,
rate = (1/(mean_r * 0.9)) / cv^2) + mean_r / 10

df_het <- tibble(id = 1:n, r = r, infected = F, immune = F, r_orig = r)

stat_table(df_het)

| variable | value |
|---|---|
| n | 1e+06 |
| mean | 1.5 |
| median | 0.25 |
| min | 0.15 |
| max | 83.52 |
| infections by top 20% most infectious | 0.8 |


df_het %>% runit()


Thus, while it is true that the level ended up close to that indicated by $$(R_0 - 1) / R_0$$, this is not because that estimate is correct. Rather, $$(R_0 - 1) / R_0$$ gives too high an estimate of the necessary herd immunity threshold, while ignoring momentum gives too low an estimate, and the two errors roughly cancel out.