Modeling without data is like riding a bicycle while blindfolded – rarely dull, but often you don't get to where you want to go. This tutorial shows how to use data with Covasim, and gives a brief introduction to people, populations, and contact layers.
Covasim is intentionally designed to be flexible with data requirements, acknowledging that some settings have large amounts of data, while others have very little. There are, however, some minimum data requirements if a real-world context is being modeled (as opposed to a theoretical exploration). These are:
In addition to these essential data requirements, several other pieces of data are useful to have. These are:
test_num intervention (see Tutorial 5), and is very useful for interpreting diagnoses data.
Covasim includes pre-downloaded data on country (and US state) age distributions and household size distributions. As we saw in Tutorial 1, you can load these data simply by using the location parameter. You can show a list of all available locations with cv.data.show_locations(). The data themselves are simply a set of dictionaries, and these can be modified directly; for example, adding a custom age distribution for Johannesburg would look like this:
import covasim as cv

# Note data format and key names!
joburg_pop = {
    '0-9': 286620,
    '10-19': 277020,
    '20-29': 212889,
    '30-39': 161329,
    '40-49': 104399,
    '50-59': 51716,
    '60-69': 36524,
    '70-79': 22581,
    '80+': 7086,
}
cv.data.country_age_data.data['Johannesburg'] = joburg_pop
You can then use these data via sim = cv.Sim(location='Johannesburg').
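Before registering a custom age distribution, it can help to check what share of the population falls in each bracket. The following is a minimal sketch in plain Python that does not call Covasim at all; the helper name age_fractions is hypothetical, and it assumes only the key format shown above ('0-9', '10-19', ..., '80+').

```python
def age_fractions(pop):
    """Convert {'0-9': count, ..., '80+': count} into (min_age, max_age, fraction) rows."""
    total = sum(pop.values())
    rows = []
    for label, count in pop.items():
        if label.endswith('+'):
            lo, hi = int(label[:-1]), None  # Open-ended top bracket
        else:
            lo, hi = (int(part) for part in label.split('-'))
        rows.append((lo, hi, count / total))
    return rows

# Same counts as the Johannesburg example above
joburg_pop = {
    '0-9': 286620,
    '10-19': 277020,
    '20-29': 212889,
    '30-39': 161329,
    '40-49': 104399,
    '50-59': 51716,
    '60-69': 36524,
    '70-79': 22581,
    '80+': 7086,
}

for lo, hi, frac in age_fractions(joburg_pop):
    bracket = f'{lo}+' if hi is None else f'{lo}-{hi}'
    print(f'{bracket:>6}: {frac:.1%}')
```

A malformed key (for example '0 to 9') raises a ValueError here, which is a cheap way to catch formatting mistakes before the dictionary is handed to Covasim.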
Covasim includes a script to automatically download time series data on diagnoses, deaths, and other epidemiological information from several major sources of COVID-19 data. These include the Corona Data Scraper, the European Centre for Disease Prevention and Control, and the COVID Tracking Project. These scrapers provide data for a large number of locations (over 4000 at the time of writing), including the US down to the county level and many other countries down to the district level. The data they download is already in the correct format for Covasim.
Note: These data sources are frequently updated, and some may no longer work. Please contact us at info@covasim.org if you're having trouble.
The correct input data format for Covasim looks like this:
import pandas as pd
df = pd.read_csv('example_data.csv')
print(df)
The data can be in CSV, Excel, or JSON format. There must be a column named date (not "Date" or "day" or anything else). Otherwise, each column label must start with new_ (daily) or cum_ (cumulative) and then be followed by any of: tests, diagnoses, deaths, severe (corresponding to hospitalizations), or critical (corresponding to ICU admissions). While other columns can be included and will be loaded, they won't be parsed by Covasim. Note that if you enter a new_ (daily) column, Covasim will automatically calculate a cum_ (cumulative) column for you.
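The naming rules above can be checked before a file is passed to Covasim. The sketch below is plain Python and only mirrors the rules as described in this section; it is not Covasim's own loader. The function names check_columns and add_cumulative are hypothetical, and the numbers are made up for illustration.

```python
from itertools import accumulate

ALLOWED_KEYS = {'tests', 'diagnoses', 'deaths', 'severe', 'critical'}

def check_columns(columns):
    """Split column names into those matching the naming rules and the rest."""
    if 'date' not in columns:
        raise ValueError('A column named exactly "date" is required')
    parsed, ignored = [], []
    for col in columns:
        if col == 'date':
            continue
        prefix, _, key = col.partition('_')
        if prefix in ('new', 'cum') and key in ALLOWED_KEYS:
            parsed.append(col)
        else:
            ignored.append(col)  # Would be loaded, but not parsed
    return parsed, ignored

def add_cumulative(data):
    """Add a cum_ column for every new_ column that lacks one (returns a new dict)."""
    out = dict(data)
    for col, values in data.items():
        if col.startswith('new_'):
            cum_col = 'cum_' + col[len('new_'):]
            if cum_col not in out:
                out[cum_col] = list(accumulate(values))  # Running total
    return out

# Made-up numbers, for illustration only
data = {
    'date': ['2020-03-01', '2020-03-02', '2020-03-03'],
    'new_diagnoses': [1, 3, 2],
    'new_tests': [10, 20, 30],
    'notes': ['a', 'b', 'c'],
}
parsed, ignored = check_columns(list(data))
data = add_cumulative(data)
print('Parsed: ', parsed)
print('Ignored:', ignored)
```

Running a check like this first makes it easier to spot a column such as "Date" or "new_cases" that would otherwise be silently left out of the fit.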
Note: Sometimes date information fails to be read properly, especially when loading from Excel files via pandas. If you encounter this problem, see Tutorial 10 for help on fixing this.
This example shows how a simulation can load in the data, and how it automatically plots it. (We'll cover interventions properly in the next tutorial.)
import covasim as cv
cv.options(jupyter=True, verbose=0)
pars = dict(
    start_day = '2020-02-01',
    end_day = '2020-04-11',
    beta = 0.015,
)
sim = cv.Sim(pars=pars, datafile='example_data.csv', interventions=cv.test_num(daily_tests='data'))
sim.run()
sim.plot(['cum_tests', 'cum_diagnoses', 'cum_deaths'])
As you can see, this is not a great fit to data – but we'll come to calibration in Tutorial 7.
Agents in Covasim are contained in an object called People, which contains all of the agents' properties, as well as methods for changing them from one state to another (e.g., from susceptible to infected).
Agents interact with each other via one or more contact layers. You can think of each agent as a node in a mathematical graph, and each connection as an edge. By default, Covasim creates a single random contact network where each agent is connected to 20 other agents, completely at random. However, this is not a very realistic representation of households, workplaces, schools, etc.
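To make the graph picture concrete, here is a minimal sketch of a random contact network in plain Python. It is not how Covasim builds its network internally; the function name random_contacts and the choice to have each agent initiate half of its links are assumptions made for illustration.

```python
import random
from collections import defaultdict

def random_contacts(n_agents, n_contacts=20, seed=1):
    """Build an undirected random contact network as a dict of neighbor sets.

    Each agent initiates n_contacts // 2 links to distinct, randomly chosen
    other agents, so the mean degree is close to n_contacts.
    """
    rng = random.Random(seed)
    neighbors = defaultdict(set)
    for agent in range(n_agents):
        # Sample from n_agents - 1 slots, then shift to skip the agent itself
        others = rng.sample(range(n_agents - 1), n_contacts // 2)
        for other in others:
            if other >= agent:
                other += 1
            neighbors[agent].add(other)  # Edges are stored in both directions
            neighbors[other].add(agent)
    return neighbors

contacts = random_contacts(n_agents=1000)
mean_degree = sum(len(c) for c in contacts.values()) / 1000
print(f'Mean number of contacts per agent: {mean_degree:.1f}')
```

Because links are drawn without regard to age, household, or location, every agent is equally likely to meet every other agent, which is exactly the lack of realism that structured layers are meant to address.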
Covasim also comes with a "hybrid" population option, which provides more realism while still being fast to generate. (It's called "hybrid" because it's a combination of the random network and the SynthPops network, described in Tutorial 11, which is much more realistic but requires a lot of data and is computationally intensive.) The hybrid option provides four contact layers: households 'h', schools 's', workplaces 'w', and community interactions 'c'. Each layer is defined by (a) which agents are connected to which other agents, and (b) the weight of each connection (i.e., transmission probability). Specifically:
Note that for most countries, you can load default data (age distribution and household size, both from the UN) by using the location keyword when creating a sim. For example, to create a realistic (i.e. hybrid) population of 10,000 people for Bangladesh and plot the results, you would do:
pars = dict(
    pop_size = 10_000, # Alternate way of writing 10000
    pop_type = 'hybrid',
    location = 'Bangladesh', # Case insensitive
)
sim = cv.Sim(pars)
sim.initialize() # Create people
fig = sim.people.plot() # Show statistics of the people
Note: For an explanation of population size, total population, and dynamic rescaling, please see the FAQ.
Since creating populations can be slow, and since deleting people is a bit mean, sometimes you want to save the population to work with it later. To do this, initialize the people, save them, then load them again. (This example also illustrates how you can use sc.timer() to check how long a block of code takes.)
import sciris as sc # We'll use this to time how long each one takes
pars = dict(n_agents=50e3, pop_type='hybrid')
with sc.timer('creating'):
    sim1 = cv.Sim(pars).init_people()
sim1.people.save('my-people.ppl')
with sc.timer('loading'):
    sim2 = cv.Sim(pars, popfile='my-people.ppl').init_people()
It's about twice as fast to load a population as it is to create one, but whether this will actually matter to you depends on the population size and the length of the simulation.