by Pierre Denis
The present coronavirus pandemic highlights the question of detecting the disease and of the significance of symptoms for this purpose. Generally speaking, common-sense reasoning is misleading for such detection because it is subject to several psychological biases, which are nowadays amplified by alarming news in the mass media (as in the present crisis). This can lead to a wrong evaluation of the risks of false positives or false negatives.
This page presents a simple probabilistic model of the occurrence of symptoms (fever, cough) that may be caused by some diseases (cold, flu, COVID-19). Once this model is set up, several queries can be made to calculate, among others, the probabilities of
This model uses Bayesian reasoning and noisy-OR techniques. It is largely inspired by the noisy-OR model given in this presentation by R. Blutner, as well as in the book "Artificial Intelligence: A Modern Approach" (2nd ed.) by S. Russell and P. Norvig.
The model is implemented using Lea, a Python package dedicated to probabilistic programming. This page lets you execute the commands interactively, experiment with the influence of parameter changes, do what-if analysis and even enrich the model (refer to the Lea tutorials).
*CAVEAT: The present model is merely a toy model provided for the example. Neither the model nor the probability values it uses have been endorsed by any medical/scientific authority! The numerical results obtained here have therefore no significance for the real world.*
Before running the examples below, you have to import a set of functions from the lea module:
from lea import __version__, event, if_, joint, P
print("Using Lea", __version__)
For a given patient, the model defines two symptom random variables (fever, cough), each of which can be either true or false. Each of these two symptoms can be caused by three diseases, also modeled as random variables: cold, flu and COVID-19.
The modelling starts by defining the prior probability distributions of the three diseases (arbitrary values, not taken from the aforementioned references):
cold = event(0.200)
flu = event(0.100)
covid19 = event(0.005)
print ("OK (prior probabilities set)")
At any time, you can check the probability of a given random variable with the P function. For instance:
P(cold)
Let's now define the conditional probabilities of fever given the occurrence of one single disease:
fever_if_sole_cold = event(0.4)
fever_if_sole_flu = event(0.8)
fever_if_sole_covid19 = event(0.9)
print ("OK (fever conditional probabilities set)")
... and do the same for the cough symptom:
cough_if_sole_cold = event(0.8)
cough_if_sole_flu = event(0.2)
cough_if_sole_covid19 = event(0.6)
print ("OK (cough conditional probabilities set)")
All probability values defined above can be changed as you wish. For the changes to take effect, simply (re-)execute the corresponding statements, along with the statements that build on them.
Now, the following statements define the fever and cough random variables as caused by the three diseases, in a so-called noisy-OR construct:
fever_by_cold = if_(cold , fever_if_sole_cold , False)
fever_by_flu = if_(flu , fever_if_sole_flu , False)
fever_by_covid19 = if_(covid19, fever_if_sole_covid19, False)
cough_by_cold = if_(cold , cough_if_sole_cold , False)
cough_by_flu = if_(flu , cough_if_sole_flu , False)
cough_by_covid19 = if_(covid19, cough_if_sole_covid19, False)
fever = fever_by_cold | fever_by_flu | fever_by_covid19
cough = cough_by_cold | cough_by_flu | cough_by_covid19
print ("OK (probabilistic model set up)")
The fever_by_cold variable can be interpreted as "having fever due to a cold". In case of cold, it is true with the probability defined by fever_if_sole_cold. If there is no cold, fever_by_cold is surely false, i.e. either there is no fever or the fever is not caused by a cold.
Then, the fever variable is defined as the disjunction (inclusive OR) of the three fever_by_... variables. Likewise, the cough variable is defined as the disjunction of the three cough_by_... variables.
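In other words, this is a noisy-OR without any "leak" term: given which diseases are present, fever is absent only if every present disease independently fails to cause it, so P(fever | diseases) = 1 - product of (1 - p_d) over the present diseases d. The following plain-Python sketch (no Lea involved; the names noisy_or, FEVER_IF_SOLE and fever_cpt are hypothetical) tabulates this conditional probability for all disease combinations, using the fever parameters defined above.

```python
from itertools import product

# Probability that each disease, alone, causes fever (values from the model above)
FEVER_IF_SOLE = {"cold": 0.4, "flu": 0.8, "covid19": 0.9}


def noisy_or(cause_probs, present):
    """P(effect | present causes) under a leak-free noisy-OR.

    The effect is absent only if every present cause independently
    fails to trigger it; absent causes contribute nothing.
    """
    p_no_effect = 1.0
    for name, p in cause_probs.items():
        if present[name]:
            p_no_effect *= 1.0 - p
    return 1.0 - p_no_effect


# Conditional probability table of fever for the 2**3 disease combinations,
# keyed by (cold, flu, covid19) booleans
fever_cpt = {}
for states in product([False, True], repeat=3):
    present = dict(zip(FEVER_IF_SOLE, states))
    fever_cpt[states] = noisy_or(FEVER_IF_SOLE, present)

for states, p in fever_cpt.items():
    print(dict(zip(FEVER_IF_SOLE, states)), round(p, 4))
```

The all-False row is exactly 0: since the model has no leak term, fever cannot occur without one of the three diseases.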
Now that the model is defined, it can be queried in various ways to get new probabilities, possibly based on new information or assumptions.
*Reminder: the calculated values use the arbitrary probabilities defined above and hence have no guaranteed significance for the real world!*
Note: to run the following queries, you must have executed the statements of the previous section.
Q1: What is the probability of having fever?
P(fever)
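Because the three diseases are independent and each fever_by_... variable is true with probability P(disease) x P(fever | sole disease), the answer to Q1 can also be checked by hand: P(fever) = 1 - product over diseases d of (1 - P(d) * p_d). The sketch below does this arithmetic in plain Python, as a cross-check rather than as a description of how Lea computes it.

```python
import math

# (prior probability of the disease, P(fever | that disease alone))
fever_links = {
    "cold": (0.200, 0.4),
    "flu": (0.100, 0.8),
    "covid19": (0.005, 0.9),
}

# Each fever_by_<disease> is true with probability prior * link, and the
# three of them are independent, so fever is absent only if all are false.
p_no_fever = math.prod(1.0 - prior * link for prior, link in fever_links.values())
p_fever = 1.0 - p_no_fever
print(p_fever)  # approximately 0.1574088
```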
Q2: What is the probability of having COVID-19 in case of fever (the presence or absence of cough being unknown)?
P(covid19.given(fever))
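The answer to Q2 is an application of Bayes' theorem. As a hand check in plain Python (not Lea), the likelihood P(fever | COVID-19) must account for the fact that cold and flu may still cause fever independently; the small prior P(COVID-19) = 0.005 then keeps the posterior low, even though fever is very likely under COVID-19.

```python
import math

prior_covid = 0.005
# P(fever_by_cold) and P(fever_by_flu): prior * P(fever | sole disease)
p_fever_by_others = [0.200 * 0.4, 0.100 * 0.8]
p_silent_others = math.prod(1.0 - q for q in p_fever_by_others)

# Likelihoods of fever with and without COVID-19 (noisy-OR, no leak)
p_fever_if_covid = 1.0 - (1.0 - 0.9) * p_silent_others
p_fever_if_no_covid = 1.0 - p_silent_others

# Law of total probability, then Bayes' theorem
p_fever = p_fever_if_covid * prior_covid + p_fever_if_no_covid * (1.0 - prior_covid)
p_covid_if_fever = p_fever_if_covid * prior_covid / p_fever
print(p_covid_if_fever)  # approximately 0.029
```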
Q3: What is the probability of having COVID-19 in case of fever and cough?
P(covid19.given(fever,cough))
Q4: What is the probability of having COVID-19 in case of fever but no cough?
P(covid19.given(fever,~cough))
Q5: What is the probability of having COVID-19 in the absence of fever and cough (i.e. being asymptomatic for this disease)?
P(covid19.given(~fever,~cough))
Q6: What is the probability of having flu in case of fever and cough?
P(flu.given(fever,cough))
Q7: What is the probability of having COVID-19 in case of fever and cough, knowing that the patient does not have a cold?
P(covid19.given(fever,cough,~cold))
Q8: What is the probability of having COVID-19 in case of fever and cough, knowing that the patient has neither a cold nor the flu?
P(covid19.given(fever,cough,~cold,~flu))
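Lea calculates these conditional probabilities exactly. To give an idea of what such a calculation involves, here is a minimal brute-force sketch in plain Python (not Lea's actual algorithm; the names posterior, noisy_or and absent are hypothetical). It enumerates the 2**3 combinations of diseases, weights each one by its prior probability and by the likelihood of the observed symptoms, and normalizes. Knowing that a disease is absent, as in Q7 and Q8, simply discards the combinations where it is present.

```python
from itertools import product

PRIORS = {"cold": 0.200, "flu": 0.100, "covid19": 0.005}
FEVER_IF_SOLE = {"cold": 0.4, "flu": 0.8, "covid19": 0.9}
COUGH_IF_SOLE = {"cold": 0.8, "flu": 0.2, "covid19": 0.6}


def noisy_or(links, diseases):
    """P(symptom | disease states) for a leak-free noisy-OR."""
    p_none = 1.0
    for name, present in diseases.items():
        if present:
            p_none *= 1.0 - links[name]
    return 1.0 - p_none


def posterior(target, fever=None, cough=None, absent=()):
    """P(target disease | evidence) by summing over all disease states.

    fever / cough: True, False, or None (unknown).
    absent: names of diseases known not to be present.
    """
    num = den = 0.0
    for states in product([False, True], repeat=len(PRIORS)):
        diseases = dict(zip(PRIORS, states))
        if any(diseases[d] for d in absent):
            continue  # inconsistent with the evidence
        # Prior weight of this combination (the diseases are independent)
        w = 1.0
        for name, present in diseases.items():
            w *= PRIORS[name] if present else 1.0 - PRIORS[name]
        # Given the diseases, fever and cough are conditionally independent
        for observed, links in ((fever, FEVER_IF_SOLE), (cough, COUGH_IF_SOLE)):
            if observed is not None:
                p = noisy_or(links, diseases)
                w *= p if observed else 1.0 - p
        den += w
        if diseases[target]:
            num += w
    return num / den


print("Q3", posterior("covid19", fever=True, cough=True))
print("Q4", posterior("covid19", fever=True, cough=False))
print("Q5", posterior("covid19", fever=False, cough=False))
print("Q6", posterior("flu", fever=True, cough=True))
print("Q7", posterior("covid19", fever=True, cough=True, absent=("cold",)))
print("Q8", posterior("covid19", fever=True, cough=True, absent=("cold", "flu")))
```

Q8 comes out as exactly 1: once cold and flu are ruled out, COVID-19 is the only remaining cause in this leak-free model. Q5 comes out at about 0.0002, roughly 25 times below the prior, because an asymptomatic course has probability (1 - 0.9) x (1 - 0.6) = 0.04 under COVID-19.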
To get more results in a single step, it is also possible to produce tabular data.
Q9: What are the probabilities of each symptom combination, independently of any information about a possible disease?
joint(fever,cough)
Q10: What are the probabilities of each symptom combination, depending on the presence/absence of COVID-19?
joint(covid19,fever,cough)
Q11: What are the probabilities of each symptom combination, in case of COVID-19?
joint(fever,cough).given(covid19)
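Lea's joint tables are exact. As an independent sanity check of Q9, joint(fever, cough) can also be approximated by forward sampling (Monte Carlo), i.e. by simulating many patients according to the generative story of the model. This is a swapped-in technique shown for illustration, not how the tables above are computed; the function name sample_symptoms, the seed and the sample size are arbitrary choices.

```python
import random
from collections import Counter

PRIORS = {"cold": 0.200, "flu": 0.100, "covid19": 0.005}
FEVER_IF_SOLE = {"cold": 0.4, "flu": 0.8, "covid19": 0.9}
COUGH_IF_SOLE = {"cold": 0.8, "flu": 0.2, "covid19": 0.6}


def sample_symptoms(rng):
    """Simulate one patient and return the (fever, cough) pair."""
    diseases = {d: rng.random() < p for d, p in PRIORS.items()}
    # Noisy-OR: each present disease gets its own independent chance to cause
    # the symptom (any() stops once a cause fires, which leaves the law unchanged)
    fever = any(diseases[d] and rng.random() < FEVER_IF_SOLE[d] for d in PRIORS)
    cough = any(diseases[d] and rng.random() < COUGH_IF_SOLE[d] for d in PRIORS)
    return fever, cough


rng = random.Random(2020)  # fixed seed, for reproducibility
n = 100_000
counts = Counter(sample_symptoms(rng) for _ in range(n))

combos = [(False, False), (False, True), (True, False), (True, True)]
joint_estimate = {combo: counts[combo] / n for combo in combos}
for (fever, cough), p in joint_estimate.items():
    print(f"fever={fever!s:5} cough={cough!s:5} {p:.4f}")
```

With the parameters above, the exact values are about 0.751 (no fever, no cough), 0.091 (cough only), 0.070 (fever only) and 0.088 (both); the sampled frequencies should land close to them, with errors of the order of 1/sqrt(n).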
The model defined above may be enriched to cope with new random variables.
For instance, one could have partial information such as the occurrence of some symptom(s) without knowing exactly which one (fever or cough or both). The inclusive OR, expressed as a vertical bar, may be used to define such a variable:
symptom = fever | cough
print ("OK (symptom defined)")
Then, new queries can be made.
Q12: What is the probability of having some symptom(s) in case of COVID-19?
P(symptom.given(covid19))
Q13: What is the probability of having some symptom(s) in the absence of COVID-19?
P(symptom.given(~covid19))
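Since symptom is false only when none of the six fever_by_... and cough_by_... variables is true, and since each disease acts independently, P(no symptom) factorizes into one term per disease: a disease known to be present stays silent with probability (1 - p_fever)(1 - p_cough), and a disease of unknown status with probability (1 - prior) + prior (1 - p_fever)(1 - p_cough). A plain-Python sketch of Q12 and Q13 along these lines (the helper names are hypothetical; this is a hand check, not Lea's internals):

```python
import math

# Per disease: (prior, P(fever | sole disease), P(cough | sole disease))
LINKS = {
    "cold": (0.200, 0.4, 0.8),
    "flu": (0.100, 0.8, 0.2),
    "covid19": (0.005, 0.9, 0.6),
}


def p_no_symptom_from(disease, present=None):
    """P(this disease triggers neither fever nor cough).

    present=True/False fixes the disease state; None averages over its prior.
    """
    prior, p_fever, p_cough = LINKS[disease]
    silent_if_present = (1.0 - p_fever) * (1.0 - p_cough)
    if present is None:
        return (1.0 - prior) + prior * silent_if_present
    return silent_if_present if present else 1.0


def p_symptom_given_covid(covid_present):
    p_none = p_no_symptom_from("covid19", covid_present)
    p_none *= math.prod(p_no_symptom_from(d) for d in ("cold", "flu"))
    return 1.0 - p_none


print("Q12", p_symptom_given_covid(True))   # approximately 0.9698
print("Q13", p_symptom_given_covid(False))  # approximately 0.2452
```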
Another interesting case consists in modeling the probability of death due to COVID-19. Assume that it is 3.4% for a patient having COVID-19 and 0.02% for any other patient (whatever other disease they may have). Here is how to express this rule:
death = if_(covid19, event(0.034), event(0.0002))
print ("OK (death defined)")
Then, new types of queries can be made, possibly involving post-mortem analysis.
Q14: What is the probability of death for a patient having fever and coughing?
P(death.given(fever,cough))
Q15: What is the probability that a death is caused by COVID-19?
P(covid19.given(death))
Q16: What is the probability that a death is caused by COVID-19, given that there were no symptoms?
P(covid19.given(death,~symptom))
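Because death depends only on covid19 in this model, Q15 is again Bayes' theorem, with the two death rates as likelihoods. For Q16, death and the absence of symptoms are conditionally independent given the COVID-19 status, so their likelihoods multiply. A plain-Python sketch (the helper bayes is hypothetical; the no-symptom likelihoods are derived in the comments from the model's parameters):

```python
prior_covid = 0.005
p_death_if_covid = 0.034
p_death_if_not_covid = 0.0002

# P(no symptom | COVID-19 status), factorized per disease:
#   covid19 present and silent: (1 - 0.9) * (1 - 0.6) = 0.04
#   cold silent on average:     0.8 + 0.2 * (1 - 0.4) * (1 - 0.8) = 0.824
#   flu silent on average:      0.9 + 0.1 * (1 - 0.8) * (1 - 0.2) = 0.916
p_silent_if_covid = 0.04 * 0.824 * 0.916
p_silent_if_not_covid = 0.824 * 0.916


def bayes(prior, likelihood_if_true, likelihood_if_false):
    """Posterior probability of a binary hypothesis."""
    weighted = prior * likelihood_if_true
    return weighted / (weighted + (1.0 - prior) * likelihood_if_false)


# Q15: P(covid19 | death)
q15 = bayes(prior_covid, p_death_if_covid, p_death_if_not_covid)
# Q16: P(covid19 | death, no symptom); death and symptoms are conditionally
# independent given the COVID-19 status, so their likelihoods multiply
q16 = bayes(
    prior_covid,
    p_death_if_covid * p_silent_if_covid,
    p_death_if_not_covid * p_silent_if_not_covid,
)
print(f"Q15 {q15:.4f}")  # Q15 0.4607
print(f"Q16 {q16:.4f}")  # Q16 0.0330
```

With these toy numbers, the absence of symptoms is 25 times less likely with COVID-19 than without it (factor 0.04), which is why the posterior drops from about 46% to about 3.3%.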
Q17: What were the probabilities of the COVID-19 / symptom combinations for a (now) dead patient?
joint(covid19,symptom).given(death)
Independently of the model shown above, you may read a very interesting post on coronavirus and probability by Raphael Sonabend.
Here is the translation of this use case in Lea, where a_lt_65_if_c is the probability of being under 65 years old given dying from COVID-19, c is the (unconditional) probability of dying from COVID-19, and a_lt_65_prior is the (unconditional) probability of being under 65.
c = event(0.034)
a_lt_65_if_c = event(0.19)
a_lt_65_prior = event(0.92)
a_lt_65 = if_(c, a_lt_65_if_c, prior_lea=a_lt_65_prior)
print ("OK (model defined)")
Now, the probability of dying from COVID-19 if you’re under 65 is calculated as follows:
P(c.given(a_lt_65))
The result here is perfectly in line with the value 0.7% calculated in the post. You may change the probability values above to do what-if analysis.
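The number Lea returns follows directly from Bayes' theorem, P(c | a_lt_65) = P(a_lt_65 | c) x P(c) / P(a_lt_65), which can be checked with a few lines of plain Python:

```python
# Values from the Lea model above
p_c = 0.034       # P(dying from COVID-19)
p_a_if_c = 0.19   # P(under 65 | dying from COVID-19)
p_a = 0.92        # P(under 65)

# Bayes' theorem: P(c | a) = P(a | c) * P(c) / P(a)
p_c_if_a = p_a_if_c * p_c / p_a
print(f"{p_c_if_a:.4f}")  # 0.0070
```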
Now, Lea can do more...
As explained here, Lea enables you to do symbolic calculation, that is, to produce probability formulas instead of numbers. To do so, simply replace the actual probability values in the model above by parameter names, like 'a', 'b', 'c':
c = event('c')
a_lt_65_if_c = event('a')
a_lt_65_prior = event('b')
a_lt_65 = if_(c, a_lt_65_if_c, prior_lea=a_lt_65_prior)
print ("OK (symbolic model defined)")
Then, the calculated conditional probability is an arithmetic expression:
P(c.given(a_lt_65))
which is in line with Bayes' theorem. Other expressions include the probability of being under 65 years old given NOT dying from COVID-19:
P(a_lt_65.given(~c))
or the joint probability distribution of death vs. age:
joint(c,a_lt_65)
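For reference, here are the closed-form expressions one obtains by hand from Bayes' theorem and the law of total probability, writing c = P(c), a = P(a_lt_65 | c) and b = P(a_lt_65) as in the symbolic model: P(c | a_lt_65) = a*c/b and P(a_lt_65 | not c) = (b - a*c)/(1 - c). Lea's printed formulas may be arranged differently. The plain-Python sketch below checks these expressions with exact rational arithmetic (fractions.Fraction), using the numbers of the numerical model; it assumes, as our reading of prior_lea, that the "else" branch is set so that the marginal P(a_lt_65) equals b.

```python
from fractions import Fraction as F

c = F("0.034")  # P(c): dying from COVID-19
a = F("0.19")   # P(a_lt_65 | c)
b = F("0.92")   # P(a_lt_65), the prior passed as prior_lea

# The four cells of joint(c, a_lt_65)
joint = {
    (True, True): a * c,
    (True, False): (1 - a) * c,
    # Assumption: chosen so that the marginal P(a_lt_65) equals b
    (False, True): b - a * c,
    (False, False): 1 - c - (b - a * c),
}

p_c_given_a = joint[(True, True)] / b             # = a*c/b
p_a_given_not_c = joint[(False, True)] / (1 - c)  # = (b - a*c)/(1 - c)

print(p_c_given_a, float(p_c_given_a))
print(p_a_given_not_c, float(p_a_given_not_c))
```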
You are invited to experiment on this "sandbox" page, by changing the given probability values and seeing the effects on the query results, or by making your own queries.
To have a better understanding of the techniques used and possibly refine the model, you are invited to read the Lea tutorials.
Note that a similar medical case study, dedicated to mammography and breast cancer, has been presented in this post by Chris Strelioff. Be aware that it uses version 2 of Lea (the current page requires Lea 3, which is not backward compatible).
Questions or comments can be addressed to pie.denis@skynet.be.
last updated on 2020, March 22, 12h00 CET