Bite Size Bayes

Copyright 2020 Allen B. Downey

License: Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

The Elvis problem

Here's a problem from Bayesian Data Analysis:

Elvis Presley had a twin brother (who died at birth). What is the probability that Elvis was an identical twin?

I will answer this question in three steps:

  1. First, we need some background information about the relative frequencies of identical and fraternal twins.

  2. Then we will use Bayes's Theorem to take into account one piece of data, which is that Elvis's twin was male.

  3. Finally, living up to the name of this blog, I will overthink the problem by taking into account a second piece of data, which is that Elvis's twin died at birth.

For background information, I'll use data from 1935, the year Elvis was born, from the U.S. Census Bureau, Birth, Stillbirth, and Infant Mortality Statistics for the Continental United States, the Territory of Hawaii, the Virgin Islands 1935.

It includes this table:

The table doesn't report which twins are identical or fraternal, but we can use the data to compute

  • $x$: The fraction of twins that are opposite sex.
In [1]:
opposite = 8397
same = 8678 + 8122

x = opposite / (opposite + same)
x
Out[1]:
0.3332539588046196

The quantity we want is

  • $f$: The fraction of twins who are fraternal.

So let's see how we can get from $x$ to $f$.

Because identical twins have the same genes, they are almost always the same sex. Fraternal twins do not have the same genes; like other siblings, they are about equally likely to be the same or opposite sex.

So we can write this relationship between $x$ and $f$

$x = f/2 + 0$

which says that the opposite sex twins include half of the fraternal twins and none of the identical twins.

And that implies

$f = 2x$

In [2]:
f = 2*x
f
Out[2]:
0.6665079176092392

In 1935, about 2/3 of twins were fraternal, and 1/3 were identical.

Getting back to the Elvis problem, we can use $1-f$ and $f$ as prior probabilities for the two hypotheses, identical and fraternal:

In [3]:
index = ['identical', 'fraternal']
prior = 1-f, f

Now we can take into account the data:

  • $D$: Elvis's twin was male.

The probability of $D$ is nearly 100% if they were identical twins and about 50% if they were fraternal.

In [4]:
likelihood = 1, 0.5

Here's a function that takes the hypotheses, the priors, and the likelihoods and puts them in a Bayes table:

In [5]:
import pandas as pd

def make_bayes_table(index, prior, likelihood):
    """Make a Bayes table.
    
    index: sequence of hypotheses
    prior: prior probabilities
    likelihood: sequence of likelihoods
    
    returns: DataFrame
    """
    table = pd.DataFrame(index=index)
    table['prior'] = prior
    table['likelihood'] = likelihood
    table['unnorm'] = table['prior'] * table['likelihood']
    prob_data = table['unnorm'].sum()
    table['posterior'] = table['unnorm'] / prob_data
    return table

And here are the results

In [6]:
table = make_bayes_table(index, prior, likelihood)
table
Out[6]:
prior likelihood unnorm posterior
identical 0.333492 1.0 0.333492 0.500179
fraternal 0.666508 0.5 0.333254 0.499821

From the Bayes table I'll extract $p_i$, which is the probability that same sex twins are identical:

In [7]:
p_i = table['posterior']['identical']
p_i
Out[7]:
0.5001785714285714

With priors based on data from 1935, the posterior probability that Elvis was a twin is close to 50%.

But there is one more piece of data to take into account; the fact that Elvis's twin died at birth.

Let's assume that there are different risks for fraternal and identical twins. The quantities we want are

  • $r_f$: The probability that one twin is stillborn, given that they are fraternal.

  • $r_i$: The probability that one twin is stillborn, given that they are identical.

We can't get those quantities directly from the table, but we can compute:

  • $y$: the probability of "1 living", given that the twins are opposite sex

  • $z$: the probability of "1 living", given that the twins are the same sex

In [8]:
y = (258 + 299) / opposite
y
Out[8]:
0.06633321424318209
In [9]:
z = (655 + 564) / same
z
Out[9]:
0.07255952380952381

Assuming that all opposite sex twins are fraternal, we can infer that the risk for fraternal twins is $y$, the risk for opposite sex twins:

$r_f = y$

In [10]:
r_f = y
r_f
Out[10]:
0.06633321424318209

And because we know the fraction of same sex twins who are identical, $p_i$, we can write the following relation

$z = p_i r_i + (1-p_i) r_f$

which says that the risk for same sex twins is the weighted sum of the risks for identical and fraternal twins, with the weight $p_i$.

Solving for $r_i$, we get

$r_i = 2z - r_f$

And we have already computed $z$ and $r_f$

In [11]:
r_i = 2*z - r_f
r_i
Out[11]:
0.07878583337586553

In this dataset, it looks like the probability of "1 alive" is a little higher for identical twins.

So we can do a second update to take into account the data that Elvis's twin died at birth. The posterior probabilities from the first update become the priors for the second.

In [12]:
prior2 = table['posterior']

Here are the likelihoods:

In [13]:
likelihood2 = r_i, r_f

And here are the results.

In [14]:
table2 = make_bayes_table(index, prior2, likelihood2)
table2
Out[14]:
prior likelihood unnorm posterior
identical 0.500179 0.078786 0.039407 0.543082
fraternal 0.499821 0.066333 0.033155 0.456918

With the new data, the posterior probability that Elvis was an identical twin is about 54%.

Credit: Thanks to Jonah Spicher, who took my Bayesian Stats class at Olin and came up with the idea to use data from 1935 and take into account the fact that Elvis's twin died at birth.

In [ ]: