Instructions on using a Jupyter Notebook - Simply click the cell and press shift-enter, or click the "Run" button in the top panel.
Note: If the notebook seems slow to run, try restarting the kernel.
If you want to save your work while using Binder, making sure to save and download your notebook
To run stdpopsim
locally, make sure to follow the installation instructions
Import the necessary packages
import stdpopsim
import matplotlib.pyplot as plt
import numpy as np
Simulate 10% of human chromosome 22 with the three population out-of-Africa model, with 10 samples from each of the three populations.
species = stdpopsim.get_species("HomSap")
contig = species.get_contig("chr22", length_multiplier=0.1)
model = species.get_demographic_model("OutOfAfrica_3G09")
samples = model.get_samples(10, 10, 10)
engine = stdpopsim.get_default_engine()
ts = engine.simulate(model, contig, samples)
print("simulated {} trees and {} sites, from {} samples.".format(ts.num_trees, ts.num_sites, ts.num_samples))
Next, we get subsets of the data corresponding to each set of samples and then we simplfy to get the tree sequences corresponding to these subsets of samples. This removes monomorphic sites within each population.
YRI_samples = ts.samples(0)
CEU_samples = ts.samples(1)
CHB_samples = ts.samples(2)
ts_YRI = ts.simplify(samples=YRI_samples)
ts_CEU = ts.simplify(samples=CEU_samples)
ts_CHB = ts.simplify(samples=CHB_samples)
Next, we calculate the site frequency spectrum for each population.
sfs_YRI = ts_YRI.allele_frequency_spectrum(polarised=True,span_normalise=False)
sfs_CEU = ts_CEU.allele_frequency_spectrum(polarised=True,span_normalise=False)
sfs_CHB = ts_CHB.allele_frequency_spectrum(polarised=True,span_normalise=False)
Finally, we plot the site frequency spectrum for each population.
bar_width = 0.2
r1 = np.arange(0,11) - 0.2
r2 = [x + bar_width for x in r1]
r3 = [x + bar_width for x in r2]
ax = plt.subplot()
plt.bar(x = r1, height = sfs_YRI/ts_YRI.num_sites, width=bar_width, color='grey')
plt.bar(x = r2, height = sfs_CEU/ts_CEU.num_sites, width=bar_width, color='lightblue')
plt.bar(x = r3, height = sfs_CHB/ts_CHB.num_sites, width=bar_width, color='green')
plt.xlabel("Allele count", fontweight="bold")
plt.ylabel("Proportion of mutated sites in sample", fontweight="bold")
ax.set_xticks(np.arange(0,11))
ax.legend(['YRI', 'CEU', 'CHB'])
plt.plot()
Simulate 10% of human chromosome 22 with the three population out-of-Africa model, with 10 samples from each of the three populations.
%%bash
stdpopsim HomSap --chromosome chr22 --length-multiplier 0.1 --demographic-model OutOfAfrica_3G09 --output OutOfAfrica_3G09.trees 10 10 10
Now, we can use tskit to read the simulated file and calculate the site frequency spectrum.
import tskit
ts = tskit.load("OutOfAfrica_3G09.trees")
Once we have loaded the simulated tree sequence file, we can use the same code as above.
YRI_samples = ts.samples(0)
CEU_samples = ts.samples(1)
CHB_samples = ts.samples(2)
ts_YRI = ts.simplify(samples=YRI_samples)
ts_CEU = ts.simplify(samples=CEU_samples)
ts_CHB = ts.simplify(samples=CHB_samples)
sfs_YRI = ts_YRI.allele_frequency_spectrum(polarised=True,span_normalise=False)
sfs_CEU = ts_CEU.allele_frequency_spectrum(polarised=True,span_normalise=False)
sfs_CHB = ts_CHB.allele_frequency_spectrum(polarised=True,span_normalise=False)
bar_width = 0.2
r1 = np.arange(0,11) - 0.2
r2 = [x + bar_width for x in r1]
r3 = [x + bar_width for x in r2]
ax = plt.subplot()
plt.bar(x = r1, height = sfs_YRI/ts_YRI.num_sites, width=bar_width, color='grey')
plt.bar(x = r2, height = sfs_CEU/ts_CEU.num_sites, width=bar_width, color='lightblue')
plt.bar(x = r3, height = sfs_CHB/ts_CHB.num_sites, width=bar_width, color='green')
plt.xlabel("Allele count", fontweight="bold")
plt.ylabel("Proportion of mutated sites in sample", fontweight="bold")
ax.set_xticks(np.arange(0,11))
ax.legend(['YRI', 'CEU', 'CHB'])
plt.plot()