In this notebook, we demonstrate the usage of the script based external simulators, summary statistics, and distance functions features.
These allow to use near-arbitrary programing languages and output for pyABC. The main concept is that all communication happens via the file system. This comes at the cost of a considerable overhead, making this feature not optimal for models with a low simulation time. For more expensive models, the overhead should be negligible.
This notebook is similar to the using_R notebook, but circumvents usage of the rpy2 package.
# install if not done yet
!pip install pyabc --quiet
import pyabc
import pyabc.external
/home/yannik/anaconda3/lib/python3.7/site-packages/distributed/config.py:20: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details. defaults = yaml.load(f)
Here, we define model, summary statistics and distance. Note that if possible, alternatively FileIdSumStat can be used to read in the summary statistics directly to python and then use a python based distance function.
model = pyabc.external.ExternalModel(executable="Rscript", file="rmodel/model.r")
sumstat = pyabc.external.ExternalSumStat(executable="Rscript", file="rmodel/sumstat.r")
distance = pyabc.external.ExternalDistance(executable="Rscript", file="rmodel/distance.r")
dummy_sumstat = pyabc.external.create_sum_stat() # can also use a real file here
pars = {'meanX': 3, 'meanY': 4}
mf = model(pars)
print(mf)
sf = sumstat(mf)
print(sf)
distance(sf, dummy_sumstat)
{'loc': '/tmp/modelsim_n6oahdmc', 'returncode': 0} {'loc': '/tmp/sumstat_nys4kf_l', 'returncode': 0}
2.994722
from pyabc import Distribution, RV, ABCSMC
prior = Distribution(meanX=RV("uniform", 0, 10),
meanY=RV("uniform", 0, 10))
abc = ABCSMC(model, prior, distance,
summary_statistics=sumstat,
population_size=20)
import os
from tempfile import gettempdir
db = "sqlite:///" + os.path.join(gettempdir(), "test.db")
abc.new(db, dummy_sumstat)
history = abc.run(minimum_epsilon=0.9, max_nr_populations=4)
INFO:History:Start <ABCSMC(id=1, start_time=2019-09-23 18:36:16.815211, end_time=None)> INFO:Epsilon:initial epsilon is 5.276756500000001 INFO:ABC:t:0 eps:5.276756500000001 INFO:ABC:t:1 eps:4.03523 INFO:ABC:t:2 eps:2.3500524869456942 INFO:ABC:t:3 eps:1.857179536835703 INFO:History:Done <ABCSMC(id=1, start_time=2019-09-23 18:36:16.815211, end_time=2019-09-23 18:37:14.892234)>
Note the significantly longer runtimes compared to using rpy2. This is because the simulation time of this model is very short, such that repeatedly accessing the file system creates a notable overhead. For more expensive models, this overhead however becomes less notable. Still, if applicable, more efficient ways of communication between model and simulator are preferable.
from pyabc.visualization import plot_kde_2d
for t in range(history.n_populations):
df, w = abc.history.get_distribution(0, t)
ax = plot_kde_2d(df, w, "meanX", "meanY",
xmin=0, xmax=10,
ymin=0, ymax=10,
numx=100, numy=100)
ax.scatter([4], [8],
edgecolor="black",
facecolor="white",
label="Observation");
ax.legend();
ax.set_title("PDF t={}".format(t))