Notebook

Script based external simulators¶

In this notebook, we demonstrate the usage of the script based external simulators, summary statistics, and distance functions features.

These allow to use near-arbitrary programing languages and output for pyABC. The main concept is that all communication happens via the file system. This comes at the cost of a considerable overhead, making this feature not optimal for models with a low simulation time. For more expensive models, the overhead should be negligible.

This notebook is similar to the using_R notebook, but circumvents usage of the rpy2 package.

In [ ]:

# install if not done yet
!pip install pyabc --quiet

In [1]:

import pyabc
import pyabc.external

/home/yannik/anaconda3/lib/python3.7/site-packages/distributed/config.py:20: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  defaults = yaml.load(f)

Here, we define model, summary statistics and distance. Note that if possible, alternatively FileIdSumStat can be used to read in the summary statistics directly to python and then use a python based distance function.

In [2]:

model = pyabc.external.ExternalModel(executable="Rscript", file="rmodel/model.r")
sumstat = pyabc.external.ExternalSumStat(executable="Rscript", file="rmodel/sumstat.r")
distance = pyabc.external.ExternalDistance(executable="Rscript", file="rmodel/distance.r")

dummy_sumstat = pyabc.external.create_sum_stat()  # can also use a real file here

Note that here we have the observed summary statistics encoded directly in the `distance.r` file, such that we don't need any here. Alternatively, we could also specifiy the summary statistics for real here via a file. Example of how the functions can be called:

In [3]:

pars = {'meanX': 3, 'meanY': 4}
mf = model(pars)
print(mf)
sf = sumstat(mf)
print(sf)
distance(sf, dummy_sumstat)

{'loc': '/tmp/modelsim_n6oahdmc', 'returncode': 0}
{'loc': '/tmp/sumstat_nys4kf_l', 'returncode': 0}

Out[3]:

2.994722

In [4]:

from pyabc import Distribution, RV, ABCSMC

prior = Distribution(meanX=RV("uniform", 0, 10),
                     meanY=RV("uniform", 0, 10))
abc = ABCSMC(model, prior, distance,
             summary_statistics=sumstat,
             population_size=20)

import os
from tempfile import gettempdir

db = "sqlite:///" + os.path.join(gettempdir(), "test.db")
abc.new(db, dummy_sumstat)

history = abc.run(minimum_epsilon=0.9, max_nr_populations=4)

INFO:History:Start <ABCSMC(id=1, start_time=2019-09-23 18:36:16.815211, end_time=None)>
INFO:Epsilon:initial epsilon is 5.276756500000001
INFO:ABC:t:0 eps:5.276756500000001
INFO:ABC:t:1 eps:4.03523
INFO:ABC:t:2 eps:2.3500524869456942
INFO:ABC:t:3 eps:1.857179536835703
INFO:History:Done <ABCSMC(id=1, start_time=2019-09-23 18:36:16.815211, end_time=2019-09-23 18:37:14.892234)>

Note the significantly longer runtimes compared to using rpy2. This is because the simulation time of this model is very short, such that repeatedly accessing the file system creates a notable overhead. For more expensive models, this overhead however becomes less notable. Still, if applicable, more efficient ways of communication between model and simulator are preferable.

In [5]:

from pyabc.visualization import plot_kde_2d

for t in range(history.n_populations):
    df, w = abc.history.get_distribution(0, t)
    ax = plot_kde_2d(df, w, "meanX", "meanY",
                      xmin=0, xmax=10,
                      ymin=0, ymax=10,
                      numx=100, numy=100)
    ax.scatter([4], [8],
               edgecolor="black",
               facecolor="white",
               label="Observation");
    ax.legend();
    ax.set_title("PDF t={}".format(t))