import radd
from radd import build, vis
import warnings
warnings.filterwarnings("ignore")
# may take a while to build font cache
%matplotlib inline
%load_ext autoreload
%autoreload 2
radd.style_notebook()
# The initial state of the Stop process (red) depends on the current strength
# of Go activation (green): the model assumes that Stop-signal efficacy at
# later SSDs diminishes as the state of the Go process approaches the
# execution threshold (upper bound).
# (pink lines denote t=SSD, blue is the trial deadline)
radd.load_dpm_animation()
# read data into a pandas DataFrame (http://pandas.pydata.org/)
# example_data contains data from 15 subjects in the
# Reactive Stop-Signal task discussed in Dunovan et al. (2015)
data = radd.load_example_data()
data.head()
|   | idx | Cond | ttype | choice | response | acc | rt | ssd |
|---|-----|------|-------|--------|----------|-----|--------|------|
| 0 | 28 | bsl | go | go | 1 | 1 | 0.5985 | 1000 |
| 1 | 28 | bsl | go | go | 1 | 1 | 0.5202 | 1000 |
| 2 | 28 | bsl | go | go | 1 | 1 | 0.5451 | 1000 |
| 3 | 28 | bsl | go | go | 1 | 1 | 0.5716 | 1000 |
| 4 | 28 | bsl | go | go | 1 | 1 | 0.5052 | 1000 |
model = build.Model(data=data, kind='xdpm', depends_on={'v':'Cond'})
model.observedDF.head()
|   | idx | Cond | acc | 200 | 250 | 300 | 350 | 400 | c10 | c20 | ... | c90 | e10 | e20 | e30 | e40 | e50 | e60 | e70 | e80 | e90 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 28 | bsl | 0.9917 | 1.0 | 1.0 | 0.95 | 0.60 | 0.00 | 0.5051 | 0.5252 | ... | 0.5982 | 0.4961 | 0.5185 | 0.5317 | 0.5318 | 0.5319 | 0.5449 | 0.5458 | 0.5584 | 0.5674 |
| 1 | 28 | pnl | 0.9752 | 1.0 | 1.0 | 0.95 | 0.80 | 0.10 | 0.5177 | 0.5318 | ... | 0.6119 | 0.5198 | 0.5324 | 0.5452 | 0.5542 | 0.5586 | 0.5638 | 0.5718 | 0.5851 | 0.6021 |
| 2 | 29 | bsl | 0.9917 | 1.0 | 1.0 | 1.00 | 0.90 | 0.00 | 0.5250 | 0.5377 | ... | 0.5984 | 0.5268 | 0.5450 | 0.5451 | 0.5489 | 0.5585 | 0.5585 | 0.5709 | 0.5848 | 0.5902 |
| 3 | 29 | pnl | 0.9669 | 1.0 | 1.0 | 1.00 | 0.75 | 0.35 | 0.5314 | 0.5452 | ... | 0.6250 | 0.5314 | 0.5322 | 0.5448 | 0.5450 | 0.5517 | 0.5629 | 0.5751 | 0.5851 | 0.5980 |
| 4 | 30 | bsl | 0.9421 | 1.0 | 1.0 | 1.00 | 0.80 | 0.25 | 0.5298 | 0.5585 | ... | 0.6384 | 0.5361 | 0.5452 | 0.5606 | 0.5842 | 0.5854 | 0.5985 | 0.6098 | 0.6118 | 0.6208 |

5 rows × 26 columns

Here the 200–400 columns give stop accuracy at each SSD (ms), and the c10–c90 and e10–e90 columns give the correct- and error-response RT quantiles.
The **set_basinparams()** method gives control over low-level parameters used for global optimization:

- xtol = ftol = tol: error tolerance of global optimization (default=1e-20)
- stepsize (default=.05): set the basinhopping initial step-size
- nsamples (default=3000): number of individual parameter subsets $\theta_i \in \Theta$ to sample and evaluate before initializing global optimization
| $\theta$ | Description | str id | Go/Stop |
|---|---|---|---|
| $a_{G}$ | Threshold | 'a' | Go |
| $tr_{G}$ | Onset-Delay | 'tr' | Go |
| $v_{G}$ | Drift-Rate | 'v' | Go |
| $xb_{G}$ | Dynamic Gain | 'xb' | Go |
| $v_{S}$ | SS Drift-Rate | 'ssv' | Stop |
| $so_{S}$ | SS Onset-Delay | 'sso' | Stop |
- ninits (default=3): number of initial parameter sets to perform global optimization on
  - if ninits is 1, global optimization is performed once, using the sampled parameter set with the lowest cost error (as described above under nsamples)
  - if ninits is greater than 1, global optimization is performed $n$ separate times, once for each $p_{i} \in P_{inits}$, where $P_{inits}$ is the rank-ordered set of sampled parameter subsets with the $n$ lowest cost errors
  - the optimized parameters corresponding to the lowest global minimum across all iterations of basinhopping are then selected and passed to the next stage of the fitting routine (local gradient-based optimization)
- nsuccess (default=60): criterion number of successful steps without finding a new global minimum before exiting basinhopping
- interval (default=10): number of steps between adaptive updates of the stepsize
- T (default=1.0): set the basinhopping "temperature"
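Conceptually, the nsamples/ninits machinery samples candidate parameter sets, ranks them by cost, and keeps the best few as starting points for basinhopping. A minimal sketch of that procedure (toy_cost and the sampling ranges are illustrative stand-ins, not RADD internals):

```python
import numpy as np

rng = np.random.default_rng(0)
nsamples, ninits = 3000, 3

def toy_cost(theta):
    # stand-in for the model's weighted SSE cost function
    return (theta['a'] - 0.4)**2 + (theta['v'] - 1.1)**2

# draw nsamples candidate parameter subsets, score each once,
# and keep the ninits best as starting points for global optimization
candidates = [{'a': rng.uniform(0.2, 0.6), 'v': rng.uniform(0.5, 1.5)}
              for _ in range(nsamples)]
inits = sorted(candidates, key=toy_cost)[:ninits]
```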
model.set_basinparams(tol=1e-15, nsamples=4000, ninits=6, nsuccess=70)
# sample model.basinparams['nsamples'] from each dist (default=3000)
# and initialize model with best model.basinparams['ninits'] (default=3)
vis.plot_param_distributions(p=model.inits)
The **set_fitparams()** method gives control over low-level parameters used for local optimization. Local optimization polishes the parameter estimates passed from the global optimization step.

- method (default='nelder'): optimization algorithm
- xtol = ftol = tol (default=1e-30): error tolerance of optimization
- maxfev (default=2000): max number of function evaluations to perform
- ntrials (default=20000): number of simulated trials per condition
model.set_fitparams(method='tnc', tol=1e-35, ntrials=30000, maxfev=2500)
**set_fitparams()** also allows you to control low-level attributes of the model, including...

- quantiles (default=np.array([.10, .20, ... .90])): quantiles of the RT distribution
- kind (default='dpm'): model kind (currently only 'irace' and 'dpm')
- depends_on (dict): {parameter_id : condition_name}
- tb (float): trial duration (time window for response) in seconds
- nlevels (int): number of levels in depends_on data[condition_name]
- fit_on (default='average'): by default, models are fit to the 'average' data (across subjects). If fit_on='subjects', a model is fit to each individual subject's data.
import numpy as np
q = np.arange(.1, 1., .05)
model.set_fitparams(kind='irace', depends_on={'a': 'Cond'}, quantiles=q)
Flat model fits are performed by identifying the full set of parameter values that minimize the following cost function:

$$\chi^2 = \sum_i [\omega_i (\hat{Y}_i - Y_i)]^2$$

$Y$ is an array of observed data (e.g., accuracy, RT quantiles, etc.)

$\hat{Y}$ is an equal-length array of corresponding model-predicted values, given by the parameterized model $f(\theta)$

The error between the predicted and observed data ($\hat{Y} - Y$) is weighted by an equal-length array of scalars $\omega$, each proportional to the inverse of the variance in the corresponding value of $Y$.
# average the observed data and cost weights across subjects for the flat fit
flat_data = model.observedDF.mean()
flat_wts = model.wtsDF.mean()
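As a rough illustration, the cost above reduces to a few lines of NumPy (weighted_sse is a hypothetical name; this is a sketch of the equation, not RADD's internal implementation):

```python
import numpy as np

def weighted_sse(y, yhat, wts):
    """Chi-square cost: weighted, squared errors summed over all stats in Y."""
    y, yhat, wts = map(np.asarray, (y, yhat, wts))
    return np.sum((wts * (yhat - y))**2)

# zero error when predictions match the observed stats exactly
assert weighted_sse([.99, .55], [.99, .55], [1., 2.]) == 0.0
```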
Model.optimize(self, plotfits=True, saveplot=False, saveresults=True, saveobserved=False, custompath=None, progress=False):
""" Method to be used for accessing fitting methods in Optimizer class
see Optimizer method optimize()
::Arguments::
plotfits (bool):
if True (default), plot model predictions over observed data
saveplot (bool):
if True (default is False), save plots to "~/<self.model_id>/"
saveresults (bool):
if True (default), save fitdf, yhatdf, and txt logs to "~/<self.model_id>/"
saveobserved (bool):
if True (default is False), save observedDF to "~/<self.model_id>/"
custompath (str):
path starting from any subdirectory of "~/" (e.g., home).
all saved output will write to "~/<custompath>/<self.model_id>/"
progress (bool):
track progress across ninits and basinhopping
"""
By default (saveresults=True), the model creates a folder named after its model_id attribute in your user home directory (model_id is a string with identifying information about the model) and saves all model output to this location.

To prevent the model from creating the output directory and saving results:
m = build.Model(data=data)
m.optimize(saveresults=False)
# note: custompath must be an existing path to the parent directory
# where you want the model's output directory to be created
# save model output to /Users/kyle/Dropbox/<model_id>/
m.optimize(custompath='Dropbox')
# save model output and observed data to /Users/kyle/Dropbox/<model_id>/
m.optimize(custompath='Dropbox', saveobserved=True)
# save fit plot to /Users/kyle/Dropbox/<model_id>/
m.optimize(custompath='Dropbox', saveplot=True)
model.optimize(progress=True)
# build a model with no conditional dependencies (a "flat" model)
model = build.Model(data=data, kind='xdpm')
model.set_basinparams(method='basin', nsamples=1000)
# NOTE: fits in the binder demo will be much slower than when run locally
# set_testing_params() sets more liberal fit criteria to speed things up a bit
# comment this line out to run fits using the default optimization criteria
model.set_testing_params()
model.optimize(progress=True, plotfits=True)
# display the model's poptdf (optimized parameter estimates)
model.poptdf
# display the model's fitdf (goodness-of-fit statistics)
model.fitdf
# display the model's yhatdf (model-predicted data)
model.yhatdf
# to save output dataframes
model.poptdf.to_csv("path/to/save/poptdf.csv", index=False)
model.fitdf.to_csv("path/to/save/fitdf.csv", index=False)
model.yhatdf.to_csv("path/to/save/yhatdf.csv", index=False)
model.fitdf
|   | idx | pvary | nvary | AIC | BIC | nfev | df | ndata | chi | rchi | logp | cnvrg | niter |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | avg | all | 5.0 | -238.4979 | -232.6077 | 1003.0 | 19.0 | 24.0 | 0.0008 | 4.0247e-05 | -248.4979 | False | 167.0 |
model.poptdf
|   | idx | flat | pvary | a | ssv | tr | v | xb |
|---|---|---|---|---|---|---|---|---|
| 0 | avg | flat | all | 0.4067 | -0.7065 | 0.2227 | 1.1094 | 1.0979 |
model.yhatdf
|   | idx | flat | pvary | acc | 200 | 250 | 300 | 350 | 400 | c10 | ... | c90 | e10 | e20 | e30 | e40 | e50 | e60 | e70 | e80 | e90 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | avg | flat | all | 0.9811 | 1.0 | 0.979 | 0.948 | 0.574 | 0.093 | 0.519 | ... | 0.612 | 0.519 | 0.531 | 0.54 | 0.546 | 0.558 | 0.564 | 0.573 | 0.585 | 0.603 |

1 rows × 27 columns
where $\sum [\omega_i (\hat{Y}_i - Y_i)]^2$ gives the cost ($\chi^2$) for level $i$ of condition $C$, and the total $\chi^2$ is equal to the summed and squared error across all $N_c$ levels of that condition.
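A minimal sketch of this level-wise accumulation, under the same caveats as the earlier cost sketch (cond_cost is a hypothetical name, not RADD's API):

```python
import numpy as np

def cond_cost(y_by_level, yhat_by_level, wts_by_level):
    """Total chi2: weighted SSE accumulated across all levels of a condition."""
    total = 0.0
    for y, yhat, w in zip(y_by_level, yhat_by_level, wts_by_level):
        y, yhat, w = map(np.asarray, (y, yhat, w))
        total += np.sum((w * (yhat - y))**2)
    return total
```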
Parameter dependencies are specified by providing a depends_on dictionary when initializing the model, with the format {parameter_id : condition_name}.

For instance, in Dunovan et al. (2015), subjects performed two versions of a stop-signal task (the 'bsl' and 'pnl' levels of 'Cond' in the example data).

To test the hypothesis that the observed behavioral differences between penalty conditions were the result of a change in the Go drift-rate...
# define the model allowing Go drift-rate to vary across 'Cond'
model_v = build.Model(data=data, kind='xdpm', depends_on={'v': 'Cond'})
# run optimize to fit the full model (steps 1 & 2)
model_v.optimize(progress=True)
In practice, you'll often have multiple alternative hypotheses about which parameters depend on various task conditions.

For instance, instead of modulating the Go drift-rate, asymmetric stop/go penalties might alter behavior by changing the height of the decision threshold (a).
To implement this model:
# define the model allowing threshold to vary across 'Cond'
model_a = build.Model(data=data, kind='xdpm', depends_on={'a': 'Cond'})
# run optimize to fit the full model (steps 1 & 2)
model_a.optimize(progress=True)
# If True, threshold model provides a better fit
model_a.finfo['AIC'] < model_v.finfo['AIC']
# replace 'Cond' with whatever your condition is called in the input dataframe
cond_data = model.observedDF.groupby('Cond').mean()
cond_wts = model.wtsDF.groupby('Cond').mean()
# build conditional model and optimize with drift-rate free across levels of Cond
model = build.Model(data=data, kind='xdpm', depends_on={'v':'Cond'})
model.set_basinparams(method='basin', nsamples=1000)
# NOTE: fits in the binder demo will be much slower than when run locally
# set_testing_params() sets more liberal fit criteria to speed things up a bit
# comment this line out to run fits using the default optimization criteria
model.set_testing_params()
model.optimize(progress=True, plotfits=True)
model.fitdf
|   | idx | pvary | nvary | AIC | BIC | nfev | df | ndata | chi | rchi | logp | cnvrg | niter |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | avg | v | 2.0 | -457.9562 | -454.2138 | 689.0 | 46.0 | 48.0 | 0.0032 | 6.8991e-05 | -461.9562 | True | 5.0 |
model.poptdf
|   | idx | Cond | pvary | a | ssv | tr | v | xb |
|---|---|---|---|---|---|---|---|---|
| 0 | avg | bsl | v | 0.4075 | -0.7844 | 0.2123 | 1.1395 | 0.7191 |
| 1 | avg | pnl | v | 0.4075 | -0.7844 | 0.2123 | 1.1029 | 0.7191 |
Typically you'll have multiple competing hypotheses about which parameters are influenced by various task conditions
Nested optimization allows alternative models to be optimized using a single initial parameter set.
model = build.Model(kind='xdpm', data=data)
freeparams = ['v', 'a', 'ssv']
depends = [{pname: 'Cond'} for pname in freeparams]
model.nested_optimize(depends=depends)
After fitting the model with depends_on={'v': 'Cond'}, 'v' is replaced by the next parameter in the nested_models list ('a', or boundary height, in this case) and the model is optimized with this dependency using the same initial parameters as the original model.

As a result, model selection is less likely to be biased by the initial parameter set.

Also, because Step 1 (global optimization) takes significantly longer than Step 2 (local optimization), nested optimization of multiple models is significantly faster than optimizing each model individually through both steps.
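A conceptual sketch of that workflow (shared_inits, local_fit, and the parameter values are illustrative stand-ins, not RADD internals):

```python
# the expensive global sampling/initialization (Step 1) runs once,
# yielding one shared starting point for every candidate dependency
shared_inits = {'a': 0.41, 'v': 1.17, 'tr': 0.21, 'ssv': -0.80}

def local_fit(inits, dep):
    # stand-in for Step 2 (local optimization) under dependency `dep`;
    # a real fit would polish `inits` against the conditional data
    return {'depends_on': dep, 'popt': dict(inits)}

# each alternative model is polished from the same shared starting point
results = [local_fit(shared_inits, {p: 'Cond'}) for p in ('v', 'a', 'ssv')]
```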
model = build.Model(kind='dpm', data=data)
model.set_basinparams(method='basin', nsamples=1000)
# NOTE: fits in the binder demo will be much slower than when run locally
# set_testing_params() sets more liberal fit criteria to speed things up a bit
# comment this line out to run fits using the default optimization criteria
model.set_testing_params()
freeparams = ['v', 'a', 'ssv']
depends = [{pname: 'Cond'} for pname in freeparams]
fitdf, poptdf, yhatdf = model.nested_optimize(depends=depends, progress=True, plotfits=False)
# compare GOF stats for the three models
fitdf
|   | modelID | idx | pvary | nvary | AIC | BIC | nfev | df | ndata | chi | rchi | logp | cnvrg | niter |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | v | avg | v | 2.0 | -459.4273 | -455.6849 | 200.0 | 46.0 | 48.0 | 0.0031 | 6.6909e-05 | -463.4273 | True | 2.0 |
| 1 | a | avg | a | 2.0 | -405.0658 | -401.3234 | 260.0 | 46.0 | 48.0 | 0.0096 | 2.0765e-04 | -409.0658 | True | 2.0 |
| 2 | ssv | avg | ssv | 2.0 | -422.5899 | -418.8475 | 384.0 | 46.0 | 48.0 | 0.0066 | 1.4414e-04 | -426.5899 | True | 4.0 |
# compare param estimates for the three models
poptdf
|   | modelID | idx | Cond | pvary | a | ssv | tr | v |
|---|---|---|---|---|---|---|---|---|
| 0 | v | avg | bsl | v | 0.4075 | -0.7980 | 0.2148 | 1.1820 |
| 1 | v | avg | pnl | v | 0.4075 | -0.7980 | 0.2148 | 1.1533 |
| 2 | a | avg | bsl | a | 0.4036 | -0.7980 | 0.2148 | 1.1689 |
| 3 | a | avg | pnl | a | 0.4075 | -0.7980 | 0.2148 | 1.1689 |
| 4 | ssv | avg | bsl | ssv | 0.4075 | -0.7654 | 0.2148 | 1.1689 |
| 5 | ssv | avg | pnl | ssv | 0.4075 | -0.8292 | 0.2148 | 1.1689 |
# Evaluate all nested fits and plot fit stats: AIC & BIC (Lower is better)
# According to AIC & BIC, the model with execution drift-rate (v_E) free across levels of Cond
# provides a better fit than models with free threshold (a) or braking drift-rate (v_B)
gof = vis.compare_nested_models(fitdf, verbose=True, model_ids=freeparams)
AIC likes v model
BIC likes v model
# fit a separate model to each subject's data (see fit_on above)
model = build.Model(data=data, fit_on='subjects')
Currently only the Dependent Process Model (kind='dpm') and the Independent Race Model (kind='irace') are implemented.

Tell the model to include a Dynamic Bias Signal ('xb') by adding an 'x' to the front of the model kind.
To implement the Dependent Process Model...
#... with dynamic bias:
model = build.Model(data=data, kind='xdpm')
#...and without:
model = build.Model(data=data, kind='dpm')
To implement the Independent Race Model...
#... with dynamic bias:
model = build.Model(data=data, kind='xirace')
#... and without:
model = build.Model(data=data, kind='irace')
model.set_basinparams(nsuccess=100, tol=1e-30, ninits=10, nsamples=10000)
model.set_fitparams(maxfev=5000, tol=1e-35)
# inspect the cost-function weights (omega) for conditional and flat fits
print(model.cond_wts)
print(model.flat_wts)
# build a model that fits the data without cost weights
model = build.Model(data=data, weighted=False)