Loren Champlin
Adapted from Adarsh Pyarelal's WM 12 Month Evaluation Notebook
As always, we begin with imports, and print out the commit hash for a rendered version of the notebook.
%load_ext autoreload
%autoreload 2
%matplotlib inline
import pickle
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('retina')
from delphi.visualization import visualize
import delphi.jupyter_tools as jt
import numpy as np
import pandas as pd
from delphi.db import engine
jt.print_commit_hash_message()
import random as rm
import delphi.evaluation_port as EN
import warnings
warnings.filterwarnings("ignore")
import logging
logging.getLogger().setLevel(logging.CRITICAL)
from delphi.cpp.DelphiPython import AnalysisGraph as AG, InitialBeta as IB, RNG
import time
Here we set the random seeds for reproducibility.
np.random.seed(87)
rm.seed(87)
R = RNG.rng()
R.set_seed(87)
This is an example of constructing a statement that represents a two-node CAG. A statement is a list of tuples, where each tuple represents an edge. Each edge tuple contains two node tuples: the first represents the parent and the second represents the child. A node tuple holds the size of the node's effect on its children, the polarity of that effect (positive or negative), and the full node name.
causal_fragments = [
    (("large", -1, "UN/entities/human/financial/economic/inflation"),
     ("small", 1, "UN/events/human/human_migration"))
]
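To make the tuple layout concrete, here is a minimal sketch (plain Python, no Delphi dependency) that unpacks the fragment above into named fields. The variable names are illustrative labels of my own, not Delphi API names:

```python
# Unpack the causal fragment to show what each position means.
# Field names below are illustrative, not part of the Delphi API.
causal_fragments = [
    (("large", -1, "UN/entities/human/financial/economic/inflation"),
     ("small", 1, "UN/events/human/human_migration"))
]

for parent, child in causal_fragments:
    parent_magnitude, parent_polarity, parent_name = parent
    child_magnitude, child_polarity, child_name = child
    effect = "negatively" if parent_polarity < 0 else "positively"
    print(f"{parent_name} ({parent_magnitude}) {effect} affects "
          f"{child_name} ({child_magnitude})")
```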
Now we load the Causal Analysis Graph (CAG) using the statement above.
start_time = time.time()
G = AG.from_causal_fragments(causal_fragments)
Next we map indicator variables to nodes. For the most part, indicator variables can be inferred from available data and texts, but we can also map indicators to nodes manually.
G.map_concepts_to_indicators()
G.replace_indicator("UN/events/human/human_migration","Net migration","New asylum seeking applicants", "UNHCR")
G.construct_beta_pdfs()
Here we generate synthetic data.
If the sampler is working correctly, the generated predictions should be close to the generated sequence of observed states.
This returns a Tuple[List[List[List[float]]], Tuple[List[str], List[List[Dict[str, Dict[str, float]]]]]].
Element 0 of the outer tuple is the generated sequence of observed states, indexed by [timestep][concept][indicator].
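To make that indexing concrete, here is a small self-contained mock of the observed-state structure (two timesteps, two concepts, one indicator each); the numbers are invented purely to illustrate the [timestep][concept][indicator] access pattern:

```python
# Mock observed-state structure: [timestep][concept][indicator].
# Values are invented for illustration only.
mock_observations = [
    [[3.1], [120.0]],   # timestep 0: concept 0, concept 1
    [[3.4], [135.0]],   # timestep 1: concept 0, concept 1
]

# Indicator 0 of concept 1 at timestep 1:
value = mock_observations[1][1][0]
print(value)  # 135.0
```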
synthetic_observations, preds = G.test_inference_with_synthetic_data(
    2015, 1, 2015, 12, 100, 900, initial_beta=IB.HALF
)
end_time = time.time()
total_time = end_time-start_time
synthetic_observations
I assume the line below grabs the synthetic data for New asylum seeking applicants; if not, just change the 1 to a 0.
test_data = np.array(synthetic_observations)[:,1,0]
test_data
total_time
Now that the predictions have been generated, a user can store or present them however they choose. However, the evaluation module comes with several convenient options for displaying output for a specific indicator. The first option is to return the raw predictions for a given indicator variable as a numpy array. This lets one do one's own plotting and manipulation for a given indicator without having to sort through the entire prediction structure.
*Note: True data values from the delphi database can be retrieved using the data_to_df function in evaluation.py.
#EN.pred_to_array(preds,'New asylum seeking applicants')
np.array(G.prediction_to_array('New asylum seeking applicants'))
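As an example of working with such a raw array directly, here is a self-contained numpy sketch. It assumes the array is shaped samples-by-timesteps (an assumption on my part) and uses random stand-in values rather than actual Delphi output:

```python
import numpy as np

rng = np.random.default_rng(87)
# Stand-in for the raw predictions of one indicator:
# rows = sampled trajectories, columns = prediction timesteps (assumed layout).
raw_preds = rng.normal(loc=100.0, scale=5.0, size=(900, 12))

# Per-timestep median and interquartile range.
medians = np.median(raw_preds, axis=0)
q25, q75 = np.percentile(raw_preds, [25, 75], axis=0)

print(medians.shape)  # (12,)
```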
The evaluation module can also output a pandas dataframe with the mean of the predictions along with a specified confidence interval for a given indicator variable. There are also options for presenting the true values, residuals, and error bounds based on the residuals.
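Conceptually, the mean and confidence bounds in that dataframe can be sketched with plain numpy and pandas. This mirrors the idea, not the exact mean_pred_to_df implementation; the column names, sample layout, and percentile-based interval are my own assumptions:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(87)
# Stand-in samples: rows = trajectories, columns = timesteps (assumed layout).
samples = rng.normal(loc=100.0, scale=5.0, size=(900, 12))

alpha = 0.95  # assumed confidence level
lower = np.percentile(samples, (1 - alpha) / 2 * 100, axis=0)
upper = np.percentile(samples, (1 + alpha) / 2 * 100, axis=0)

# Illustrative column names; not the exact columns mean_pred_to_df produces.
df = pd.DataFrame({
    "Mean": samples.mean(axis=0),
    "Lower 95% CI": lower,
    "Upper 95% CI": upper,
})
print(df.shape)  # (12, 3)
```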
*Note: Setting true_vals = True assumes that real data values exist in the database matching the time points of the predictions. Since the data retrieval function returns heuristic estimates for missing data values, it is possible to get completely "made-up" true data if none actually exist for the prediction time range. Also, mean_pred_to_df should be passed the same country, state, and units arguments as train_model (if any were passed).
df = EN.mean_pred_to_df(preds,'New asylum seeking applicants',true_vals=False)
df['Synthetic'] = test_data
df
Finally, we can get plots representing the same data shown above.
The available plot types are 'Prediction' (the default), 'Comparison', 'Error', and 'Test'.
*Note: The above note for mean_pred_to_df also holds for the 'Comparison' and 'Error' plot types. Any other string passed to plot_type falls back to the defaults of the 'Prediction' plot type. The save_as argument can be set to a filename (with extension) to save the plot as a file (e.g., save_as = 'pred_plot.pdf').
When using the Test setting, pred_plot expects a keyword argument called test_data.
EN.pred_plot(preds,'New asylum seeking applicants',plot_type='Test',save_as=None, test_data=test_data)