WM 12 Month Evaluation Notebook

Adarsh Pyarelal

As always, we begin with imports, and print out the commit hash for a rendered version of the notebook.

In [ ]:

%load_ext autoreload
%autoreload 2
%matplotlib inline
import pickle
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('retina')
from delphi.visualization import visualize
import delphi.jupyter_tools as jt
jt.print_commit_hash_message()

Forecasting

Q1: How much rainfall is expected in Northern Bahr el Ghazal and Unity in the lean season?

A: (From Cheryl Porter's note) The lean season represents the time before harvest, when food from the previous harvest is scarce. There may be crops in the field and ample rainfall, but food is scarce. The lean season may vary from year to year based on how much food was harvested the previous year, the timing of the planting season, the growth season length, and other factors.

We can approximate the lean season rainfall as the rainfall that occurs between planting and harvest. In general, maize and sorghum are planted around the same time, but maize is harvested earlier. We can therefore use the maize growing season rainfall as the approximation of lean season rainfall.
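As a minimal sketch of this approximation, the snippet below totals monthly rainfall over an inclusive range of months. The function name `season_total`, the `(start_month, end_month)` convention, and the rainfall values are hypothetical placeholders for illustration. They are not part of Delphi and do not describe how `get_expected_distribution` interprets its `month_range` argument.

```python
from typing import Dict, Tuple


def season_total(monthly_rainfall_mm: Dict[int, float],
                 month_range: Tuple[int, int]) -> float:
    """Sum monthly rainfall over an inclusive (start_month, end_month) range."""
    start, end = month_range
    if not (1 <= start <= 12 and 1 <= end <= 12):
        raise ValueError("months must be between 1 and 12")
    if start <= end:
        months = list(range(start, end + 1))
    else:
        # The range wraps around the end of the year, e.g. (11, 2).
        months = list(range(start, 13)) + list(range(1, end + 1))
    return sum(monthly_rainfall_mm[m] for m in months)


# Placeholder values for illustration only; these are not observations.
example_rainfall = {month: 10.0 * month for month in range(1, 13)}
growing_season_total = season_total(example_rainfall, (5, 8))
```

A planting-to-harvest window for maize would be passed as the month range; the (5, 8) window above is an arbitrary choice for the example.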

Q2: What are the expected crop yields for maize and sorghum during the summer of 2017 in Northern Bahr el Ghazal and Unity?

Function description

get_expected_distribution: This function plots the expected distribution of an indicator variable, given some constraints, automatically aggregating as needed if null values are found.

Options:

indicator: Name of the indicator variable. Recommended values for answering 12-month eval questions ("forecasted" means computed by DSSAT):
- Q1:
  - "Historical Average Total Daily Rainfall (Maize)"
  - "Forecasted Average Total Daily Rainfall (Maize)"
- Q2:
  - "Historical Production (Sorghum)"
  - "Historical Production (Maize)"
  - "Forecasted Production (Sorghum)"
  - "Forecasted Production (Maize)"
state: Name of the state to get indicator values for. Recommended values for answering 12-month eval questions:
- "Northern Bahr el Ghazal"
- "Unity"

In [ ]:

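# Setting the lean season month range
lean_season_month_range = (1, 3)

# Use the maize indicator, since the maize growing season rainfall
# is the approximation of lean season rainfall described above.
jt.get_expected_distribution(
    indicator="Historical Average Total Daily Rainfall (Maize)",
    state="Unity",
    month_range=lean_season_month_range,
)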

Conditional Forecasting

Q: What would be the effect on crop yields for maize and sorghum in Northern Bahr el Ghazal State and Unity State if rainfall is in the lowest 5th percentile?

The function get_percentile_based_conditional_forecast implements a straightforward approach to answering this question: it looks for the values of a forecast variable (currently, only Production is supported) that correspond to the lowest $n^{th}$ percentile of the conditioned variable (currently, only Rainfall is supported) and returns them.
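The idea can be sketched in a few lines of plain Python. This is an illustration of percentile-based filtering in general, not the Delphi implementation: the helper names `percentile` and `conditional_values`, the linear-interpolation percentile rule, and the "at or below the threshold" comparison are all assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0 to 100), interpolating linearly between order statistics."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def conditional_values(pairs: Sequence[Tuple[float, float]],
                       q: float) -> List[float]:
    """Return forecast values whose paired conditioning value is at or
    below the q-th percentile of all conditioning values."""
    threshold = percentile([conditioned for conditioned, _ in pairs], q)
    return [forecast for conditioned, forecast in pairs
            if conditioned <= threshold]
```

Each pair holds one (conditioned value, forecast value) record, for example rainfall and production for the same scenario; calling `conditional_values(pairs, 5)` keeps only the production values from the driest records.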

The purpose of this particular demonstration is not to show off sophisticated modeling techniques, but rather to demonstrate a proof-of-concept of basic integration between Delphi and a bottom-up model (DSSAT).

Currently, the flow is one-way only: precomputed DSSAT values for a few different scenarios related to the 12-month evaluation have been supplied by the University of Florida team and inserted into the Delphi database.

In the future, we envision that DSSAT and other bottom-up models will be run in a MINT workflow, and can be 'called out' to via an Uncharted HMI, which will then communicate the resulting values via a REST API to Delphi.

In [ ]:

jt.get_percentile_based_conditional_forecast(
    forecast_var = "Production",
    conditioned_var = "Rainfall",
    crop = "sorghum", # Options: ("maize", "sorghum")
    percentile = 5,
)

In the next few cells, we examine the downstream effects of reduced precipitation on other quantities of interest, using a causal analysis graph (CAG) built via a script (see http://vision.cs.arizona.edu/adarsh/12m_eval_report.pdf for details on how the CAG was built).

In [ ]:

with open("../scripts/build/precipitation_centered_CAG.pkl", "rb") as f:
    G = pickle.load(f)

In the cell below, we visualize the CAG.

Legend:

- Red edge: overall inhibition; green edge: overall promotion.
- Edge thickness corresponds roughly to the 'strength' of the influence.
- Edge opacity corresponds roughly to the number of evidence fragments that support the causal relationship.

In [ ]:

visualize(G)

In [ ]:

G.map_concepts_to_indicators()
G.parameterize()
visualize(G, indicators=True, indicator_values=True)

We then run an experiment to see how a 20% decrease in precipitation that decays over time affects the other quantities in the CAG. The perturbation corresponds to

$\frac{\partial(\text{precipitation})}{\partial t} = -0.2\exp{(-\tau t)}$

where $\tau$ is a positive real number representing the decay rate of the partial derivative, and $t$ is a whole number representing a time step.
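To make the shape of this perturbation concrete, the sketch below evaluates the rate at whole-number time steps and accumulates it with a unit step size. The value $\tau = 1$, the number of steps, and the function names are assumptions chosen for illustration; the notebook does not state the value of $\tau$ that Delphi uses.

```python
import math
from typing import List


def perturbation_rate(t: int, initial_rate: float = -0.2,
                      tau: float = 1.0) -> float:
    """Rate of change of precipitation at time step t: initial_rate * exp(-tau * t)."""
    return initial_rate * math.exp(-tau * t)


def cumulative_change(n_steps: int, initial_rate: float = -0.2,
                      tau: float = 1.0) -> List[float]:
    """Running sum of the rate over steps 0..n_steps-1, using a unit step size."""
    total = 0.0
    trajectory = []
    for t in range(n_steps):
        total += perturbation_rate(t, initial_rate, tau)
        trajectory.append(total)
    return trajectory


# tau = 1.0 and four steps are assumed values for this example.
trajectory = cumulative_change(4)
```

The rate starts at -0.2 and shrinks toward zero, so the accumulated change keeps decreasing but levels off; a larger $\tau$ makes it level off sooner.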

The full ensemble model is a linear dynamical system with a stochastic transition matrix, and is described here: http://vision.cs.arizona.edu/adarsh/export/Arizona_Text_to_Model_Procedure.pdf
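For readers who want a feel for what "a linear dynamical system with a stochastic transition matrix" means, here is a toy sketch. It is not Delphi's model: the three-component state (precipitation, its rate of change, food production), the normalized starting values, the Gaussian prior on the single coefficient `beta`, and every function name are assumptions made for this example. Each ensemble member draws its own `beta`, builds a transition matrix from it, and applies that matrix repeatedly to the state.

```python
import math
import random
from typing import List


def transition_matrix(beta: float, tau: float, dt: float = 1.0) -> List[List[float]]:
    """Transition matrix for the state [precipitation, d(precipitation)/dt, food_production]."""
    return [
        [1.0, dt, 0.0],                     # precipitation += rate * dt
        [0.0, math.exp(-tau * dt), 0.0],    # the rate decays exponentially
        [0.0, beta * dt, 1.0],              # food production responds to the rate
    ]


def step(matrix: List[List[float]], state: List[float]) -> List[float]:
    """One linear update: new_state = matrix @ state."""
    return [sum(a * s for a, s in zip(row, state)) for row in matrix]


def simulate_ensemble(n_samples: int = 200, n_steps: int = 4,
                      initial_rate: float = -0.2, tau: float = 1.0,
                      beta_mean: float = 1.0, beta_sd: float = 0.5,
                      seed: int = 0) -> List[List[float]]:
    """Return the final state of each ensemble member."""
    rng = random.Random(seed)
    final_states = []
    for _ in range(n_samples):
        # Hypothetical prior: each member samples its own coefficient.
        matrix = transition_matrix(rng.gauss(beta_mean, beta_sd), tau)
        state = [1.0, initial_rate, 1.0]  # normalized starting values (assumed)
        for _ in range(n_steps):
            state = step(matrix, state)
        final_states.append(state)
    return final_states
```

The spread of the final food production values across the ensemble reflects the uncertainty in `beta`; narrowing the distribution that `beta` is drawn from narrows that spread.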

In [ ]:

G.assemble_transition_model_from_gradable_adjectives()
G.sample_from_prior()

In [ ]:

jt.run_experiment(G, "UN/events/weather/precipitation", -0.2, 4)

Now, we can sharpen our predictions by learning from data provided by DSSAT. We perform a simple Bayesian linear regression to get a sharper distribution for $\beta_{precipitation,food\_production}$ in the transition matrix.
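As a rough illustration of why regression on data sharpens the distribution, the sketch below computes the posterior for a single slope in the model $y = \beta x + \epsilon$, with a Gaussian prior on $\beta$ and Gaussian noise of known standard deviation. This is the textbook conjugate update, offered under those stated assumptions; it is not a description of what `infer_transition_matrix_coefficient_from_data` does internally, and the function name and inputs are hypothetical.

```python
import math
from typing import Sequence, Tuple


def posterior_beta(x: Sequence[float], y: Sequence[float],
                   prior_mean: float, prior_sd: float,
                   noise_sd: float) -> Tuple[float, float]:
    """Posterior mean and standard deviation of beta in y = beta * x + noise,
    given a Normal(prior_mean, prior_sd) prior and known noise_sd."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = sum(xi * xi for xi in x) / noise_sd ** 2
    posterior_precision = prior_precision + data_precision
    weighted_sum = (prior_mean * prior_precision
                    + sum(xi * yi for xi, yi in zip(x, y)) / noise_sd ** 2)
    return weighted_sum / posterior_precision, math.sqrt(1.0 / posterior_precision)
```

Because the posterior precision is the prior precision plus a non-negative data term, the posterior standard deviation can never exceed the prior one; each informative data point tightens the distribution of $\beta$.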

In [ ]:

G.infer_transition_matrix_coefficient_from_data(
    'UN/events/weather/precipitation', 'UN/events/human/agriculture/food_production',
    state = "Northern Bahr el Ghazal", crop = "maize"
)
jt.run_experiment(G, "UN/events/weather/precipitation", -0.2, 4)

We can see that the distributions of the indicator values have become sharper, indicative of the reduction in uncertainty due to incorporating both gradable adjectives and bottom-up model output.

If we want more precision, we can use the exact distribution provided by a DSSAT run to more accurately infer the distribution of $\beta_{precipitation,food\_production}$ (as opposed to the simple linear fit to historical data done in this example).