We will use the Measured Annual Nutrient loads from AGricultural Environments (MANAGE) data from the USDA, as published by Qian and Harmel, where they used causal inference in order to quantify the effects of water and soil conservation practices in reducing phosphorus loss from agricultural fields.

The data is provided in `Rdata`

format, so we will first have to load it and transform it.
We use the `rpy2`

package, which is not a part of the requirements for `causallib`

and so you will have to install it if reproducing the analysis.

In [1]:

```
import rpy2.robjects as objects
import pandas as pd
```

In [2]:

```
url = objects.r['url']('https://onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1111%2F1752-1688.12377&file=jawr12377-sup-0001-manage.RData')
try:
objects.r['load'](url)
except Exception: # fallback - if the above doesn't work please download the file directly
objects.r['load']('jawr12377-sup-0001-manage.rdata')
```

In [3]:

```
# normally we would use pandas to transform the R-datafram into a pandas dataframe,
# from rpy2.robjects import pandas2ri
# pandas2ri.activate()
# but in many versions of pandas this is broken with categorical data as we have here
# so instead we will use R to save the DataFrame to csv and load it in pandas
objects.r['write.csv'](x=objects.r["LoadApr2007"], file="MANAGE.csv", fileEncoding="UTF-8", **{"row.names": False})
manage = pd.read_csv("MANAGE.csv", index_col='RecordID')
print(manage.shape)
manage.head()
```

Out[3]:

We will follow the procedure used by Qian and Harmel to preprocess the data, which includes creating a few log-transforms and cleaning the data.

The intervention we are interested in is one of either contour farming, terracing, buffering, or creating a waterway.
Together these are referred to as conservation practices, or CP.
The outcome of interest is the amount of phosphorus that was measured in runoff water (`AvgTPLoad`

).
To see why this is a problems see an explanation from the USGS and from the EPA.

In [4]:

```
import numpy as np
#manage.fillna(manage.median(axis=0), inplace=True)
manage["AvgTPLoad"].clip(0.002, inplace=True)
manage["AvgTNLoad"].clip(0.02, inplace=True)
manage["F.AppMethod1"].replace("Other", "Surface Applied", inplace=True)
manage["F.AppMethod1"].replace("Other2", "Unknown", inplace=True)
manage["CFarm"] = (manage[["CP1", "CP2", "CP3"]] == "Contour Farming").any(axis=1)
manage["Buffer"] = ((manage[["CP1", "CP2", "CP3"]] == "Filter Strip").any(axis=1)) | \
((manage[["CP1", "CP2", "CP3"]] == "Riparian Buffer").any(axis=1))
manage["Terrace"] = (manage[["CP1", "CP2", "CP3"]] == "Terrace").any(axis=1)
manage["Waterway"] = (manage[["CP1", "CP2", "CP3"]] == "Waterway").any(axis=1)
manage["NumCP"] = manage[["CFarm", "Buffer", "Terrace", "Waterway"]].sum(axis=1)
manage["AnyCP"] = manage[["CFarm", "Buffer", "Terrace", "Waterway"]].any(axis=1)
manage.dropna(axis=1, thresh=50, inplace=True)
```

In [5]:

```
%matplotlib inline
import matplotlib.pyplot as plt
def boxplot(df:pd.DataFrame, name:str, by="AnyCP", ax=None):
"""A helper function to plot boxplots in log scale and annotate the median"""
ax = ax or plt.gca()
_ = df.boxplot(name, by=by, ax=ax)
ax.set_yscale('log')
ax.autoscale()
ax.grid(False)
medians = df[name].groupby(df[by]).median().values
ticks = ax.get_xticklabels()
for median, tick in zip(medians, ticks):
ax.text(x=tick.get_position()[0], y=median, s=median, horizontalalignment="center")
```

In [6]:

```
fig, axes = plt.subplots(2, 2)
fig.set_size_inches(14,14)
boxplot(manage, "AvgTPLoad", ax=axes[0][0])
boxplot(manage, "AvgTNLoad", ax=axes[0][1])
boxplot(manage, "Avg.N.Applied", ax=axes[1][0])
boxplot(manage, "Avg.P.Applied", ax=axes[1][1])
plt.tight_layout()
```

The box-plots above show both the outcome and several features are correlated with the treatment, which indicates that there is some bias in the data. We will now use causallib to estimate the effect of treatment by correcting for these biases.

As features, we use the amount of phosphorus and nitrogen applied, the total amount of water runoff, the soil volume, the phosphosros application method, and the type of land use (LU).

In [7]:

```
from causallib.estimation import IPW
from sklearn.linear_model import LogisticRegression
```

In [8]:

```
lr = LogisticRegression(solver='lbfgs', max_iter=500)
ipw = IPW(lr)
```

In [9]:

```
y = np.log(manage["AvgTPLoad"])
a = manage["AnyCP"]
X = pd.DataFrame({"Runoff": np.log(manage["Avg.Runoff"]),
"Soil": np.log(manage["Avg.SoilL"] + 1),
"PApplied": np.log(manage["Avg.P.Applied"] + 1),
"NApplied": np.log(manage["Avg.N.Applied"] + 1)})
X = pd.concat([X, pd.get_dummies(manage["F.AppMethod1"], prefix="Method"),
pd.get_dummies(manage["ShortLU"], prefix="LU"),
pd.get_dummies(manage["Tillage"], prefix="Tillage")], axis=1)
X.fillna(X.median(axis=0), inplace=True)
y = y.dropna() # remove the data where there is no outcome
a, X = a[y.index], X.loc[y.index]
print(X.shape)
X.head()
```

Out[9]:

In [10]:

```
ipw.fit(X, a) # train the propensity model to apply causal inference
```

Out[10]:

In [11]:

```
pd.DataFrame({"causal": np.exp(ipw.estimate_population_outcome(X, a, y)),
"marginal": np.exp(y.groupby(a).mean())})
```

Out[11]:

We can see that the causal inference identifies that using any CP will reduce the phosphorus load, while the marginal observation (just checking what happened in treated vs. untreated areas) shows the opposite trend. How can this be?

One way to understand what is occurring is to evaluate the performance of the propensity model and check how consistent it is. Causallib provides the tools to do this.

In [12]:

```
from causallib.evaluation import PropensityEvaluator
```

In [13]:

```
evaluator = PropensityEvaluator(ipw)
_ = evaluator.evaluate_simple(X, a, y, plots=["roc_curve", "weight_distribution", "calibration",
"covariate_balance_love"])
plt.tight_layout()
```

At first glance it might seem that while the model has good predictive performance in the ROC curve (top left). However, the fact that the weighted ROC curve is far from the diagonal hints that the model is in fact not very good at balancing the data. This can also be seen in the covariate balancing plot (bottom right), where we see that the `Pasture`

variable, which was extremely biased in the data, is still quite biased even after weighting. It would make sense, therefore, to stratify the analysis according to it, or to remove it from the analysis completely.

To check whether it is better to stratify or limit our analysis to a specific strata, let's check the overlap between `Pasture`

and the treatment. We may as well look at all the categorical features and check:

In [14]:

```
pd.concat({col: pd.crosstab(X[col], a) for col in X.select_dtypes('uint8').columns.values}, axis=0)
```

Out[14]:

We see that there is only one case of pasture land that was in fact treated. We should limit the analysis only to non-pasture land.

But it's a bit more than that, since treatment was also never observed for Alfalfa, Cotton, Peanuts, or Other. For Other, the difference is quite large while the others might still be attributed to small number fluctuations. While this could be tested using some sort of statistical test, this is not the focus of this example, and we will simply remove all these samples (and then remove the columns to avoid having all-zero columns).

In [15]:

```
X = X.query('LU_Pasture == 0 & LU_Other == 0 & LU_Cotton ==0 & LU_Alfalfa == 0 & LU_Peanuts ==0 & LU_Soybeans == 0')
X = X.drop(columns=['LU_Pasture', 'LU_Other', 'LU_Cotton', 'LU_Alfalfa', 'LU_Peanuts', 'LU_Soybeans'])
a, y = a[X.index], y[X.index]
print(f'We are left with {len(y)} samples')
ipw.fit(X, a)
pd.DataFrame({"causal": np.exp(ipw.estimate_population_outcome(X, a, y)),
"marginal": np.exp(y.groupby(a).mean())})
```

Out[15]:

In [16]:

```
evaluator = PropensityEvaluator(ipw)
evaluator.evaluate_simple(X, a, y, plots=["roc_curve", "weight_distribution", "calibration",
"covariate_balance_love"])
plt.tight_layout()
```

Performance evaluation is much better now:

- all the features are sufficiently balanced
- the model is as calibrated as can be expected given the small sample size that we now have
- the propensity distribution doesn't exhibit any areas with only one treatment group dominsating
- the ROC curves behave well.

Most importantly, after limiting the data the causal analysis agrees with the marginal analysis, and only makes a small adjustment to the causal effect size that is estimated.

In summary, the causal analysis allowed us to focus on where the data allows us to check differences between treatment groups. Indeed, unlike the initial analysis, we now understand that in non-pasture lands the conservation procedures of terracing, buffering, waterway, or contour farming actually improve phosphorus retention in the soil and reduce phosphorus leaking into the watershed when growing crops such as oats, wheat, corn, soybean and sorghum. While we can surmise that this may be true for other crops, we simply cannot answer that from the data.

We would like to thank Biance Zadrozny, Sr. Manager, Spatiotemporal Modeling IBM Research Brazil, for valuable feedback regarding the validity of the analyis and interpretation of the data and the results.