This notebook simulates and then estimates spillover effects where potential outcomes are described by the function $$ y(\tau_{i,t}, v(\tau_{-i,t})) $$
for firm-level treatment $\tau_{i,t}$. Outcomes depend on peer firm treatment status through the scalar function $v(\tau_{-i,t})$, where $\tau_{-i,t}$ denotes the treatments of all firms other than the focal firm. The scalar function was first devised by Hong and Raudenbush (2006 JASA) to address the dimensionality of $\tau_{-i,t}$: the vector $\tau_{-i,t}$ can take $2^{N-1}$ values, so the function $y(\tau_{i,t},\tau_{-i,t})$ has $2^N$ potential outcomes. The scalar function $v(\cdot)$ is specified as $$ v(\tau_{-i,t}) := \frac{1}{|C|}\sum_{i' \in C} \tau_{i',t} $$ for $C := \{i'\,|\,\text{Ind. }i'=\text{Ind. }i\}$, the set of firms in the focal firm's industry. This follows Ferracci, Jolivet, and van den Berg (2014 REStat). As shorthand, we will write $v(\tau_{-i,t})$ as $\rho_{j,t}$, where $j$ indexes the industry.
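As a concrete illustration (using the 50-firm industries constructed below): if 3 firms in firm $i$'s industry are treated in quarter $t$, then $$ \rho_{j,t} = v(\tau_{-i,t}) = \frac{3}{50} = 0.06, $$ so potential outcomes depend on peer treatments only through this single number, not on which particular peers were treated.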
We begin by constructing a panel with $40$ industries, $50$ firms per industry, and $40$ quarters of data. A handful of true $\beta$ parameters are also specified.
// clear everything from Stata's memory
clear
// create industries
qui set obs 40
gen j = _n
// for each industry, create a set of firms
qui expand 50
bys j: gen i = j*100+_n
// for each industry-firm, create a set of time periods
qui expand 40
bys j i: gen t = _n
// true model parameters
scalar b0 = 0
scalar b1 = -3
scalar b2 = 2
scalar b3 = -2
Suppose first that treatment assignment is random and that spillover effects are linear. This implies a model $$ y_{i,t} = \beta_0 + \beta_1\tau_{i,t} + \beta_2\tau_{i,t}\rho_{j,t} + \beta_3(1-\tau_{i,t})\rho_{j,t} + \epsilon_{i,t} $$
// reproducibility
set seed 0
// create firm and industry treatment
gen tau = rbinomial(1, .05)
bys j t: egen rho = mean(tau)
// create a true model
gen u = rnormal()
gen y = b0 + b1*tau + b2*1.tau*rho + b3*0.tau*rho + u
When treatment is exogenous and spillover effects are linear in $\rho$, treatment effects are easy to estimate.
reg y tau i.tau#c.rho
      Source |       SS           df       MS      Number of obs   =    80,000
-------------+----------------------------------   F(3, 79996)     =   9892.12
       Model |  29701.2931         3  9900.43102   Prob > F        =    0.0000
    Residual |  80063.2094    79,996  1.00084016   R-squared       =    0.2706
-------------+----------------------------------   Adj R-squared   =    0.2706
       Total |  109764.502    79,999  1.37207343   Root MSE        =    1.0004

------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         tau |  -2.952744   .0389518   -75.80   0.000    -3.029089   -2.876398
             |
   tau#c.rho |
          0  |  -2.218683   .1155108   -19.21   0.000    -2.445084   -1.992283
          1  |   .8730354   .4987173     1.75   0.080    -.1044473    1.850518
             |
       _cons |   .0107455   .0066978     1.60   0.109    -.0023821    .023873
------------------------------------------------------------------------------
The Stable Unit Treatment Value Assumption (SUTVA) imposes the condition that $y(\tau_{i,t},\tau_{-i,t})=y(\tau_{i,t})$ so that there are no spillover effects. This implies the model $$ y_{i,t} = \theta_0 + \theta_1\tau_{i,t} + u_{i,t} $$
Theoretically, $\hat{\theta}_1 \approx \hat{\beta}_1 + (\hat{\beta}_2-\hat{\beta}_3)\bar{\rho}$, where $\bar{\rho}$ is the sample mean of $\rho$. Within the simulated sample, the observed bias is close to the theoretical bias.
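To see where the bias comes from, take expectations of the true model conditional on treatment status: $$ \begin{aligned} E[y\,|\,\tau=1] &= \beta_0 + \beta_1 + \beta_2\,E[\rho\,|\,\tau=1] \\ E[y\,|\,\tau=0] &= \beta_0 + \beta_3\,E[\rho\,|\,\tau=0] \end{aligned} $$ Under random assignment, $E[\rho\,|\,\tau=1] \approx E[\rho\,|\,\tau=0] \approx \bar{\rho}$ (only approximately in finite samples, since $\rho$ as constructed includes the focal firm's own treatment), so the difference in means that the short regression estimates is approximately $\beta_1 + (\beta_2-\beta_3)\bar{\rho}$.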
reg y tau
qui su rho
di "Theoretical bias `=(b2-b3)*`r(mean)''"
di "Observed bias `=_b[tau]-b1'"
      Source |       SS           df       MS      Number of obs   =    80,000
-------------+----------------------------------   F(1, 79998)     =  29169.45
       Model |  29328.9848         1  29328.9848   Prob > F        =    0.0000
    Residual |  80435.5176    79,998  1.00546911   R-squared       =    0.2672
-------------+----------------------------------   Adj R-squared   =    0.2672
       Total |  109764.502    79,999  1.37207343   Root MSE        =    1.0027

------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         tau |  -2.783437   .0162974  -170.79   0.000    -2.815379   -2.751494
       _cons |  -.0973879   .0036369   -26.78   0.000    -.1045162   -.0902596
------------------------------------------------------------------------------

Theoretical bias .1991999967209996
Observed bias .2165633739491448
Now suppose that treatment assignment is endogenous. In particular, there is an industry-level covariate $w$ with which a firm-level covariate $x$ is correlated. We allow the firm-level covariate $x$ to predict treatment status.
For simplicity, $x$ is given only a small direct effect on outcomes (the $0.05x$ term folded into the error below); it is this channel, combined with $x$ predicting treatment, that makes naive estimates biased.
// reproducibility
set seed 0
// hold on only to the panel identifiers
keep i j t
// create dummy for unique industry-quarters
bys j t: gen _first = _n == 1
// generate industry-quarter data
qui gen w = runiform(1, 3) if _first == 1
sort j t i
qui by j t: carryforward w, replace
// generate firm data
gen x = rnormal(w, 1)
// simulate treatment
gen pr = 1/(1+exp(2*x))
// create treatment dummy
qui gen tau = rbinomial(1,pr)
// rbinomial yields some missing values for pr very close to zero or one
qui replace tau = 0 if mi(tau) & pr < .5
qui replace tau = 1 if mi(tau) & pr >= .5
// industry treatment intensity
bys j t: egen rho = mean(tau)
// generate outcome of interest
gen u = rnormal() + .05*x
gen y = b0 + b1*tau + b2*1.tau*rho + b3*0.tau*rho + u
(39,503 missing values generated)
(39,503 missing values generated)
(39,503 missing values generated)
(39,503 missing values generated)
As one would expect, without addressing endogenous treatment assignment, the model that we ran before is flawed.
// without addressing endogeneity
reg y tau i.tau#c.rho
      Source |       SS           df       MS      Number of obs   =    40,497
-------------+----------------------------------   F(3, 40493)     =   4911.02
       Model |  14675.0009         3  4891.66695   Prob > F        =    0.0000
    Residual |  40333.4624    40,493  .996060118   R-squared       =    0.2668
-------------+----------------------------------   Adj R-squared   =    0.2667
       Total |  55008.4632    40,496  1.35836782   Root MSE        =    .99803

------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         tau |  -3.071996   .0404585   -75.93   0.000    -3.151296   -2.992697
             |
   tau#c.rho |
          0  |   -2.02764   .0225631   -89.87   0.000    -2.071864   -1.983415
          1  |   2.036824   .0798152    25.52   0.000     1.880384    2.193263
             |
       _cons |   .1169907   .0099473    11.76   0.000     .0974938    .1364876
------------------------------------------------------------------------------
The simplest approach is to use regression adjustment and include $x$ as a covariate.
// regression adjustment
reg y tau i.tau#c.rho x
      Source |       SS           df       MS      Number of obs   =    40,497
-------------+----------------------------------   F(4, 40492)     =   3713.66
       Model |  14763.8806         4  3690.97015   Prob > F        =    0.0000
    Residual |  40244.5826    40,492  .993889722   R-squared       =    0.2684
-------------+----------------------------------   Adj R-squared   =    0.2683
       Total |  55008.4632    40,496  1.35836782   Root MSE        =    .99694

------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         tau |   -2.99079   .0413167   -72.39   0.000    -3.071772   -2.909809
             |
   tau#c.rho |
          0  |  -2.006762   .0226464   -88.61   0.000    -2.051149   -1.962374
          1  |   2.053711   .0797482    25.75   0.000     1.897403     2.21002
             |
           x |   .0452351   .0047835     9.46   0.000     .0358594    .0546108
       _cons |   .0105126   .0150171     0.70   0.484    -.0189213    .0399465
------------------------------------------------------------------------------
Now suppose that spillover effects are nonlinear in $\rho$.
In this scenario, regression adjustment alone isn't sufficient.
// nonlinear spillovers
replace y = b0 + b1*tau + b2*1.tau*log(.1+rho) + b3*0.tau*log(.1+rho) + u
// regression adjustment
reg y tau i.tau#c.rho x
(40,497 real changes made)

      Source |       SS           df       MS      Number of obs   =    40,497
-------------+----------------------------------   F(4, 40492)     =  36948.53
       Model |  159000.108         4  39750.0271   Prob > F        =    0.0000
    Residual |  43562.1663    40,492  1.07582155   R-squared       =    0.7849
-------------+----------------------------------   Adj R-squared   =    0.7849
       Total |  202562.275    40,496  5.00203167   Root MSE        =    1.0372

------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         tau |  -9.582966   .0429859  -222.93   0.000    -9.667219   -9.498712
             |
   tau#c.rho |
          0  |  -4.528388   .0235614  -192.20   0.000    -4.574568   -4.482207
          1  |   4.051024   .0829701    48.83   0.000     3.888401    4.213648
             |
           x |   .0640858   .0049767    12.88   0.000     .0543313    .0738403
       _cons |   3.418386   .0156238   218.79   0.000     3.387763     3.44901
------------------------------------------------------------------------------
Begin by estimating each firm's likelihood of treatment. Ideally, this would be done industry by industry (or industry-quarter by industry-quarter). To save computation time, I run a single pooled logit. This is harmless here because the treatment propensity simulation above does not use industry-specific parameters to generate treatment probabilities.
// logit
logit tau x
// store probabilities
predict phat, pr
// store regression weight
gen wt = tau/phat + (1-tau)/(1-phat)
Iteration 0:  log likelihood = -11341.123
Iteration 1:  log likelihood = -8458.2127
Iteration 2:  log likelihood = -7160.0203
Iteration 3:  log likelihood = -7088.6004
Iteration 4:  log likelihood = -7088.2951
Iteration 5:  log likelihood = -7088.2951

Logistic regression                                    Number of obs =  40,497
                                                       LR chi2(1)    = 8505.66
                                                       Prob > chi2   =  0.0000
Log likelihood = -7088.2951                            Pseudo R2     =  0.3750

------------------------------------------------------------------------------
         tau | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   -1.99339   .0298281   -66.83   0.000    -2.051852   -1.934928
       _cons |  -.0128757   .0311895    -0.41   0.680    -.0740061    .0482546
------------------------------------------------------------------------------

(39,503 missing values generated)
(39,503 missing values generated)
Impose overlap with a min-max rule: set the regression weight to missing for any observation whose estimated treatment probability falls outside the range of probabilities observed in both treatment groups.
// overlap
qui su phat if tau == 0
local min0 = `r(min)'
local max0 = `r(max)'
qui su phat if tau == 1
local min1 = `r(min)'
local max1 = `r(max)'
replace wt = . if phat < max(`min0',`min1') | phat > min(`max0',`max1')
(4,770 real changes made, 4,770 to missing)
Now, within a particular industry-quarter, the spillover intensity $\rho$ is constant. This allows us to freely estimate $E[y(\tau=1,\rho)\,|\,\rho]$ and $E[y(\tau=0,\rho)\,|\,\rho]$. We will include $x$ as a covariate in the industry-quarter-level regressions so that the expected outcomes we estimate are IPWRA estimators and thus doubly robust.
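Concretely, for each industry-quarter cell $(j,t)$, the fitted values from the weighted within-cell regression, evaluated with $\tau$ counterfactually set to $d$ and averaged over the cell, estimate the two expected potential outcomes: $$ \widehat{E}[y(d,\rho_{j,t})\,|\,\rho_{j,t}] = \frac{1}{n_{j,t}}\sum_{i \in (j,t)} \hat{y}_i(d), \qquad d \in \{0,1\}, $$ where $\hat{y}_i(d)$ is firm $i$'s predicted outcome with its treatment set to $d$. This is exactly what the loop below computes in y1 and y0.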
// within-group expected outcomes
qui gen y1 = .
qui gen y0 = .
egen grp = group(j t)
qui levelsof grp, local(glevels)
foreach g of local glevels{
qui{
    cap reg y tau x [pw = wt] if grp == `g'
if _rc ~= 0{
continue
}
gen tau_orig = tau
replace tau = 1
predict y1tmp if e(sample)
replace tau = 0
predict y0tmp if e(sample)
replace tau = tau_orig
replace y1 = y1tmp if grp == `g'
replace y0 = y0tmp if grp == `g'
drop tau_orig y*tmp
}
}
// mean group outcomes
bys grp: egen Ey0 = mean(y0)
bys grp: egen Ey1 = mean(y1)
(2,700 missing values generated)
(2,700 missing values generated)
To estimate the dose response functions $E[y(1,\rho)]$ and $E[y(0,\rho)]$ on the industry-quarter level data, one could use either $\rho$ itself as the continuous treatment variable or a standardized transformation of it. Because $\rho$ is bounded on $[0,1]$ and likely to be skewed, using it directly requires special care in specifying the link function for the dose response estimation. It turns out to be more effective to de-skew and standardize $\rho$ (done below), then un-standardize it post-estimation, because the transformed series performs well under the usual Gaussian distribution with identity link function.
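The de-skewing transform and its inverse (used post-estimation to map the grid of evaluation points back to the $\rho$ scale) are: $$ z = \frac{\ln(-\rho-k)-\mu}{\sigma} \quad\Longleftrightarrow\quad \rho = -k - \exp(\sigma z + \mu), $$ where $k$ is the shift returned by lnskew0 in r(gamma) (negative in this sample, so $-\rho-k>0$) and $\mu$ and $\sigma$ are the mean and standard deviation of the de-skewed series before standardization.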
// normalize rho
su rho if _first, d
cap drop lrho
lnskew0 rho_deskewed = rho
scalar lnskew0_k = `r(gamma)'
qui su rho_deskewed, d
scalar lnskew_sd = `r(sd)'
scalar lnskew_mu = `r(mean)'
replace rho_deskewed = (rho_deskewed-lnskew_mu)/lnskew_sd
su rho_deskewed if _first, d
                             rho
-------------------------------------------------------------
      Percentiles      Smallest
 1%          .04              0
 5%           .1              0
10%          .16              0       Obs               1,600
25%           .3              0       Sum of wgt.       1,600

50%          .54                      Mean             .53455
                        Largest       Std. dev.      .2712384
75%          .76            .98
90%           .9            .98       Variance       .0735703
95%          .96              1       Skewness      -.0927423
99%          .98              1       Kurtosis       1.900762

           Transform |          k    [95% conf. interval]     Skewness
---------------------+------------------------------------------------
          ln(-rho-k) |  -4.434525     (not calculated)        -.0000362

(80,000 real changes made)

                      ln(-rho+4.434525)
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -1.709036      -1.792517
 5%    -1.626036      -1.792517
10%    -1.379873      -1.709036       Obs               1,600
25%    -.8213747      -1.709036       Sum of wgt.       1,600

50%     .0146534                      Mean           4.94e-09
                        Largest       Std. dev.      1.000306
75%     .8744471       1.881574
90%     1.353232       1.881574       Variance       1.000613
95%     1.553642       1.881574       Skewness      -.0000364
99%     1.751296       1.881574       Kurtosis       1.888488
We can now estimate a dose response function for $E[y(0,\rho)]$ with covariate $w$. Recall that $x$ predicts firm-level treatment and is correlated with $w$, so $w$ should predict industry-quarter level treatment intensity $\rho$. To speed things along, I've specified flag(0) to turn off covariate balancing tests (since we know that, by construction, w is sufficient to balance over $\rho$).
// drf on y0
qui{
cap restore
preserve
keep if _first == 1
drf w, outcome(Ey0) treatment(rho_deskewed) method(iwkernel) npoints(100) common(1) flag(0)
matrix x = e(tvec)
local npoints `=rowsof(x)'
matrix y0_hat = e(b)
mat y0_hat = y0_hat[1,1..`npoints']'
svmat x, name(rho_points)
svmat y0_hat, name(y0_points)
replace rho_points1 = -1*lnskew0_k-exp(rho_points1*lnskew_sd+lnskew_mu)
gen ytrue = b0 + b3*log(.1+rho_points1)
twoway (scatter y0_points rho_points1) (scatter ytrue rho_points1)
restore
}
(figure: estimated $E[y(0,\rho)]$ plotted against the true curve $\beta_0 + \beta_3\log(0.1+\rho)$)
We do the same for $E[y(1,\rho)]$.
// drf on y1
qui{
preserve
keep if _first == 1
drf w, outcome(Ey1) treatment(rho_deskewed) method(iwkernel) npoints(100) common(1) flag(0)
matrix x = e(tvec)
local npoints `=rowsof(x)'
matrix y1_hat = e(b)
mat y1_hat = y1_hat[1,1..`npoints']'
svmat x, name(rho_points)
svmat y1_hat, name(y1_points)
replace rho_points1 = -1*lnskew0_k-exp(rho_points1*lnskew_sd+lnskew_mu)
gen ytrue = b0 + b1 + b2*log(.1+rho_points1)
twoway (scatter y1_points rho_points1) (scatter ytrue rho_points1)
restore
}
(figure: estimated $E[y(1,\rho)]$ plotted against the true curve $\beta_0 + \beta_1 + \beta_2\log(0.1+\rho)$)
The estimated effects for $\tau=0$ and $\tau=1$ firms line up very well with their true values. The only failures occur at the tails of the distribution of $\rho$; given that the dose response function is estimated nonparametrically, poor performance near the edges of the support is expected.