Regularized optimal transport
The gravity equation
Generalized linear models
Pseudo-Poisson maximum likelihood estimation
Anderson and van Wincoop (2003). "Gravity with Gravitas: A Solution to the Border Puzzle". American Economic Review.
Head and Mayer (2014). "Gravity Equations: Workhorse, Toolkit and Cookbook". Handbook of International Economics.
Choo and Siow (2006). "Who Marries Whom and Why". Journal of Political Economy.
Gourieroux, Monfort, and Trognon (1984). "Pseudo Maximum Likelihood Methods: Theory". Econometrica.
McCullagh and Nelder (1989). Generalized Linear Models. Chapman and Hall/CRC.
Santos Silva and Tenreyro (2006). "The Log of Gravity". Review of Economics and Statistics.
Yotov et al. (2016). An Advanced Guide to Trade Policy Analysis: The Structural Gravity Model. WTO.
Carneiro, Guimarães, and Portugal (2012). "Real Wages and the Business Cycle: Accounting for Worker, Firm, and Job Title Heterogeneity". AEJ: Macro.
Dupuy and Galichon (2014). "Personality Traits and the Marriage Market". Journal of Political Economy.
Dupuy, Galichon, and Sun (2019). "Estimating Matching Affinity Matrix under Low-Rank Constraints". Information and Inference.
Carlier, Dupuy, Galichon, and Sun (forthcoming). "SISTA: Learning Optimal Transport Costs under Sparsity Constraints". Communications on Pure and Applied Mathematics.
The gravity equation is a very useful tool for explaining trade flows by various measures of proximity between countries.
A number of regressors have been proposed, including geographic distance, common official language, common colonial past, share of common religions, etc.
The dependent variable is the volume of exports from country $i$ to country $n$, for each pair of countries $\left( i,n\right)$.
Today, we shall see a close connection between gravity models of international trade and separable matching models.
To start with, let's load some of the libraries we shall need.
import numpy as np
import os
import pandas as pd
import math
import sys
import time
from scipy import optimize, special
# !python -m pip install -i https://pypi.gurobi.com gurobipy ## only if Gurobi not here
import gurobipy as grb
And let's load our data, which comes from the book An Advanced Guide to Trade Policy Analysis: The Structural Gravity Model, by Yotov et al. We will estimate the gravity model using optimal transport as well as using Poisson regression.
#thepath = os.path.join(os.getcwd(),'data_mec_optim/gravity_wtodata/')
thepath = 'https://raw.githubusercontent.com/math-econ-code/mec_optim_2021-01/master/data_mec_optim/gravity_wtodata/'
tradedata = pd.read_csv(os.path.join(thepath ,'1_TraditionalGravity_from_WTO_book.csv'), sep=',')
tradedata = tradedata[['exporter', 'importer','year', 'trade', 'DIST','ln_DIST', 'CNTG', 'LANG', 'CLNY']]
tradedata.sort_values(['year','exporter','importer'], inplace = True)
tradedata.reset_index(inplace = True, drop = True)
nbt = len(tradedata['year'].unique())
nbi = len(tradedata['importer'].unique())
nbk = 4
tradedata.head()
|   | exporter | importer | year | trade | DIST | ln_DIST | CNTG | LANG | CLNY |
|---|---|---|---|---|---|---|---|---|---|
| 0 | ARG | ARG | 1986 | 61288.590263 | 533.908240 | 6.280224 | 0 | 0 | 0 |
| 1 | ARG | AUS | 1986 | 27.764874 | 12044.574134 | 9.396370 | 0 | 0 | 0 |
| 2 | ARG | AUT | 1986 | 3.559843 | 11751.146521 | 9.371706 | 0 | 0 | 0 |
| 3 | ARG | BEL | 1986 | 96.102567 | 11305.285764 | 9.333026 | 0 | 0 | 0 |
| 4 | ARG | BGR | 1986 | 3.129231 | 12115.572046 | 9.402246 | 0 | 0 | 0 |
Consistent with common practice, we only look at trade flows between pairs of distinct countries:
tradedata.loc[tradedata['importer'] == tradedata['exporter'], ['DIST', 'ln_DIST', 'CNTG', 'LANG', 'CLNY']] = 0
Let's prepare the data so that we can use it. We want to construct:
- $D_{ni,t}^k$, the $k$th pairwise discrepancy measure between importer $n$ and exporter $i$ at time $t$;
- $X_{n,t}$, the total value of expenditure of importer $n$ at time $t$;
- $Y_{i,t}$, the total value of production of exporter $i$ at time $t$.
Xhatnit = []  # one (exporter x importer) trade matrix per year
Dnikt = []    # one (country pair x regressor) matrix per year
years = tradedata['year'].unique()
for t, year in enumerate(years):
    tradedata_year = tradedata[tradedata['year'] == year]
    Xhatnit.append(tradedata_year.pivot(index='exporter', columns='importer', values='trade').values)
    np.fill_diagonal(Xhatnit[t], 0)  # discard internal trade
    Dnikt.append(tradedata_year[['ln_DIST', 'CNTG', 'LANG', 'CLNY']].values)

Xnt = np.zeros((nbi, nbt))
Yit = np.zeros((nbi, nbt))
for t in range(nbt):
    Xnt[:, t] = Xhatnit[t].sum(axis=1)  # row sums of the trade matrix
    Yit[:, t] = Xhatnit[t].sum(axis=0)  # column sums of the trade matrix

totalmass_t = sum(Xhatnit).sum(axis=(0, 1)) / nbt  # average total trade per year
pihat_nit = Xhatnit / totalmass_t                  # observed matching, normalized by the average yearly mass
Let's standardize the data and construct some useful objects:
meanD_k = np.asmatrix([mat.mean(axis=0) for mat in Dnikt]).mean(axis=0)
sdD_k = np.asmatrix([mat.std(axis=0, ddof=1) for mat in Dnikt]).mean(axis=0)
Dnikt = [(mat - meanD_k) / sdD_k for mat in Dnikt]  # standardize each regressor

p_nt = Xnt / totalmass_t  # margins of the normalized matching
q_nt = Yit / totalmass_t
IX = np.repeat(1, nbi).reshape(nbi, 1)   # column vector of ones
tIY = np.repeat(1, nbi).reshape(1, nbi)  # row vector of ones

f_nit = []
g_nit = []
for t in range(nbt):
    f_nit.append(p_nt[:, t].reshape(nbi, 1).dot(tIY))  # p_nt repeated across columns
    g_nit.append(IX.dot(q_nt[:, t].reshape(1, nbi)))   # q_nt repeated across rows
Consider the optimal transport duality
\begin{align*} \max_{\pi\in\mathcal{M}\left( P,Q\right) }\sum_{xy}\pi_{xy}\Phi_{xy}=\min_{u_{x}+v_{y}\geq\Phi_{xy}}\sum_{x\in\mathcal{X}}p_{x}u_{x}+\sum_{y\in\mathcal{Y}}q_{y}v_{y} \end{align*}Now let's add an entropic regularization term to the primal objective. For any $\sigma>0$, we get
\begin{align*} & \max_{\pi\in\mathcal{M}\left( P,Q\right) }\sum_{xy}\pi_{xy}\Phi_{xy}-\sigma\sum_{xy}\pi_{xy}\ln\pi_{xy}\\ & =\min_{u,v}\sum_{x\in\mathcal{X}}p_{x}u_{x}+\sum_{y\in\mathcal{Y}}q_{y}v_{y}+\sigma\sum_{xy}\exp\left( \frac{\Phi_{xy}-u_{x}-v_{y}-\sigma}{\sigma}\right) \end{align*}The latter is an unconstrained convex optimization problem, and the most efficient numerical technique is often coordinate descent, i.e. alternating between minimization in $u$ and minimization in $v$.
Minimizing with respect to $u$ yields
\begin{align*} e^{-u_{x}/\sigma}=\frac{p_{x}}{\sum_{y}\exp\left( \frac{\Phi_{xy}-v_{y}-\sigma}{\sigma}\right) } \end{align*}and minimizing with respect to $v$ yields
\begin{align*} e^{-v_{y}/\sigma}=\frac{q_{y}}{\sum_{x}\exp\left( \frac{\Phi_{xy}-u_{x}-\sigma}{\sigma}\right) } \end{align*}Alternating between these two updates is called the "iterative proportional fitting procedure" (IPFP), aka "matrix scaling", "RAS algorithm", "Sinkhorn-Knopp algorithm", "Kruithof's method", "Furness procedure", "biproportional fitting procedure", "Bregman's procedure". See the survey in Idel (2016).
It is perhaps the most often reinvented algorithm in applied mathematics, and it was recently rediscovered in a machine learning context.
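To fix ideas, here is a minimal sketch of IPFP in its scaling form (the function name `ipfp` and the synthetic inputs in the usage line are ours, not part of the dataset loaded above):

```python
# A minimal IPFP / Sinkhorn sketch in scaling form: a_x and b_y are the scaling
# factors (proportional to e^{-u_x/sigma} and e^{-v_y/sigma}), and pi = a K b
# with K the Gibbs kernel.
def ipfp(Phi, p, q, sigma=1.0, tol=1e-10, maxiter=1000):
    K = np.exp(Phi / sigma)                  # Gibbs kernel
    b = np.ones_like(q)
    for _ in range(maxiter):
        a = p / (K @ b)                      # enforce the row margins
        b = q / (K.T @ a)                    # enforce the column margins
        pi = a[:, None] * K * b[None, :]
        if np.max(np.abs(pi.sum(axis=1) - p)) < tol:
            break
    return pi

# Usage on a toy 4x4 problem with uniform margins:
pi_toy = ipfp(np.random.normal(size=(4, 4)), np.full(4, 0.25), np.full(4, 0.25))
```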
The goal is to estimate the matching surplus $\Phi_{xy}$. For this, take a linear parameterization
\begin{align*} \Phi_{xy}^{\beta}=\sum_{k=1}^{K}\beta_{k}\phi_{xy}^{k}. \end{align*}Following Choo and Siow (2006), Galichon and Salanie (2011) introduce logit heterogeneity in individual preferences and show that the equilibrium now maximizes the regularized Monge-Kantorovich problem
\begin{align*} W\left( \beta\right) =\max_{\pi\in\mathcal{M}\left( P,Q\right) }\sum_{xy}\pi_{xy}\Phi_{xy}^{\beta}-\sigma\sum_{xy}\pi_{xy}\ln\pi_{xy} \end{align*}By duality, $W\left( \beta\right) $ can be expressed
\begin{align*} W\left( \beta\right) =\min_{u,v}\sum_{x}p_{x}u_{x}+\sum_{y}q_{y}v_{y}+\sigma\sum_{xy}\exp\left( \frac{\Phi_{xy}^{\beta}-u_{x}-v_{y}-\sigma}{\sigma}\right) \end{align*}and w.l.o.g. we can set $\sigma=1$ and drop the additive constant $-\sigma$ inside the exponential.
We observe the actual matching $\hat{\pi}_{xy}$. Note that $\partial W/\partial\beta_{k}=\sum_{xy}\pi_{xy}\phi_{xy}^{k}$ (by the envelope theorem, with $\pi$ the optimal matching at $\beta$); hence $\beta$ is estimated by matching these model moments with the observed moments $\sum_{xy}\hat{\pi}_{xy}\phi_{xy}^{k}$, that is, by solving
\begin{align*} \min_{u,v,\beta}\sum_{x}p_{x}u_{x}+\sum_{y}q_{y}v_{y}+\sum_{xy}\exp\left(\Phi_{xy}^{\beta}-u_{x}-v_{y}\right) -\sum_{xy,k}\hat{\pi}_{xy}\beta_{k}\phi_{xy}^{k} \end{align*}
which is still a convex optimization problem.
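To make this concrete, here is a minimal sketch that feeds this convex objective to `scipy.optimize.minimize` (imported above) on synthetic data; all names and dimensions below are illustrative:

```python
# Synthetic example: recover beta by minimizing the convex objective in (u, v, beta).
rng = np.random.default_rng(0)
nx, ny, nk = 5, 5, 3
phi_xyk = rng.normal(size=(nx, ny, nk))       # basis surplus functions phi^k_xy
pihat = rng.uniform(size=(nx, ny))
pihat /= pihat.sum()                          # observed matching, total mass 1
p, q = pihat.sum(axis=1), pihat.sum(axis=0)   # its margins

def objective(theta):
    u, v, beta = theta[:nx], theta[nx:nx + ny], theta[nx + ny:]
    Phi = phi_xyk @ beta                      # Phi^beta_xy
    return p @ u + q @ v + np.exp(Phi - u[:, None] - v[None, :]).sum() - np.sum(pihat * Phi)

res = optimize.minimize(objective, np.zeros(nx + ny + nk), method='BFGS')
beta_hat = res.x[nx + ny:]
```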
This convex objective is actually the log-likelihood of a Poisson regression with $x$ and $y$ fixed effects, where we assume
\begin{align*} \pi_{xy}|xy\sim Poisson\left( \exp\left( \sum_{k=1}^{K}\beta_{k}\phi _{xy}^{k}-u_{x}-v_{y}\right) \right) . \end{align*}Let $\theta=\left( \beta,-u,-v\right) $ and $Z=\left( \phi,D^{x},D^{y}\right) $, where $D_{x^{\prime}y^{\prime}}^{x}=1\left\{ x=x^{\prime}\right\} $ and $D_{x^{\prime}y^{\prime}}^{y}=1\left\{ y=y^{\prime}\right\}$ are $x$- and $y$-dummies, so that $\theta^{\intercal}Z_{xy}=\sum_{k=1}^{K}\beta_{k}\phi_{xy}^{k}-u_{x}-v_{y}$. Let $m_{xy}\left( Z;\theta\right) =\exp\left(\theta^{\intercal}Z_{xy}\right) $ be the parameter of the Poisson distribution.
The conditional likelihood of $\hat{\pi}_{xy}$ given $Z_{xy}$ is
\begin{align*} l_{xy}\left( \hat{\pi}_{xy};\theta\right) & =\hat{\pi}_{xy}\log m_{xy}\left( Z;\theta\right) -m_{xy}\left( Z;\theta\right) \\ & =\hat{\pi}_{xy}\left( \theta^{\intercal}Z_{xy}\right) -\exp\left(\theta^{\intercal}Z_{xy}\right) \\ & =\hat{\pi}_{xy}\left( \sum_{k=1}^{K}\beta_{k}\phi_{xy}^{k}-u_{x}-v_{y}\right) -\exp\left( \sum_{k=1}^{K}\beta_{k}\phi_{xy}^{k}-u_{x}-v_{y}\right) \end{align*}Summing over $x$ and $y$, the sample log-likelihood is
\begin{align*} \sum_{xy}\hat{\pi}_{xy}\sum_{k=1}^{K}\beta_{k}\phi_{xy}^{k}-\sum_{x}p_{x}u_{x}-\sum_{y}q_{y}v_{y}-\sum_{xy}\exp\left( \sum_{k=1}^{K}\beta_{k}\phi_{xy}^{k}-u_{x}-v_{y}\right) \end{align*}hence we recover the opposite of the objective function above: maximizing this log-likelihood is the same as minimizing our convex objective.
If $\pi_{xy}|xy$ is Poisson, then $\mathbb{E}\left[\pi_{xy}\right]=m_{xy}\left( Z_{xy};\theta\right) =\mathbb{V}ar\left( \pi_{xy}\right) $. While it makes sense to assume the former equality, the latter is a rather strong assumption.
For estimation purposes, $\hat{\theta}$ is obtained by
\begin{align*} \max_{\theta}\sum_{xy}l\left( \hat{\pi}_{xy};\theta\right) =\sum_{xy}\left(\hat{\pi}_{xy}\left( \theta^{\intercal}Z_{xy}\right) -\exp\left(\theta^{\intercal}Z_{xy}\right) \right) \end{align*}however, for inference purposes, one should not assume the Poisson distribution. Instead,
\begin{align*} \sqrt{N}\left( \hat{\theta}-\theta\right) \Longrightarrow\mathcal{N}\left( 0,\left( A_{0}\right) ^{-1}B_{0}\left( A_{0}\right) ^{-1}\right) \end{align*}where $N=\left\vert \mathcal{X}\right\vert \times\left\vert \mathcal{Y}\right\vert $ and $A_{0}$ and $B_{0}$ are estimated by
\begin{align*} \hat{A}_{0} & =-N^{-1}\sum_{xy}D_{\theta\theta}^{2}l\left( \hat{\pi}_{xy};\hat{\theta}\right) =N^{-1}\sum_{xy}\exp\left( \hat{\theta}^{\intercal}Z_{xy}\right) Z_{xy}Z_{xy}^{\intercal}\\ \hat{B}_{0} & =N^{-1}\sum_{xy}\left( \hat{\pi}_{xy}-\exp\left( \hat{\theta}^{\intercal}Z_{xy}\right) \right) ^{2}Z_{xy}Z_{xy}^{\intercal}. \end{align*}
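A minimal sketch of these sandwich standard errors, assuming the stacked regressors `Z` (an $N\times d$ array), the stacked flows `pihat` (length $N$) and the fitted `theta_hat` are available (these names are ours; note that one fixed-effect dummy must be dropped, or a normalization imposed, for $\hat{A}_{0}$ to be invertible):

```python
# Sandwich (robust) standard errors for the pseudo-Poisson estimates.
def sandwich_se(Z, pihat, theta_hat):
    N = Z.shape[0]
    m = np.exp(Z @ theta_hat)                        # fitted Poisson means
    A = (Z * m[:, None]).T @ Z / N                   # A_0-hat: minus the average Hessian
    B = (Z * ((pihat - m) ** 2)[:, None]).T @ Z / N  # B_0-hat: average outer product of scores
    V = np.linalg.inv(A) @ B @ np.linalg.inv(A)      # asymptotic variance matrix
    return np.sqrt(np.diag(V) / N)                   # standard errors of theta-hat
```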
The "structural gravity equation" (Anderson and van Wincoop, 2003), as exposited in the Head and Mayer (2014) handbook chapter, is
\begin{align*} X_{ni}=\underset{S_{i}}{\underbrace{\frac{Y_{i}}{\Omega_{i}}}}\underset{M_{n}}{\underbrace{\frac{X_{n}}{\Psi_{n}}}}\Phi_{ni} \end{align*}where $n$ is the importer, $i$ the exporter, $X_{ni}$ the trade flow from $i$ to $n$, $Y_{i}=\sum_{n}X_{ni}$ the value of $i$'s production, $X_{n}=\sum_{i}X_{ni}$ the importer $n$'s expenditure, and $\Phi_{ni}$ the bilateral accessibility of $n$ to $i$.
$\Omega_{i}$ and $\Psi_{n}$ are the "multilateral resistances", satisfying the set of implicit equations
\begin{align*} \Psi_{n}=\sum_{i}\frac{\Phi_{ni}Y_{i}}{\Omega_{i}}\text{ and }\Omega_{i}=\sum_{n}\frac{\Phi_{ni}X_{n}}{\Psi_{n}} \end{align*}These are exactly the same equations as those of the regularized OT problem.
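Indeed, the resistance equations can be solved by fixed-point iteration, which is IPFP in disguise. Below is a minimal sketch (the function name and arguments are ours; `Phi` is the $n\times i$ accessibility matrix, `X` and `Y` the vectors of expenditures and productions):

```python
# Solve for the multilateral resistances Omega_i and Psi_n by fixed-point iteration.
def resistances(Phi, X, Y, tol=1e-10, maxiter=1000):
    Omega = np.ones(len(Y))
    Psi = np.ones(len(X))
    for _ in range(maxiter):
        Psi_new = Phi @ (Y / Omega)   # Psi_n = sum_i Phi_ni Y_i / Omega_i
        Omega = Phi.T @ (X / Psi_new) # Omega_i = sum_n Phi_ni X_n / Psi_n
        if np.max(np.abs(Psi_new - Psi)) < tol:
            Psi = Psi_new
            break
        Psi = Psi_new
    return Omega, Psi
```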
Parameterize $\Phi_{ni}=\exp\left( \sum_{k=1}^{K}\beta_{k}D_{ni}^{k}\right) $, where the $D_{ni}^{k}$ are $K$ pairwise measures of distance between $n$ and $i$. We have
\begin{align*} X_{ni}=\exp\left( \sum_{k=1}^{K}\beta_{k}D_{ni}^{k}-s_{i}-m_{n}\right) \end{align*}where fixed effects $s_{i}=-\ln S_{i}$ and $m_{n}=-\ln M_{n}$ are adjusted by
\begin{align*} \sum_{n}X_{ni}=Y_{i}\text{ and }\sum_{i}X_{ni}=X_{n}. \end{align*}Standard choices of $D_{ni}^{k}$'s include:
- the logarithm of bilateral distance between $n$ and $i$;
- indicators of contiguous borders, of common official language, and of colonial ties;
- trade policy variables: presence of a regional trade agreement, tariffs;
- possibly many other measures of proximity, e.g. genetic/cultural distance, intensity of communications, etc.
We will solve this model by fixing $\beta$ and solving the inner matching problem using IPFP; then, in an outer loop, we will update $\beta$ by gradient descent so as to match the model moments with the empirical ones.
sigma = 1           # entropic regularization parameter
maxiterIpfp = 1000  # max iterations of the inner IPFP loop
maxiter = 500       # max iterations of the outer descent loop
tolIpfp = 1e-12
tolDescent = 1e-12
t_s = 0.03          # gradient descent step size
iterCount = 0
contIter = True
v_it = np.zeros((nbi, nbt))
beta_k = np.repeat(0, nbk)
thegrad = np.repeat(0, nbk)
pi_nit = []
theval_old = -math.inf
ptm = time.time()
while(contIter):
    # print("Iteration", iterCount)
    for t in range(nbt):
        # print("Year", t)
        D_ij_k = Dnikt[t]
        Phi = D_ij_k.dot(beta_k.reshape(nbk, 1)).reshape(nbi, nbi)
        contIpfp = True
        iterIpfp = 0
        v = v_it[:, t].reshape(1, nbi)
        f = f_nit[t]
        g = g_nit[t]
        K = np.exp(Phi / sigma)
        np.fill_diagonal(K, 0)  # no internal trade
        fK = np.multiply(f, K)
        gK = np.multiply(g, K)
        while(contIpfp):  # inner loop: IPFP for fixed beta
            iterIpfp = iterIpfp + 1
            u = sigma * np.log(np.sum(np.multiply(gK, np.exp((-IX.dot(v)) / sigma)), axis=1)).flatten()
            vnext = sigma * np.log(np.sum(np.multiply(fK, np.exp((-u.T.dot(tIY)) / sigma)), axis=0))
            error = np.max(np.abs(np.sum(np.multiply(gK, np.exp((-IX.dot(vnext) - u.T.dot(tIY)) / sigma)), axis=1) - 1))
            if (error < tolIpfp or iterIpfp >= maxiterIpfp):
                contIpfp = False
            v = vnext
        v_it[:, t] = np.asarray(v)[0]
        fgK = np.multiply(f, gK)
        pi_nit.append(np.multiply(fgK, np.exp((-IX.dot(v) - u.T.dot(tIY)) / sigma)))
        thegrad = thegrad + (pi_nit[t] - pihat_nit[t]).flatten(order='F').dot(D_ij_k)
    beta_k = beta_k - t_s * thegrad  # outer loop: gradient step on beta
    nonzero_pi_nit = np.concatenate(pi_nit).ravel()[np.where(np.concatenate(pi_nit).ravel() > 0, True, False)]
    theval = float(np.sum(np.multiply(thegrad, beta_k), axis=1)) - sigma * float(np.sum(np.multiply(nonzero_pi_nit, np.log(nonzero_pi_nit)), axis=(0, 1)))
    iterCount = iterCount + 1
    if (iterCount > maxiter or np.abs(theval - theval_old) < tolDescent):
        contIter = False
    theval_old = theval
    thegrad = np.repeat(0, nbk)
    pi_nit = []
diff = time.time() - ptm
print('Time elapsed = ', diff, 's.')
Time elapsed = 8.417711734771729 s.
beta_k = beta_k / sdD_k  # undo the standardization of the regressors
print(beta_k)
[[-0.84092368 0.43744866 0.2474767 -0.22249036]]
We recover the PPML estimates reported in Table 1, p. 42 of Yotov et al.'s book.
We can compare the results and the speed of our computation with those of an off-the-shelf Poisson regression routine. As a warning: it gives the same results, but at the cost of a much longer run time, so use at your own risk. A sketch of such a regression follows.
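Here is a minimal sketch using statsmodels (not loaded above, so the package choice, the fixed-effect construction, and the formula are ours), which fits the PPML specification with exporter-year and importer-year fixed effects:

```python
import statsmodels.api as sm
import statsmodels.formula.api as smf

# PPML on pairs of distinct countries, with exporter-year and importer-year fixed effects.
df = tradedata[tradedata['exporter'] != tradedata['importer']].copy()
df['exp_year'] = df['exporter'] + '_' + df['year'].astype(str)
df['imp_year'] = df['importer'] + '_' + df['year'].astype(str)
ppml = smf.glm('trade ~ ln_DIST + CNTG + LANG + CLNY + C(exp_year) + C(imp_year)',
               data=df, family=sm.families.Poisson()).fit()
print(ppml.params.loc[['ln_DIST', 'CNTG', 'LANG', 'CLNY']])
```

The coefficients on ln_DIST, CNTG, LANG and CLNY should be comparable to the rescaled beta_k printed above.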