Foundations of Computational Economics #19

by Fedor Iskhakov, ANU

Measuring the volume of illegal trade with linear programming

https://youtu.be/4z7MU73cx0M

Description: Application of the optimal transport problem.

Application: measuring illegal trade

“Black Market Performance: Illegal Trade in Beijing License Plates”

by Øystein Daljord, Mandy Hu, Guillaume Pouliot, Junji Xiao

From abstract:

We estimate the incentives to trade in the black market for license plates that emerged following the recent rationing of new car sales in Beijing by lottery. Under weak assumptions on car preferences, we use optimal transport methods and comprehensive data on car sales to estimate that at least 12% of the quota is illegally traded.


Øystein Daljord (1979-2020)

Black market of license plates

Measure the size of the black market for license plates
Case of Beijing license plate regulation
Allocation by random lottery should have no effect on the distribution of car sales
In reality, there is a sizable shift in the distribution of cars sold
The optimal transport method is well suited to computing a lower bound on the volume of illegal trade in license plates

Beijing license plate lottery

Cars driving in Beijing are required to have Beijing license plates
From Jan 2011 license plates are rationed to a quota of about 35% of the previous year’s sales
License plates are allocated by a lottery with simple application
A Beijing household needs a license plate before it can register a new car
License plates are non-transferable

Material shift in distribution of cars

From cheaper to more expensive car models
Hard to explain if lottery is a truly random allocation of license plates to the car purchasers
No similar shifts in sales in comparable cities without rationing policy, in the same time period
No supply side responses to the rationing policy

Modeling framework

Let $ \mathbb{P}_0 $ be the distribution of car sales prices from pre-lottery time, and $ \mathbb{P}_1 $ the analogous distribution post-lottery.

Under assumptions

Pricing policy did not change between 2010 and 2011
Demand structure did not change between 2010 and 2011
Lottery is uniform

the sales distributions should not change from the pre-lottery to the post-lottery period, i.e. $ \mathbb{P}_0 = \mathbb{P}_1 $
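The implication of a uniform lottery can be illustrated with a small simulation. This is a minimal sketch on synthetic lognormal prices, not the Beijing data: the price distribution, the 35% quota applied to a fixed pool of buyers, and the price-proportional weights of the non-uniform allocation are all assumptions made for illustration. It uses the two-sample Kolmogorov-Smirnov statistic from `scipy.stats.ks_2samp` as one possible measure of the distance between the two sales distributions.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Hypothetical pre-lottery car prices (synthetic, NOT the Beijing data)
pre = rng.lognormal(mean=5.0, sigma=0.6, size=20_000)
quota = int(0.35 * pre.size)  # plates available under the quota

# Uniform lottery: every buyer has the same chance of winning a plate
winners_uniform = rng.choice(pre, size=quota, replace=False)

# Assumed non-uniform allocation: chance of getting a plate rises with car price
weights = pre / pre.sum()
winners_biased = rng.choice(pre, size=quota, replace=False, p=weights)

# KS statistic: largest gap between the two empirical CDFs
ks_uniform = ks_2samp(pre, winners_uniform).statistic
ks_biased = ks_2samp(pre, winners_biased).statistic
print('KS distance, uniform lottery:        %1.4f' % ks_uniform)
print('KS distance, non-uniform allocation: %1.4f' % ks_biased)
```

Under the uniform lottery the distance should be close to zero, while the price-dependent allocation shifts the distribution towards more expensive cars.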

Data

Data on manufacturer suggested retail prices (MSRP) of the registered vehicles.

In [1]:

import pandas as pd
dt = pd.read_stata('_static/data/beijin_data.dta')
dt.dropna(inplace=True)  # drop rows with nan
print('Data has %d observations and %d variables' % dt.shape)  # dt.shape is already a (rows, columns) tuple
print(dt.head(n=10))

Data has 243677 observations and 3 variables
   year  month        MSRP
0  2010      9  153.313139
1  2010      9   44.543519
2  2011      2   88.812069
3  2010     11  210.732564
4  2011      4  101.591900
5  2010     12   56.428979
6  2011      1  140.571004
7  2010      8  170.066283
8  2011     11  111.935614
9  2010      7  191.988099

In [2]:

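print(dt['MSRP'].describe())
q99 = dt['MSRP'].quantile(0.99)  # 99th percentile of MSRP
dt = dt[dt['MSRP']<q99].copy()  # drop the top 1% of prices; copy so that new columns can be added later
print(dt['MSRP'].describe())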

count    243677.000000
mean        166.374743
std         114.990191
min          19.728006
25%          91.780464
50%         134.771217
75%         209.744757
max        1130.669488
Name: MSRP, dtype: float64
count    241240.000000
mean        161.169630
std         102.551081
min          19.728006
25%          91.229416
50%         134.283479
75%         207.808600
max         617.902636
Name: MSRP, dtype: float64

In [3]:

import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = [12, 8]

def plot2hist(d1,d2,bins=10,labels=('1','2')):
    '''Plots two overlapping histograms'''
    plt.hist(d1,bins=bins,density=True,histtype='step',label=labels[0])
    plt.hist(d2,bins=bins,density=True,histtype='step',label=labels[1])
    plt.legend()
    plt.show()

In [4]:

dt10 = dt[dt['year']==2010]['MSRP']
dt11 = dt[dt['year']==2011]['MSRP']
plot2hist(dt10,dt11,labels=['2010','2011'])

Optimal transport problem

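Let $ x_{ij} $ be the mass moved from origin $ i $ to destination $ j $ (rows are origins and columns are destinations, as in the code that follows).

$$ \min \sum_{i=1}^{m}\sum_{j=1}^{n} cost_{ij} x_{ij}, \text{ subject to} $$

$$ \sum_{j=1}^{n} x_{ij} = origin_i, \; i \in \{1,\dots,m\}, $$

$$ \sum_{i=1}^{m} x_{ij} = destination_j, \; j \in \{1,\dots,n\}, $$

$$ x_{ij} \ge 0 \text{ for all } i,j $$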

A linear programming problem

Linear programming problem solved by scipy.optimize.linprog, which minimizes the objective and accepts equality constraints directly

$$ \min(c \cdot x) \text{ subject to } $$

$$ \begin{array}{l} A_{ub}x \le b_{ub} \\ A_{eq}x = b_{eq} \\ l \le x \le u \end{array} $$

stack all $ x_{ij} $ into a single vector
express the equality constraints for origins and destinations as the rows of $ A_{eq} $ and $ b_{eq} $

https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.linprog.html
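The two steps above can be sketched on a toy problem. The sizes, masses and costs below are hypothetical numbers chosen for illustration; rows of the plan index origins and columns index destinations, as in the notebook's own code. The sketch relies on row-major stacking of $ x_{ij} $ and on Kronecker products to build the constraint matrix, and it uses the default solver of `linprog`.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy problem: 2 origins, 3 destinations (total mass 1 on each side)
origins = np.array([0.6, 0.4])
destinations = np.array([0.5, 0.3, 0.2])
costs = np.array([[1.0, 2.0, 3.0],
                  [4.0, 1.0, 2.0]])
m, n = costs.shape

# Stack x_ij row by row: x = (x_11, x_12, x_13, x_21, x_22, x_23)
c = costs.reshape(m * n)

# Row i of A_rows sums x_ij over j (mass leaving origin i)
A_rows = np.kron(np.eye(m), np.ones((1, n)))
# Row j of A_cols sums x_ij over i (mass arriving at destination j)
A_cols = np.kron(np.ones((1, m)), np.eye(n))
A_eq = np.vstack((A_rows, A_cols))
b_eq = np.concatenate((origins, destinations))

# The last constraint is implied by the others because total masses are equal
res = linprog(c=c, A_eq=A_eq[:-1], b_eq=b_eq[:-1], bounds=(0, None))
X = res.x.reshape((m, n))  # back to matrix form
print(res.message)
print(X)
print('Minimal transport cost: %1.4f' % res.fun)
```

The optimal plan in this toy problem is not unique, so only the minimal cost and the marginal sums of `X` are pinned down.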

In [5]:

# Code up the model

In [6]:

#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
# Answer below

In [7]:

N = 5  # number of bins to represent distribution
dt['gr'] = pd.qcut(dt.MSRP,q=N,labels=False)  # N quantiles
gr10 = dt[dt.year==2010].groupby('gr')
gr11 = dt[dt.year==2011].groupby('gr')
d10 = gr10.MSRP.count()/dt[dt.year==2010].MSRP.count()
d11 = gr11.MSRP.count()/dt[dt.year==2011].MSRP.count()
print(d10,d11,sep='\n\n')

gr
0    0.231451
1    0.211331
2    0.205582
3    0.189273
4    0.162363
Name: MSRP, dtype: float64

gr
0    0.123699
1    0.172511
2    0.186457
3    0.226024
4    0.291310
Name: MSRP, dtype: float64

In [8]:

import numpy as np
# Set up transportation problem
costs = np.ones((N,N)) - np.eye(N)  # costs matrix
origins = np.array(d10)        # origins
destinations = np.array(d11)   # destinations
plt.rcParams['figure.figsize'] = [5, 5]
plt.spy(costs)


In [9]:

# convert to linear programming problem
C = costs.reshape(N*N)
A1 = np.kron(np.eye(N),np.ones((1,N)))  # sums of x for each origin
A2 = np.kron(np.ones((1,N)),np.eye(N))  # sums of x for each destination
A = np.vstack((A1,A2))  # concatenate vertically
plt.spy(A)
b = np.concatenate((origins,destinations))

In [10]:

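# Solve the transportation problem
from scipy.optimize import linprog
# Drop the last constraint: it is redundant because origins and destinations have equal total mass.
# method='highs' replaces 'simplex', which was removed in SciPy 1.11;
# the optimal plan X need not be unique, but the optimal value is.
res = linprog(c=C,A_eq=A[:-1],b_eq=b[:-1],bounds=(0,None),method='highs')
print(res.message)
X = res.x.reshape((N,N)) # reshape back to X_ij
plt.spy(X)
print(X)
black_market_estim = 1 - np.diag(X).sum() # do not count the stationary diagonal
print('With N=%d the lower bound on black market share is %1.5f'%(N,black_market_estim))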

Optimization terminated successfully.
[[0.12369875 0.         0.         0.         0.10775178]
 [0.         0.17251076 0.         0.01762504 0.02119497]
 [0.         0.         0.18645705 0.01912521 0.        ]
 [0.         0.         0.         0.18927336 0.        ]
 [0.         0.         0.         0.         0.16236309]]
With N=5 the lower bound on black market share is 0.16570
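The printed estimate can be cross-checked without a solver. Because every move off the diagonal costs 1 and staying in place costs 0, an optimal plan keeps as much mass as possible in each bin, which is at most $ \min(origin_i, destination_i) $. The sketch below hard-codes the bin shares from the printed output above (rounded to six digits), so it should agree with the solver only up to that rounding.

```python
import numpy as np

# Bin shares copied from the printed output for N=5 (rounded to 6 digits)
d10 = np.array([0.231451, 0.211331, 0.205582, 0.189273, 0.162363])
d11 = np.array([0.123699, 0.172511, 0.186457, 0.226024, 0.291310])

# Largest mass that can stay in its own bin
mass_staying = np.minimum(d10, d11).sum()
lower_bound = 1 - mass_staying

# Equivalent form: half the sum of absolute differences (total variation distance)
tv_distance = 0.5 * np.abs(d10 - d11).sum()

print('Closed-form lower bound:  %1.5f' % lower_bound)
print('Total variation distance: %1.5f' % tv_distance)
```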

Displacement of the car sales distributions

Main result: evidence of a sizable black market share (lower bound of about 16.6% with $ N=5 $)
Computed the lower bound on the fraction of illegal trade (why?)
Grain of salt: illegal trade is only one of several possible mechanisms; other explanations need to be ruled out (see the paper)
What robustness checks should be run? The technical parameter $ N $ clearly affects the numerical result
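One possible robustness check is to recompute the bound for several values of $ N $. The sketch below is an illustration on synthetic lognormal prices, not the Beijing data, and the helper `black_market_lower_bound` is a hypothetical name. It bins both samples on pooled quantiles with NumPy (in place of `pd.qcut`) and uses the fact that with 0-1 costs the optimal transport cost equals one minus the sum of bin-wise minima of the two distributions.

```python
import numpy as np

def black_market_lower_bound(pre, post, n_bins):
    '''Lower bound on the displaced share with 0-1 costs and pooled quantile bins'''
    pooled = np.concatenate((pre, post))
    edges = np.quantile(pooled, np.linspace(0, 1, n_bins + 1))  # bin edges
    p = np.histogram(pre, bins=edges)[0] / pre.size    # pre-lottery bin shares
    q = np.histogram(post, bins=edges)[0] / post.size  # post-lottery bin shares
    return 1 - np.minimum(p, q).sum()

rng = np.random.default_rng(1)
# Synthetic prices (assumption): post-lottery sales shifted towards expensive cars
pre = rng.lognormal(mean=5.0, sigma=0.6, size=50_000)
post = rng.lognormal(mean=5.2, sigma=0.6, size=20_000)

bounds = {n: black_market_lower_bound(pre, post, n) for n in (5, 10, 20, 40)}
for n, bound in bounds.items():
    print('N=%2d: lower bound on displaced share %1.5f' % (n, bound))
```

Finer bins can only detect more displacement when the partitions are nested, so the estimate tends to grow with $ N $; with very fine bins part of that growth reflects sampling noise rather than genuine displacement.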