#!/usr/bin/env python
# coding: utf-8
# # Block 7b: Characteristics-based models of demand
# ### Alfred Galichon (NYU & Sciences Po)
# ## 'math+econ+code' masterclass on optimal transport and economic applications
# © 2018-2021 by Alfred Galichon. Past and present support from NSF grant DMS-1716489, ERC grant CoG-866274, and contributions by Jules Baudet, Pauline Corblet, Gregory Dannay, and James Nesbit are acknowledged.
#
# #### With python code examples
# ### Learning objectives
#
# * Beyond GEV: the pure characteristics model, the random coefficient logit model, the probit model
#
# * Simulation methods: accept-reject and McFadden's smoothed accept-reject simulator (SARS)
#
# * Demand inversion: The inversion theorem
# ### References
#
# * Galichon (2016). *Optimal Transport Methods in Economics*, Chapter 9.2, Princeton University Press
#
# * McFadden (1981). "Econometric Models of Probabilistic Choice", in C.F. Manski and D. McFadden (eds.), *Structural analysis of discrete data with econometric applications*, MIT Press.
#
# * Berry, Levinsohn, and Pakes (1995). "Automobile Prices in Market Equilibrium," *Econometrica*.
#
# * Train (2009). *Discrete Choice Methods with Simulation*. 2nd Edition. Cambridge University Press.
#
# * Galichon and Salanié (2020). "Cupid's Invisible Hand". Preprint (first version 2011).
#
# * Chiong, Galichon and Shum (2016). "Duality in Dynamic Discrete-Choice Models". *Quantitative Economics*.
#
# * Bonnet, Galichon, O'Hara and Shum (2017). "Yogurts choose consumers? Identification of Random Utility Models via Two-Sided Matching". Working paper.
# ### Libraries
#
# Let's start by loading the libraries we shall need for this course.
# In[1]:
import numpy as np
import scipy.sparse as spr
# !python -m pip install -i https://pypi.gurobi.com gurobipy ## only if Gurobi not here
import gurobipy as grb
# ## Datasets
#
# Now we load the dataset we shall use.
#
# XXX Load BLP's dataset here XXX.
# ## Choice models beyond GEV
# ### The need for further models
#
# The GEV models are convenient analytically, but not very flexible.
#
# The logit model imposes zero correlation across alternatives.
#
# The nested logit allows for nonzero correlation, but in a very rigid way (one needs to define nests).
#
# A good example is the probit model, where $\varepsilon$ is a Gaussian vector. For this model, there is no closed-form expression for either $G$ or $G^{\ast}$.
#
# More recently, a number of modern models do not have closed forms either. These models require simulation methods in order to approximate them by discrete models.
# ### The pure characteristics model
#
# #### Motivation
#
# The pure characteristics model (Berry and Pakes, 2007) can be motivated as follows. Assume $y$ stands for the number of bedrooms. The logit model would assume that the random utility associated with a 2-BR unit is uncorrelated with that of a 3-BR unit, which is not realistic.
#
# If $\xi_{y}$ is the typical size of a unit with $y$ bedrooms, one may introduce $\epsilon$ as the valuation of size, in which case the utility shock associated with $y$ should be $\varepsilon_{y}=\epsilon\xi_{y}$. More generally, the characteristics vector $\xi_{y}$ is a $d$-dimensional (deterministic) vector, and $\epsilon\sim\mathbf{P}_{\epsilon}$ is a (random) vector of the same dimension standing for the valuations of the respective characteristics, so that
#
# \begin{align*}
# \varepsilon_{y}=\epsilon^{\intercal}\xi_{y}.
# \end{align*}
#
# For example, if each alternative $y$ stands for a model of car, the first component of $\xi_{y}$ may be the price of car $y$; the other components may be other characteristics such as number of seats, fuel efficiency, size, etc. In that case, for a given dimension $k\in\left\{ 1,\ldots,d\right\}$, $\epsilon_{k}$ is the (random) valuation of this dimension by the consumer with taste vector $\epsilon$.
# #### Definition
#
# Assume without loss of generality that $\varepsilon_{0}=0$, that is $\xi_{0}=0$, as we can always reduce the setting to this case by replacing $\xi_{y}$ by $\xi_{y}-\xi_{0}$.
#
# Letting $Z$ be the $\left\vert \mathcal{Y}\right\vert \times d$ matrix with $\left( y,k\right) $-entry $\xi_{y}^{k}$, this rewrites as
#
# \begin{align*}
# \varepsilon = Z\epsilon.
# \end{align*}
#
# Hence, we have
#
# \begin{align*}
# G\left( U\right) =\mathbb{E}\left[ \max\left\{ U+Z\epsilon,0\right\}\right].
# \end{align*}
#
# and
#
# \begin{align*}
# \sigma_{y}\left( U\right) =\Pr\left( U_{y}-U_{z}\geq\left( Z\epsilon\right)_{y}-\left( Z\epsilon\right)_{z},\quad\forall z\in\mathcal{Y}_{0}\backslash\left\{ y\right\} \right).
# \end{align*}
#
# #### In dimension 1
#
# When $d=1$ (scalar characteristics), one has $\sigma_{y}\left(U\right) =\Pr\left( U_{y}-U_{z}\geq\left( \xi_{y}-\xi_{z}\right)\epsilon~\forall z\in\mathcal{Y}_{0}\backslash\left\{ y\right\} \right) $, and thus
#
# \begin{align*}
# \sigma_{y}\left( U\right) =\Pr\left( \max_{z:\xi_{y}>\xi_{z}}\left\{\frac{U_{y}-U_{z}}{\xi_{y}-\xi_{z}}\right\} \leq\epsilon\leq\min_{z:\xi_{y}<\xi_{z}}\left\{ \frac{U_{y}-U_{z}}{\xi_{y}-\xi_{z}}\right\} \right)
# \end{align*}
#
# with the understanding that $\max_{z\in\emptyset}f_{z}=-\infty$ and
# $\min_{z\in\emptyset}f_{z}=+\infty$.
#
# Therefore, letting $\mathbf{F}_{\epsilon}$ be the c.d.f. associated with the distribution of $\epsilon$, one has a closed-form expression for $\sigma_{y}$:
#
# \begin{align*}
# \sigma_{y}\left( U\right) =\mathbf{F}_{\epsilon}\left( \left[ \max_{z:\xi_{y}>\xi_{z}}\left\{ \frac{U_{y}-U_{z}}{\xi_{y}-\xi_{z}}\right\},\min_{z:\xi_{y}<\xi_{z}}\left\{ \frac{U_{y}-U_{z}}{\xi_{y}-\xi_{z}}\right\}\right] \right)
# \end{align*}
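# The closed-form expression above is easy to code. The sketch below uses illustrative values of $U$ and $\xi$ of our own choosing (not from the lecture), and assumes $\epsilon\sim\mathcal{N}(0,1)$ so that $\mathbf{F}_{\epsilon}$ is the standard normal c.d.f.:

```python
import numpy as np
from scipy.stats import norm

# illustrative example: scalar characteristics (d = 1), distinct xi values
xi = np.array([0.0, 1.0, 2.0, 3.0])   # characteristics, with xi_0 = 0
U = np.array([0.0, 0.4, 0.5, 0.3])    # systematic utilities, with U_0 = 0

def sigma_pure_char_1d(U, xi, F=norm.cdf):
    """Choice probabilities in the d = 1 pure characteristics model."""
    s = np.zeros(len(U))
    for y in range(len(U)):
        # slopes (U_y - U_z)/(xi_y - xi_z) over z with smaller/larger characteristic
        below = [(U[y] - U[z]) / (xi[y] - xi[z]) for z in range(len(U)) if xi[y] > xi[z]]
        above = [(U[y] - U[z]) / (xi[y] - xi[z]) for z in range(len(U)) if xi[y] < xi[z]]
        lo = max(below) if below else -np.inf   # max over empty set = -infinity
        hi = min(above) if above else np.inf    # min over empty set = +infinity
        s[y] = max(F(hi) - F(lo), 0.0)          # empty interval gets zero mass
    return s

s = sigma_pure_char_1d(U, xi)
```

The nonempty intervals $[l_y,u_y]$ partition the real line, so the probabilities sum to one; alternatives whose line never attains the upper envelope get probability zero.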
# ### The probit model
#
# When $\mathbf{P}_{\epsilon}$ is the $\mathcal{N}\left( 0,S\right) $ distribution, the pure characteristics model is called a probit model; in this case,
#
# \begin{align*}
# \varepsilon\sim\mathcal{N}\left( 0,\Sigma\right) \text{ where }\Sigma=ZSZ^{\intercal}.
# \end{align*}
#
# Note that the distribution of $\varepsilon$ will not have full support unless $d\geq\left\vert \mathcal{Y}\right\vert $ and $Z$ has full rank.
#
# Computing $\sigma$ in the probit model thus requires computing the mass assigned by a Gaussian distribution to rectangles of the type
#
# \begin{align*}
# \prod_{y}\left[ l_{y},u_{y}\right] .
# \end{align*}
#
# When $\Sigma$ is diagonal (random utility terms are i.i.d. across alternatives), this is numerically easy. However, this is computationally difficult in general (more on this later).
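# In the diagonal case, the rectangle mass factors into a product of one-dimensional Gaussian c.d.f. differences. A minimal sketch (the function name and the numbers are ours, for illustration only):

```python
import numpy as np
from scipy.stats import norm

def rect_mass_diag(l, u, variances):
    """Mass of the rectangle [l, u] under N(0, Sigma) with diagonal Sigma:
    independence makes it a product of 1D normal c.d.f. differences."""
    sd = np.sqrt(variances)
    return np.prod(norm.cdf(u / sd) - norm.cdf(l / sd))

# illustrative bounds; infinite bounds encode half-lines
l = np.array([-1.0, -np.inf, 0.0])
u = np.array([1.0, 0.0, np.inf])
mass = rect_mass_diag(l, u, np.ones(3))
```

In the general (correlated) case no such factorization exists, which is why the probit demand map is computationally hard.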
# ### The random coefficient logit model
#
# The random coefficient logit model (Berry, Levinsohn and Pakes, 1995) may be viewed as an interpolant between the pure characteristics model and the logit model. In this case,
#
# \begin{align*}
# \varepsilon=\left( 1-\lambda\right) Z\epsilon+\lambda\eta
# \end{align*}
#
# where $\epsilon\sim\mathbf{P}_{\epsilon}$, $\eta$ is a vector of i.i.d. EV1 random variables independent of the previous term, and $\lambda\in\left[ 0,1\right] $ is an interpolation parameter ($\lambda=1$ yields the logit model, and $\lambda=0$ the pure characteristics model).
#
# In this case, one may compute the Emax operator as
#
# \begin{align*}
# G\left( U\right) & =\mathbb{E}\left[ \max_{y\in\mathcal{Y}_{0}}\left\{U_{y}+\left( 1-\lambda\right) \left( Z\epsilon\right) _{y}+\lambda\eta_{y}\right\} \right] \\
# & =\mathbb{E}\left[ \mathbb{E}\left[ \max_{y\in\mathcal{Y}_{0}}\left\{U_{y}+\left( 1-\lambda\right) \left( Z\epsilon\right) _{y}+\lambda\eta_{y}\right\} |\epsilon\right] \right] \\
# & =\mathbb{E}\left[ \lambda\log\sum_{y\in\mathcal{Y}_{0}}\exp\left(
# \frac{U_{y}+\left( 1-\lambda\right) \left( Z\epsilon\right) _{y}}{\lambda}\right) \right]
# \end{align*}
#
# Recall
#
# \begin{align*}
# G\left( U\right) =\mathbb{E}\left[ \lambda\log\sum_{y\in\mathcal{Y}_{0}}\exp\left( \frac{U_{y}+\left( 1-\lambda\right) \left( Z\epsilon\right) _{y}}{\lambda}\right) \right] .
# \end{align*}
#
# The demand map in the random coefficients logit model is obtained by differentiating the Emax operator, i.e.
#
# \begin{align*}
# \sigma_{y}\left( U\right) =\mathbb{E}\left[ \frac{\exp\left( \frac{U_{y}+\left( 1-\lambda\right) \left( Z\epsilon\right) _{y}}{\lambda}\right) }{\sum_{y^{\prime}\in\mathcal{Y}_{0}}\exp\left( \frac{U_{y^{\prime}}+\left( 1-\lambda\right) \left( Z\epsilon\right) _{y^{\prime}}}{\lambda}\right) }\right] .
# \end{align*}
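# The outer expectation is easy to approximate by Monte Carlo: draw $\epsilon$, compute the logit probabilities given $\epsilon$, and average. A sketch under illustrative values of $\lambda$, $Z$ and $U$ of our own choosing, assuming $\mathbf{P}_{\epsilon}=\mathcal{N}(0,I_{d})$:

```python
import numpy as np

rng = np.random.default_rng(777)
nbY, d, nbDraws, lam = 4, 2, 10_000, 0.5
Z = rng.normal(size=(nbY, d))        # characteristics matrix, (y, k)-entry xi_y^k
U = np.array([0.4, 0.2, 0.1, 0.0])   # systematic utilities (last = outside option)

eps = rng.normal(size=(nbDraws, d))  # draws of the taste vector epsilon
V = (U + (1 - lam) * eps @ Z.T) / lam  # (i, y): scaled utilities given epsilon^i
V -= V.max(axis=1, keepdims=True)      # stabilize the softmax numerically
P = np.exp(V)
P /= P.sum(axis=1, keepdims=True)      # logit choice probabilities given epsilon^i
sigma = P.mean(axis=0)                 # average over the draws of epsilon
```

Each row of `P` sums to one, so the simulated choice probabilities sum to one as well.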
# ## Simulation methods
#
# In a number of cases, one cannot compute the choice probabilities $\sigma\left( U\right)$ using a closed-form expression. In this case, we need to resort to simulation to compute $G$, $G^{\ast}$, $\sigma$ and $\sigma^{-1}$.
#
# The idea is that:
#
# * One is able to compute $G$ and $G^{\ast}$ for discrete distributions (more on this later)
#
# * The sampled versions of $G$, $G^{\ast}$, $\sigma$ and $\sigma^{-1}$ converge to their population counterparts when the sample size is large.
# ### Accept-reject simulator
#
# One simulates $N$ points $\varepsilon^{i}\sim\mathbf{P}$. The Emax operator associated with the empirical distribution $\mathbf{P}_{N}$ of the sample is
#
# \begin{align*}
# G_{N}\left( U\right) =N^{-1}\sum_{i=1}^{N}\max_{y\in\mathcal{Y}_{0}}\left\{ U_{y}+\varepsilon_{y}^{i}\right\}
# \end{align*}
#
# and the demand map is given by
#
# \begin{align*}
# \sigma_{N,y}\left( U\right) =N^{-1}\sum_{i=1}^{N}1\left\{ U_{y} + \varepsilon_{y}^{i}\geq U_{z}+\varepsilon_{z}^{i}, \quad \forall z\in\mathcal{Y}
# _{0}\right\}
# \end{align*}
#
# In the literature, $\sigma_{N}$ is called the *accept-reject simulator*.
# **Example**. We shall code the AR simulator for the probit model and generate choice probabilities. Take a vector of systematic utilities:
# In[2]:
seed = 777
nbDraws = 1000
U_y = np.array([0.4, 0.5, 0.2, 0.3, 0.1, 0])
nbY = len(U_y)
# We shall specify a probit model where alternatives have correlation $\rho = .5$.
# In[3]:
rho = 0.5
Covar = rho * np.ones((nbY, nbY)) + (1 - rho) * np.eye(nbY)
V, Q = np.linalg.eigh(Covar)  # eigenvalues V and orthonormal eigenvectors Q of Covar
SqrtCovar = Q.dot(np.diag(np.sqrt(V))).dot(Q.T)  # symmetric square root of Covar
# Now generate the $\varepsilon_{iy}$'s:
# In[4]:
np.random.seed(seed)  # use the seed defined above for reproducibility
epsilon_iy = np.random.normal(0, 1, (nbDraws, nbY)).dot(SqrtCovar)
u_iy = epsilon_iy + U_y
ui = np.max(u_iy, axis=1)
s_y = np.sum((u_iy.T - ui).T == 0, axis=0) / nbDraws
# ### McFadden's SARS
#
# McFadden's smoothed accept-reject simulator (SARS) consists in sampling $\varepsilon^{1},\ldots,\varepsilon^{N}\sim\mathbf{P}$ and replacing the max by the smooth-max:
#
# \begin{align*}
# \sigma_{N,T,y}\left( U\right) =\sum_{i=1}^{N}\frac{1}{N}\frac{\exp\left((U_{y}+\varepsilon_{y}^{i})/T\right) }{\sum_{z}\exp\left( (U_{z}+\varepsilon_{z}^{i})/T\right)}
# \end{align*}
#
# One seeks $U$ so that the induced choice probabilities are $s$, that is
#
# \begin{align*}
# s_{y}=\sum_{i=1}^{N}\frac{1}{N}\frac{\exp\left( (U_{y}+\varepsilon_{y}^{i})/T\right) }{\sum_{z}\exp\left( (U_{z}+\varepsilon_{z}^{i})/T\right)}.
# \end{align*}
#
# The associated Emax operator is
#
# \begin{align*}
# G_{N,T}\left( U\right) =\mathbb{E}_{\mathbf{P}_{N}}\left[ G_{\operatorname{logit}}\left( U+\varepsilon\right) \right]
# \end{align*}
#
# so the underlying random utility structure is a random coefficient logit.
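# A minimal sketch of the SARS formula, with illustrative values of our own choosing; as $T\to0$ the smoothed probabilities approach the (hard) accept-reject simulator:

```python
import numpy as np

rng = np.random.default_rng(777)
nbDraws, nbY, T = 5_000, 4, 0.1
U = np.array([0.4, 0.2, 0.1, 0.0])     # systematic utilities
eps = rng.normal(size=(nbDraws, nbY))  # draws epsilon^i from P

# smoothed accept-reject: softmax at temperature T instead of the argmax indicator
V = (U + eps) / T
V -= V.max(axis=1, keepdims=True)      # numerical stabilization of the softmax
S = np.exp(V)
sigma_NT = (S / S.sum(axis=1, keepdims=True)).mean(axis=0)

# hard accept-reject simulator for comparison: frequency of each argmax
sigma_AR = np.bincount(np.argmax(U + eps, axis=1), minlength=nbY) / nbDraws
```

Unlike the AR simulator, `sigma_NT` is smooth in `U`, which is what makes it usable inside gradient-based estimation routines.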
# ## The inversion theorem
# The following theorem, first shown in Galichon and Salanié (2011), shows that the inversion of discrete choice models is an optimal transport problem.
#
# ---
# **Theorem** [Galichon and Salanié]
#
# Consider a solution $\left(u\left( \varepsilon\right),v_{y}\right) $ to the dual Monge-Kantorovich problem with cost $\Phi\left(\varepsilon,y\right) =\varepsilon_{y}$, that is:
#
#
# \begin{align*}
# \min_{u,v} & \int u\left( \varepsilon\right) d\mathbf{P}\left(
# \varepsilon\right) +\sum_{y\in\mathcal{Y}_{0}}v_{y}s_{y}\\
# s.t.~ & u\left( \varepsilon\right) +v_{y}\geq\Phi\left( \varepsilon
# ,y\right)
# \end{align*}
#
# Then:
#
# * $U=\sigma^{-1}\left( s\right) $ is given by $U_{y}=v_{0}-v_{y}$.
#
# * The value of the MK dual above is $-G^{\ast}\left( s\right) $.
#
# **Proof**
#
# $\sigma^{-1}\left( s\right) =\arg\max_{U:U_{0}=0}\left\{ \sum_{y\in\mathcal{Y}}s_{y}U_{y}-G(U)\right\} $, thus, letting $v=-U$, $v$ is the solution to
#
# \begin{align*}
# \min_{v:v_{0}=0}\left\{ \sum_{y\in\mathcal{Y}_{0}}s_{y}v_{y}+G(-v)\right\}
# \end{align*}
# which is exactly the MK dual above.
#
# ### Inversion of the pure characteristics model
#
# It follows from the inversion theorem that the problem of demand inversion in the pure characteristics model is a semi-discrete transport problem, a point made in Bonnet, Galichon, O'Hara, and Shum (2017).
#
# Indeed, the correspondence is:
#
# * an alternative $y$ is a fountain
#
# * the characteristics $\xi_{y}$ of an alternative are the location of fountain $y$
#
# * the systematic utility associated with alternative $y$ is minus the price of fountain $y$
#
# * the market share of alternative $y$ coincides with the capacity of fountain $y$
#
# * the random vector $\epsilon$ is the location of an inhabitant
# **Example**. Let us invert the probit model simulated above.
# In[11]:
A1 = spr.kron(np.ones((1, nbY)), spr.identity(nbDraws))
A2 = spr.kron(spr.identity(nbY), np.ones((1, nbDraws)))
A = spr.vstack([A1, A2])
obj = epsilon_iy.flatten(order='F')  # column-major: entries ordered by alternative y, then draw i, matching A
rhs = np.ones(nbDraws)/nbDraws
rhs = np.append(rhs, s_y)
m = grb.Model('optimal')
x = m.addMVar(len(obj), name='couple')
m.setObjective(obj @ x, grb.GRB.MAXIMIZE)
m.addConstr(A @ x == rhs, name="Constr")
m.optimize()
if m.Status == grb.GRB.Status.OPTIMAL:
    pi = m.getAttr('pi')  # dual variables of the two families of mass constraints
    Uhat_y = np.subtract(pi[nbY + nbDraws - 1], pi[nbDraws:nbY + nbDraws])
    print('U_y (true and recovered)')
    print(U_y)
    print(Uhat_y)
# ### McFadden's SARS and regularized Optimal Transport
#
# Cf. Bonnet, Galichon, O'Hara and Shum (2017). Let $u_{i}=T\log\sum_{z}\exp\left((U_{z}+\varepsilon_{z}^{i})/T\right) $. One has
#
# \begin{align*}
# \left\{
# \begin{array}
# [c]{l}
# s_{y}=\sum_{i=1}^{N}\frac{1}{N}\exp\left( (U_{y}-u_{i}+\varepsilon_{y}^{i})/T\right) \\
# \frac{1}{N}=\sum_{y}\frac{1}{N}\exp\left( (U_{y}-u_{i}+\varepsilon_{y}^{i})/T\right)
# \end{array}
# \right. .
# \end{align*}
#
# As a result, $\left( u_{i},U_{y}\right) $ are the solution of the regularized OT problem
#
# \begin{align*}
# \min_{u,U}\sum_{i=1}^{N}\frac{1}{N}u_{i}-\sum_{y}s_{y}U_{y}+T\sum_{i,y}\frac{1}{N}\exp\left( (U_{y}-u_{i}+\varepsilon_{y}^{i})/T\right) .
# \end{align*}
# ### BLP's contraction mapping
#
# Consider the IPFP algorithm for solving the latter problem:
#
# \begin{align*}
# \left\{
# \begin{array}
# [c]{l}
# \exp\left( u_{i}^{k+1}/T\right) =\sum_{z}\exp\left( (U_{z}^{k}+\varepsilon_{z}^{i})/T\right) \\
# \exp\left( U_{y}^{k+1}/T\right) =\frac{Ns_{y}}{\sum_{i=1}^{N}\exp\left( (-u_{i}^{k+1}+\varepsilon_{y}^{i})/T\right) }
# \end{array}
# \right.
# \end{align*}
#
# This rewrites as
#
# \begin{align*}
# \exp\left( U_{y}^{k+1}/T\right) & =\frac{Ns_{y}}{\sum_{i=1}^{N}\frac{\exp\left( \varepsilon_{y}^{i}/T\right) }{\sum_{z}\exp\left( (U_{z}^{k}+\varepsilon_{z}^{i})/T\right) }},\text{ i.e.}\\
# U_{y}^{k+1} & =T\log s_{y}-T\log\sum_{i=1}^{N}\frac{1}{N}\frac{\exp\left( \varepsilon_{y}^{i}/T\right) }{\sum_{z}\exp\left( (U_{z}^{k}+\varepsilon_{z}^{i})/T\right) }
# \end{align*}
#
# which is exactly the contraction mapping algorithm of Berry, Levinsohn and Pakes (1995, appendix 1).
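# The iteration above can be sketched in a few lines. We generate market shares from a known $U$ on simulated draws (illustrative values of our own choosing, with the outside option as the last alternative, normalized to $U_{0}=0$), then run the contraction to recover $U$:

```python
import numpy as np

rng = np.random.default_rng(777)
nbDraws, nbY, T = 2_000, 4, 1.0
eps = rng.normal(size=(nbDraws, nbY))  # draws epsilon^i

def sigma_NT(U, eps, T):
    """Smoothed accept-reject (SARS) choice probabilities."""
    V = (U + eps) / T
    V -= V.max(axis=1, keepdims=True)
    S = np.exp(V)
    return (S / S.sum(axis=1, keepdims=True)).mean(axis=0)

# target market shares generated from a known U, then inverted back
U_true = np.array([0.5, 0.3, 0.1, 0.0])
s = sigma_NT(U_true, eps, T)

U = np.zeros(nbY)
for _ in range(1000):
    # sum_z exp((U_z^k + eps_z^i)/T), one value per draw i
    inner = np.exp((U[None, :] + eps) / T).sum(axis=1)
    # U_y^{k+1} = T log s_y - T log (1/N) sum_i exp(eps_y^i/T) / inner_i
    U_new = T * np.log(s) - T * np.log((np.exp(eps / T) / inner[:, None]).mean(axis=0))
    U_new -= U_new[-1]  # normalize the outside option (last alternative) to zero
    if np.abs(U_new - U).max() < 1e-12:
        U = U_new
        break
    U = U_new
```

Since the shares were generated with the same draws, the fixed point recovers `U_true` exactly (up to the normalization of the outside option).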