# Block 7b: Characteristics-based models of demand

### Alfred Galichon (NYU & Sciences Po)

## 'math+econ+code' masterclass on optimal transport and economic applications

© 2018-2021 by Alfred Galichon. Past and present support from NSF grant DMS-1716489, ERC grant CoG-866274, and contributions by Jules Baudet, Pauline Corblet, Gregory Dannay, and James Nesbit are acknowledged.

#### With Python code examples
### Learning objectives

* Beyond GEV: the pure characteristics model, the random coefficient logit model, the probit model
* Simulation methods: accept-reject and the smoothed accept-reject simulator (SARS)
* Demand inversion: the inversion theorem

### References

* Galichon (2016). *Optimal Transport Methods in Economics*, Chapter 9.2. Princeton University Press.
* McFadden (1981). "Econometric Models of Probabilistic Choice", in C.F. Manski and D. McFadden (eds.), *Structural Analysis of Discrete Data with Econometric Applications*. MIT Press.
* Berry, Levinsohn, and Pakes (1995). "Automobile Prices in Market Equilibrium". *Econometrica*.
* Train (2009). *Discrete Choice Methods with Simulation*, 2nd edition. Cambridge University Press.
* Galichon and Salanié (2020). "Cupid's Invisible Hand". Preprint (first version 2011).
* Chiong, Galichon and Shum (2016). "Duality in Discrete Choice Models". *Quantitative Economics*.
* Bonnet, Galichon, O'Hara and Shum (2017). "Yogurts Choose Consumers? Identification of Random Utility Models via Two-Sided Matching". Working paper.

### Libraries

Let's start by loading the libraries we shall need for this course.

```python
import numpy as np
import scipy.sparse as spr

# !python -m pip install -i https://pypi.gurobi.com gurobipy  ## only if Gurobi not here
import gurobipy as grb
```

## Datasets

Now we load the dataset we shall use.

XXX Load BLP's dataset here XXX.

## Choice models beyond GEV

## The need for further models

The GEV models are analytically convenient, but not very flexible:

* the logit model imposes zero correlation across alternatives;
* the nested logit allows for nonzero correlation, but in a very rigid way (one needs to define nests).

A good example is the probit model, where $\varepsilon$ is a Gaussian vector. For this model, there is no closed-form expression for either $G$ or $G^*$. More recently, a number of modern models do not have closed forms either.
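For instance, evaluating a single probit choice probability already means computing the mass a multivariate Gaussian assigns to a rectangle, for which only numerical quadrature is available. A minimal sketch (the utilities, the equicorrelated covariance, and the helper `sigma_probit` are all illustrative choices of ours, not part of the course code):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical example: 3 alternatives, epsilon ~ N(0, Sigma), equicorrelated
U = np.array([0.4, 0.5, 0.0])
rho = 0.5
nbY = len(U)
Sigma = rho * np.ones((nbY, nbY)) + (1 - rho) * np.eye(nbY)

def sigma_probit(y, U, Sigma):
    # sigma_y(U) = Pr(eps_z - eps_y <= U_y - U_z for all z != y):
    # a Gaussian rectangle probability, evaluated by numerical quadrature
    others = [z for z in range(len(U)) if z != y]
    D = np.eye(len(U))[others] - np.eye(len(U))[y]   # rows are e_z - e_y
    Omega = D @ Sigma @ D.T                          # covariance of (eps_z - eps_y)_z
    upper = U[y] - U[others]
    return multivariate_normal(mean=np.zeros(len(others)), cov=Omega).cdf(upper)

s = np.array([sigma_probit(y, U, Sigma) for y in range(nbY)])
print(s, s.sum())  # shares sum to 1 up to quadrature error
```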
These models require simulation methods in order to approximate them by discrete models.

## The pure characteristics model

#### Motivation

The pure characteristics model (Berry and Pakes, 2007) can be motivated as follows. Assume $y$ stands for the number of bedrooms. The logit model would assume that the random utility associated with a 2-BR is uncorrelated with that of a 3-BR, which is not realistic.

Letting $\xi_{y}$ be the typical size of a bedroom of size $y$, one may introduce $\epsilon$ as the valuation of size, in which case the utility shock associated with $y$ should be $\varepsilon_{y}=\epsilon\xi_{y}$. More generally, the characteristics $\xi_{y}$ form a $d$-dimensional (deterministic) vector, and $\epsilon\sim\mathbf{P}_{\epsilon}$ is a (random) vector of the same size standing for the valuations of the respective dimensions, so that

\begin{align*}
\varepsilon_{y}=\epsilon^{\intercal}\xi_{y}.
\end{align*}

For example, if each alternative $y$ stands for a model of car, the first component of $\xi_{y}$ may be the price of car $y$; the other components may be other characteristics such as number of seats, fuel efficiency, size, etc. In that case, for a given dimension $k\in\left\{1,\dots,d\right\}$, $\epsilon_{k}$ is the (random) valuation of this dimension by the consumer with taste vector $\epsilon$.

#### Definition

Assume without loss of generality that $\varepsilon_{0}=0$, that is, $\xi_{0}=0$; we can always reduce the setting to this case by replacing $\xi_{y}$ by $\xi_{y}-\xi_{0}$.

Letting $Z$ be the $\left\vert \mathcal{Y}\right\vert \times d$ matrix with $\left( y,k\right)$-term $\xi_{y}^{k}$, this rewrites as

\begin{align*}
\varepsilon = Z\epsilon.
\end{align*}

Hence, we have

\begin{align*}
G\left( U\right) =\mathbb{E}\left[ \max\left\{ U+Z\epsilon,0\right\}\right].
\end{align*}

and

\begin{align*}
\sigma_{y}\left( U\right) =\Pr\left( U_{y}-U_{z}\geq\left( Z\epsilon\right)_{y}-\left( Z\epsilon\right)_{z}, \quad \forall z\in\mathcal{Y}_{0}\backslash\left\{ y\right\} \right).
\end{align*}

#### In dimension 1

When $d=1$ (scalar characteristic), one has $\sigma_{y}\left(U\right) =\Pr\left( U_{y}-U_{z}\geq\left( \xi_{y}-\xi_{z}\right)\epsilon~\forall z\in\mathcal{Y}_{0}\backslash\left\{ y\right\} \right)$, and thus

\begin{align*}
\sigma_{y}\left( U\right) =\Pr\left( \max_{z:\xi_{y}>\xi_{z}}\left\{\frac{U_{y}-U_{z}}{\xi_{y}-\xi_{z}}\right\} \leq\epsilon\leq\min_{z:\xi_{y}<\xi_{z}}\left\{ \frac{U_{y}-U_{z}}{\xi_{y}-\xi_{z}}\right\} \right)
\end{align*}

with the understanding that $\max_{z\in\emptyset}f_{z}=-\infty$ and $\min_{z\in\emptyset}f_{z}=+\infty$.

Therefore, letting $\mathbf{F}_{\epsilon}$ be the c.d.f. associated with the distribution of $\epsilon$, one has a closed-form expression for $\sigma_{y}$:

\begin{align*}
\sigma_{y}\left( U\right) =\mathbf{F}_{\epsilon}\left( \left[ \max_{z:\xi_{y}>\xi_{z}}\left\{ \frac{U_{y}-U_{z}}{\xi_{y}-\xi_{z}}\right\},\min_{z:\xi_{y}<\xi_{z}}\left\{ \frac{U_{y}-U_{z}}{\xi_{y}-\xi_{z}}\right\}\right] \right)
\end{align*}

### The probit model

When $\mathbf{P}_{\epsilon}$ is the $\mathcal{N}\left( 0,S\right)$ distribution, the pure characteristics model is called a probit model; in this case,

\begin{align*}
\varepsilon\sim\mathcal{N}\left( 0,\Sigma\right) \text{ where }\Sigma=ZSZ^{\intercal}.
\end{align*}

Note that the distribution of $\varepsilon$ will not have full support unless $d\geq\left\vert \mathcal{Y}\right\vert$ and $Z$ is of full rank.

Computing $\sigma$ in the probit model thus implies computing the mass assigned by the Gaussian distribution to rectangles of the type

\begin{align*}
\prod_{y}\left[ l_{y},u_{y}\right].
\end{align*}

When $\Sigma$ is diagonal (random utility terms are i.i.d.
across alternatives), this is numerically easy. However, it is computationally difficult in general (more on this later).

### The random coefficient logit model

The random coefficient logit model (Berry, Levinsohn and Pakes, 1995) may be viewed as an interpolant between the pure characteristics model and the logit model. In this case,

\begin{align*}
\varepsilon=\left( 1-\lambda\right) Z\epsilon+\lambda\eta
\end{align*}

where $\epsilon\sim\mathbf{P}_{\epsilon}$, $\eta$ is a vector of i.i.d. EV1 random variables independent from the previous term, and $\lambda$ is an interpolation parameter ($\lambda=1$ gives the logit model, and $\lambda=0$ the pure characteristics model).

In this case, one may compute the Emax operator as

\begin{align*}
G\left( U\right) & =\mathbb{E}\left[ \max_{y\in\mathcal{Y}_{0}}\left\{U_{y}+\left( 1-\lambda\right) \left( Z\epsilon\right)_{y}+\lambda\eta_{y}\right\} \right] \\
& =\mathbb{E}\left[ \mathbb{E}\left[ \max_{y\in\mathcal{Y}_{0}}\left\{U_{y}+\left( 1-\lambda\right) \left( Z\epsilon\right)_{y}+\lambda\eta_{y}\right\} |\epsilon\right] \right] \\
& =\mathbb{E}\left[ \lambda\log\sum_{y\in\mathcal{Y}_{0}}\exp\left(\frac{U_{y}+\left( 1-\lambda\right) \left( Z\epsilon\right)_{y}}{\lambda}\right) \right].
\end{align*}

The demand map in the random coefficient logit model is obtained by differentiating this expression of the Emax, i.e.

\begin{align*}
\sigma_{y}\left( U\right) =\mathbb{E}\left[ \frac{\exp\left( \frac{U_{y}+\left( 1-\lambda\right) \left( Z\epsilon\right)_{y}}{\lambda}\right) }{\sum_{y^{\prime}\in\mathcal{Y}_{0}}\exp\left( \frac{U_{y^{\prime}}+\left( 1-\lambda\right) \left( Z\epsilon\right)_{y^{\prime}}}{\lambda}\right) }\right].
\end{align*}

## Simulation methods

In a number of cases, one cannot compute the choice probabilities $\sigma\left( U\right)$ using a closed-form expression. In this case, we need to resort to simulation to compute $G$, $G^{\ast}$, $\sigma$ and $\sigma^{-1}$.

The idea is that:

* one is able to compute $G$ and $G^{\ast}$ for discrete distributions (more on this later);
* the sampled versions of $G$, $G^{\ast}$, $\sigma$ and $\sigma^{-1}$ converge to the population objects when the sample size is large.

### Accept-reject simulator

One simulates $N$ points $\varepsilon^{i}\sim \mathbf{P}$. The Emax operator associated with the empirical sample distribution $\mathbf{P}_{N}$ is

\begin{align*}
G_{N}\left( U\right) =N^{-1}\sum_{i=1}^{N}\max_{y\in\mathcal{Y}_{0}}\left\{ U_{y} + \varepsilon_{y}^{i}\right\}
\end{align*}

and the demand map is given by

\begin{align*}
\sigma_{N,y}\left( U\right) =N^{-1}\sum_{i=1}^{N}1\left\{ U_{y} + \varepsilon_{y}^{i}\geq U_{z}+\varepsilon_{z}^{i}, \quad \forall z\in\mathcal{Y}_{0}\right\}.
\end{align*}

In the literature, $\sigma_{N}$ is called the *accept-reject simulator*.

**Example**. We shall code the AR simulator for the probit model and generate choice probabilities. Take a vector of systematic utilities:

```python
seed = 777
nbDraws = 1000
U_y = np.array([0.4, 0.5, 0.2, 0.3, 0.1, 0])
nbY = len(U_y)
```

We shall specify a probit model where alternatives have correlation $\rho = .5$.
```python
rho = 0.5
Covar = rho * np.ones((nbY, nbY)) + (1 - rho) * np.eye(nbY)
V, Q = np.linalg.eigh(Covar)  # eigenvalues V, eigenvectors Q
SqrtCovar = Q.dot(np.diag(np.sqrt(V))).dot(Q.T)
```

Now generate the $\varepsilon_{iy}$'s:

```python
np.random.seed(seed)  # use the seed set above for reproducibility
epsilon_iy = np.random.normal(0, 1, (nbDraws, nbY)).dot(SqrtCovar)
u_iy = epsilon_iy + U_y
ui = np.max(u_iy, axis=1)
s_y = np.sum((u_iy.T - ui).T == 0, axis=0) / nbDraws
```

### McFadden's SARS

McFadden's smoothed accept-reject simulator (SARS) consists in sampling $\varepsilon^{1},...,\varepsilon^{N}\sim \mathbf{P}$ and replacing the max by the smoothed max:

\begin{align*}
\sigma_{N,T,y}\left( U\right) =\sum_{i=1}^{N}\frac{1}{N}\frac{\exp\left((U_{y}+\varepsilon_{y}^{i})/T\right) }{\sum_{z}\exp\left( (U_{z}+\varepsilon_{z}^{i})/T\right)}.
\end{align*}

One seeks $U$ such that the induced choice probabilities are $s$, that is,

\begin{align*}
s_{y}=\sum_{i=1}^{N}\frac{1}{N}\frac{\exp\left( (U_{y}+\varepsilon_{y}^{i})/T\right) }{\sum_{z}\exp\left( (U_{z}+\varepsilon_{z}^{i})/T\right)}.
\end{align*}

The associated Emax operator is

\begin{align*}
G_{N,T}\left( U\right) =\mathbb{E}_{\mathbf{P}_{N}}\left[G_{\operatorname{logit}}\left( U+\varepsilon\right) \right]
\end{align*}

so the underlying random utility structure is a random coefficient logit.

## The inversion theorem

The following theorem, first shown in Galichon and Salanié (2011), shows that the inversion of discrete choice models is an optimal transport problem.
---
**Theorem** (Galichon and Salanié)

Consider a solution $\left(u\left( \varepsilon\right),v_{y}\right)$ to the dual Monge-Kantorovich problem with cost $\Phi\left(\varepsilon,y\right) =\varepsilon_{y}$, that is:

\begin{align*}
\min_{u,v}~ & \int u\left( \varepsilon\right) d\mathbf{P}\left(\varepsilon\right) +\sum_{y\in\mathcal{Y}_{0}}v_{y}s_{y}\\
s.t.~ & u\left( \varepsilon\right) +v_{y}\geq\Phi\left( \varepsilon,y\right)
\end{align*}

Then:

* $U=\sigma^{-1}\left( s\right)$ is given by $U_{y}=v_{0}-v_{y}$;
* the value of the [MK dual](#MKDualDC) is $-G^{\ast}\left( s\right)$.

**Proof**

$\sigma^{-1}\left( s\right) =\arg\max_{U:U_{0}=0}\left\{ \sum_{y\in\mathcal{Y}}s_{y}U_{y}-G(U)\right\}$, thus, letting $v=-U$, $v$ is the solution to

\begin{align*}
\min_{v:v_{0}=0}\left\{ \sum_{y\in\mathcal{Y}_{0}}s_{y}v_{y}+G(-v)\right\}
\end{align*}

which is exactly the [MK dual](#MKDualDC).

### Inversion of the pure characteristics model

It follows from the inversion theorem that the problem of demand inversion in the pure characteristics model is a semi-discrete transport problem, a point made in Bonnet, Galichon, O'Hara, and Shum (2017). Indeed, the correspondence is:

* an alternative $y$ is a fountain;
* the characteristics of an alternative are a fountain location;
* the systematic utility associated with alternative $y$ is minus the price of fountain $y$;
* the market share of alternative $y$ coincides with the capacity of fountain $y$;
* the random vector $\epsilon$ is the location of an inhabitant.

**Example**. As an example, we invert the probit model above.
```python
A1 = spr.kron(np.ones((1, nbY)), spr.identity(nbDraws))
A2 = spr.kron(spr.identity(nbY), np.ones((1, nbDraws)))
A = spr.vstack([A1, A2])
obj = epsilon_iy.flatten(order='F')  # change this
rhs = np.ones(nbDraws) / nbDraws
rhs = np.append(rhs, s_y)
m = grb.Model('optimal')
x = m.addMVar(len(obj), name='couple')
m.setObjective(obj @ x, grb.GRB.MAXIMIZE)
m.addConstr(A @ x == rhs, name="Constr")
m.optimize()
if m.Status == grb.GRB.Status.OPTIMAL:
    pi = m.getAttr('pi')  # dual variables: u_i first, then v_y
    Uhat_y = np.subtract(pi[nbY + nbDraws - 1], pi[nbDraws:nbY + nbDraws])  # U_y = v_0 - v_y
    print('U_y (true and recovered)')
    print(U_y)
    print(Uhat_y)
```

### McFadden's SARS and regularized Optimal Transport

Cf. Bonnet, Galichon, O'Hara and Shum (2017). Let $u_{i}=T\log\sum_{z}\exp\left((U_{z}+\varepsilon_{z}^{i})/T\right)$. One has

\begin{align*}
\left\{
\begin{array}{l}
s_{y}=\sum_{i=1}^{N}\frac{1}{N}\exp\left( (U_{y}-u_{i}+\varepsilon_{y}^{i})/T\right) \\
\frac{1}{N}=\sum_{y}\frac{1}{N}\exp\left( (U_{y}-u_{i}+\varepsilon_{y}^{i})/T\right)
\end{array}
\right. .
\end{align*}

As a result, $\left( u_{i},U_{y}\right)$ are the solution of the regularized OT problem

\begin{align*}
\min_{u,U}\sum_{i=1}^{N}\frac{1}{N}u_{i}-\sum_{y} s_{y}U_{y}+T\sum_{i,y}\frac{1}{N}\exp\left( (U_{y}-u_{i}+\varepsilon_{y}^{i})/T\right).
\end{align*}

### BLP's contraction mapping

Consider the IPFP algorithm for solving the latter problem:

\begin{align*}
\left\{
\begin{array}{l}
\exp\left( u_{i}^{k+1}/T\right) =\sum_{z}\exp\left( (U_{z}^{k}+\varepsilon_{z}^{i})/T\right) \\
\exp\left( U_{y}^{k+1}/T\right) =\frac{Ns_{y}}{\sum_{i=1}^{N}\exp\left( (-u_{i}^{k+1}+\varepsilon_{y}^{i})/T\right) }
\end{array}
\right.
\end{align*}

This rewrites as

\begin{align*}
\exp\left( U_{y}^{k+1}/T\right) & =\frac{Ns_{y}}{\sum_{i=1}^{N}\frac{\exp\left(\varepsilon_{y}^{i}/T\right) }{\sum_{z}\exp\left( (U_{z}^{k}+\varepsilon_{z}^{i})/T\right) }},\text{ i.e.}\\
U_{y}^{k+1} & =T\log s_{y}-T\log\sum_{i=1}^{N}\frac{1}{N}\frac{\exp\left(\varepsilon_{y}^{i}/T\right) }{\sum_{z}\exp\left( (U_{z}^{k}+\varepsilon_{z}^{i})/T\right) }
\end{align*}

which is exactly the contraction mapping algorithm of Berry, Levinsohn and Pakes (1995, appendix 1).
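Since $\sigma_{N,T,y}\left( U\right) =\exp\left( U_{y}/T\right) \cdot\frac{1}{N}\sum_{i=1}^{N}\frac{\exp\left( \varepsilon_{y}^{i}/T\right)}{\sum_{z}\exp\left( (U_{z}+\varepsilon_{z}^{i})/T\right) }$, the update above can equivalently be written $U^{k+1}=U^{k}+T\left( \log s-\log\sigma_{N,T}\left( U^{k}\right) \right)$. A minimal self-contained sketch of this fixed-point iteration (the i.i.d. Gaussian draws, temperature $T$, utilities, and tolerances are our own illustrative choices, not BLP's specification): we first simulate smoothed shares at known utilities, then recover those utilities by iterating the contraction.

```python
import numpy as np

rng = np.random.default_rng(42)
nbDraws, nbY, T = 5000, 4, 0.3            # illustrative values
eps = rng.normal(size=(nbDraws, nbY))     # draws of varepsilon^i (i.i.d. Gaussian here)
U_true = np.array([0.0, 0.4, 0.5, 0.2])   # normalization U_0 = 0 in slot 0

def sigma_smooth(U, eps, T):
    # smoothed accept-reject simulator: average of per-draw softmax probabilities
    V = (U + eps) / T
    V = V - V.max(axis=1, keepdims=True)  # numerical stabilization
    P = np.exp(V)
    return (P / P.sum(axis=1, keepdims=True)).mean(axis=0)

s = sigma_smooth(U_true, eps, T)          # target market shares

# BLP-style contraction: U <- U + T * (log s - log sigma(U)),
# renormalizing U_0 = 0 after each step
U = np.zeros(nbY)
for k in range(1000):
    U_new = U + T * (np.log(s) - np.log(sigma_smooth(U, eps, T)))
    U_new = U_new - U_new[0]
    if np.max(np.abs(U_new - U)) < 1e-12:
        U = U_new
        break
    U = U_new

print('U_y (true and recovered)')
print(U_true)
print(U)
```

Because the shares are generated and inverted with the same draws and the same $T$, the iteration recovers `U_true` up to the stopping tolerance.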