29-197002, Graduate School of Economics, D1
奥村恭平
Below, we derive the choice probability of the logit model with utility $u_{ij} = \delta_j + \epsilon_{ij}$, where the $\epsilon_{ij}$ are i.i.d. Extreme Value Type I: $$ \Pr(d_i = j) = \frac{\exp(\delta_j)}{\sum_{k \in \mathcal{J}} \exp(\delta_k)} $$
Let $F(x) := \exp(-\exp(-x))$ be the cdf of an Extreme Value Type I random variable, and let $f$ be its pdf: $f(x) := F'(x) = \mathrm e^{-x} \cdot \exp(-\exp(-x))$.
\begin{align*} \Pr(d_i = j) &= \Pr(\forall k \neq j; \ u_{ij} \geq u_{ik}) \\ &= \Pr(\forall k \neq j; \ \epsilon_{ij} + \delta_j - \delta_k \geq \epsilon_{ik}) \\ &= \int_{-\infty}^\infty \prod_{k \neq j} \Pr(\epsilon_{ij} + \delta_j - \delta_k \geq \epsilon_{ik} \mid \epsilon_{ij}) \, f(\epsilon_{ij}) \, \mathrm d \epsilon_{ij} \quad (\because \ \text{the } \epsilon_{ik} \text{ are indep., conditional on } \epsilon_{ij}) \\ &= \int_{-\infty}^\infty \prod_{k \neq j} F(\epsilon_{ij} + \delta_j - \delta_k) \, f(\epsilon_{ij}) \, \mathrm d \epsilon_{ij} \\ &= \int_{-\infty}^\infty \exp\left( -\sum_{k \neq j} \exp(-\epsilon_{ij} - \delta_j + \delta_k) \right) \mathrm e^{-\epsilon_{ij}} \cdot \exp(-\exp(-\epsilon_{ij})) \, \mathrm d \epsilon_{ij} \\ &= \int_{-\infty}^\infty \exp\left( -\sum_{k} \exp(-x - \delta_j + \delta_k) \right) \mathrm e^{-x} \, \mathrm d x \quad (x := \epsilon_{ij}; \ \text{the } k = j \ \text{term equals} \ \exp(-\mathrm e^{-x})) \\ &= \int_{-\infty}^\infty \exp\left( - \mathrm e^{-x} \sum_{k} \exp(\delta_k - \delta_j) \right) \mathrm e^{-x} \, \mathrm d x \\ &= \int_{0}^\infty \exp\left( -t \sum_{k} \exp(\delta_k - \delta_j) \right) \mathrm d t \quad (t := \mathrm e^{-x}, \ \mathrm d t = -\mathrm e^{-x} \, \mathrm d x) \\ &= \left[- \left(\sum_{k} \exp(\delta_k - \delta_j)\right)^{-1} \exp\left( -t \sum_{k} \exp(\delta_k - \delta_j) \right) \right]_{0}^{\infty} \\ &= \left(\sum_{k} \exp(\delta_k - \delta_j)\right)^{-1} \\ &= \frac{\exp(\delta_j)}{\sum_k \exp (\delta_k)} \end{align*}With $\delta_j = X_{ij} \beta$ for the inside goods and the outside option's deterministic utility normalized to zero, the log-likelihood is: $$ \sum_i \sum_j y_{ij} \cdot \log \left( \frac{\exp(X_{ij} \beta)}{1 + \sum_k \exp(X_{ik} \beta)} \right) $$
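As a quick sanity check of the closed form (not part of the assignment code), we can draw i.i.d. Extreme Value Type I shocks for an arbitrary illustrative $\delta$, take the utility-maximizing alternative, and compare the simulated choice frequencies to the softmax formula:

```python
# Monte Carlo check of Pr(d_i = j) = exp(delta_j) / sum_k exp(delta_k).
# delta below is an arbitrary illustrative vector, not estimated from data.
import numpy as np

rng = np.random.default_rng(0)
delta = np.array([0.5, -0.2, 1.0])
n_sim = 200_000

# Extreme Value Type I (Gumbel) shocks, one per (simulation, alternative)
eps = rng.gumbel(size=(n_sim, delta.size))
choices = np.argmax(delta + eps, axis=1)        # utility-maximizing choice
freq = np.bincount(choices, minlength=delta.size) / n_sim

softmax = np.exp(delta) / np.exp(delta).sum()   # closed-form probabilities
print(freq, softmax)                            # the two should be close
```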
Since scipy.optimize.minimize minimizes rather than maximizes, we define a loss function equal to the log-likelihood multiplied by $-1$ (the negative log-likelihood).
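A minimal illustration of this sign flip on a toy model (simulated Bernoulli data, unrelated to the problem-set data): the MLE of a Bernoulli success probability is the sample mean, and minimizing the negative log-likelihood recovers it.

```python
# Toy example: maximize a Bernoulli log-likelihood by minimizing its negative.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
draws = rng.binomial(1, 0.7, size=1000)  # simulated 0/1 outcomes

def neg_loglik(p):
    # negative log-likelihood of i.i.d. Bernoulli(p) data
    p = p[0]
    return -(draws * np.log(p) + (1 - draws) * np.log(1 - p)).sum()

res = minimize(neg_loglik, x0=[0.5], bounds=[(1e-6, 1 - 1e-6)])
print(res.x[0], draws.mean())  # the minimizer matches the sample mean
```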
The estimated value is $\beta^* \approx (-1.88673942, 0.09717683, -1.02683967)$.
Below is the code I used:
# import
import pandas as pd
import numpy as np
from scipy.optimize import minimize
# read csv
df = pd.read_csv('../data/DataPS201901.csv', header=None)
data = df.values
# pre-processing
x = [{'hp': 1.0, 'fe': 5.0}, {'hp': 1.2, 'fe': 3.5}, {'hp': 1.4, 'fe': 2.0}]
y = data[:,0]
age = data[:, 1]
gender = data[:, 2]
N = data.shape[0]  # the number of agents
J = len(x)  # the number of inside goods (equals data.shape[1] here, but only by coincidence)
age = age[:, np.newaxis]/100 # rescaling
gender = gender[:, np.newaxis]
X = []
for j in range(J):
    temp = np.hstack([np.ones((N, 1)), age * x[j]['hp'], gender * x[j]['fe']])
    X.append(temp)
# loss function (negative log-likelihood)
## use numpy to speed up
def loss(beta):
    # exp_Xij_beta[i, j] = exp(X_ij @ beta)
    exp_Xij_beta = np.hstack([np.exp(X[j] @ beta)[:, np.newaxis] for j in range(J)])
    exp_sum = np.sum(exp_Xij_beta, axis=1) + 1
    # z[i, j] is the argument of the log for agent i and option j
    z0 = np.ones((N, 1)) / exp_sum[:, np.newaxis]  # outside option
    z1 = exp_Xij_beta / exp_sum[:, np.newaxis]
    z = np.hstack([z0, z1])
    w = np.log(z)
    # Y[i, j] = 1 iff agent i chooses option j
    Y = np.zeros((N, J + 1))
    Y[np.arange(N), y.astype(int)] = 1
    return -(Y * w).sum()
res = minimize(loss, x0=[1, 1, 1])
beta = res.x
beta[1] = beta[1] / 100 # rescaling
print('The estimated beta is {}'.format(beta))
The estimated beta is [-1.88673942 0.09717683 -1.02683967]
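As an aside, the loss above can also be written fully vectorized in a numerically safer log-sum-exp form. The following is a sketch, assuming X has been stacked into an (N, J, p) array (e.g. X_stack = np.stack(X, axis=1)) and y holds integer choices in {0, ..., J} with 0 the outside option; the function name neg_loglik_vec is my own.

```python
# Vectorized negative log-likelihood for the multinomial logit with an
# outside option whose deterministic utility is normalized to zero.
import numpy as np
from scipy.special import logsumexp

def neg_loglik_vec(beta, X_stack, y):
    v = X_stack @ beta                                       # (N, J): X_ij @ beta
    v_all = np.hstack([np.zeros((v.shape[0], 1)), v])        # prepend outside option (utility 0)
    log_denom = logsumexp(v_all, axis=1)                     # log(1 + sum_k exp(X_ik @ beta))
    return -(v_all[np.arange(len(y)), y] - log_denom).sum()  # negative log-likelihood
```

Working in log space avoids forming the probabilities explicitly, so large utilities do not overflow in np.exp.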