Important: Please read the installation page for details about how to install the toolboxes. $\newcommand{\dotp}[2]{\langle #1, #2 \rangle}$ $\newcommand{\enscond}[2]{\lbrace #1, #2 \rbrace}$ $\newcommand{\pd}[2]{ \frac{ \partial #1}{\partial #2} }$ $\newcommand{\umin}[1]{\underset{#1}{\min}\;}$ $\newcommand{\umax}[1]{\underset{#1}{\max}\;}$ $\newcommand{\uargmin}[1]{\underset{#1}{argmin}\;}$ $\newcommand{\norm}[1]{\|#1\|}$ $\newcommand{\abs}[1]{\left|#1\right|}$ $\newcommand{\choice}[1]{ \left\{ \begin{array}{l} #1 \end{array} \right. }$ $\newcommand{\pa}[1]{\left(#1\right)}$ $\newcommand{\diag}[1]{{diag}\left( #1 \right)}$ $\newcommand{\qandq}{\quad\text{and}\quad}$ $\newcommand{\qwhereq}{\quad\text{where}\quad}$ $\newcommand{\qifq}{ \quad \text{if} \quad }$ $\newcommand{\qarrq}{ \quad \Longrightarrow \quad }$ $\newcommand{\ZZ}{\mathbb{Z}}$ $\newcommand{\CC}{\mathbb{C}}$ $\newcommand{\RR}{\mathbb{R}}$ $\newcommand{\EE}{\mathbb{E}}$ $\newcommand{\Zz}{\mathcal{Z}}$ $\newcommand{\Ww}{\mathcal{W}}$ $\newcommand{\Vv}{\mathcal{V}}$ $\newcommand{\Nn}{\mathcal{N}}$ $\newcommand{\NN}{\mathcal{N}}$ $\newcommand{\Hh}{\mathcal{H}}$ $\newcommand{\Bb}{\mathcal{B}}$ $\newcommand{\Ee}{\mathcal{E}}$ $\newcommand{\Cc}{\mathcal{C}}$ $\newcommand{\Gg}{\mathcal{G}}$ $\newcommand{\Ss}{\mathcal{S}}$ $\newcommand{\Pp}{\mathcal{P}}$ $\newcommand{\Ff}{\mathcal{F}}$ $\newcommand{\Xx}{\mathcal{X}}$ $\newcommand{\Mm}{\mathcal{M}}$ $\newcommand{\Ii}{\mathcal{I}}$ $\newcommand{\Dd}{\mathcal{D}}$ $\newcommand{\Ll}{\mathcal{L}}$ $\newcommand{\Tt}{\mathcal{T}}$ $\newcommand{\si}{\sigma}$ $\newcommand{\al}{\alpha}$ $\newcommand{\la}{\lambda}$ $\newcommand{\ga}{\gamma}$ $\newcommand{\Ga}{\Gamma}$ $\newcommand{\La}{\Lambda}$ $\newcommand{\Si}{\Sigma}$ $\newcommand{\be}{\beta}$ $\newcommand{\de}{\delta}$ $\newcommand{\De}{\Delta}$ $\newcommand{\phi}{\varphi}$ $\newcommand{\th}{\theta}$ $\newcommand{\om}{\omega}$ $\newcommand{\Om}{\Omega}$ $\newcommand{\eqdef}{\equiv}$
This numerical tour exposes the general methodology of regularizing the optimal transport (OT) linear program using entropy. This allows one to derive fast computational algorithms based on iterative projections according to a Kullback-Leibler divergence. $$ \DeclareMathOperator{\KL}{KL} \newcommand{\KLdiv}[2]{\KL\pa{#1 | #2}} \newcommand{\KLproj}{\text{Proj}^{\tiny\KL}} \renewcommand{\epsilon}{\varepsilon} \def\ones{\mathbb{I}} $$
from __future__ import division
import numpy as np
import matplotlib.pyplot as plt
import scipy as scp
import pylab as pyl
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline
%load_ext autoreload
%autoreload 2
We consider two input histograms $a \in \Si_n$ and $b \in \Si_m$, where we denote by $\Si_n$ the simplex in $\RR^n$ $$ \Si_n \eqdef \enscond{ a \in \RR_+^n }{ \sum_i a_i = 1 }. $$ We consider the following discrete regularized transport problem $$ W_\epsilon(a,b) \eqdef \umin{P \in U(a,b)} \dotp{C}{P} - \epsilon E(P), $$ where the polytope of couplings is defined as $$ U(a,b) \eqdef \enscond{P \in (\RR^+)^{n \times m}}{ P \ones_m = a, P^\top \ones_n = b }, $$ where $\ones_n \eqdef (1,\ldots,1)^\top \in \RR^n $, and for $P \in \RR_+^{n \times m}$, we define its entropy as $$ E(P) \eqdef -\sum_{i,j} P_{i,j} ( \log(P_{i,j}) - 1). $$
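As a quick sanity check of these definitions, the following sketch evaluates the entropy $E(P)$ of a small coupling and verifies that it lies in $U(a,b)$; the histograms `a0`, `b0` and the product coupling $a b^\top$ used here are our toy example, not part of the tour.
# Hypothetical toy histograms; the product coupling a b^T always belongs to U(a,b).
a0 = np.array([.5, .5])
b0 = np.array([.25, .25, .5])
P0 = np.outer(a0, b0)
# Marginal constraints: P 1_m = a and P^T 1_n = b.
print(np.allclose(P0.sum(axis=1), a0), np.allclose(P0.sum(axis=0), b0))
# Entropy E(P) = -sum_ij P_ij (log(P_ij) - 1).
print(-np.sum(P0 * (np.log(P0) - 1)))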
When $\epsilon=0$, one recovers the classical (discrete) optimal transport. We refer to the monograph of Villani for more details about OT. The idea of regularizing transport to allow for faster computation was introduced by Cuturi.
Here the matrix $C \in (\RR^+)^{n \times m} $ defines the ground cost, i.e. $C_{i,j}$ is the cost of moving mass from a bin indexed by $i$ to a bin indexed by $j$.
The regularized transportation problem can be re-written as a projection $$ W_\epsilon(a,b) = \epsilon \umin{P \in U(a,b)} \KLdiv{P}{K} \qwhereq K_{i,j} \eqdef e^{ -\frac{C_{i,j}}{\epsilon} } $$ of the Gibbs kernel $K$ according to the Kullback-Leibler divergence. The Kullback-Leibler divergence between $P, K \in \RR_+^{n \times m}$ is $$ \KLdiv{P}{K} \eqdef \sum_{i,j} P_{i,j} \pa{ \log\pa{ \frac{P_{i,j}}{K_{i,j}} } - 1}. $$
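For concreteness, here is a direct transcription of this divergence (a minimal sketch we add; the function name `KLdiv` is ours), using the usual convention $0 \log 0 = 0$.
# KL(P|K) = sum_ij P_ij (log(P_ij / K_ij) - 1), with the convention 0 log 0 = 0.
def KLdiv(P, K):
    mask = P > 0
    return np.sum(P[mask] * (np.log(P[mask] / K[mask]) - 1))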
This interpretation of regularized transport as a KL projection and its numerical applications are detailed in BenamouEtAl.
Given a convex set $\Cc \subset \RR^N$, the projection according to the Kullback-Leibler divergence is defined as $$ \KLproj_\Cc(\xi) = \uargmin{ \pi \in \Cc } \KLdiv{\pi}{\xi}. $$
Given affine constraint sets $ (\Cc_1,\Cc_2) $, we aim at computing $$ \KLproj_\Cc(K) \qwhereq \Cc = \Cc_1 \cap \Cc_2 $$ (this description can of course be extended to more than 2 sets).
This can be achieved, starting from $P_0=K$, by iterating, for all $\ell \geq 0$, $$ P_{2\ell+1} = \KLproj_{\Cc_1}(P_{2\ell}) \qandq P_{2\ell+2} = \KLproj_{\Cc_2}(P_{2\ell+1}). $$
One can indeed show that $P_\ell \rightarrow \KLproj_\Cc(K)$. We refer to BauschkeLewis for more details about this algorithm and its extension to compute the projection on the intersection of convex sets (Dykstra's algorithm).
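For the two marginal constraint sets of $U(a,b)$, these KL projections have closed forms: projecting onto $\Cc_1 = \enscond{P}{P\ones_m = a}$ rescales the rows, and projecting onto $\Cc_2 = \enscond{P}{P^\top\ones_n = b}$ rescales the columns (see BenamouEtAl). The toy loop below is our illustration of the alternating scheme; the names `proj_rows`, `proj_cols`, `a0`, `b0` are ours.
# KL projection onto {P : P 1_m = a} rescales the rows of P;
# onto {P : P^T 1_n = b}, it rescales the columns.
def proj_rows(P, a):
    return P * (a / P.sum(axis=1))[:, None]
def proj_cols(P, b):
    return P * (b / P.sum(axis=0))[None, :]
# Toy run of the alternating projections, starting from a stand-in kernel.
a0 = np.array([.5, .5]); b0 = np.array([.25, .25, .5])
P = np.exp(-np.random.rand(2, 3))
for _ in range(200):
    P = proj_cols(proj_rows(P, a0), b0)
print(P.sum(axis=1), P.sum(axis=0))  # close to a0 and b0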
A fundamental remark is that the optimality condition of the entropic regularized problem shows that the optimal coupling $P_\epsilon$ necessarily has the form $$P_\epsilon = \diag{u} K \diag{v}$$ where the Gibbs kernel is defined as $$K \eqdef e^{-\frac{C}{\epsilon}}.$$
One thus needs to find two positive scaling vectors $u \in \RR_+^n$ and $v \in \RR_+^m$ such that the following two equalities hold $$P \ones = u \odot (K v) = a \qandq P^\top \ones = v \odot (K^\top u) = b,$$ where $\odot$ denotes entrywise multiplication.
Sinkhorn's algorithm alternates between the resolution of these two equations, and reads $$u \longleftarrow \frac{a}{K v} \qandq v \longleftarrow \frac{b}{K^\top u},$$ where the divisions are entrywise. This algorithm was shown by Sinkhorn to converge to a solution of the entropic regularized problem.
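As a skeleton, the two updates can be wrapped in a loop as follows; the function name `sinkhorn` and the iteration count are our choices, and Exercise 1 below asks you to implement and monitor this in full, so treat this as a sketch only.
def sinkhorn(a, b, K, niter=500):
    # Alternate the two scaling updates; u is recomputed first at each
    # pass, so only v needs an initial value.
    v = np.ones(K.shape[1])
    for _ in range(niter):
        u = a / np.dot(K, v)
        v = b / np.dot(K.T, u)
    return u, v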
We first test the method for two input measures that are uniform measures (i.e. constant histograms) supported on two point clouds (that do not necessarily have the same size).
We thus first load two point clouds $x=(x_i)_{i=1}^{n}$ and $y=(y_j)_{j=1}^{m}$, where $x_i, y_j \in \RR^2$.
Number of points in each cloud, $N=(n,m)$.
N = [300,200]
Dimension of the clouds.
d = 2
Point cloud $x$, of $n$ points inside a square.
x = np.random.rand(d,N[0])-.5
Point cloud $y$, of $m$ points inside an annulus.
theta = 2*np.pi*np.random.rand(1,N[1])
r = .8 + .2*np.random.rand(1,N[1])
y = np.vstack((np.cos(theta)*r,np.sin(theta)*r))
Shortcut for displaying point clouds.
plotp = lambda x,col: plt.scatter(x[0,:], x[1,:], s=200, edgecolors="k", c=col, linewidths=2)
Display of the two clouds.
plt.figure(figsize=(10,10))
plotp(x, 'b')
plotp(y, 'r')
plt.axis("off")
plt.xlim(np.min(y[0,:])-.1,np.max(y[0,:])+.1)
plt.ylim(np.min(y[1,:])-.1,np.max(y[1,:])+.1)
plt.show()
Cost matrix $C_{i,j} = \norm{x_i-y_j}^2$.
x2 = np.sum(x**2,0)
y2 = np.sum(y**2,0)
C = np.tile(y2,(N[0],1)) + np.tile(x2[:,np.newaxis],(1,N[1])) - 2*np.dot(np.transpose(x),y)
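As a sanity check (our addition, not part of the original tour), the same matrix can be obtained from `scipy.spatial.distance.cdist` with the `'sqeuclidean'` metric, which expects points as rows.
from scipy.spatial.distance import cdist
# Squared Euclidean distances between the columns of x and y.
C2 = cdist(x.T, y.T, 'sqeuclidean')
print(np.allclose(C, C2))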
Target histograms $(a,b)$, here uniform histograms.
a = np.ones(N[0])/N[0]
b = np.ones(N[1])/N[1]
Regularization strength $\epsilon>0$.
epsilon = .01
Gibbs Kernel $K$.
K = np.exp(-C/epsilon)
Initialization of $v=\ones_{m}$ ($u$ does not need to be initialized).
v = np.ones(N[1])
One Sinkhorn iteration.
u = a / (np.dot(K,v))
v = b / (np.dot(np.transpose(K),u))
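Since $P = \diag{u} K \diag{v}$, its marginals can be read off from $(u,v)$ without forming $P$: $P\ones = u \odot (Kv)$ and $P^\top\ones = v \odot (K^\top u)$. For instance, after the single iteration above (a check we add here; since $v$ was updated last, the second residual should already be near zero):
# Marginal residuals computed from (u,v) alone, without forming P.
err_a = np.linalg.norm(u*np.dot(K,v) - a, 1)
err_b = np.linalg.norm(v*np.dot(K.T,u) - b, 1)
print(err_a, err_b)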
Exercise 1
Implement Sinkhorn's algorithm. Display the evolution of the constraint satisfaction errors $$ \norm{ P \ones - a }_1 \qandq \norm{ P^\top \ones - b }_1 $$ (you need to think about how to compute these residuals from $(u,v)$ alone). Display the violation of the constraints in a log plot.
run -i nt_solutions/optimaltransp_5_entropic/exo1
Compute the final matrix $P$.
P = np.dot(np.dot(np.diag(u),K),np.diag(v))
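Forming two dense diagonal matrices is wasteful for large $n,m$; an equivalent computation by broadcasting (our variant, same result):
# Same coupling without materializing diag(u) and diag(v).
P = u[:, None] * K * v[None, :]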
Display it.
plt.imshow(P);
Exercise 2
Display the regularized transport solution for various values of $\epsilon$. For a value of $\epsilon$ that is too small, what do you observe?
run -i nt_solutions/optimaltransp_5_entropic/exo2
Compute the obtained optimal $P$.
P = np.dot(np.dot(np.diag(u),K),np.diag(v))
Keep only the largest entries of the coupling matrix, and use them to draw a map between the two clouds. First we draw the "strong" connections, i.e. the links $(i,j)$ corresponding to large values of $P_{i,j}$. We then draw the weaker connections.
plt.figure(figsize=(10,10))
plotp(x, 'b')
plotp(y, 'r')
A = P * (P > np.max(P)*.8)
i,j = np.where(A != 0)
plt.plot([x[0,i],y[0,j]],[x[1,i],y[1,j]],'k',lw = 2)
A = P * (P > np.max(P)*.2)
i,j = np.where(A != 0)
plt.plot([x[0,i],y[0,j]],[x[1,i],y[1,j]],'k:',lw = 1)
plt.axis("off")
plt.xlim(np.min(y[0,:])-.1,np.max(y[0,:])+.1)
plt.ylim(np.min(y[1,:])-.1,np.max(y[1,:])+.1)
plt.show()