#!/usr/bin/env python
# coding: utf-8

# Entropic Regularization of Optimal Transport
# ============================================
#
# *Important:* Please read the [installation page](http://gpeyre.github.io/numerical-tours/installation_matlab/) for details about how to install the toolboxes.
# $\newcommand{\dotp}[2]{\langle #1, #2 \rangle}$
# $\newcommand{\enscond}[2]{\lbrace #1, #2 \rbrace}$
# $\newcommand{\pd}[2]{ \frac{ \partial #1}{\partial #2} }$
# $\newcommand{\umin}[1]{\underset{#1}{\min}\;}$
# $\newcommand{\umax}[1]{\underset{#1}{\max}\;}$
# $\newcommand{\uargmin}[1]{\underset{#1}{argmin}\;}$
# $\newcommand{\norm}[1]{\|#1\|}$
# $\newcommand{\abs}[1]{\left|#1\right|}$
# $\newcommand{\choice}[1]{ \left\{ \begin{array}{l} #1 \end{array} \right. }$
# $\newcommand{\pa}[1]{\left(#1\right)}$
# $\newcommand{\diag}[1]{{diag}\left( #1 \right)}$
# $\newcommand{\qandq}{\quad\text{and}\quad}$
# $\newcommand{\qwhereq}{\quad\text{where}\quad}$
# $\newcommand{\qifq}{ \quad \text{if} \quad }$
# $\newcommand{\qarrq}{ \quad \Longrightarrow \quad }$
# $\newcommand{\ZZ}{\mathbb{Z}}$
# $\newcommand{\CC}{\mathbb{C}}$
# $\newcommand{\RR}{\mathbb{R}}$
# $\newcommand{\EE}{\mathbb{E}}$
# $\newcommand{\Zz}{\mathcal{Z}}$
# $\newcommand{\Ww}{\mathcal{W}}$
# $\newcommand{\Vv}{\mathcal{V}}$
# $\newcommand{\Nn}{\mathcal{N}}$
# $\newcommand{\NN}{\mathcal{N}}$
# $\newcommand{\Hh}{\mathcal{H}}$
# $\newcommand{\Bb}{\mathcal{B}}$
# $\newcommand{\Ee}{\mathcal{E}}$
# $\newcommand{\Cc}{\mathcal{C}}$
# $\newcommand{\Gg}{\mathcal{G}}$
# $\newcommand{\Ss}{\mathcal{S}}$
# $\newcommand{\Pp}{\mathcal{P}}$
# $\newcommand{\Ff}{\mathcal{F}}$
# $\newcommand{\Xx}{\mathcal{X}}$
# $\newcommand{\Mm}{\mathcal{M}}$
# $\newcommand{\Ii}{\mathcal{I}}$
# $\newcommand{\Dd}{\mathcal{D}}$
# $\newcommand{\Ll}{\mathcal{L}}$
# $\newcommand{\Tt}{\mathcal{T}}$
# $\newcommand{\si}{\sigma}$
# $\newcommand{\al}{\alpha}$
# $\newcommand{\la}{\lambda}$
# $\newcommand{\ga}{\gamma}$
# $\newcommand{\Ga}{\Gamma}$
# $\newcommand{\La}{\Lambda}$
# $\newcommand{\Si}{\Sigma}$
# $\newcommand{\be}{\beta}$
# $\newcommand{\de}{\delta}$
# $\newcommand{\De}{\Delta}$
# $\newcommand{\phi}{\varphi}$
# $\newcommand{\th}{\theta}$
# $\newcommand{\om}{\omega}$
# $\newcommand{\Om}{\Omega}$
# $\newcommand{\eqdef}{\equiv}$

# This numerical tour exposes the general methodology of regularizing the
# optimal transport (OT) linear program using entropy. This makes it possible to
# derive fast computational algorithms based on iterative projections
# according to a Kullback-Leibler divergence.
# $$ \DeclareMathOperator{\KL}{KL}
# \newcommand{\KLdiv}[2]{\KL\pa{#1 | #2}}
# \newcommand{\KLproj}{\text{Proj}^{\tiny\KL}}
# \renewcommand{\epsilon}{\varepsilon}
# \def\ones{\mathbb{I}} $$

# In[1]:

from __future__ import division
import numpy as np
import matplotlib.pyplot as plt
import scipy as scp
import pylab as pyl
import warnings
warnings.filterwarnings('ignore')

get_ipython().run_line_magic('matplotlib', 'inline')
get_ipython().run_line_magic('load_ext', 'autoreload')
get_ipython().run_line_magic('autoreload', '2')

# Entropic Regularization of Optimal Transport
# --------------------------------------------
# We consider two input histograms $a \in \Si_n$ and $b \in \Si_m$, where we denote the simplex in $\RR^n$
# $$ \Si_n \eqdef \enscond{ a \in \RR_+^n }{ \sum_i a_i = 1 }. $$
# We consider the following discrete regularized transport
# $$ W_\epsilon(a,b) \eqdef \umin{P \in U(a,b)} \dotp{C}{P} - \epsilon E(P), $$
# where the polytope of couplings is defined as
# $$ U(a,b) \eqdef \enscond{P \in (\RR^+)^{n \times m}}{ P \ones_m = a, P^\top \ones_n = b }, $$
# where $\ones_n \eqdef (1,\ldots,1)^\top \in \RR^n $,
# and for $P \in \RR_+^{n \times m}$, we define its entropy as
# $$ E(P) \eqdef -\sum_{i,j} P_{i,j} ( \log(P_{i,j}) - 1). $$
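# As a quick illustration (a minimal sketch, not part of the original tour), the product coupling $a b^\top$ always belongs to $U(a,b)$, and its entropy can be evaluated directly from the definition above.

# In[ ]:

# sketch: the product coupling a b^T is a valid element of U(a,b)
a_ex = np.array([.2, .3, .5])
b_ex = np.array([.4, .6])
P_ex = np.outer(a_ex, b_ex)               # P = a b^T
print(P_ex.sum(axis=1))                   # first marginal P 1, equals a
print(P_ex.sum(axis=0))                   # second marginal P^T 1, equals b
print(-np.sum(P_ex*(np.log(P_ex)-1)))     # its entropy E(P)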
# When $\epsilon=0$ one recovers the classical (discrete) optimal transport.
# We refer to the monograph [Villani](#biblio) for more details about OT.
# The idea of regularizing transport to allow for faster computation was
# introduced in [Cuturi](#biblio).
#
#
# Here the matrix $C \in (\RR^+)^{n \times m} $ defines the ground cost, i.e.
# $C_{i,j}$ is the cost of moving mass from a bin indexed by $i$ to a bin indexed by $j$.
#
#
# The regularized transportation problem can be rewritten as a projection
# $$ W_\epsilon(a,b) = \epsilon \umin{P \in U(a,b)} \KLdiv{P}{K}
# \qwhereq
# K_{i,j} \eqdef e^{ -\frac{C_{i,j}}{\epsilon} } $$
# of the Gibbs kernel $K$ according to the Kullback-Leibler divergence.
# The Kullback-Leibler divergence between $P, K \in \RR_+^{n \times m}$ is
# $$ \KLdiv{P}{K} \eqdef \sum_{i,j} P_{i,j} \pa{ \log\pa{ \frac{P_{i,j}}{K_{i,j}} } - 1}. $$
#
#
# This interpretation of regularized transport as a KL projection and its numerical
# applications are detailed in [BenamouEtAl](#biblio).
#
#
# Given a convex set $\Cc \subset \RR^N$, the projection according to the Kullback-Leibler divergence is defined as
# $$ \KLproj_\Cc(\xi) = \uargmin{ \pi \in \Cc } \KLdiv{\pi}{\xi}. $$

# Iterative Bregman Projection Algorithm
# --------------------------------------
# Given affine constraint sets $ (\Cc_1,\Cc_2) $, we aim at computing
# $$ \KLproj_\Cc(K) \qwhereq \Cc = \Cc_1 \cap \Cc_2 $$
# (this description can of course be extended to more than 2 sets).
#
# This can be achieved, starting from $P_0=K$, by iterating, for all $\ell \geq 0$,
# $$ P_{2\ell+1} = \KLproj_{\Cc_1}(P_{2\ell})
# \qandq
# P_{2\ell+2} = \KLproj_{\Cc_2}(P_{2\ell+1}). $$
#
# One can indeed show that $P_\ell \rightarrow \KLproj_\Cc(K)$.
# We refer to [BauschkeLewis](#biblio) for more details about this
# algorithm and its extension to compute the projection on the intersection of
# convex sets (Dykstra's algorithm).

# Sinkhorn's Algorithm
# --------------------
#
# A fundamental remark is that the optimality condition of the entropic regularized problem shows that the optimal coupling $P_\epsilon$ necessarily has the form
# $$P_\epsilon = \diag{u} K \diag{v}$$
# where the Gibbs kernel is defined as
# $$K \eqdef e^{-\frac{C}{\epsilon}}.$$
#
# One thus needs to find two positive scaling vectors $u \in \RR_+^n$ and $v \in \RR_+^m$ such that the following two equalities hold
# $$P \ones = u \odot (K v) = a
# \qandq
# P^\top \ones = v \odot (K^\top u) = b.$$
#
# Sinkhorn's algorithm alternates between the resolution of these two equations, and reads
# $$u \longleftarrow \frac{a}{K v} \qandq v \longleftarrow \frac{b}{K^\top u}.$$
# This algorithm was shown to converge to a solution of the entropic regularized problem by [Sinkhorn](#biblio).
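# In matrix/vector form both updates are entry-wise divisions, so the whole algorithm fits in a few lines. Here is a minimal self-contained sketch (the function name `sinkhorn_sketch` is ours, and it assumes a cost matrix `C` and histograms `a,b` given as NumPy arrays); the rest of the tour builds the same iteration step by step.

# In[ ]:

def sinkhorn_sketch(C, a, b, epsilon, niter=1000):
    # minimal sketch of Sinkhorn's iterations
    K = np.exp(-C/epsilon)                # Gibbs kernel
    v = np.ones(len(b))
    for _ in range(niter):
        u = a / np.dot(K, v)              # enforce  P 1 = a
        v = b / np.dot(K.T, u)            # enforce  P^T 1 = b
    return u[:,None] * K * v[None,:]      # P = diag(u) K diag(v)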
# Transport Between Point Clouds
# ------------------------------
# We first test the method for two input measures that are uniform measures
# (i.e. constant histograms) supported on two point clouds
# (that do not necessarily have the same size).
#
#
# We thus first load two point clouds $x=(x_i)_{i=1}^{n}, y=(y_i)_{i=1}^{m}, $
# where $x_i, y_i \in \RR^2$.
#
#
# Number of points in each cloud, $N=(n,m)$.

# In[2]:

N = [300,200]

# Dimension of the clouds.

# In[3]:

d = 2

# Point cloud $x$, of $n$ points inside a square.

# In[4]:

x = np.random.rand(2,N[0])-.5

# Point cloud $y$, of $m$ points inside an annulus.

# In[5]:

theta = 2*np.pi*np.random.rand(1,N[1])
r = .8 + .2*np.random.rand(1,N[1])
y = np.vstack((np.cos(theta)*r,np.sin(theta)*r))

# Shortcut for displaying point clouds.

# In[6]:

plotp = lambda x,col: plt.scatter(x[0,:], x[1,:], s=200, edgecolors="k", c=col, linewidths=2)

# Display of the two clouds.

# In[7]:

plt.figure(figsize=(10,10))
plotp(x, 'b')
plotp(y, 'r')
plt.axis("off")
plt.xlim(np.min(y[0,:])-.1,np.max(y[0,:])+.1)
plt.ylim(np.min(y[1,:])-.1,np.max(y[1,:])+.1)
plt.show()

# Cost matrix $C_{i,j} = \norm{x_i-y_j}^2$.

# In[8]:

x2 = np.sum(x**2,0)
y2 = np.sum(y**2,0)
C = np.tile(y2,(N[0],1)) + np.tile(x2[:,np.newaxis],(1,N[1])) - 2*np.dot(np.transpose(x),y)

# Target histograms $(a,b)$, here uniform histograms.

# In[9]:

a = np.ones(N[0])/N[0]
b = np.ones(N[1])/N[1]

# Regularization strength $\epsilon>0$.

# In[10]:

epsilon = .01

# Gibbs kernel $K$.

# In[11]:

K = np.exp(-C/epsilon)

# Initialization of $v=\ones_{m}$ ($u$ does not need to be
# initialized).

# In[12]:

v = np.ones(N[1])

# One Sinkhorn iteration.

# In[13]:

u = a / (np.dot(K,v))
v = b / (np.dot(np.transpose(K),u))

# __Exercise 1__
#
# Implement Sinkhorn's algorithm.
# Display the evolution of the constraint satisfaction errors
# $$ \norm{ P \ones - a }_1 \qandq \norm{ P^\top \ones - b }_1 $$
# (you need to think about how to compute these residuals from $(u,v)$ alone).
# Display the violation-of-constraints error in a log plot.

# In[17]:

run -i nt_solutions/optimaltransp_5_entropic/exo1

# Compute the final matrix $P$.

# In[18]:

P = np.dot(np.dot(np.diag(u),K),np.diag(v))

# Display it.

# In[19]:

plt.imshow(P);

# __Exercise 2__
#
# Display the regularized transport solution for various values of $\epsilon$.
# For too small a value of $\epsilon$, what do you observe?

# In[21]:

run -i nt_solutions/optimaltransp_5_entropic/exo2

# Compute the obtained optimal $P$.

# In[22]:

P = np.dot(np.dot(np.diag(u),K),np.diag(v))

# Keep only the highest entries of the coupling matrix, and use them to
# draw a map between the two clouds.
# First we draw "strong" connections, i.e. links $(i,j)$ corresponding to
# large values of $P_{i,j}$.
# We then draw weaker connections.

# In[23]:

plt.figure(figsize=(10,10))
plotp(x, 'b')
plotp(y, 'r')

A = P * (P > np.max(P)*.8)
i,j = np.where(A != 0)
plt.plot([x[0,i],y[0,j]],[x[1,i],y[1,j]],'k',lw = 2)

A = P * (P > np.max(P)*.2)
i,j = np.where(A != 0)
plt.plot([x[0,i],y[0,j]],[x[1,i],y[1,j]],'k:',lw = 1)

plt.axis("off")
plt.xlim(np.min(y[0,:])-.1,np.max(y[0,:])+.1)
plt.ylim(np.min(y[1,:])-.1,np.max(y[1,:])+.1)
plt.show()
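# From the converged scalings one can also read off the value of the transport objective itself; a small sketch (using the `C`, `P` and `epsilon` currently in memory):

# In[ ]:

# sketch: evaluate the transport objective of the computed coupling
cost = np.sum(C*P)                                   # linear cost <C,P>
ent = -np.sum(P*(np.log(np.maximum(P,1e-300))-1))    # entropy E(P), guarded against log(0)
print(cost, cost - epsilon*ent)                      # <C,P> and the regularized objective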
# Transport Between Histograms
# ----------------------------
# We now consider a different setup, where the histogram values
# $a,b$ are not uniform, but the measures are defined on a uniform grid
# $x_i=y_i=i/n$. They are thus often referred to as "histograms".
#
#
# Size $n$ of the histograms.

# In[24]:

N = 200

# We use here a 1-D squared Euclidean metric.

# In[25]:

t = np.arange(0,N)/N

# Define the histograms $a,b$ as translated Gaussians.

# In[26]:

Gaussian = lambda t0,sigma: np.exp(-(t-t0)**2/(2*sigma**2))
normalize = lambda p: p/np.sum(p)

sigma = .06
a = Gaussian(.25,sigma)
b = Gaussian(.8,sigma)

# Add some minimal mass and normalize.

# In[27]:

vmin = .02
a = normalize( a+np.max(a)*vmin)
b = normalize( b+np.max(b)*vmin)

# Display the histograms.

# In[28]:

plt.figure(figsize = (10,7))
plt.subplot(2, 1, 1)
plt.bar(t, a, width = 1/len(t), color = "darkblue")
plt.subplot(2, 1, 2)
plt.bar(t, b, width = 1/len(t), color = "darkblue")
plt.show()

# Regularization strength $\epsilon$.

# In[29]:

epsilon = (.03)**2

# The Gibbs kernel is a Gaussian convolution,
# $$ K_{i,j} \eqdef e^{ -(i/N-j/N)^2/\epsilon }. $$

# In[30]:

[Y,X] = np.meshgrid(t,t)
K = np.exp(-(X-Y)**2/epsilon)

# Initialization of $v=\ones_{N}$.

# In[31]:

v = np.ones(N)

# One Sinkhorn iteration.

# In[32]:

u = a / (np.dot(K,v))
v = b / (np.dot(np.transpose(K),u))

# __Exercise 3__
#
# Implement Sinkhorn's algorithm.
# Display the evolution of the constraint satisfaction errors
# $ \norm{ P \ones - a }_1, \norm{ P^\top \ones - b }_1$. You need to think about how to compute them from $(u,v)$ alone.
# Display the violation-of-constraints error in a log plot.

# In[33]:

run -i nt_solutions/optimaltransp_5_entropic/exo3

# Display the coupling. Use a log-domain plot to better visualize it.

# In[35]:

P = np.dot(np.dot(np.diag(u),K),np.diag(v))
plt.figure(figsize=(5,5))
plt.imshow(np.log(P+1e-5))
plt.axis('off');

# One can compute an approximation of the transport map
# between the two measures through the so-called barycentric projection
# $$ t_i \in [0,1] \longmapsto s_i \eqdef \frac{\sum_{j} P_{i,j} t_j }{ \sum_{j} P_{i,j} }
# = \frac{ [u \odot K(v \odot t)]_i }{ a_i }, $$
# where $\odot$ and $\frac{\cdot}{\cdot}$ are the entry-wise multiplication and division.
#
#
# This computation can thus be done using only multiplications with the
# kernel $K$.

# In[36]:

s = np.dot(K,v*t)*u/a

# Display the transport map, super-imposed over the coupling.

# In[40]:

plt.figure(figsize=(5,5))
plt.imshow(np.log(P+1e-5))
plt.plot(s*N,t*N, 'r', linewidth=3);
plt.axis('off');

# __Exercise (bonus)__
#
# Try different regularization strengths $\epsilon$.

# Using GPUs
# -----------------------
# We will use here [Pytorch](https://pytorch.org/) to implement Sinkhorn on the GPU. If you are running the code on Google Colab, this means you need to enable GPU acceleration in the notebook settings.

# In[15]:

import torch
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print( device )

# Since CUDA uses 32-bit floating point numbers by default, one needs to use a fairly large value of $\epsilon$ to avoid underflow of the kernel entries $e^{-C_{i,j}/\epsilon}$.
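# To see why (an illustrative sketch, not part of the original tour): the smallest positive float32 value is about $1.4 \cdot 10^{-45}$, so kernel entries with $C_{i,j}/\epsilon$ larger than roughly $104$ flush to exactly zero, which later produces divisions by zero in Sinkhorn's updates.

# In[ ]:

# sketch: the Gibbs kernel underflows in float32 for small epsilon
for expo in [-80., -105., -150.]:
    # exp(-C/eps) in double vs single precision
    print(np.exp(np.float64(expo)), np.exp(np.float32(expo)))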
# We thus select a larger regularization strength.

# In[16]:

epsilon = (.06)**2
K = np.exp(-(X-Y)**2/epsilon)

# Convert the Sinkhorn variables and host them on the GPU (if available).

# In[17]:

u = torch.ones(N)
v = torch.ones(N)
K1 = torch.from_numpy(K).type(torch.FloatTensor)
a1 = torch.from_numpy(a).type(torch.FloatTensor)
b1 = torch.from_numpy(b).type(torch.FloatTensor)
# send them to the GPU
K1 = K1.to(device)
u = u.to(device)
v = v.to(device)
a1 = a1.to(device)
b1 = b1.to(device)

# When using Pytorch, it is good practice to implement matrix operations as summations over dummy indexes (i.e. using broadcasting). We show here how to implement one iteration of Sinkhorn this way.

# In[18]:

u = a1 / (K1 * v[None,:]).sum(1)
v = b1 / (K1 * u[:,None]).sum(0)

# __Exercise:__
#
# Implement the full algorithm.

# In[19]:

v = torch.ones(N, device=device)  # re-initialize v on the same device as K1
niter = 2000
Err_p = torch.zeros(niter)
Err_q = torch.zeros(niter)
for i in range(niter):
    # sinkhorn step 1
    u = a1 / (K1 * v[None,:]).sum(1)
    # error computation (errors are stored on the CPU, hence the .item())
    r = v*(K1 * u[:,None]).sum(0)
    Err_q[i] = torch.norm(r - b1, p=1).item()
    # sinkhorn step 2
    v = b1 / (K1 * u[:,None]).sum(0)
    s = u*(K1 * v[None,:]).sum(1)
    Err_p[i] = torch.norm(s - a1, p=1).item()

plt.figure(figsize = (10,7))
plt.subplot(2,1,1)
plt.title("$||P1 -a||_1$")
plt.plot(np.log(np.asarray(Err_p)), linewidth = 2)
plt.subplot(2,1,2)
plt.title("$||P^T 1 -b||_1$")
plt.plot(np.log(np.asarray(Err_q)), linewidth = 2)
plt.show()

# __Exercise__
#
# To avoid underflow, carry out the matrix/vector multiplications in the log domain, using the log-sum-exp stabilization trick.
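# As a hint (a sketch under the setup above, not the tour's reference solution), one can iterate on the dual potentials $f = \epsilon \log(u)$ and $g = \epsilon \log(v)$, for which the kernel multiplications become stabilized log-sum-exp reductions:

# In[ ]:

# sketch: log-domain Sinkhorn, iterating on f = eps*log(u), g = eps*log(v)
C1 = torch.from_numpy((X-Y)**2).type(torch.FloatTensor).to(device)  # cost matrix
f = torch.zeros(N, device=device)
g = torch.zeros(N, device=device)
for it in range(2000):
    # log of u <- a/(K v) becomes a log-sum-exp over the rows of C
    f = epsilon * (torch.log(a1) - torch.logsumexp((g[None,:] - C1)/epsilon, 1))
    # log of v <- b/(K^T u) becomes a log-sum-exp over the columns of C
    g = epsilon * (torch.log(b1) - torch.logsumexp((f[:,None] - C1)/epsilon, 0))
P1 = torch.exp((f[:,None] + g[None,:] - C1)/epsilon)  # recover the coupling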
# Wasserstein Barycenters
# -----------------------
# Instead of computing transport, we now turn to the problem of computing
# the barycenter of $R$ input measures $(a_k)_{k=1}^R$. A barycenter $b$ solves
# $$ \umin{b} \sum_{k=1}^R \la_k W_\epsilon(a_k,b) $$
# where $\la_k$ are positive weights with $\sum_k \la_k=1$. This
# follows the definition of barycenters proposed in
# [AguehCarlier](#biblio).

# Dimension (width of the images) $N$ of the histograms.

# In[41]:

N = 70

# You need to install imageio, for instance using
# > conda install -c conda-forge imageio
#
# If you need to rescale the image size, you can use
# > skimage.transform.resize

# Load the input histograms $(a_k)_{k=1}^R$, and store them in a tensor $A$.

# In[42]:

import imageio
rescale = lambda x: (x-x.min())/(x.max()-x.min())
names = ['disk','twodisks','letter-x','letter-z']
vmin = .01
A = np.zeros([N,N,len(names)])
for i in range(len(names)):
    a = imageio.imread("nt_toolbox/data/" + names[i] + ".bmp")
    a = normalize(rescale(a)+vmin)
    A[:,:,i] = a
R = len(names)

# Display the input histograms.

# In[43]:

plt.figure(figsize=(5,5))
for i in range(R):
    plt.subplot(2,2,i+1)
    plt.imshow(A[:,:,i])
    plt.axis('off');

# In this specific case, the kernel $K$ associated with the
# squared Euclidean norm is a convolution with a Gaussian filter
# $$ K_{i,j} = e^{ -\norm{i/N-j/N}^2/\epsilon } $$
# where here $(i,j)$ are 2-D indexes.
#
#
# The multiplication against the kernel, i.e. $K(a)$,
# can now be computed efficiently, using fast convolution methods.
# This crucial point was exploited and generalized in [SolomonEtAl](#biblio)
# to design fast optimal transport algorithms.

# Regularization strength $\epsilon>0$.

# In[44]:

epsilon = (.04)**2

# Define the $K$ kernel.
# We use here the fact that the convolution is separable to implement it
# using only 1-D convolutions, which further speeds up computations.

# In[45]:

t = np.linspace(0,1,N)
[Y,X] = np.meshgrid(t,t)
K1 = np.exp(-(X-Y)**2/epsilon)
K = lambda x: np.dot(np.dot(K1,x),K1)

# Display the application of the $K$ kernel on one of the input histograms.

# In[46]:

plt.figure(figsize=(10,5))
plt.subplot(1,2,1)
plt.imshow(A[:,:,0])
plt.title("$a$")
plt.axis('off');
plt.subplot(1,2,2)
plt.imshow(K(A[:,:,0]))
plt.title("$K(a)$")
plt.axis('off');

# Weights $\la_k$ for the isobarycenter.

# In[47]:

lambd = np.ones(R)/R

# It is shown in [BenamouEtAl](#biblio) that the problem of barycenter computation
# boils down to
# optimizing over couplings $(P_k)_{k=1}^R$, and that this can be achieved
# using an iterative Sinkhorn-like algorithm, since each optimal coupling has the scaling form
# $$P_k = \diag{u_k} K \diag{v_k}$$
# for some unknown positive weights $(u_k,v_k)$.

# Initialize the scaling factors $(u_k,v_k)_k$, and store them in matrices.

# In[48]:

v = np.ones([N,N,R])
u = np.copy(v)

# The first step of the Bregman projection method corresponds to the
# projection on the fixed marginal constraints $P_k \ones = a_k$. This
# is achieved by updating
# $$ \forall k=1,\ldots,R, \quad u_k \longleftarrow \frac{a_k}{ K( v_k ) }. $$

# In[49]:

for k in range(R):
    u[:,:,k] = A[:,:,k]/K(v[:,:,k])

# The second step of the Bregman projection method corresponds to the
# projection on the equal marginal constraints $\forall k, P_k^\top \ones=b$ for a common barycenter target $b$. This
# is achieved by first computing the target barycenter $b$ using a geometric mean
# of the current second marginals $v_k \odot K^\top(u_k)$ (note that $K=K^\top$ here)
# $$ \log(b) \eqdef \sum_k \lambda_k \log( v_{k} \odot K ( u_{k} ) ). $$

# In[50]:

b = np.zeros([N,N])
for k in range(R):
    b = b + lambd[k] * np.log(np.maximum(1e-19, v[:,:,k]*K(u[:,:,k])))
b = np.exp(b)

# Display $b$.

# In[51]:

plt.imshow(b);
plt.axis('off');

# And then one can update the scalings by a Sinkhorn step using this newly computed histogram $b$ as follows:
# $$ \forall k=1,\ldots,R, \quad v_{k} \longleftarrow \frac{b}{ K(u_{k}) }. $$

# In[52]:

for k in range(R):
    v[:,:,k] = b/K(u[:,:,k])

# __Exercise 4__
#
# Implement the iterative algorithm to compute the iso-barycenter of the measures.
# Plot the decay of the error $\sum_k \norm{P_k \ones - a_k}_1 $.

# In[54]:

run -i nt_solutions/optimaltransp_5_entropic/exo4

# Display the barycenter.

# In[55]:

plt.imshow(b)
plt.axis('off');

# __Exercise 5__
#
# Compute barycenters for varying weights $\la$ corresponding to
# a bilinear interpolation inside a square.

# In[ ]:

run -i nt_solutions/optimaltransp_5_entropic/exo5

# Bibliography
# ------------
#
#
# * [Villani] C. Villani, (2009). Optimal transport: old and new, volume 338. Springer Verlag.
# * [Cuturi] M. Cuturi, (2013). Sinkhorn distances: Lightspeed computation of optimal transport. In Burges, C. J. C., Bottou, L., Ghahramani, Z., and Weinberger, K. Q., editors, Proc. NIPS, pages 2292-2300.
# * [AguehCarlier] M. Agueh and G. Carlier, (2011). Barycenters in the Wasserstein space. SIAM J. on Mathematical Analysis, 43(2):904-924.
# * [CuturiDoucet] M. Cuturi and A. Doucet, (2014). Fast computation of Wasserstein barycenters. In Proc. ICML.
# * [BauschkeLewis] H. H. Bauschke and A. S. Lewis. Dykstra's algorithm with Bregman projections: a convergence proof. Optimization, 48(4):409-427, 2000.
# * [Sinkhorn] R. Sinkhorn. A relationship between arbitrary positive matrices and doubly stochastic matrices. Ann. Math. Statist., 35:876-879, 1964.
# * [SolomonEtAl] J. Solomon, F. de Goes, G. Peyré, M. Cuturi, A. Butscher, A. Nguyen, T. Du, and L. Guibas. Convolutional Wasserstein distances: Efficient optimal transportation on geometric domains. ACM Transactions on Graphics, Proc. SIGGRAPH, 2015.
# * [BenamouEtAl] J-D. Benamou, G. Carlier, M. Cuturi, L. Nenna, G. Peyré. Iterative Bregman Projections for Regularized Transportation Problems. SIAM Journal on Scientific Computing, 37(2), pp. A1111-A1138, 2015.