Arbitrage Identification Strategies

Assume that there is a market with $N$ currencies. Also assume for simplicity that every possible pair can be traded (not a necessary assumption for the depicted representations). The exchange rates can be displayed by a source-to-target exchange rate matrix. Let us generate and analyze an exemplary such matrix and translate it into a graph representation.

In [1]:

```
# set things up and create button
```

In [2]:

```
# set some parameters, generate exchange rate matrix and the corresponding graph
N = 6 # number of currencies
max_spread_pct = 0.05 # maximum bid-ask spread in pct of bid, 0.05 for 5%
```

The exchange rate matrix takes the current best bids and best asks in the market and generates a matrix of multipliers that shows how many units of the target currency one would receive for a unit of the source currency.

In [3]:

```
# generate and display random exchange rate matrix
```

The above exchange rate matrix can be translated into a directed graph with nodes representing the currencies and the edge weights set to the exchange rate matrix multipliers.

In [4]:

```
# display corresponding directed graph
```

An arbitrage is a cylce for which the **cumulative product** of the edge weights in an exchange rate graph is bigger than one.

There are many well known graph algorithms that deal with cycle lengths - the **cumulative sum** of the edge weights. Fortunately, one can easily translate a cumulative product optimization problem into a convex cumulative sum optimzation problem by taking logs. To see this, note that for $a>0<b$
\begin{align}
ln\left(a\times b \right) = ln\left(a\right) + ln\left(b\right)
\end{align}

while noting that \begin{align} ln\left(x\right) \quad \text{is a positive and monotonic transformation.} \end{align}

If we multiply everything with -1, convex maximization becomes convex minimization... Let us transform the above exchange rate matrix and the corresponding graph by taking negative logs.

In [5]:

```
# display log-transformed exchange rate matrix and corresponding graph
```

Every cycle of **negative length** in the above graph with negative-log-transformed weights is an arbitrage cylce.

One can check if an arbitrage cylce exists and if so, **identify one such a cylce** in the log-transformed graph representation using a slighlty adjusted Bellman-Ford algorithm which has a complexity of $\mathcal{O}(|V||E|)$ where $|V|$ denotes the number of vertices and $|E|$ denotes the number of edges. A neat visualization of how the Bellman-Ford algorithm works can be found here.

The Bellman-Ford algorithm itself only gives the shortest distance from a prespecified source node to all other nodes in the graph. It needs to be adjusted to:

- show if there is a negative cycle
- to output a negative cycle
- to always find a negative cycle (if there is one) even the graph is not connected

In [6]:

```
# The Bellman-Ford function used in this notebook
def bf_negative_cycle(G):
# Remove nan edges
n = len(G.nodes()) + 1
edges = [edge for edge in G.edges().data() if ~np.isnan(edge[2]['weight'])]
# Add a starting node and add edges with zero weight to all other nodes
start_node_edges = [(n-1, i, {'weight': 0}) for i in range(n-1)]
edges = edges + start_node_edges
# Initialize node distances and predecessors
d = np.ones(n) * np.inf
d[n - 1] = 0 # Starting node has zero distance
p = np.ones(n) * -1
# Relax n times
for i in range(n):
x = -1
for e in edges:
if d[int(e[0])] + e[2]['weight'] < d[int(e[1])]:
d[int(e[1])] = d[int(e[0])] + e[2]['weight']
p[int(e[1])] = int(e[0])
x = int(e[1])
if x == -1: # If no relaxation possible, no negative cycle
return None
# Identify negative cycle
for i in range(n):
x = p[int(x)]
cycle = []
v = x
while True:
cycle.append(int(v))
if v == x and len(cycle) > 1:
break
v = p[int(v)]
return list(reversed(cycle))
```

In [7]:

```
# Check if arbitrage cycle exists using Bellman-Ford. If so, output an arbitrage cycle
```

Identifying the cycle with an arbitrary number of vertices with the **largest** arbitrage multiplier is NP-hard in general. Finding this cycle with many tradable pairs thus quickly becomes computationally infeasible and one has to resort to approximate/heuristic algorithms.

Observe that finding the cycle with the largest multiplier (i.e. the cycle with the most negative cycle length in the graph with negative-log-transformed weights) of all cycles that visit each vertex exactly once is equivalent to the (asymmetric) Travelling Salesman Problem - a well known NP-hard problem. The Travelling Salesman Problem (TSP) with $n+1$ cities can be formulated as an integer linear program for a distance matrix $C$ as follows \begin{align} \min &\sum_{i=0}^n \sum_{j\ne i,j=0}^nc_{ij}x_{ij} && \\ & 0 \le x_{ij} \le 1 && i,j=0, \cdots, n \\ & u_{i} \in \mathbf{Z} && i=0, \cdots, n \\ & \sum_{i=0,i\ne j}^n x_{ij} = 1 && j=0, \cdots, n \\ & \sum_{j=0,j\ne i}^n x_{ij} = 1 && i=0, \cdots, n \\ &u_i-u_j +nx_{ij} \le n-1 && 1 \le i \ne j \le n \end{align}

The TSP is only **NP-hard because we are asking for a single roundtrip solution** that covers all cities / currencies (the last two sets of constraints in the above formulation). But **subtours are fine in the arbitrage finding task** and even skipping some assets completely (a self-cylce) is not a problem!

If we remove the "no-subtours" constaints of the TSP and also account for the diagonal of $X$ in the objective function, we arrive at the linear programming relaxation of the assignment problem \begin{align} \min &\sum_{i=0}^n \sum_{j=0}^nc_{ij}x_{ij} && \\ & 0 \le x_{ij} \le 1 && i,j=0, \cdots, n \\ & \sum_{i=0}^n x_{ij} = 1 && j=0, \cdots, n \\ & \sum_{j=0}^n x_{ij} = 1 && i=0, \cdots, n \\ \end{align} This linear program always has an integer solution (due to the totally unimodular linear constraint matrix) and thus solves the assignment problem. You can use a very specialized solver for the assignment problem (this notebook uses this one - extremely fast and exact solutions!) or just solve the linear program above - CVXPY makes this very easy.

In [8]:

```
# Alternative: Solve Assignment Problem with CVXPY
X = Variable((len(C_ln),len(C_ln)))
iota = np.ones((len(C_ln),1))
constraints = [X <= 1,
X >= 0,
X*iota == 1,
np.transpose(iota)*X == 1]
# Form objective.
obj = Minimize(sum(multiply(X, C_ln)))
# Form and solve problem.
prob = Problem(obj, constraints)
prob.solve() # Returns the optimal value.
print ("status:", prob.status)
print ("optimal value", prob.value)
X = pd.DataFrame(X.value)
display(X.round(3))
```

In [9]:

```
# Find best set of arbitrage cycles by solving the assignment problem
```

Assume that we start out with one quantity in each currency. Let us find the set of trades that maximize the quantity in currency $k$ without changing the other quantities. For two matrices $A$ and $B$ of the same dimensions, let $A\circ B$ denote the element-wise product (also denoted Hadamard product)).

Let $C$ denote the exchange rate matrix (not log-transformed) and $Q$ a matrix of the same size as $C$ where the entry $Q_{i,j}$ denotes the number of $i$ coins that flow from currency $i$ to currency $j$. Let $\mathbb{1}$ denote a vector of ones of size $N\times 1$, then vector of outflows from the respective currency inventories is

\begin{align} \text{outflows} = Q \mathbb{1}. \end{align}The vector of inflows (also in currency $i$ units) is

\begin{align} \text{inflows} = (Q \circ C)' \mathbb{1} \end{align}and hence the delta inventory is

\begin{align} \Delta\! I = (Q \circ C)' \mathbb{1} - Q \mathbb{1}. \end{align}Let $I_{init}$ denote the initial inventory the maximizre

\begin{align} \max_{Q} \Delta\! I_k && \\ & \Delta\! I_{\neq k} =0 && \text{| No change in inventory in other assets}\\ & \Delta\! I_k \leq 1000 && \text{| Problem would be unbounded in case of arb-cycle}\\ \\ \end{align}The nice thing about this setup is that it allows to easily incorperate new constraints like the blocksize of the observed quotes. The drawback is that this is numerically extremely challenging!

In [10]:

```
# Maximize inventory
```

In this part, we are looking for live arbitrage opportunities on some centralized exchanges. The main tool we use for this is CCXT - an open source JavaScript / Python / PHP cryptocurrency trading library with support for more than 130 crypto exchanges.

The Binance fee structure can be found here.

In [41]:

```
# set some parameters
```

In [54]:

```
# set things up and create button
```

In [55]:

```
# create real time exchange rate matrix
```

In [56]:

```
# Check if arbitrage cycle exists using Bellman-Ford. If so, output an arbitrage cycle
```

In [57]:

```
# Find best set of arbitrage cycles by solving the assignment problem
```

Aspects to consider:

- Speed, speed, speed!
- Timing issues
- Costs of inventory (opportunity cost, market risk, custody risk)
- Available liqudity
- Precision of quantities
- Error handling
- ...