#!/usr/bin/env python
# coding: utf-8

# # Matrix inversion or solve
# > Short note on matrix inversion and why to prefer solve. Uses Hilbert matrices!

# The short advice is to always solve for $x$ in the linear system $Ax=y$ with (for example) an LU decomposition, and not to compute $A^{-1}y$. But why _exactly_?
#
# This post came to be after reading [Why Shouldn't I Invert That Matrix](http://gregorygundersen.com/blog/2020/12/09/matrix-inversion/) by Gregory Gundersen. He does an excellent job of describing matrix inversion, LU decomposition and (ill-)conditioned matrices. I'll repeat some terms here.
#
# The argument is that solving is _faster_ and _more precise_ than inversion:
#
# * An LU decomposition requires fewer floating point operations than (plain) matrix inversion.
# * Floating point operations are inexact, so more of them leave more room for rounding errors to accumulate.
#
# In this post I use matrices related to polynomial interpolation, including Hilbert matrices. Hilbert matrices are _ill-conditioned_, so we expect inversion to behave badly.

# ## Precision and floats

# I won't go into the computational complexity; read the post above for a nice explanation. We are not solving symbolically but with 32 (or 64) bit floats. For example, $0.1$ cannot be represented exactly as a float: [see this article](https://www.exploringbinary.com/why-0-point-1-does-not-exist-in-floating-point/). The more operations we execute, the more potential there is for errors.
#
# Solved symbolically, both methods are exact. Numerically, the _condition number_ of a matrix captures the sensitivity of the problem: a higher condition number means that small changes in $A$ (or $y$) lead to large changes in $x$. That is exactly the intuition we need to explore under which conditions we can expect more errors, and where the difference between the two methods becomes pronounced.

# In[67]:


#hide
import math

import numpy as np
import matplotlib.pyplot as plt
from numpy.random import default_rng


# ## Solve $Ax=y$

# We can solve $Ax=y$ with the LU decomposition $A=LU$: first solve $Lz=y$, then $Ux=z$. Since $L$ and $U$ are triangular matrices this is straightforward.
# With a triangular matrix you can start with an equation in a single variable, solve it, substitute, and continue:
# there is always an equation left with only one unsolved variable.
#
# Finding the decomposition is the tricky part. NumPy's `solve` delegates this to LAPACK's `gesv` routine, which uses LU decomposition with partial pivoting. We create an example from polynomial interpolation of the function $f(x)=x^2$ on the points $\{-\frac{1}{2},0,\frac{1}{2}\}$:

# In[2]:


f = lambda x: x**2
x = np.array([-.5, 0., .5])
A = np.vander(x, increasing=True)


# For a small number of points both approaches give the same result:

# In[3]:


np.linalg.solve(A, f(x)) - np.linalg.inv(A) @ f(x)


# But for larger matrices they start to differ:

# In[4]:


x = np.linspace(-1, 1, 40)
A = np.vander(x, increasing=True)


# In[5]:


np.linalg.solve(A, f(x)) - np.linalg.inv(A) @ f(x)


# The speeds are also different:

# In[6]:


get_ipython().run_line_magic('timeit', 'np.linalg.inv(A) @ f(x)')


# In[7]:


get_ipython().run_line_magic('timeit', 'np.linalg.solve(A, f(x))')


# ## Hilbert

# The difference between the two methods is due to the number of floating point operations and the amount of error they introduce. We will show this with the Hilbert matrix as an example. The Hilbert matrix is _ill-conditioned_: small changes in the matrix lead to large changes in the computed output.
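# The condition number makes this concrete. As a quick illustration (using `np.linalg.cond` on the same Vandermonde construction as above), it already grows very fast with the number of interpolation points:

# In[ ]:


# Condition number (2-norm, the default of np.linalg.cond) of the Vandermonde
# matrix for an increasing number of equally spaced points in [-1, 1].
for n in (5, 10, 20, 40):
    A_n = np.vander(np.linspace(-1, 1, n), increasing=True)
    print(n, np.linalg.cond(A_n))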
# The Hilbert matrix is closely related to the polynomial interpolation above: it is the Gram matrix of the monomials $1, x, x^2, \ldots$ on $[0,1]$, with entries $\int_0^1 x^{(i-1)+(j-1)}\,dx = \frac{1}{i+j-1}$.

# Define the Hilbert matrix:

# In[36]:


def Hilbert(n):
    return np.array([[1/(i+j-1) for j in range(1, n+1)]
                     for i in range(1, n+1)])

Hilbert(5)


# The inverse of the Hilbert matrix is known in closed form
# (see https://en.wikipedia.org/wiki/Hilbert_matrix):

# In[37]:


def Hilbert_inv(n):
    return np.array([[(-1)**(i+j) * (i+j-1)
                      * math.comb(n+i-1, n-j)
                      * math.comb(n+j-1, n-i)
                      * math.comb(i+j-2, i-1)**2
                      for j in range(1, n+1)]
                     for i in range(1, n+1)])

Hilbert_inv(5)


# Inverting the closed-form inverse should give back the original matrix:

# In[38]:


np.all(np.isclose(Hilbert(10), np.linalg.inv(Hilbert_inv(10))))


# ## Error: inversion vs solve

# For each method, define the error as the mean squared difference between the known $x$ and the computed solution, once via the explicit inverse and once via `solve` (LU decomposition):

# In[97]:


rng = default_rng()

def error(n):
    # Create the matrix A and the right-hand side y for a known x
    A = Hilbert(n)
    x = rng.uniform(size=(n, 1))
    y = A @ x
    # Solve Ax=y with both methods
    x_inverse = np.linalg.inv(A) @ y
    x_solve = np.linalg.solve(A, y)
    e_inverse = np.mean((x - x_inverse)**2)
    e_solve = np.mean((x - x_solve)**2)
    return np.stack((e_inverse, e_solve))


# You can see that the difference is quite pronounced for large $n$ (the chart is on a log scale):

# In[104]:


fig, ax = plt.subplots()
xs = np.arange(10, 200, 10)
ys = [error(x) for x in xs]
series = plt.plot(xs, ys)
plt.legend(iter(series), ('Inverse', 'Solve'))
ax.set_yscale("log", nonpositive='clip')


# ## Error of the inverse

# Additionally, we can look at the quality of the computed inverse itself: the mean squared difference between the closed-form inverse and `np.linalg.inv` increases sharply for larger matrices (the chart is on a log scale):

# In[58]:


def error_inv(n):
    return np.mean((Hilbert_inv(n) - np.linalg.inv(Hilbert(n)))**2)

fig, ax = plt.subplots()
xs = np.arange(10, 50, 10)
ys = [error_inv(x) for x in xs]
plt.plot(xs, ys)
ax.set_yscale("log", nonpositive='clip')
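# As a side note, solving via LU also lets you factor once and reuse the factors for many right-hand sides. A minimal sketch, assuming SciPy is available (it is not used elsewhere in this post), with `scipy.linalg.lu_factor` and `lu_solve`:

# In[ ]:


# Factor A once, then solve against several right-hand sides by
# forward/back substitution, reusing the factorization each time.
from scipy.linalg import lu_factor, lu_solve

x = np.linspace(-1, 1, 40)
A = np.vander(x, increasing=True)

lu, piv = lu_factor(A)  # LU factorization with partial pivoting, computed once
for g in (lambda t: t**2, lambda t: np.sin(t)):
    sol = lu_solve((lu, piv), g(x))       # reuses the stored factors
    print(np.mean((A @ sol - g(x))**2))   # residual of each solve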