#!/usr/bin/env python
# coding: utf-8

# # Implementing linear regression in Python
#
# Like many data analysis problems, there are a number of different ways to perform a linear regression in Python. This notebook shows a few different methods. The final method motivates a review of matrix multiplication, which will be helpful in the next section on multivariate regression.
#
# ## Example: 2007 West Coast Ocean Acidification Cruise

# In[1]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import statsmodels.formula.api as smf
import pingouin as pg
import PyCO2SYS as pyco2


# ## Load data
#
# ##### Load 2007 data

# In[2]:

filename07 = 'data/wcoa_cruise_2007/32WC20070511.exc.csv'
df07 = pd.read_csv(filename07, header=29, na_values=-999, parse_dates=[[6, 7]])


# In[3]:

df07.keys()


# ## Linear regression: five methods in Python

# Create $x$ and $y$ variables.

# In[4]:

x = df07['PHSPHT']
y = df07['NITRAT']


# Plot the data.

# In[5]:

plt.figure()
plt.plot(x, y, '.')
plt.xlabel('phosphate')
plt.ylabel('nitrate')


# Create a subset where both variables have finite values.

# In[6]:

ii = np.isfinite(x + y)


# In[7]:

ii


# ### Method 1: Scipy

# In[8]:

result = stats.linregress(x[ii], y[ii])


# In[9]:

result


# In[10]:

result.slope


# Exercise: Draw the regression line with the data.

# In[11]:

plt.figure()
plt.plot(x, y, 'k.')
plt.plot(x, result.slope * x + result.intercept, 'r-')


# ### Method 2: statsmodels

# Ordinary least squares fit using [statsmodels](https://www.statsmodels.org/).

# In[12]:

smres = smf.ols('NITRAT ~ PHSPHT', df07).fit()


# In[13]:

smres.summary()


# ### Method 3: pingouin

# In[14]:

pg.linear_regression(x[ii], y[ii])


# ### Method 4: Regression coefficients using numpy's polyfit function

# We can also use the `polyfit` function from numpy to calculate the coefficients of the line of best fit (minimizing the sum of square errors):

# In[15]:

coefficients = np.polyfit(x[ii], y[ii], 1)
print(coefficients)


# ### Method 5: Matrix form
#
# In the next section, we'll consider solving for the coefficients of the linear fit using matrices. But first, let's do a quick review of matrix multiplication.
#
# #### Review: Matrix Multiplication

# Suppose we have the following matrices:
#
# $$ \textbf{A} = \begin{bmatrix} 1 & 2 & 3\\ 4 & 5 & 6 \end{bmatrix} $$
#
# $$ \textbf{B} = \begin{bmatrix} 7 & 8\\ 9 & 10 \\ 11 & 12 \end{bmatrix} $$
#
# What will be the matrix product $\textbf{AB}$?
#
# To define matrices in Python, we define 2-d arrays as lists of lists wrapped in numpy's `array` function, for example:

# In[16]:

# define matrix A
A = np.array([[1, 2, 3],
              [4, 5, 6]])

# define matrix B
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])


# We can check the dimensions of these arrays using numpy's `shape` function:

# In[17]:

print('shape of A: ', np.shape(A))
print('shape of B: ', np.shape(B))


# Finally, we can multiply two arrays using numpy's `dot` function:

# In[18]:

np.dot(A, B)


# It is important to remember that matrix multiplication is not *commutative*, meaning $\textbf{AB}$ is generally not the same as $\textbf{BA}$. In this example, $\textbf{BA}$ gives us a different size matrix.

# In[19]:

np.dot(B, A)
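# As an aside, NumPy also provides the `@` operator for matrix multiplication. For 2-D arrays like these, it should give the same result as `np.dot`; this is just a quick illustrative check using the `A` and `B` defined above.

# In[ ]:

# the @ operator performs matrix multiplication, equivalent to np.dot for 2-D arrays
print(A @ B)
print(np.dot(A, B))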
# #### Review: Matrix Transpose
#
# The *transpose* of a matrix, $\textbf{A}^T$, has the same values as $\textbf{A}$, but the rows are converted to columns. One way to compute it is with the `np.transpose` function.

# In[20]:

print(A)


# In[21]:

print(np.transpose(A))


# Another way is to use the `.T` attribute of a NumPy array.

# In[22]:

print(A.T)


# Note that the product $\textbf{A}^T\textbf{A}$ is a *square* matrix, which has the same number of rows and columns.

# In[23]:

np.dot(A.T, A)


# #### Review: Matrix Inverse

# The concept of a matrix inverse is similar to the inverse of a single number. If we have a single value $b$, its inverse can be represented as $b^{-1}$. A value times its inverse is equal to 1:
#
# $$b^{-1}b = 1$$

# Let's say we have a *square* matrix $\textbf{B}$, where the number of rows and columns are equal. The inverse $\textbf{B}^{-1}$ is the matrix that gives
#
# $$ \textbf{B}^{-1}\textbf{B} = \textbf{I} $$
#
# where the *identity matrix* $\textbf{I}$ is a matrix that has all 0 values, except for 1 values along the diagonal from the upper left to the lower right. For example, a 3x3 identity matrix would be
#
# $$ \textbf{I} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} $$

# This is called an *identity matrix* because $\textbf{B}\textbf{I} = \textbf{I}\textbf{B} = \textbf{B}$ for square matrices. This is analogous to $1b = b$ for single values.
#
# In a linear algebra class, you might calculate $\textbf{B}^{-1}$ by hand, but in this class we will rely on NumPy to do it for us. Let's set up a $3 \times 3$ matrix $\textbf{B}$.

# In[24]:

B = np.array([[1, 2, 1],
              [3, 2, 1],
              [1, 2, 2]])
print(B)


# The inverse $\textbf{B}^{-1}$ is

# In[25]:

np.linalg.inv(B)


# The product $\textbf{B}^{-1}\textbf{B}$ can be calculated as

# In[26]:

BinvB = np.dot(np.linalg.inv(B), B)
print(BinvB)


# If we round these values, we can see more clearly that this is nearly identical to the identity matrix $\textbf{I}$, with some very small round-off error.

# In[27]:

print(np.round(BinvB))


# For reference, an identity matrix can be created with the `np.eye` function.

# In[28]:

print(np.eye(3))
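# A practical aside: when the goal is to solve a square linear system $\textbf{B}\vec{u} = \vec{d}$, it is generally better to use `np.linalg.solve` than to form the inverse explicitly. Below is a minimal sketch using the $\textbf{B}$ defined above and a made-up right-hand side vector `d` (not part of the cruise data), just to show that the two approaches agree.

# In[ ]:

# a made-up right-hand side vector, for illustration only
d = np.array([1.0, 2.0, 3.0])

# solve B u = d directly (generally preferred)
u_solve = np.linalg.solve(B, d)

# the same solution computed with the explicit inverse
u_inv = np.dot(np.linalg.inv(B), d)

print(u_solve)
print(u_inv)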
# #### Formulating linear regression in matrix form
#
# We can formulate regression in terms of matrices as
#
# $$\begin{bmatrix} y_1\\ y_2\\ \vdots \\ y_N \end{bmatrix} = \begin{bmatrix} 1 & x_1\\ 1 & x_2\\ \vdots & \vdots\\ 1 & x_N \end{bmatrix} \begin{bmatrix} c_0\\ c_1\end{bmatrix} + \begin{bmatrix} \varepsilon_1\\ \varepsilon_2\\ \vdots \\ \varepsilon_N \end{bmatrix}$$
#
# or
#
# $$\vec{y} = \textbf{X}\vec{c} + \vec{\varepsilon}$$
#
# Here, $\vec{\varepsilon}$ represents the vector of errors, or differences between the data and model values. With the model values written as $\hat{y} = \textbf{X}\vec{c}$,
#
# $$ \vec{\varepsilon} = \vec{y} - \hat{y}. $$
#
# To solve for the parameters $\vec{c}$ using matrix multiplication, we first need to formulate the $\vec{y}$ and $\textbf{X}$ matrices.

# In[29]:

# check that y[ii] is a 1-d array (which will not work directly for matrix multiplication)
# print(np.shape(y[ii]))

# define an (N, 1) matrix of the y values
y_matrix = np.reshape(y[ii].ravel(), (len(y[ii]), 1))

# print the matrix
print(y_matrix)

# print the shape of the matrix
print('shape of y: ', np.shape(y_matrix))


# In[30]:

# define a matrix X with a column of ones and a column of the x values
X = np.column_stack([np.ones((len(x[ii]), 1)), np.ravel(x[ii])])

# print the X matrix
print(X)

# print the shape of the X matrix
print(np.shape(X))


# Now that we have our matrices set up, let's return to our model equation in matrix form.
#
# $$\hat{y} = \textbf{X}\vec{c}$$
#
# We can multiply each side of the equation by the transpose $\textbf{X}^T$,
#
# $$\textbf{X}^T\hat{y} = \textbf{X}^T\textbf{X}\vec{c},$$
#
# then multiply each side by the inverse of $\textbf{X}^T\textbf{X}$,
#
# $$(\textbf{X}^T\textbf{X})^{-1}\textbf{X}^T\hat{y} = (\textbf{X}^T\textbf{X})^{-1}\textbf{X}^T\textbf{X}\vec{c},$$
#
# which reduces to
#
# $$(\textbf{X}^T\textbf{X})^{-1}\textbf{X}^T\hat{y} = \textbf{I}\vec{c}$$
#
# or
#
# $$(\textbf{X}^T\textbf{X})^{-1}\textbf{X}^T\hat{y} = \vec{c},$$
#
# giving us an expression for the vector of coefficients $\vec{c}$.

# Here $\hat{y}$ represents the vector of *model* values. Replacing it with the vector of *data* values $\vec{y}$, we can solve for the coefficients that minimize the sum of square errors using the same equation:
#
# $$\vec{c} = (\textbf{X}^T\textbf{X})^{-1}\textbf{X}^T\vec{y}$$
#
# Using numpy, we can define the matrix components of the above equation and then run the calculation to find the coefficients:

# In[31]:

# calculate the transpose of the matrix X
XT = np.transpose(X)

# calculate the product of XT and X
XTX = np.dot(XT, X)

# calculate the inverse of XTX
XTX_inverse = np.linalg.inv(XTX)

# calculate the coefficients, (X^T X)^{-1} X^T y, using two dot products
c = np.dot(XTX_inverse, np.dot(XT, y_matrix))

# print the coefficients
print(c)


# As a sanity check, we can verify that these coefficients match those from numpy's `polyfit` function (which returns the slope first and the intercept second).

# In[32]:

coefficients = np.polyfit(x[ii], y[ii], 1)
print(coefficients)
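# As a further optional cross-check, NumPy's `np.linalg.lstsq` solves the same least-squares problem directly from the design matrix and the data, without explicitly forming $(\textbf{X}^T\textbf{X})^{-1}$. This is a minimal sketch that assumes the `X` and `y_matrix` arrays defined above; it should reproduce the same intercept and slope.

# In[ ]:

# solve the least-squares problem X c = y directly
# (rcond=None tells numpy to use its default cutoff for small singular values)
c_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X, y_matrix, rcond=None)

print(c_lstsq)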