Linear Regression

To start, let's generate some data in which the amount customers spend is an almost linear function of an e-shop's page speed:

In [5]:
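%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

# Page speeds: 1000 samples drawn from a normal distribution (mean 3.0, std 1.0)
pageSpeed = np.random.normal(3.0, 1.0, 1000)
# Spending falls linearly with page speed (slope -3, intercept 100), plus a little noise
moneySpent = 100 - (pageSpeed + np.random.normal(0, 0.1, 1000)) * 3

plt.scatter(pageSpeed, moneySpent)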

Because we have only two variables (page speed and amount spent), we can simply use scipy.stats.linregress:

In [6]:
from scipy import stats

slope, intercept, r_value, p_value, std_err = stats.linregress(pageSpeed, moneySpent)
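
As a sanity check, the fitted parameters can be compared with the values used to generate the data (a slope of -3 and an intercept of 100). The sketch below is self-contained: it regenerates the data with a fixed seed, which is an assumption added here for repeatability and is not part of the cells above, and it cross-checks the result against np.polyfit, which fits the same least-squares line.

```python
import numpy as np
from scipy import stats

# Fixed seed: an assumption for this sketch, so the numbers are repeatable.
rng = np.random.default_rng(0)
pageSpeed = rng.normal(3.0, 1.0, 1000)
moneySpent = 100 - (pageSpeed + rng.normal(0, 0.1, 1000)) * 3

result = stats.linregress(pageSpeed, moneySpent)

# A degree-1 polynomial fit solves the same least-squares problem.
polySlope, polyIntercept = np.polyfit(pageSpeed, moneySpent, 1)

print(f"linregress: slope={result.slope:.3f}, intercept={result.intercept:.3f}")
print(f"polyfit:    slope={polySlope:.3f}, intercept={polyIntercept:.3f}")
print("generated:  slope=-3, intercept=100")
```

Both fits should land close to the generating values, with the small difference coming from the added noise.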

Not surprisingly, our R-squared value shows a really good fit:

In [7]:
r_value ** 2
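
For a simple linear regression, the squared correlation coefficient equals the share of variance explained by the fit, that is 1 - SS_res / SS_tot. The following sketch computes that quantity by hand and compares it with r_value ** 2. It regenerates the data with a fixed seed (an assumption made here so the example runs on its own).

```python
import numpy as np
from scipy import stats

# Fixed seed: an assumption for this sketch, so the example is repeatable.
rng = np.random.default_rng(0)
pageSpeed = rng.normal(3.0, 1.0, 1000)
moneySpent = 100 - (pageSpeed + rng.normal(0, 0.1, 1000)) * 3

slope, intercept, r_value, p_value, std_err = stats.linregress(pageSpeed, moneySpent)

predicted = slope * pageSpeed + intercept
ssRes = np.sum((moneySpent - predicted) ** 2)          # unexplained variation
ssTot = np.sum((moneySpent - moneySpent.mean()) ** 2)  # total variation
rSquared = 1 - ssRes / ssTot

print(rSquared, r_value ** 2)  # the two values should agree
```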

Let's now use the slope and intercept we got from the regression to plot the predicted values against the observed ones:

In [9]:
import matplotlib.pyplot as plt

def predict(x):
    # Evaluate the fitted line at x
    return slope * x + intercept

fitLine = predict(pageSpeed)

# Observed points, with the fitted line drawn in red on top
plt.scatter(pageSpeed, moneySpent)
plt.plot(pageSpeed, fitLine, c='r')
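
The same predict function can also be applied to page speeds that were not in the data, and the residuals (observed minus predicted) give a rough idea of how far individual points sit from the line. The sketch below is a self-contained illustration: the seed and the newSpeeds values are assumptions chosen for the example.

```python
import numpy as np
from scipy import stats

# Fixed seed: an assumption for this sketch, so the example is repeatable.
rng = np.random.default_rng(0)
pageSpeed = rng.normal(3.0, 1.0, 1000)
moneySpent = 100 - (pageSpeed + rng.normal(0, 0.1, 1000)) * 3

slope, intercept, r_value, p_value, std_err = stats.linregress(pageSpeed, moneySpent)

def predict(x):
    # Evaluate the fitted line at x (scalar or array)
    return slope * x + intercept

# Hypothetical page speeds to predict for; not taken from the data above.
newSpeeds = np.array([1.0, 2.0, 5.0])
for speed, amount in zip(newSpeeds, predict(newSpeeds)):
    print(f"page speed {speed:.1f} -> predicted spend {amount:.2f}")

# Residuals: how far each observed point lies from the fitted line.
residuals = moneySpent - predict(pageSpeed)
print("residual std:", residuals.std())
```

Because the noise term in the generated data is multiplied by 3, the residual standard deviation should come out near 0.3.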