In [1]:

```
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
```

Today we'll use the "scikit-learn" (aka `sklearn`

) implementation of these models. "scikit-learn" is a machine learning-focused library that has a ton of helper functions for data scientists to quickly and efficiently try a lot of different models.

Since this notebook is mainly to show you how to implement these models in python, you should watch this Statquest playlist to get the intuition behind these models:

`sklearn`

workflow¶The most basic `sklearn`

model workflow goes something like this:

Step | Action | Example |
---|---|---|

0 | Import the model from `sklearn` |
`from sklearn.linear_model import LinearRegression` |

1 | Instantiate the model class | `model = LinearRegression()` |

2 | Fit the model | `model.fit(X, y)` |

3 | Use the fitted model to make a prediction | `model.predict(X)` |

This standard interface is what makes `sklearn`

so powerful - with `.fit()`

and `.predict()`

and a huge variety of models in the library, you can use this common language to try all sorts of machine learning models.

And that's just the start, from this basic 3-step pipeline you can add all kinds of additional steps to this worfklow that make experimentation very convenient for the user.

`sklearn`

¶Before we cover Lasso, Ridge, and ElasticNet in `sklearn`

it might be helpful to demonstrate some basic functionality with doing a linear regression.

**Reference**: Linear Regression

In [2]:

```
# Import the model
from sklearn.linear_model import LinearRegression
```

Next we'll load in our data. We'll be using `seaborn`

, a visualization library that has some 'toy' datasets to play with and we'll also use it briefly for some specialized plots.

The `mpg`

dataset is a famous dataset that has some car engine attributes like engine displacement (engine size) and others that can be used to estimate the car's fuel efficiency, measured in miles per gallon or `mpg`

.

We can load it in using the `sns.load_dataset()`

function:

In [3]:

```
# load mpg example dataset
mpg = sns.load_dataset("mpg")
# we'll filter on just a few columns and drop na/empty values
mpg = mpg[['mpg', 'displacement', 'horsepower', 'weight']].dropna()
X = mpg[['displacement', 'horsepower', 'weight']]
y = mpg['mpg']
```

`sns.pairplot`

function that will plot scatterplots for each of the variables against each other and also display the distribution for each variable as a histogram:

In [4]:

```
# Convenience method to plot pairs of variables together all in one go!
sns.pairplot(mpg)
```

Out[4]:

<seaborn.axisgrid.PairGrid at 0x7ff5f0b498e0>

Next, we'll follow the `sklearn`

workflow.

Let's model the relationship between average MPG and engine displacement and get a quick $R^2$ value:

In [5]:

```
# Filter on just the 'displacement' variable in the X dataframe
x = X[["displacement"]]
model = LinearRegression()
model.fit(x, y)
# model.score() returns the model R^2 in this case
R2 = model.score(x, y)
print("Model R^2: ", R2)
```

Model R^2: 0.6482294003193044

In [6]:

```
# Show the fitted values
yhat = model.predict(x)
plt.scatter(x, y, label="Engine Displacement")
plt.plot(x, model.predict(x), label="Fitted model", c="r")
plt.title("MPG ~ Displacement")
plt.legend()
plt.show()
```

With Linear Regression and similar models, we can extract the fitted coefficients and the intercept with two commands:

`model.coef_`

will return the coefficients`model.intercept_`

will return the intercept

In [7]:

```
print("model coefficient(s):", model.coef_)
print("model intercept:", model.intercept_)
```

model coefficient(s): [-0.06005143] model intercept: 35.12063593840391

*manually* in `sklearn`

.

But all that aside, in math terms, we've just fit the model:

$$ \hat{Y} = -0.06x + 35.12 $$where $x$ is the engine displacement and $\hat{Y}$ is the estimated average value of mpg. In other words, for every unit increase in engine displacement, average mpg is expected to decrease by -0.06.

In [8]:

```
from sklearn.linear_model import Lasso, Ridge, ElasticNet
# For this exercise we'll split the data into training and test sets
from sklearn.model_selection import train_test_split
# And we'll need to scale the data
from sklearn.preprocessing import StandardScaler
```

`sklearn`

Reference:

With regularization models **you must scale your data**!

Since we're dealing with penalties, we need to standardize the scale of data so the regularization penalty doesn't over-penalize numbers that are orders of magnitude higher than others.

The `StandardScaler()`

class in `sklearn`

helps us with this, and using it is simple:

- Instantiate the scaler class with
`StandardScaler()`

`fit`

the scaler with the`.fit()`

method on the data- Finally,
`transform`

the data with`.transform()`

- Alternatively, you can combine both the
`.fit`

and`.transform`

steps with a single`.fit_transform()`

call

- Alternatively, you can combine both the

And this particular scaler simply takes the X data and standardizes them into Z-scores by subtracting the mean and dividing over the standard deviation, i.e.:

$$ \text{scaled horsepower} = \frac{\text{horsepower} - \text{average horsepower}}{\text{horsepower standard deviation}} $$In [9]:

```
# Here's the data again for clarity
X = mpg[['displacement', 'horsepower', 'weight']]
y = mpg['mpg']
# Instantiate the Scaler class
scaler = StandardScaler()
# Fit and then transform the X-data
scaled_X = scaler.fit_transform(X)
# The transformed data is now an array, we want to convert it back
# to a dataframe for easier parsing (optional)
scaled_X = pd.DataFrame(scaled_X, columns=['displacement', 'horsepower', 'weight'])
```

**note**: This transformation is a little messy because I'm scaling only the X values (i.e. the engine displacement, horsepower, and weight) and leaving the y unscaled to help a little with the interpretation piece. This code is normally much simpler when you are scaling the y values as well.

In [10]:

```
# We'll take our scaled X values and combine it with the unscaled y
df = scaled_X.copy()
df['mpg'] = y
df = df.dropna()
```

`train_test_split()`

¶The next step we will perform is to divide the data into two parts: a training set and a test set.

- Here's a Statquest video on why we do this: Machine Learning Fundamentals: Bias and Variance

In [11]:

```
# split the new dataframe into train/test
mpg_train, mpg_test = train_test_split(df, test_size=0.20, random_state=42)
X_train = mpg_train[['displacement', 'horsepower', 'weight']]
X_test = mpg_test[['displacement', 'horsepower', 'weight']]
y_train = mpg_train['mpg']
y_test = mpg_test['mpg']
```

Statquest video: Regularization Part 2: Lasso (L1) Regression

First model we'll demonstrate is a single call of Lasso and manually set the value of the alpha parameter.

[Lasso](Regularization Part 2: Lasso (L1) Regression) penalizes the less important features by reducing their coefficients (possibly to zero), unlike Ridge Regression which merely shrinks the coefficients for the less important features but cannot set them to zero entirely.

In [12]:

```
model = Lasso(alpha=0.01, random_state=42)
model.fit(X_train, y_train)
R_2 = model.score(X_test, y_test)
# print the score rounded to two digits
print("R^2 Value: ", np.round(R_2,2))
```

R^2 Value: 0.27

In [13]:

```
print("intercept: ", model.intercept_)
pd.DataFrame({"feature": model.feature_names_in_,
"coefficients":model.coef_})
```

intercept: 23.348832405776655

Out[13]:

feature | coefficients | |
---|---|---|

0 | displacement | -2.317209 |

1 | horsepower | -1.852412 |

2 | weight | -0.818235 |

So, in math terms, here's what the above model is saying:

$$ \text{expected mpg} = -2.32(\text{displacement}) - 1.85(\text{horsepower}) - 0.82(\text{weight}) + 23.35 $$So for every unit increase in `displacement`

we expect to see a **decrease** in `mpg`

by 2.32 miles per gallon. That's a lot! But note here that we *scaled* the displacement, horsepower, and weight numbers. If we want to convert these coefficients back to the original scale, we'll need to do a little math. I'll skip it for now but just know that it's possible in this situation.

But this is a post on **feature importance**, so how would you do this with Lasso, Ridge, or ElasticNet?

So we notice that with the `alpha`

parameter at a low value of 0.01, Lasso didn't send any of these variables to zero. If we were to increase the `alpha`

parameter, we would see that some of these coefficients would go to zero.

So one way of interpreting **feature importance** with Lasso or Ridge is to try different values of `alpha`

and see what happens to the coefficients.

The code below shows an example:

In [14]:

```
coefs = []
alphas = np.arange(0.01,10, 0.01)
for a in alphas:
model = Lasso(alpha=a, random_state=42)
model.fit(X_train, y_train)
coefs.append(model.coef_)
```

Then we'll do a plot of the coeffient values vs `alpha`

.

In [15]:

```
ax = plt.gca()
ax.plot(alphas, coefs, label=['displacement', 'horsepower', 'weight'])
ax.set_xscale("log")
ax.set_xlim(ax.get_xlim()[::-1]) # reverse axis
plt.xlabel("log(alpha)")
plt.ylabel("coefficients")
plt.title("Lasso coefficients as a function of regularization strength")
plt.axis("tight")
plt.legend()
plt.show()
```

So you might notice a few funky things about this plot. I reversed the x-axis so that alpha decreases from left to right, so regularization strength is *decreasing* from left to right. And the coefficients are *negative* before going to zero with higher values of alpha. A little confusing but it's a convention.

But notice how we have an order in which the lines go to zero. In the above example, `weight`

is the first value to go to zero as alpha increases, followed by `horsepower`

, and then `displacement`

. This gives you a *relative* idea of the feature importance which might be helpful in your understanding of the problem.

For another example with Ridge Regression, check this out

**Note:** Another term for this kind of plot is a 'path' plot, and `sklearn`

has some helper functions, see example

Now, another twist on the problem is if your goal isn't learning feature importance and your goal is to do prediction and let the models do the shrinkage for you, then you can repeatedly try a lot of different values of `alpha`

and find the value that gives you the least error on the test set. In the below example we'll use root mean squared error (RMSE) as our error metric that we're optimizing for:

In [16]:

```
yhat = model.predict(X_test)
RMSE = (np.mean((yhat - y_test)**2))**0.5
print(RMSE)
```

7.019453139121642

Then we iterate over several values of alpha:

In [17]:

```
errors = []
alphas = np.arange(0.01,10, 0.01)
for a in alphas:
model = Lasso(alpha=a, random_state=42)
model.fit(X_train, y_train)
yhat = model.predict(X_test)
RMSE = (np.mean((yhat - y_test)**2))**0.5
errors.append(RMSE)
```

In [18]:

```
ax = plt.gca()
plt.scatter(alphas, errors)
ax.set_xscale("log")
plt.xlabel("log(alpha)")
plt.ylabel("RMSE")
plt.title("Lasso Test RMSE at various values of alpha")
plt.show()
```

In [19]:

```
best_alpha = alphas[np.argmin(errors)]
print("best_alpha = ", best_alpha)
```

best_alpha = 1.07

We can then plug that best alpha back in to our model:

In [20]:

```
model = Lasso(alpha=best_alpha, random_state=42)
model.fit(X_train, y_train)
R_2 = model.score(X_test, y_test)
# print the score rounded to two digits
print("Test R^2 Value: ", np.round(R_2,2))
```

Test R^2 Value: 0.3

So, we get a minor improvement!

In [21]:

```
pd.DataFrame({"feature": model.feature_names_in_,
"coefficients":model.coef_})
```

Out[21]:

feature | coefficients | |
---|---|---|

0 | displacement | -2.084907 |

1 | horsepower | -1.353816 |

2 | weight | -0.377263 |

And notice here that since we were optimizing for RMSE on the test set, we found a value of `alpha`

that shrunk the coefficients but didn't set any of them to zero. If we had increased `alpha`

more we would eventually see `weight`

fall off.

So since we are dealing with multivariable regression we can't make a 4-D plot to show the 4-D hyperplane that this model fit. But we can kind of get a sense by plotting just a single predictor variable (i.e. 'horsepower') against the value of `mpg`

to get some intuition:

In [22]:

```
y_fitted = model.predict(X_train)
plt.scatter(X_train[["horsepower"]], y_train, label="Training Data")
plt.scatter(X_train[["horsepower"]], y_fitted, label="Fitted Values", c="r")
plt.title("Lasso ")
plt.xlabel("scaled horsepower")
plt.ylabel("mpg")
plt.legend()
plt.show()
```

But, can we do better?

`alpha`

?¶In practice you could iteratively try many many different kinds of alpha until you arrived at the best answer - and you could do this manually (as we just did above) but `sklearn`

has helper functions that already do this for you.

Here we'll use 5-fold cross validation to find the ideal value of alpha from the training set, and then use that fitted model on the test set.

`sklearn`

has `LassoCV`

, `RidgeCV`

, and `ElasticNetCV`

implementations for our convenience. Here we'll repeat the analysis from above as a demo with these three.

In [23]:

```
from sklearn.linear_model import LassoCV, RidgeCV, ElasticNetCV
```

In [24]:

```
model = LassoCV(cv=5, random_state=0).fit(X_train, y_train)
print(model.score(X_test,y_test))
y_fitted = model.predict(X_train)
plt.scatter(X_train[["horsepower"]], y_train, label="X")
plt.scatter(X_train[["horsepower"]], y_fitted, label="Fitted", c="r")
plt.legend()
plt.show()
```

0.26601042348436355

In [25]:

```
model.alpha_
```

Out[25]:

0.004619906312005148

In [26]:

```
pd.DataFrame({"feature": model.feature_names_in_,
"coefficients":model.coef_})
```

Out[26]:

feature | coefficients | |
---|---|---|

0 | displacement | -2.318341 |

1 | horsepower | -1.854957 |

2 | weight | -0.820504 |

In [27]: