An intro to scikit-learn: Fitting One Model

This section shows how sklearn fits ONE model for ONE set of hyperparameters on a generic X and y. The goal is to show the flow we follow whenever we estimate a model. Do NOT copy this code wholesale for assignments: it deliberately omits several best practices while we build up your familiarity with developing an ML model. The steps themselves, however, are present in everything we do.

Five steps to fit a model

Step 1: Import class of model from sklearn

In [1]:
from sklearn.linear_model import Ridge

Step 2: Load data into y and X, and split off test data

In [2]:
# this cell is copied from the L17 lecture file
# EXCEPT: I put the interest rate in its own "y" variable
#         and remove the y variable from the fannie_mae data

import pandas as pd
import numpy as np

url        = 'https://github.com/LeDataSciFi/ledatascifi-2021/blob/main/data/Fannie_Mae_Plus_Data.gzip?raw=true'
fannie_mae = pd.read_csv(url,compression='gzip').dropna()
y          = fannie_mae.Original_Interest_Rate
fannie_mae = (fannie_mae
              .assign(l_credscore = np.log(fannie_mae['Borrower_Credit_Score_at_Origination']),
                      l_LTV       = np.log(fannie_mae['Original_LTV_(OLTV)']),
                     )
              .iloc[:,-11:] # limit to these vars for the sake of this example
             )

In [3]:
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0) # this helps us control the randomness so we can reproduce results exactly
X_train, X_test, y_train, y_test = train_test_split(fannie_mae, y, random_state=rng)
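
To see what the split does, here is a minimal sketch on toy arrays (X_toy and y_toy are made up for illustration; they are not the Fannie Mae data). When test_size is not given, train_test_split holds out 25% of the rows, and passing an integer random_state makes the split repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, assumed for illustration only: 20 rows, 2 features
X_toy = np.arange(40).reshape(20, 2)
y_toy = np.arange(20)

# No test_size given, so 25% of the rows (5 of 20) are held out for testing
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, random_state=0)
print(X_tr.shape, X_te.shape)  # (15, 2) (5, 2)

# The same integer seed reproduces the same split
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X_toy, y_toy, random_state=0)
print(np.array_equal(X_te, X_te2))  # True
```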

Step 3: Choose initial model hyperparameters by instantiating this class with desired values

In [4]:
# create ("instantiate") a model object from the class; here I set the hyperparameter alpha=1
ridge = Ridge(alpha=1.0) 
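
Instantiating only stores the hyperparameters; nothing has been estimated yet. A small sketch (ridge_demo is a hypothetical name used for illustration) that inspects and changes the stored values with get_params() and set_params():

```python
from sklearn.linear_model import Ridge

ridge_demo = Ridge(alpha=1.0)

# Hyperparameters are stored on the object as soon as it is created
alpha_before = ridge_demo.get_params()['alpha']
print(alpha_before)  # 1.0

# No learned coefficients exist until fit() is called
has_coef = hasattr(ridge_demo, 'coef_')
print(has_coef)  # False

# Hyperparameters can be changed later without creating a new object
ridge_demo.set_params(alpha=10.0)
print(ridge_demo.alpha)  # 10.0
```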

Step 4: fit() the model on training data

In [5]:
ridge.fit(X_train,y_train)
Out[5]:
Ridge()
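
fit() returns the fitted estimator itself, which is why the notebook echoes Ridge() above. What the model learned is stored in attributes whose names end in an underscore. A sketch on simulated data (make_regression and the variable names here are assumptions for illustration, not part of the lecture's data):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Simulated data, for illustration only: 100 rows, 3 features
X_sim, y_sim = make_regression(n_samples=100, n_features=3, noise=1.0, random_state=0)

ridge_demo = Ridge(alpha=1.0)
returned = ridge_demo.fit(X_sim, y_sim)

print(returned is ridge_demo)   # True: fit() hands back the same object
print(ridge_demo.coef_.shape)   # (3,): one coefficient per feature
print(ridge_demo.intercept_)    # the estimated intercept
```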

Step 5: Apply the model to new data. Either:

  • <modelname>.predict(X_test) predicts what $y$ should be using X_test, and is used in supervised learning tasks
  • <modelname>.transform(X_test) changes X_test using the model, and is common with preprocessing and unsupervised learning

In [6]:
ridge.predict(X_test)
Out[6]:
array([5.95256433, 4.20060942, 3.9205946 , ..., 4.06401663, 5.30024985,
       7.32600213])
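
The cell above uses predict(). For the transform() case, here is a minimal sketch with StandardScaler, a preprocessing estimator chosen here as an assumption (the toy numbers are made up). It follows the same pattern: fit on the training data, then apply to new data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data, for illustration only
X_train_toy = np.array([[1.0], [2.0], [3.0]])
X_test_toy  = np.array([[2.0], [4.0]])

scaler = StandardScaler()
scaler.fit(X_train_toy)                        # learns the mean and std from the training data only
X_test_scaled = scaler.transform(X_test_toy)   # rescales new data using the training mean and std

print(scaler.mean_)          # [2.]
print(X_test_scaled[0, 0])   # 0.0, because 2.0 equals the training mean
```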

The text here is adapted from PDSH (the Python Data Science Handbook).