Prerequisite

Run the notebooks week-3.0-data-prep-for-training and week-3.0-evaluate-and-automate-pipelines.ipynb before starting this one.
This exercise is part of the Scaling Machine Learning with Spark book, available on the O'Reilly platform and on Amazon.

In this exercise, you will use:
import mlflow
import mlflow.spark
from pyspark.sql.types import ArrayType, StringType
from pyspark.sql.functions import col, struct
from pyspark.ml.regression import LinearRegression, LinearRegressionModel
from pyspark.sql import SparkSession
spark = SparkSession.builder \
.master('local[*]') \
.appName("deployment") \
.getOrCreate()
Now that we have a model that gives us good results, it's time to move it to the next phase: deployment.
model_path = "../models/linearRegression_model"
# Load the tuned model persisted by the previous notebook
restored_mllib_model = LinearRegressionModel.load(model_path)
# Re-save it under the name the production app will look for
restored_mllib_model.save("../models/best_model")
Imagine best_model is deployed to production. That means a new application is going to load the model and leverage it with Spark. This application receives a production DataFrame.

Write the functionality to load the model and use it to predict on the production DataFrame in a batch setting.
# your code goes here
# ...
How is it different from what you have done so far?

Share your response in the chat!