Prerequisite

Run the notebooks week-3.0-data-prep-for-training and week-3.0-evaluate-and-automate-pipelines.ipynb before starting this one.
This exercise is part of the Scaling Machine Learning with Spark book, available on the O'Reilly platform and on Amazon.

In this exercise, you will use:
import mlflow
import mlflow.spark
from pyspark.sql.types import ArrayType, StringType
from pyspark.sql.functions import col, struct
from pyspark.ml.regression import LinearRegression, LinearRegressionModel
from pyspark.sql import SparkSession
spark = SparkSession.builder \
.master('local[*]') \
.appName("deployment") \
.getOrCreate()
Now that we have a model that gives us good results, it's time to move it to the next phase: deployment.
model_path = "../models/linearRegression_model"
# Load the tuned model persisted by the previous notebook
restored_mllib_model = LinearRegressionModel.load(model_path)
# Re-save it under the name the production app will look for
restored_mllib_model.save("../models/best_model")
Imagine best_model is deployed to production. That means a new application is going to load the model and leverage it with Spark. This application receives a production DataFrame.

Write the functionality to load the model and use it to predict on the production DataFrame in a batch setting.
# your code goes here
# ...
How is it different from what you have done so far?

Share your response in the chat!