Copyright (c) Microsoft Corporation. All rights reserved.
Licensed under the MIT License.
In many real-life scenarios, trained machine learning models need to be deployed to production. As we saw in the [prior](21_deployment_on_azure_container_instances.ipynb) deployment notebook, this can be done by deploying on Azure Container Instances. In this tutorial, we will get familiar with another way of deploying a model to a production environment, this time using [Azure Kubernetes Service](https://docs.microsoft.com/en-us/azure/aks/concepts-clusters-workloads) (AKS).
AKS manages hosted Kubernetes environments. It makes it easy to deploy and manage containerized applications without container orchestration expertise. It also supports deployments with CPU clusters and deployments with GPU clusters.
At the end of this tutorial, we will have learned how to provision an AKS cluster, deploy our image classifier model on it as a web service, and enable monitoring of that service with Application Insights.
This notebook relies on resources we created in [21_deployment_on_azure_container_instances.ipynb](21_deployment_on_azure_container_instances.ipynb), in particular our Azure ML workspace and the Docker image containing our registered image classifier model.
If we are missing any of these, we should go back and run the steps from the sections "Pre-requisites" to "3.D Environment setup" to generate them.
Now that our prior resources are available, let's first import a few libraries we will need for the deployment on AKS.
# For automatic reloading of modified libraries
%reload_ext autoreload
%autoreload 2
import sys
sys.path.extend(["..", "../.."]) # to access the utils_cv library
# Azure
from azureml.core import Workspace
from azureml.core.compute import AksCompute, ComputeTarget
from azureml.core.webservice import AksWebservice, Webservice
Let's now load the workspace we used in the [prior notebook](21_deployment_on_azure_container_instances.ipynb).
Note: The Docker image we will use below is attached to that workspace. It is therefore important to use the same workspace here. If, for any reason, we needed to use another workspace instead, we would need to reproduce, here, the steps followed to create a Docker image containing our image classifier model in the prior notebook.
To create or access an Azure ML Workspace, you will need the following information. If you are coming from the previous notebook, you can retrieve the existing workspace; otherwise, create a new one if you are just starting with this notebook.
subscription_id = "YOUR_SUBSCRIPTION_ID"
resource_group = "YOUR_RESOURCE_GROUP_NAME"
workspace_name = "YOUR_WORKSPACE_NAME"
workspace_region = "YOUR_WORKSPACE_REGION"  # Possible values: eastus, eastus2, and so on
In the prior notebook, we created a workspace. This is a critical object from which we will build all the pieces we need to deploy our model as a web service. Let's start by retrieving it.
# A util method that creates a workspace or retrieves one if it exists, also takes care of Azure Authentication
from utils_cv.common.azureml import get_or_create_workspace
ws = get_or_create_workspace(
    subscription_id,
    resource_group,
    workspace_name,
    workspace_region
)
# Print the workspace attributes
print('Workspace name: ' + ws.name,
'Workspace region: ' + ws.location,
'Subscription id: ' + ws.subscription_id,
'Resource group: ' + ws.resource_group, sep = '\n')
WARNING - Warning: Falling back to use azure cli login credentials. If you run your code in unattended mode, i.e., where you can't give a user input, then we recommend to use ServicePrincipalAuthentication or MsiAuthentication. Please refer to aka.ms/aml-notebook-auth for different authentication mechanisms in azureml-sdk.
Workspace name: amlnotebookws Workspace region: eastus Resource group: amlnotebookrg
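The warning above shows that we fell back to the Azure CLI login, which requires an interactive session. If we ever needed to run this notebook unattended, we could authenticate with a service principal instead, as the warning suggests. The sketch below is only an illustration: the environment variable names are placeholders, not part of this repository.
# Hypothetical example: unattended authentication with a service principal
# (the environment variable names below are placeholders)
# import os
# from azureml.core.authentication import ServicePrincipalAuthentication
#
# sp_auth = ServicePrincipalAuthentication(
#     tenant_id=os.environ["TENANT_ID"],
#     service_principal_id=os.environ["SP_APP_ID"],
#     service_principal_password=os.environ["SP_PASSWORD"],
# )
# ws = Workspace(subscription_id, resource_group, workspace_name, auth=sp_auth)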
print("Docker images:")
for docker_im in ws.images:
    print(f" --> Name: {ws.images[docker_im].name}\n"
          f" --> ID: {ws.images[docker_im].id}\n"
          f" --> Tags: {ws.images[docker_im].tags}\n"
          f" --> Creation time: {ws.images[docker_im].created_time}\n")
Docker images: --> Name: image-classif-resnet18-f48 --> ID: image-classif-resnet18-f48:2 --> Tags: {'training set': 'ImageNet', 'architecture': 'CNN ResNet18', 'type': 'Pretrained'} --> Creation time: 2019-07-18 17:51:26.927240+00:00
As we did not delete it in the prior notebook, our Docker image is still present in our workspace. Let's retrieve it.
docker_image = ws.images["image-classif-resnet18-f48"]
We can also check that the model it contains is the one we registered and used during our deployment on ACI. In our case, the Docker image contains only 1 model, so taking the 0th element of the `docker_image.models` list returns our model.

Note: We will not use the `registered_model` object anywhere here. We are running the next 2 cells just for verification purposes.
registered_model = docker_image.models[0]
print(f"Existing model:\n --> Name: {registered_model.name}\n \
--> Version: {registered_model.version}\n --> ID: {registered_model.id} \n \
--> Creation time: {registered_model.created_time}\n \
--> URL: {registered_model.url}"
)
Existing model: --> Name: im_classif_resnet18 --> Version: 8 --> ID: im_classif_resnet18:8 --> Creation time: 2019-07-18 17:51:17.521804+00:00 --> URL: aml://asset/5c63dec5ea424557838d109d3294b611
In the case of deployment on AKS, in addition to the Docker image, we need to define computational resources. This is typically a cluster of CPUs or a cluster of GPUs. If we already have a Kubernetes-managed cluster in our workspace, we can use it, otherwise, we can create a new one.
Note: The name we give to our compute target must be between 2 and 16 characters long.
Let's first check what types of compute resources we have, if any.
print("List of compute resources associated with our workspace:")
for cp in ws.compute_targets:
    print(f" --> {cp}: {ws.compute_targets[cp]}")
List of compute resources associated with our workspace:
In the case where we have no compute resource available, we can create a new one. For this, we can choose between a CPU-based or a GPU-based cluster of virtual machines. The latter is typically better suited for web services with high traffic (i.e. > 100 requests per second) and high GPU utilization. There is a wide variety of machine types that can be used. In the present example, however, we will not need the fastest machines that exist nor the most memory optimized ones. We will use typical default machines:
Notes:
- The number of virtual machines (also called `agent nodes`) we require, multiplied by the number of vCPUs on each machine, must be greater than or equal to 12 vCPUs. This is indeed the minimum needed for such a cluster. By default, a pool of 3 virtual machines gets provisioned on a new AKS cluster to allow for redundancy. So, if the type of virtual machine we choose has a number of vCPUs (`vm_size`) smaller than 4, we need to increase the number of machines (`agent_count`) such that `agent_count` x `vm_size` ≥ 12 virtual CPUs. `agent_count` and `vm_size` are both parameters we can pass to the `provisioning_configuration()` method below (see the small check right after these notes).
- The quotas of virtual machines we have access to in our workspace's region can be found on the Azure portal, in the Usage + quotas section. If we need more machines than are currently available, we can request a quota increase.

Here, we will use a cluster of CPUs. The creation of such a resource typically takes several minutes to complete.
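To make the vCPU arithmetic from the notes above explicit, here is a small optional check we could run before provisioning. The vCPU counts are assumptions based on the standard sizes of these machine types (4 vCPUs for a Standard_D3_v2, 6 for a Standard_NC6).
# Optional sanity check: agent_count x vCPUs per machine must be >= 12
# (the vCPU counts below are assumptions based on the published VM sizes)
vcpus_per_vm = {"Standard_D3_v2": 4, "Standard_NC6": 6}
agent_count = 3
assert agent_count * vcpus_per_vm["Standard_D3_v2"] >= 12, \
    "Increase agent_count so that agent_count x vCPUs >= 12"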
# Declare the name of the cluster
virtual_machine_type = 'cpu'
aks_name = f'imgclass-aks-{virtual_machine_type}'

if aks_name not in ws.compute_targets:
    # Define the type of virtual machines to use
    if virtual_machine_type == 'gpu':
        vm_size_name = "Standard_NC6"
    else:
        vm_size_name = "Standard_D3_v2"

    # Configure the cluster using the default configuration (i.e. with 3 virtual machines)
    prov_config = AksCompute.provisioning_configuration(vm_size=vm_size_name, agent_count=3)

    # Create the cluster
    aks_target = ComputeTarget.create(
        workspace=ws,
        name=aks_name,
        provisioning_configuration=prov_config
    )
    aks_target.wait_for_completion(show_output=True)
    print(f"We created the {aks_target.name} AKS compute target")
else:
    # Retrieve the already existing cluster
    aks_target = ws.compute_targets[aks_name]
    print(f"We retrieved the {aks_target.name} AKS compute target")
Creating.................................................................................................................................................................. SucceededProvisioning operation finished, operation "Succeeded" We created the imgclass-aks-cpu AKS compute target
If we need a more customized AKS cluster, we can provide more parameters to the `provisioning_configuration()` method; the full list is available in the AksCompute.provisioning_configuration() documentation.
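As an illustration only, a more customized configuration might look like the sketch below. The parameter values are placeholders, and the availability of some parameters (e.g. `cluster_purpose`) depends on the azureml-sdk version installed.
# Hypothetical example of a more customized AKS provisioning configuration
# (values are placeholders; adjust them to your own environment)
# custom_prov_config = AksCompute.provisioning_configuration(
#     vm_size="Standard_D3_v2",
#     agent_count=3,
#     location="eastus",
#     cluster_purpose=AksCompute.ClusterPurpose.DEV_TEST,
# )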
When the cluster deploys successfully, we typically see the following:
Creating ...
SucceededProvisioning operation finished, operation "Succeeded"
In the case when our cluster already exists, we get the following message:
We retrieved the <aks_cluster_name> AKS compute target
This compute target can be seen on the Azure portal, under the `Compute` tab.
# Check provisioning status
print(f"The AKS compute target provisioning {aks_target.provisioning_state.lower()} -- There were '{aks_target.provisioning_errors}' errors")
The AKS compute target provisioning succeeded -- There were 'None' errors
The set of resources we will use to deploy our web service on AKS is now provisioned and available.
Once our web app is up and running, it is very important to monitor it, and measure the amount of traffic it gets, how long it takes to respond, the type of exceptions that get raised, etc. We will do so through [Application Insights](https://docs.microsoft.com/en-us/azure/azure-monitor/app/app-insights-overview), which is an application performance management service. To enable it on our soon-to-be-deployed web service, we first need to update our AKS configuration file:
# Set the AKS web service configuration and add monitoring to it
aks_config = AksWebservice.deploy_configuration(enable_app_insights=True)
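`deploy_configuration()` accepts many more options, for example around autoscaling, per-replica CPU and memory, and authentication. The sketch below is only an illustration of what a more detailed configuration could look like; the values are placeholders rather than tuned recommendations, and we keep using the simpler `aks_config` above for the actual deployment.
# Hypothetical example of a more detailed AKS web service configuration
# (values are placeholders, not tuned recommendations; not used below)
# detailed_aks_config = AksWebservice.deploy_configuration(
#     enable_app_insights=True,
#     autoscale_enabled=True,
#     autoscale_min_replicas=1,
#     autoscale_max_replicas=3,
#     cpu_cores=1,
#     memory_gb=2,
#     auth_enabled=True,
# )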
We are now ready to deploy our web service. As in the [first](21_deployment_on_azure_container_instances.ipynb) notebook, we will deploy from the Docker image. It indeed contains our image classifier model and the conda environment needed for the scoring script to work properly. The parameters to pass to the `Webservice.deploy_from_image()` command are similar to those used for the deployment on ACI. The only major difference is the compute target (`aks_target`), i.e. the CPU cluster we just spun up.
Note: This deployment takes a few minutes to complete.
if aks_target.provisioning_state == "Succeeded":
    aks_service_name = 'aks-cpu-image-classif-web-svc'
    aks_service = Webservice.deploy_from_image(
        workspace=ws,
        name=aks_service_name,
        image=docker_image,
        deployment_config=aks_config,
        deployment_target=aks_target
    )
    aks_service.wait_for_deployment(show_output=True)
    print(f"The web service is {aks_service.state}")
else:
    raise ValueError("The web service cannot be deployed because the AKS cluster provisioning failed.")
Creating service Running................................ SucceededAKS service creation operation finished, operation "Succeeded" The web service is Healthy
When successful, we should see the following:
Creating service
Running ...
SucceededAKS service creation operation finished, operation "Succeeded"
The web service is Healthy
If the deployment is not successful, we can look at the service logs to debug. The Azure Machine Learning troubleshooting documentation can also be helpful.
# Access to the service logs
# print(aks_service.get_logs())
The new deployment can be seen on the portal, under the Deployments tab.
Our web service is up, and is running on AKS.
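Before moving on, we can optionally send a request to the service to confirm it responds. The sketch below is only an illustration: it assumes the scoring script expects the same JSON payload we built in the prior notebook, and `<your_json_payload>` is a placeholder.
# Optional sanity check: call the AKS web service directly
# (the payload below is a placeholder; see the next notebook for a full test)
# import requests
#
# primary_key, _ = aks_service.get_keys()  # AKS web services are authenticated by default
# headers = {"Content-Type": "application/json", "Authorization": f"Bearer {primary_key}"}
# response = requests.post(aks_service.scoring_uri, headers=headers, data="<your_json_payload>")
# print(response.status_code, response.text)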
In a real-life scenario, it is likely that the service we created would need to be up and running at all times. However, in the present demonstrative case, and once we have verified that our service works (cf. "Next steps" section below), we can delete it as well as all the resources we used.
In this notebook, the only resource we added to our subscription, in comparison to what we had at the end of the notebook on ACI deployment, is the AKS cluster. There is no fee for AKS cluster management itself. The only components we are paying for are the virtual machines in the cluster and their associated storage and networking resources.
Here, we used Standard D3 V2 machines, which come with a temporary storage of 200 GB. Over the course of this tutorial (assuming ~1 hour), this added almost nothing to our bill. That said, it is important to understand that each hour during which the cluster is up gets billed, whether the web service is called or not. The same is true for the ACI and workspace we have been using until now.
To get a better sense of pricing, we can refer to [this calculator](https://azure.microsoft.com/en-us/pricing/calculator/?service=kubernetes-service#kubernetes-service). We can also navigate to the [Cost Management + Billing pane](https://ms.portal.azure.com/#blade/Microsoft_Azure_Billing/ModernBillingMenuBlade/Overview) on the portal, click on our subscription ID, and click on the Cost Analysis tab to check our credit usage.
If we plan on no longer using this web service, we can turn monitoring off, and delete the compute target, the service itself as well as the associated Docker image.
# Application Insights deactivation
# aks_service.update(enable_app_insights=False)
# Service termination
# aks_service.delete()
# Compute target deletion
# aks_target.delete()
# This command executes fast but the actual deletion of the AKS cluster takes several minutes
# Docker image deletion
# docker_image.delete()
At this point, all the service resources we used in this notebook have been deleted. We are now only paying for our workspace.
If our goal is to continue using our workspace, we should keep it available. If, on the contrary, we no longer plan on using it and its associated resources, we can delete it.
Note: Deleting the workspace will delete all the experiments, outputs, models, Docker images, deployments, etc. that we created in that workspace.
# ws.delete(delete_dependent_resources=True)
# This deletes our workspace, the container registry, the account storage, Application Insights and the key vault
In the [next notebook](23_aci_aks_web_service_testing.ipynb), we will test the web services we deployed on ACI and on AKS.