Copyright (c) Microsoft Corporation. All rights reserved.
Licensed under the MIT License.
In many real-life scenarios, trained machine learning models need to be deployed to production. As we saw in the [prior](21_deployment_on_azure_container_instances.ipynb) deployment notebook, this can be done by deploying on Azure Container Instances. In this tutorial, we will get familiar with another way of deploying a model to a production environment, this time using [Azure Kubernetes Service](https://docs.microsoft.com/en-us/azure/aks/concepts-clusters-workloads) (AKS).
AKS manages hosted Kubernetes environments. It makes it easy to deploy and manage containerized applications without container orchestration expertise. It also supports deployments with CPU clusters and deployments with GPU clusters.
At the end of this tutorial, we will have learned how to provision an AKS cluster, deploy our image classifier model on it as a web service, and enable monitoring of that service with Application Insights.
This notebook relies on resources we created in [21_deployment_on_azure_container_instances.ipynb](21_deployment_on_azure_container_instances.ipynb), in particular our Azure ML workspace and the Docker image containing our registered image classifier model.
If we are missing any of these, we should go back and run the steps from the sections "Pre-requisites" to "3.D Environment setup" to generate them.
Now that our prior resources are available, let's first import a few libraries we will need for the deployment on AKS.
# For automatic reloading of modified libraries
%reload_ext autoreload
%autoreload 2
import sys
sys.path.extend(["..", "../.."]) # to access the utils_cv library
# Azure
from azureml.core import Workspace
from azureml.core.compute import AksCompute, ComputeTarget
from azureml.core.webservice import AksWebservice, Webservice
Let's now load the workspace we used in the [prior notebook](21_deployment_on_azure_container_instances.ipynb).
Note: The Docker image we will use below is attached to that workspace. It is therefore important to use the same workspace here. If, for any reason, we needed to use another workspace instead, we would need to reproduce, here, the steps followed to create a Docker image containing our image classifier model in the prior notebook.
To create or access an Azure ML Workspace, you will need the following information. If you are coming from the previous notebook, you can retrieve the existing workspace; otherwise, create a new one if you are just starting with this notebook.
subscription_id = "YOUR_SUBSCRIPTION_ID"
resource_group = "YOUR_RESOURCE_GROUP_NAME"
workspace_name = "YOUR_WORKSPACE_NAME"
workspace_region = "YOUR_WORKSPACE_REGION"  # Possible values: eastus, eastus2, and so on
In the prior notebook, we created a workspace. This is a critical object from which we will build all the pieces we need to deploy our model as a web service. Let's start by retrieving it.
# A util method that creates a workspace or retrieves one if it exists, also takes care of Azure Authentication
from utils_cv.common.azureml import get_or_create_workspace
ws = get_or_create_workspace(
    subscription_id,
    resource_group,
    workspace_name,
    workspace_region
)
# Print the workspace attributes
print('Workspace name: ' + ws.name,
'Workspace region: ' + ws.location,
'Subscription id: ' + ws.subscription_id,
'Resource group: ' + ws.resource_group, sep = '\n')
WARNING - Warning: Falling back to use azure cli login credentials. If you run your code in unattended mode, i.e., where you can't give a user input, then we recommend to use ServicePrincipalAuthentication or MsiAuthentication. Please refer to aka.ms/aml-notebook-auth for different authentication mechanisms in azureml-sdk.
Workspace name: amlnotebookws Workspace region: eastus Resource group: amlnotebookrg
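The warning above shows that we fell back to the Azure CLI login, which requires an interactive session. If we ever needed to run this notebook unattended, we could authenticate with a service principal instead, as the warning suggests. The sketch below is only an illustration: the environment variable names are placeholders, not part of this repository.
# Hypothetical example: unattended authentication with a service principal
# (the environment variable names below are placeholders)
# import os
# from azureml.core.authentication import ServicePrincipalAuthentication
#
# sp_auth = ServicePrincipalAuthentication(
#     tenant_id=os.environ["TENANT_ID"],
#     service_principal_id=os.environ["SP_APP_ID"],
#     service_principal_password=os.environ["SP_PASSWORD"],
# )
# ws = Workspace(subscription_id, resource_group, workspace_name, auth=sp_auth)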
print("Docker images:")
for docker_im in ws.images:
    print(f" --> Name: {ws.images[docker_im].name}\n"
          f" --> ID: {ws.images[docker_im].id}\n"
          f" --> Tags: {ws.images[docker_im].tags}\n"
          f" --> Creation time: {ws.images[docker_im].created_time}\n")
Docker images: --> Name: image-classif-resnet18-f48 --> ID: image-classif-resnet18-f48:2 --> Tags: {'training set': 'ImageNet', 'architecture': 'CNN ResNet18', 'type': 'Pretrained'} --> Creation time: 2019-07-18 17:51:26.927240+00:00
As we did not delete it in the prior notebook, our Docker image is still present in our workspace. Let's retrieve it.
docker_image = ws.images["image-classif-resnet18-f48"]
We can also check that the model it contains is the one we registered and used during our deployment on ACI. In our case, the Docker image contains only 1 model, so taking the 0th element of the `docker_image.models` list returns our model.

Note: We will not use the `registered_model` object anywhere here. We are running the next 2 cells just for verification purposes.
registered_model = docker_image.models[0]
print(f"Existing model:\n --> Name: {registered_model.name}\n \
--> Version: {registered_model.version}\n --> ID: {registered_model.id} \n \
--> Creation time: {registered_model.created_time}\n \
--> URL: {registered_model.url}"
)
Existing model: --> Name: im_classif_resnet18 --> Version: 8 --> ID: im_classif_resnet18:8 --> Creation time: 2019-07-18 17:51:17.521804+00:00 --> URL: aml://asset/5c63dec5ea424557838d109d3294b611
In the case of deployment on AKS, in addition to the Docker image, we need to define computational resources. This is typically a cluster of CPUs or a cluster of GPUs. If we already have a Kubernetes-managed cluster in our workspace, we can use it, otherwise, we can create a new one.
Note: The name we give to our compute target must be between 2 and 16 characters long.
Let's first check what types of compute resources we have, if any.
print("List of compute resources associated with our workspace:")
for cp in ws.compute_targets:
    print(f" --> {cp}: {ws.compute_targets[cp]}")
List of compute resources associated with our workspace:
In the case where we have no compute resource available, we can create a new one. For this, we can choose between a CPU-based or a GPU-based cluster of virtual machines. The latter is typically better suited for web services with high traffic (i.e. > 100 requests per second) and high GPU utilization. There is a wide variety of machine types that can be used. In the present example, however, we will not need the fastest machines that exist nor the most memory optimized ones. We will use typical default machines:
Notes:
- The number of virtual machines (also called `agent nodes`) we require, multiplied by the number of vCPUs on each machine, must be greater than or equal to 12 vCPUs. This is indeed the minimum needed for such a cluster. By default, a pool of 3 virtual machines gets provisioned on a new AKS cluster to allow for redundancy. So, if the type of virtual machine we choose has a number of vCPUs (`vm_size`) smaller than 4, we need to increase the number of machines (`agent_count`) such that `agent_count` x `vm_size` ≥ 12 virtual CPUs. `agent_count` and `vm_size` are both parameters we can pass to the `provisioning_configuration()` method below (see the small check right after these notes).
- The quotas of virtual machines we have access to in our workspace's region can be found on the Azure portal, in the Usage + quotas section. If we need more machines than are currently available, we can request a quota increase.

Here, we will use a cluster of CPUs. The creation of such a resource typically takes several minutes to complete.
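To make the vCPU arithmetic from the notes above explicit, here is a small optional check we could run before provisioning. The vCPU counts are assumptions based on the standard sizes of these machine types (4 vCPUs for a Standard_D3_v2, 6 for a Standard_NC6).
# Optional sanity check: agent_count x vCPUs per machine must be >= 12
# (the vCPU counts below are assumptions based on the published VM sizes)
vcpus_per_vm = {"Standard_D3_v2": 4, "Standard_NC6": 6}
agent_count = 3
assert agent_count * vcpus_per_vm["Standard_D3_v2"] >= 12, \
    "Increase agent_count so that agent_count x vCPUs >= 12"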
# Declare the name of the cluster
virtual_machine_type = 'cpu'
aks_name = f'imgclass-aks-{virtual_machine_type}'

if aks_name not in ws.compute_targets:
    # Define the type of virtual machines to use
    if virtual_machine_type == 'gpu':
        vm_size_name = "Standard_NC6"
    else:
        vm_size_name = "Standard_D3_v2"

    # Configure the cluster using the default configuration (i.e. with 3 virtual machines)
    prov_config = AksCompute.provisioning_configuration(vm_size=vm_size_name, agent_count=3)

    # Create the cluster
    aks_target = ComputeTarget.create(
        workspace=ws,
        name=aks_name,
        provisioning_configuration=prov_config
    )
    aks_target.wait_for_completion(show_output=True)
    print(f"We created the {aks_target.name} AKS compute target")
else:
    # Retrieve the already existing cluster
    aks_target = ws.compute_targets[aks_name]
    print(f"We retrieved the {aks_target.name} AKS compute target")
Creating.................................................................................................................................................................. SucceededProvisioning operation finished, operation "Succeeded" We created the imgclass-aks-cpu AKS compute target
If we need a more customized AKS cluster, we can provide more parameters to the `provisioning_configuration()` method; the full list is available in the AksCompute.provisioning_configuration() documentation.
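As an illustration only, a more customized configuration might look like the sketch below. The parameter values are placeholders, and the availability of some parameters (e.g. `cluster_purpose`) depends on the azureml-sdk version installed.
# Hypothetical example of a more customized AKS provisioning configuration
# (values are placeholders; adjust them to your own environment)
# custom_prov_config = AksCompute.provisioning_configuration(
#     vm_size="Standard_D3_v2",
#     agent_count=3,
#     location="eastus",
#     cluster_purpose=AksCompute.ClusterPurpose.DEV_TEST,
# )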
When the cluster deploys successfully, we typically see the following:
Creating ...
SucceededProvisioning operation finished, operation "Succeeded"
In the case when our cluster already exists, we get the following message:
We retrieved the <aks_cluster_name> AKS compute target
This compute target can be seen on the Azure portal, under the `Compute` tab.
# Check provisioning status
print(f"The AKS compute target provisioning {aks_target.provisioning_state.lower()} -- There were '{aks_target.provisioning_errors}' errors")
The AKS compute target provisioning succeeded -- There were 'None' errors
The set of resources we will use to deploy our web service on AKS is now provisioned and available.
Once our web app is up and running, it is very important to monitor it, and measure the amount of traffic it gets, how long it takes to respond, the type of exceptions that get raised, etc. We will do so through [Application Insights](https://docs.microsoft.com/en-us/azure/azure-monitor/app/app-insights-overview), which is an application performance management service. To enable it on our soon-to-be-deployed web service, we first need to update our AKS configuration file:
# Set the AKS web service configuration and add monitoring to it
aks_config = AksWebservice.deploy_configuration(enable_app_insights=True)
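`deploy_configuration()` accepts many more options, for example around autoscaling, per-replica CPU and memory, and authentication. The sketch below is only an illustration of what a more detailed configuration could look like; the values are placeholders rather than tuned recommendations, and we keep using the simpler `aks_config` above for the actual deployment.
# Hypothetical example of a more detailed AKS web service configuration
# (values are placeholders, not tuned recommendations; not used below)
# detailed_aks_config = AksWebservice.deploy_configuration(
#     enable_app_insights=True,
#     autoscale_enabled=True,
#     autoscale_min_replicas=1,
#     autoscale_max_replicas=3,
#     cpu_cores=1,
#     memory_gb=2,
#     auth_enabled=True,
# )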
We are now ready to deploy our web service. As in the [first](21_deployment_on_azure_container_instances.ipynb) notebook, we will deploy from the Docker image. It indeed contains our image classifier model and the conda environment needed for the scoring script to work properly. The parameters to pass to the `Webservice.deploy_from_image()` command are similar to those used for the deployment on ACI. The only major difference is the compute target (`aks_target`), i.e. the CPU cluster we just spun up.
Note: This deployment takes a few minutes to complete.
if aks_target.provisioning_state == "Succeeded":
    aks_service_name = 'aks-cpu-image-classif-web-svc'
    aks_service = Webservice.deploy_from_image(
        workspace=ws,
        name=aks_service_name,
        image=docker_image,
        deployment_config=aks_config,
        deployment_target=aks_target
    )
    aks_service.wait_for_deployment(show_output=True)
    print(f"The web service is {aks_service.state}")
else:
    raise ValueError("The web service cannot be deployed because the AKS cluster provisioning failed.")
Creating service Running................................ SucceededAKS service creation operation finished, operation "Succeeded" The web service is Healthy
When successful, we should see the following:
Creating service
Running ...
SucceededAKS service creation operation finished, operation "Succeeded"
The web service is Healthy
If the deployment is not successful, we can look at the service logs to debug. The Azure Machine Learning troubleshooting documentation can also be helpful.
# Access to the service logs
# print(aks_service.get_logs())
The new deployment can be seen on the portal, under the Deployments tab.
Our web service is up, and is running on AKS.
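Before moving on, we can optionally send a request to the service to confirm it responds. The sketch below is only an illustration: it assumes the scoring script expects the same JSON payload we built in the prior notebook, and `<your_json_payload>` is a placeholder.
# Optional sanity check: call the AKS web service directly
# (the payload below is a placeholder; see the next notebook for a full test)
# import requests
#
# primary_key, _ = aks_service.get_keys()  # AKS web services are authenticated by default
# headers = {"Content-Type": "application/json", "Authorization": f"Bearer {primary_key}"}
# response = requests.post(aks_service.scoring_uri, headers=headers, data="<your_json_payload>")
# print(response.status_code, response.text)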
In a real-life scenario, it is likely that the service we created would need to be up and running at all times. However, in the present demonstrative case, and once we have verified that our service works (cf. "Next steps" section below), we can delete it as well as all the resources we used.
In this notebook, the only resource we added to our subscription, in comparison to what we had at the end of the notebook on ACI deployment, is the AKS cluster. There is no fee for AKS cluster management itself. The only components we are paying for are the virtual machines in the cluster and their associated storage and networking resources.
Here, we used Standard D3 V2 machines, which come with a temporary storage of 200 GB. Over the course of this tutorial (assuming ~1 hour), this added almost nothing to our bill. That said, it is important to understand that each hour during which the cluster is up gets billed, whether the web service is called or not. The same is true for the ACI and workspace we have been using until now.
To get a better sense of pricing, we can refer to [this calculator](https://azure.microsoft.com/en-us/pricing/calculator/?service=kubernetes-service#kubernetes-service). We can also navigate to the [Cost Management + Billing pane](https://ms.portal.azure.com/#blade/Microsoft_Azure_Billing/ModernBillingMenuBlade/Overview) on the portal, click on our subscription ID, and click on the Cost Analysis tab to check our credit usage.
If we plan on no longer using this web service, we can turn monitoring off, and delete the compute target, the service itself as well as the associated Docker image.
# Application Insights deactivation
# aks_service.update(enable_app_insights=False)
# Service termination
# aks_service.delete()
# Compute target deletion
# aks_target.delete()
# This command executes fast but the actual deletion of the AKS cluster takes several minutes
# Docker image deletion
# docker_image.delete()
At this point, all the service resources we used in this notebook have been deleted. We are now only paying for our workspace.
If our goal is to continue using our workspace, we should keep it available. If, on the contrary, we no longer plan on using it and its associated resources, we can delete it.
Note: Deleting the workspace will delete all the experiments, outputs, models, Docker images, deployments, etc. that we created in that workspace.
# ws.delete(delete_dependent_resources=True)
# This deletes our workspace, the container registry, the account storage, Application Insights and the key vault
In the [next notebook](23_aci_aks_web_service_testing.ipynb), we will test the web services we deployed on ACI and on AKS.