Welcome to our end-to-end distributed question-answering example. In this demo, we will use the Hugging Face transformers and datasets libraries together with the Hugging Face extension for the Amazon SageMaker Python SDK to fine-tune a pre-trained transformer for question answering on multiple GPUs. In particular, the pre-trained model will be fine-tuned on the squad dataset. The demo uses the smdistributed library to run training on multiple GPUs, and as the training script we use one of the transformers example scripts from the repository.
To get started, we need to set up the environment with a few prerequisite steps, for permissions, configurations, and so on.
NOTE: You can run this demo in SageMaker Studio, on your local machine, or in a SageMaker Notebook Instance.
Note: We only install the required libraries from Hugging Face and AWS. You also need PyTorch or TensorFlow if you don't already have one of them installed.
!pip install "sagemaker>=2.48.0" --upgrade
import sagemaker.huggingface
If you are going to use SageMaker in a local environment, you need access to an IAM role with the required permissions for SageMaker. You can find out more about it here.
import sagemaker
import boto3
sess = sagemaker.Session()
# sagemaker session bucket -> used for uploading data, models and logs
# sagemaker will automatically create this bucket if it doesn't exist
sagemaker_session_bucket = None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if a bucket name is not given
    sagemaker_session_bucket = sess.default_bucket()

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)
print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")
In order to create a SageMaker training job, we need a HuggingFace Estimator. The Estimator handles end-to-end Amazon SageMaker training and deployment tasks. In an Estimator we define which fine-tuning script should be used as entry_point, which instance_type should be used, which hyperparameters are passed in, and so on.
from sagemaker.huggingface import HuggingFace

huggingface_estimator = HuggingFace(entry_point='train.py',
                                    source_dir='./scripts',
                                    sagemaker_session=sess,
                                    base_job_name='huggingface-sdk-extension',
                                    instance_type='ml.p3.2xlarge',
                                    instance_count=1,
                                    transformers_version='4.4',
                                    pytorch_version='1.6.0',
                                    py_version='py36',
                                    role=role,
                                    hyperparameters={'epochs': 1,
                                                     'train_batch_size': 32,
                                                     'model_name': 'distilbert-base-uncased'})
When we create a SageMaker training job, SageMaker takes care of starting and managing all the required EC2 instances for us with the huggingface container, uploads the provided fine-tuning script train.py, and downloads the data from our sagemaker_session_bucket into the container at /opt/ml/input/data. Then it starts the training job by running:
/opt/conda/bin/python train.py --epochs 1 --model_name distilbert-base-uncased --train_batch_size 32
The hyperparameters
you define in the HuggingFace
estimator are passed in as named arguments.
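As a minimal sketch, train.py could receive them with argparse; the argument names mirror the hyperparameters above, but this parsing code is an assumption for illustration, not the actual example script:

import argparse

# hypothetical sketch of the argument parsing inside train.py
parser = argparse.ArgumentParser()
parser.add_argument('--epochs', type=int, default=1)
parser.add_argument('--train_batch_size', type=int, default=32)
parser.add_argument('--model_name', type=str, default='distilbert-base-uncased')
args, _ = parser.parse_known_args()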
SageMaker provides useful properties about the training environment through various environment variables, including the following:
SM_MODEL_DIR: A string that represents the path where the training job writes the model artifacts to. After training, artifacts in this directory are uploaded to S3 for model hosting.
SM_NUM_GPUS: An integer representing the number of GPUs available to the host.
SM_CHANNEL_XXXX: A string that represents the path to the directory that contains the input data for the specified channel. For example, if you specify two input channels in the HuggingFace estimator's fit call, named train and test, the environment variables SM_CHANNEL_TRAIN and SM_CHANNEL_TEST are set.
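A training script can pick these up from os.environ; the variable names are the standard SageMaker ones, while the surrounding code is an illustrative sketch:

import os

# paths and resources injected by SageMaker into the training container
model_dir = os.environ.get('SM_MODEL_DIR', '/opt/ml/model')
num_gpus = int(os.environ.get('SM_NUM_GPUS', 0))
train_dir = os.environ.get('SM_CHANNEL_TRAIN')  # only set if a 'train' channel was passed to fit()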
To run your training job locally, you can define instance_type='local' or instance_type='local_gpu' for GPU usage, as sketched below. Note: this does not work within SageMaker Studio.
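For illustration, a local-mode run only swaps the instance type on the estimator from the previous section; this sketch additionally assumes Docker is installed on your machine:

from sagemaker.huggingface import HuggingFace

# same estimator as above, but executed locally via Docker
huggingface_estimator = HuggingFace(entry_point='train.py',
                                    source_dir='./scripts',
                                    instance_type='local',  # or 'local_gpu'
                                    instance_count=1,
                                    transformers_version='4.4',
                                    pytorch_version='1.6.0',
                                    py_version='py36',
                                    role=role,
                                    hyperparameters={'epochs': 1,
                                                     'train_batch_size': 32,
                                                     'model_name': 'distilbert-base-uncased'})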
from sagemaker.huggingface import HuggingFace
# hyperparameters, which are passed into the training job
hyperparameters = {
    'model_name_or_path': 'bert-large-uncased-whole-word-masking',
    'dataset_name': 'squad',
    'do_train': True,
    'do_eval': True,
    'fp16': True,
    'per_device_train_batch_size': 4,
    'per_device_eval_batch_size': 4,
    'num_train_epochs': 2,
    'max_seq_length': 384,
    'max_steps': 100,
    'pad_to_max_length': True,
    'doc_stride': 128,
    'output_dir': '/opt/ml/model'
}
# configuration for running training on smdistributed Data Parallel
distribution = {'smdistributed':{'dataparallel':{ 'enabled': True }}}
# git configuration to download our fine-tuning script
git_config = {'repo': 'https://github.com/huggingface/transformers.git','branch': 'v4.26.0'}
# instance configurations
instance_type='ml.p3.16xlarge'
instance_count=2
volume_size=200
# metric definitions to extract the results from the training logs
metric_definitions = [
    {'Name': 'train_runtime', 'Regex': r"train_runtime.*=\D*(.*?)$"},
    {'Name': 'train_samples_per_second', 'Regex': r"train_samples_per_second.*=\D*(.*?)$"},
    {'Name': 'epoch', 'Regex': r"epoch.*=\D*(.*?)$"},
    {'Name': 'f1', 'Regex': r"f1.*=\D*(.*?)$"},
    {'Name': 'exact_match', 'Regex': r"exact_match.*=\D*(.*?)$"}
]
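# Each regex is applied to the training log stream; the first capture group
# becomes the reported metric value. For a hypothetical log line such as
# "f1 = 92.5", the pattern r"f1.*=\D*(.*?)$" captures '92.5'.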
# estimator
huggingface_estimator = HuggingFace(entry_point='run_qa.py',
                                    source_dir='./examples/pytorch/question-answering',
                                    git_config=git_config,
                                    metric_definitions=metric_definitions,
                                    instance_type=instance_type,
                                    instance_count=instance_count,
                                    volume_size=volume_size,
                                    role=role,
                                    transformers_version='4.26.0',
                                    pytorch_version='1.13.1',
                                    py_version='py39',
                                    distribution=distribution,
                                    hyperparameters=hyperparameters)
# starting the training job
huggingface_estimator.fit()
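Once fit() returns, the S3 location of the packed model artifact is available on the estimator; printing it is a quick sanity check (not part of the original flow):

# S3 URI of the model.tar.gz produced by the training job
print(huggingface_estimator.model_data)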
To deploy our endpoint, we call deploy()
on our HuggingFace estimator object, passing in our desired number of instances and instance type.
predictor = huggingface_estimator.deploy(initial_instance_count=1, instance_type="ml.g4dn.xlarge")
Then, we use the returned predictor object to call the endpoint.
data = {
"inputs": {
"question": "What is used for inference?",
"context": "My Name is Philipp and I live in Nuremberg. This model is used with sagemaker for inference."
}
}
predictor.predict(data)
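The endpoint returns the standard question-answering payload: a dict with score, start, end, and answer. The values below are illustrative, not a captured response:

# illustrative response shape (values are made up)
# {'score': 0.99, 'start': 68, 'end': 77, 'answer': 'sagemaker'}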
Finally, we delete the model and the endpoint to clean up.
predictor.delete_model()
predictor.delete_endpoint()