!mkdir -p code/cloudformation
!wget -q --show-progress -O code/cloudformation/immersion_day.yaml https://personalization-at-amazon.s3.amazonaws.com/amazon-personalize/AmazonPersonalizeImmersionDay.yaml
code/cloudformation 100%[===================>] 2.57K --.-KB/s in 0s
!cat code/cloudformation/immersion_day.yaml
---
AWSTemplateFormatVersion: '2010-09-09'
Description: Creates an S3 Bucket, IAM Policies, and SageMaker Notebook to work with Personalize.

Parameters:
  NotebookName:
    Type: String
    Default: AmazonPersonalizeImmersionDay
    Description: Enter the name of the SageMaker notebook instance. Default is PersonalizeImmersionDay.

  VolumeSize:
    Type: Number
    Default: 64
    MinValue: 5
    MaxValue: 16384
    ConstraintDescription: Must be an integer between 5 (GB) and 16384 (16 TB).
    Description: Enter the size of the EBS volume in GB.

  domain:
    Type: String
    Default: Media
    Description: Enter the name of the domain (Retail, Media, or CPG) you would like to use in your Amazon Personalize Immersion Day.

Resources:
  SAMArtifactsBucket:
    Type: AWS::S3::Bucket

  # SageMaker Execution Role
  SageMakerIamRole:
    Type: "AWS::IAM::Role"
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: sagemaker.amazonaws.com
            Action: sts:AssumeRole
      Path: "/"
      ManagedPolicyArns:
        - "arn:aws:iam::aws:policy/IAMFullAccess"
        - "arn:aws:iam::aws:policy/AWSCloudFormationFullAccess"
        - "arn:aws:iam::aws:policy/AmazonS3FullAccess"
        - "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess"
        - "arn:aws:iam::aws:policy/AWSStepFunctionsFullAccess"
        - "arn:aws:iam::aws:policy/AWSLambda_FullAccess"
        - "arn:aws:iam::aws:policy/AmazonSNSFullAccess"
        - "arn:aws:iam::aws:policy/service-role/AmazonPersonalizeFullAccess"

  # SageMaker notebook
  NotebookInstance:
    Type: "AWS::SageMaker::NotebookInstance"
    Properties:
      InstanceType: "ml.t2.medium"
      NotebookInstanceName: !Ref NotebookName
      RoleArn: !GetAtt SageMakerIamRole.Arn
      VolumeSizeInGB: !Ref VolumeSize
      LifecycleConfigName: !GetAtt AmazonPersonalizeMLOpsLifecycleConfig.NotebookInstanceLifecycleConfigName

  AmazonPersonalizeMLOpsLifecycleConfig:
    Type: "AWS::SageMaker::NotebookInstanceLifecycleConfig"
    Properties:
      OnStart:
        - Content:
            Fn::Base64: !Sub |
              #!/bin/bash
              sudo -u ec2-user -i <<'EOF'
              cd /home/ec2-user/SageMaker/
              git clone https://github.com/aws-samples/amazon-personalize-immersion-day.git
              cd /home/ec2-user/SageMaker/amazon-personalize-immersion-day/automation/ml_ops/
              nohup sh deploy.sh "${SAMArtifactsBucket}" "${domain}" &
              EOF
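The template can be launched through the CloudFormation console, or programmatically. The sketch below is a hedged example using boto3; the stack name and parameter values are illustrative assumptions, not part of the immersion day material.

import boto3

cfn = boto3.client('cloudformation', region_name='us-east-1')

with open('code/cloudformation/immersion_day.yaml') as f:
    template_body = f.read()

cfn.create_stack(
    StackName='personalize-immersion-day',  # hypothetical stack name
    TemplateBody=template_body,
    Parameters=[
        {'ParameterKey': 'NotebookName', 'ParameterValue': 'AmazonPersonalizeImmersionDay'},
        {'ParameterKey': 'domain', 'ParameterValue': 'Media'},
    ],
    Capabilities=['CAPABILITY_IAM'],  # required because the template creates an IAM role
)
# block until the stack finishes creating
cfn.get_waiter('stack_create_complete').wait(StackName='personalize-immersion-day')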
import time
from time import sleep
import json
from datetime import datetime
import pandas as pd
original_data = pd.read_csv('./data/bronze/ml-latest-small/ratings.csv')
original_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100836 entries, 0 to 100835
Data columns (total 4 columns):
 #   Column     Non-Null Count   Dtype
---  ------     --------------   -----
 0   userId     100836 non-null  int64
 1   movieId    100836 non-null  int64
 2   rating     100836 non-null  float64
 3   timestamp  100836 non-null  int64
dtypes: float64(1), int64(3)
memory usage: 3.1 MB
The int64 format is clearly suitable for userId and movieId, but we need to look more closely at the timestamps. Amazon Personalize expects timestamps in Unix epoch format, which is not human-readable. As a quick sanity check, let's pick an arbitrary timestamp and convert it to a human-readable date.
arb_time_stamp = original_data.iloc[50]['timestamp']
print(arb_time_stamp)
print(datetime.utcfromtimestamp(arb_time_stamp).strftime('%Y-%m-%d %H:%M:%S'))
964982681.0
2000-07-30 18:44:41
Since this is an explicit-feedback dataset of movie ratings on a five-star scale, we want to keep only movies that users "liked" and simulate the kind of interaction data a VOD platform would collect. To do that, we filter out the lowest ratings and create two event types, "click" and "watch": every interaction with a rating above 1 becomes a "click" event, and interactions with a rating above 3 additionally become "watch" events.
Note that this is only to match the events we are modeling; with a real dataset you would model actual implicit feedback such as clicks and watches and/or explicit feedback such as ratings, likes, and so on.
watched_df = original_data.copy()
watched_df = watched_df[watched_df['rating'] > 3]
watched_df = watched_df[['userId', 'movieId', 'timestamp']]
watched_df['EVENT_TYPE']='watch'
display(watched_df.head())
clicked_df = original_data.copy()
clicked_df = clicked_df[clicked_df['rating'] > 1]
clicked_df = clicked_df[['userId', 'movieId', 'timestamp']]
clicked_df['EVENT_TYPE']='click'
display(clicked_df.head())
# DataFrame.append was removed in pandas 2.x; pd.concat is the equivalent here
interactions_df = pd.concat([clicked_df, watched_df])
interactions_df.sort_values("timestamp", axis=0, ascending=True,
                            inplace=True, na_position='last')
interactions_df.info()
| | userId | movieId | timestamp | EVENT_TYPE |
|---|---|---|---|---|
| 0 | 1 | 1 | 964982703 | watch |
| 1 | 1 | 3 | 964981247 | watch |
| 2 | 1 | 6 | 964982224 | watch |
| 3 | 1 | 47 | 964983815 | watch |
| 4 | 1 | 50 | 964982931 | watch |

| | userId | movieId | timestamp | EVENT_TYPE |
|---|---|---|---|---|
| 0 | 1 | 1 | 964982703 | click |
| 1 | 1 | 3 | 964981247 | click |
| 2 | 1 | 6 | 964982224 | click |
| 3 | 1 | 47 | 964983815 | click |
| 4 | 1 | 50 | 964982931 | click |
<class 'pandas.core.frame.DataFrame'>
Int64Index: 158371 entries, 66679 to 81092
Data columns (total 4 columns):
 #   Column      Non-Null Count   Dtype
---  ------      --------------   -----
 0   userId      158371 non-null  int64
 1   movieId     158371 non-null  int64
 2   timestamp   158371 non-null  int64
 3   EVENT_TYPE  158371 non-null  object
dtypes: int64(3), object(1)
memory usage: 6.0+ MB
Amazon Personalize has default column names for users, items, and timestamps: USER_ID, ITEM_ID, and TIMESTAMP. The final modification to the dataset is therefore to rename the existing columns to these default headers.
interactions_df.rename(columns = {'userId':'USER_ID', 'movieId':'ITEM_ID',
'timestamp':'TIMESTAMP'}, inplace = True)
interactions_df.head()
| | USER_ID | ITEM_ID | TIMESTAMP | EVENT_TYPE |
|---|---|---|---|---|
| 66679 | 429 | 222 | 828124615 | watch |
| 66681 | 429 | 227 | 828124615 | click |
| 66719 | 429 | 595 | 828124615 | watch |
| 66718 | 429 | 592 | 828124615 | watch |
| 66717 | 429 | 590 | 828124615 | watch |
interactions_df.to_csv('./data/silver/ml-latest-small/interactions.csv', index=False, float_format='%.0f')
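Before moving on to the item metadata, a quick sanity check on the file we just wrote doesn't hurt. A minimal sketch, re-reading the CSV and confirming it has the headers Personalize expects:

# Re-read the silver interactions file and confirm the Personalize column names.
check_df = pd.read_csv('./data/silver/ml-latest-small/interactions.csv')
assert set(check_df.columns) == {'USER_ID', 'ITEM_ID', 'TIMESTAMP', 'EVENT_TYPE'}
print(check_df['EVENT_TYPE'].value_counts())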
original_data = pd.read_csv('./data/bronze/ml-latest-small/movies.csv')
original_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9742 entries, 0 to 9741
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   movieId  9742 non-null   int64
 1   title    9742 non-null   object
 2   genres   9742 non-null   object
dtypes: int64(1), object(2)
memory usage: 228.5+ KB
original_data['year'] = original_data['title'].str.extract(r'.*\((.*)\).*', expand=False)
original_data = original_data.dropna(axis=0)
itemmetadata_df = original_data.copy()
itemmetadata_df = itemmetadata_df[['movieId', 'genres', 'year']]
itemmetadata_df.head()
| | movieId | genres | year |
|---|---|---|---|
| 0 | 1 | Adventure\|Animation\|Children\|Comedy\|Fantasy | 1995 |
| 1 | 2 | Adventure\|Children\|Fantasy | 1995 |
| 2 | 3 | Comedy\|Romance | 1995 |
| 3 | 4 | Comedy\|Drama\|Romance | 1995 |
| 4 | 5 | Comedy | 1995 |
Next we add a CREATION_TIMESTAMP column for each item. If you don't provide a CREATION_TIMESTAMP, the model infers it from the interactions dataset and uses the timestamp of the item's earliest interaction as its release date. If an item has no interactions, its release date is set to the timestamp of the latest interaction in the training set and the item is treated as a new item. For this dataset we will simply set CREATION_TIMESTAMP to 0.
itemmetadata_df['CREATION_TIMESTAMP'] = 0
itemmetadata_df.rename(columns = {'genres':'GENRE', 'movieId':'ITEM_ID', 'year':'YEAR'}, inplace = True)
itemmetadata_df.to_csv('./data/silver/ml-latest-small/item-meta.csv', index=False, float_format='%.0f')
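We set CREATION_TIMESTAMP to 0 above for simplicity. If you wanted to supply real values instead of letting Personalize infer them, a minimal sketch (assuming the renamed interactions_df from earlier is still in memory) is to take each item's earliest interaction:

# Hedged alternative to the constant 0: per-item creation timestamps derived
# from each item's earliest interaction; items with no interactions fall back
# to the latest timestamp, mirroring how Personalize treats "new" items.
earliest = interactions_df.groupby('ITEM_ID')['TIMESTAMP'].min()
itemmetadata_df['CREATION_TIMESTAMP'] = (
    itemmetadata_df['ITEM_ID']
    .map(earliest)
    .fillna(interactions_df['TIMESTAMP'].max())
    .astype('int64')
)
# If you go this route, re-run the to_csv call above to refresh item-meta.csv.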
!pip install -q boto3
import boto3
import json
import time
!mkdir -p ~/.aws && cp /content/drive/MyDrive/AWS/d01_admin/* ~/.aws
# Configure the SDK to Personalize:
personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')
print("We can communicate with Personalize!")
We can communicate with Personalize!
# create the dataset group (the highest level of abstraction)
create_dataset_group_response = personalize.create_dataset_group(
name = "immersion-day-dataset-group-movielens-latest"
)
dataset_group_arn = create_dataset_group_response['datasetGroupArn']
print(json.dumps(create_dataset_group_response, indent=2))
# wait for it to become active
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
describe_dataset_group_response = personalize.describe_dataset_group(
datasetGroupArn = dataset_group_arn
)
status = describe_dataset_group_response["datasetGroup"]["status"]
print("DatasetGroup: {}".format(status))
if status == "ACTIVE" or status == "CREATE FAILED":
break
time.sleep(60)
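The same poll-and-sleep pattern is repeated below for the dataset import job, so it can be handy to factor it into a small helper. This is a hedged sketch, not part of the original notebook; wait_until_active and its arguments are hypothetical names:

def wait_until_active(describe_fn, describe_kwargs, status_path, max_hours=3, poll_seconds=60):
    """Poll a Personalize describe_* call until the resource is ACTIVE or CREATE FAILED."""
    status = None
    deadline = time.time() + max_hours * 60 * 60
    while time.time() < deadline:
        response = describe_fn(**describe_kwargs)
        status = response
        for key in status_path:  # walk e.g. ["datasetGroup", "status"]
            status = status[key]
        print(status)
        if status in ("ACTIVE", "CREATE FAILED"):
            break
        time.sleep(poll_seconds)
    return status

# Equivalent to the loop above:
# wait_until_active(personalize.describe_dataset_group,
#                   {"datasetGroupArn": dataset_group_arn},
#                   ["datasetGroup", "status"])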
interactions_schema = {
"type": "record",
"name": "Interactions",
"namespace": "com.amazonaws.personalize.schema",
"fields": [
{
"name": "USER_ID",
"type": "string"
},
{
"name": "ITEM_ID",
"type": "string"
},
{
"name": "EVENT_TYPE",
"type": "string"
},
{
"name": "TIMESTAMP",
"type": "long"
}
],
"version": "1.0"
}
create_schema_response = personalize.create_schema(
name = "personalize-poc-movielens-interactions",
schema = json.dumps(interactions_schema)
)
interaction_schema_arn = create_schema_response['schemaArn']
print(json.dumps(create_schema_response, indent=2))
dataset_type = "INTERACTIONS"
create_dataset_response = personalize.create_dataset(
name = "personalize-poc-movielens-ints",
datasetType = dataset_type,
datasetGroupArn = dataset_group_arn,
schemaArn = interaction_schema_arn
)
interactions_dataset_arn = create_dataset_response['datasetArn']
print(json.dumps(create_dataset_response, indent=2))
region = 'us-east-1'
s3 = boto3.client('s3')
account_id = boto3.client('sts').get_caller_identity().get('Account')
bucket_name = account_id + "-" + region + "-" + "personalizepocvod"
print(bucket_name)
if region == "us-east-1":
s3.create_bucket(Bucket=bucket_name)
else:
s3.create_bucket(
Bucket=bucket_name,
CreateBucketConfiguration={'LocationConstraint': region}
)
746888961694-us-east-1-personalizepocvod
interactions_file_path = './data/silver/ml-latest-small/interactions.csv'
interactions_filename = 'interactions.csv'
boto3.Session().resource('s3').Bucket(bucket_name).Object(interactions_filename).upload_file(interactions_file_path)
interactions_s3DataPath = "s3://"+bucket_name+"/"+interactions_filename
policy = {
"Version": "2012-10-17",
"Id": "PersonalizeS3BucketAccessPolicy",
"Statement": [
{
"Sid": "PersonalizeS3BucketAccessPolicy",
"Effect": "Allow",
"Principal": {
"Service": "personalize.amazonaws.com"
},
"Action": [
"s3:*Object",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::{}".format(bucket_name),
"arn:aws:s3:::{}/*".format(bucket_name)
]
}
]
}
s3.put_bucket_policy(Bucket=bucket_name, Policy=json.dumps(policy))
iam = boto3.client("iam")
role_name = "PersonalizeRolePOC"
assume_role_policy_document = {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "personalize.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
create_role_response = iam.create_role(
RoleName = role_name,
AssumeRolePolicyDocument = json.dumps(assume_role_policy_document)
)
# AmazonPersonalizeFullAccess provides access to any S3 bucket with a name that includes "personalize" or "Personalize"
# if you would like to use a bucket with a different name, please consider creating and attaching a new policy
# that provides read access to your bucket or attaching the AmazonS3ReadOnlyAccess policy to the role
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonPersonalizeFullAccess"
iam.attach_role_policy(
RoleName = role_name,
PolicyArn = policy_arn
)
# Now add S3 support
iam.attach_role_policy(
PolicyArn='arn:aws:iam::aws:policy/AmazonS3FullAccess',
RoleName=role_name
)
time.sleep(60) # wait for a minute to allow IAM role policy attachment to propagate
role_arn = create_role_response["Role"]["Arn"]
print(role_arn)
arn:aws:iam::746888961694:role/PersonalizeRolePOC
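As the comments above note, AmazonS3FullAccess is far broader than Personalize needs; for anything beyond a proof of concept you would scope access to the single POC bucket instead. A hedged sketch of an inline policy attached with iam.put_role_policy (the policy name "PersonalizePocBucketAccess" is an illustrative assumption):

# Hedged sketch: a tighter alternative to AmazonS3FullAccess, scoped to the POC bucket.
bucket_policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::{}".format(bucket_name),
                "arn:aws:s3:::{}/*".format(bucket_name)
            ]
        }
    ]
}
iam.put_role_policy(
    RoleName=role_name,
    PolicyName="PersonalizePocBucketAccess",  # hypothetical name
    PolicyDocument=json.dumps(bucket_policy_document)
)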
create_dataset_import_job_response = personalize.create_dataset_import_job(
jobName = "personalize-poc-import1",
datasetArn = interactions_dataset_arn,
dataSource = {
"dataLocation": "s3://{}/{}".format(bucket_name, interactions_filename)
},
roleArn = role_arn
)
dataset_import_job_arn = create_dataset_import_job_response['datasetImportJobArn']
print(json.dumps(create_dataset_import_job_response, indent=2))
# wait for this import job to become active
max_time = time.time() + 6*60*60 # 6 hours
while time.time() < max_time:
describe_dataset_import_job_response = personalize.describe_dataset_import_job(
datasetImportJobArn = dataset_import_job_arn
)
status = describe_dataset_import_job_response["datasetImportJob"]['status']
print("DatasetImportJob: {}".format(status))
if status == "ACTIVE" or status == "CREATE FAILED":
break
time.sleep(60)
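Once the import job is ACTIVE, you can optionally confirm what the dataset group now contains. A minimal sketch (not part of the original notebook) using the list_datasets API:

# Optional check: list the datasets registered in the dataset group so far.
list_datasets_response = personalize.list_datasets(
    datasetGroupArn = dataset_group_arn
)
for dataset in list_datasets_response['datasets']:
    print(dataset['name'], dataset['datasetType'], dataset['status'])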
import sys
sys.path.insert(0,'./code')
from generic_modules.import_dataset import personalize_dataset
dataset_group_arn = 'arn:aws:personalize:us-east-1:746888961694:dataset-group/immersion-day-dataset-group-movielens-latest'
bucket_name = '746888961694-us-east-1-personalizepocvod'
role_arn = 'arn:aws:iam::746888961694:role/PersonalizeRolePOC'
dataset_type = 'ITEMS'
source_data_path = './data/silver/ml-latest-small/item-meta.csv'
target_file_name = 'item-meta.csv'
personalize_item_meta = personalize_dataset(
dataset_group_arn = dataset_group_arn,
bucket_name = bucket_name,
role_arn = role_arn,
dataset_type = dataset_type,
source_data_path = source_data_path,
target_file_name = target_file_name,
    # dataset_arn is not passed here; it will be set when create_dataset() is called below
)
personalize_item_meta.setup_connection()
SUCCESS | We can communicate with Personalize!
itemmetadata_schema = {
"type": "record",
"name": "Items",
"namespace": "com.amazonaws.personalize.schema",
"fields": [
{
"name": "ITEM_ID",
"type": "string"
},
{
"name": "GENRE",
"type": "string",
"categorical": True
},{
"name": "YEAR",
"type": "int",
},
{
"name": "CREATION_TIMESTAMP",
"type": "long",
}
],
"version": "1.0"
}
personalize_item_meta.create_dataset(schema=itemmetadata_schema,
schema_name='personalize-poc-movielens-item',
dataset_name='personalize-poc-movielens-items')
personalize_item_meta.dataset_arn
'arn:aws:personalize:us-east-1:746888961694:dataset/immersion-day-dataset-group-movielens-latest/ITEMS'
personalize_item_meta.upload_data_to_s3()
personalize_item_meta.import_data_from_s3(import_job_name='personalize-poc-item-import1')
import boto3
import json
import time
class personalize_dataset:
def __init__(self,
dataset_group_arn=None,
schema_arn=None,
dataset_arn=None,
dataset_type='INTERACTIONS',
region='us-east-1',
bucket_name=None,
role_arn=None,
source_data_path=None,
target_file_name=None,
dataset_import_job_arn=None
):
self.personalize = None
self.personalize_runtime = None
self.s3 = None
self.iam = None
self.dataset_group_arn = dataset_group_arn
self.schema_arn = schema_arn
self.dataset_arn = dataset_arn
self.dataset_type = dataset_type
self.region = region
self.bucket_name = bucket_name
self.role_arn = role_arn
self.source_data_path = source_data_path
self.target_file_name = target_file_name
self.dataset_import_job_arn = dataset_import_job_arn
def setup_connection(self):
try:
self.personalize = boto3.client('personalize')
self.personalize_runtime = boto3.client('personalize-runtime')
self.s3 = boto3.client('s3')
self.iam = boto3.client("iam")
print("SUCCESS | We can communicate with Personalize!")
        except Exception as e:
            print("ERROR | Connection can't be established: {}".format(e))
def create_dataset_group(self, dataset_group_name=None):
"""
The highest level of isolation and abstraction with Amazon Personalize
is a dataset group. Information stored within one of these dataset groups
        has no impact on any other dataset group or models created from one; they
are completely isolated. This allows you to run many experiments and is
part of how we keep your models private and fully trained only on your data.
"""
create_dataset_group_response = self.personalize.create_dataset_group(name=dataset_group_name)
self.dataset_group_arn = create_dataset_group_response['datasetGroupArn']
# print(json.dumps(create_dataset_group_response, indent=2))
# Before we can use the dataset group, it must be active.
# This can take a minute or two. Execute the cell below and wait for it
# to show the ACTIVE status. It checks the status of the dataset group
# every minute, up to a maximum of 3 hours.
max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
status = self.check_dataset_group_status()
print("DatasetGroup: {}".format(status))
if status == "ACTIVE" or status == "CREATE FAILED":
break
time.sleep(60)
def check_dataset_group_status(self):
"""
Check the status of dataset group
"""
describe_dataset_group_response = self.personalize.describe_dataset_group(
datasetGroupArn = self.dataset_group_arn
)
status = describe_dataset_group_response["datasetGroup"]["status"]
return status
def create_dataset(self, schema=None, schema_name=None, dataset_name=None):
"""
First, define a schema to tell Amazon Personalize what type of dataset
you are uploading. There are several reserved and mandatory keywords
required in the schema, based on the type of dataset. More detailed
information can be found in the documentation.
"""
create_schema_response = self.personalize.create_schema(
name = schema_name,
schema = json.dumps(schema)
)
self.schema_arn = create_schema_response['schemaArn']
"""
With a schema created, you can create a dataset within the dataset group.
Note that this does not load the data yet, it just defines the schema for
the data. The data will be loaded a few steps later.
"""
create_dataset_response = self.personalize.create_dataset(
name = dataset_name,
datasetType = self.dataset_type,
datasetGroupArn = self.dataset_group_arn,
schemaArn = self.schema_arn
)
self.dataset_arn = create_dataset_response['datasetArn']
def create_s3_bucket(self):
        if self.region == "us-east-1":
self.s3.create_bucket(Bucket=self.bucket_name)
else:
self.s3.create_bucket(
Bucket=self.bucket_name,
CreateBucketConfiguration={'LocationConstraint': self.region}
)
def upload_data_to_s3(self):
"""
Now that your Amazon S3 bucket has been created, upload the CSV file of
our user-item-interaction data.
"""
boto3.Session().resource('s3').Bucket(self.bucket_name).Object(self.target_file_name).upload_file(self.source_data_path)
s3DataPath = "s3://"+self.bucket_name+"/"+self.target_file_name
def set_s3_bucket_policy(self, policy=None):
"""
Amazon Personalize needs to be able to read the contents of your S3
bucket. So add a bucket policy which allows that.
"""
if not policy:
policy = {
"Version": "2012-10-17",
"Id": "PersonalizeS3BucketAccessPolicy",
"Statement": [
{
"Sid": "PersonalizeS3BucketAccessPolicy",
"Effect": "Allow",
"Principal": {
"Service": "personalize.amazonaws.com"
},
"Action": [
"s3:*Object",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::{}".format(self.bucket_name),
"arn:aws:s3:::{}/*".format(self.bucket_name)
]
}
]
}
self.s3.put_bucket_policy(Bucket=self.bucket_name, Policy=json.dumps(policy))
def create_iam_role(self, role_name=None):
"""
Amazon Personalize needs the ability to assume roles in AWS in order to
have the permissions to execute certain tasks. Let's create an IAM role
and attach the required policies to it. The code below attaches very permissive
policies; please use more restrictive policies for any production application.
"""
assume_role_policy_document = {
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "personalize.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
create_role_response = self.iam.create_role(
RoleName = role_name,
AssumeRolePolicyDocument = json.dumps(assume_role_policy_document)
)
# AmazonPersonalizeFullAccess provides access to any S3 bucket with a name that includes "personalize" or "Personalize"
# if you would like to use a bucket with a different name, please consider creating and attaching a new policy
# that provides read access to your bucket or attaching the AmazonS3ReadOnlyAccess policy to the role
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonPersonalizeFullAccess"
self.iam.attach_role_policy(
RoleName = role_name,
PolicyArn = policy_arn
)
# Now add S3 support
self.iam.attach_role_policy(
PolicyArn='arn:aws:iam::aws:policy/AmazonS3FullAccess',
RoleName=role_name
)
time.sleep(60) # wait for a minute to allow IAM role policy attachment to propagate
self.role_arn = create_role_response["Role"]["Arn"]
def import_data_from_s3(self, import_job_name=None):
"""
Earlier you created the dataset group and dataset to house your information,
so now you will execute an import job that will load the data from the S3
bucket into the Amazon Personalize dataset.
"""
create_dataset_import_job_response = self.personalize.create_dataset_import_job(
jobName = import_job_name,
datasetArn = self.dataset_arn,
dataSource = {
"dataLocation": "s3://{}/{}".format(self.bucket_name, self.target_file_name)
},
roleArn = self.role_arn
)
self.dataset_import_job_arn = create_dataset_import_job_response['datasetImportJobArn']
"""
Before we can use the dataset, the import job must be active. Execute the
cell below and wait for it to show the ACTIVE status. It checks the status
of the import job every minute, up to a maximum of 6 hours.
Importing the data can take some time, depending on the size of the dataset.
In this workshop, the data import job should take around 15 minutes.
"""
max_time = time.time() + 6*60*60 # 6 hours
while time.time() < max_time:
            status = self.check_import_job_status()
print("DatasetImportJob: {}".format(status))
if status == "ACTIVE" or status == "CREATE FAILED":
break
time.sleep(60)
def check_import_job_status(self):
describe_dataset_import_job_response = self.personalize.describe_dataset_import_job(
datasetImportJobArn = self.dataset_import_job_arn
)
status = describe_dataset_import_job_response["datasetImportJob"]['status']
return status
def __getstate__(self):
attributes = self.__dict__.copy()
del attributes['personalize']
del attributes['personalize_runtime']
del attributes['s3']
del attributes['iam']
return attributes
dataset_arn = 'arn:aws:personalize:us-east-1:746888961694:dataset/immersion-day-dataset-group-movielens-latest/ITEMS'
dataset_import_job_arn = 'arn:aws:personalize:us-east-1:746888961694:dataset-import-job/personalize-poc-item-import1'
personalize_item_meta = personalize_dataset(
dataset_group_arn = dataset_group_arn,
bucket_name = bucket_name,
role_arn = role_arn,
dataset_type = dataset_type,
source_data_path = source_data_path,
target_file_name = target_file_name,
dataset_arn = dataset_arn,
dataset_import_job_arn = dataset_import_job_arn
)
personalize_item_meta.setup_connection()
SUCCESS | We can communicate with Personalize!
personalize_item_meta.check_import_job_status()
'ACTIVE'
import pickle
with open('./artifacts/etc/personalize_item_meta.pkl', 'wb') as outp:
pickle.dump(personalize_item_meta, outp, pickle.HIGHEST_PROTOCOL)
personalize_item_meta.__getstate__()
{'bucket_name': '746888961694-us-east-1-personalizepocvod',
 'dataset_arn': 'arn:aws:personalize:us-east-1:746888961694:dataset/immersion-day-dataset-group-movielens-latest/ITEMS',
 'dataset_group_arn': 'arn:aws:personalize:us-east-1:746888961694:dataset-group/immersion-day-dataset-group-movielens-latest',
 'dataset_import_job_arn': 'arn:aws:personalize:us-east-1:746888961694:dataset-import-job/personalize-poc-item-import1',
 'dataset_type': 'ITEMS',
 'region': 'us-east-1',
 'role_arn': 'arn:aws:iam::746888961694:role/PersonalizeRolePOC',
 'schema_arn': None,
 'source_data_path': './data/silver/ml-latest-small/item-meta.csv',
 'target_file_name': 'item-meta.csv'}
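Because __getstate__ strips the boto3 clients before pickling, a freshly unpickled object has no live connections. A minimal sketch of restoring it in a later session (assuming the personalize_dataset class definition is available again, for example by re-running the class cell or importing it):

import pickle

# Reload the pickled helper and re-establish the clients that __getstate__ removed.
with open('./artifacts/etc/personalize_item_meta.pkl', 'rb') as inp:
    restored = pickle.load(inp)

restored.setup_connection()  # recreate the personalize / s3 / iam clients
print(restored.check_import_job_status())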