Run the following two cellls to sync with Google Drive only if you run from Google Colab.
Note: we recommend using Google Colab for this specific homework, since the training phase will require a GPU
from os import makedirs
from os.path import exists
! git clone https://github.com/LM1997610/AdavancedML.git
print()
%cd /content/AdavancedML/Assignment_3/Practice
Cloning into 'AdavancedML'... remote: Enumerating objects: 1272, done. remote: Counting objects: 100% (518/518), done. remote: Compressing objects: 100% (292/292), done. remote: Total 1272 (delta 258), reused 452 (delta 224), pack-reused 754 Receiving objects: 100% (1272/1272), 9.56 MiB | 31.79 MiB/s, done. Resolving deltas: 100% (698/698), done. /content/AdavancedML/Assignment_3/Practice
! gdown 1fNjPKEBHJObyhZkgpnP4gYbIXp_D0eYA
Downloading... From: https://drive.google.com/uc?id=1fNjPKEBHJObyhZkgpnP4gYbIXp_D0eYA To: /content/AdavancedML/Assignment_3/Practice/data.zip 100% 144M/144M [00:00<00:00, 156MB/s]
if exists("data.zip"):
! unzip -q data.zip
! rm data.zip
Welcome to this guide on training, testing, and fine-tuning a deep learning model. Deep learning is at the forefront of artificial intelligence, with applications spanning image recognition, natural language processing, and more.
Throughout this assignment, you'll:
Prepare Data: Preprocess and load the data.
Use Neural Networks: Instantiate a neural network architecture.
Train Models: Utilize optimization, loss functions, and backpropagation.
Evaluate Performance: Assess model performance, prevent overfitting, and underfitting.
Fine-Tune Models: Explore hyperparameter tuning.
import torch
import torch.autograd
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader
from utils import h36motion3d as datasets
import time
import numpy as np
import matplotlib.pyplot as plt
from utils.loss_funcs import *
from utils.data_utils import define_actions
from utils.h36_3d_viz import visualize
# Use GPU if available, otherwise stick with cpu
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device, '- Type:', torch.cuda.get_device_name(0))
Using device: cuda - Type: Tesla T4
For this homework, you will use Human3.6M, which is a large-scale dataset of 3.6 million accurate 3D human poses acquired by recording the performance of five female and six male subjects under four different viewpoints. The dataset includes:
The dataset aims to provide diverse motions and poses encountered in typical human activities, with additional data to train realistic human sensing systems.
For this assignment, we will leverage the rich motion data (See in the figure above) provided by H3.6M to perform a task known as motion prediction. Motion prediction involves using historical motion data to forecast future movements. This task is fundamental in human-robot interaction, animation, and sports analytics applications.
Each created sequence has the shape (35, 17, 3), where:
The original data provides high-resolution progressive scan videos at 50 Hz. However, the dataset has been downsampled to 25 Hz for research purposes. This means that 25 frames of motion data are provided per second.
Note: the figure above shows 18 joints, however the dataset contains 32. For this specific case we will consider 22 joints, ignoring some of the finer ones (e.g. foot tip, hand tip, etc)
# # Arguments to setup the datasets
datas = 'h36m' # dataset name
path = './data/h3.6m/h3.6m/dataset'
input_n = 10 # number of frames to train on (default=10)
output_n = 25 # number of frames to predict on
input_dim = 3 # dimensions of the input coordinates (default=3)
skip_rate = 1 # skip rate of frames
joints_to_consider = 22
#FLAGS FOR THE TRAINING
mode = 'train' # choose either train or test mode
batch_size_test = 8
model_path = './checkpoints/' # path to the model checkpoint file
actions_to_consider_test = 'all' # actions to test on.
#the model name to save/load
model_name = datas+'_3d_'+str(output_n)+'frames_ckpt'
#FLAGS FOR THE VISUALIZATION
actions_to_consider_viz = 'all' # actions to visualize
visualize_from = 'test'
n_viz = 2
Load Dataset
Note: It will take you ~ 5 minutes
import warnings
warnings.filterwarnings("ignore", category=np.VisibleDeprecationWarning)
# Load Data
print('Loading Train Dataset...')
dataset = datasets.Datasets(path,input_n,output_n,skip_rate, split=0)
print('Loading Validation Dataset...')
vald_dataset = datasets.Datasets(path,input_n,output_n,skip_rate, split=1)
#! Note: Ignore warning: "VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences"
Loading Train Dataset... Loading Validation Dataset...
Following we create a torch dataloader that create the batches for each epoch.
batch_size=256
print('>>> Training dataset length: {:d}'.format(dataset.__len__()))
data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=0, pin_memory=True)#
print('>>> Validation dataset length: {:d}'.format(vald_dataset.__len__()))
vald_loader = DataLoader(vald_dataset, batch_size=batch_size, shuffle=True, num_workers=0, pin_memory=True)
>>> Training dataset length: 180077 >>> Validation dataset length: 28110
Each sequence comprises an observed part to train the Encoder and a part that attempts to predict the future sequence, the Decoder.
Generally, the standard setup plans to use the first 10 sequences of poses ($N_{obs}=10$) for the observation and the following 25 ($N_{pred} = 25$) for the prediction.
We create an instance of a custom Spatio-Temporal transformer with the chosen configuration.
(Note: explore the model in ./models/sttr/sttformer.py)
Then we allocate it to the GPU for forward and backward accelerated computation.
from models.sttr.sttformer import Model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device: %s'%device)
n_heads = 1
model = Model(num_joints=joints_to_consider,
num_frames=input_n, num_frames_out=output_n, num_heads=n_heads,
num_channels=3, kernel_size=[3,3], use_pes=True).to(device)
print('total number of parameters of the network is: '+str(sum(p.numel() for p in model.parameters() if p.requires_grad)))
Using device: cuda total number of parameters of the network is: 26859
As we embark on training deep learning models for motion prediction using the H3.6M dataset, it's essential to recognize several key parameters and components that significantly impact the training phase:
Learning Rate: This parameter determines the convergence speed during optimization.
Batch Size: It influences model generalization and training efficiency.
Number of Epochs: The number of training iterations affects model learning.
Loss Function: The choice of loss function directly affects learning and final performance.
Optimizer: The optimization algorithm used (e.g., Adam, SGD) impacts gradient descent during training.
Milestones and Gamma: These parameters control learning rate schedules, allowing for adaptive adjustments during training.
Weight Decay: It regulates the impact of model parameters during optimization.
Scheduler: Scheduler strategies (e.g., StepLR, ReduceLROnPlateau) manage learning rate adaptation during training.
# Arguments to setup the optimizer
lr = 1e-01 # learning rate
use_scheduler = True # use MultiStepLR scheduler
milestones = [10, 30] # the epochs after which the learning rate is adjusted by gamma
gamma = 0.1 # gamma correction to the learning rate, after reaching the milestone epochs
weight_decay = 1e-05 # weight decay (L2 penalty)
optimizer = optim.Adam(model.parameters(), lr = lr, weight_decay = weight_decay)
if use_scheduler:
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=milestones, gamma=gamma)
clip_grad= None # select max norm to clip gradients
# Argument for training
n_epochs = 41
log_step = 200
The loss and metric used during training and evaluation respectively, compare the predicted joint positions to ground truth joint positions for all frames, which is typically referred to as the Average Mean Per Joint Position Error (A-MPJPE) can be seen as an $L_2$. This loss quantifies the dissimilarity between the predicted and ground truth joint positions by measuring the squared Euclidean distance between corresponding joint positions.
\begin{align*} A-MPJPE &= \frac{1}{N_{pred}} \sum_{i=1}^{N_{pred}} \left(\frac{1}{J} \sum_{j=1}^{J} \left\| P_{\text{predicted}_{t,j}} - P_{\text{gt}_{t,j}} \right\|^2\right) \end{align*}$$where:$$\begin{align*} P_{\text{predicted}} &: \text{Set of predicted joint positions estimated by the model.} \\ P_{\text{gt}} &: \text{Corresponding set of ground truth joint positions.} \\ \end{align*}Note: If you restart the training for any reason, remember to instantiate the model and the optimizer again. This will avoid continuing the training with the initialized weights of the previous one
Objective: In this exercise, you will practice implementing a code snippet to save your deep learning model's checkpoints and visualize the training and validation loss on the same plot every 5 epochs during model training.
Your task is to implement the following:
Set up a mechanism to save the model's checkpoints (weights and architecture) during training. These checkpoints should be saved periodically, say, every 5 epochs.
Create a plot displaying the training and validation losses on the same graph. The x-axis should represent the number of epochs, and the y-axis should represent the loss values. The training and validation losses should be plotted as separate lines on the same graph.
Ensure that the code saves the model's checkpoints in a specified directory, including the model's architecture and weights, and that the loss plot is displayed.
Analyze the loss plot to gain insights into how your model is learning over time and whether there are any signs of overfitting or underfitting.
Note: see the Pytorch Documentation on how to save your model's checkpoints.
from IPython.display import clear_output
def do_my_plot_and_save(my_model, train_loss, val_loss, path_to_save_model, model_name, this_epoch):
#if not exists(path_to_save_model): makedirs(path_to_save_model)
if not exists(path_to_save_model+ "plots_dir/"): makedirs(path_to_save_model + "plots_dir/")
torch.save(my_model.state_dict(), path_to_save_model + model_name + "_epoch_"+str(this_epoch+1)+".pt")
fig = plt.figure(figsize=(5, 2))
fig.tight_layout(pad = 2)
x_lenght = list(range(1, len(train_loss)+1))
plt.plot(x_lenght , train_loss, 'r', label = 'Train loss')
plt.plot(x_lenght , val_loss, 'g', label =' Val loss')
plt.title('\n Loss History \n', fontsize=14)
plt.xlabel('n_of_epochs \n'); plt.ylabel('loss')
t = 1 if this_epoch < 11 else 2 if this_epoch<21 else 3
plt.xticks(list(range(1, len(train_loss)+1, t)));
plt.grid(linewidth=0.4); plt.legend()
plt.savefig(path_to_save_model + "plots_dir/" +"loss_epoch_"+str(this_epoch+1)+".png", bbox_inches='tight')
plt.show()
def train(data_loader,vald_loader, path_to_save_model=None):
train_loss = []
val_loss = []
val_loss_best = 1000
dim_used = np.array([6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30, 31, 32, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45,
46, 47, 51, 52, 53, 54, 55, 56, 57, 58, 59, 63, 64, 65, 66, 67, 68,
75, 76, 77, 78, 79, 80, 81, 82, 83, 87, 88, 89, 90, 91, 92])
for epoch in range(n_epochs-1):
running_loss=0
n=0
model.train()
for cnt,batch in enumerate(data_loader):
batch=batch.float().to(device)
batch_dim=batch.shape[0]
n+=batch_dim
sequences_train=batch[:, 0:input_n, dim_used].view(-1,input_n,len(dim_used)//3,3).permute(0,3,1,2)
sequences_gt=batch[:, input_n:input_n+output_n, dim_used].view(-1,output_n,len(dim_used)//3,3)
optimizer.zero_grad()
sequences_predict=model(sequences_train).view(-1, output_n, joints_to_consider, 3)
loss=mpjpe_error(sequences_predict,sequences_gt)
#if cnt % log_step == 0:
# print('[Epoch: %d, Iteration: %5d] training loss: %.3f' %(epoch + 1, cnt + 1, loss.item()))
loss.backward()
if clip_grad is not None:
torch.nn.utils.clip_grad_norm_(model.parameters(),clip_grad)
optimizer.step()
running_loss += loss*batch_dim
train_loss.append(running_loss.detach().cpu()/n)
model.eval()
with torch.no_grad():
running_loss=0
n=0
for cnt,batch in enumerate(vald_loader):
batch=batch.float().to(device)
batch_dim=batch.shape[0]
n+=batch_dim
sequences_train=batch[:, 0:input_n, dim_used].view(-1,input_n,len(dim_used)//3,3).permute(0,3,1,2)
sequences_gt=batch[:, input_n:input_n+output_n, dim_used].view(-1,output_n,len(dim_used)//3,3)
sequences_predict=model(sequences_train).view(-1, output_n, joints_to_consider, 3)
loss=mpjpe_error(sequences_predict,sequences_gt)
if cnt % log_step == 0:
print('[Epoch: %d, Iteration: %5d] validation loss: %.3f' %(epoch + 1, cnt + 1, loss.item()))
running_loss+=loss*batch_dim
val_loss.append(running_loss.detach().cpu()/n)
if running_loss/n < val_loss_best:
val_loss_best = running_loss/n
if use_scheduler:
scheduler.step()
# save and plot model every 5 epochs
# Insert your code below. Use the argument path_to_save_model to save the model to the path specified.
#if save_and_plot and epoch in [4 + 5 * i for i in range(n_epochs)]:
if save_and_plot and epoch in list(range(4, n_epochs, 5)):
#clear_output(wait=True)
do_my_plot_and_save(model, train_loss, val_loss, path_to_save_model, model_name, epoch )
# Save the model and plot the loss
# Change to True if you want to save the model and plot the loss
save_and_plot = True
# launch training
train(data_loader,vald_loader, path_to_save_model=model_path)
# plots and model checkpoints are saved in "checkpoints" directory
[Epoch: 1, Iteration: 1] validation loss: 106.702 [Epoch: 2, Iteration: 1] validation loss: 98.812 [Epoch: 3, Iteration: 1] validation loss: 86.495 [Epoch: 4, Iteration: 1] validation loss: 88.856 [Epoch: 5, Iteration: 1] validation loss: 88.452
[Epoch: 6, Iteration: 1] validation loss: 83.586 [Epoch: 7, Iteration: 1] validation loss: 87.983 [Epoch: 8, Iteration: 1] validation loss: 84.283 [Epoch: 9, Iteration: 1] validation loss: 81.305 [Epoch: 10, Iteration: 1] validation loss: 85.331
[Epoch: 11, Iteration: 1] validation loss: 84.124 [Epoch: 12, Iteration: 1] validation loss: 80.189 [Epoch: 13, Iteration: 1] validation loss: 76.834 [Epoch: 14, Iteration: 1] validation loss: 81.435 [Epoch: 15, Iteration: 1] validation loss: 83.391
[Epoch: 16, Iteration: 1] validation loss: 80.990 [Epoch: 17, Iteration: 1] validation loss: 83.519 [Epoch: 18, Iteration: 1] validation loss: 82.499 [Epoch: 19, Iteration: 1] validation loss: 81.216 [Epoch: 20, Iteration: 1] validation loss: 82.998
[Epoch: 21, Iteration: 1] validation loss: 82.410 [Epoch: 22, Iteration: 1] validation loss: 80.091 [Epoch: 23, Iteration: 1] validation loss: 77.536 [Epoch: 24, Iteration: 1] validation loss: 83.058 [Epoch: 25, Iteration: 1] validation loss: 83.168
[Epoch: 26, Iteration: 1] validation loss: 82.381 [Epoch: 27, Iteration: 1] validation loss: 80.350 [Epoch: 28, Iteration: 1] validation loss: 79.160 [Epoch: 29, Iteration: 1] validation loss: 79.196 [Epoch: 30, Iteration: 1] validation loss: 79.306
[Epoch: 31, Iteration: 1] validation loss: 83.749 [Epoch: 32, Iteration: 1] validation loss: 79.198 [Epoch: 33, Iteration: 1] validation loss: 76.732 [Epoch: 34, Iteration: 1] validation loss: 80.527 [Epoch: 35, Iteration: 1] validation loss: 79.460
[Epoch: 36, Iteration: 1] validation loss: 84.075 [Epoch: 37, Iteration: 1] validation loss: 80.603 [Epoch: 38, Iteration: 1] validation loss: 81.395 [Epoch: 39, Iteration: 1] validation loss: 84.408 [Epoch: 40, Iteration: 1] validation loss: 85.960
After training and validation, the test loop is the final phase that evaluates the model's performance on an entirely independent dataset known as the test dataset. This dataset is distinct from the training and validation data, ensuring unbiased assessment. The test loop provides a reliable estimate of how well the model will perform in real-world scenarios, confirming that any improvements observed during training and validation are not due to overfitting or chance. It's a crucial step before deploying the model in practical applications.
def test(ckpt_path=None):
model.load_state_dict(torch.load(ckpt_path))
print('\n ...model loaded \n')
model.eval()
accum_loss = 0
n_batches = 0 # number of batches for all the sequences
actions = define_actions(actions_to_consider_test)
dim_used = np.array([6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30, 31, 32, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45,
46, 47, 51, 52, 53, 54, 55, 56, 57, 58, 59, 63, 64, 65, 66, 67, 68,
75, 76, 77, 78, 79, 80, 81, 82, 83, 87, 88, 89, 90, 91, 92])
# joints at same loc
joint_to_ignore = np.array([16, 20, 23, 24, 28, 31])
index_to_ignore = np.concatenate((joint_to_ignore * 3, joint_to_ignore * 3 + 1, joint_to_ignore * 3 + 2))
joint_equal = np.array([13, 19, 22, 13, 27, 30])
index_to_equal = np.concatenate((joint_equal * 3, joint_equal * 3 + 1, joint_equal * 3 + 2))
totalll=0
counter=0
for action in actions:
running_loss=0
n=0
dataset_test = datasets.Datasets(path, input_n, output_n, skip_rate, split=2, actions=[action])
#print('>>> test action for sequences: {:d}'.format(dataset_test.__len__()))
test_loader = DataLoader(dataset_test, batch_size=batch_size_test, shuffle=False, num_workers=0, pin_memory=True)
for cnt,batch in enumerate(test_loader):
with torch.no_grad():
batch=batch.to(device)
batch_dim=batch.shape[0]
n+=batch_dim
all_joints_seq=batch.clone()[:, input_n:input_n+output_n,:]
sequences_train=batch[:, 0:input_n, dim_used].view(-1,input_n,len(dim_used)//3,3).permute(0,3,1,2)
sequences_gt=batch[:, input_n:input_n+output_n, :]
running_time = time.time()
#sequences_predict = model(sequences_train)
sequences_predict = model(sequences_train).view(-1, output_n, joints_to_consider, 3)
totalll += time.time()-running_time
counter += 1
sequences_predict = sequences_predict.contiguous().view(-1,output_n,len(dim_used))
all_joints_seq[:,:,dim_used] = sequences_predict
all_joints_seq[:,:,index_to_ignore] = all_joints_seq[:,:,index_to_equal]
loss = mpjpe_error(all_joints_seq.view(-1,output_n,32,3),sequences_gt.view(-1,output_n,32,3))
running_loss += loss*batch_dim
accum_loss += loss*batch_dim
#print('loss at test subject for action : '+str(action)+ ' is: '+ str(running_loss/n))
print(str(action),': ', str(np.round((running_loss/n).item(),1)))
n_batches += n
print('\nAverage: '+str(np.round((accum_loss/n_batches).item(),1)))
print('Prediction time: ', totalll/counter)
Note: Your results should be better than 95 millimiters on average
# Change the epoch according to the validation curve :
ckpt_path = f'./checkpoints/h36m_3d_25frames_ckpt_epoch_{35}.pt'
test(ckpt_path)
...model loaded walking : 62.8 eating : 60.9 smoking : 62.1 discussion : 87.9 directions : 80.2 greeting : 102.0 phoning : 76.6 posing : 117.0 purchases : 101.7 sitting : 90.2 sittingdown : 113.7 takingphoto : 89.7 waiting : 83.1 walkingdog : 111.3 walkingtogether : 59.0 Average: 86.5 Prediction time: 0.008471096058686574
The qualitative results are as important as the quantitative ones. In this section, you will visualize and compare the predicted poses with the ground truth ones. For simplicity, you will visualize only the first predicted pose and the ground truth.
Ideally, the same plot should show the predicted pose in red and the ground truth one in green.
Note: you will find which nodes are connected in the file ./models/skeleton_connection.py
# Insert your code below
visualize(input_n, output_n,
visualize_from, path, model,
device, n_viz, skip_rate,
actions_to_consider_viz, directory = "images_dir")
# The visualization was performed by the 'visualize' function from the provided 'h36_3d_viz.py' file.
# Although the module was imported, it was not utilized...
# Some issues due to defining the three axes with Matplotlib.
# Once this was addressed, only adjustment is changing the colors according to requirements.
# Results are saved as gif images in 'images_dir' directory
0%| | 0/15 [00:00<?, ?it/s]
... doing action = walking ... doing action = eating ... doing action = smoking ... doing action = discussion ... doing action = directions ... doing action = greeting ... doing action = phoning ... doing action = posing ... doing action = purchases ... doing action = sitting ... doing action = sittingdown ... doing action = takingphoto ... doing action = waiting ... doing action = walkingdog ... doing action = walkingtogether
Objective: In this exercise, you will analyze the results obtained from a deep learning model you previously trained and perform parameter fine-tuning to optimize its performance. The key considerations are learning rate, milestones, and weight decay. You will also use tables and plots to visualize and interpret the outcomes.
Instructions:
Analysis: Analyze the generated report and answer the following questions:
Parameter Fine-Tuning: Based on your analysis, perform parameter fine-tuning to optimize model performance. Adjust the following parameters:
Re-Training: Train the model with the adjusted hyperparameters. Record the training progress and generate a new report, including performance metrics and line plots as before.
Final Analysis: Analyze the results of the fine-tuned model and compare them with the initial training. Answer the following questions:
Report and final analysis submitted separately with a pdf file named AML_HW3_report.pdf
Search carried out with the help of Weights & Biases
model= Model(num_joints=joints_to_consider,
num_frames=input_n, num_frames_out=output_n, num_heads=n_heads,
num_channels=3, kernel_size=[3,3], use_pes=True).to(device)
print('total number of parameters of the network is: '+str(sum(p.numel() for p in model.parameters() if p.requires_grad)))
total number of parameters of the network is: 26859
# Arguments to setup the optimizer
lr = 3e-01 # learning rate
milestones = [10, 30] # the epochs after which the learning rate is adjusted by gamma
weight_decay = 1e-08 # weight decay (L2 penalty)
optimizer = optim.Adam(model.parameters(), lr = lr, weight_decay = weight_decay)
use_scheduler = True # use MultiStepLR scheduler
if use_scheduler:
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=milestones, gamma=gamma)
clip_grad= None # select max norm to clip gradients
# Argument for training
n_epochs = 41
log_step = 200
# Save the model and plot the loss
# Change to True if you want to save the model and plot the loss
save_and_plot = True
# launch training
# train(data_loader, vald_loader, path_to_save_model='./checkpoints_fine_tuned/')
# Change the epoch according to the validation curve :
# ckpt_path = f'./checkpoints_fine_tuned/h36m_3d_25frames_ckpt_epoch_{35}.pt'
# test(ckpt_path)
In this exercise, you will calculate the Mean Per Joint Position Error (MPJPE) for a specific frame. This skill is valuable for assessing the accuracy of your model's predictions at a particular moment.
\begin{align*} \text{MPJPE}_t = \frac{1}{N} \sum_{j=1}^{J} \left\| P_{\text{predicted}_{t,j}} - P_{\text{gt}_{t,j}} \right\| \end{align*}Fixed the frame $t$, you will calculate the MPJPE for the predicted pose and the ground truth. Steps:
def mpjpe_per_frame(sequences_predict, sequences_gt, frames_to_consider):
# Insert your code below
mpjpe_diz = {}
#print(f"predicted.shape: {sequences_predict.shape}")
#print(f"ground_true.shape : {sequences_gt.shape}")
for frame_index in frames_to_consider:
predicted_frame = sequences_predict[:, frame_index-1, :, :]
gt_frame = sequences_gt[:, frame_index-1, :, :]
joint_distances = torch.norm(predicted_frame - gt_frame, dim=1, p=1)
mpjpe_frame = torch.mean(joint_distances)
mpjpe_diz[frame_index] = mpjpe_frame.cpu()
return mpjpe_diz
def test_per_frame(ckpt_path=None):
model.load_state_dict(torch.load(ckpt_path))
print('\n ...model loaded \n')
model.eval()
accum_loss=0
n_batches=0 # number of batches for all the sequences
actions=define_actions(actions_to_consider_test)
dim_used = np.array([6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30, 31, 32, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45,
46, 47, 51, 52, 53, 54, 55, 56, 57, 58, 59, 63, 64, 65, 66, 67, 68,
75, 76, 77, 78, 79, 80, 81, 82, 83, 87, 88, 89, 90, 91, 92])
# joints at same loc
joint_to_ignore = np.array([16, 20, 23, 24, 28, 31])
index_to_ignore = np.concatenate((joint_to_ignore * 3, joint_to_ignore * 3 + 1, joint_to_ignore * 3 + 2))
joint_equal = np.array([13, 19, 22, 13, 27, 30])
index_to_equal = np.concatenate((joint_equal * 3, joint_equal * 3 + 1, joint_equal * 3 + 2))
totalll = 0
counter = 0
frames_to_consider = [5, 10, 15, 25]
for action in actions:
running_loss = 0
n = 0
dataset_test = datasets.Datasets(path,input_n,output_n,skip_rate, split=2,actions=[action])
#print('>>> test action for sequences: {:d}'.format(dataset_test.__len__()))
test_loader = DataLoader(dataset_test, batch_size=batch_size_test, shuffle=False, num_workers=0, pin_memory=True)
for cnt,batch in enumerate(test_loader):
with torch.no_grad():
batch = batch.to(device)
batch_dim = batch.shape[0]
n += batch_dim
all_joints_seq = batch.clone()[:, input_n:input_n+output_n,:]
sequences_train=batch[:, 0:input_n, dim_used].view(-1,input_n,len(dim_used)//3,3).permute(0,3,1,2)
sequences_gt=batch[:, input_n:input_n+output_n, :]
running_time = time.time()
#sequences_predict = model(sequences_train)
sequences_predict = model(sequences_train).view(-1, output_n, joints_to_consider, 3)
totalll += time.time()-running_time
counter += 1
sequences_predict=sequences_predict.contiguous().view(-1,output_n,len(dim_used))
all_joints_seq[:,:,dim_used] = sequences_predict
all_joints_seq[:,:,index_to_ignore] = all_joints_seq[:,:,index_to_equal]
# Insert your code below.
# The function mpjpe_per_frame should return the loss for each frame in the sequence.
# (e.g. a dictionary with keys the frames and values the loss for each frame)
# Keep a tab of the running loss for each frame and the number of frames in the sequence.
frames_to_consider = [5, 10, 15, 25]
dict_loss = mpjpe_per_frame(all_joints_seq.view(-1, output_n, 32, 3),
sequences_gt.view(-1, output_n, 32, 3),
frames_to_consider)
loss = sum(dict_loss.values()) / n
running_loss += loss
# Insert your code below.
# Average the loss over all the frames in the sequence and print the results.
accum_loss +=running_loss
print(str(action),': ', str(np.round((running_loss/n).item(),1)))
n_batches+=n
print('\nAverage: '+str(np.round((accum_loss/n_batches).item(),1)))
print('Prediction time: ', np.round(totalll/counter, 5))
# Insert your code below where you want to load the model and test it.
# You need to specify the path to the model checkpoint file and call the test function.
ckpt_path = f'./checkpoints/h36m_3d_25frames_ckpt_epoch_{35}.pt'
test_per_frame(ckpt_path)
...model loaded walking : 8.2 eating : 8.1 smoking : 7.7 discussion : 12.0 directions : 9.5 greeting : 11.3 phoning : 10.4 posing : 15.9 purchases : 14.1 sitting : 11.6 sittingdown : 13.3 takingphoto : 10.7 waiting : 11.1 walkingdog : 15.5 walkingtogether : 7.6 Average: 11.1 Prediction time: 0.00881
In this exercise, you will explore the concept of an iterative mechanism and its adaptability when the model's output length changes. You will start with a model designed to produce 25 output frames but adapt it to generate only 10. The exercise will involve modifying and re-training the model for the new output length. During test time, the model will generate 10 frames and then use them as input to generate the successive 10 frames, and so on, until the desired number of frames is reached. In this case, you are asked to generate 25 frames.
The steps are as follows:
# # Arguments to setup the datasets
datas = 'h36m' # dataset name
path = './data/h3.6m/h3.6m/dataset'
input_n = 10 # number of frames to train on (default=10)
# Insert your code below
output_n = 10 # number of frames to predict on
input_dim = 3 # dimensions of the input coordinates(default=3)
skip_rate = 1 # skip rate of frames
joints_to_consider = 22
#FLAGS FOR THE TRAINING
mode = 'train' #choose either train or test mode
batch_size_test = 8
model_path_iterative = './checkpoints_iterative/' # path to the model checkpoint file
actions_to_consider_test = 'all' # actions to test on.
model_name = datas+'_3d_'+str(output_n)+'frames_ckpt' #the model name to save/load
#FLAGS FOR THE VISUALIZATION
actions_to_consider_viz = 'all' # actions to visualize
visualize_from = 'test'
n_viz = 2
# Load Data
print('Loading Train Dataset...')
dataset = datasets.Datasets(path,input_n,output_n,skip_rate, split=0)
print('Loading Validation Dataset...')
vald_dataset = datasets.Datasets(path,input_n,output_n,skip_rate, split=1)
#! Note: Ignore warning: "VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences"
Loading Train Dataset... Loading Validation Dataset...
batch_size=256
print('>>> Training dataset length: {:d}'.format(dataset.__len__()))
data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=0, pin_memory=True)#
print('>>> Validation dataset length: {:d}'.format(vald_dataset.__len__()))
vald_loader = DataLoader(vald_dataset, batch_size=batch_size, shuffle=True, num_workers=0, pin_memory=True)
>>> Training dataset length: 182327 >>> Validation dataset length: 28560
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device: %s'%device)
n_heads = 1
model = Model(num_joints=joints_to_consider,
num_frames=input_n, num_frames_out=output_n, num_heads=n_heads,
num_channels=3, kernel_size=[3,3], use_pes=True).to(device)
print('total number of parameters of the network is: '+str(sum(p.numel() for p in model.parameters() if p.requires_grad)))
Using device: cuda total number of parameters of the network is: 26694
# Arguments to setup the optimizer
lr=1e-01 # learning rate
use_scheduler=True # use MultiStepLR scheduler
milestones=[10,30] # the epochs after which the learning rate is adjusted by gamma
gamma=0.1 # gamma correction to the learning rate, after reaching the milestone epochs
weight_decay=1e-05 # weight decay (L2 penalty)
optimizer=optim.Adam(model.parameters(),lr=lr,weight_decay=weight_decay)
if use_scheduler:
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=milestones, gamma=gamma)
clip_grad=None # select max norm to clip gradients
# Argument for training
n_epochs = 41
log_step = 200
# Save the model and plot the loss.
# Change to True if you want to save the model and plot the loss
save_and_plot = True
# launch training with the new output_n
train(data_loader, vald_loader, path_to_save_model=model_path_iterative)
[Epoch: 1, Iteration: 1] validation loss: 78.042 [Epoch: 2, Iteration: 1] validation loss: 79.070 [Epoch: 3, Iteration: 1] validation loss: 68.852 [Epoch: 4, Iteration: 1] validation loss: 55.792 [Epoch: 5, Iteration: 1] validation loss: 52.656
[Epoch: 6, Iteration: 1] validation loss: 52.142 [Epoch: 7, Iteration: 1] validation loss: 58.568 [Epoch: 8, Iteration: 1] validation loss: 53.875 [Epoch: 9, Iteration: 1] validation loss: 58.187 [Epoch: 10, Iteration: 1] validation loss: 52.146
[Epoch: 11, Iteration: 1] validation loss: 46.532 [Epoch: 12, Iteration: 1] validation loss: 47.965 [Epoch: 13, Iteration: 1] validation loss: 47.104 [Epoch: 14, Iteration: 1] validation loss: 47.788 [Epoch: 15, Iteration: 1] validation loss: 46.178
[Epoch: 16, Iteration: 1] validation loss: 46.009 [Epoch: 17, Iteration: 1] validation loss: 49.399 [Epoch: 18, Iteration: 1] validation loss: 47.230 [Epoch: 19, Iteration: 1] validation loss: 45.784 [Epoch: 20, Iteration: 1] validation loss: 48.085
[Epoch: 21, Iteration: 1] validation loss: 44.867 [Epoch: 22, Iteration: 1] validation loss: 44.780 [Epoch: 23, Iteration: 1] validation loss: 47.583 [Epoch: 24, Iteration: 1] validation loss: 46.648 [Epoch: 25, Iteration: 1] validation loss: 44.380
[Epoch: 26, Iteration: 1] validation loss: 45.829 [Epoch: 27, Iteration: 1] validation loss: 47.521 [Epoch: 28, Iteration: 1] validation loss: 45.196 [Epoch: 29, Iteration: 1] validation loss: 47.350 [Epoch: 30, Iteration: 1] validation loss: 47.172
[Epoch: 31, Iteration: 1] validation loss: 45.537 [Epoch: 32, Iteration: 1] validation loss: 44.923 [Epoch: 33, Iteration: 1] validation loss: 45.862 [Epoch: 34, Iteration: 1] validation loss: 45.944 [Epoch: 35, Iteration: 1] validation loss: 44.304
[Epoch: 36, Iteration: 1] validation loss: 45.018 [Epoch: 37, Iteration: 1] validation loss: 46.367 [Epoch: 38, Iteration: 1] validation loss: 47.310 [Epoch: 39, Iteration: 1] validation loss: 45.780 [Epoch: 40, Iteration: 1] validation loss: 46.420
def test(ckpt_path=None):
model.load_state_dict(torch.load(ckpt_path))
print('...model loaded \n')
model.eval()
accum_loss=0
n_batches=0 # number of batches for all the sequences
actions=define_actions(actions_to_consider_test)
dim_used = np.array([6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30, 31, 32, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45,
46, 47, 51, 52, 53, 54, 55, 56, 57, 58, 59, 63, 64, 65, 66, 67, 68,
75, 76, 77, 78, 79, 80, 81, 82, 83, 87, 88, 89, 90, 91, 92])
# joints at same loc
joint_to_ignore = np.array([16, 20, 23, 24, 28, 31])
index_to_ignore = np.concatenate((joint_to_ignore * 3, joint_to_ignore * 3 + 1, joint_to_ignore * 3 + 2))
joint_equal = np.array([13, 19, 22, 13, 27, 30])
index_to_equal = np.concatenate((joint_equal * 3, joint_equal * 3 + 1, joint_equal * 3 + 2))
totalll=0
counter=0
for action in actions:
running_loss=0
n=0
dataset_test = datasets.Datasets(path,input_n,35, skip_rate, split=2,actions=[action])
test_loader = DataLoader(dataset_test, batch_size=batch_size_test, shuffle=False, num_workers=0, pin_memory=True)
for cnt,batch in enumerate(test_loader):
with torch.no_grad():
batch=batch.to(device)
batch_dim=batch.shape[0]
n+=batch_dim
all_joints_seq=batch.clone()[:, input_n:input_n+output_n+15,:]
running_time = time.time()
sequences_train=batch[:, 0:input_n, dim_used].view(-1,input_n,len(dim_used)//3,3).permute(0,3,1,2)
sequences_gt=batch[:, input_n:input_n+output_n+15, :]
sequences_predict = model(sequences_train).view(-1, output_n, joints_to_consider, 3)
#sequences_predict = model(sequences_train)
# Insert your code below. You will need to iteratively predict the next frames
# and feed it to back to the model until you reach the desired number of frames.
input_frames = sequences_predict[:, -10:, :, :].permute(0,3,1,2)
new_frames = model(input_frames).view(-1, output_n, joints_to_consider, 3)
sequences_predict = torch.cat((sequences_predict, new_frames), dim=1)
input_frames = sequences_predict[:, -10:, :, :].permute(0,3,1,2)
new_frames = model(input_frames).view(-1, output_n, joints_to_consider, 3)
sequences_predict = torch.cat((sequences_predict, new_frames), dim=1)
sequences_predict = sequences_predict[:, :-5, :, :]
sequences_predict=sequences_predict.contiguous().view(-1,25,len(dim_used))
all_joints_seq[:,:,dim_used] = sequences_predict
all_joints_seq[:,:,index_to_ignore] = all_joints_seq[:,:,index_to_equal]
loss= mpjpe_error(all_joints_seq.view(-1,25,32,3),sequences_gt.view(-1,25,32,3))
totalll += time.time()-running_time
counter += 1
running_loss+=loss*batch_dim
accum_loss+=loss*batch_dim
print(str(action),': ', str(np.round((running_loss/n).item(),1)))
n_batches+=n
print('\nAverage: '+str(np.round((accum_loss/n_batches).item(),1)))
print('Prediction time: ', totalll/counter)
# Insert your code below where you want to load the model and test it.
# You need to specify the path to the model checkpoint file and call the test function.
ckpt_path = model_path_iterative + model_name + f"_epoch_{35}.pt"
test(ckpt_path = ckpt_path )
...model loaded walking : 76.8 eating : 63.6 smoking : 62.2 discussion : 88.1 directions : 76.7 greeting : 103.5 phoning : 77.0 posing : 113.6 purchases : 99.6 sitting : 85.5 sittingdown : 107.0 takingphoto : 82.4 waiting : 81.2 walkingdog : 110.0 walkingtogether : 68.2 Average: 86.4 Prediction time: 0.026681232949097952
In this exercise, you will implement a Transformer-like network (based on the Theory notebook) for this specific task. You can use the Transformer's Encoder and implement your own Decoder to predict future poses. (e.g. RNN, MLP, CNN, TCN, ...). We won't provide any code for this exercise, but you can use the code provided in the Theory notebook as a starting point. The goal of this exercise is not to beat the previous model but to understand how to implement a Transformer network for this specific task. For this reason, the evaluation will be based on the code you write and the explanation you provide in the report rather than the results.
# Input Shape: [batch_size, input_time, joints, 3]
#
# Encoder:
# Input shape: [batch_size, input_time, joints, 3]
# Output shape: [batch_size, input_time/output_time, joints, FREE]
#
# # Decoder:
# Input shape: [batch_size, input_time/output_time, joints, FREE]
# Output shape: [batch_size, output_time, joints, 3]
#
#
# Hint: Transformers often take an input of shape [batch_size, time, joints*channels],
# use the reshape or view function to match the dimensionality.
from transformer import Transformer, subsequent_mask, transformer_inputs, train
A Transformer Network based on the Theory_Notebook, where no changes have been made to either the Encoder or the Decoder.
input_n = 10 # number of frames to train on (default=10)
output_n = 25 # number of frames to predict on
skip_rate = 1 # skip rate of frames
batch_size=256
path = './data/h3.6m/h3.6m/dataset'
c_model_path = './checkpoints_transformer/' # path to the model checkpoint file
print('Loading Train Dataset...')
dataset = datasets.Datasets(path,input_n,output_n,skip_rate, split=0)
print('Loading Validation Dataset...\n')
vald_dataset = datasets.Datasets(path,input_n,output_n,skip_rate, split=1)
print('>>> Training dataset length: {:d}'.format(dataset.__len__()))
data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=0, pin_memory=True)#
print('>>> Validation dataset length: {:d}'.format(vald_dataset.__len__()))
vald_loader = DataLoader(vald_dataset, batch_size=batch_size, shuffle=True, num_workers=0, pin_memory=True)
Loading Train Dataset... Loading Validation Dataset... >>> Training dataset length: 180077 >>> Validation dataset length: 28110
torch.manual_seed(0)
num_heads = 4
d_model = 512
dim_feedforward = 1024
dropout = 0.4
coder_blocks = 3
tf = Transformer(enc_inp_size = 33, dec_inp_size=34, dec_out_size=66, N=coder_blocks,
d_model=d_model, dim_feedforward=dim_feedforward,
num_heads=num_heads, dropout=dropout).to(device)
# Arguments to setup the optimizer
lr = 3e-05 # learning rate
use_scheduler = True # use MultiStepLR scheduler
milestones = [10, 30] # the epochs after which the learning rate is adjusted by gamma
gamma = 0.1 # gamma correction to the learning rate, after reaching the milestone epochs
weight_decay = 1e-02 # weight decay (L2 penalty)
optimizer = optim.Adam(tf.parameters(), lr = lr, weight_decay = weight_decay)
if use_scheduler:
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=milestones, gamma=gamma)
epoches = 15
train(data_loader,vald_loader, epoches, tf, scheduler, optimizer, device, c_model_path)
[Epoch: 1, Iteration: 1] training loss: 592.388 [Epoch: 1, Iteration: 201] training loss: 412.256 [Epoch: 1, Iteration: 401] training loss: 345.014 [Epoch: 1, Iteration: 601] training loss: 314.546 [Epoch: 1, Iteration: 1] validation loss: 269.299 [Epoch: 2, Iteration: 1] training loss: 300.865 [Epoch: 2, Iteration: 201] training loss: 280.634 [Epoch: 2, Iteration: 401] training loss: 271.422 [Epoch: 2, Iteration: 601] training loss: 260.721 [Epoch: 2, Iteration: 1] validation loss: 234.159 [Epoch: 3, Iteration: 1] training loss: 250.014 [Epoch: 3, Iteration: 201] training loss: 248.412 [Epoch: 3, Iteration: 401] training loss: 239.881 [Epoch: 3, Iteration: 601] training loss: 235.701 [Epoch: 3, Iteration: 1] validation loss: 216.836 [Epoch: 4, Iteration: 1] training loss: 231.950 [Epoch: 4, Iteration: 201] training loss: 222.771 [Epoch: 4, Iteration: 401] training loss: 221.679 [Epoch: 4, Iteration: 601] training loss: 220.523 [Epoch: 4, Iteration: 1] validation loss: 211.344 [Epoch: 5, Iteration: 1] training loss: 215.781 [Epoch: 5, Iteration: 201] training loss: 213.013 [Epoch: 5, Iteration: 401] training loss: 209.622 [Epoch: 5, Iteration: 601] training loss: 210.208 [Epoch: 5, Iteration: 1] validation loss: 210.057
[Epoch: 6, Iteration: 1] training loss: 207.745 [Epoch: 6, Iteration: 201] training loss: 209.919 [Epoch: 6, Iteration: 401] training loss: 208.190 [Epoch: 6, Iteration: 601] training loss: 208.202 [Epoch: 6, Iteration: 1] validation loss: 210.265 [Epoch: 7, Iteration: 1] training loss: 214.177 [Epoch: 7, Iteration: 201] training loss: 207.403 [Epoch: 7, Iteration: 401] training loss: 217.282 [Epoch: 7, Iteration: 601] training loss: 214.453 [Epoch: 7, Iteration: 1] validation loss: 214.335 [Epoch: 8, Iteration: 1] training loss: 221.027 [Epoch: 8, Iteration: 201] training loss: 217.222 [Epoch: 8, Iteration: 401] training loss: 223.953 [Epoch: 8, Iteration: 601] training loss: 222.582 [Epoch: 8, Iteration: 1] validation loss: 226.435 [Epoch: 9, Iteration: 1] training loss: 220.888 [Epoch: 9, Iteration: 201] training loss: 218.973 [Epoch: 9, Iteration: 401] training loss: 226.022 [Epoch: 9, Iteration: 601] training loss: 227.549 [Epoch: 9, Iteration: 1] validation loss: 230.278 [Epoch: 10, Iteration: 1] training loss: 226.940 [Epoch: 10, Iteration: 201] training loss: 228.189 [Epoch: 10, Iteration: 401] training loss: 228.844 [Epoch: 10, Iteration: 601] training loss: 231.440 [Epoch: 10, Iteration: 1] validation loss: 227.601
[Epoch: 11, Iteration: 1] training loss: 229.561 [Epoch: 11, Iteration: 201] training loss: 236.309 [Epoch: 11, Iteration: 401] training loss: 231.966 [Epoch: 11, Iteration: 601] training loss: 234.909 [Epoch: 11, Iteration: 1] validation loss: 223.356 [Epoch: 12, Iteration: 1] training loss: 230.088 [Epoch: 12, Iteration: 201] training loss: 229.070 [Epoch: 12, Iteration: 401] training loss: 234.251 [Epoch: 12, Iteration: 601] training loss: 226.336 [Epoch: 12, Iteration: 1] validation loss: 224.393 [Epoch: 13, Iteration: 1] training loss: 227.723 [Epoch: 13, Iteration: 201] training loss: 229.079 [Epoch: 13, Iteration: 401] training loss: 229.245 [Epoch: 13, Iteration: 601] training loss: 228.326 [Epoch: 13, Iteration: 1] validation loss: 224.372 [Epoch: 14, Iteration: 1] training loss: 231.456 [Epoch: 14, Iteration: 201] training loss: 225.505 [Epoch: 14, Iteration: 401] training loss: 226.294 [Epoch: 14, Iteration: 601] training loss: 230.280 [Epoch: 14, Iteration: 1] validation loss: 224.972
!zip -r -q /content/images_dir.zip /content/AdavancedML/Assignment_3/Practice/images_dir