The TIMIT corpus of read speech is designed to provide speech data for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems.
This homework is a multiclass classification task: we will train a deep neural network classifier to predict the phoneme for each frame of speech from the TIMIT corpus.
link: https://academictorrents.com/details/34e2b78745138186976cbc27939b1b34d18bd5b3
Download the data from Google Drive, then unzip it.
You should have timit_11/train_11.npy, timit_11/train_label_11.npy, and timit_11/test_11.npy after running this block.
timit_11/
  train_11.npy: training data
  train_label_11.npy: training labels
  test_11.npy: testing data
Note: if the Google Drive link is dead, you can download the data directly from Kaggle and upload it to the workspace.
!gdown --id '1HPkcmQmFGu-3OknddKIa5dNDsR05lIQR' --output data.zip
!unzip data.zip
!ls
/usr/local/lib/python3.7/dist-packages/gdown/cli.py:131: FutureWarning: Option `--id` was deprecated in version 4.3.1 and will be removed in 5.0. You don't need to pass it anymore to use a file ID.
  category=FutureWarning,
Downloading...
From: https://drive.google.com/uc?id=1HPkcmQmFGu-3OknddKIa5dNDsR05lIQR
To: /content/data.zip
100% 372M/372M [00:03<00:00, 95.5MB/s]
Archive:  data.zip
   creating: timit_11/
  inflating: timit_11/train_11.npy
  inflating: timit_11/test_11.npy
  inflating: timit_11/train_label_11.npy
data.zip  sample_data  timit_11
Load the training and testing data from the .npy files (NumPy arrays).
import numpy as np
print('Loading data ...')
data_root='./timit_11/'
# read data from .npy file using np.load
train = np.load(data_root + 'train_11.npy')
train_label = np.load(data_root + 'train_label_11.npy')
test = np.load(data_root + 'test_11.npy')
print('Size of training data: {}'.format(train.shape))
print('Size of testing data: {}'.format(test.shape))
Loading data ...
Size of training data: (1229932, 429)
Size of testing data: (451552, 429)
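Each 429-dimensional row corresponds to a window of 11 consecutive frames with 39 acoustic features each (11 × 39 = 429). If that layout assumption holds, a row can be viewed as an (11, 39) window for inspection; a minimal sketch:
# Hedged sketch: assumes each row concatenates 11 frames of 39 features (11 * 39 = 429)
frame_window = train[0].reshape(11, 39)  # (frames, features) view of one sample
print(frame_window.shape)  # expected: (11, 39)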
import torch
from torch.utils.data import Dataset

class TIMITDataset(Dataset):
    def __init__(self, X, y=None):
        self.data = torch.from_numpy(X).float()
        if y is not None:
            y = y.astype(int)  # `np.int` is deprecated since NumPy 1.20; the builtin `int` behaves identically here
            self.label = torch.LongTensor(y)
        else:
            self.label = None

    def __getitem__(self, idx):
        if self.label is not None:
            return self.data[idx], self.label[idx]
        else:
            return self.data[idx]

    def __len__(self):
        return len(self.data)
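A quick sanity check of the dataset wrapper, using small dummy arrays (purely illustrative, not part of the pipeline):
# Hypothetical dummy arrays, just to exercise the dataset interface
dummy_x = np.zeros((4, 429), dtype=np.float32)
dummy_y = np.array([0, 1, 2, 3])
demo_set = TIMITDataset(dummy_x, dummy_y)
features, label = demo_set[0]
print(len(demo_set), features.shape, label)  # 4 torch.Size([429]) tensor(0)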
Split the labeled data into a training set and a validation set; you can modify the variable VAL_RATIO to change the proportion of validation data.
VAL_RATIO = 0.2
percent = int(train.shape[0] * (1 - VAL_RATIO)) # pivot of train data and dev data
train_x, train_y, val_x, val_y = train[:percent], train_label[:percent], train[percent:], train_label[percent:]
print('Size of training set: {}'.format(train_x.shape))
print('Size of validation set: {}'.format(val_x.shape))
Size of training set: (983945, 429)
Size of validation set: (245987, 429)
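Note that this split simply cuts the arrays at a pivot index, so the validation set is the tail of the data. If the frames are ordered (e.g., grouped by utterance), a shuffled split may give a more representative validation set; a minimal sketch, assuming frame order carries no structure you need to preserve:
# Hedged alternative: shuffle indices before splitting
rng = np.random.default_rng(0)
indices = rng.permutation(train.shape[0])
train_idx, val_idx = indices[:percent], indices[percent:]
train_x, train_y = train[train_idx], train_label[train_idx]
val_x, val_y = train[val_idx], train_label[val_idx]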
Create a data loader from the dataset; feel free to tweak the variable BATCH_SIZE here.
BATCH_SIZE = 64
from torch.utils.data import DataLoader
train_set = TIMITDataset(train_x, train_y)
val_set = TIMITDataset(val_x, val_y)
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True) # only shuffle the training data
val_loader = DataLoader(val_set, batch_size=BATCH_SIZE, shuffle=False)
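If data loading becomes a bottleneck, DataLoader supports worker subprocesses and pinned memory. A sketch of optional tweaks (the values here are illustrative, not tuned):
# Optional, illustrative settings: num_workers spawns loader subprocesses;
# pin_memory speeds up host-to-GPU copies when training on CUDA.
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True,
                          num_workers=2, pin_memory=True)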
Clean up the unneeded variables to save memory.
Notes: if you need these variables later, you may remove this block or clean up the unneeded variables later. The data is quite large, so be aware of memory usage in Colab.
del train, train_label, train_x, train_y, val_x, val_y

# Garbage collector
import gc
# gc.collect(generation=2): runs a full collection over generations 0-2 of the
# generational collector, freeing unreachable objects via mark-and-sweep.
gc.collect()  # collecting right after `del` ensures the memory they referenced is reclaimed immediately, freeing RAM
153
Define the model architecture; you are encouraged to change and experiment with it.
import torch
import torch.nn as nn

class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()
        self.layer1 = nn.Linear(429, 1024)
        self.layer2 = nn.Linear(1024, 512)
        self.layer3 = nn.Linear(512, 128)
        self.out = nn.Linear(128, 39)
        self.act_fn = nn.Sigmoid()

    def forward(self, x):
        x = self.layer1(x)
        x = self.act_fn(x)
        x = self.layer2(x)
        x = self.act_fn(x)
        x = self.layer3(x)
        x = self.act_fn(x)
        x = self.out(x)
        return x
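Since you are encouraged to experiment with the architecture, here is one hedged variant (not the reference solution): swapping Sigmoid for ReLU and adding BatchNorm and Dropout, a combination that often trains faster and generalizes better for deep MLPs.
# Illustrative variant, not the assignment's reference architecture
class DeeperClassifier(nn.Module):
    def __init__(self, dropout=0.25):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(429, 1024), nn.BatchNorm1d(1024), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(1024, 512), nn.BatchNorm1d(512), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(512, 128), nn.BatchNorm1d(128), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(128, 39),
        )

    def forward(self, x):
        return self.net(x)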
# check device
def get_device():
    return 'cuda' if torch.cuda.is_available() else 'cpu'
Fix random seeds for reproducibility.
# fix random seed
def same_seeds(seed):
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    np.random.seed(seed)
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
Feel free to change the training parameters here.
# fix random seed for reproducibility
same_seeds(0)
# get device
device = get_device()
print(f'DEVICE: {device}')
# training parameters
num_epoch = 20 # number of training epoch
learning_rate = 0.0001 # learning rate
# the path where checkpoint saved
model_path = './model.ckpt'
# create model, define a loss function, and optimizer
model = Classifier().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
DEVICE: cuda
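Two common tweaks worth trying here (hedged suggestions, not part of the original recipe): weight decay for regularization, and a learning-rate scheduler to decay the step size over time.
# Illustrative alternatives; the values are untuned assumptions
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)
# ...then call scheduler.step() once per epoch inside the training loop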
# start training
best_acc = 0.0
for epoch in range(num_epoch):
    train_acc = 0.0
    train_loss = 0.0
    val_acc = 0.0
    val_loss = 0.0

    # training
    model.train()  # set the model to training mode
    for i, data in enumerate(train_loader):
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        batch_loss = criterion(outputs, labels)
        # torch.max(input, dim): returns a namedtuple (values, indices), where values holds the maximum of each row of the input tensor along dimension dim.
        _, train_pred = torch.max(outputs, 1)  # get the index of the class with the highest probability
        batch_loss.backward()
        optimizer.step()

        # Tensor.cpu(): returns a copy of the tensor in CPU memory.
        # ==: element-wise comparison, 1 where equal and 0 where not (equivalent to Tensor.eq()).
        # Tensor.sum(): returns the sum of all elements in the tensor.
        # Tensor.item(): returns the value of a one-element tensor as a standard Python number.
        train_acc += (train_pred.cpu() == labels.cpu()).sum().item()  # 2 tensors -> 1 tensor -> 1 one-element tensor -> 1 number
        train_loss += batch_loss.item()

    # validation
    if len(val_set) > 0:
        model.eval()  # set the model to evaluation mode
        with torch.no_grad():
            for i, data in enumerate(val_loader):
                inputs, labels = data
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = model(inputs)
                batch_loss = criterion(outputs, labels)
                _, val_pred = torch.max(outputs, 1)  # get the index of the class with the highest probability
                val_acc += (val_pred.cpu() == labels.cpu()).sum().item()
                val_loss += batch_loss.item()

            print('[{:03d}/{:03d}] Train Acc: {:3.6f} Loss: {:3.6f} | Val Acc: {:3.6f} loss: {:3.6f}'.format(
                epoch + 1, num_epoch, train_acc/len(train_set), train_loss/len(train_loader), val_acc/len(val_set), val_loss/len(val_loader)
            ))

            # if the model improves, save a checkpoint at this epoch
            if val_acc > best_acc:
                best_acc = val_acc
                torch.save(model.state_dict(), model_path)
                print('saving model with acc {:.3f}'.format(best_acc/len(val_set)))
    else:
        print('[{:03d}/{:03d}] Train Acc: {:3.6f} Loss: {:3.6f}'.format(
            epoch + 1, num_epoch, train_acc/len(train_set), train_loss/len(train_loader)
        ))

# if not validating, save the last epoch
if len(val_set) == 0:
    torch.save(model.state_dict(), model_path)
    print('saving model at last epoch')
[001/020] Train Acc: 0.467302 Loss: 1.811661 | Val Acc: 0.567428 loss: 1.433065
saving model with acc 0.567
[002/020] Train Acc: 0.594383 Loss: 1.330666 | Val Acc: 0.628639 loss: 1.211098
saving model with acc 0.629
[003/020] Train Acc: 0.644506 Loss: 1.154064 | Val Acc: 0.660421 loss: 1.101216
saving model with acc 0.660
[004/020] Train Acc: 0.672216 Loss: 1.052246 | Val Acc: 0.676300 loss: 1.038718
saving model with acc 0.676
[005/020] Train Acc: 0.691347 Loss: 0.983104 | Val Acc: 0.685154 loss: 1.001852
saving model with acc 0.685
[006/020] Train Acc: 0.705615 Loss: 0.931955 | Val Acc: 0.689301 loss: 0.984177
saving model with acc 0.689
[007/020] Train Acc: 0.716344 Loss: 0.891687 | Val Acc: 0.694516 loss: 0.964627
saving model with acc 0.695
[008/020] Train Acc: 0.725881 Loss: 0.857907 | Val Acc: 0.697720 loss: 0.951889
saving model with acc 0.698
[009/020] Train Acc: 0.733717 Loss: 0.829495 | Val Acc: 0.696691 loss: 0.949866
[010/020] Train Acc: 0.741151 Loss: 0.803701 | Val Acc: 0.699374 loss: 0.944832
saving model with acc 0.699
[011/020] Train Acc: 0.748049 Loss: 0.781106 | Val Acc: 0.697773 loss: 0.946494
[012/020] Train Acc: 0.753793 Loss: 0.760380 | Val Acc: 0.702830 loss: 0.938236
saving model with acc 0.703
[013/020] Train Acc: 0.759404 Loss: 0.741234 | Val Acc: 0.700452 loss: 0.945627
[014/020] Train Acc: 0.764573 Loss: 0.723574 | Val Acc: 0.702159 loss: 0.942118
[015/020] Train Acc: 0.769470 Loss: 0.707325 | Val Acc: 0.704432 loss: 0.936154
saving model with acc 0.704
[016/020] Train Acc: 0.773687 Loss: 0.691314 | Val Acc: 0.701736 loss: 0.945713
[017/020] Train Acc: 0.778676 Loss: 0.676633 | Val Acc: 0.701586 loss: 0.953081
[018/020] Train Acc: 0.783106 Loss: 0.662425 | Val Acc: 0.699667 loss: 0.963290
[019/020] Train Acc: 0.786395 Loss: 0.649180 | Val Acc: 0.700082 loss: 0.957681
[020/020] Train Acc: 0.790643 Loss: 0.636623 | Val Acc: 0.699732 loss: 0.964269
Create a testing dataset, and load the model from the saved checkpoint.
# create testing dataset
test_set = TIMITDataset(test, None)
test_loader = DataLoader(test_set, batch_size=BATCH_SIZE, shuffle=False)
# create model and load weights from checkpoint
model = Classifier().to(device)
model.load_state_dict(torch.load(model_path))
<All keys matched successfully>
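If the checkpoint was saved on GPU but inference later runs on a CPU-only machine, torch.load accepts a map_location argument to remap the stored tensors; a minimal sketch:
# Remap GPU-saved tensors to CPU at load time (useful on CPU-only machines)
model.load_state_dict(torch.load(model_path, map_location='cpu'))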
Make predictions.
predict = []
model.eval()  # set the model to evaluation mode
with torch.no_grad():
    for i, data in enumerate(test_loader):
        inputs = data
        inputs = inputs.to(device)
        outputs = model(inputs)
        _, test_pred = torch.max(outputs, 1)  # get the index of the class with the highest probability
        for y in test_pred.cpu().numpy():
            predict.append(y)
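Appending one element at a time works; an equivalent variant (sketch) collects whole batches and concatenates once, which avoids the per-element Python loop:
# Equivalent batched collection (illustrative)
batch_preds = []
with torch.no_grad():
    for data in test_loader:
        outputs = model(data.to(device))
        batch_preds.append(outputs.argmax(dim=1).cpu().numpy())
predict = np.concatenate(batch_preds)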
Write the predictions to a CSV file.
After running this block, download the file prediction.csv from the Files section on the left-hand side and submit it to Kaggle.
with open('prediction.csv', 'w') as f:
    f.write('Id,Class\n')
    for i, y in enumerate(predict):
        f.write('{},{}\n'.format(i, y))
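An equivalent one-liner with pandas, if you prefer (pandas is assumed to be available, as it is in Colab by default):
import pandas as pd
# Same Id,Class layout as the manual loop above
pd.DataFrame({'Id': range(len(predict)), 'Class': predict}).to_csv('prediction.csv', index=False)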