The TIMIT corpus of read speech is designed to provide speech data for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems.
This homework is a multiclass classification task: we will train a deep neural network classifier to predict the phoneme for each frame of speech from the TIMIT corpus.
link: https://academictorrents.com/details/34e2b78745138186976cbc27939b1b34d18bd5b3
Download the data from Google Drive, then unzip it.
You should have timit_11/train_11.npy, timit_11/train_label_11.npy, and timit_11/test_11.npy after running this block.
timit_11/
  train_11.npy: training data
  train_label_11.npy: training labels
  test_11.npy: testing data
Note: if the Google Drive link is dead, you can download the data directly from Kaggle and upload it to the workspace.
!gdown --id '1HPkcmQmFGu-3OknddKIa5dNDsR05lIQR' --output data.zip
!unzip data.zip
!ls
/usr/local/lib/python3.7/dist-packages/gdown/cli.py:131: FutureWarning: Option `--id` was deprecated in version 4.3.1 and will be removed in 5.0. You don't need to pass it anymore to use a file ID.
  category=FutureWarning,
Downloading...
From: https://drive.google.com/uc?id=1HPkcmQmFGu-3OknddKIa5dNDsR05lIQR
To: /content/data.zip
100% 372M/372M [00:03<00:00, 95.5MB/s]
Archive:  data.zip
   creating: timit_11/
  inflating: timit_11/train_11.npy
  inflating: timit_11/test_11.npy
  inflating: timit_11/train_label_11.npy
data.zip  sample_data  timit_11
Load the training and testing data from the .npy files (NumPy arrays).
import numpy as np
print('Loading data ...')
data_root='./timit_11/'
# read data from .npy file using np.load
train = np.load(data_root + 'train_11.npy')
train_label = np.load(data_root + 'train_label_11.npy')
test = np.load(data_root + 'test_11.npy')
print('Size of training data: {}'.format(train.shape))
print('Size of testing data: {}'.format(test.shape))
Loading data ...
Size of training data: (1229932, 429)
Size of testing data: (451552, 429)
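Each 429-dimensional row corresponds to a window of 11 consecutive frames with 39 acoustic features each (11 × 39 = 429). If that layout assumption holds, a row can be viewed as an (11, 39) window for inspection; a minimal sketch:
# Hedged sketch: assumes each row concatenates 11 frames of 39 features (11 * 39 = 429)
frame_window = train[0].reshape(11, 39)  # (frames, features) view of one sample
print(frame_window.shape)  # expected: (11, 39)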
import torch
from torch.utils.data import Dataset

class TIMITDataset(Dataset):
    def __init__(self, X, y=None):
        self.data = torch.from_numpy(X).float()
        if y is not None:
            y = y.astype(int)  # `np.int` is deprecated since NumPy 1.20; the builtin `int` behaves identically here
            self.label = torch.LongTensor(y)
        else:
            self.label = None

    def __getitem__(self, idx):
        if self.label is not None:
            return self.data[idx], self.label[idx]
        else:
            return self.data[idx]

    def __len__(self):
        return len(self.data)
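A quick sanity check of the dataset wrapper, using small dummy arrays (purely illustrative, not part of the pipeline):
# Hypothetical dummy arrays, just to exercise the dataset interface
dummy_x = np.zeros((4, 429), dtype=np.float32)
dummy_y = np.array([0, 1, 2, 3])
demo_set = TIMITDataset(dummy_x, dummy_y)
features, label = demo_set[0]
print(len(demo_set), features.shape, label)  # 4 torch.Size([429]) tensor(0)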
Split the labeled data into a training set and a validation set; you can modify the variable VAL_RATIO to change the proportion of validation data.
VAL_RATIO = 0.2
percent = int(train.shape[0] * (1 - VAL_RATIO)) # pivot of train data and dev data
train_x, train_y, val_x, val_y = train[:percent], train_label[:percent], train[percent:], train_label[percent:]
print('Size of training set: {}'.format(train_x.shape))
print('Size of validation set: {}'.format(val_x.shape))
Size of training set: (983945, 429)
Size of validation set: (245987, 429)
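Note that this split simply cuts the arrays at a pivot index, so the validation set is the tail of the data. If the frames are ordered (e.g., grouped by utterance), a shuffled split may give a more representative validation set; a minimal sketch, assuming frame order carries no structure you need to preserve:
# Hedged alternative: shuffle indices before splitting
rng = np.random.default_rng(0)
indices = rng.permutation(train.shape[0])
train_idx, val_idx = indices[:percent], indices[percent:]
train_x, train_y = train[train_idx], train_label[train_idx]
val_x, val_y = train[val_idx], train_label[val_idx]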
Create a data loader from the dataset; feel free to tweak the variable BATCH_SIZE here.
BATCH_SIZE = 64
from torch.utils.data import DataLoader
train_set = TIMITDataset(train_x, train_y)
val_set = TIMITDataset(val_x, val_y)
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True) # only shuffle the training data
val_loader = DataLoader(val_set, batch_size=BATCH_SIZE, shuffle=False)
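If data loading becomes a bottleneck, DataLoader supports worker subprocesses and pinned memory. A sketch of optional tweaks (the values here are illustrative, not tuned):
# Optional, illustrative settings: num_workers spawns loader subprocesses;
# pin_memory speeds up host-to-GPU copies when training on CUDA.
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True,
                          num_workers=2, pin_memory=True)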
Clean up the unneeded variables to save memory.
Notes: if you need these variables later, you may remove this block or clean up the unneeded variables later. The data is quite large, so be aware of memory usage in Colab.
del train, train_label, train_x, train_y, val_x, val_y

# Garbage collector
import gc
# gc.collect(generation=2): runs a full collection over generations 0-2 of the
# generational collector, freeing unreachable objects via mark-and-sweep.
gc.collect()  # collecting right after `del` ensures the memory they referenced is reclaimed immediately, freeing RAM
153
Define the model architecture; you are encouraged to change and experiment with it.
import torch
import torch.nn as nn

class Classifier(nn.Module):
    def __init__(self):
        super(Classifier, self).__init__()
        self.layer1 = nn.Linear(429, 1024)
        self.layer2 = nn.Linear(1024, 512)
        self.layer3 = nn.Linear(512, 128)
        self.out = nn.Linear(128, 39)
        self.act_fn = nn.Sigmoid()

    def forward(self, x):
        x = self.layer1(x)
        x = self.act_fn(x)
        x = self.layer2(x)
        x = self.act_fn(x)
        x = self.layer3(x)
        x = self.act_fn(x)
        x = self.out(x)
        return x
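Since you are encouraged to experiment with the architecture, here is one hedged variant (not the reference solution): swapping Sigmoid for ReLU and adding BatchNorm and Dropout, a combination that often trains faster and generalizes better for deep MLPs.
# Illustrative variant, not the assignment's reference architecture
class DeeperClassifier(nn.Module):
    def __init__(self, dropout=0.25):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(429, 1024), nn.BatchNorm1d(1024), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(1024, 512), nn.BatchNorm1d(512), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(512, 128), nn.BatchNorm1d(128), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(128, 39),
        )

    def forward(self, x):
        return self.net(x)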
# check device
def get_device():
    return 'cuda' if torch.cuda.is_available() else 'cpu'
Fix random seeds for reproducibility.
# fix random seed
def same_seeds(seed):
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    np.random.seed(seed)
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
Feel free to change the training parameters here.
# fix random seed for reproducibility
same_seeds(0)
# get device
device = get_device()
print(f'DEVICE: {device}')
# training parameters
num_epoch = 20 # number of training epoch
learning_rate = 0.0001 # learning rate
# the path where checkpoint saved
model_path = './model.ckpt'
# create model, define a loss function, and optimizer
model = Classifier().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
DEVICE: cuda
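Two common tweaks worth trying here (hedged suggestions, not part of the original recipe): weight decay for regularization, and a learning-rate scheduler to decay the step size over time.
# Illustrative alternatives; the values are untuned assumptions
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)
# ...then call scheduler.step() once per epoch inside the training loop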
# start training
best_acc = 0.0
for epoch in range(num_epoch):
    train_acc = 0.0
    train_loss = 0.0
    val_acc = 0.0
    val_loss = 0.0

    # training
    model.train()  # set the model to training mode
    for i, data in enumerate(train_loader):
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        batch_loss = criterion(outputs, labels)
        # torch.max(input, dim): returns a namedtuple (values, indices), where values holds the maximum of each row of the input tensor along dimension dim.
        _, train_pred = torch.max(outputs, 1)  # get the index of the class with the highest probability
        batch_loss.backward()
        optimizer.step()

        # Tensor.cpu(): returns a copy of the tensor in CPU memory.
        # ==: element-wise comparison, 1 where equal and 0 where not (equivalent to Tensor.eq()).
        # Tensor.sum(): returns the sum of all elements in the tensor.
        # Tensor.item(): returns the value of a one-element tensor as a standard Python number.
        train_acc += (train_pred.cpu() == labels.cpu()).sum().item()  # 2 tensors -> 1 tensor -> 1 one-element tensor -> 1 number
        train_loss += batch_loss.item()

    # validation
    if len(val_set) > 0:
        model.eval()  # set the model to evaluation mode
        with torch.no_grad():
            for i, data in enumerate(val_loader):
                inputs, labels = data
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = model(inputs)
                batch_loss = criterion(outputs, labels)
                _, val_pred = torch.max(outputs, 1)  # get the index of the class with the highest probability
                val_acc += (val_pred.cpu() == labels.cpu()).sum().item()
                val_loss += batch_loss.item()

            print('[{:03d}/{:03d}] Train Acc: {:3.6f} Loss: {:3.6f} | Val Acc: {:3.6f} loss: {:3.6f}'.format(
                epoch + 1, num_epoch, train_acc/len(train_set), train_loss/len(train_loader), val_acc/len(val_set), val_loss/len(val_loader)
            ))

            # if the model improves, save a checkpoint at this epoch
            if val_acc > best_acc:
                best_acc = val_acc
                torch.save(model.state_dict(), model_path)
                print('saving model with acc {:.3f}'.format(best_acc/len(val_set)))
    else:
        print('[{:03d}/{:03d}] Train Acc: {:3.6f} Loss: {:3.6f}'.format(
            epoch + 1, num_epoch, train_acc/len(train_set), train_loss/len(train_loader)
        ))

# if not validating, save the last epoch
if len(val_set) == 0:
    torch.save(model.state_dict(), model_path)
    print('saving model at last epoch')
[001/020] Train Acc: 0.467302 Loss: 1.811661 | Val Acc: 0.567428 loss: 1.433065
saving model with acc 0.567
[002/020] Train Acc: 0.594383 Loss: 1.330666 | Val Acc: 0.628639 loss: 1.211098
saving model with acc 0.629
[003/020] Train Acc: 0.644506 Loss: 1.154064 | Val Acc: 0.660421 loss: 1.101216
saving model with acc 0.660
[004/020] Train Acc: 0.672216 Loss: 1.052246 | Val Acc: 0.676300 loss: 1.038718
saving model with acc 0.676
[005/020] Train Acc: 0.691347 Loss: 0.983104 | Val Acc: 0.685154 loss: 1.001852
saving model with acc 0.685
[006/020] Train Acc: 0.705615 Loss: 0.931955 | Val Acc: 0.689301 loss: 0.984177
saving model with acc 0.689
[007/020] Train Acc: 0.716344 Loss: 0.891687 | Val Acc: 0.694516 loss: 0.964627
saving model with acc 0.695
[008/020] Train Acc: 0.725881 Loss: 0.857907 | Val Acc: 0.697720 loss: 0.951889
saving model with acc 0.698
[009/020] Train Acc: 0.733717 Loss: 0.829495 | Val Acc: 0.696691 loss: 0.949866
[010/020] Train Acc: 0.741151 Loss: 0.803701 | Val Acc: 0.699374 loss: 0.944832
saving model with acc 0.699
[011/020] Train Acc: 0.748049 Loss: 0.781106 | Val Acc: 0.697773 loss: 0.946494
[012/020] Train Acc: 0.753793 Loss: 0.760380 | Val Acc: 0.702830 loss: 0.938236
saving model with acc 0.703
[013/020] Train Acc: 0.759404 Loss: 0.741234 | Val Acc: 0.700452 loss: 0.945627
[014/020] Train Acc: 0.764573 Loss: 0.723574 | Val Acc: 0.702159 loss: 0.942118
[015/020] Train Acc: 0.769470 Loss: 0.707325 | Val Acc: 0.704432 loss: 0.936154
saving model with acc 0.704
[016/020] Train Acc: 0.773687 Loss: 0.691314 | Val Acc: 0.701736 loss: 0.945713
[017/020] Train Acc: 0.778676 Loss: 0.676633 | Val Acc: 0.701586 loss: 0.953081
[018/020] Train Acc: 0.783106 Loss: 0.662425 | Val Acc: 0.699667 loss: 0.963290
[019/020] Train Acc: 0.786395 Loss: 0.649180 | Val Acc: 0.700082 loss: 0.957681
[020/020] Train Acc: 0.790643 Loss: 0.636623 | Val Acc: 0.699732 loss: 0.964269
Create a testing dataset, and load the model from the saved checkpoint.
# create testing dataset
test_set = TIMITDataset(test, None)
test_loader = DataLoader(test_set, batch_size=BATCH_SIZE, shuffle=False)
# create model and load weights from checkpoint
model = Classifier().to(device)
model.load_state_dict(torch.load(model_path))
<All keys matched successfully>
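If the checkpoint was saved on GPU but inference later runs on a CPU-only machine, torch.load accepts a map_location argument to remap the stored tensors; a minimal sketch:
# Remap GPU-saved tensors to CPU at load time (useful on CPU-only machines)
model.load_state_dict(torch.load(model_path, map_location='cpu'))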
Make predictions.
predict = []
model.eval()  # set the model to evaluation mode
with torch.no_grad():
    for i, data in enumerate(test_loader):
        inputs = data
        inputs = inputs.to(device)
        outputs = model(inputs)
        _, test_pred = torch.max(outputs, 1)  # get the index of the class with the highest probability
        for y in test_pred.cpu().numpy():
            predict.append(y)
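Appending one element at a time works; an equivalent variant (sketch) collects whole batches and concatenates once, which avoids the per-element Python loop:
# Equivalent batched collection (illustrative)
batch_preds = []
with torch.no_grad():
    for data in test_loader:
        outputs = model(data.to(device))
        batch_preds.append(outputs.argmax(dim=1).cpu().numpy())
predict = np.concatenate(batch_preds)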
Write the predictions to a CSV file.
After running this block, download the file prediction.csv from the Files section on the left-hand side and submit it to Kaggle.
with open('prediction.csv', 'w') as f:
    f.write('Id,Class\n')
    for i, y in enumerate(predict):
        f.write('{},{}\n'.format(i, y))
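An equivalent one-liner with pandas, if you prefer (pandas is assumed to be available, as it is in Colab by default):
import pandas as pd
# Same Id,Class layout as the manual loop above
pd.DataFrame({'Id': range(len(predict)), 'Class': predict}).to_csv('prediction.csv', index=False)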