Contributors: Shubham Khandale, Allen Lau, Sumaiya Uddin
Source Code: https://github.com/DataScienceAndEngineering/machine-learning-dse-i210-final-project-signlanguageclassification.git
Project Workspace Setup: Run /src/data/make_dataset.py to download necessary data to execute this notebook.
Table of Contents:
Libraries
#importing libraries
import pickle
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import cm
import seaborn as sns
from matplotlib.colors import ListedColormap
from scipy.stats import uniform
from IPython import display
import cv2
from tensorflow.keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array
from tqdm import tqdm
from skimage.feature import hog
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.pipeline import Pipeline
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.decomposition import PCA
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier,StackingClassifier
from sklearn.metrics import roc_curve, auc, matthews_corrcoef, cohen_kappa_score
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from sklearn.decomposition import TruncatedSVD
from sklearn.manifold import TSNE
from sklearn.svm import SVC
import xgboost as xgb
from sklearn import model_selection
Clear and effective communication is a vital component of society. However, for individuals who rely on sign language, interacting with those who are unfamiliar with this mode of communication can be a difficult task. The development of a model capable of receiving a video stream from a camera and accurately classifying the signed letters can prove to be an invaluable tool. This technology can be utilized in various settings, including hospitals, schools, and government offices, to facilitate seamless communication and eliminate any potential communication barriers.
This notebook outlines the process of identifying the best models and features to accurately and quickly classify hand signs in a live setting, where the chosen models are divided into non-deep learning and deep learning approaches. The best non-deep learning approach is identified as a Stacking Ensemble Classifier, consisting of Logistic Regression, Support Vector Machine, Random Forest, and XGBoost estimators and a Logistic Regression meta-estimator. The features used to train this ensemble model are the 23 components resulting from Linear Discriminant Analysis on the images' 784-pixel feature arrays and the first 30 principal components of the derived histogram of oriented gradients features. The best deep learning approach is identified as a convolutional neural network trained on 224 x 224 images with hand landmarks plotted onto the images.
The main form of communication for the deaf and hard of hearing population is sign language. However, language obstacles prevent the deaf and hearing groups from communicating with one another. This communication divide can be closed by sign language identification, which enables automated translation of sign language into written or spoken language.
The problem of sign language alphabet recognition can be formulated as a machine learning problem. The objective is to create a system that can identify hand motions for every letter of the alphabet and correctly assign them to that letter. The intricacy and variety of sign language movements, as well as the requirement that the system be adaptable to changes in backdrop, lighting, and hand orientation, make this a difficult job. The creation of an effective method for deciphering sign language can greatly improve mobility and communication for the deaf and hard of hearing population, enabling them to interact with hearing people more effectively.
This report will outline the procedures and steps taken to tackle this classification problem. The high-level steps taken to create a sign language interpreter model are as follows: data loading, exploratory data analysis, feature extraction, dimensionality reduction, modeling with hyper-parameter tuning (with Naive Bayes, Logistic Regression, Random Forest, Support Vector Machines, XGBoost, Stacking Ensemble Classifier, and Convolutional Neural Networks), and evaluation. For each model, the classification report (depicting accuracy, precision, recall, f1-score, and support), the Matthews correlation coefficient (MCC), and Cohen's Kappa score are used to determine model performance. As seen in the Evaluation section, the best performing models identified are the Stacking Ensemble Classifier and the Convolutional Neural Network.
Due to considerations like the speed of predictions and the accuracy in a live environment, the Sign Language Interpreter Application, called SignLingo, will leverage the Convolutional Neural Network as its core model. At a high level, the interpreter will utilize the computer or phone’s camera, detect a person’s hand, extract the hand, send it as an input into the trained model, and output the predicted label with a confidence score.
Machine learning has driven a large body of research and development in the field of sign language recognition and interpretation, and the difficulties associated with these tasks have been the subject of numerous studies.
One example of a rule-based framework is the American Sign Language (ASL) recognition system created in 1998 [1]. This system used a glove with sensors to capture hand movements and recognized signs based on predefined rules. While it achieved a recognition accuracy of 98%, it was limited in its ability to recognize signs performed by different users with varying hand sizes and shapes.
Although there are many examples of impactful sign language interpretation systems, there is still room for improvement. Sign language recognition is needed for a range of applications, such as sign language translation systems, sign language learning platforms, and communication aids for the deaf and hard of hearing. For these applications to enable effective communication between deaf or hard of hearing individuals and the general public, recognition must be both accurate and real-time.
Despite the progress made in sign language recognition and interpretation, challenges remain, such as variations in signing styles, lighting conditions, background clutter, and occlusions. In the field of applied machine learning, these issues need to be addressed, and the accuracy and robustness of sign language recognition systems need to be improved.
The Sign Language MNIST dataset (Kaggle) will be used for developing the Sign Language Interpreter model. It is structured in CSV format, with each row containing a flattened image of pixel intensity values and its associated letter label. This American Sign Language hand gesture database represents a multi-class problem with 24 classes of letters (excluding J and Z, which require motion and will not be explored in this project). The training data (27,455 instances) and test data (7,172 instances) are around half the size of the standard MNIST but otherwise identical, with a header row of label, pixel1, pixel2, ... pixel784, each row representing a single 28x28 pixel image with grayscale values ranging from 0-255.
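For reference, the raw Kaggle CSVs can be loaded and reshaped with pandas as follows. This is a minimal sketch with illustrative file names; this notebook instead loads a pre-generated pickle, as shown below.
import pandas as pd

#hypothetical path to the raw Kaggle training CSV
train_df = pd.read_csv('sign_mnist_train.csv')
#first column is the label; the remaining 784 columns are pixel intensities
y = train_df['label'].values
X = train_df.drop(columns='label').values.reshape(-1, 28, 28)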
#function for loading the sign language mnist dataset pickle file, generated from /src/data/make_dataset.py
def load_sign_minist(path):
    #load the pickle file and return the data
    with open(path,'rb') as f:
        data = pickle.load(f)
    return data
#Function to return a dictionary of numeric labels to letters
def get_label_dict(y):
    #letters in the dataset (J and Z are excluded)
    letters = ['A','B','C','D','E','F','G','H','I','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y']
    #sorted unique numeric labels
    y = pd.Series(y,dtype=int)
    numbers = sorted(list(y.unique()))
    #dictionary mapping numeric labels to letters
    return dict(zip(numbers,letters))
#function to find the indices given a label
def find_indices(data,label):
    #check if data is a numpy array
    if isinstance(data, np.ndarray):
        #return indices
        return np.where(data==label)
    #check if data is a pandas series
    elif isinstance(data, pd.Series):
        #return indices
        return data[data==label].index
    #any other type is not supported
    else:
        raise TypeError('Unsupported data type for this function.')
#load dataset
X_train,y_train,X_test,y_test = load_sign_minist('../data/external/sign_mnist.pkl')
#get labels dictionary
label_dict = get_label_dict(y_train)
#list of letters
letters = ['A','B','C','D','E','F','G','H','I','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y']
def visualize_data(x,y,labels_dict,title):
    #visualization of dataset
    fig, ax = plt.subplots(4,6)
    ax = ax.ravel()
    pos = 0
    #loop through each label in dataset
    for label in range(0,26):
        #skip labels not present in the dataset (9 = J, 25 = Z)
        if label in [9,25]:
            continue
        #find first index of label
        idx = find_indices(y,label)[0][0]
        #display first found image
        ax[pos].imshow(x[idx],cmap='gray')
        #set x label as dataset label
        ax[pos].set(xlabel=labels_dict[label])
        #do not show ticks
        ax[pos].set_xticks([])
        ax[pos].set_yticks([])
        #increment for subplotting
        pos+=1
    plt.suptitle(title)
    plt.close()
    return fig
#visualizing examples from the data set
fig = visualize_data(X_train,y_train,label_dict,'Figure 1: Sign Language Dataset')
fig
Fig. 1 displays example images for each letter in the dataset. As described above, each image is a grayscale, 28x28 image. The labels for this classification problem include the letters A - Z, excluding J and Z since those signs require motion. Each image consists of 784 pixel intensity values ranging from 0 - 255, where 0 is black and 255 is white; these pixel values are the features used for machine learning.
The letters A, E, M, N, and S are similar. It is expected that models may have difficulty differentiating these signs. In contrast, letters like L, O, and Y are very different from the other classes; therefore, it is expected that the models would perform better classifying these letters.
The average image, created by taking the average values of each pixel across all images in the dataset, is plotted below. Additionally, the variance image, created by taking the variance of values for each pixel across all images, is plotted.
def mean_var_image(x,title):
    #create subplot
    fig, ax = plt.subplots(1, 2)
    ax = ax.ravel()
    #reshaping arrays and finding the mean and variance of each pixel
    x = x.reshape(x.shape[0], -1)
    mean_img = np.mean(x, axis=0)
    var_img = np.var(x, axis=0)
    #plotting mean image
    ax[0].imshow(mean_img.reshape(28, 28), cmap='gray')
    ax[0].set_title('Mean Image')
    #plotting variance image
    ax[1].imshow(var_img.reshape(28, 28), cmap='gray')
    ax[1].set_title('Variance Image')
    plt.suptitle(title,y=0.85)
    plt.tight_layout()
    plt.close()
    return fig
#Plotting mean and variance images
mean_var_image(X_train,'Figure 2: Mean and Variance Images')
Figure 2 displays the Mean Image, which shows that the hands are, on average, centered in the frame with a small border on all sides. The background is generally white; however, there are some differences in the far corners of the images. The Variance Image shows that the backgrounds of the images are not consistently the same.
#Function to plot the average image for each class in a dataset.
def plot_mean_images(X, y, label_dict,title):
    # Group the training data by label and calculate the mean of each pixel across all observations with the same label
    mean_images = []
    for label in np.unique(y):
        label_images = X[y == label]
        mean_image = np.mean(label_images, axis=0)
        mean_images.append(mean_image)
    # Plot the average image for each class
    fig, ax = plt.subplots(4, 6)
    ax = ax.ravel()
    pos = 0
    for i, mean_image in enumerate(mean_images):
        # labels skip 9 (J), so shift indices at and above 9 to match label_dict
        if i >= 9:
            i += 1
        ax[pos].imshow(mean_image, cmap='gray')
        label = label_dict[i] # Retrieve the corresponding letter using the label index
        ax[pos].set_title(label)
        ax[pos].set_xticks([])
        ax[pos].set_yticks([])
        pos+=1
    plt.suptitle(title)
    plt.tight_layout()
    plt.show()
# Create the label dictionary
label_dict = get_label_dict(y_train)
# Call the function to plot the mean images
plot_mean_images(X_train, y_train, label_dict,'Figure 3: Mean Images vs. Letters')
Figure 3 illustrates the mean image for each letter, showing the average hand position for each class. Letters with extended fingers are blurrier around the fingers, showing that there is more variability in the finger positions. In contrast, letters like A and E do not have extended fingers and show less variability (blurriness). Letters with less variability may be more prone to overfitting, since their training examples are more similar to one another.
Next, the total counts of the individual labels are plotted to determine if there are any class imbalances that need to be addressed before modeling. The following code is used to plot Figure 4, which displays the distribution of train and test labels in the dataset. This plot illustrates that there are no class imbalances within the dataset, so there is no need to rebalance or resample the classes.
#function for getting label distribution
def label_distr(X_train,y_train,X_test,y_test,title):
    train=pd.concat([pd.DataFrame(X_train.reshape(X_train.shape[0],-1)),pd.DataFrame(y_train,columns=['label'])],axis=1)
    test=pd.concat([pd.DataFrame(X_test.reshape(X_test.shape[0],-1)),pd.DataFrame(y_test,columns=['label'])],axis=1)
    fig, ax = plt.subplots(figsize=(12, 6))
    # Group the train and test sets by label and count the number of observations for each label
    train_counts = train.groupby('label').size()
    test_counts = test.groupby('label').size()
    # Custom colors
    train_color = 'purple'
    test_color = 'pink'
    # Plot the bar chart for the train & test sets
    letters = ['A','B','C','D','E','F','G','H','I','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y']
    ax.bar(letters, train_counts, color=train_color, alpha=0.5, label='Train')
    ax.bar(letters, test_counts, color=test_color, alpha=0.5, label='Test')
    # Add legend and labels
    ax.legend()
    ax.set_xlabel('Labels')
    ax.set_ylabel('Counts')
    ax.set_title(title)
    plt.show()
label_distr(X_train,y_train,X_test,y_test,'Figure 4: Distribution of Labels in Train and Test Sets')
The histograms of pixel intensities are plotted below in Figure 5. It is observed that the majority of the pixel intensity distributions are unimodal and left-skewed. This indicates that the majority of the images have more white (or brighter) pixels than black (or darker) pixels. Additionally, most distributions have a spike in frequency at 255, which is due to the fact that most of the backgrounds in the dataset are white.
def label_histograms(X,y,label_dict):
    #creating dataframe for data
    data=pd.concat([pd.DataFrame(X.reshape(X.shape[0],-1)),pd.DataFrame(y.astype(int),columns=['label'])],axis=1)
    #finding unique labels, sorted in ascending order
    unique_labels = sorted(data['label'].unique())
    fig, axes = plt.subplots(4, 6, figsize=(15,5))
    subplot_index = 0
    axes = axes.ravel()
    #plotting histograms
    for i in unique_labels:
        if i == 9:
            continue
        label_data = data[data['label'] == i]
        #drop the label column so that only pixel values are plotted
        pixel_values = label_data.drop(columns='label').values
        axes[subplot_index].hist(pixel_values.flatten(), bins=256, color='#B371C7')
        axes[subplot_index].set_title(label_dict[i])
        subplot_index += 1
    plt.suptitle('Figure 5: Pixel Intensity Distribution vs Letter')
    fig.text(0.5, 0, 'Pixel Intensity', ha='center')
    fig.text(0, 0.5, 'Frequency', va='center', rotation='vertical')
    plt.tight_layout()
    plt.show()
label_histograms(X_train,y_train,label_dict)
The first step taken to train the sign language classification model is reshaping the train and test images so that each is a flattened array of 784 pixels. This step is required so that the data is compatible with Scikit-Learn estimators, which expect 2D arrays of shape (num_samples, num_features). The resultant shapes for the data are seen below.
# Reshape the data to (num_samples, 784)
X_train = X_train.reshape(X_train.shape[0], -1)
X_test = X_test.reshape(X_test.shape[0], -1)
# Print the shapes of the augmented data
print(f'X_train shape: {X_train.shape}')
print(f'y_train shape: {y_train.shape}')
print(f'X_test shape: {X_test.shape}')
print(f'y_test shape: {y_test.shape}')
X_train shape: (27455, 784)
y_train shape: (27455,)
X_test shape: (7172, 784)
y_test shape: (7172,)
Second, normalization is performed so that any techniques that are sensitive to the scale of the features are not affected negatively, in terms of bias towards features with high-magnitude scales or ease and speed of convergence. Normalization of the images is performed on both the train and test data by dividing by 255.
#normalized data
X_train_norm = X_train/255
X_test_norm = X_test/255
Using the preprocessed data, initial baseline modeling was performed using Naive Bayes and Logistic Regression. The procedure for initial baseline modeling utilized Randomized Search Cross-Validation to identify the hyperparameters that produced the best performing models. All models in this investigation are evaluated using the classification report (depicting accuracy, precision, recall, f1-score, and support), the Matthews correlation coefficient (MCC), and Cohen's Kappa score, which are computed using the evaluate_model function below.
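For reference, Cohen's Kappa is defined as kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement between predictions and true labels and p_e is the agreement expected by chance. In the binary case, MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); scikit-learn computes multi-class generalizations of both directly from the confusion matrix.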
def evaluate_model(y_true, y_pred, labels):
    # Accuracy
    accuracy = accuracy_score(y_true, y_pred)
    print(f"Accuracy: {accuracy}")
    # Classification report
    print("Classification report:")
    report = classification_report(y_true, y_pred, target_names=labels, output_dict=True)
    print(classification_report(y_true, y_pred, target_names=labels))
    # Classification report bar graph
    precision = [report[label]['precision'] for label in labels]
    recall = [report[label]['recall'] for label in labels]
    f1_score = [report[label]['f1-score'] for label in labels]
    x = np.arange(len(labels))
    width = 0.3
    # Define custom color palettes
    sequence = ['#F7E8F6', '#F1C6E7','#E5B0EA','#BD83CE','#B371C7']
    divergence = ['#f8df81','#f6aa90','#f6b4bf','#B371C7','#badfda']
    cmap = ListedColormap(sequence)
    fig, ax = plt.subplots(figsize=(12,8))
    rects1 = ax.bar(x - width, precision, width, label='Precision', color=divergence[4])
    rects2 = ax.bar(x, recall, width, label='Recall', color=divergence[2])
    rects3 = ax.bar(x + width, f1_score, width, label='F1-Score', color=divergence[3])
    ax.set_xlabel('Letters')
    ax.set_ylabel('Score')
    ax.set_title('Classification Report')
    ax.set_xticks(x)
    ax.set_xticklabels(labels)
    ax.legend()
    plt.tight_layout()
    plt.show()
    # Matthews Correlation Coefficient (MCC)
    mcc = matthews_corrcoef(y_true, y_pred)
    print(f"MCC: {mcc}")
    # Cohen's Kappa
    kappa = cohen_kappa_score(y_true, y_pred)
    print(f"Cohen's Kappa: {kappa}")
    # Confusion Matrix
    cm = ConfusionMatrixDisplay(confusion_matrix(y_true, y_pred), display_labels=labels)
    fig, ax = plt.subplots(figsize=(16,14)) # set figure size
    cm.plot(cmap=cmap, ax=ax) # set color map and axis
    plt.title("Confusion Matrix")
    plt.show()
For initial baseline modeling, Naive Bayes and Logistic Regression were trained on the ~27k train images, with Randomized Search CV used to find the best hyperparameters for Logistic Regression. The results for Naive Bayes are seen in Appendix A and Logistic Regression in Appendix B.
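The search itself looks roughly like the following, a minimal sketch with illustrative parameter ranges (the full runs and results are in the appendices):
from scipy.stats import uniform, randint
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

#illustrative search space; the actual distributions used are in the appendix
param_dist = {
    'C': uniform(0.01, 10),
    'penalty': ['l2'],
    'solver': ['liblinear', 'newton-cg'],
    'max_iter': randint(1000, 4000),
}
search = RandomizedSearchCV(LogisticRegression(), param_dist, n_iter=20,
                            cv=3, scoring='accuracy', n_jobs=-1, random_state=42)
search.fit(X_train_norm, y_train)
print(search.best_params_, search.best_score_)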
The Logistic Regression model achieved a high training accuracy (1.00) and a test accuracy of 0.67. While the test accuracy is better than that of the Naive Bayes model, there is still a large performance gap between the train and test accuracies, indicating a high level of overfitting.
In summary, the Naive Bayes and Logistic Regression models both exhibit strong indications of overfitting due to the large gap in train and test accuracies.
To address the overfitting issues exhibited by the initial model results, three methods were used: data augmentation, dimension reduction, and regularization.
Data augmentation: Data augmentation addresses overfitting by increasing the number of images available for training, using transformations like rotation, scaling, and translation to turn the original images into augmented ones. This not only increases the number of training images, but also increases the variability (noise) in the images, allowing the model to generalize better.
Dimension reduction: LDA, SVD, t-SNE, and PCA, along with the feature extraction approach HOG, were used to extract useful information from the dataset while reducing its dimensionality before training. This helps remove irrelevant or redundant features and focus on the most informative ones, reducing the risk of overfitting. Moreover, it results in a less complex model, which further reduces the likelihood of overfitting.
Regularization: L1 or L2 regularization techniques were used to add penalty terms on the model weights during training, which discourages complex, overfitted models. Tuning the regularization parameters helps find the balance between fitting the training data well and generalizing to new data, as illustrated in the sketch below.
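As a small illustration of the regularization point, assuming the normalized arrays defined earlier, sweeping the inverse regularization strength C shows the trade-off directly (smaller C means a stronger penalty):
from sklearn.linear_model import LogisticRegression

#illustrative sweep of the L2 penalty strength
for C in [0.01, 0.1, 1, 10]:
    clf = LogisticRegression(C=C, penalty='l2', max_iter=2000)
    clf.fit(X_train_norm, y_train)
    print(f"C={C}: train={clf.score(X_train_norm, y_train):.3f}, "
          f"test={clf.score(X_test_norm, y_test):.3f}")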
First, data preprocessing is performed so that the images are compatible with the Keras ImageDataGenerator. A channel dimension is added to each image so that its shape becomes (height, width, channels); the generator's random_transform method operates on one rank-3 image at a time in this format.
#define image resolution
res = (28,28)
# Reshape the data to images, adding a channel dimension
X_train = X_train.reshape(X_train.shape[0], res[0], res[1], 1)
X_test = X_test.reshape(X_test.shape[0], res[0], res[1], 1)
An ImageDataGenerator object is created with specific parameters for data augmentation. The parameters include rotation_range, zoom_range, width_shift_range, height_shift_range, shear_range, brightness_range, and fill_mode. These parameters define the range and type of transformations that will be applied to the images during data augmentation, such as rotation, zooming, shifting, shearing, adjusting brightness, and filling missing pixels.
# Creating an ImageDataGenerator object with data augmentation parameters
datagen = ImageDataGenerator(
rotation_range=10,
zoom_range=0.1,
width_shift_range=0.1,
height_shift_range=0.1,
shear_range=0.1,
brightness_range=[0.5, 1.5],
fill_mode='nearest')
Data augmentation is applied to the test and train sets separately by looping over each image in the original sets. For each image, three random transformations are generated using the ImageDataGenerator object defined earlier, and the transformed images are added to the augmented sets along with their corresponding labels. Finally, the augmented data is converted to numpy arrays, resulting in X_train_aug, y_train_aug, X_test_aug, and y_test_aug, which contain the augmented images and labels for both the training and test sets.
# Apply data augmentation to the training set
X_train_augmented = []
y_train_augmented = []
for i in range(X_train.shape[0]):
    img = X_train[i]
    label = y_train[i]
    #generate three random transformations of each image
    for j in range(3):
        x_augmented = datagen.random_transform(img)
        X_train_augmented.append(x_augmented)
        y_train_augmented.append(label)
# Apply data augmentation to the test set
X_test_augmented = []
y_test_augmented = []
for i in range(X_test.shape[0]):
    img = X_test[i]
    label = y_test[i]
    for j in range(3):
        x_augmented = datagen.random_transform(img)
        X_test_augmented.append(x_augmented)
        y_test_augmented.append(label)
# Convert the augmented data to numpy arrays
X_train_aug = np.array(X_train_augmented)
y_train_aug = np.array(y_train_augmented)
X_test_aug = np.array(X_test_augmented)
y_test_aug = np.array(y_test_augmented)
The arrays containing the augmented data and the original data are reshaped to have dimensions (number of samples, 28, 28) to match the image dimensions.
#reshape arrays back to (num_samples, 28, 28)
X_train_aug = X_train_aug.reshape(X_train_aug.shape[0],28,28)
X_test_aug = X_test_aug.reshape(X_test_aug.shape[0],28,28)
X_train = X_train.reshape(X_train.shape[0],28,28)
X_test = X_test.reshape(X_test.shape[0],28,28)
The augmented data is combined with the original data by concatenating the arrays along the first axis (row-wise), resulting in combined training and test sets with increased sample sizes. As seen from the shapes, data augmentation increases the train dataset from ~27k to ~110k images and the test set from ~7k to ~29k. This drastically increases the amount of data available for training, thus reducing the likelihood of overfitting.
# Concatenate the arrays along the first axis (i.e., row-wise)
X_train_com = np.concatenate((X_train_aug, X_train), axis=0)
y_train_com = np.concatenate((y_train_aug, y_train), axis=0)
X_test_com = np.concatenate((X_test_aug, X_test), axis=0)
y_test_com = np.concatenate((y_test_aug, y_test), axis=0)
# Print the shape of the combined array
print("Shape of combined array:", X_train_com.shape)
print("Shape of combined array:", X_test_com.shape)
Shape of combined array: (109820, 28, 28)
Shape of combined array: (28688, 28, 28)
#visualizing examples from the data set
fig = visualize_data(X_train_com,y_train_com,label_dict,'Figure 6: Augmented Sign Language Dataset')
fig
Fig. 6 displays example images representing each letter in the augmented dataset. These images are grayscale, with dimensions of 28x28 pixels. The augmentation process introduces variations such as changes in brightness, darkness, and rotation, resulting in a diverse set of images for each letter.
#Plotting mean and variance images
mean_var_image(X_train_com,'Figure 7: Augmented Mean and Variance Images')
Figure 7 displays the Mean Image of the augmented dataset, indicating that the hands are centered within the images with a small border around them. The background generally appears white, although slight variations can be observed in the far corners of the images. The Variance Image reveals that the background of the augmented images is not consistently uniform, exhibiting some degree of variation. It is noted that the mean and variance images for the augmented dataset are noisier, reflecting the ~82k additional augmented images.
# Create the label dictionary
label_dict = get_label_dict(y_train_com)
# Call the function to plot the mean images
plot_mean_images(X_train_com, y_train_com, label_dict,'Figure 8: Augmented Mean Images vs. Letters')
Figure 8 depicts the mean image for each letter in the augmented dataset, representing the average hand position for each class. Letters with extended fingers exhibit blurriness around the fingers, indicating higher variability in finger positions. Again, it is noted that the augmented mean images are noisier than the corresponding images for the original dataset.
# Label distribuiton for augmented data
label_distr(X_train_com,y_train_com,X_test_com,y_test_com,'Figure 9: Augmented Distribution of Labels in Train and Test Sets')
Figure 9 shows the distribution of image counts for each label after augmentation. It is noted that there are still no class imbalances in the dataset.
Dimensionality reduction is an important step in the Data Science pipeline because it reduces the complexity of the model by reducing the number of input features. This results in a decreased likelihood for the model to overfit on the training data. Additionally, removing noise and unimportant/redundant features can lead to better performing models. Lastly, reducing dimensionality will decrease the computational and memory requirements to train and use the model.
Linear Discriminant Analysis (LDA) is a supervised learning algorithm used for classification tasks that projects the data to a lower dimensionality that maximizes the separation between classes. This is achieved by finding the vectors in the feature space that best separate the different classes of the data while minimizing the variance of the data within each class.
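Formally, LDA seeks projection vectors w that maximize the Fisher criterion J(w) = (w^T S_B w) / (w^T S_W w), where S_B is the between-class scatter matrix and S_W is the within-class scatter matrix; the resulting discriminants are the leading generalized eigenvectors of this ratio.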
The below code outlines the process of setting up the data so that it can be an input into Scikit-Learn's LinearDiscriminantAnalysis() function.
#renaming variables
X_train = X_train_com
y_train = y_train_com
X_test = X_test_com
y_test = y_test_com
#reshaping
X_train = X_train.reshape(X_train.shape[0],-1)
X_test = X_test.reshape(X_test.shape[0],-1)
# Normalizing the data
X_train_norm = X_train/255
X_test_norm = X_test/255
#define sklearn LDA object
lda = LinearDiscriminantAnalysis()
#fit on training data
lda.fit(X_train_norm,y_train)
LinearDiscriminantAnalysis()
The explained variance ratio of the LDA components (linear discriminants) indicate how much information is retained at each component. As a result, the cumulative explained variance can help determine how many components to keep for dimensionality reduction.
#getting explained variance ratio from the lda model
evr = lda.explained_variance_ratio_
components = range(1, len(evr) + 1)
#plotting scree plot
fig, ax = plt.subplots(figsize = (8,5))
ax.bar(x = components, height = evr, label = 'Explained Variance');
plt.plot(components, np.cumsum(evr), marker = '.', color = 'orange', label = 'Cumulative Explained Variance')
plt.axhline(y = .95, color = 'r', linestyle = '--', label = '0.95 Explained Variance')
plt.xticks(range(1, len(evr)+1));
plt.title('Figure 10: LDA Explained Variance');
plt.xlabel('Component');
plt.ylabel('Explained Variance');
plt.legend(fontsize = 9);
Looking at the plot above, there is an elbow at around components 3 - 5; however, this would only account for about 0.4 - 0.55 of the cumulative explained variance. As a result, for the purposes of modeling, all components resulting from the LDA computation will be used.
Since LDA yields at most (number of classes − 1) components, the 24-class problem produces 23 components, a dimensionality reduction from 784 to 23.
#transform train and test data with the fitted LDA
X_train_lda = lda.transform(X_train_norm)
X_test_lda = lda.transform(X_test_norm)
fig, ax = plt.subplots(figsize = (8,8))
ax = sns.scatterplot(x = X_train_lda[:,0], y = X_train_lda[:,1], hue = y_train, palette = 'pastel');
handler, _ = ax.get_legend_handles_labels();
plt.legend(handler, letters, bbox_to_anchor = (1, 1));
plt.title('Figure 11: 2D Embedding of Sign Language Images')
plt.xlabel('Linear Discriminant 1');
plt.ylabel('Linear Discriminant 2');
Plotting linear discriminants 1 and 2 from the LDA computation, it is observed that the projection does reasonably well at separating certain letters from others. For example, X and Y are separated well from the other letters using the first two linear discriminants. The other letters likely require more components for a clearer separation between the classes.
The histogram of oriented gradients (HOG) is a feature descriptor used in computer vision and image processing for the purpose of object detection. The technique counts occurrences of gradient orientation in localized portions of an image.
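To build intuition for what HOG captures, the descriptor can also be visualized. The following is a brief illustrative sketch, assuming the normalized training array defined earlier:
import matplotlib.pyplot as plt
from skimage.feature import hog

#visualize the HOG descriptor for one example image
img = X_train_norm[0].reshape(28, 28)
features, hog_image = hog(img, orientations=9, pixels_per_cell=(2, 2),
                          cells_per_block=(1, 1), visualize=True)
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.imshow(img, cmap='gray')
ax1.set_title('Original')
ax2.imshow(hog_image, cmap='gray')
ax2.set_title('HOG')
plt.show()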
The code below defines the parameters for Histogram of Oriented Gradients (HOG) feature extraction, including the number of orientations, pixels per cell, and cells per block. The extract_features function is then defined to extract HOG features from a single image using these parameters. Finally, the function is applied to all images in the training and test sets, resulting in X_train_features and X_test_features arrays that contain the extracted HOG features for each image.
In summary, this code performs HOG feature extraction on the normalized images in the training and test sets using predefined parameters, producing arrays of HOG features for further analysis and modeling.
# Define the HOG parameters
orientations = 9
pixels_per_cell = (2, 2)
cells_per_block = (1, 1)
# Function to extract HOG features from a single image
def extract_features(img):
    features = hog(img, orientations=orientations,
                   pixels_per_cell=pixels_per_cell,
                   cells_per_block=cells_per_block,
                   visualize=False,
                   transform_sqrt=True,
                   feature_vector=True,
                   block_norm='L2-Hys')
    return features
# Apply the extract_features function to all images in X_train and X_test
X_train_features = np.array([extract_features(img.reshape((28,28))) for img in X_train_norm])
X_test_features = np.array([extract_features(img.reshape((28,28))) for img in X_test_norm])
print(f"X_train_features Shape: {X_train_features.shape}")
print(f"X_test_features Shape: {X_test_features.shape}")
X_train_features Shape: (109820, 1764)
X_test_features Shape: (28688, 1764)
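The 1764 dimensions follow directly from the chosen parameters, as a quick check confirms:
#expected HOG feature length: a 28x28 image with (2, 2) pixels per cell gives
#14x14 cells; with (1, 1) cells per block, each cell contributes 9 orientation bins
cells_per_side = 28 // 2
print(cells_per_side * cells_per_side * 9)  # 1764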
Because the HOG feature engineering method creates feature arrays with 1764 dimensions, a dimensionality reduction technique is needed so that the number of features being trained on can be limited to reduce the likelihood of overfitting. The code below performs Principal Component Analysis (PCA) on the extracted HOG features from the train and test sets. It reduces the dimensionality of the feature vectors to 30 components and transforms the data accordingly. The shapes of hog_test_pca and hog_train_pca are then printed to show the new dimensions of the transformed feature vectors.
#perform PCA on HOG feature
pca = PCA(n_components=30)
#fit transform on training data
hog_train_pca = pca.fit_transform(X_train_features)
#transform on testing data
hog_test_pca=pca.transform(X_test_features)
#printing shape
print(hog_test_pca.shape)
print(hog_train_pca.shape)
(28688, 30)
(109820, 30)
#getting explained variance ratio from the pca model
evr = pca.explained_variance_ratio_
components = range(1, len(evr) + 1)
#plotting scree plot
fig, ax = plt.subplots(figsize = (8,5))
ax.bar(x = components, height = evr, label = 'Explained Variance');
plt.plot(components, np.cumsum(evr), marker = '.', color = 'orange', label = 'Cumulative Explained Variance')
plt.axhline(y = .95, color = 'r', linestyle = '--', label = '0.95 Explained Variance')
plt.xticks(range(1, len(evr)+1));
plt.title('Figure 12: PCA Explained Variance (HOG)');
plt.xlabel('Component');
plt.ylabel('Explained Variance');
plt.legend(fontsize = 9);
The figure above shows the explained variance ratio and cumulative explained variance for each principal component in the PCA analysis. It helps visualize the amount of variance explained by each component and the cumulative variance explained as more components are considered. The red dashed line represents the threshold of 0.95 explained variance, indicating the number of components needed to capture at least 95% of the total variance. However, since the explained variance increases slowly, it would require almost all components to reach the 95% mark. As a result, only the first 30 components will be used, a number determined from an accuracy vs. number of PCA components plot using the Logistic Regression baseline model: 30 is roughly the point where the accuracy stops increasing appreciably. The code and figure illustrating this are shown below.
train_acc = []
test_acc = []
for num_components in range(6,60,3):
    pca = PCA(n_components=num_components)
    hog_train_pca = pca.fit_transform(X_train_features)
    hog_test_pca = pca.transform(X_test_features)
    lr = LogisticRegression(C=3.4647045830997407,
                            max_iter=3171,
                            penalty="l2",
                            solver="liblinear",
                            warm_start=False)
    lr.fit(hog_train_pca,y_train)
    y_pred_lr_train = lr.predict(hog_train_pca)
    y_pred_lr_test = lr.predict(hog_test_pca)
    train_acc.append(accuracy_score(y_train,y_pred_lr_train))
    test_acc.append(accuracy_score(y_test,y_pred_lr_test))
plt.plot(range(6,60,3),train_acc,label='Train')
plt.plot(range(6,60,3),test_acc,label='Test')
plt.title('Figure 13: Logistic Regression Accuracy vs. Number of PCA Components')
plt.xlabel('Number of PCA Components')
plt.ylabel('Accuracy')
plt.legend()
Next, the scatter plot in Figure 14 shows a 2D embedding of the HOG features after performing PCA (Principal Component Analysis) on the training data. Each point represents a specific label data point, and its position on the plot is determined by the values of the first and second principal components.
# Plotting scatter plot for hog_train_pca
fig, ax = plt.subplots(figsize=(8, 8))
ax = sns.scatterplot(x=hog_train_pca[:, 0], y=hog_train_pca[:, 1], hue=y_train, palette='pastel', alpha=0.6)
handler, _ = ax.get_legend_handles_labels()
plt.legend(handler, letters, bbox_to_anchor=(1, 1))
plt.title('Figure 14: 2D Embedding For PCA (HOG Train)')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.show()
The last step in feature engineering and dimensionality reduction is to combine the LDA components and the PCA components of the HOG features, so that they can be used together for model training. The shapes of the resulting datasets are seen below.
# concatenate PCA and LDA features
X_train_combined = np.concatenate((hog_train_pca, X_train_lda), axis=1)
X_test_combined = np.concatenate((hog_test_pca, X_test_lda), axis=1)
print(f"X_train Shape: {X_train_combined.shape}")
print(f"X_test Shape: {X_test_combined.shape}")
#redefining
X_train = X_train_combined
X_test = X_test_combined
X_train Shape: (109820, 80)
X_test Shape: (28688, 80)
After addressing the overfitting issue, two machine learning models were employed as baselines, namely Naive Bayes and Logistic Regression, to establish a performance benchmark. Additionally, more advanced models such as Random Forest, Support Vector Machines (SVM), and XGBoost were utilized, along with a stacking ensemble technique. This comprehensive approach aimed to leverage the strengths of each model, resulting in improved predictive accuracy and robustness for the given task. The Naive Bayes, Logistic Regression, Random Forest, Support Vector Machine, XGBoost, and Stacking Ensemble Classifier models are located in Appendices C, D, E, F, G, and H respectively, where Randomized Search CV is used to find the best hyperparameters for this dataset.
The evaluation for each model is also shown in the aforementioned appendix sections. In the Evaluation Section, plots to determine the best performing models are shown. Only the code for the best performing non-deep learning model and deep learning models will be displayed in this methodology section. To see the code for the other models, refer to the appendix sections.
For non-deep learning models, the best performing model is the Stacking Ensemble Classifier. The code to define the structure and train this model is as follows.
The below code describes the process of training the stacking ensemble classifier. This ensemble method is composed of the best performing models found using Randomized Grid Search CV for Support Vector Machine, XGBoost, Logistic Regression, and Random Forest. The meta estimator is defined as a Logistic Regression model.
#defining estimators
all_estimators = [
('svm',SVC(kernel = 'poly', gamma = 'auto', C = .1, probability=True)),
('xgb',xgb.XGBClassifier(subsample=0.4,reg_lambda=2.25,reg_alpha=2,min_child_weight=30,max_depth=8,learning_rate=0.001,gamma=0,colsample_bytree=0.4)),
('lr',LogisticRegression(C=0.22564631610840102,max_iter=2391, penalty="l2", solver='newton-cg',warm_start=False)),
('rf',RandomForestClassifier(n_estimators=20, min_samples_split=10, min_samples_leaf=5,max_features=5, max_depth=5, random_state=42))
]
#training stacking classifier
all_stack = StackingClassifier(estimators=all_estimators, final_estimator=LogisticRegression(max_iter=3000))
all_stack.fit(X_train,y_train)
#predictions
y_pred_train = all_stack.predict(X_train)
y_pred_test = all_stack.predict(X_test)
#stacking model
all_stack
StackingClassifier(estimators=[('svm', SVC(C=0.1, gamma='auto', kernel='poly', probability=True)), ('xgb', XGBClassifier(base_score=None, booster=None, callbacks=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=0.4, early_stopping_rounds=None, enable_categorical=False, eval_metric=None, feature_types=None, gamma=0, gpu_id=None, grow_policy=None,... n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=None, ...)), ('lr', LogisticRegression(C=0.22564631610840102, max_iter=2391, solver='newton-cg')), ('rf', RandomForestClassifier(max_depth=5, max_features=5, min_samples_leaf=5, min_samples_split=10, n_estimators=20, random_state=42))], final_estimator=LogisticRegression(max_iter=3000))
import tensorflow as tf
from tensorflow.keras import layers, models
train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
rescale=1./255,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=False,
preprocessing_function=tf.keras.applications.resnet50.preprocess_input)
train_generator = train_datagen.flow_from_directory(
directory='/content/drive/MyDrive/Data/',
target_size=(224, 224),
batch_size=32,
class_mode='categorical',shuffle= False)
val_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
rescale=1./255,
preprocessing_function=tf.keras.applications.resnet50.preprocess_input)
val_generator = val_datagen.flow_from_directory(
directory='/content/drive/MyDrive/TestData/TestData/',
target_size=(224, 224),
batch_size=32,
class_mode='categorical',
shuffle= False)
test_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
rescale=1./255,
preprocessing_function=tf.keras.applications.resnet50.preprocess_input)
test_generator = test_datagen.flow_from_directory(
directory='/content/drive/MyDrive/TestData/TestData/',
target_size=(224, 224),
batch_size=32,
class_mode='categorical',shuffle= False)
#CNN Model
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(256, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dropout(0.2)) # Add dropout with a rate of 0.2
model.add(layers.Dense(24, activation='softmax'))
model.summary()
model.compile(optimizer='adam',
loss='categorical_crossentropy',
metrics=['accuracy'])
history = model.fit(
train_generator,
steps_per_epoch=train_generator.samples/train_generator.batch_size,
epochs=5,
validation_data=val_generator,
validation_steps=val_generator.samples/val_generator.batch_size)
Found 5679 images belonging to 24 classes.
Found 383 images belonging to 24 classes.
Found 383 images belonging to 24 classes.
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                   Output Shape              Param #
=================================================================
 conv2d_4 (Conv2D)              (None, 222, 222, 32)      896
 max_pooling2d_4 (MaxPooling2D) (None, 111, 111, 32)      0
 conv2d_5 (Conv2D)              (None, 109, 109, 64)      18496
 max_pooling2d_5 (MaxPooling2D) (None, 54, 54, 64)        0
 conv2d_6 (Conv2D)              (None, 52, 52, 128)       73856
 max_pooling2d_6 (MaxPooling2D) (None, 26, 26, 128)       0
 conv2d_7 (Conv2D)              (None, 24, 24, 256)       295168
 max_pooling2d_7 (MaxPooling2D) (None, 12, 12, 256)       0
 flatten_1 (Flatten)            (None, 36864)             0
 dense_2 (Dense)                (None, 512)               18874880
 dropout_1 (Dropout)            (None, 512)               0
 dense_3 (Dense)                (None, 24)                12312
=================================================================
Total params: 19,275,608
Trainable params: 19,275,608
Non-trainable params: 0
_________________________________________________________________
This model is a convolutional neural network (CNN) architecture used for image classification tasks.
Conv2D layer with 32 filters and a kernel size of (3, 3): This layer performs the convolution operation on the input image with 32 filters, each of which detects different features in the image. The activation function used is ReLU (Rectified Linear Unit), which introduces non-linearity to the model.
MaxPooling2D layer with a pool size of (2, 2): This layer reduces the spatial dimensions (width and height) of the input by selecting the maximum value in each 2x2 region. It helps in reducing the computational complexity and provides a form of translation invariance.
Conv2D layer with 64 filters and a kernel size of (3, 3): This layer performs another convolution operation on the previous layer's output with 64 filters, extracting more complex features from the image.
MaxPooling2D layer: Similar to the previous max pooling layer, it reduces the spatial dimensions further.
Conv2D layer with 128 filters and a kernel size of (3, 3): This layer continues the pattern of extracting higher-level features with 128 filters.
MaxPooling2D layer: Reduces the spatial dimensions again.
Conv2D layer with 256 filters and a kernel size of (3, 3): This layer further increases the number of filters, capturing more abstract features in the image.
MaxPooling2D layer: Continues the downsampling process.
Flatten layer: This layer converts the 2D output of the previous layer into a 1D feature vector, preparing it for input to a fully connected (dense) layer.
Dense layer with 512 units: A fully connected layer that receives the flattened feature vector as input. It applies the ReLU activation function to introduce non-linearity.
Dropout layer with a dropout rate of 0.2: Dropout is a regularization technique that randomly sets a fraction of input units to 0 during training. It helps prevent overfitting by reducing the reliance on specific neurons and encourages the network to learn more robust features.
Dense layer with 24 units: This is the final output layer of the network, consisting of 24 units corresponding to the number of classes in the classification task. The activation function used is softmax, which converts the final layer's outputs into probabilities, representing the likelihood of each class.
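As a quick sanity check, the parameter counts reported by model.summary() above can be reproduced by hand (each filter and unit has a bias term):
#reproducing the parameter counts from model.summary()
conv2d_4 = (3*3*3 + 1) * 32        # 3x3 kernel over 3 input channels -> 896
conv2d_5 = (3*3*32 + 1) * 64       # -> 18,496
dense_2 = (12*12*256 + 1) * 512    # flattened 12x12x256 feature map -> 18,874,880
dense_3 = (512 + 1) * 24           # -> 12,312
print(conv2d_4, conv2d_5, dense_2, dense_3)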
#Showing the training epochs, as the model was trained on Kaggle
display.Image("./figures/CNN_epochs.png")
To find the best performing non-deep learning model, KFold Cross Validation accuracy scores are computed for each model and the plot of the average accuracies across the KFold validation datasets and its standard deviation are shown below.
#define models for stacking classifier
all_estimators = [
('svm',SVC(kernel = 'poly', gamma = 'auto', C = .1, probability=True)),
('xgb',xgb.XGBClassifier(subsample=0.4,reg_lambda=2.25,reg_alpha=2,min_child_weight=30,max_depth=8,learning_rate=0.001,gamma=0,colsample_bytree=0.4)),
('lr',LogisticRegression(C=0.22564631610840102,max_iter=2391, penalty="l2", solver='newton-cg',warm_start=False)),
('rf',RandomForestClassifier(n_estimators=20, min_samples_split=10, min_samples_leaf=5,max_features=5, max_depth=5, random_state=42))
]
#define models for KFold CV accuracy looping
models = {'Naive Bayes':GaussianNB(),
'Logistic Regression':LogisticRegression(C=0.22564631610840102,max_iter=2391, penalty="l2",solver='newton-cg',warm_start=False),
'SVM':SVC(kernel = 'poly', gamma = 'auto', C = .1, probability=True),
'Random Forest Classifier':RandomForestClassifier(n_estimators=20, min_samples_split=10, min_samples_leaf=5,max_features=5, max_depth=5, random_state=42),
'XGBoost':xgb.XGBClassifier(subsample=0.4,reg_lambda=2.25,reg_alpha=2,min_child_weight=30,max_depth=8,learning_rate=0.001,gamma=0,colsample_bytree=0.4),
'Stacking Classifier':StackingClassifier(estimators=all_estimators, final_estimator=LogisticRegression(max_iter=3000))
}
#lists to store results and names
results = []
names = []
#loop to calculate accuracies
for name,model in models.items():
    kfold = model_selection.KFold(n_splits=5,random_state=99,shuffle=True)
    if name != 'XGBoost':
        cv_results = model_selection.cross_val_score(model,X_train,y_train,cv=kfold,scoring='accuracy')
    else:
        #XGBoost requires labels encoded as 0..n_classes-1
        from sklearn.preprocessing import LabelEncoder
        le = LabelEncoder()
        le.fit(y_train)
        y_train_encoded = le.transform(y_train)
        cv_results = model_selection.cross_val_score(model,X_train,y_train_encoded,cv=kfold,scoring='accuracy')
    results.append(cv_results)
    names.append(name)
    print(f'{name}: {cv_results.mean()}, {cv_results.std()}')
fig = plt.figure()
fig.suptitle('Figure 15: KFold CV Accuracy Comparison')
ax = fig.add_subplot(111)
plt.boxplot(results)
ax.set_xticklabels(names)
plt.xticks(rotation=45, ha='right')
plt.show()
#display image due to computation time
display.Image("./figures/Figure15.png")
In Figure 15, it is observed that the stacking classifier has the highest average cross-validation accuracy at 0.835, with a standard deviation of 0.00598. This indicates that the stacking ensemble classifier is the best performing non-deep-learning model. This is corroborated by the train and test accuracy scores in Figure 16, where the Stacking Ensemble Model has the highest train and test accuracies among the non-deep-learning models.
#display image due to computation time
display.Image("./figures/Figure16.png")
To see the performance of the stacking ensemble in more detail, the predictions for train and test datasets are computed, and the evaluate_model() function is executed below.
#predict on train and test data using stacking ensemble
y_pred_train = all_stack.predict(X_train)
y_pred_test = all_stack.predict(X_test)
#evaluate model
evaluate_model(y_train,y_pred_train,letters)
evaluate_model(y_test,y_pred_test,letters)
Accuracy: 0.9568202513203424
Classification report:
              precision    recall  f1-score   support
           A       0.93      0.95      0.94      4504
           B       0.97      0.97      0.97      4040
           C       0.99      0.99      0.99      4576
           D       0.96      0.96      0.96      4784
           E       0.96      0.95      0.95      3828
           F       0.98      0.98      0.98      4816
           G       0.96      0.96      0.96      4360
           H       0.97      0.97      0.97      4052
           I       0.96      0.94      0.95      4648
           K       0.95      0.97      0.96      4456
           L       0.99      0.99      0.99      4964
           M       0.89      0.89      0.89      4220
           N       0.91      0.91      0.91      4604
           O       0.99      0.99      0.99      4784
           P       1.00      1.00      1.00      4352
           Q       0.99      0.99      0.99      5116
           R       0.94      0.93      0.93      5176
           S       0.92      0.91      0.92      4796
           T       0.95      0.96      0.96      4744
           U       0.92      0.92      0.92      4644
           V       0.93      0.92      0.92      4328
           W       0.96      0.96      0.96      4900
           X       0.97      0.98      0.98      4656
           Y       0.98      0.97      0.98      4472
    accuracy                           0.96    109820
   macro avg       0.96      0.96      0.96    109820
weighted avg       0.96      0.96      0.96    109820
MCC: 0.9549342063737285
Cohen's Kappa: 0.9549327309922754
Accuracy: 0.853771611823759
Classification report:
              precision    recall  f1-score   support
           A       0.85      0.92      0.89      1324
           B       0.96      0.87      0.91      1728
           C       0.98      0.96      0.97      1240
           D       0.85      0.88      0.87       980
           E       0.86      0.90      0.88      1992
           F       0.95      0.94      0.95       988
           G       0.90      0.85      0.87      1392
           H       0.96      0.93      0.94      1744
           I       0.89      0.87      0.88      1152
           K       0.86      0.89      0.88      1324
           L       0.94      0.99      0.96       836
           M       0.74      0.74      0.74      1576
           N       0.70      0.66      0.68      1164
           O       0.96      0.91      0.93       984
           P       0.98      0.99      0.99      1388
           Q       0.89      0.98      0.93       656
           R       0.52      0.75      0.62       576
           S       0.65      0.57      0.61       984
           T       0.77      0.82      0.79       992
           U       0.82      0.66      0.73      1064
           V       0.84      0.70      0.76      1384
           W       0.64      0.89      0.75       824
           X       0.86      0.88      0.87      1068
           Y       0.93      0.86      0.90      1328
    accuracy                           0.85     28688
   macro avg       0.85      0.85      0.85     28688
weighted avg       0.86      0.85      0.85     28688
MCC: 0.847147528006954
Cohen's Kappa: 0.846951742002403
# Evaluate the model on the train set
train_loss, train_accuracy = model.evaluate(train_generator)
# Generate predictions on the train set
predictions = model.predict(train_generator)
predicted_labels = tf.argmax(predictions, axis=1)
# Get the true labels
true_labels = train_generator.classes
# Get the class names
class_names = list(train_generator.class_indices.keys())
evaluate_model(true_labels, predicted_labels.numpy(), class_names)
#Showing the training accuracy figure generated on Kaggle
display.Image("./figures/accTrain.png")
In the given classification report, the model achieved an accuracy of 0.9412, meaning it correctly predicted the class labels for 94.12% of the samples.
Precision: Most classes have high precision scores of 1.00, indicating that the model had a very low rate of false positives for those classes. However, classes 'M', 'N', 'R', 'S', 'U', and 'V' have lower precision scores, suggesting some difficulty in accurately predicting those classes.
Recall: The majority of classes have recall scores of 1.00, indicating that the model successfully identified the majority of positive instances for those classes. However, classes 'M' and 'S' have lower recall scores, indicating some difficulty in correctly capturing all positive instances for those classes.
F1-score: The F1-scores for most classes are high, with a value of 1.00, suggesting a balanced performance in terms of precision and recall. However, classes 'M', 'N', 'R', and 'S' have lower F1-scores, indicating a trade-off between precision and recall for those classes.
Support: The support column shows the number of samples in each class in the dataset.
Weighted avg: The weighted average precision, recall, and F1-score are also around 0.94, taking into account the support (i.e., number of samples) for each class.
Overall, the model achieved high accuracy and performed well for most classes. However, it faced challenges in accurately predicting classes 'M', 'N', 'R', 'S', 'U', and 'V'. Improvements may be needed to enhance the model's performance on these specific classes.
#Showing the training classification report generated on Kaggle
display.Image("./figures/ClassificationReport_train.png")
#Showing the training confusion matrix generated on Kaggle
display.Image("./figures/ConfusionMatrix_train.png")
#Deep Learning Evaluation
#testing Data
# Evaluate the model on the test set
test_loss, test_accuracy = model.evaluate(val_generator)
# Generate predictions on the test set
predictions = model.predict(val_generator)
predicted_labels = tf.argmax(predictions, axis=1)
# Get the true labels
true_labels = val_generator.classes
# Get the class names
class_names = list(val_generator.class_indices.keys())
evaluate_model(true_labels, predicted_labels.numpy(), class_names)
#Showing the test accuracy figure generated on Kaggle
display.Image("./figures/accTest.png")
The accuracy of the model is 0.9321, which means it predicted the correct class for 93.21% of the samples.
Precision measures the proportion of correctly predicted positive instances out of the total instances predicted as positive. Recall, on the other hand, measures the proportion of correctly predicted positive instances out of the total actual positive instances. The F1-score is the harmonic mean of precision and recall, providing a balanced measure of the model's performance.
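For a given class, these are defined as Precision = TP / (TP + FP), Recall = TP / (TP + FN), and F1 = 2 * Precision * Recall / (Precision + Recall), where TP, FP, and FN are the counts of true positives, false positives, and false negatives for that class.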
In this classification report, most classes have perfect precision and recall scores of 1.00, indicating high performance. However, some classes like 'M', 'N', 'R', and 'S' have lower scores, suggesting difficulties in accurately predicting those classes.
The support column indicates the number of samples in each class, which can vary across classes.
The macro average provides the average performance across all classes, treating each class equally. The weighted average considers the support for each class, providing a weighted measure that accounts for class imbalance.
In summary, the model achieved high accuracy and performed well for most classes, but struggled with certain classes. Further analysis and improvements may be needed to enhance the model's performance on those specific classes.
#Displaying the saved test classification report and confusion matrix figures (the model was trained on Kaggle)
display.Image("./figures/ClassificationReportTesting.png")
display.Image("./figures/ConfusionMatrixTesting.png")
For the purposes of this investigation, the best performing models for sign language classification are split between the best non-deep learning and deep learning models. The best non-deep learning model is the Stacking Ensemble Classifier, which consists of Logistic Regression, Support Vector Machine, Random Forest, and XGBoost estimators with a Logistic Regression meta-estimator. This model was trained on the ~110k augmented dataset, where the features used are the 23 LDA components and the 30 PCA components derived from the HOG features. The best performing deep learning model is a Convolutional Neural Network that consists of 4 hidden convolutional layers (with 32, 64, 128, and 256 filters), a pooling layer after each convolutional layer to downsample the feature maps, and a dense output layer of 24 units, one per class in the classification task. It is important to note that these models performed well on the train and test datasets; however, in live testing, where the input images are extracted from live camera video feeds, the performance of the models decreased. As a result, the Convolutional Neural Network was retrained on 224 x 224 images with the hand landmarks plotted onto them. Adding these hand features and increasing the size of the training images allows the model to better locate the finger positions and more accurately predict the correct labels.
As seen in the evaluation metrics, these models perform far better than a random classifier for this problem, which would achieve an accuracy of only about 0.04 (1/24). As a result, we are able to conclude that we have created high-performing, suitable models for a Sign Language Interpreter Model.
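The 0.04 baseline follows from guessing uniformly over the 24 letters (1/24 ≈ 0.042). It can also be verified empirically with scikit-learn's DummyClassifier; a minimal sketch, assuming X_train, y_train, X_test, and y_test as defined earlier:
from sklearn.dummy import DummyClassifier
# random classifier that guesses uniformly over the 24 letters
dummy = DummyClassifier(strategy='uniform', random_state=42)
dummy.fit(X_train, y_train)
print(dummy.score(X_test, y_test))  # expected to land near 1/24 ~ 0.042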
Using the CNN, the Sign Language Interpreter Model, named SignLingo, was created and can be run using the sign_language_interpreter.py script. The CNN model was chosen because it is the highest performing model both on the dataset and in the live environment. The script uses Python libraries such as OpenCV and MediaPipe to access the computer's or phone's camera, detect a person's hand, extract the hand region, send it as an input into the trained CNN, and output the predicted label with a confidence score.
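Below is a stripped-down sketch of that capture-and-predict loop; the model path and letter list are placeholders, and sign_language_interpreter.py remains the authoritative implementation:
import cv2
import numpy as np
from cvzone.HandTrackingModule import HandDetector
from keras.models import load_model

model = load_model('../data/external/finalModel.h5')  # placeholder path
letters = list('ABCDEFGHIKLMNOPQRSTUVWXY')  # 24 static letters (no J or Z)
detector = HandDetector(maxHands=1)
cap = cv2.VideoCapture(0)

while True:
    success, frame = cap.read()
    if not success:
        break
    hands, frame = detector.findHands(frame)  # also draws the hand landmarks
    if hands:
        x, y, w, h = hands[0]['bbox']
        crop = frame[max(y - 20, 0):y + h + 20, max(x - 20, 0):x + w + 20]
        if crop.size:
            inp = cv2.resize(crop, (224, 224))[np.newaxis] / 255.0
            probs = model.predict(inp, verbose=0)[0]
            text = f'{letters[int(np.argmax(probs))]} ({probs.max():.2f})'
            cv2.putText(frame, text, (x, y - 30),
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow('SignLingo', frame)
    if cv2.waitKey(1) == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()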
The general takeaways from this project are as follows:
- The performance of models is highly dependent on the quality of the data and the ability to extract informative features.
- Training data should be as close as possible to the real-world environment.
- Certain models have decision boundary characteristics that suit certain features better; Random Forest and XGBoost were expected to perform better, but the chosen features are not optimized for these models.
- Features that clarify information such as the positioning of the fingers benefit both model performance and generality.
Sumaiya Uddin has contributed significantly to the project. Her code hours are estimated to be around 3-4 hours daily on average. In the repository, her contributions include exploratory data analysis (EDA), data augmentation techniques, feature selection using the Histogram of Oriented Gradients (HOG) method, implementation of the baseline model, and the application of various machine learning algorithms such as Naive Bayes, Logistic Regression, Random Forest, and SVD dimension reduction. Her expertise and efforts have been instrumental in improving the project's performance and advancing its objectives.
Allen Lau has been an instrumental part of this project and team. His code hours are estimated to be around 3 - 4 hours daily. In the repository, his contributions include exploratory data analysis, dimensionality reduction using LDA, modeling with algorithms such as Support Vector Machine, XGBoost, and the stacking ensemble classifier, and scripts that use OpenCV and MediaPipe to apply computer vision techniques in the sign language interpreter application. His knowledge and diligence played a vital role in ensuring the project meets the goals set by the team and the project objectives.
Shubham Khandale has played a crucial role in this project and team, dedicating approximately 3 to 4 hours per day to coding. His contributions to the repository encompass a range of tasks, including conducting exploratory data analysis, performing dimensionality reduction using TSNE, and implementing various algorithms such as Logistic Regression, Naïve Bayes, and LDA on the feature set extracted from the CNN model. Furthermore, he actively participated in generating the dataset using the cvzone Python library and constructing a customized CNN model. His expertise and dedication were instrumental in ensuring the project aligns with the team's objectives and achieves its goals.
#Displaying the GitHub contributions figure
display.Image("./figures/github_contributions.png")
The below code is used to train a Gaussian Naive Bayes model on the normalized training data. Once the model is trained, the evaluate_model() function is used to output the evaluation metrics described in the Methods section above.
# Defining Naive Bayes
gnb = GaussianNB()
gnb.fit(X_train_norm, y_train)
# applying NB on normalized train data
y_pred_train = gnb.predict(X_train_norm)
# applying NB on normalized test data
y_pred_test = gnb.predict(X_test_norm)
# Evaluating on train set
evaluate_model(y_train, y_pred_train, letters)
# Evaluating on test set
evaluate_model(y_test, y_pred_test, letters)
Accuracy: 0.4600254962666181
Classification report:
class         precision  recall  f1-score  support
A             0.75       0.47    0.58      1126
B             0.89       0.47    0.61      1010
C             0.91       0.76    0.83      1144
D             0.65       0.44    0.52      1196
E             0.44       0.67    0.53      957
F             0.52       0.36    0.42      1204
G             0.68       0.58    0.63      1090
H             0.61       0.42    0.50      1013
I             0.37       0.66    0.48      1162
K             0.30       0.49    0.37      1114
L             0.68       0.57    0.62      1241
M             0.44       0.36    0.39      1055
N             0.50       0.26    0.34      1151
O             0.56       0.62    0.59      1196
P             0.33       0.74    0.45      1088
Q             0.49       0.54    0.52      1279
R             0.26       0.44    0.33      1294
S             0.38       0.35    0.36      1199
T             0.33       0.71    0.45      1186
U             0.39       0.11    0.17      1161
V             0.29       0.34    0.31      1082
W             0.20       0.05    0.08      1225
X             0.58       0.30    0.40      1164
Y             0.69       0.39    0.50      1118
accuracy                         0.46      27455
macro avg     0.51       0.46    0.46      27455
weighted avg  0.51       0.46    0.46      27455
MCC: 0.4392833743318951 Cohen's Kappa: 0.4364137376183579
Accuracy: 0.3898494143892917
Classification report:
class         precision  recall  f1-score  support
A             0.71       0.48    0.57      331
B             0.96       0.40    0.56      432
C             0.72       0.50    0.59      310
D             0.60       0.40    0.48      245
E             0.53       0.56    0.55      498
F             0.39       0.26    0.31      247
G             0.54       0.56    0.55      348
H             0.85       0.39    0.53      436
I             0.22       0.45    0.30      288
K             0.27       0.41    0.33      331
L             0.41       0.45    0.43      209
M             0.37       0.19    0.25      394
N             0.38       0.31    0.34      291
O             0.37       0.49    0.42      246
P             0.43       0.81    0.56      347
Q             0.29       0.60    0.39      164
R             0.13       0.44    0.20      144
S             0.13       0.17    0.15      246
T             0.24       0.60    0.35      248
U             0.18       0.03    0.05      266
V             0.27       0.21    0.23      346
W             0.11       0.05    0.07      206
X             0.57       0.29    0.38      267
Y             0.53       0.18    0.27      332
accuracy                         0.39      7172
macro avg     0.42       0.38    0.37      7172
weighted avg  0.46       0.39    0.39      7172
MCC: 0.36615100879087953 Cohen's Kappa: 0.36306888046227415
The results indicate that the Naive Bayes classifier performs poorly on this data. The training accuracy (0.46) is itself low, suggesting the model underfits the high-dimensional pixel features, and the further drop to a test accuracy of 0.39 shows a generalization gap: what the model did learn does not transfer well to unseen data.
The below code describes the process of training the best Logistic Regression model, where Randomized Grid Search CV is used for hyperparameter tuning. The best parameters found using Randomized Grid Search CV are as follows:
Best hyperparameters: {'C': 3.4647045830997407, 'max_iter': 3171, 'penalty': 'l2', 'solver': 'liblinear', 'warm_start': False}
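For reference, here is a minimal sketch of the randomized search that can produce such parameters; the distributions below are illustrative assumptions, not necessarily the exact ones used:
from scipy.stats import randint, uniform
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
# illustrative search space over the tuned hyperparameters
param_dist = {
    'C': uniform(0.01, 10),           # regularization strength
    'max_iter': randint(1000, 5000),  # optimizer iteration budget
    'penalty': ['l2'],
    'solver': ['liblinear'],
    'warm_start': [False, True],
}
search = RandomizedSearchCV(LogisticRegression(), param_dist, n_iter=10,
                            cv=3, n_jobs=-1, random_state=42)
search.fit(X_train_norm, y_train)
print('Best hyperparameters:', search.best_params_)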
lr = LogisticRegression(C=3.4647045830997407,
max_iter=3171,
penalty="l2",
solver="liblinear",
warm_start=False)
lr.fit(X_train_norm, y_train)
# applying Logistic regression on train
y_pred_lr_train = lr.predict(X_train_norm)
# applying Logistic regression on test
y_pred_lr_test = lr.predict(X_test_norm)
LogisticRegression(C=3.4647045830997407, max_iter=3171, solver='liblinear')
evaluate_model(y_train, y_pred_lr_train, letters)
evaluate_model(y_test, y_pred_lr_test, letters)
Accuracy: 0.999745037333819
Classification report:
class         precision  recall  f1-score  support
A             1.00       1.00    1.00      1126
B             1.00       1.00    1.00      1010
C             1.00       1.00    1.00      1144
D             1.00       1.00    1.00      1196
E             1.00       1.00    1.00      957
F             1.00       1.00    1.00      1204
G             1.00       1.00    1.00      1090
H             1.00       1.00    1.00      1013
I             1.00       1.00    1.00      1162
K             1.00       1.00    1.00      1114
L             1.00       1.00    1.00      1241
M             1.00       1.00    1.00      1055
N             1.00       1.00    1.00      1151
O             1.00       1.00    1.00      1196
P             1.00       1.00    1.00      1088
Q             1.00       1.00    1.00      1279
R             1.00       1.00    1.00      1294
S             1.00       1.00    1.00      1199
T             1.00       1.00    1.00      1186
U             1.00       1.00    1.00      1161
V             1.00       1.00    1.00      1082
W             1.00       1.00    1.00      1225
X             1.00       1.00    1.00      1164
Y             1.00       1.00    1.00      1118
accuracy                         1.00      27455
macro avg     1.00       1.00    1.00      27455
weighted avg  1.00       1.00    1.00      27455
MCC: 0.9997339050943962 Cohen's Kappa: 0.9997338926358993
Accuracy: 0.6692693809258227
Classification report:
class         precision  recall  f1-score  support
A             0.83       0.94    0.89      331
B             1.00       0.83    0.90      432
C             0.93       0.86    0.89      310
D             0.79       0.90    0.84      245
E             0.80       0.88    0.84      498
F             0.63       0.90    0.74      247
G             0.72       0.81    0.76      348
H             0.84       0.71    0.77      436
I             0.71       0.57    0.63      288
K             0.63       0.37    0.46      331
L             0.78       0.90    0.84      209
M             0.69       0.49    0.58      394
N             0.57       0.44    0.50      291
O             1.00       0.59    0.74      246
P             0.79       0.79    0.79      347
Q             0.58       0.74    0.65      164
R             0.19       0.43    0.26      144
S             0.40       0.61    0.48      246
T             0.38       0.53    0.44      248
U             0.48       0.53    0.51      266
V             0.65       0.49    0.56      346
W             0.34       0.51    0.41      206
X             0.79       0.42    0.55      267
Y             0.67       0.56    0.61      332
accuracy                         0.67      7172
macro avg     0.67       0.66    0.65      7172
weighted avg  0.71       0.67    0.67      7172
MCC: 0.6555084229826728 Cohen's Kappa: 0.6542843992274351
As seen in the train vs test evaluation, the Logistic Regression model on the 27k training images results in overfitting. The train accuracy is 100%, while the test accuracy is 67%. This indicates that techniques like data augmentation, regularization, and dimensionality reduction are needed to reduce the likelihood of overfitting.
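As a sketch of the data augmentation remedy, Keras' ImageDataGenerator can generate label-preserving variations of the 28 x 28 images; the exact settings used to build the augmented set may differ from these:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# small, label-preserving perturbations of the grayscale images
augmenter = ImageDataGenerator(rotation_range=10,
                               width_shift_range=0.1,
                               height_shift_range=0.1,
                               zoom_range=0.1)
X_imgs = X_train_norm.reshape(-1, 28, 28, 1)  # flat pixels back to image tensors
batch = next(augmenter.flow(X_imgs, batch_size=32, shuffle=False, seed=42))
X_aug = batch.reshape(-1, 784)  # augmented batch back to flat feature vectors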
# Normalizing the data
#X_train_norm = X_train/255
#X_test_norm = X_test/255
# Print the shapes of the augmented data
print(f'X_train shape: {X_train_norm.shape}')
print(f'y_train shape: {y_train.shape}')
print(f'X_test shape: {X_test_norm.shape}')
print(f'y_test shape: {y_test.shape}')
X_train shape: (109820, 784)
y_train shape: (109820,)
X_test shape: (28688, 784)
y_test shape: (28688,)
The below code is used to train a Gaussian Naive Bayes model on the combined-feature training data (the LDA components combined with the PCA-reduced HOG features). Once the model is trained, the evaluate_model() function is used to output the evaluation metrics described in the Methods section above.
# Defining Naive Bayes
gnb = GaussianNB()
gnb.fit(X_train, y_train)
# applying NB on combined-feature train data
y_pred_train = gnb.predict(X_train)
# applying NB on combined-feature test data
y_pred_test = gnb.predict(X_test)
# Evaluating on train set
evaluate_model(y_train, y_pred_train, letters)
# Evaluating on test set
evaluate_model(y_test, y_pred_test, letters)
Accuracy: 0.6905299581132762
Classification report:
class         precision  recall  f1-score  support
A             0.73       0.69    0.71      4504
B             0.80       0.71    0.75      4040
C             0.90       0.76    0.83      4576
D             0.67       0.58    0.62      4784
E             0.72       0.64    0.68      3828
F             0.82       0.74    0.77      4816
G             0.81       0.74    0.77      4360
H             0.73       0.70    0.71      4052
I             0.64       0.74    0.69      4648
K             0.67       0.60    0.63      4456
L             0.77       0.75    0.76      4964
M             0.64       0.58    0.61      4220
N             0.62       0.53    0.57      4604
O             0.81       0.80    0.81      4784
P             0.88       0.86    0.87      4352
Q             0.84       0.87    0.85      5116
R             0.44       0.66    0.53      5176
S             0.50       0.66    0.56      4796
T             0.56       0.75    0.64      4744
U             0.54       0.53    0.54      4644
V             0.61       0.57    0.59      4328
W             0.73       0.64    0.68      4900
X             0.71       0.75    0.73      4656
Y             0.81       0.69    0.75      4472
accuracy                         0.69      109820
macro avg     0.71       0.69    0.69      109820
weighted avg  0.70       0.69    0.69      109820
MCC: 0.6773744850858269 Cohen's Kappa: 0.6769212656002532
Accuracy: 0.5632668711656442
Classification report:
class         precision  recall  f1-score  support
A             0.68       0.69    0.68      1324
B             0.76       0.48    0.59      1728
C             0.91       0.59    0.72      1240
D             0.44       0.55    0.49      980
E             0.70       0.56    0.62      1992
F             0.64       0.60    0.62      988
G             0.69       0.64    0.66      1392
H             0.81       0.65    0.72      1744
I             0.53       0.55    0.54      1152
K             0.48       0.46    0.47      1324
L             0.68       0.81    0.74      836
M             0.56       0.46    0.50      1576
N             0.42       0.41    0.42      1164
O             0.65       0.65    0.65      984
P             0.90       0.84    0.87      1388
Q             0.63       0.81    0.71      656
R             0.15       0.44    0.23      576
S             0.26       0.45    0.33      984
T             0.36       0.56    0.44      992
U             0.33       0.32    0.32      1064
V             0.54       0.39    0.45      1384
W             0.48       0.57    0.52      824
X             0.60       0.69    0.64      1068
Y             0.72       0.48    0.58      1328
accuracy                         0.56      28688
macro avg     0.58       0.57    0.56      28688
weighted avg  0.61       0.56    0.58      28688
MCC: 0.545281864226818 Cohen's Kappa: 0.544065722941411
The below code describes the process of training the best Logistic Regression model, where Randomized Grid Search CV is used for hyperparameter tuning. The best parameters found using Randomized Grid Search CV are as follows:
Best hyperparameters: {'C': 0.22564631610840102, 'max_iter': 2391, 'penalty': 'l2', 'solver': 'newton-cg', 'warm_start': False}
lr = LogisticRegression(C=0.22564631610840102,
max_iter=2391,
penalty="l2",
solver='newton-cg',
warm_start=False)
lr.fit(X_train, y_train)
# applying Logistic regression and predicting on train
y_pred_lr_train = lr.predict(X_train)
# testing logistic regression on test data
y_pred_lr_test = lr.predict(X_test)
evaluate_model(y_train, y_pred_lr_train, letters)
evaluate_model(y_test, y_pred_lr_test, letters)
Accuracy: 0.7694044800582772
Classification report:
class         precision  recall  f1-score  support
A             0.77       0.81    0.79      4504
B             0.82       0.82    0.82      4040
C             0.88       0.87    0.87      4576
D             0.71       0.71    0.71      4784
E             0.78       0.72    0.75      3828
F             0.79       0.79    0.79      4816
G             0.82       0.81    0.81      4360
H             0.80       0.81    0.80      4052
I             0.74       0.74    0.74      4648
K             0.75       0.76    0.75      4456
L             0.84       0.86    0.85      4964
M             0.69       0.63    0.66      4220
N             0.68       0.66    0.67      4604
O             0.84       0.83    0.83      4784
P             0.88       0.89    0.89      4352
Q             0.89       0.90    0.90      5116
R             0.65       0.68    0.66      5176
S             0.66       0.68    0.67      4796
T             0.74       0.76    0.75      4744
U             0.67       0.62    0.64      4644
V             0.72       0.69    0.71      4328
W             0.79       0.83    0.81      4900
X             0.76       0.79    0.78      4656
Y             0.80       0.80    0.80      4472
accuracy                         0.77      109820
macro avg     0.77       0.77    0.77      109820
weighted avg  0.77       0.77    0.77      109820
MCC: 0.7593337133846444 Cohen's Kappa: 0.7593119420125211
Accuracy: 0.6532696597880647
Classification report:
class         precision  recall  f1-score  support
A             0.72       0.81    0.76      1324
B             0.80       0.65    0.71      1728
C             0.91       0.78    0.84      1240
D             0.63       0.65    0.64      980
E             0.76       0.64    0.69      1992
F             0.63       0.66    0.65      988
G             0.71       0.71    0.71      1392
H             0.87       0.71    0.78      1744
I             0.62       0.58    0.60      1152
K             0.58       0.58    0.58      1324
L             0.73       0.90    0.81      836
M             0.58       0.49    0.53      1576
N             0.44       0.45    0.44      1164
O             0.71       0.68    0.70      984
P             0.89       0.87    0.88      1388
Q             0.72       0.87    0.79      656
R             0.27       0.51    0.35      576
S             0.34       0.42    0.38      984
T             0.57       0.64    0.60      992
U             0.51       0.42    0.47      1064
V             0.68       0.49    0.57      1384
W             0.48       0.79    0.59      824
X             0.66       0.77    0.71      1068
Y             0.71       0.69    0.70      1328
accuracy                         0.65      28688
macro avg     0.65       0.66    0.65      28688
weighted avg  0.67       0.65    0.66      28688
MCC: 0.6381457820243219 Cohen's Kappa: 0.637620633120249
The below code describes the process of training the best Random Forest model, where Randomized Grid Search CV is used for hyperparameter tuning. The best parameters found using Randomized Grid Search CV are as follows:
Best hyperparameters: {'n_estimators': 20, 'min_samples_split': 10, 'min_samples_leaf': 5, 'max_features': 5, 'max_depth': 5}
#defining rfc with best parameters
rfc =RandomForestClassifier(n_estimators=20, min_samples_split=10, min_samples_leaf=5,
max_features=5, max_depth=5, random_state=42)
rfc.fit(X_train, y_train)
# predicting with the Random Forest on train data
y_pred_train = rfc.predict(X_train)
# predicting with the Random Forest on test data
y_pred_test = rfc.predict(X_test)
evaluate_model(y_train, y_pred_train, letters)
evaluate_model(y_test, y_pred_test, letters)
Accuracy: 0.5358131487889274
Classification report:
class         precision  recall  f1-score  support
A             0.45       0.80    0.58      4504
B             0.77       0.46    0.58      4040
C             0.72       0.78    0.75      4576
D             0.57       0.46    0.51      4784
E             0.63       0.43    0.51      3828
F             0.65       0.59    0.62      4816
G             0.64       0.66    0.65      4360
H             0.70       0.54    0.61      4052
I             0.49       0.56    0.52      4648
K             0.51       0.43    0.46      4456
L             0.49       0.72    0.59      4964
M             0.50       0.42    0.46      4220
N             0.54       0.34    0.41      4604
O             0.65       0.70    0.67      4784
P             0.65       0.78    0.71      4352
Q             0.63       0.83    0.71      5116
R             0.26       0.62    0.37      5176
S             0.39       0.26    0.31      4796
T             0.62       0.45    0.52      4744
U             0.69       0.09    0.16      4644
V             0.59       0.05    0.09      4328
W             0.39       0.68    0.50      4900
X             0.71       0.51    0.59      4656
Y             0.60       0.62    0.61      4472
accuracy                         0.54      109820
macro avg     0.58       0.53    0.52      109820
weighted avg  0.57       0.54    0.52      109820
MCC: 0.5180986803161411 Cohen's Kappa: 0.5151415723255848
Accuracy: 0.4210122699386503
Classification report:
class         precision  recall  f1-score  support
A             0.39       0.77    0.52      1324
B             0.81       0.34    0.48      1728
C             0.71       0.66    0.68      1240
D             0.39       0.38    0.39      980
E             0.68       0.32    0.44      1992
F             0.46       0.51    0.48      988
G             0.54       0.50    0.52      1392
H             0.78       0.45    0.57      1744
I             0.39       0.43    0.41      1152
K             0.30       0.25    0.28      1324
L             0.38       0.79    0.51      836
M             0.44       0.34    0.38      1576
N             0.30       0.19    0.23      1164
O             0.47       0.57    0.51      984
P             0.71       0.76    0.73      1388
Q             0.35       0.77    0.48      656
R             0.12       0.64    0.21      576
S             0.16       0.12    0.14      984
T             0.33       0.23    0.27      992
U             0.47       0.05    0.09      1064
V             0.52       0.03    0.05      1384
W             0.23       0.65    0.34      824
X             0.57       0.39    0.47      1068
Y             0.47       0.42    0.45      1328
accuracy                         0.42      28688
macro avg     0.46       0.44    0.40      28688
weighted avg  0.50       0.42    0.41      28688
MCC: 0.4016037083169605 Cohen's Kappa: 0.3972766531870806
The below code describes the process of training the best Support Vector Machine, where Randomized Grid Search CV is used for hyperparameter tuning. The best parameters found using Randomized Grid Search CV are as follows:
Best hyperparameters: {'kernel': 'poly', 'gamma': 'scale', 'C': .1}
#define svm with best parameters
svm = SVC(kernel = 'poly', gamma = 'scale', C = .1, probability=True)
#fit on training data
svm.fit(X_train,y_train)
#predict on training data
y_pred_train = svm.predict(X_train)
#predict on testing data
y_pred_test = svm.predict(X_test)
evaluate_model(y_train,y_pred_train,letters)
evaluate_model(y_test,y_pred_test,letters)
Accuracy: 0.9360954288836277
Classification report:
class         precision  recall  f1-score  support
A             0.95       0.90    0.93      4504
B             0.98       0.95    0.97      4040
C             1.00       0.95    0.97      4576
D             0.94       0.95    0.95      4784
E             0.98       0.83    0.90      3828
F             0.96       0.98    0.97      4816
G             0.97       0.91    0.94      4360
H             0.98       0.94    0.96      4052
I             0.88       0.96    0.92      4648
K             0.98       0.91    0.94      4456
L             0.99       0.97    0.98      4964
M             0.92       0.82    0.87      4220
N             0.81       0.92    0.87      4604
O             0.99       0.98    0.98      4784
P             1.00       0.97    0.98      4352
Q             1.00       0.97    0.99      5116
R             0.88       0.95    0.91      5176
S             0.83       0.94    0.88      4796
T             0.80       0.99    0.88      4744
U             0.93       0.89    0.91      4644
V             0.89       0.93    0.91      4328
W             0.97       0.93    0.95      4900
X             0.96       0.97    0.96      4656
Y             0.99       0.94    0.96      4472
accuracy                         0.94      109820
macro avg     0.94       0.93    0.94      109820
weighted avg  0.94       0.94    0.94      109820
MCC: 0.9334368694135377 Cohen's Kappa: 0.9332945727981474
Accuracy: 0.8232361963190185
Classification report:
class         precision  recall  f1-score  support
A             0.90       0.89    0.90      1324
B             0.97       0.84    0.90      1728
C             0.99       0.87    0.93      1240
D             0.82       0.88    0.85      980
E             0.92       0.74    0.82      1992
F             0.89       0.93    0.91      988
G             0.90       0.78    0.84      1392
H             0.96       0.89    0.93      1744
I             0.77       0.92    0.84      1152
K             0.93       0.80    0.86      1324
L             0.98       0.96    0.97      836
M             0.80       0.66    0.72      1576
N             0.58       0.75    0.65      1164
O             0.94       0.88    0.91      984
P             1.00       0.95    0.97      1388
Q             0.95       0.96    0.95      656
R             0.44       0.79    0.57      576
S             0.51       0.69    0.58      984
T             0.57       0.89    0.69      992
U             0.81       0.59    0.68      1064
V             0.81       0.74    0.78      1384
W             0.67       0.83    0.74      824
X             0.83       0.90    0.86      1068
Y             0.96       0.80    0.87      1328
accuracy                         0.82      28688
macro avg     0.83       0.83    0.82      28688
weighted avg  0.85       0.82    0.83      28688
MCC: 0.8159408493125702 Cohen's Kappa: 0.8152119732513206
The below code describes the process of training the best XGBoost, where Randomized Grid Search CV is used for hyperparameter tuning. The best parameters found using Randomized Grid Search CV are as follows:
Best hyperparameters: {'subsample': 0.4, 'reg_lambda': 2.25, 'reg_alpha': 2, 'min_child_weight': 30, 'max_depth': 8, 'learning_rate': 0.001, 'gamma': 0, 'colsample_bytree': 0.4}
#encoding for XGBoost
from sklearn.preprocessing import LabelEncoder
#encoding
le = LabelEncoder()
le.fit(y_train)
y_train_encoded = le.transform(y_train)
y_test_encoded = le.transform(y_test)
# Create an XGBoost classifier object
xgb_model = xgb.XGBClassifier(subsample=0.4,reg_lambda=2.25,reg_alpha=2,min_child_weight=30,max_depth=8,learning_rate=0.001,gamma=0,colsample_bytree=0.4)
#fitting on train
xgb_model.fit(X_train,y_train_encoded)
# loading precomputed predictions (fitting and predicting take a long time)
with open('/content/drive/Shareddrives/SignLanguageData/XGBoost_Predictions.pkl','rb') as f:
y_train_encoded,y_pred_train,y_test_encoded,y_pred_test = pickle.load(f)
#evaluation
evaluate_model(y_train_encoded,y_pred_train,letters)
evaluate_model(y_test_encoded,y_pred_test,letters)
Accuracy: 0.759278819887088
Classification report:
class         precision  recall  f1-score  support
A             0.70       0.81    0.75      4504
B             0.85       0.77    0.81      4040
C             0.92       0.83    0.87      4576
D             0.74       0.65    0.69      4784
E             0.82       0.69    0.75      3828
F             0.84       0.81    0.82      4816
G             0.80       0.83    0.82      4360
H             0.77       0.77    0.77      4052
I             0.74       0.76    0.75      4648
K             0.73       0.68    0.71      4456
L             0.81       0.85    0.83      4964
M             0.75       0.61    0.68      4220
N             0.72       0.64    0.68      4604
O             0.82       0.84    0.83      4784
P             0.88       0.90    0.89      4352
Q             0.82       0.91    0.86      5116
R             0.61       0.76    0.68      5176
S             0.64       0.69    0.66      4796
T             0.70       0.80    0.74      4744
U             0.68       0.59    0.64      4644
V             0.73       0.58    0.65      4328
W             0.71       0.82    0.76      4900
X             0.77       0.78    0.77      4656
Y             0.80       0.79    0.79      4472
accuracy                         0.76      109820
macro avg     0.76       0.76    0.76      109820
weighted avg  0.76       0.76    0.76      109820
MCC: 0.7489324962535907 Cohen's Kappa: 0.7487012455759929
Accuracy: 0.5957543223647518
Classification report:
class         precision  recall  f1-score  support
A             0.57       0.78    0.66      1324
B             0.82       0.56    0.67      1728
C             0.91       0.73    0.81      1240
D             0.53       0.53    0.53      980
E             0.81       0.61    0.69      1992
F             0.68       0.70    0.69      988
G             0.67       0.67    0.67      1392
H             0.79       0.69    0.74      1744
I             0.61       0.59    0.60      1152
K             0.51       0.51    0.51      1324
L             0.67       0.87    0.76      836
M             0.59       0.41    0.48      1576
N             0.45       0.37    0.41      1164
O             0.67       0.70    0.69      984
P             0.91       0.86    0.88      1388
Q             0.58       0.87    0.70      656
R             0.18       0.48    0.27      576
S             0.30       0.42    0.35      984
T             0.46       0.59    0.52      992
U             0.39       0.31    0.35      1064
V             0.48       0.25    0.33      1384
W             0.39       0.71    0.51      824
X             0.65       0.70    0.68      1068
Y             0.72       0.53    0.61      1328
accuracy                         0.60      28688
macro avg     0.60       0.60    0.59      28688
weighted avg  0.63       0.60    0.60      28688
MCC: 0.5792471330287466 Cohen's Kappa: 0.5779678446850007
The below code describes the process of training the stacking ensemble classifier. This ensemble method is composed of the best performing models found using Randomized Grid Search CV for Support Vector Machine, XGBoost, Logistic Regression, and Random Forest. The meta estimator is defined as a Logistic Regression model.
#defining estimators
all_estimators = [
('svm',SVC(kernel = 'poly', gamma = 'scale', C = .1, probability=True)),
('xgb',xgb.XGBClassifier(subsample=0.4,reg_lambda=2.25,reg_alpha=2,min_child_weight=30,max_depth=8,learning_rate=0.001,gamma=0,colsample_bytree=0.4)),
('lr',LogisticRegression(C=0.22564631610840102,max_iter=2391, penalty="l2", solver='newton-cg',warm_start=False)),
('rf',RandomForestClassifier(n_estimators=20, min_samples_split=10, min_samples_leaf=5,max_features=5, max_depth=5, random_state=42))
]
#training stacking classifier
all_stack = StackingClassifier(estimators=all_estimators, final_estimator=LogisticRegression(max_iter=3000))
all_stack.fit(X_train,y_train)
#predictions
y_pred_train = all_stack.predict(X_train)
y_pred_test = all_stack.predict(X_test)
#evaluate model
evaluate_model(y_train,y_pred_train,letters)
evaluate_model(y_test,y_pred_test,letters)
#display image due to computation time
print('Stacking Ensemble Train Evaluation')
display.Image("./figures/stack_eval_train1.png")
Stacking Ensemble Train Evaluation
#display image due to computation time
display.Image("./figures/stack_eval_train2.png")
#display image due to computation time
display.Image("./figures/stack_eval_train3.png")
#display image due to computation time
print('Stacking Ensemble Test Evaluation')
display.Image("./figures/stack_eval_test1.png")
Stacking Ensemble Test Evaluation
#display image due to computation time
display.Image("./figures/stack_eval_test2.png")
#display image due to computation time
display.Image("./figures/stack_eval_test3.png")
HandDetector - a hand-tracking module from CVZone, a Python computer vision library built on top of OpenCV and MediaPipe. With its extensive collection of features and capabilities, CVZone makes it easier to find and follow hand landmarks in photos and video feeds. It uses computer vision and machine learning methods to precisely locate and identify important features of the human hand, such as the fingers, knuckles, and palm center.
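A minimal single-frame sketch of reading those landmarks with HandDetector (webcam assumed; the dictionary keys follow CVZone's documented output):
import cv2
from cvzone.HandTrackingModule import HandDetector

cap = cv2.VideoCapture(0)
detector = HandDetector(maxHands=1)
success, frame = cap.read()
hands, frame = detector.findHands(frame)  # detects the hand and draws its landmarks
if hands:
    hand = hands[0]
    print(hand['bbox'])         # bounding box: x, y, w, h
    print(hand['center'])       # palm-center coordinates
    print(len(hand['lmList']))  # 21 landmark points covering fingers and knuckles
cap.release()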
#Deep Learning Data Generation
import os
import cv2
from cvzone.HandTrackingModule import HandDetector
import numpy as np
import math
import time
cap = cv2.VideoCapture(0)
detector = HandDetector(maxHands=1)
offset = 20
imgSize = 224
counter = 0
label = "Y"
folder = "../data/external/Data/" + label + "/"
if not os.path.exists(folder):
os.makedirs(folder)
print(f"Directory '{folder}' created successfully")
else:
print(f"Directory '{folder}' already exists")
while True:
success, img = cap.read()
hands, img = detector.findHands(img)
if hands:
hand = hands[0]
x, y, w, h = hand['bbox']
imgWhite = np.ones((imgSize, imgSize, 3), np.uint8) * 255
imgCrop = img[y - offset:y + h + offset, x - offset:x + w + offset]
imgCropShape = imgCrop.shape
aspectRatio = h / w
if aspectRatio > 1:
k = imgSize / h
wCal = math.ceil(k * w)
imgResize = cv2.resize(imgCrop, (wCal, imgSize))
imgResizeShape = imgResize.shape
wGap = math.ceil((imgSize - wCal) / 2)
imgWhite[:, wGap:wCal + wGap] = imgResize
else:
k = imgSize / w
hCal = math.ceil(k * h)
imgResize = cv2.resize(imgCrop, (imgSize, hCal))
imgResizeShape = imgResize.shape
hGap = math.ceil((imgSize - hCal) / 2)
imgWhite[hGap:hCal + hGap, :] = imgResize
cv2.imshow("ImageCrop", imgCrop)
cv2.imshow("ImageWhite", imgWhite)
cv2.imshow("Image", img)
    key = cv2.waitKey(1)
    # press "s" to save the current processed frame; press "q" to quit
    if key == ord("s"):
        counter += 1
        cv2.imwrite(f'{folder}/Image_{time.time()}.jpg', imgWhite)
        print(counter)
    elif key == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
The provided code focuses on extracting hierarchical and increasingly abstract features from the input images through multiple convolutional and pooling layers. These layers effectively perform feature selection by learning and capturing important patterns and information from the images. The subsequent layers (not shown in the given code) would typically involve fully connected layers and an output layer for the final classification or regression task.
#importing the required libraries
import pickle
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
import keras
from keras.models import Sequential
from keras.layers import Dense, Conv2D, MaxPool2D, Flatten, Dropout, BatchNormalization
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ReduceLROnPlateau
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
# 100k shuffled data
with open('../data/external/combined_augmented_data_v3.pkl','rb') as f:
X_train,y_train,X_test,y_test = pickle.load(f)
# Normalize the data
x_train = X_train / 255
x_test = X_test / 255
#printing the shape of data
print(f'X_train shape: {x_train.shape}')
print(f'y_train shape: {y_train.shape}')
print(f'X_test shape: {X_test.shape}')
print(f'y_test shape: {y_test.shape}')
# Reshape X to be a 4D tensor for use in Keras
X = x_train.reshape((x_train.shape[0], x_train.shape[1], x_train.shape[2], 1))
y= y_train.reshape((y_train.shape[0], 1))
#number of labels
num_classes = len(np.unique(y_train))
#CNN Model
model = Sequential()
model.add(Conv2D(75 , (3,3) , strides = 1 , padding = 'same' , activation = 'relu' , input_shape = (28,28,1)))
model.add(BatchNormalization())
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))
model.add(Conv2D(50 , (3,3) , strides = 1 , padding = 'same' , activation = 'relu'))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))
model.add(Conv2D(25 , (3,3) , strides = 1 , padding = 'same' , activation = 'relu'))
model.add(BatchNormalization())
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))
model.add(Flatten())
model.summary()
# Extract features from X using the model
features = model.predict(X)
#saving extracted Features into numpy
with open('../data/external/extracted_features_on_v3.npy', 'wb') as f:
np.save(f,features)
The provided code represents a CNN model for image classification. It consists of multiple convolutional and pooling layers for feature extraction, followed by fully connected layers for high-level feature representation and classification. The model is trained using the Adam optimizer and evaluated using categorical cross-entropy loss and accuracy metrics.
#importing required libraries
import pickle
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
import keras
from keras.models import Sequential
from keras.layers import Dense, Conv2D, MaxPool2D, Flatten, Dropout, BatchNormalization
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ReduceLROnPlateau
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.preprocessing import LabelBinarizer
#opening pickle file of augmented added data
with open('../data/external/combined_augmented_data_v2.pkl','rb') as f:
X_train,y_train,X_test,y_test = pickle.load(f)
#LabelBinarizer is typically employed to convert categorical labels into binary vectors, enabling easier analysis and computation by machine learning algorithms.
label_binarizer = LabelBinarizer()
y_train = label_binarizer.fit_transform(y_train)
y_test = label_binarizer.transform(y_test)  # transform (not fit_transform) so test labels share the train encoding
# Normalize the data
x_train = X_train / 255
x_test = X_test / 255
learning_rate_reduction = ReduceLROnPlateau(monitor='val_accuracy', patience = 2, verbose=1,factor=0.5, min_lr=0.00001)
#CNN Model
model = Sequential()
model.add(Conv2D(75 , (3,3) , strides = 1 , padding = 'same' , activation = 'relu' , input_shape = (28,28,1)))
model.add(BatchNormalization())
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))
model.add(Conv2D(50 , (3,3) , strides = 1 , padding = 'same' , activation = 'relu'))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))
model.add(Conv2D(25 , (3,3) , strides = 1 , padding = 'same' , activation = 'relu'))
model.add(BatchNormalization())
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))
model.add(Flatten())
model.add(Dense(units = 512 , activation = 'relu'))
model.add(Dropout(0.3))
model.add(Dense(units = 24 , activation = 'softmax'))
model.compile(optimizer = 'adam' , loss = 'categorical_crossentropy' , metrics = ['accuracy'])
model.summary()
history = model.fit(x_train,y_train, batch_size = 128 ,epochs = 2 , validation_data = (x_test, y_test) , callbacks = [learning_rate_reduction], verbose=1)
print("Accuracy of the model is - " , model.evaluate(x_test,y_test)[1]*100 , "%")
# save the model to an h5 file
model.save('../data/external/finalModel.h5')
Singular Value Decomposition (SVD) is a matrix factorization technique that is commonly used for dimensionality reduction and feature extraction. It decomposes a matrix into three separate matrices: U, Σ, and V, where U and V are orthogonal matrices and Σ is a diagonal matrix containing the singular values.
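The factorization itself can be sanity-checked with NumPy on a small random matrix (illustrative only):
import numpy as np
# X = U @ diag(s) @ Vt, with orthonormal columns in U, Vt and non-negative singular values s
X = np.random.default_rng(0).normal(size=(6, 4))
U, s, Vt = np.linalg.svd(X, full_matrices=False)
print(np.allclose(X, U @ np.diag(s) @ Vt))  # True
# keeping only the top-k singular values gives the best rank-k approximation of X
k = 2
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]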
In the code below, TruncatedSVD is used to perform SVD with a specified number of components (n_components). The training data is first fitted and transformed through the pipeline (pipe.fit_transform(X_train_norm)) to learn the SVD model and obtain the reduced-dimensional representation. Finally, the test data is transformed using the learned SVD model (pipe.transform(X_test_norm)).
n_components = 10
svd = TruncatedSVD(n_components, n_iter=7, random_state=42)
# Build the pipeline
pipe = Pipeline([('reducer', svd)])
# Fit the pipeline to X_train_norm and transform the data
X_train_svd = pipe.fit_transform(X_train_norm)
X_test_svd = pipe.transform(X_test_norm)
The explained variance ratio of the SVD components indicates how much of the data's variance each component retains. As a result, the cumulative explained variance can help determine how many components to keep for dimensionality reduction.
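A common rule of thumb is to keep the smallest number of components whose cumulative ratio crosses a threshold such as 0.95; a short sketch, assuming the fitted svd object from the cells below:
import numpy as np
cum = np.cumsum(svd.explained_variance_ratio_)
# smallest k reaching the threshold; falls back to all components if it is never reached
k = min(int(np.searchsorted(cum, 0.95)) + 1, len(cum))
print(k, cum[k - 1])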
# calculate the explained variance ratio for each component
explained_variance_ratio = svd.explained_variance_ratio_
explained_variance_ratio
array([0.57801598, 0.06165046, 0.04856357, 0.03330968, 0.02285816, 0.01864049, 0.01758379, 0.0155315 , 0.01323109, 0.01095916])
# Getting explained variance ratio from the svd model
evr = svd.explained_variance_ratio_
components = range(1, len(evr) + 1)
# Plotting scree plot
fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(x=components, height=evr, label='Explained Variance')
plt.plot(components, np.cumsum(evr), marker='.', color='orange', label='Cumulative Explained Variance')
plt.axhline(y=.95, color='r', linestyle='--', label='0.95 Explained Variance')
plt.xticks(range(1, len(evr) + 1))
plt.title('SVD: Explained Variance')
plt.xlabel('Component')
plt.ylabel('Explained Variance')
plt.legend(fontsize=9)
# Show the plot
plt.show()
fig, ax = plt.subplots(figsize = (8,8))
ax = sns.scatterplot(x = X_train_svd[:,0], y = X_train_svd[:,1], hue = y_train, palette = 'pastel');
handler, _ = ax.get_legend_handles_labels();
plt.legend(handler, letters, bbox_to_anchor = (1, 1));
plt.title('2D Embedding of Sign Language Images')
plt.xlabel('Singular Vector 1');
plt.ylabel('Singular Vector 2');
Looking at the scree plot above, there is an elbow at component 2. From the scatter plot, it is observed that the first two singular vectors do not do reasonably well at separating certain letters from others.
t-SNE (t-Distributed Stochastic Neighbor Embedding) is a nonlinear dimensionality reduction technique commonly used for visualizing high-dimensional data in a lower-dimensional space. It aims to preserve the local structure and relationships between data points while creating a compressed representation.
The below code outline the process of setting up the data so that it can be an input into t-SNE using Scikit-Learn.
# Initialize t-SNE object
tsne = TSNE(n_components=2, random_state=0 )
# Apply t-SNE to data
tsne_res = tsne.fit_transform(X_train_norm)
# Plot t-SNE results
fig, ax = plt.subplots(figsize = (10,10))
ax = sns.scatterplot(x = tsne_res[:,0], y = tsne_res[:,1], hue = y_train, palette = sns.hls_palette(24), legend = 'full')  # one palette color per class
handler, _ = ax.get_legend_handles_labels();
plt.legend(handler, letters, bbox_to_anchor = (1, 1));
#title
plt.title('2D Embedding of Sign Language Images')
#x-axis label
plt.xlabel('TSNE Component 1');
#y-axis label
plt.ylabel('TSNE Component 2');
As with SVD, the t-SNE embedding above does not do reasonably well at separating certain letters from others.