Implementation of t-SNE and UMAP in trainHVT function

Bidesh Ghosh, Alimpan Dey

2024-11-08

1. Background

The HVT package is a collection of R functions to facilitate building topology preserving maps for rich multivariate data analysis, with a focus on big data, that is, data with a large number of rows. The functions are organized around the following typical workflow:

  1. Data Compression: Vector Quantization (VQ), HVQ (Hierarchical Vector Quantization) using means or medians. This step compresses the rows (long data frame) using a compression objective.

  2. Data Projection: Projection of the compressed cells to 2D with t-SNE, UMAP, or Sammon’s Mapping Algorithm. This step creates the topology preserving map coordinates (also called embeddings) in the desired output dimension.

  3. Tessellation: Create the cells required for object visualization using the Voronoi Tessellation method. The package includes heatmap plots for Hierarchical Voronoi Tessellations (HVT). This step enables data insights, visualization, and interaction with the topology preserving map, and is useful for semi-supervised tasks.

  4. Scoring: Scoring data sets and recording their assignment using the map objects from the above steps, in a sequence of maps if required.

  5. Temporal Analysis and Visualization: A collection of functions that extends the HVT package to time series data by analyzing its underlying patterns, calculating transition probabilities, and visualizing the flow of data over time.
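The first two steps of this workflow can be sketched with base R and the MASS package. This is only an illustration under stated assumptions: plain `stats::kmeans` stands in for the package's vector quantization, `MASS::sammon` performs the projection, and the data is synthetic. It is not the HVT implementation.

```r
# Minimal sketch of steps 1 and 2 (compression, then projection).
# Assumption: plain k-means stands in for HVT's vector quantization.
library(MASS)  # provides sammon()

set.seed(42)
# Synthetic "long" data: 1000 rows, 5 numeric features
X <- matrix(rnorm(1000 * 5), ncol = 5)

# Step 1: compress the 1000 rows into 20 cells (one centroid per cell)
km <- kmeans(X, centers = 20, nstart = 5, iter.max = 50)
centroids <- km$centers

# Step 2: project the 20 centroids to 2D with Sammon's mapping
proj <- sammon(dist(centroids), k = 2, trace = FALSE)

dim(centroids)    # cells x original features
dim(proj$points)  # cells x embedding coordinates
```

The projection runs on the 20 centroids rather than on the 1000 rows, which is why compression comes first in the workflow.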

What’s New?

This notebook showcases the enhancement made to the trainHVT function through the integration of dimensionality reduction techniques and comprehensive evaluation metrics. These advancements aim to enhance the visualization, analysis, and interpretability of high-dimensional data within the HVT framework.

1. Integration of Advanced Dimensionality Reduction Techniques:

The trainHVT function now includes dimensionality reduction techniques like t-SNE and UMAP, alongside the previously implemented Sammon’s method. This integration enhances the function’s capacity to explore and apply various dimensionality reduction approaches.

2. Integration of Evaluation Metrics:

Dimensionality reduction evaluation metrics help to determine the quality and effectiveness of the dimensionality reduction process by evaluating aspects such as data point proximity, cluster separation, and overall fidelity of the reduced representation.

2. t-Distributed Stochastic Neighbor Embedding

t-SNE is a widely recognized technique for visualizing high-dimensional data in a low-dimensional space, typically two or three dimensions. Developed by Laurens van der Maaten and Geoffrey Hinton, t-SNE is particularly effective at preserving the local structure of the data, ensuring that similar data points are positioned close to one another in the reduced dimensional space.

Advantages of t-SNE

3. Uniform Manifold Approximation and Projection

UMAP is a cutting-edge technique for dimension reduction and data visualization, known for its speed, scalability, and ability to maintain both global and local data structure. Developed by Leland McInnes, John Healy, and James Melville, UMAP has quickly become a favorite among data scientists for its versatility and robust performance across a wide range of applications.

Advantages of UMAP

4. Dimensionality reduction evaluation metrics

Dimensionality reduction evaluation metrics are measures used to assess the effectiveness of dimensionality reduction techniques. They help evaluate how well these techniques preserve the structure, relationships, and quality of the data when reducing its dimensions.

These metrics are organized into the categories listed below. Here is a brief overview of each metric included in the trainHVT function:

Structure Preservation Metrics

  1. Trustworthiness:
  2. Continuity:
  3. Sammon’s Stress:
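The three structure preservation metrics can be sketched in base R using their standard textbook definitions. Whether trainHVT computes them with exactly these formulas is an assumption; the function names below (`sammon_stress`, `neighbour_ranks`, `trustworthiness`, `continuity`) are hypothetical and not part of the HVT package.

```r
# Sammon's stress: squared distance errors, weighted by the original distances
sammon_stress <- function(X_hi, X_lo) {
  d_hi <- as.vector(dist(X_hi))
  d_lo <- as.vector(dist(X_lo))
  sum((d_hi - d_lo)^2 / d_hi) / sum(d_hi)
}

# ranks[i, j] = rank of point j among the neighbours of point i (1 = nearest)
neighbour_ranks <- function(X) {
  D <- as.matrix(dist(X))
  diag(D) <- Inf  # a point is never its own neighbour
  t(apply(D, 1, rank, ties.method = "first"))
}

# Trustworthiness: penalises points that are among the k nearest neighbours
# in the embedding but not in the original space (valid for k < n / 2)
trustworthiness <- function(X_hi, X_lo, k = 5) {
  n <- nrow(X_hi)
  r_hi <- neighbour_ranks(X_hi)
  r_lo <- neighbour_ranks(X_lo)
  intruders <- (r_lo <= k) & (r_hi > k)
  penalty <- sum((r_hi - k)[intruders])
  1 - 2 / (n * k * (2 * n - 3 * k - 1)) * penalty
}

# Continuity is the same measure with the two spaces swapped
continuity <- function(X_hi, X_lo, k = 5) trustworthiness(X_lo, X_hi, k)

set.seed(1)
X <- matrix(rnorm(60 * 3), ncol = 3)
Y <- X[, 1:2]  # crude "embedding": drop the third coordinate

trustworthiness(X, Y, k = 5)
continuity(X, Y, k = 5)
sammon_stress(X, Y)
```

Trustworthiness and continuity lie between 0 and 1, with 1 meaning that neighbourhoods are fully preserved. For Sammon's stress, lower is better and 0 means that all pairwise distances are reproduced exactly.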

Distance Preservation Metrics

  1. RMSE (Root Mean Square Error):
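As a small worked example, RMSE can be computed over pairwise distances before and after projection. That the metric is defined on pairwise distances is an assumption made for this sketch, and `rmse_distances` is a hypothetical helper name.

```r
# RMSE between pairwise distances in the original and the reduced space
rmse_distances <- function(X_hi, X_lo) {
  d_hi <- as.vector(dist(X_hi))
  d_lo <- as.vector(dist(X_lo))
  sqrt(mean((d_hi - d_lo)^2))
}

# Two points that are 5 apart in 2D (a 3-4-5 triangle) ...
X_hi <- rbind(c(0, 0), c(3, 4))
# ... embedded in 1D at distance 3: one distance pair with an error of 2
X_lo <- rbind(0, 3)

rmse_distances(X_hi, X_lo)  # 2
```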

Human Centered Metrics

  1. Likert Scale:

Ground Truth: We have performed dimensionality reduction on torus data. The underlying structure of this data is a torus, a surface shaped like a doughnut. When properly reduced to two dimensions, the data should resemble an annulus (the ring between two concentric circles).

  1. Spatial Orientation:

Interpretive Quality Metrics

  1. Silhouette Score:
  2. KNN Retention Score:
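Both interpretive quality metrics can be illustrated on synthetic data. The silhouette score below uses `cluster::silhouette`. The KNN retention score is written here as the average share of each point's k nearest neighbours that survive the projection, which is an assumed definition; `knn_sets` and `knn_retention` are hypothetical helper names.

```r
library(cluster)  # provides silhouette()

# Indices of the k nearest neighbours of every row of X
knn_sets <- function(X, k) {
  D <- as.matrix(dist(X))
  diag(D) <- Inf  # exclude the point itself
  lapply(seq_len(nrow(D)), function(i) order(D[i, ])[seq_len(k)])
}

# Average fraction of original k nearest neighbours kept in the embedding
knn_retention <- function(X_hi, X_lo, k = 5) {
  hi <- knn_sets(X_hi, k)
  lo <- knn_sets(X_lo, k)
  mean(mapply(function(a, b) length(intersect(a, b)) / k, hi, lo))
}

set.seed(7)
# Two well separated groups in 3D
X <- rbind(matrix(rnorm(50 * 3), ncol = 3),
           matrix(rnorm(50 * 3, mean = 6), ncol = 3))
labels <- rep(1:2, each = 50)
Y <- X[, 1:2]  # crude "embedding": drop the third coordinate

# Mean silhouette width of the known groups in the embedding
sil <- silhouette(labels, dist(Y))
mean_sil <- mean(sil[, "sil_width"])

mean_sil
knn_retention(X, Y, k = 5)
```

Both scores are higher when the embedding is better: a silhouette width near 1 indicates compact, well separated groups, and a retention of 1 means every neighbourhood is intact.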

Computational Efficiency Metrics

  1. Execution Duration:
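Execution duration can be measured with base R's `system.time`. The sketch below times classical multidimensional scaling (`cmdscale`) purely as a stand-in for a dimensionality reduction step; how trainHVT records its own timings is not shown in this notebook.

```r
set.seed(3)
X <- matrix(rnorm(500 * 3), ncol = 3)

# Time one dimensionality reduction call; the embedding is kept in `emb`
timing <- system.time(emb <- cmdscale(dist(X), k = 2))

# Wall-clock seconds for the call
elapsed_sec <- unname(timing["elapsed"])
elapsed_sec
```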

5. Notebook Requirements

This chunk checks that all the packages needed to run this vignette are installed, installs any that are missing, and attaches them all to the session environment.

# Packages required by this vignette (geozoo is used to generate the torus data)
list.of.packages <- c("DT", "plotly", "magrittr", "data.table", "tidyverse", "crosstalk",
                      "kableExtra", "cowplot", "gdata", "ggplot2", "gridExtra", "tibble",
                      "geozoo", "HVT")

# Install any packages that are missing, then attach them all
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[, "Package"])]
if (length(new.packages)) {
  install.packages(new.packages, repos = "https://cloud.r-project.org/")
}
invisible(lapply(list.of.packages, library, character.only = TRUE))

6. Data Understanding

First, let us see how to generate data for a torus. We use the geozoo library for this purpose. Geo Zoo (short for Geometric Zoo) is a compilation of geometric objects ranging from three to 10 dimensions. Geo Zoo contains regular or well-known objects, e.g., cube and sphere, and some abstract objects, e.g., Boy’s surface, Torus and Hyper-Torus.

Here, we will generate a 3D torus (a torus is a surface of revolution generated by revolving a circle in three-dimensional space one full revolution about an axis that is coplanar with the circle) with 12000 points.
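The surface-of-revolution definition corresponds to the parametrization x = (R + r cos θ) cos φ, y = (R + r cos θ) sin φ, z = r sin θ. The base R sketch below samples points this way and checks them against the implicit torus equation. The radii R = 2 and r = 1 are assumptions, and this is not how geozoo generates its points.

```r
set.seed(124)
n <- 1000
R <- 2  # assumed major radius: distance from the axis to the centre of the tube
r <- 1  # assumed minor radius: radius of the revolving circle

theta <- runif(n, 0, 2 * pi)  # angle around the tube
phi   <- runif(n, 0, 2 * pi)  # angle of revolution about the z axis

torus_sketch <- data.frame(
  x = (R + r * cos(theta)) * cos(phi),
  y = (R + r * cos(theta)) * sin(phi),
  z = r * sin(theta)
)

# Every point must satisfy (sqrt(x^2 + y^2) - R)^2 + z^2 = r^2
residual <- (sqrt(torus_sketch$x^2 + torus_sketch$y^2) - R)^2 +
  torus_sketch$z^2 - r^2
max(abs(residual))  # zero up to floating point error
```

Sampling both angles uniformly does not give a uniform density on the surface, so this sketch is meant to show the geometry rather than to replace the generated dataset.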

The torus dataset includes the following columns:

Let’s explore the raw torus dataset containing 12000 points. For the sake of brevity, we display only the first 10 rows.

# Generate 12000 points on a 3D torus (seed fixed for reproducibility)
set.seed(124)
torus <- geozoo::torus(p = 3, n = 12000)
torus_df <- data.frame(torus$points)
colnames(torus_df) <- c("x", "y", "z")

# Round to 4 decimal places; this rounded copy is used for training below
torus_df1 <- torus_df %>% round(4)
displayTable(head(torus_df1, 10))
x y z
1.0055 0.5779 0.5422
-1.1971 -0.1153 0.6035
0.2963 1.7116 0.9648
-0.8651 -0.5048 0.0571
1.6057 -0.8437 0.9825
0.3565 -2.5977 -0.7830
0.1319 -2.5860 -0.8079
-2.4760 1.5867 0.3388
-1.7364 -0.9281 -0.9995
2.2525 -1.9531 0.1922

Now, let’s try to visualize the torus dataset in 3D.

plot_ly(x = torus_df1$x, y = torus_df1$y, z = torus_df1$z, type = 'scatter3d',mode = 'markers',
marker = list(color = torus_df1$z,colorscale = c('#F50000', '#000FFF'),showscale = TRUE,size = 3,colorbar = list(title = 'z'))) %>%
layout(scene = list(xaxis = list(title = 'x'),yaxis = list(title = 'y'),zaxis = list(title = 'z'),
aspectratio = list(x = 1, y = 1, z = 0.5)))

Figure 1: 3D Torus

7. Model Training and Visualization

The core function for compression in the workflow is hvq (hierarchical vector quantization), which is called within the trainHVT function. A parameter called ‘quantization error’ acts as a threshold and determines the number of levels in the hierarchy: if the hierarchy has ‘n’ levels, then all the clusters formed up to that level have a quantization error equal to or less than the threshold. The user defines the number of clusters in the first level of the hierarchy, and each cluster in the subsequent levels is subdivided into the same number of clusters. This process continues until the threshold quantization error is met for all the clusters. The output of this technique is hierarchically arranged vector quantized data.
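The recursive splitting described above can be sketched in base R as follows. This is a simplified illustration, not the package's hvq function: `hvq_sketch` is a hypothetical name, plain k-means is used at every level, and the quantization error of a cell is assumed to be the mean Euclidean distance of its points to the centroid.

```r
# Split the data into n_cells clusters; split a cluster again while its
# quantization error exceeds the threshold and the depth limit allows it
hvq_sketch <- function(X, n_cells, depth, quant_err, level = 1) {
  km <- kmeans(X, centers = n_cells, nstart = 3, iter.max = 50)
  cells <- list()
  for (j in seq_len(n_cells)) {
    members <- X[km$cluster == j, , drop = FALSE]
    centroid <- km$centers[j, ]
    # Assumed error definition: mean distance of the members to the centroid
    qe <- mean(sqrt(rowSums(sweep(members, 2, centroid)^2)))
    can_split <- level < depth && qe > quant_err && nrow(members) > n_cells
    if (can_split) {
      # Threshold not met: subdivide this cell into n_cells children
      cells <- c(cells, hvq_sketch(members, n_cells, depth, quant_err, level + 1))
    } else {
      cells[[length(cells) + 1]] <- list(level = level, n = nrow(members),
                                         quant_err = qe)
    }
  }
  cells
}

set.seed(10)
X <- matrix(runif(2000 * 3), ncol = 3)
leaves <- hvq_sketch(X, n_cells = 5, depth = 2, quant_err = 0.1)

leaf_levels <- vapply(leaves, function(cell) cell$level, numeric(1))
leaf_sizes  <- vapply(leaves, function(cell) cell$n, numeric(1))
table(leaf_levels)  # number of final cells per hierarchy level
```

Each point ends up in exactly one final cell, and a cell stops splitting as soon as it meets the threshold or reaches the maximum depth.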

7.1 Understanding trainHVT() Function: Parameters and Hyperparameters for Dimensionality Reduction Methods

The trainHVT() function is used to train a hierarchical Voronoi tessellation (HVT) model. When integrating dimensionality reduction techniques such as t-SNE, UMAP, and Sammon’s projection, it’s essential to understand both the parameters of the trainHVT() function and the specific hyperparameters for each dimensionality reduction method.

trainHVT(
  dataset, min_compression_perc, n_cells,
  depth, quant.err,normalize,
  distance_metric,error_metric,quant_method,
  scale_summary, diagnose, hvt_validation,
  train_validation_split_ratio, projection.scale,
  dim_reduction_method,
  tsne_perplexity,tsne_theta,tsne_verbose,
  tsne_eta,tsne_max_iter,
  umap_n_neighbors,umap_min_dist
)

Each of the parameters of trainHVT function has been explained below:

Hyperparameters for different dimensionality reduction methods:

The trainHVT() function allows for fine-tuning the t-SNE hyperparameters tsne_perplexity, tsne_theta, tsne_verbose, tsne_eta (the learning rate), and tsne_max_iter (the number of iterations), which are managed using the Rtsne library. Adjusting these parameters can tune the balance between local and global data structure, the trade-off between speed and accuracy, and the convergence of the dimensionality reduction.
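For orientation, here is how the corresponding arguments look in a direct call to `Rtsne::Rtsne`. The one-to-one mapping from the tsne_* parameters of trainHVT to these arguments is an assumption based on their names, and the random matrix is only a stand-in for 20 cell centroids.

```r
# Runs only if the Rtsne package is installed
if (requireNamespace("Rtsne", quietly = TRUE)) {
  set.seed(124)
  centroids <- matrix(rnorm(20 * 3), ncol = 3)  # stand-in for 20 cell centroids

  tsne_fit <- Rtsne::Rtsne(
    centroids,
    dims = 2,
    perplexity = 6,    # assumed counterpart of tsne_perplexity
    theta = 0.5,       # assumed counterpart of tsne_theta
    verbose = FALSE,   # assumed counterpart of tsne_verbose
    eta = 200,         # assumed counterpart of tsne_eta (learning rate)
    max_iter = 1000,   # assumed counterpart of tsne_max_iter
    check_duplicates = FALSE
  )

  dim(tsne_fit$Y)  # one 2D coordinate pair per centroid
}
```

Rtsne rejects a perplexity that is too large for the number of rows, so with only 20 cells the perplexity has to stay small. A value of 6 is accepted here.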

The trainHVT() function exposes the UMAP hyperparameters umap_n_neighbors and umap_min_dist through the uwot library. Fine-tuning these parameters helps control neighborhood size, spacing in the reduced space, and overall structure preservation in dimensionality reduction.
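A direct call to `uwot::umap` with the same two hyperparameters looks like this. That trainHVT forwards umap_n_neighbors and umap_min_dist unchanged is an assumption, and the random matrix again stands in for 20 cell centroids.

```r
# Runs only if the uwot package is installed
if (requireNamespace("uwot", quietly = TRUE)) {
  set.seed(124)
  centroids <- matrix(rnorm(20 * 3), ncol = 3)  # stand-in for 20 cell centroids

  umap_emb <- uwot::umap(
    centroids,
    n_neighbors = 6,   # assumed counterpart of umap_n_neighbors
    min_dist = 0.2,    # assumed counterpart of umap_min_dist
    n_components = 2
  )

  dim(umap_emb)  # one 2D coordinate pair per centroid
}
```

Smaller n_neighbors values emphasize local structure, while larger min_dist values spread the embedded points further apart.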

7.2 Guidance on choosing the dimensionality reduction method

The output of the trainHVT function (a list of 7 elements) is explained below, with an image attached for clarity.


NOTE: Figure 2 is an example snapshot of the output list generated by trainHVT.

Figure 2: The Output list generated by trainHVT function.


We will use the trainHVT function to compress our data while preserving the essential features of the dataset. Our goal is to achieve data compression of at least 80%. If the compression ratio does not meet this target, we can adjust the model parameters, for example by changing the quantization error threshold or increasing the number of cells, and then rerun the trainHVT function.
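The adjust-and-rerun loop can be sketched as follows. The sketch assumes that the compression percentage is the share of cells whose quantization error is at or below the threshold, and it uses flat k-means on synthetic data instead of trainHVT. `compression_perc` is a hypothetical helper, not an HVT function.

```r
# Share (in %) of cells whose mean distance to the centroid meets the threshold
compression_perc <- function(X, n_cells, quant_err) {
  km <- kmeans(X, centers = n_cells, nstart = 3, iter.max = 100)
  d <- sqrt(rowSums((X - km$centers[km$cluster, , drop = FALSE])^2))
  cell_qe <- tapply(d, km$cluster, mean)
  100 * mean(cell_qe <= quant_err)
}

set.seed(21)
X <- matrix(runif(3000 * 3), ncol = 3)

target <- 80   # desired compression percentage
n_cells <- 20  # starting number of cells

# Increase the number of cells until the target is met (or a cap is reached)
repeat {
  perc <- compression_perc(X, n_cells, quant_err = 0.1)
  if (perc >= target || n_cells >= 500) break
  n_cells <- n_cells + 20
}

c(n_cells = n_cells, compression_perc = perc)
```

Raising the quantization error threshold instead of the number of cells would have a similar effect, at the cost of coarser cells.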

7.2 Training and Visualization of t-SNE

t-SNE is a powerful technique for visualizing high-dimensional data by reducing it to two or three dimensions while preserving local structures. By performing and plotting t-SNE, intricate patterns and relationships within the data can be effectively explored and interpreted.

7.2.1 Performing t-SNE

Here, we will use t-SNE as the dimensionality reduction technique in the trainHVT function with n_cells=20.

We pass the model parameters listed below to the trainHVT function, with depth set to 1, 2, and 3 in turn.

Model Parameters

  • dataset = torus_df1,
  • n_cells = 20,
  • quant.err = 0.1,
  • normalize = TRUE,
  • distance_metric = “L2_Norm”,
  • error_metric = “mean”,
  • quant_method = “kmeans”,
  • dim_reduction_method = “tsne”,
  • tsne_perplexity = 6,
  • tsne_theta = 0.5,
  • tsne_verbose = TRUE,
  • tsne_eta = 200,
  • tsne_max_iter = 1000

Performing t-SNE on the torus data using trainHVT function with depth=1

# Apply trainHVT to the simulated data with dim_reduction_method="tsne" and depth=1
hvt_results_tsne1 <- trainHVT(
  dataset = torus_df1,
  n_cells = 20,
  depth = 1,
  quant.err = 0.1,
  normalize = TRUE,
  distance_metric = "L2_Norm",
  error_metric = "mean",
  quant_method = "kmeans",
  dim_reduction_method = "tsne",
  tsne_perplexity = 6,
  tsne_theta = 0.5,
  tsne_verbose = TRUE,
  tsne_eta = 200,
  tsne_max_iter = 1000
)

Performing t-SNE on the torus data using trainHVT function with depth=2

# Apply trainHVT to the simulated data with dim_reduction_method="tsne" and depth=2
hvt_results_tsne2 <- trainHVT(
  dataset = torus_df1,
  n_cells = 20,         
  depth = 2,                
  quant.err = 0.1,           
  normalize = TRUE,          
  distance_metric = "L2_Norm",   
  error_metric = "mean",    
  quant_method = "kmeans",   
  dim_reduction_method = "tsne",
  tsne_perplexity = 6,
  tsne_theta = 0.5,
  tsne_verbose = TRUE,
  tsne_eta = 200,
  tsne_max_iter = 1000
)

Performing t-SNE on the torus data using trainHVT function with depth=3

# Apply trainHVT to the simulated data with dim_reduction_method="tsne" and depth=3
hvt_results_tsne3 <- trainHVT(
  dataset = torus_df1,
  n_cells = 20,         
  depth = 3,                
  quant.err = 0.1,           
  normalize = TRUE,          
  distance_metric = "L2_Norm",   
  error_metric = "mean",    
  quant_method = "kmeans",   
  dim_reduction_method = "tsne",
  tsne_perplexity = 6,
  tsne_theta = 0.5,
  tsne_verbose = TRUE,
  tsne_eta = 200,
  tsne_max_iter = 1000
)

7.2.2 Plotting the outcome of trainHVT function with dim_reduction_method=“tsne” using plotHVT

Now let’s plot each feature for the cells at levels 1, 2, and 3 as heatmaps for better visualization.

The heatmaps below provide a visual representation of the spatial characteristics of the torus dataset, allowing us to observe patterns and trends in the distribution of each feature (x, y, z). In each heatmap, green shades highlight regions with higher values, while indigo shades indicate areas with the lowest values. By analyzing these heatmaps, we can gain insights into the variations and relationships between the features of the torus dataset.

tSNE_depth1

Here, we have drawn the 2D Heatmap with respect to “x”, “y” and “z” columns.

tSNE_depth1_x = plotHVT(hvt_results_tsne1,
        line.width = c(0.2) ,
        color.vec = c("navyblue"),
        plot.type = "2Dheatmap", 
        child.level = 1, 
        hmap.cols = 'x',
        centroid.color = c("navyblue"),
        cell_id = TRUE,
        title = "2D projection with t-SNE as dim_reduction_method, depth=1 and hmap.cols='x'")

tSNE_depth1_y = plotHVT(hvt_results_tsne1,
        line.width = c(0.2) ,
        color.vec = c("navyblue"),
        centroid.color = c("navyblue"),
        plot.type = "2Dheatmap", 
        child.level = 1,
        hmap.cols = 'y',
        cell_id = TRUE,
        title = "2D projection with t-SNE as dim_reduction_method, depth=1 and hmap.cols='y'")

tSNE_depth1_z = plotHVT(hvt_results_tsne1,
        line.width = c(0.2) ,
        color.vec = c("navyblue"),centroid.color = c("navyblue"),
        plot.type = "2Dheatmap", 
        child.level = 1,
        hmap.cols = 'z',
        cell_id = TRUE,
        title = "2D projection with t-SNE as dim_reduction_method, depth=1 and hmap.cols='z'")

tSNE_depth1 = grid.arrange(tSNE_depth1_x, tSNE_depth1_y, tSNE_depth1_z, ncol=3)
Figure 3: 2D projection with tsne as dim_reduction_method and depth=1


tSNE_depth2

Here, we have drawn the 2D Heatmap with respect to “x”, “y” and “z” columns.

tSNE_depth2_x =plotHVT(hvt_results_tsne2,
        line.width = c(0.2, 0.1) ,
        color.vec = c("navyblue","steelblue"),
        plot.type = "2Dheatmap",
        centroid.color = c("navyblue","steelblue"), 
        centroid.size = c(0.2, 0.1) ,
        child.level = 2, 
        hmap.cols = 'x',
        n_cells.hmap = 10,
        title = "2D projection with tsne as dim_reduction_method, depth=2 and hmap.cols='x'")

tSNE_depth2_y =plotHVT(hvt_results_tsne2,
        line.width = c(0.2, 0.1) ,
        color.vec = c("navyblue","steelblue"),
        plot.type = "2Dheatmap",
        centroid.color = c("navyblue","steelblue"), 
        child.level = 2,
        centroid.size = c(0.2, 0.1) ,
        hmap.cols = 'y',
        n_cells.hmap = 10,
        title = "2D projection with tsne as dim_reduction_method, depth=2 and hmap.cols='y'")

tSNE_depth2_z =plotHVT(hvt_results_tsne2,
        line.width = c(0.2, 0.1) ,
        color.vec = c("navyblue","steelblue"),
        plot.type = "2Dheatmap", 
        centroid.color = c("navyblue","steelblue"), 
        child.level = 2,
        centroid.size = c(0.2, 0.1) ,
        hmap.cols = 'z',
        n_cells.hmap = 10,
        title = "2D projection with tsne as dim_reduction_method, depth=2 and hmap.cols='z'")

tSNE_depth2 = grid.arrange(tSNE_depth2_x, tSNE_depth2_y, tSNE_depth2_z, ncol=3)
Figure 4: 2D projection with tsne as dim_reduction_method, depth=2


tSNE_depth3

Here, we have drawn the 2D Heatmap with respect to “x”, “y” and “z” columns.

tSNE_depth3_x =plotHVT(hvt_results_tsne3,
        line.width = c(0.3, 0.2, 0.1),
        color.vec = c("#0047ab","navyblue","steelblue"),
        plot.type = "2Dheatmap", 
        centroid.color = c("#0047ab","navyblue","steelblue"),
        child.level= 3,
        centroid.size = c(0.3, 0.2, 0.1),
        hmap.cols = "x",
        n_cells.hmap = 10,
        title = "2D projection with tsne as dim_reduction_method, depth=3 and hmap.cols='x'")

tSNE_depth3_y =plotHVT(hvt_results_tsne3,
        line.width = c(0.3, 0.2, 0.1),
        color.vec = c("#0047ab","navyblue","steelblue"),
        plot.type = "2Dheatmap", 
        centroid.color = c("#0047ab","navyblue","steelblue"),
        child.level= 3,
        centroid.size = c(0.3, 0.2, 0.1),
        hmap.cols = "y",
        n_cells.hmap = 10,
        title = "2D projection with tsne as dim_reduction_method, depth=3 and hmap.cols='y'")

tSNE_depth3_z =plotHVT(hvt_results_tsne3,
        line.width = c(0.3, 0.2, 0.1),
        color.vec = c("#0047ab","navyblue","steelblue"),
        plot.type = "2Dheatmap", 
        centroid.color = c("#0047ab","navyblue","steelblue"),
        child.level= 3,
        centroid.size = c(0.3, 0.2, 0.1),
        hmap.cols = "z",
        n_cells.hmap = 10 ,
        title = "2D projection with tsne as dim_reduction_method, depth=3 and hmap.cols='z'")

tSNE_depth3 = grid.arrange(tSNE_depth3_x, tSNE_depth3_y, tSNE_depth3_z, ncol = 3)
Figure 5: 2D projection with tsne as dim_reduction_method, depth=3


7.3 Training and Visualization of UMAP

UMAP is a powerful technique for visualizing high-dimensional data by reducing it to two or three dimensions while preserving both global and local structures. By performing and plotting UMAP, intricate patterns and relationships within the data can be effectively explored and interpreted.

7.3.1 Performing UMAP

Here, we will use UMAP as the dimensionality reduction technique in the trainHVT function with n_cells=20.

We pass the model parameters listed below to the trainHVT function, with depth set to 1, 2, and 3 in turn.

Model Parameters

  • dataset = torus_df1,
  • n_cells = 20,
  • quant.err = 0.1,
  • normalize = TRUE,
  • distance_metric = “L2_Norm”,
  • error_metric = “mean”,
  • quant_method = “kmeans”,
  • dim_reduction_method = “umap”,
  • umap_n_neighbors = 6,
  • umap_min_dist = 0.2

Performing UMAP on the torus data using trainHVT function with depth=1

# Apply trainHVT to the simulated data with dim_reduction_method="umap" and depth=1
hvt_results_umap1 <- trainHVT(
  dataset = torus_df1,
  n_cells = 20,          
  depth = 1,                
  quant.err = 0.1,           
  normalize = TRUE,       
  distance_metric = "L2_Norm",
  error_metric = "mean",      
  quant_method = "kmeans",   
  dim_reduction_method = "umap",
  umap_n_neighbors = 6,
  umap_min_dist = 0.2
)

Performing UMAP on the torus data using trainHVT function with depth=2

# Apply trainHVT to the simulated data with dim_reduction_method="umap" and depth=2
hvt_results_umap2 <- trainHVT(
  dataset = torus_df1,
  n_cells = 20,          
  depth = 2,                
  quant.err = 0.1,           
  normalize = TRUE,       
  distance_metric = "L2_Norm",
  error_metric = "mean",      
  quant_method = "kmeans",   
  dim_reduction_method = "umap",
  umap_n_neighbors = 6,
  umap_min_dist = 0.2
)

Performing UMAP on the torus data using trainHVT function with depth=3

# Apply trainHVT to the simulated data with dim_reduction_method="umap" and depth=3
hvt_results_umap3 <- trainHVT(
  dataset = torus_df1,
  n_cells = 20,          
  depth = 3,                
  quant.err = 0.1,           
  normalize = TRUE,       
  distance_metric = "L2_Norm",
  error_metric = "mean",      
  quant_method = "kmeans",   
  dim_reduction_method = "umap",
  umap_n_neighbors = 6,
  umap_min_dist = 0.2
)

7.3.2 Plotting the outcome of trainHVT function with dim_reduction_method=“umap” using plotHVT

Now let’s plot each feature for the cells at levels 1, 2, and 3 as heatmaps for better visualization.

The heatmaps below provide a visual representation of the spatial characteristics of the torus dataset, allowing us to observe patterns and trends in the distribution of each feature (x, y, z). In each heatmap, green shades highlight regions with higher values, while indigo shades indicate areas with the lowest values. By analyzing these heatmaps, we can gain insights into the variations and relationships between the features of the torus dataset.

UMAP_depth1

Here, we have drawn the 2D Heatmap with respect to “x”, “y” and “z” columns.

UMAP_depth1_x=plotHVT(hvt_results_umap1,
        line.width = c(0.2) ,
        color.vec = c("navyblue"),
        plot.type = "2Dheatmap", 
        centroid.color = c("navyblue"),
        child.level = 1,
        hmap.cols = 'x',
        cell_id = TRUE,
        title = "2D projection with UMAP as dim_reduction_method, depth=1 and hmap.cols='x'")

UMAP_depth1_y=plotHVT(hvt_results_umap1,
        line.width = c(0.2) ,
        color.vec = c("navyblue"),
        plot.type = "2Dheatmap", 
        centroid.color = c("navyblue"),
        child.level = 1,
        hmap.cols = 'y',
        cell_id = TRUE,
        title = "2D projection with UMAP as dim_reduction_method, depth=1 and hmap.cols='y'")

UMAP_depth1_z=plotHVT(hvt_results_umap1,
        line.width = c(0.2) ,
        color.vec = c("navyblue"),
        plot.type = "2Dheatmap", 
        centroid.color = c("navyblue"),
        child.level = 1,
        hmap.cols = 'z',
        cell_id = TRUE,
        title = "2D projection with UMAP as dim_reduction_method, depth=1 and hmap.cols='z'")
UMAP_depth1 = grid.arrange(UMAP_depth1_x, UMAP_depth1_y, UMAP_depth1_z, ncol = 3)
Figure 6: 2D projection with UMAP as dim_reduction_method, depth=1


UMAP_depth2

Here, we have drawn the 2D Heatmap with respect to “x”, “y” and “z” columns.

UMAP_depth2_x=plotHVT(hvt_results_umap2,
        line.width = c(0.2, 0.1) ,
        color.vec = c("navyblue","steelblue"),
        plot.type = "2Dheatmap", 
        centroid.color = c("navyblue","steelblue"),
        child.level = 2,
        centroid.size = c(0.2, 0.1) ,
        hmap.cols = 'x',
        n_cells.hmap = 10,
        title = "2D projection with UMAP as dim_reduction_method, depth=2 and hmap.cols='x'")

UMAP_depth2_y=plotHVT(hvt_results_umap2,
        line.width = c(0.2, 0.1) ,
        color.vec = c("navyblue","steelblue"),
        plot.type = "2Dheatmap", 
        centroid.color = c("navyblue","steelblue"),
        child.level = 2,
        centroid.size = c(0.2, 0.1) ,
        hmap.cols = 'y',
        n_cells.hmap = 10,
        title = "2D projection with UMAP as dim_reduction_method, depth=2 and hmap.cols='y'")

UMAP_depth2_z=plotHVT(hvt_results_umap2,
        line.width = c(0.2, 0.1) ,
        color.vec = c("navyblue","steelblue"),
        plot.type = "2Dheatmap", 
        centroid.color = c("navyblue","steelblue"),
        child.level = 2,
        centroid.size = c(0.2, 0.1) ,
        hmap.cols = 'z',
        n_cells.hmap = 10,
        title = "2D projection with UMAP as dim_reduction_method, depth=2 and hmap.cols='z'")

UMAP_depth2 = grid.arrange(UMAP_depth2_x, UMAP_depth2_y, UMAP_depth2_z, ncol= 3)
Figure 7: 2D projection with UMAP as dim_reduction_method, depth=2


UMAP_depth3

Here, we have drawn the 2D Heatmap with respect to “x”, “y” and “z” columns.

UMAP_depth3_x=plotHVT(hvt_results_umap3,
        line.width = c(0.3, 0.2,0.1) ,
        color.vec = c("#0047ab","navyblue","steelblue"),
        plot.type = "2Dheatmap",
        centroid.color = c("#0047ab","navyblue","steelblue"),
        child.level = 3,
        n_cells.hmap = 10,
        centroid.size = c(0.3, 0.2, 0.1),
        hmap.cols = "x",
        title = "2D projection with UMAP as dim_reduction_method, depth=3 and hmap.cols='x'")

UMAP_depth3_y=plotHVT(hvt_results_umap3,
        line.width = c(0.3, 0.2,0.1) ,
        color.vec = c("#0047ab","navyblue","steelblue"),
        plot.type = "2Dheatmap",
        centroid.color = c("#0047ab","navyblue","steelblue"),
        child.level = 3,
        n_cells.hmap = 10,
        centroid.size = c(0.3, 0.2, 0.1),
        hmap.cols = "y",
        title = "2D projection with UMAP as dim_reduction_method, depth=3 and hmap.cols='y'")

UMAP_depth3_z=plotHVT(hvt_results_umap3,
        line.width = c(0.3, 0.2,0.1) ,
        color.vec = c("#0047ab","navyblue","steelblue"),
        plot.type = "2Dheatmap",
        centroid.color = c("#0047ab","navyblue","steelblue"),
        child.level = 3,
        n_cells.hmap = 10,
        centroid.size = c(0.3, 0.2, 0.1),
        hmap.cols = "z",
        title = "2D projection with UMAP as dim_reduction_method, depth=3 and hmap.cols='z'")
UMAP_depth3 = grid.arrange(UMAP_depth3_x, UMAP_depth3_y, UMAP_depth3_z, ncol= 3)
Figure 8: 2D projection with UMAP as dim_reduction_method, depth=3


7.4 Training and Visualization of Sammon’s projection

Sammon’s mapping is a powerful technique for visualizing high-dimensional data by reducing it to two or three dimensions while preserving the structure of the data as much as possible. By performing and plotting Sammon’s mapping, intricate patterns and relationships within the data can be effectively explored and interpreted.

7.4.1 Performing Sammon’s Mapping

Here, we will use Sammon’s mapping as the dimensionality reduction technique in the trainHVT function with n_cells=20.

We pass the model parameters listed below to the trainHVT function, with depth set to 1, 2, and 3 in turn.

Model Parameters

  • dataset = torus_df1,
  • n_cells = 20,
  • quant.err = 0.1,
  • normalize = TRUE,
  • distance_metric = “L2_Norm”,
  • error_metric = “mean”,
  • quant_method = “kmeans”,
  • dim_reduction_method = “sammon”

Performing Sammon’s projection on the torus data using trainHVT function with depth=1

# Apply trainHVT to the simulated data with dim_reduction_method="sammon" and depth=1
hvt_results_sammon1 <- trainHVT(
  dataset = torus_df1,
  n_cells = 20,         
  depth = 1,                
  quant.err = 0.1,           
  normalize = TRUE,          
  distance_metric = "L2_Norm",   
  error_metric = "mean",    
  quant_method = "kmeans",   
  dim_reduction_method = "sammon" 
)

Performing Sammon’s projection on the torus data using trainHVT function with depth=2

# Apply trainHVT to the simulated data with dim_reduction_method="sammon" and depth=2
hvt_results_sammon2 <- trainHVT(
  dataset = torus_df1,
  n_cells = 20,         
  depth = 2,                
  quant.err = 0.1,           
  normalize = TRUE,          
  distance_metric = "L2_Norm",   
  error_metric = "mean",    
  quant_method = "kmeans",   
  dim_reduction_method = "sammon" 
)

Performing Sammon’s projection on the torus data using trainHVT function with depth=3

# Apply trainHVT to the simulated data with dim_reduction_method="sammon" and depth=3
hvt_results_sammon3 <- trainHVT(
  dataset = torus_df1,
  n_cells = 20,         
  depth = 3,                
  quant.err = 0.1,           
  normalize = TRUE,          
  distance_metric = "L2_Norm",   
  error_metric = "mean",    
  quant_method = "kmeans",   
  dim_reduction_method = "sammon" 
)

7.4.2 Plotting the outcome of trainHVT function with dim_reduction_method=“sammon” using plotHVT

Now let’s plot each feature for the cells at levels 1, 2, and 3 as heatmaps for better visualization.

The heatmaps below provide a visual representation of the spatial characteristics of the torus dataset, allowing us to observe patterns and trends in the distribution of each feature (x, y, z). In each heatmap, green shades highlight regions with higher values, while indigo shades indicate areas with the lowest values. By analyzing these heatmaps, we can gain insights into the variations and relationships between the features of the torus dataset.

Sammon_depth1

Here, we have drawn the 2D Heatmap with respect to “x”, “y” and “z” columns.

Sammon_depth1_x = plotHVT(hvt_results_sammon1,
        line.width = c(0.2) ,
        color.vec = c("navyblue"),
        centroid.color  = c("navyblue"),
        plot.type = "2Dheatmap", child.level = 1, hmap.cols = 'x',cell_id = TRUE,
        title = "2D projection with Sammon as dim_reduction_method with depth=1 and hmap.cols='x'")

Sammon_depth1_y = plotHVT(hvt_results_sammon1,
        line.width = c(0.2) ,
        color.vec = c("navyblue"),
        centroid.color  = c("navyblue"),
        plot.type = "2Dheatmap", child.level = 1, hmap.cols = 'y',cell_id = TRUE,
        title = "2D projection with Sammon as dim_reduction_method with depth=1 and hmap.cols='y'")

Sammon_depth1_z = plotHVT(hvt_results_sammon1,
        line.width = c(0.2) ,
        color.vec = c("navyblue"),
        centroid.color  = c("navyblue"),
        plot.type = "2Dheatmap", child.level = 1, hmap.cols = 'z',cell_id = TRUE,
        title = "2D projection with Sammon as dim_reduction_method with depth=1 and hmap.cols='z'")

Sammon_depth1 = grid.arrange(Sammon_depth1_x, Sammon_depth1_y, Sammon_depth1_z, ncol=3)
Figure 9: 2D projection with Sammon as dim_reduction_method, depth=1


Sammon_depth2

Here, we have drawn the 2D Heatmap with respect to “x”, “y” and “z” columns.

Sammon_depth2_x = plotHVT(hvt_results_sammon2,
        line.width = c(0.2, 0.1) ,
        color.vec = c("navyblue","steelblue"),
        plot.type = "2Dheatmap", 
        centroid.color = c("navyblue","steelblue"),
        child.level = 2, 
        centroid.size = c(0.2, 0.1) ,
        hmap.cols = 'x',
        title = "2D projection with Sammon as dim_reduction_method with depth=2 and hmap.cols='x'")

Sammon_depth2_y = plotHVT(hvt_results_sammon2,
        line.width = c(0.2, 0.1) ,
        color.vec = c("navyblue","steelblue"),
        plot.type = "2Dheatmap",
        centroid.color = c("navyblue","steelblue"),
        child.level = 2, 
        centroid.size = c(0.2, 0.1) ,
        hmap.cols = 'y',
        title = "2D projection with Sammon as dim_reduction_method with depth=2 and hmap.cols='y'")

Sammon_depth2_z = plotHVT(hvt_results_sammon2,
        line.width = c(0.2, 0.1) ,
        color.vec = c("navyblue","steelblue"),
        plot.type = "2Dheatmap", 
        centroid.color = c("navyblue","steelblue"),
        child.level = 2, 
        centroid.size = c(0.2, 0.1) ,
        hmap.cols = 'z',
        title = "2D projection with Sammon as dim_reduction_method with depth=2 and hmap.cols='z'")
Sammon_depth2 = grid.arrange(Sammon_depth2_x, Sammon_depth2_y, Sammon_depth2_z, ncol=3)
Figure 10: 2D projection with Sammon as dim_reduction_method, depth=2


Sammon_depth3

Here, we have drawn the 2D Heatmap with respect to “x”, “y” and “z” columns.

Sammon_depth3_x = plotHVT(hvt_results_sammon3,
        line.width = c(0.3, 0.2,0.1) ,
        color.vec = c("#0047ab","navyblue","steelblue"),
        plot.type = "2Dheatmap", 
        centroid.color = c("#0047ab","navyblue","steelblue"),
        child.level = 3, 
        hmap.cols = "x",
        centroid.size = c(0.3, 0.2, 0.1),
        n_cells.hmap = 20,
        title = "2D projection with Sammon as dim_reduction_method with depth=3 and hmap.cols='x'")

Sammon_depth3_y = plotHVT(hvt_results_sammon3,
        line.width = c(0.3, 0.2,0.1) ,
        color.vec = c("#0047ab","navyblue","steelblue"),
        plot.type = "2Dheatmap", 
        centroid.color = c("#0047ab","navyblue","steelblue"),
        child.level = 3, 
        hmap.cols = "y",
        centroid.size = c(0.3, 0.2, 0.1),
        n_cells.hmap = 20,
        title = "2D projection with Sammon as dim_reduction_method with depth=3 and hmap.cols='y'")

Sammon_depth3_z = plotHVT(hvt_results_sammon3,
        line.width = c(0.3, 0.2,0.1) ,
        color.vec = c("#0047ab","navyblue","steelblue"),
        plot.type = "2Dheatmap", 
        centroid.color = c("#0047ab","navyblue","steelblue"),
        child.level = 3,
        centroid.size = c(0.3, 0.2, 0.1),
        hmap.cols = "z",
        n_cells.hmap = 20,
        title = "2D projection with Sammon as dim_reduction_method with depth=3 and hmap.cols='z'")

Sammon_depth3 = grid.arrange(Sammon_depth3_x, Sammon_depth3_y, Sammon_depth3_z, ncol =3)
Figure 11: 2D projection with Sammon as dim_reduction_method, depth=3

7.5 Visual Comparison of t-SNE, UMAP and Sammon’s Projection on Torus Dataset in trainHVT with 20 cells

When dimensionality reduction is applied in the trainHVT function to the torus dataset, a visual comparison of the three methods (t-SNE, UMAP, and Sammon’s) gives useful insight into their performance. With n_cells set to 20, the methods are compared on how well the topological and geometric features are preserved in the 2D projection. The 2D Heatmaps below are drawn with respect to the “x”, “y” and “z” columns of the torus dataset at depths 1, 2 and 3.

depth=1

Here, we have drawn the 2D Heatmap with respect to “x” column.

grid.arrange(tSNE_depth1_x, UMAP_depth1_x, Sammon_depth1_x, ncol=3)
Figure 12: From left to right tSNE_depth1, UMAP_depth1, Sammon_depth1

Here, we have drawn the 2D Heatmap with respect to “y” column.

grid.arrange(tSNE_depth1_y, UMAP_depth1_y, Sammon_depth1_y, ncol=3)
Figure 13: From left to right tSNE_depth1, UMAP_depth1, Sammon_depth1

Here, we have drawn the 2D Heatmap with respect to “z” column.

grid.arrange(tSNE_depth1_z, UMAP_depth1_z, Sammon_depth1_z, ncol=3)
Figure 14: From left to right tSNE_depth1, UMAP_depth1, Sammon_depth1

depth=2

Here, we have drawn the 2D Heatmap with respect to “x” column.

grid.arrange(tSNE_depth2_x, UMAP_depth2_x, Sammon_depth2_x, ncol=3)
Figure 15: From left to right tSNE_depth2, UMAP_depth2, Sammon_depth2

Here, we have drawn the 2D Heatmap with respect to “y” column.


grid.arrange(tSNE_depth2_y, UMAP_depth2_y, Sammon_depth2_y, ncol=3)
Figure 16: From left to right tSNE_depth2, UMAP_depth2, Sammon_depth2

Here, we have drawn the 2D Heatmap with respect to “z” column.

grid.arrange(tSNE_depth2_z, UMAP_depth2_z, Sammon_depth2_z, ncol=3)
Figure 17: From left to right tSNE_depth2, UMAP_depth2, Sammon_depth2

depth=3

Here, we have drawn the 2D Heatmap with respect to “x” column.

grid.arrange(tSNE_depth3_x, UMAP_depth3_x, Sammon_depth3_x, ncol=3)
Figure 18: From left to right tSNE_depth3, UMAP_depth3, Sammon_depth3

Here, we have drawn the 2D Heatmap with respect to “y” column.

grid.arrange(tSNE_depth3_y, UMAP_depth3_y, Sammon_depth3_y, ncol=3)
Figure 19: From left to right tSNE_depth3, UMAP_depth3, Sammon_depth3

Here, we have drawn the 2D Heatmap with respect to “z” column.

grid.arrange(tSNE_depth3_z, UMAP_depth3_z, Sammon_depth3_z, ncol=3)
Figure 20: From left to right tSNE_depth3, UMAP_depth3, Sammon_depth3

8. Evaluation of t-SNE, UMAP and Sammon’s projection

For evaluation, we set n_cells to 100 and depth to 1. With more cells, trainHVT captures the torus in finer detail, allowing a more comprehensive visual and analytical comparison of how well each technique retains the torus’s geometric properties in the reduced-dimensional space.

8.1 Training

Performing t-SNE, UMAP and Sammon

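We passed the following method-specific model parameters to the trainHVT function (values as set in the calls below).

Model Parameters

for dim_reduction_method = “tsne”: tsne_perplexity = 30, tsne_theta = 0.5, tsne_verbose = TRUE, tsne_eta = 200, tsne_max_iter = 1000

for dim_reduction_method = “umap”: umap_n_neighbors = 23, umap_min_dist = 0.2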

# Apply trainHVT to the simulated data with dim_reduction_method="tsne", depth=1 and n_cells=100
set.seed(123)
hvt_results_tsne <- trainHVT(
  dataset = torus_df1,
  n_cells = 100,               
  depth = 1,                 
  quant.err = 0.1,           
  normalize = TRUE,         
  distance_metric = "L2_Norm",  
  error_metric = "mean",      
  quant_method = "kmeans",   
  dim_reduction_method = "tsne", 
  tsne_perplexity = 30,
  tsne_theta = 0.5,
  tsne_verbose = TRUE,
  tsne_eta = 200,
  tsne_max_iter = 1000
  )

Compression summary

displayTable(hvt_results_tsne[[3]][["compression_summary"]])
| segmentLevel | noOfCells | noOfCellsBelowQuantizationError | percentOfCellsBelowQuantizationErrorThreshold | parameters |
|---|---|---|---|---|
| 1 | 100 | 76 | 0.76 | n_cells: 100 quant.err: 0.1 distance_metric: L2_Norm error_metric: mean quant_method: kmeans |

# Apply trainHVT to the simulated data with dim_reduction_method="umap", depth=1 and n_cells=100
set.seed(123)
hvt_results_umap <- trainHVT(
  dataset = torus_df1,
  n_cells = 100,          
  depth = 1,                
  quant.err = 0.1,           
  normalize = TRUE,       
  distance_metric = "L2_Norm",
  error_metric = "mean",      
  quant_method = "kmeans",   
  dim_reduction_method = "umap",
  umap_n_neighbors = 23,
  umap_min_dist = 0.2)

Compression summary

displayTable(hvt_results_umap[[3]][["compression_summary"]])
| segmentLevel | noOfCells | noOfCellsBelowQuantizationError | percentOfCellsBelowQuantizationErrorThreshold | parameters |
|---|---|---|---|---|
| 1 | 100 | 76 | 0.76 | n_cells: 100 quant.err: 0.1 distance_metric: L2_Norm error_metric: mean quant_method: kmeans |

# Apply trainHVT to the simulated data with dim_reduction_method="sammon", depth=1 and n_cells=100
set.seed(123)
hvt_results_sammon <- trainHVT(
  dataset = torus_df1,
  n_cells = 100,         
  depth = 1,                
  quant.err = 0.1,           
  normalize = TRUE,          
  distance_metric = "L2_Norm",   
  error_metric = "mean",    
  quant_method = "kmeans",   
  dim_reduction_method = "sammon" 
)

Compression summary

displayTable(hvt_results_sammon[[3]][["compression_summary"]])
| segmentLevel | noOfCells | noOfCellsBelowQuantizationError | percentOfCellsBelowQuantizationErrorThreshold | parameters |
|---|---|---|---|---|
| 1 | 100 | 76 | 0.76 | n_cells: 100 quant.err: 0.1 distance_metric: L2_Norm error_metric: mean quant_method: kmeans |
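
All three compression summaries are identical because the quantization settings are the same in each call; only the projection method differs. The last numeric column is the share of cells whose quantization error is below quant.err (76 of 100 cells, giving 0.76). The sketch below reproduces that arithmetic with made-up per-cell errors. The vector cell_errors is hypothetical and is not read from the trainHVT output, and the strict comparison (<) is an assumption.

```r
# Hypothetical per-cell quantization errors; these are NOT taken from trainHVT.
set.seed(1)
cell_errors <- c(runif(76, min = 0.01, max = 0.09),   # 76 cells below the threshold
                 runif(24, min = 0.11, max = 0.30))   # 24 cells above it
quant_err <- 0.1                                      # same threshold as quant.err above

n_cells   <- length(cell_errors)
n_below   <- sum(cell_errors < quant_err)             # cells meeting the threshold (assumed strict)
pct_below <- n_below / n_cells                        # share of cells meeting the threshold

summary_row <- data.frame(
  noOfCells = n_cells,
  noOfCellsBelowQuantizationError = n_below,
  percentOfCellsBelowQuantizationErrorThreshold = pct_below
)
print(summary_row)
```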

8.2 Visual comparison of t-SNE, UMAP and Sammon in trainHVT function for Human Centered Metrics on Torus Data with 100 cells and depth 1

Now let’s plot all the features for each cell at level one as a Heatmap for better visualization.

The Heatmaps displayed below provide a visual representation of the spatial characteristics of the torus dataset, allowing us to observe patterns and trends in the distribution of each feature (x, y, z). In each Heatmap, green shades highlight regions with higher values, while indigo shades indicate regions with the lowest values. By analyzing these Heatmaps, we can gain insight into the variations and relationships between the features within the torus dataset.

The underlying structure of this data is a torus, a surface shaped like a doughnut. When properly reduced to two dimensions, the data should therefore resemble an annulus (the ring between two concentric circles).
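
To see why an annulus is the expected picture, the sketch below samples points from a torus and looks at them along the z-axis. The radii R_major = 2 and r_minor = 1 and the data frame name torus_demo are assumptions made for illustration; they are not necessarily the values used to build torus_df1.

```r
# Sample points on a torus; the radii are illustrative assumptions.
set.seed(123)
n <- 2000
R_major <- 2                     # distance from the torus centre to the centre of the tube
r_minor <- 1                     # radius of the tube
theta <- runif(n, 0, 2 * pi)     # angle around the tube
phi   <- runif(n, 0, 2 * pi)     # angle around the central axis

torus_demo <- data.frame(
  x = (R_major + r_minor * cos(theta)) * cos(phi),
  y = (R_major + r_minor * cos(theta)) * sin(phi),
  z = r_minor * sin(theta)
)

# Distance of each point from the z-axis. For a torus it always lies in
# [R_major - r_minor, R_major + r_minor], i.e. inside an annulus in the x-y plane.
radial <- sqrt(torus_demo$x^2 + torus_demo$y^2)
print(range(radial))

# Dropping z shows the ring with an empty hole in the middle.
plot(torus_demo$x, torus_demo$y, asp = 1, pch = 16, cex = 0.3,
     xlab = "x", ylab = "y", main = "Torus seen along the z-axis")
```

A projection that keeps the hole open and the ring unbroken is the kind of result the participants in the Likert evaluation were asked to look for.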

2DHeatmap of Column X
tsne_x = plotHVT(hvt_results_tsne,
        line.width = c(0.2) ,
        color.vec = c("navyblue"),
        plot.type = "2Dheatmap",
        centroid.color = c("navyblue"),
        child.level = 1,
        hmap.cols = 'x',
        cell_id = TRUE,
        title = "2D projection with t-SNE as dim_reduction_method, depth=1, hmap.cols='x' and n_cells=100")
umap_x = plotHVT(hvt_results_umap,
        line.width = c(0.2) ,
        color.vec = c("navyblue"),
        plot.type = "2Dheatmap", 
        centroid.color = c("navyblue"),
        child.level = 1, 
        hmap.cols = 'x',
        cell_id = TRUE,
        title = "2D projection with UMAP as dim_reduction_method, depth=1, hmap.cols='x' and n_cells=100")
Sammon_x = plotHVT(hvt_results_sammon,
        line.width = c(0.2) ,
        color.vec = c("navyblue"),
        plot.type = "2Dheatmap", 
        centroid.color = c("navyblue"),
        child.level = 1, 
        hmap.cols = 'x',
        cell_id = TRUE,
        title = "2D projection with Sammon as dim_reduction_method, depth=1, hmap.cols='x' and n_cells=100")

plot_x = grid.arrange(tsne_x, umap_x, Sammon_x, ncol=3)
Figure 21: From left to right tsne_x, umap_x, Sammon_x with n_cells=100 and depth=1

2DHeatmap of Column Y
tsne_y = plotHVT(hvt_results_tsne,
        line.width = c(0.2) ,
        color.vec = c("navyblue"),
        plot.type = "2Dheatmap", 
        centroid.color = c("navyblue"),
        child.level = 1,
        hmap.cols = 'y',
        cell_id = TRUE,
        title = "2D projection with t-SNE as dim_reduction_method, depth=1, hmap.cols='y' and n_cells=100")

umap_y=plotHVT(hvt_results_umap,
        line.width = c(0.2) ,
        color.vec = c("navyblue"),
        plot.type = "2Dheatmap", 
        centroid.color = c("navyblue"),
        child.level = 1,
        hmap.cols = 'y',
        cell_id = TRUE,
        title = "2D projection with UMAP as dim_reduction_method, depth=1, hmap.cols='y' and n_cells=100")

Sammon_y = plotHVT(hvt_results_sammon,
        line.width = c(0.2) ,
        color.vec = c("navyblue"),
        plot.type = "2Dheatmap", 
        centroid.color = c("navyblue"),
        child.level = 1,
        hmap.cols = 'y',
        cell_id = TRUE,
        title = "2D projection with Sammon as dim_reduction_method, depth=1, hmap.cols='y' and n_cells=100")

plot_y = grid.arrange(tsne_y, umap_y, Sammon_y, ncol = 3)
Figure 22: From left to right tsne_y, umap_y, Sammon_y with n_cells=100 and depth=1

2DHeatmap of Column Z
set.seed(123)
tsne_z = plotHVT(hvt_results_tsne,
        line.width = c(0.2) ,
        color.vec = c("navyblue"),
        plot.type = "2Dheatmap", 
        centroid.color = c("navyblue"),
        child.level = 1, 
        hmap.cols = 'z',
        cell_id = TRUE,
        title = "2D projection with t-SNE as dim_reduction_method, depth=1, hmap.cols='z' and n_cells=100")

umap_z=plotHVT(hvt_results_umap,
        line.width = c(0.2) ,
        color.vec = c("navyblue"),
        plot.type = "2Dheatmap", 
        centroid.color = c("navyblue"),
        child.level = 1,
        hmap.cols = 'z',
        cell_id = TRUE,
        title = "2D projection with UMAP as dim_reduction_method, depth=1, hmap.cols='z' and n_cells=100")


Sammon_z = plotHVT(hvt_results_sammon,
        line.width = c(0.2) ,
        color.vec = c("navyblue"),
        plot.type = "2Dheatmap",
        centroid.color = c("navyblue"),
        child.level = 1, 
        hmap.cols = 'z',
        cell_id = TRUE,
        title = "2D projection with Sammon as dim_reduction_method, depth=1, hmap.cols='z' and n_cells=100")

plot_z = grid.arrange(tsne_z, umap_z, Sammon_z, ncol=3)
Figure 23: From left to right tsne_z, umap_z, Sammon_z with n_cells=100 and depth=1

Human Centered Metric: Likert Scale [1-3]

We presented the 2D Heatmap of column Z to five individuals and informed them of the ground truth: the underlying structure of the data is a torus, a surface resembling a doughnut, so a faithful two-dimensional projection should resemble an annulus (the ring between two concentric circles). Participants were then asked to score each projection from 1 to 3 based on their observations.

  • 1 indicates a poor projection - significant distortion or overlap; unclear data structure.

  • 2 represents an average projection - adequate representation with minor distortions.

  • 3 signifies a good projection - accurate, distinct, and insightful representation.

The Likert scores provided by the participants are shown in the table below:

| Participant | tSNE | UMAP | Sammon |
|---|---|---|---|
| Person 1 | 1 | 1 | 3 |
| Person 2 | 2 | 1 | 3 |
| Person 3 | 1 | 2 | 3 |
| Person 4 | 2 | 1 | 3 |
| Person 5 | 1 | 2 | 3 |
| Average | 1.4 | 1.4 | 3 |

Likert Scale Responses
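
The averages in the last row can be reproduced directly from the five responses. A small sketch in base R follows; the data frame name likert is introduced here only for illustration.

```r
# Likert responses (1 = poor, 2 = average, 3 = good) copied from the table above.
likert <- data.frame(
  Participant = paste("Person", 1:5),
  tSNE   = c(1, 2, 1, 2, 1),
  UMAP   = c(1, 1, 2, 1, 2),
  Sammon = c(3, 3, 3, 3, 3)
)

# Mean score per method across the five participants
likert_avg <- colMeans(likert[, c("tSNE", "UMAP", "Sammon")])
print(likert_avg)
```

This gives 1.4 for t-SNE, 1.4 for UMAP and 3 for Sammon, matching the Average row. With only five raters, these averages are best read as an indication rather than a statistically robust score.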

8.3 Evaluation metrics for t-SNE, UMAP and Sammon’s projection

Evaluation metrics for dimensionality reduction methods such as t-SNE, Sammon’s mapping, and UMAP are crucial for assessing how well these techniques preserve the structure of the original high-dimensional data when it is reduced to lower dimensions. The table below compares the three techniques across several evaluation metrics, grouped into Structure Preservation Metrics, Distance Preservation Metrics, Interpretive Quality Metrics, Computational Efficiency Metrics and Human Centered Metrics.

tsne_score = "1.4"
umap_score = "1.4"
sammon_score = "3"
 
metrics_table_sammon = c(round(hvt_results_sammon$model_info$distance_measures$Value,4))
metrics_table_umap = c(round(hvt_results_umap$model_info$distance_measures$Value,4))
metrics_table_tsne = c(round(hvt_results_tsne$model_info$distance_measures$Value,4))
 
metrics_table <- data.frame(
  L1_Metrics = c(hvt_results_sammon$model_info$distance_measures$L1_Metrics[1:4],"Human Centered Metrics","Human Centered Metrics",hvt_results_sammon$model_info$distance_measures$L1_Metrics[5:7]),
  L2_Metrics = c(hvt_results_sammon$model_info$distance_measures$L2_Metrics[1:4],"Likert Scale [1-3]","Spatial Orientation",hvt_results_sammon$model_info$distance_measures$L2_Metrics[5:7]) ,
  tSNE = c(metrics_table_tsne[1], metrics_table_tsne[2], metrics_table_tsne[3], metrics_table_tsne[4],tsne_score,"NA", metrics_table_tsne[5] , metrics_table_tsne[6],metrics_table_tsne[7]),
  UMAP = c(metrics_table_umap[1], metrics_table_umap[2], metrics_table_umap[3], metrics_table_umap[4], umap_score,"NA", metrics_table_umap[5], metrics_table_umap[6],metrics_table_umap[7]),
  Sammon = c(metrics_table_sammon[1], metrics_table_sammon[2], metrics_table_sammon[3], metrics_table_sammon[4], sammon_score, "NA",metrics_table_sammon[5], metrics_table_sammon[6], metrics_table_sammon[7])
)
displayTable(data = metrics_table)
| L1_Metrics | L2_Metrics | tSNE | UMAP | Sammon |
|---|---|---|---|---|
| Structure Preservation Metrics | Trustworthiness | 0.9823 | 0.923 | 0.8535 |
| | Continuity | 0.9557 | 0.9593 | 0.9736 |
| | Sammon’s Stress | 82.5773 | 19.1181 | 12.3546 |
| Distance Preservation Metrics | RMSE | 52.1035 | 26.0502 | 21.6824 |
| Human Centered Metrics | Likert Scale [1-3] | 1.4 | 1.4 | 3 |
| | Spatial Orientation | NA | NA | NA |
| Interpretive Quality Metrics | Silhouette Score | 0.3631 | 0.366 | 0.3774 |
| | KNN Retention Score | 0.7312 | 0.5975 | 0.5062 |
| Computational Efficiency Metrics | Execution Duration(sec) | 0.0761 | 0.2034 | 0.0033 |

The table compares the evaluation metrics for t-SNE, UMAP, and Sammon on the torus data with 100 cells and a depth of 1. For details on the evaluation methods listed above, see “More Details”.
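
As a rough illustration of the Sammon’s Stress and RMSE rows, the sketch below computes both quantities for a small random data set projected with classical MDS (stats::cmdscale). It uses the textbook, normalized form of Sammon’s stress and an RMSE over pairwise distances. The trainHVT implementation is not shown in this notebook and may define or scale these quantities differently, so the numbers are not comparable to the table. The helper names sammon_stress and distance_rmse are hypothetical.

```r
# Textbook Sammon's stress: sum((d_high - d_low)^2 / d_high) / sum(d_high)
sammon_stress <- function(d_high, d_low) {
  d_high <- as.vector(d_high)
  d_low  <- as.vector(d_low)
  keep <- d_high > 0                       # skip duplicate points to avoid division by zero
  sum((d_high[keep] - d_low[keep])^2 / d_high[keep]) / sum(d_high[keep])
}

# RMSE between the original and projected pairwise distances
distance_rmse <- function(d_high, d_low) {
  sqrt(mean((as.vector(d_high) - as.vector(d_low))^2))
}

set.seed(123)
X <- matrix(rnorm(50 * 3), ncol = 3)       # toy 3D data, a stand-in for cell centroids
d_high <- dist(X)                          # pairwise distances in the original space
Y <- cmdscale(d_high, k = 2)               # 2D projection via classical MDS
d_low <- dist(Y)                           # pairwise distances after projection

cat("Sammon's stress:", sammon_stress(d_high, d_low), "\n")
cat("RMSE:", distance_rmse(d_high, d_low), "\n")
```

Both measures are zero when every pairwise distance is reproduced exactly and grow as the projection distorts distances, which is why lower values are better in those two rows.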

Note: The Spatial Orientation metric is marked as NA for all the methods (t-SNE, UMAP, Sammon). In the HVT (Hierarchical Voronoi Tessellation) process, data compression is performed as the first step, where the centroids of clusters are calculated and utilized. This compression effectively reduces the data to a smaller set of representative points, which are then subjected to dimensionality reduction methods. Due to this prior compression step, spatial orientation becomes less relevant. The original spatial relationships between individual data points are inherently altered during the compression process, meaning that the preservation of spatial orientation is no longer a critical or meaningful metric for evaluating the effectiveness of the dimensionality reduction techniques in this context.
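
For the Trustworthiness and Continuity rows, a minimal sketch of the standard rank-based definitions is given below. Trustworthiness penalizes points that appear among the k nearest neighbours in the projection but not in the original space; continuity is the same formula with the two spaces swapped. The function name trustworthiness, the choice k = 5 and the toy data are assumptions for illustration; the neighbourhood size and implementation used inside trainHVT are not shown in this notebook.

```r
# Rank-based trustworthiness for neighbourhood size k (requires k < n / 2).
trustworthiness <- function(X_high, X_low, k = 5) {
  n <- nrow(X_high)
  stopifnot(nrow(X_low) == n, k < n / 2)
  D_high <- as.matrix(dist(X_high))
  D_low  <- as.matrix(dist(X_low))
  penalty <- 0
  for (i in seq_len(n)) {
    others <- setdiff(seq_len(n), i)
    # Rank of every other point by distance from i in the original space (1 = nearest)
    rank_high <- integer(n)
    rank_high[others[order(D_high[i, -i])]] <- seq_len(n - 1)
    # k nearest neighbours of i in the projection
    nn_low <- others[order(D_low[i, -i])][seq_len(k)]
    # Projected neighbours that were not among the k nearest originally
    intruders <- nn_low[rank_high[nn_low] > k]
    penalty <- penalty + sum(rank_high[intruders] - k)
  }
  1 - 2 / (n * k * (2 * n - 3 * k - 1)) * penalty
}

set.seed(123)
X <- matrix(rnorm(60 * 3), ncol = 3)       # toy 3D data
Y <- cmdscale(dist(X), k = 2)              # 2D projection via classical MDS

trust <- trustworthiness(X, Y, k = 5)
cont  <- trustworthiness(Y, X, k = 5)      # continuity: roles of the two spaces swapped
cat("Trustworthiness:", trust, "\n")
cat("Continuity:", cont, "\n")
```

Both scores lie between 0 and 1, and a projection that keeps every neighbourhood intact scores exactly 1.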

8.4 Insights from the three outcomes of the trainHVT function

t-SNE:

t-SNE preserves local structure best, with the highest trustworthiness (0.9823) and the highest KNN retention score (0.7312), although its continuity (0.9557) is marginally the lowest of the three. It suffers in distance preservation, reflected by the highest RMSE (52.1035) and Sammon’s stress (82.5773). Its execution duration (0.0761 sec) is shorter than UMAP’s but longer than Sammon’s, and its silhouette score (0.3631) is close to those of the other two methods. t-SNE received a low rating (1.4) on the Likert scale, likely due to issues with interpretability or distortion of the data structure that negatively affects user perception.

UMAP:

UMAP strikes a balance between structure and distance preservation, offering competitive trustworthiness (0.923) and continuity (0.9593) alongside a much lower RMSE (26.0502) and Sammon’s stress (19.1181) than t-SNE. It is the slowest of the three methods (0.2034 sec) and its KNN retention (0.5975) is lower than t-SNE’s, but it provides a favorable trade-off in projection quality, making it a versatile choice. Like t-SNE, UMAP received a low rating (1.4) on the Likert scale, likely due to issues with interpretability or distortion of the data structure that negatively affects user perception.

Sammon:

Sammon’s mapping excels in distance preservation, achieving the lowest RMSE (21.6824) and Sammon’s stress (12.3546). It has the highest continuity (0.9736) but the lowest trustworthiness (0.8535) and the lowest KNN retention (0.5062), so local neighborhoods are preserved less faithfully than with t-SNE or UMAP. Its silhouette score (0.3774) is marginally the highest, and it is by far the fastest method (0.0033 sec). Sammon’s mapping is therefore well suited to scenarios that prioritize distance preservation and computational efficiency. It received the highest Likert score (3), which suggests it is the most favorable for human visualization, plausibly because of its accurate distance representation.
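
The per-metric comparison can be cross-checked mechanically from the evaluation table in section 8.3. The sketch below copies the table values and picks the best method for each metric. The higher_is_better column encodes our reading of each metric (higher is better for trustworthiness, continuity, silhouette and KNN retention; lower is better for stress, RMSE and duration) and is not part of the trainHVT output.

```r
# Metric values copied from the evaluation table in section 8.3.
metrics <- data.frame(
  metric = c("Trustworthiness", "Continuity", "Sammon's Stress", "RMSE",
             "Silhouette Score", "KNN Retention Score", "Execution Duration(sec)"),
  tSNE   = c(0.9823, 0.9557, 82.5773, 52.1035, 0.3631, 0.7312, 0.0761),
  UMAP   = c(0.9230, 0.9593, 19.1181, 26.0502, 0.3660, 0.5975, 0.2034),
  Sammon = c(0.8535, 0.9736, 12.3546, 21.6824, 0.3774, 0.5062, 0.0033),
  higher_is_better = c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)  # assumed direction
)

methods <- c("tSNE", "UMAP", "Sammon")
values  <- as.matrix(metrics[, methods])

# Flip the sign of "lower is better" rows so that the maximum always marks the best method
signed <- ifelse(metrics$higher_is_better, 1, -1) * values
metrics$best <- methods[apply(signed, 1, which.max)]

print(metrics[, c("metric", "best")])
```

On these numbers Sammon comes out best on five of the seven metrics, and t-SNE on trustworthiness and KNN retention.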

9. Conclusion

The integration of t-SNE, UMAP, and performance metrics into the trainHVT function enhances its ability to process, analyze, and visualize high-dimensional data. By incorporating various dimensionality reduction techniques and performance metrics—such as Trustworthiness, Continuity, RMSE, Silhouette Score, KNN Retention Score, Sammon’s Stress, Execution Duration, and Likert Scale [1-3]—trainHVT now provides more flexibility for evaluating the quality of dimensionality reduction and clustering, meeting diverse data analysis requirements.

10. References