trainHVT() Function: Parameters and Hyperparameters for Dimensionality Reduction Methods
The HVT package is a collection of R functions to facilitate building topology preserving maps for rich multivariate data analysis, particularly for data tending towards a big data preponderance (a large number of rows). The functions for this typical workflow are organized below; a minimal call sequence is sketched after the list:
Data Compression: Vector Quantization (VQ), HVQ (Hierarchical Vector Quantization) using means or medians. This step compresses the rows (long data frame) using a compression objective.
Data Projection: Dimension projection of the compressed cells to 2D with t-SNE, UMAP and Sammon’s Mapping Algorithm. This step creates the topology preserving map (also called embedding) coordinates in the desired output dimension.
Tessellation: Create the cells required for visualization using the Voronoi Tessellation method; the package includes Heatmap plots for Hierarchical Voronoi Tessellations (HVT). This step enables data insights, visualization, and interaction with the topology preserving map. Useful for semi-supervised tasks.
Scoring: Scoring data sets and recording their assignment using the map objects from the above steps, in a sequence of maps if required.
Temporal Analysis and Visualization: A collection of functions that extends the HVT package by analyzing time series data for its underlying patterns, calculating transition probabilities, and visualizing the flow of data over time.
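A minimal sketch of this workflow is shown below, assuming the torus data frame torus_df1 built later in this vignette; the scoreHVT call for the scoring step is an assumption about the package API and is not demonstrated in this notebook.
# Minimal workflow sketch (assumes torus_df1 from the data-generation step below;
# scoreHVT is assumed to be the package's scoring function and its exact signature
# should be checked against the package documentation).
library(HVT)
hvt_model <- trainHVT(torus_df1, n_cells = 20, depth = 1, quant.err = 0.1,
                      normalize = TRUE, dim_reduction_method = "sammon")   # compression + projection
plotHVT(hvt_model, line.width = c(0.2), color.vec = c("navyblue"),
        centroid.color = c("navyblue"), plot.type = "2Dheatmap",
        child.level = 1, hmap.cols = "z")                                  # tessellation + heatmap
scores <- scoreHVT(torus_df1, hvt_model)                                   # scoring new data against the map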
What’s New?
This notebook showcases the enhancements made to the
trainHVT
function through the integration of dimensionality
reduction techniques and comprehensive evaluation metrics. These
advancements aim to enhance the visualization, analysis, and
interpretability of high-dimensional data within the HVT framework.
1. Integration of Advanced Dimensionality Reduction Techniques:
The trainHVT
function now includes dimensionality
reduction techniques like t-SNE and UMAP, alongside the previously
implemented Sammon’s method. This integration enhances the function’s
capacity to explore and apply various dimensionality reduction
approaches.
t-Distributed Stochastic Neighbor Embedding
(t-SNE): Integrating t-SNE into the trainHVT
function facilitates non-linear dimensionality reduction, particularly
by preserving local structures and enabling visualization of intricate data
structures. With the Barnes-Hut approximation it handles reasonably large
inputs, although it remains more computationally demanding than UMAP.
Uniform Manifold Approximation and Projection
(UMAP): Integrating UMAP into the trainHVT
function helps preserve both local and global data structures. UMAP
excels at maintaining local relationships between data points while also
preserving the broader global structure, which helps in revealing more
meaningful clusters and patterns in complex datasets.
2. Integration of Evaluation Metrics:
Dimensionality reduction evaluation metrics help to determine the quality and effectiveness of the dimensionality reduction process by evaluating aspects such as data point proximity, cluster separation, and overall fidelity of the reduced representation.
t-SNE is a widely recognized technique for visualizing high-dimensional data in a low-dimensional space, typically two or three dimensions. Developed by Laurens van der Maaten and Geoffrey Hinton, t-SNE is particularly effective at preserving the local structure of the data, ensuring that similar data points are positioned close to one another in the reduced dimensional space.
Advantages of t-SNE
The key advantage of using t-SNE for dimensionality reduction lies in its probabilistic approach to measuring the similarities between data points.
t-SNE focuses on preserving the relative distances between data points, rather than just the absolute distances. This results in visually intuitive maps where similar data points form dense clusters, while dissimilar points are more spread out.
This property makes the resulting visualizations not only accurate in terms of capturing the underlying data structure, but also highly interpretable, even for users without extensive statistical expertise.
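For reference, the standard t-SNE formulation behind this probabilistic approach (as defined by van der Maaten and Hinton, not specific to the HVT implementation) is:

$$p_{j\mid i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}, \qquad q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}, \qquad C = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}$$

where $\sigma_i$ is chosen so that the conditional distribution has the user-specified perplexity, $p_{ij} = (p_{j\mid i} + p_{i\mid j})/2n$ is the symmetrized similarity in the original space, $y_i$ are the low-dimensional coordinates, and the Kullback-Leibler cost $C$ is minimized by gradient descent.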
UMAP is a cutting-edge technique for dimension reduction and data visualization, known for its speed, scalability, and ability to maintain both global and local data structure. Developed by Leland McInnes, John Healy, and James Melville, UMAP has quickly become a favorite among data scientists for its versatility and robust performance across a wide range of applications.
Advantages of UMAP
The key advantage of using UMAP as a dimensionality reduction technique is its ability to simultaneously preserve the global structure of the data while also highlighting the local relationships between data points.
This dual focus results in visualizations that accurately represent the underlying clusters and patterns within the dataset, providing insights that may be missed by other dimensionality reduction methods.
UMAP delivers high-quality, interpretable visualizations that significantly enhance our understanding of complex, multi-dimensional datasets.
Dimensionality reduction evaluation metrics are measures used to assess the effectiveness of dimensionality reduction techniques. They help evaluate how well these techniques preserve the structure, relationships, and quality of the data when reducing its dimensions.
These metrics are organized into five main categories. Below is
a brief overview of each category covered by the trainHVT
function:
Structure Preservation Metrics
Distance Preservation Metrics
Human Centered Metrics
Ground Truth: We have performed dimensionality reduction techniques on torus data. The underlying structure of this data is a torus, a surface shaped like a doughnut. When properly reduced to two or three dimensions, the data should resemble an annulus (two concentric circles).
Interpretive Quality Metrics
Computational Efficiency Metrics
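As an example of the structure preservation category, trustworthiness is commonly defined as follows (the standard Venna-Kaski formulation; the exact implementation inside trainHVT may differ):

$$T(k) = 1 - \frac{2}{nk\,(2n - 3k - 1)} \sum_{i=1}^{n} \sum_{j \in U_i^{(k)}} \bigl(r(i,j) - k\bigr)$$

where $U_i^{(k)}$ is the set of points that are among the $k$ nearest neighbours of point $i$ in the embedding but not in the original space, and $r(i,j)$ is the rank of $j$ among the neighbours of $i$ in the original space. Continuity is defined analogously with the roles of the original and embedded spaces swapped. Both scores lie in $[0, 1]$, with values close to 1 indicating better neighbourhood preservation.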
This chunk verifies that all the packages necessary to run this vignette are installed, installs any that are missing, and attaches all the packages to the session environment.
list.of.packages <- c("DT", "plotly", "magrittr", "data.table", "tidyverse", "crosstalk",
                      "kableExtra", "cowplot", "gdata", "ggplot2", "gridExtra", "tibble", "HVT")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[, "Package"])]
if (length(new.packages)) {install.packages(new.packages, repos = 'https://cloud.r-project.org/')}
invisible(lapply(list.of.packages, library, character.only = TRUE))
First, let us see how to generate data for a torus. We are using the geozoo library for this purpose. Geo Zoo (stands for Geometric Zoo) is a compilation of geometric objects ranging from three to 10 dimensions. Geo Zoo contains regular or well-known objects, e.g., cube and sphere, and some abstract objects, e.g., Boy’s surface, Torus and Hyper-Torus.
Here, we will generate a 3D torus (a torus is a surface of revolution generated by revolving a circle in three-dimensional space one full revolution about an axis that is coplanar with the circle) with 12000 points.
The torus dataset includes the following columns: x, y and z, the three spatial coordinates of each point on the torus surface.
Let’s explore the raw torus dataset containing 12000 points. For the sake of brevity, we display only the first 10 rows.
set.seed(124)
torus <- geozoo::torus(p = 3, n = 12000)
torus_df <- data.frame(torus$points)
colnames(torus_df) <- c("x", "y", "z")
torus_df1 <- torus_df %>% round(4)
colnames(torus_df1) <- c("x", "y", "z")
torus_df1$Row.No <- as.numeric(row.names(torus_df))
torus_df1 <- torus_df1 %>% dplyr::select(x, y, z)
displayTable(head(torus_df1, 10))
x | y | z |
---|---|---|
1.0055 | 0.5779 | 0.5422 |
-1.1971 | -0.1153 | 0.6035 |
0.2963 | 1.7116 | 0.9648 |
-0.8651 | -0.5048 | 0.0571 |
1.6057 | -0.8437 | 0.9825 |
0.3565 | -2.5977 | -0.7830 |
0.1319 | -2.5860 | -0.8079 |
-2.4760 | 1.5867 | 0.3388 |
-1.7364 | -0.9281 | -0.9995 |
2.2525 | -1.9531 | 0.1922 |
Now, let’s try to visualize the torus dataset in 3D.
plot_ly(x = torus_df1$x, y = torus_df1$y, z = torus_df1$z, type = 'scatter3d',mode = 'markers',
marker = list(color = torus_df1$z,colorscale = c('#F50000', '#000FFF'),showscale = TRUE,size = 3,colorbar = list(title = 'z'))) %>%
layout(scene = list(xaxis = list(title = 'x'),yaxis = list(title = 'y'),zaxis = list(title = 'z'),
aspectratio = list(x = 1, y = 1, z = 0.5)))
The core function for compression in the workflow is hvq
(hierarchical vector quantization), which is called within the
trainHVT
function. It has a parameter called ‘quantization
error’. This parameter acts as a threshold and determines the number of
levels in the hierarchy: if there are ‘n’ levels in the hierarchy,
then all the clusters formed up to that level will have a quantization
error equal to or less than the threshold quantization error. The user
defines the number of clusters in the first level of the hierarchy, and
each cluster in the subsequent levels is subdivided into the same number
of clusters. This process continues for every cluster until the threshold
quantization error is met. The output of this technique is hierarchically
arranged, vector quantized data.
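A minimal conceptual sketch of this splitting logic is given below, assuming kmeans and Euclidean (L2) distance; hvq_sketch is a hypothetical helper written for illustration, not the package’s actual implementation.
# Conceptual sketch only (not the HVT internals): how n_cells, depth and
# quant.err interact during hierarchical vector quantization with kmeans.
hvq_sketch <- function(data, n_cells = 20, depth = 3, quant.err = 0.1, level = 1) {
  k <- min(n_cells, nrow(data))
  km <- kmeans(data, centers = k)
  lapply(seq_len(k), function(i) {
    cell  <- data[km$cluster == i, , drop = FALSE]
    # quantization error of the cell: mean distance of its points from the centroid
    q_err <- mean(sqrt(rowSums(sweep(cell, 2, km$centers[i, ])^2)))
    if (q_err > quant.err && level < depth && nrow(cell) > n_cells) {
      hvq_sketch(cell, n_cells, depth, quant.err, level + 1)   # cell splits further
    } else {
      list(centroid = km$centers[i, ], quant_error = q_err, level = level)
    }
  })
}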
trainHVT() Function: Parameters and Hyperparameters for Dimensionality Reduction Methods
The trainHVT() function is used to train a hierarchical
function is used to train a hierarchical
Voronoi tessellation (HVT) model. When integrating dimensionality
reduction techniques such as t-SNE, UMAP, and Sammon’s projection, it’s
essential to understand both the parameters of the trainHVT() function
and the specific hyperparameters for each dimensionality reduction
method.
trainHVT(
  dataset, min_compression_perc, n_cells,
  depth, quant.err, normalize,
  distance_metric, error_metric, quant_method,
  scale_summary, diagnose, hvt_validation,
  train_validation_split_ratio, projection.scale,
  dim_reduction_method,
  tsne_perplexity, tsne_theta, tsne_verbose,
  tsne_eta, tsne_max_iter,
  umap_n_neighbors, umap_min_dist
)
Each of the parameters of trainHVT
function has been
explained below:
dataset
- A data frame, with
numeric columns (features) that will be used for training the
model.
min_compression_perc
- An integer,
indicating the minimum compression percentage to be achieved for the
dataset. It indicates the desired level of reduction in dataset size
compared to its original size. This parameter need not be provided if
n_cells is specified.
n_cells
- An integer, indicating
the number of cells per hierarchy (level). This parameter determines the
granularity or level of detail in the hierarchical vector
quantization.
depth
- An integer, indicating the
number of levels. A depth of 1 means no hierarchy (single level), while
higher values indicate multiple levels (hierarchy).
quant.err
- A number indicating the
quantization error threshold. A cell will only break down into further
cells if the quantization error of the cell is above the defined
quantization error threshold.
normalize
- A logical value
indicating if the dataset should be normalized. When set to TRUE, scales
the values of all features to have a mean of 0 and a standard deviation
of 1 (Z-score). If FALSE, the dataset is used without scaling.
distance_metric
- The distance
metric can be L1_Norm
(Manhattan) or
L2_Norm
(Euclidean). L1_Norm
is selected by
default. The distance metric is used to calculate the distance between a
clustered data point and a centroid.
error_metric
- The error metric can
be mean
or max
. max
is selected
by default. max
will return the max of m
values and mean
will take the mean of m
values
where each value is a distance between a point and centroid.
quant_method
- The quantization
method can be kmeans
or kmedoids
. Kmeans uses
means (centroids) as cluster centers while Kmedoids uses actual data
points (medoids) as cluster centers. kmeans
is selected by
default.
scale_summary
- A list with
user-defined mean and standard deviation values for all the features in
the dataset. Pass the scale summary when normalize is set to
FALSE.
diagnose
- A logical value
indicating whether the user wants to perform diagnostics on the model.
Default value is FALSE.
hvt_validation
- A logical value
indicating whether the user wants to hold out a validation set and find
the Mean Absolute Deviation of the validation points from the centroid.
Default value is FALSE.
train_validation_split_ratio
- A
numeric value indicating train validation split ratio. This argument is
only used when hvt_validation
has been set to TRUE. Default
value for the argument is 0.8.
projection.scale
- A number
indicating the scale factor for the tessellations to visualize the
sub-tessellations well enough. It helps in adjusting the visual
representation of the hierarchy to make the sub-tessellations more
visible. Default is 10.
dim_reduction_method
- The
trainHVT
function above has been modified to include an
additional parameter dim_reduction_method that accepts
“sammon”, “tsne”, or
“umap”. This parameter is used to determine which
dimensionality reduction technique to apply within the function. Default
value is ‘sammon’.
Hyperparameters for different dimensionality reduction methods:
The trainHVT()
function allows for fine-tuning the t-SNE
hyperparameters perplexity (tsne_perplexity), the Barnes-Hut parameter
(tsne_theta), the learning rate (tsne_eta) and the maximum number of
iterations (tsne_max_iter), which are passed to the Rtsne library.
Adjusting these parameters can optimize the balance between local and
global data structures, convergence speed, and accuracy of the
dimensionality reduction:
tsne_perplexity
- A numeric,
balances the attention t-SNE gives to local and global aspects of the
data. Lower values focus more on local structure, while higher values
consider more global structure. It is recommended to be between 5 and
50. Default value is 30.
tsne_theta
- A numeric,
speed/accuracy trade-off parameter for Barnes-Hut approximation. If set
to 0, exact t-SNE is performed, which is slower. If set to greater than
0, an approximation is used, which speeds up the process but may reduce
accuracy. Default value is 0.5
tsne_eta (learning_rate)
- A
numeric, learning rate for t-SNE optimization. It determines the step size
during optimization. If too low, the algorithm might get stuck in local
minima; if too high, the solution may become unstable. Default value is
200.
tsne_max_iter
- An integer, maximum
number of iterations. Number of iterations for the optimization process.
More iterations can improve results but increase computation time.
Default value is 1000.
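As a point of reference, the sketch below shows how these tsne_* arguments map onto a direct Rtsne::Rtsne call on a small stand-in matrix; trainHVT performs an equivalent step internally on the compressed centroids, and the stand-in data here is purely illustrative.
# Hedged sketch: direct Rtsne call using the same hyperparameters exposed by trainHVT.
library(Rtsne)
centroids <- matrix(rnorm(100 * 3), ncol = 3)   # stand-in for ~100 cell centroids
tsne_out <- Rtsne(centroids,
                  dims       = 2,      # output dimension
                  perplexity = 30,     # tsne_perplexity
                  theta      = 0.5,    # tsne_theta (Barnes-Hut speed/accuracy trade-off)
                  eta        = 200,    # tsne_eta (learning rate)
                  max_iter   = 1000,   # tsne_max_iter
                  verbose    = TRUE)   # tsne_verbose
head(tsne_out$Y)                       # 2D embedding coordinates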
The trainHVT()
function leverages the UMAP
hyperparameters—n_neighbors, min_dist—through the uwot
library. Fine-tuning these parameters helps control neighborhood size,
spacing in the reduced space, and overall structure preservation in
dimensionality reduction.
umap_n_neighbors
- An integer, the
size of the local neighborhood (in terms of number of neighboring sample
points) used for manifold approximation, controls the balance between
local and global structure in the data, smaller values focus on local
structure, while larger values capture more global structures. Default
value is 15.
umap_min_dist
- A numeric, the
minimum distance between points in the embedded space, controls how
tightly UMAP packs points together, lower values result in a more
clustered embedding. Default value is 0.1
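Similarly, the sketch below shows how the umap_* arguments map onto a direct uwot::umap call on a small stand-in matrix; again, trainHVT runs the equivalent step internally on the compressed centroids.
# Hedged sketch: direct uwot::umap call using the same hyperparameters exposed by trainHVT.
library(uwot)
centroids <- matrix(rnorm(100 * 3), ncol = 3)   # stand-in for ~100 cell centroids
umap_out <- umap(centroids,
                 n_components = 2,     # output dimension
                 n_neighbors  = 15,    # umap_n_neighbors
                 min_dist     = 0.1)   # umap_min_dist
head(umap_out)                         # 2D embedding coordinates (a matrix)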
t-SNE: Use t-SNE when your primary focus is on maintaining local relationships between data points. This method excels at preserving local structure, meaning points that are close in high-dimensional space will remain close in the lower-dimensional space. However, global relationships (the larger structure or overall distribution) might not be well-preserved. It is computationally intensive, so it may not be suitable for very large datasets.
Key characteristics: Good for clustering small groups, preserves local distances, high computational cost, sensitive to parameter tuning (e.g., perplexity).
UMAP: Use UMAP when you want to achieve a balance between local and global structure. It offers a good compromise by maintaining both small-scale relationships and large-scale global features. UMAP is more scalable and faster than t-SNE, which makes it a more practical choice for large datasets. Additionally, UMAP has better generalization properties, meaning that new data points can be effectively mapped into the lower-dimensional space without recomputing everything.
Key characteristics: Balances local and global structure, scalable to large datasets, relatively fast, good at embedding new data points.
Sammon’s Mapping: Use Sammon’s Mapping when the exact pairwise distances between data points are critical. This method minimizes the distortion of distances as much as possible, making it suitable for cases where you need high fidelity in distance preservation. However, Sammon’s Mapping is computationally expensive and not suitable for very large datasets.
Key characteristics: Preserves pairwise distances accurately, computationally expensive, suitable for small to medium-sized datasets.
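For reference, the quantity Sammon’s Mapping minimizes (Sammon’s stress, which also appears later as an evaluation metric) is:

$$E = \frac{1}{\sum_{i<j} d^{*}_{ij}} \sum_{i<j} \frac{\bigl(d^{*}_{ij} - d_{ij}\bigr)^{2}}{d^{*}_{ij}}$$

where $d^{*}_{ij}$ is the distance between points $i$ and $j$ in the original space and $d_{ij}$ is the corresponding distance in the projection; lower values indicate better distance preservation.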
The output of the trainHVT
function (list of 7 elements)
has been explained below with an image attached for clear
understanding.
NOTE: The attached ‘Figure 2’ is an example snapshot of the output list generated from trainHVT.
The ‘1st element’ is a list containing information related to plotting tessellations. This information includes coordinates, boundaries and other details necessary for visualizing the tessellations.
The ‘2nd element’ is a list containing information related to Sammon’s projection coordinates of the data points in the reduced-dimensional space.
The ‘3rd element’ is a list containing detailed information about the hierarchical vector quantized data along with a summary section containing no. of points, quantization error and the centroids for each cell in 2D.
The ‘4th element’ is a list that contains all the diagnostics information of the model when diagnose is set to TRUE. Otherwise NA.
The ‘5th element’ is a list that contains all the information required to generate a Mean Absolute Deviation (MAD) plot, if hvt_validation is set to TRUE. Otherwise NA.
The ‘6th element’ is a list containing detailed information about the hierarchical vector quantized data along with a summary section containing the number of points, quantization error and the centroids for each cell in 1D, which is the output of hvq.
The ‘7th element’ (model info) is a list that contains the model generation time, the data passed to the model, the validation results, the input_parameters and distance_measures, which is the metrics evaluation table.
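A short sketch of inspecting this output list is shown below; hvt_results stands for any object returned by trainHVT (a placeholder name used here for illustration).
# Hedged sketch: inspecting the 7-element list returned by trainHVT.
str(hvt_results, max.level = 1)                    # the 7 top-level elements
hvt_results[[3]][["compression_summary"]]          # compression summary (3rd element)
hvt_results$model_info$distance_measures           # metrics evaluation table (7th element)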
We will use the trainHVT
function to compress our data
while preserving essential features of the dataset. Our goal is to
achieve data compression of at least 80%. In situations
where the compression ratio does not meet the desired target, we can
explore adjusting the model parameters as a potential solution. This
involves modifying parameters such as the quantization
error threshold or increasing the number of cells and then rerunning the
trainHVT
function.
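A minimal sketch of that check is shown below, assuming the target corresponds to the percentOfCellsBelowQuantizationErrorThreshold column of the compression summary shown later; hvt_results is again a placeholder output object.
# Hedged sketch: check the compression target and rerun trainHVT with more cells if needed.
summ <- hvt_results[[3]][["compression_summary"]]
if (any(summ$percentOfCellsBelowQuantizationErrorThreshold < 0.80)) {
  hvt_results <- trainHVT(torus_df1,
                          n_cells = 40,                 # e.g. double the number of cells
                          depth = 1, quant.err = 0.1,
                          normalize = TRUE,
                          distance_metric = "L2_Norm",
                          error_metric = "mean",
                          quant_method = "kmeans",
                          dim_reduction_method = "sammon")
}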
t-SNE is a powerful technique for visualizing high-dimensional data by reducing it to two or three dimensions while preserving local structures. By performing and plotting t-SNE, intricate patterns and relationships within the data can be effectively explored and interpreted.
Here, we will perform t-SNE as the dimensionality
reduction technique in the trainHVT
function with
n_cells=20.
We have passed the following model parameters, along with
depth=1, 2, 3 respectively, to the trainHVT
function.
Model Parameters
Performing t-SNE on the torus data using
trainHVT
function with depth=1
# Apply trainHVT to the simulated data with dim_reduction_method="tsne" and depth=1
hvt_results_tsne1 <- trainHVT(
  dataset = torus_df1,
  n_cells = 20,
  depth = 1,
  quant.err = 0.1,
  normalize = TRUE,
  distance_metric = "L2_Norm",
  error_metric = "mean",
  quant_method = "kmeans",
  dim_reduction_method = "tsne",
  tsne_perplexity = 6,
  tsne_theta = 0.5,
  tsne_verbose = TRUE,
  tsne_eta = 200,
  tsne_max_iter = 1000
)
Performing t-SNE on the torus data using
trainHVT
function with depth=2
# Apply trainHVT to the simulated data with dim_reduction_method="tsne" and depth=2
hvt_results_tsne2 <- trainHVT(
  dataset = torus_df1,
  n_cells = 20,
  depth = 2,
  quant.err = 0.1,
  normalize = TRUE,
  distance_metric = "L2_Norm",
  error_metric = "mean",
  quant_method = "kmeans",
  dim_reduction_method = "tsne",
  tsne_perplexity = 6,
  tsne_theta = 0.5,
  tsne_verbose = TRUE,
  tsne_eta = 200,
  tsne_max_iter = 1000
)
Performing t-SNE on the torus data using
trainHVT
function with depth=3
# Apply trainHVT to the simulated data with dim_reduction_method="tsne" and depth=3
hvt_results_tsne3 <- trainHVT(
  dataset = torus_df1,
  n_cells = 20,
  depth = 3,
  quant.err = 0.1,
  normalize = TRUE,
  distance_metric = "L2_Norm",
  error_metric = "mean",
  quant_method = "kmeans",
  dim_reduction_method = "tsne",
  tsne_perplexity = 6,
  tsne_theta = 0.5,
  tsne_verbose = TRUE,
  tsne_eta = 200,
  tsne_max_iter = 1000
)
trainHVT function with dim_reduction_method=“tsne” using plotHVT
Now let’s plot all the features for each cell at levels 1, 2 and 3 respectively as Heatmaps for better visualization.
The Heatmaps displayed below provide a visual representation of the spatial characteristics of the torus dataset, allowing us to observe patterns and trends in the distribution of each of the features (x,y,z). The sheer green shades highlight regions with higher values in each of the Heatmaps, while the indigo shades indicate areas with the lowest values in each of the Heatmaps. By analyzing these Heatmaps, we can gain insights into the variations and relationships between each of these features within the torus dataset.
Here, we have drawn the 2D Heatmap with respect to “x”, “y” and “z” columns.
tSNE_depth1_x = plotHVT(hvt_results_tsne1,
  line.width = c(0.2),
  color.vec = c("navyblue"),
  centroid.color = c("navyblue"),
  plot.type = "2Dheatmap",
  child.level = 1,
  hmap.cols = 'x',
  cell_id = TRUE,
  title = "2D projection with t-SNE as dim_reduction_method, depth=1 and hmap.cols='x'")

tSNE_depth1_y = plotHVT(hvt_results_tsne1,
  line.width = c(0.2),
  color.vec = c("navyblue"),
  centroid.color = c("navyblue"),
  plot.type = "2Dheatmap",
  child.level = 1,
  hmap.cols = 'y',
  cell_id = TRUE,
  title = "2D projection with t-SNE as dim_reduction_method, depth=1 and hmap.cols='y'")

tSNE_depth1_z = plotHVT(hvt_results_tsne1,
  line.width = c(0.2),
  color.vec = c("navyblue"),
  centroid.color = c("navyblue"),
  plot.type = "2Dheatmap",
  child.level = 1,
  hmap.cols = 'z',
  cell_id = TRUE,
  title = "2D projection with t-SNE as dim_reduction_method, depth=1 and hmap.cols='z'")

tSNE_depth1 = grid.arrange(tSNE_depth1_x, tSNE_depth1_y, tSNE_depth1_z, ncol=3)
Here, we have drawn the 2D Heatmap with respect to “x”, “y” and “z” columns.
tSNE_depth2_x = plotHVT(hvt_results_tsne2,
  line.width = c(0.2, 0.1),
  color.vec = c("navyblue","steelblue"),
  centroid.color = c("navyblue","steelblue"),
  centroid.size = c(0.2, 0.1),
  plot.type = "2Dheatmap",
  child.level = 2,
  hmap.cols = 'x',
  n_cells.hmap = 10,
  title = "2D projection with tsne as dim_reduction_method, depth=2 and hmap.cols='x'")

tSNE_depth2_y = plotHVT(hvt_results_tsne2,
  line.width = c(0.2, 0.1),
  color.vec = c("navyblue","steelblue"),
  centroid.color = c("navyblue","steelblue"),
  centroid.size = c(0.2, 0.1),
  plot.type = "2Dheatmap",
  child.level = 2,
  hmap.cols = 'y',
  n_cells.hmap = 10,
  title = "2D projection with tsne as dim_reduction_method, depth=2 and hmap.cols='y'")

tSNE_depth2_z = plotHVT(hvt_results_tsne2,
  line.width = c(0.2, 0.1),
  color.vec = c("navyblue","steelblue"),
  centroid.color = c("navyblue","steelblue"),
  centroid.size = c(0.2, 0.1),
  plot.type = "2Dheatmap",
  child.level = 2,
  hmap.cols = 'z',
  n_cells.hmap = 10,
  title = "2D projection with tsne as dim_reduction_method, depth=2 and hmap.cols='z'")

tSNE_depth2 = grid.arrange(tSNE_depth2_x, tSNE_depth2_y, tSNE_depth2_z, ncol=3)
Here, we have drawn the 2D Heatmap with respect to “x”, “y” and “z” columns.
tSNE_depth3_x = plotHVT(hvt_results_tsne3,
  line.width = c(0.3, 0.2, 0.1),
  color.vec = c("#0047ab","navyblue","steelblue"),
  centroid.color = c("#0047ab","navyblue","steelblue"),
  centroid.size = c(0.3, 0.2, 0.1),
  plot.type = "2Dheatmap",
  child.level = 3,
  hmap.cols = "x",
  n_cells.hmap = 10,
  title = "2D projection with tsne as dim_reduction_method, depth=3 and hmap.cols='x'")

tSNE_depth3_y = plotHVT(hvt_results_tsne3,
  line.width = c(0.3, 0.2, 0.1),
  color.vec = c("#0047ab","navyblue","steelblue"),
  centroid.color = c("#0047ab","navyblue","steelblue"),
  centroid.size = c(0.3, 0.2, 0.1),
  plot.type = "2Dheatmap",
  child.level = 3,
  hmap.cols = "y",
  n_cells.hmap = 10,
  title = "2D projection with tsne as dim_reduction_method, depth=3 and hmap.cols='y'")

tSNE_depth3_z = plotHVT(hvt_results_tsne3,
  line.width = c(0.3, 0.2, 0.1),
  color.vec = c("#0047ab","navyblue","steelblue"),
  centroid.color = c("#0047ab","navyblue","steelblue"),
  centroid.size = c(0.3, 0.2, 0.1),
  plot.type = "2Dheatmap",
  child.level = 3,
  hmap.cols = "z",
  n_cells.hmap = 10,
  title = "2D projection with tsne as dim_reduction_method, depth=3 and hmap.cols='z'")

tSNE_depth3 = grid.arrange(tSNE_depth3_x, tSNE_depth3_y, tSNE_depth3_z, ncol = 3)
UMAP is a powerful technique for visualizing high-dimensional data by reducing it to two or three dimensions while preserving both global and local structures. By performing and plotting UMAP, intricate patterns and relationships within the data can be effectively explored and interpreted.
Here, we will perform UMAP as the dimensionality
reduction technique in the trainHVT
function with
n_cells=20.
We have passed the following model parameters, along with
depth=1, 2, 3 respectively, to the trainHVT
function.
Model Parameters
Performing UMAP on the torus data using trainHVT
function with depth=1
# Apply trainHVT to the simulated data with dim_reduction_method="umap" and depth=1
hvt_results_umap1 <- trainHVT(
  dataset = torus_df1,
  n_cells = 20,
  depth = 1,
  quant.err = 0.1,
  normalize = TRUE,
  distance_metric = "L2_Norm",
  error_metric = "mean",
  quant_method = "kmeans",
  dim_reduction_method = "umap",
  umap_n_neighbors = 6,
  umap_min_dist = 0.2
)
Performing UMAP on the torus data using trainHVT
function with depth=2
# Apply trainHVT to the simulated data with dim_reduction_method="umap" and depth=2
hvt_results_umap2 <- trainHVT(
  dataset = torus_df1,
  n_cells = 20,
  depth = 2,
  quant.err = 0.1,
  normalize = TRUE,
  distance_metric = "L2_Norm",
  error_metric = "mean",
  quant_method = "kmeans",
  dim_reduction_method = "umap",
  umap_n_neighbors = 6,
  umap_min_dist = 0.2
)
Performing UMAP on the torus data using trainHVT
function with depth=3
# Apply trainHVT to the simulated data with dim_reduction_method="umap" and depth=3
hvt_results_umap3 <- trainHVT(
  dataset = torus_df1,
  n_cells = 20,
  depth = 3,
  quant.err = 0.1,
  normalize = TRUE,
  distance_metric = "L2_Norm",
  error_metric = "mean",
  quant_method = "kmeans",
  dim_reduction_method = "umap",
  umap_n_neighbors = 6,
  umap_min_dist = 0.2
)
trainHVT function with dim_reduction_method=“UMAP” using plotHVT
Now let’s plot all the features for each cell at levels 1, 2 and 3 respectively as Heatmaps for better visualization.
The Heatmaps displayed below provide a visual representation of the spatial characteristics of the torus dataset, allowing us to observe patterns and trends in the distribution of each of the features (x,y,z). The sheer green shades highlight regions with higher values in each of the Heatmaps, while the indigo shades indicate areas with the lowest values in each of the Heatmaps. By analyzing these Heatmaps, we can gain insights into the variations and relationships between each of these features within the torus dataset.
Here, we have drawn the 2D Heatmap with respect to “x”, “y” and “z” columns.
UMAP_depth1_x = plotHVT(hvt_results_umap1,
  line.width = c(0.2),
  color.vec = c("navyblue"),
  centroid.color = c("navyblue"),
  plot.type = "2Dheatmap",
  child.level = 1,
  hmap.cols = 'x',
  cell_id = TRUE,
  title = "2D projection with UMAP as dim_reduction_method, depth=1 and hmap.cols='x'")

UMAP_depth1_y = plotHVT(hvt_results_umap1,
  line.width = c(0.2),
  color.vec = c("navyblue"),
  centroid.color = c("navyblue"),
  plot.type = "2Dheatmap",
  child.level = 1,
  hmap.cols = 'y',
  cell_id = TRUE,
  title = "2D projection with UMAP as dim_reduction_method, depth=1 and hmap.cols='y'")

UMAP_depth1_z = plotHVT(hvt_results_umap1,
  line.width = c(0.2),
  color.vec = c("navyblue"),
  centroid.color = c("navyblue"),
  plot.type = "2Dheatmap",
  child.level = 1,
  hmap.cols = 'z',
  cell_id = TRUE,
  title = "2D projection with UMAP as dim_reduction_method, depth=1 and hmap.cols='z'")

UMAP_depth1 = grid.arrange(UMAP_depth1_x, UMAP_depth1_y, UMAP_depth1_z, ncol = 3)
Here, we have drawn the 2D Heatmap with respect to “x”, “y” and “z” columns.
UMAP_depth2_x = plotHVT(hvt_results_umap2,
  line.width = c(0.2, 0.1),
  color.vec = c("navyblue","steelblue"),
  centroid.color = c("navyblue","steelblue"),
  centroid.size = c(0.2, 0.1),
  plot.type = "2Dheatmap",
  child.level = 2,
  hmap.cols = 'x',
  n_cells.hmap = 10,
  title = "2D projection with UMAP as dim_reduction_method, depth=2 and hmap.cols='x'")

UMAP_depth2_y = plotHVT(hvt_results_umap2,
  line.width = c(0.2, 0.1),
  color.vec = c("navyblue","steelblue"),
  centroid.color = c("navyblue","steelblue"),
  centroid.size = c(0.2, 0.1),
  plot.type = "2Dheatmap",
  child.level = 2,
  hmap.cols = 'y',
  n_cells.hmap = 10,
  title = "2D projection with UMAP as dim_reduction_method, depth=2 and hmap.cols='y'")

UMAP_depth2_z = plotHVT(hvt_results_umap2,
  line.width = c(0.2, 0.1),
  color.vec = c("navyblue","steelblue"),
  centroid.color = c("navyblue","steelblue"),
  centroid.size = c(0.2, 0.1),
  plot.type = "2Dheatmap",
  child.level = 2,
  hmap.cols = 'z',
  n_cells.hmap = 10,
  title = "2D projection with UMAP as dim_reduction_method, depth=2 and hmap.cols='z'")

UMAP_depth2 = grid.arrange(UMAP_depth2_x, UMAP_depth2_y, UMAP_depth2_z, ncol = 3)
Here, we have drawn the 2D Heatmap with respect to “x”, “y” and “z” columns.
UMAP_depth3_x = plotHVT(hvt_results_umap3,
  line.width = c(0.3, 0.2, 0.1),
  color.vec = c("#0047ab","navyblue","steelblue"),
  centroid.color = c("#0047ab","navyblue","steelblue"),
  centroid.size = c(0.3, 0.2, 0.1),
  plot.type = "2Dheatmap",
  child.level = 3,
  hmap.cols = "x",
  n_cells.hmap = 10,
  title = "2D projection with UMAP as dim_reduction_method, depth=3 and hmap.cols='x'")

UMAP_depth3_y = plotHVT(hvt_results_umap3,
  line.width = c(0.3, 0.2, 0.1),
  color.vec = c("#0047ab","navyblue","steelblue"),
  centroid.color = c("#0047ab","navyblue","steelblue"),
  centroid.size = c(0.3, 0.2, 0.1),
  plot.type = "2Dheatmap",
  child.level = 3,
  hmap.cols = "y",
  n_cells.hmap = 10,
  title = "2D projection with UMAP as dim_reduction_method, depth=3 and hmap.cols='y'")

UMAP_depth3_z = plotHVT(hvt_results_umap3,
  line.width = c(0.3, 0.2, 0.1),
  color.vec = c("#0047ab","navyblue","steelblue"),
  centroid.color = c("#0047ab","navyblue","steelblue"),
  centroid.size = c(0.3, 0.2, 0.1),
  plot.type = "2Dheatmap",
  child.level = 3,
  hmap.cols = "z",
  n_cells.hmap = 10,
  title = "2D projection with UMAP as dim_reduction_method, depth=3 and hmap.cols='z'")

UMAP_depth3 = grid.arrange(UMAP_depth3_x, UMAP_depth3_y, UMAP_depth3_z, ncol = 3)
Sammon’s mapping is a powerful technique for visualizing high-dimensional data by reducing it to two or three dimensions while preserving the structure of the data as much as possible. By performing and plotting Sammon’s mapping, intricate patterns and relationships within the data can be effectively explored and interpreted.
Here, we will perform Sammon’s mapping as the
dimensionality reduction technique in the trainHVT
function
with n_cells=20.
We have passed the following model parameters, along with
depth=1, 2, 3 respectively, to the trainHVT
function.
Model Parameters
Performing Sammon’s projection on the torus data using
trainHVT
function with depth=1
# Apply trainHVT to the simulated data with dim_reduction_method="sammon" and depth=1
hvt_results_sammon1 <- trainHVT(
  dataset = torus_df1,
  n_cells = 20,
  depth = 1,
  quant.err = 0.1,
  normalize = TRUE,
  distance_metric = "L2_Norm",
  error_metric = "mean",
  quant_method = "kmeans",
  dim_reduction_method = "sammon"
)
Performing Sammon’s projection on the torus data using
trainHVT
function with depth=2
# Apply trainHVT to the simulated data with dim_reduction_method="sammon" and depth=2
hvt_results_sammon2 <- trainHVT(
  dataset = torus_df1,
  n_cells = 20,
  depth = 2,
  quant.err = 0.1,
  normalize = TRUE,
  distance_metric = "L2_Norm",
  error_metric = "mean",
  quant_method = "kmeans",
  dim_reduction_method = "sammon"
)
Performing Sammon’s projection on the torus data using
trainHVT
function with depth=3
# Apply trainHVT to the simulated data with dim_reduction_method="sammon" and depth=3
hvt_results_sammon3 <- trainHVT(
  dataset = torus_df1,
  n_cells = 20,
  depth = 3,
  quant.err = 0.1,
  normalize = TRUE,
  distance_metric = "L2_Norm",
  error_metric = "mean",
  quant_method = "kmeans",
  dim_reduction_method = "sammon"
)
trainHVT function with dim_reduction_method=“sammon” using plotHVT
Now let’s plot all the features for each cell at levels 1, 2 and 3 respectively as Heatmaps for better visualization.
The Heatmaps displayed below provide a visual representation of the spatial characteristics of the torus dataset, allowing us to observe patterns and trends in the distribution of each of the features (x,y,z). The sheer green shades highlight regions with higher values in each of the Heatmaps, while the indigo shades indicate areas with the lowest values in each of the Heatmaps. By analyzing these Heatmaps, we can gain insights into the variations and relationships between each of these features within the torus dataset.
Here, we have drawn the 2D Heatmap with respect to “x”, “y” and “z” columns.
Sammon_depth1_x = plotHVT(hvt_results_sammon1,
  line.width = c(0.2),
  color.vec = c("navyblue"),
  centroid.color = c("navyblue"),
  plot.type = "2Dheatmap", child.level = 1, hmap.cols = 'x', cell_id = TRUE,
  title = "2D projection with Sammon as dim_reduction_method with depth=1 and hmap.cols='x'")

Sammon_depth1_y = plotHVT(hvt_results_sammon1,
  line.width = c(0.2),
  color.vec = c("navyblue"),
  centroid.color = c("navyblue"),
  plot.type = "2Dheatmap", child.level = 1, hmap.cols = 'y', cell_id = TRUE,
  title = "2D projection with Sammon as dim_reduction_method with depth=1 and hmap.cols='y'")

Sammon_depth1_z = plotHVT(hvt_results_sammon1,
  line.width = c(0.2),
  color.vec = c("navyblue"),
  centroid.color = c("navyblue"),
  plot.type = "2Dheatmap", child.level = 1, hmap.cols = 'z', cell_id = TRUE,
  title = "2D projection with Sammon as dim_reduction_method with depth=1 and hmap.cols='z'")

Sammon_depth1 = grid.arrange(Sammon_depth1_x, Sammon_depth1_y, Sammon_depth1_z, ncol=3)
Here, we have drawn the 2D Heatmap with respect to “x”, “y” and “z” columns.
Sammon_depth2_x = plotHVT(hvt_results_sammon2,
  line.width = c(0.2, 0.1),
  color.vec = c("navyblue","steelblue"),
  centroid.color = c("navyblue","steelblue"),
  centroid.size = c(0.2, 0.1),
  plot.type = "2Dheatmap",
  child.level = 2,
  hmap.cols = 'x',
  title = "2D projection with Sammon as dim_reduction_method with depth=2 and hmap.cols='x'")

Sammon_depth2_y = plotHVT(hvt_results_sammon2,
  line.width = c(0.2, 0.1),
  color.vec = c("navyblue","steelblue"),
  centroid.color = c("navyblue","steelblue"),
  centroid.size = c(0.2, 0.1),
  plot.type = "2Dheatmap",
  child.level = 2,
  hmap.cols = 'y',
  title = "2D projection with Sammon as dim_reduction_method with depth=2 and hmap.cols='y'")

Sammon_depth2_z = plotHVT(hvt_results_sammon2,
  line.width = c(0.2, 0.1),
  color.vec = c("navyblue","steelblue"),
  centroid.color = c("navyblue","steelblue"),
  centroid.size = c(0.2, 0.1),
  plot.type = "2Dheatmap",
  child.level = 2,
  hmap.cols = 'z',
  title = "2D projection with Sammon as dim_reduction_method with depth=2 and hmap.cols='z'")

Sammon_depth2 = grid.arrange(Sammon_depth2_x, Sammon_depth2_y, Sammon_depth2_z, ncol=3)
Here, we have drawn the 2D Heatmap with respect to “x”, “y” and “z” columns.
Sammon_depth3_x = plotHVT(hvt_results_sammon3,
  line.width = c(0.3, 0.2, 0.1),
  color.vec = c("#0047ab","navyblue","steelblue"),
  centroid.color = c("#0047ab","navyblue","steelblue"),
  centroid.size = c(0.3, 0.2, 0.1),
  plot.type = "2Dheatmap",
  child.level = 3,
  hmap.cols = "x",
  n_cells.hmap = 20,
  title = "2D projection with Sammon as dim_reduction_method with depth=3 and hmap.cols='x'")

Sammon_depth3_y = plotHVT(hvt_results_sammon3,
  line.width = c(0.3, 0.2, 0.1),
  color.vec = c("#0047ab","navyblue","steelblue"),
  centroid.color = c("#0047ab","navyblue","steelblue"),
  centroid.size = c(0.3, 0.2, 0.1),
  plot.type = "2Dheatmap",
  child.level = 3,
  hmap.cols = "y",
  n_cells.hmap = 20,
  title = "2D projection with Sammon as dim_reduction_method with depth=3 and hmap.cols='y'")

Sammon_depth3_z = plotHVT(hvt_results_sammon3,
  line.width = c(0.3, 0.2, 0.1),
  color.vec = c("#0047ab","navyblue","steelblue"),
  centroid.color = c("#0047ab","navyblue","steelblue"),
  centroid.size = c(0.3, 0.2, 0.1),
  plot.type = "2Dheatmap",
  child.level = 3,
  hmap.cols = "z",
  n_cells.hmap = 20,
  title = "2D projection with Sammon as dim_reduction_method with depth=3 and hmap.cols='z'")

Sammon_depth3 = grid.arrange(Sammon_depth3_x, Sammon_depth3_y, Sammon_depth3_z, ncol = 3)
trainHVT with 20 cells
In the context of applying dimensionality reduction techniques in the
trainHVT
function on the torus dataset, a visual comparison of the
three methods (t-SNE, UMAP, and Sammon’s) can provide valuable insights
into their performance. With n_cells
set to 20, these methods
are evaluated based on how effectively the topological and geometric
features are preserved when projected into lower dimensions. We have drawn
the 2D Heatmaps with respect to the “x”, “y” and “z” columns of the torus
dataset when depth is 1, 2 and 3 respectively.
Here, we have drawn the 2D Heatmap with respect to “x” column.
grid.arrange(tSNE_depth1_x, UMAP_depth1_x, Sammon_depth1_x, ncol=3)
Here, we have drawn the 2D Heatmap with respect to “y” column.
grid.arrange(tSNE_depth1_y, UMAP_depth1_y, Sammon_depth1_y, ncol=3)
Here, we have drawn the 2D Heatmap with respect to “z” column.
grid.arrange(tSNE_depth1_z, UMAP_depth1_z, Sammon_depth1_z, ncol=3)
Here, we have drawn the 2D Heatmap with respect to “x” column.
grid.arrange(tSNE_depth2_x, UMAP_depth2_x, Sammon_depth2_x, ncol=3)
Here, we have drawn the 2D Heatmap with respect to “y” column.
grid.arrange(tSNE_depth2_y, UMAP_depth2_y, Sammon_depth2_y, ncol=3)
Here, we have drawn the 2D Heatmap with respect to “z” column.
grid.arrange(tSNE_depth2_z, UMAP_depth2_z, Sammon_depth2_z, ncol=3)
Here, we have drawn the 2D Heatmap with respect to “x” column.
grid.arrange(tSNE_depth3_x, UMAP_depth3_x, Sammon_depth3_x, ncol=3)
Here, we have drawn the 2D Heatmap with respect to “y” column.
grid.arrange(tSNE_depth3_y, UMAP_depth3_y, Sammon_depth3_y, ncol=3)
Here, we have drawn the 2D Heatmap with respect to “z” column.
grid.arrange(tSNE_depth3_z, UMAP_depth3_z, Sammon_depth3_z, ncol=3)
For evaluation purposes, we have set n_cells to 100 and depth to 1, so that the trainHVT function can capture more detailed structure, allowing for a comprehensive visual and analytical comparison of how well each technique retains the torus’s geometric properties in the reduced dimensional space.
Performing t-SNE, UMAP and Sammon
We have passed the following model parameters to the
trainHVT
function for dim_reduction_method = “tsne”, “umap” and “sammon”.
Model Parameters
# Apply trainHVT to the simulated data with dim_reduction_method="tsne", depth=1 and n_cells=100
set.seed(123)
hvt_results_tsne <- trainHVT(
  dataset = torus_df1,
  n_cells = 100,
  depth = 1,
  quant.err = 0.1,
  normalize = TRUE,
  distance_metric = "L2_Norm",
  error_metric = "mean",
  quant_method = "kmeans",
  dim_reduction_method = "tsne",
  tsne_perplexity = 30,
  tsne_theta = 0.5,
  tsne_verbose = TRUE,
  tsne_eta = 200,
  tsne_max_iter = 1000
)
Compression summary
displayTable(hvt_results_tsne[[3]][["compression_summary"]])
segmentLevel | noOfCells | noOfCellsBelowQuantizationError | percentOfCellsBelowQuantizationErrorThreshold | parameters |
---|---|---|---|---|
1 | 100 | 76 | 0.76 | n_cells: 100 quant.err: 0.1 distance_metric: L2_Norm error_metric: mean quant_method: kmeans |
# Apply trainHVT to the simulated data with dim_reduction_method="umap", depth=1 and n_cells=100
set.seed(123)
hvt_results_umap <- trainHVT(
  dataset = torus_df1,
  n_cells = 100,
  depth = 1,
  quant.err = 0.1,
  normalize = TRUE,
  distance_metric = "L2_Norm",
  error_metric = "mean",
  quant_method = "kmeans",
  dim_reduction_method = "umap",
  umap_n_neighbors = 23,
  umap_min_dist = 0.2
)
Compression summary
displayTable(hvt_results_umap[[3]][["compression_summary"]])
segmentLevel | noOfCells | noOfCellsBelowQuantizationError | percentOfCellsBelowQuantizationErrorThreshold | parameters |
---|---|---|---|---|
1 | 100 | 76 | 0.76 | n_cells: 100 quant.err: 0.1 distance_metric: L2_Norm error_metric: mean quant_method: kmeans |
# Apply trainHVT to the simulated data with dim_reduction_method="sammon", depth=1 and n_cells=100
set.seed(123)
hvt_results_sammon <- trainHVT(
  dataset = torus_df1,
  n_cells = 100,
  depth = 1,
  quant.err = 0.1,
  normalize = TRUE,
  distance_metric = "L2_Norm",
  error_metric = "mean",
  quant_method = "kmeans",
  dim_reduction_method = "sammon"
)
Compression summary
displayTable(hvt_results_sammon[[3]][["compression_summary"]])
segmentLevel | noOfCells | noOfCellsBelowQuantizationError | percentOfCellsBelowQuantizationErrorThreshold | parameters |
---|---|---|---|---|
1 | 100 | 76 | 0.76 | n_cells: 100 quant.err: 0.1 distance_metric: L2_Norm error_metric: mean quant_method: kmeans |
trainHVT function for Human Centered Metrics on Torus Data with 100 cells and depth 1
Now let’s plot all the features for each cell at level one as a Heatmap for better visualization.
The Heatmaps displayed below provide a visual representation of the spatial characteristics of the torus dataset, allowing us to observe patterns and trends in the distribution of each of the features (x,y,z). The sheer green shades highlight regions with higher values in each of the Heatmaps, while the indigo shades indicate areas with the lowest values in each of the Heatmaps. By analyzing these Heatmaps, we can gain insights into the variations and relationships between each of these features within the torus dataset.
The underlying structure of this data is a torus, a surface shaped like a doughnut. The true shape of the data in its original high-dimensional space must resemble an annulus(two concentric circles) when properly reduced to two or three dimensions.
tsne_x = plotHVT(hvt_results_tsne,
  line.width = c(0.2),
  color.vec = c("navyblue"),
  centroid.color = c("navyblue"),
  plot.type = "2Dheatmap",
  child.level = 1,
  hmap.cols = 'x',
  cell_id = TRUE,
  title = "2D projection with t-SNE as dim_reduction_method, depth=1, hmap.cols='x' and n_cells=100")

umap_x = plotHVT(hvt_results_umap,
  line.width = c(0.2),
  color.vec = c("navyblue"),
  centroid.color = c("navyblue"),
  plot.type = "2Dheatmap",
  child.level = 1,
  hmap.cols = 'x',
  cell_id = TRUE,
  title = "2D projection with UMAP as dim_reduction_method, depth=1, hmap.cols='x' and n_cells=100")

Sammon_x = plotHVT(hvt_results_sammon,
  line.width = c(0.2),
  color.vec = c("navyblue"),
  centroid.color = c("navyblue"),
  plot.type = "2Dheatmap",
  child.level = 1,
  hmap.cols = 'x',
  cell_id = TRUE,
  title = "2D projection with Sammon as dim_reduction_method, depth=1, hmap.cols='x' and n_cells=100")

plot_x = grid.arrange(tsne_x, umap_x, Sammon_x, ncol=3)
tsne_y = plotHVT(hvt_results_tsne,
  line.width = c(0.2),
  color.vec = c("navyblue"),
  centroid.color = c("navyblue"),
  plot.type = "2Dheatmap",
  child.level = 1,
  hmap.cols = 'y',
  cell_id = TRUE,
  title = "2D projection with t-SNE as dim_reduction_method, depth=1, hmap.cols='y' and n_cells=100")

umap_y = plotHVT(hvt_results_umap,
  line.width = c(0.2),
  color.vec = c("navyblue"),
  centroid.color = c("navyblue"),
  plot.type = "2Dheatmap",
  child.level = 1,
  hmap.cols = 'y',
  cell_id = TRUE,
  title = "2D projection with UMAP as dim_reduction_method, depth=1, hmap.cols='y' and n_cells=100")

Sammon_y = plotHVT(hvt_results_sammon,
  line.width = c(0.2),
  color.vec = c("navyblue"),
  centroid.color = c("navyblue"),
  plot.type = "2Dheatmap",
  child.level = 1,
  hmap.cols = 'y',
  cell_id = TRUE,
  title = "2D projection with Sammon as dim_reduction_method, depth=1, hmap.cols='y' and n_cells=100")

plot_y = grid.arrange(tsne_y, umap_y, Sammon_y, ncol = 3)
set.seed(123)
tsne_z = plotHVT(hvt_results_tsne,
  line.width = c(0.2),
  color.vec = c("navyblue"),
  centroid.color = c("navyblue"),
  plot.type = "2Dheatmap",
  child.level = 1,
  hmap.cols = 'z',
  cell_id = TRUE,
  title = "2D projection with t-SNE as dim_reduction_method, depth=1, hmap.cols='z' and n_cells=100")

umap_z = plotHVT(hvt_results_umap,
  line.width = c(0.2),
  color.vec = c("navyblue"),
  centroid.color = c("navyblue"),
  plot.type = "2Dheatmap",
  child.level = 1,
  hmap.cols = 'z',
  cell_id = TRUE,
  title = "2D projection with UMAP as dim_reduction_method, depth=1, hmap.cols='z' and n_cells=100")

Sammon_z = plotHVT(hvt_results_sammon,
  line.width = c(0.2),
  color.vec = c("navyblue"),
  centroid.color = c("navyblue"),
  plot.type = "2Dheatmap",
  child.level = 1,
  hmap.cols = 'z',
  cell_id = TRUE,
  title = "2D projection with Sammon as dim_reduction_method, depth=1, hmap.cols='z' and n_cells=100")

plot_z = grid.arrange(tsne_z, umap_z, Sammon_z, ncol=3)
Human Centered Metric: Likert Scale [1-3]
We presented the 2D Heatmap of column z to five different individuals and informed them of the ground truth, i.e., that the underlying structure of the data forms a torus, a surface resembling a doughnut, and that when appropriately reduced to two or three dimensions the true shape of this high-dimensional data should resemble an annulus (two concentric circles). Afterward, participants were asked to provide their scores from 1 to 3 based on their observations.
1 indicates a poor projection - significant distortion or overlap; unclear data structure.
2 represents an average projection - adequate representation with minor distortions.
3 signifies a good projection - accurate, distinct, and insightful representation.
Participant | tSNE | UMAP | Sammon |
---|---|---|---|
Person 1 | 1 | 1 | 3 |
Person 2 | 2 | 1 | 3 |
Person 3 | 1 | 2 | 3 |
Person 4 | 2 | 1 | 3 |
Person 5 | 1 | 2 | 3 |
Average | 1.4 | 1.4 | 3 |
Likert Scale Responses
Evaluation metrics for dimensionality reduction methods like t-SNE, Sammon’s mapping, and UMAP are crucial for assessing how well these techniques preserve the structure of the original high-dimensional data when reduced to lower dimensions. Below a table has been displayed to compare the performance of three dimensionality reduction techniques—t-SNE, UMAP, and Sammon’s mapping—across various evaluation metrics, categorized into Structure Preservation Metrics, Distance Preservation Metrics, Interpretive Quality Metrics, Computational Efficiency Metrics and Human Centered Metrics.
tsne_score   = "1.4"
umap_score   = "1.4"
sammon_score = "3"

metrics_table_sammon = c(round(hvt_results_sammon$model_info$distance_measures$Value, 4))
metrics_table_umap   = c(round(hvt_results_umap$model_info$distance_measures$Value, 4))
metrics_table_tsne   = c(round(hvt_results_tsne$model_info$distance_measures$Value, 4))

metrics_table <- data.frame(
  L1_Metrics = c(hvt_results_sammon$model_info$distance_measures$L1_Metrics[1:4], "Human Centered Metrics", "Human Centered Metrics", hvt_results_sammon$model_info$distance_measures$L1_Metrics[5:7]),
  L2_Metrics = c(hvt_results_sammon$model_info$distance_measures$L2_Metrics[1:4], "Likert Scale [1-3]", "Spatial Orientation", hvt_results_sammon$model_info$distance_measures$L2_Metrics[5:7]),
  tSNE = c(metrics_table_tsne[1], metrics_table_tsne[2], metrics_table_tsne[3], metrics_table_tsne[4], tsne_score, "NA", metrics_table_tsne[5], metrics_table_tsne[6], metrics_table_tsne[7]),
  UMAP = c(metrics_table_umap[1], metrics_table_umap[2], metrics_table_umap[3], metrics_table_umap[4], umap_score, "NA", metrics_table_umap[5], metrics_table_umap[6], metrics_table_umap[7]),
  Sammon = c(metrics_table_sammon[1], metrics_table_sammon[2], metrics_table_sammon[3], metrics_table_sammon[4], sammon_score, "NA", metrics_table_sammon[5], metrics_table_sammon[6], metrics_table_sammon[7])
)
displayTable(data = metrics_table)
L1_Metrics | L2_Metrics | tSNE | UMAP | Sammon |
---|---|---|---|---|
Structure Preservation Metrics | Trustworthiness | 0.9823 | 0.923 | 0.8535 |
 | Continuity | 0.9557 | 0.9593 | 0.9736 |
 | Sammon’s Stress | 82.5773 | 19.1181 | 12.3546 |
Distance Preservation Metrics | RMSE | 52.1035 | 26.0502 | 21.6824 |
Human Centered Metrics | Likert Scale [1-3] | 1.4 | 1.4 | 3 |
 | Spatial Orientation | NA | NA | NA |
Interpretive Quality Metrics | Silhouette Score | 0.3631 | 0.366 | 0.3774 |
 | KNN Retention Score | 0.7312 | 0.5975 | 0.5062 |
Computational Efficiency Metrics | Execution Duration(sec) | 0.0761 | 0.2034 | 0.0033 |
The table shows a comparison of different evaluation metrics for t-SNE, UMAP, and Sammon on torus data with 100 cells and a depth of 1. For details on the evaluation methods listed above, see More Details.
Note: The Spatial Orientation metric is marked as NA for all the methods (t-SNE, UMAP, Sammon). In the HVT (Hierarchical Voronoi Tessellation) process, data compression is performed as the first step, where the centroids of clusters are calculated and utilized. This compression effectively reduces the data to a smaller set of representative points, which are then subjected to dimensionality reduction methods. Due to this prior compression step, spatial orientation becomes less relevant. The original spatial relationships between individual data points are inherently altered during the compression process, meaning that the preservation of spatial orientation is no longer a critical or meaningful metric for evaluating the effectiveness of the dimensionality reduction techniques in this context.
trainHVT function
t-SNE:
t-SNE demonstrates exceptional performance in structure preservation, with the highest trustworthiness score. However, it suffers in distance preservation, reflected by the highest RMSE and Sammon’s stress values. It is computationally efficient, although not as fast as Sammon’s mapping, and its interpretive quality is mixed: it has the highest KNN retention score but the lowest silhouette score. t-SNE received relatively poor ratings (1.4) on the Likert scale, likely due to distortion of the global data structure that negatively impacts user perception.
UMAP:
UMAP strikes a balance between structure and distance preservation, offering competitive trustworthiness and continuity scores, alongside a significantly lower RMSE and Sammon’s stress compared to t-SNE. Though less computationally efficient and slightly lower in KNN retention, UMAP provides a favorable trade-off in projection quality, making it a versatile choice. UMAP has relatively poor ratings (1.4) on the Likert scale, likely due to issues with interpretability or potential distortion of data structure that negatively impacts user perception.
Sammon:
Sammon’s mapping excels in distance preservation, achieving the lowest RMSE and Sammon’s stress. It also maintains strong structure preservation, particularly in continuity, and it is the most efficient of the three in terms of execution duration. Its interpretive quality is mixed: it has the highest silhouette score but the lowest KNN retention. Sammon’s mapping therefore offers a unique advantage in scenarios prioritizing distance preservation and computational efficiency, and it received the highest Likert scale score (3) because its accurate distance representation makes it the most favorable for human visualization.
The integration of t-SNE, UMAP, and performance metrics into the
trainHVT
function enhances its ability to process, analyze,
and visualize high-dimensional data. By incorporating various
dimensionality reduction techniques and performance metrics—such as
Trustworthiness, Continuity, RMSE, Silhouette Score, KNN Retention
Score, Sammon’s Stress, Execution Duration, and Likert Scale
[1-3]—trainHVT now provides more flexibility for evaluating the quality
of dimensionality reduction and clustering, meeting diverse data
analysis requirements.
Link for the paper on t-SNE
Link for the paper on UMAP
Link for the paper on Sammon’s Mapping