Using stackview.clusterplot we can visualize the contents of a pandas DataFrame and the corresponding segmented objects in an image side-by-side. In such a plot you can select objects and visualize the selection. This can be useful for exploring feature-extraction parameter spaces.
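The link between a table row and an object comes from the label image: each row describes one labeled object, so a per-row selection can be turned into a pixel mask. The following is a minimal NumPy sketch of that idea, not stackview's own implementation. It assumes the table has a "label" column and uses a tiny synthetic label image and a hypothetical helper, selection_to_mask.

```python
import numpy as np
import pandas as pd


def selection_to_mask(labels: np.ndarray, df: pd.DataFrame) -> np.ndarray:
    """Hypothetical helper: boolean mask of all pixels whose object
    is marked as selected in df["selection"]."""
    # Label values of the rows whose selection flag is set (0/1 or bool)
    selected_labels = df.loc[df["selection"].astype(bool), "label"].to_numpy()
    # True wherever a pixel belongs to one of the selected objects
    return np.isin(labels, selected_labels)


# Tiny synthetic label image with three objects (0 = background)
labels = np.array([
    [1, 1, 0, 2],
    [1, 0, 0, 2],
    [0, 3, 3, 0],
])
df = pd.DataFrame({"label": [1, 2, 3], "selection": [True, False, True]})

mask = selection_to_mask(labels, df)
print(mask.astype(int))
# [[1 1 0 0]
#  [1 0 0 0]
#  [0 1 1 0]]
```

The notebook below does not request a "label" column from regionprops_table. Because skimage.measure.label produces consecutive labels and regionprops_table lists objects in ascending label order, row i there should correspond to label i + 1.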
import pandas as pd
import numpy as np
import stackview
from skimage.measure import regionprops_table
from skimage.io import imread
from skimage.filters import threshold_otsu
from skimage.measure import label
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from umap import UMAP
stackview.__version__
'0.12.0'
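To demonstrate this, we need an image, a segmentation and a table of extracted features. Below, we segment blobs.tif by Otsu thresholding and connected-component labeling, measure per-object features with regionprops_table, and add a two-dimensional UMAP embedding of the standardized features to the table.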
image = imread('data/blobs.tif')
# segment image
thresh = threshold_otsu(image)
binary_image = image > thresh
labeled_image = label(binary_image)
properties = regionprops_table(labeled_image, intensity_image=image, properties=[
'mean_intensity', 'std_intensity',
'centroid', 'area', 'feret_diameter_max',
'minor_axis_length', 'major_axis_length'])
df = pd.DataFrame(properties)
# Select numeric columns
numeric_cols = df.select_dtypes(include=[np.number]).columns
# Scale the data
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df[numeric_cols])
# Create UMAP embedding
umap = UMAP(n_components=2, random_state=42)
umap_coords = umap.fit_transform(scaled_data)
# Add UMAP coordinates to dataframe
df['UMAP1'] = umap_coords[:, 0]
df['UMAP2'] = umap_coords[:, 1]
df.head()
C:\Users\rober\miniforge3\envs\bob-env\Lib\site-packages\umap\umap_.py:1952: UserWarning: n_jobs value 1 overridden to 1 by setting random_state. Use no seed for parallelism. warn(
| | mean_intensity | std_intensity | centroid-0 | centroid-1 | area | feret_diameter_max | minor_axis_length | major_axis_length | UMAP1 | UMAP2 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 190.854503 | 30.269911 | 13.212471 | 19.986143 | 433.0 | 36.055513 | 16.819060 | 34.957399 | 4.446589 | 0.901159 |
| 1 | 179.286486 | 21.824090 | 4.270270 | 62.945946 | 185.0 | 21.377558 | 11.803854 | 21.061417 | 2.342915 | -0.930705 |
| 2 | 205.617021 | 29.358477 | 12.568389 | 108.329787 | 658.0 | 32.449961 | 28.278264 | 30.212552 | 4.911047 | 0.156550 |
| 3 | 217.327189 | 36.019565 | 9.806452 | 154.520737 | 434.0 | 26.925824 | 23.064079 | 24.535398 | 4.941196 | -0.982479 |
| 4 | 212.142558 | 29.872907 | 13.545073 | 246.809224 | 477.0 | 31.384710 | 19.833058 | 31.162612 | 5.321925 | -1.058476 |
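StandardScaler rescales every feature column to zero mean and unit variance before UMAP sees it, so that features with large values, such as area, do not dominate the distances between objects. A minimal NumPy sketch of that transformation, assuming StandardScaler's default settings (population standard deviation, ddof=0, and constant columns left unscaled); the helper name standardize is hypothetical:

```python
import numpy as np


def standardize(X: np.ndarray) -> np.ndarray:
    """Hypothetical helper: z-score each column of X (rows = objects)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)      # ddof=0, as StandardScaler does by default
    std[std == 0] = 1.0      # constant columns are centred but not scaled
    return (X - mean) / std


# Two features on very different scales (intensity and area values
# of the first three objects in the table above)
X = np.array([[190.854503, 433.0],
              [179.286486, 185.0],
              [205.617021, 658.0]])
Z = standardize(X)
print(Z.mean(axis=0).round(6))  # each column now has mean 0
print(Z.std(axis=0).round(6))   # ... and standard deviation 1
```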
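num_objects = df.shape[0]
# Initial selection: mark the first half of the objects (1 = selected, 0 = not selected)
pre_selection = np.zeros(num_objects)
pre_selection[:num_objects // 2] = 1
# Store it as a new column; clusterplot updates this column when the selection changes
df["selection"] = pre_selection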
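The pre-selection above simply marks the first half of the rows. Since a selection is just one flag per row, it could equally be derived from a feature. The sketch below is an illustrative alternative, not part of the original workflow: it uses the area values of the first five objects from the table above and an arbitrary threshold (the median area) to mark the larger objects. It writes a boolean column, the same dtype that clusterplot writes back, as the output at the end of this section shows.

```python
import pandas as pd

# Area values of the first five objects from the table above
df = pd.DataFrame({"area": [433.0, 185.0, 658.0, 434.0, 477.0]})

# Mark objects larger than the median area as selected (boolean column)
df["selection"] = df["area"] > df["area"].median()
print(df)
```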
stackview.clusterplot draws the image and a scatter plot of two table columns side-by-side and links them interactively: when you select data points in the scatter plot on the right, the visualization of the labeled objects on the left is updated accordingly.
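Conceptually, selecting points in the scatter plot amounts to testing each row's (column_x, column_y) coordinates against the drawn region and storing the result per row. Here is a minimal pandas sketch with a rectangular region, using a hypothetical helper select_in_rectangle; stackview's actual selection tool and its internals may differ.

```python
import pandas as pd


def select_in_rectangle(df, column_x, column_y, x_range, y_range):
    """Hypothetical helper: True for rows whose point lies inside the
    rectangle x_range x y_range (bounds inclusive)."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    return (df[column_x].between(x_min, x_max)
            & df[column_y].between(y_min, y_max))


# Centroids of the first five objects from the table above
df = pd.DataFrame({
    "centroid-0": [13.212471, 4.270270, 12.568389, 9.806452, 13.545073],
    "centroid-1": [19.986143, 62.945946, 108.329787, 154.520737, 246.809224],
})
df["selection"] = select_in_rectangle(df, "centroid-0", "centroid-1",
                                      x_range=(0, 15), y_range=(0, 120))
print(df["selection"].tolist())  # [True, True, True, False, False]
```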
stackview.clusterplot(image=image,
labels=labeled_image,
df=df,
column_x="centroid-0",
column_y="centroid-1",
zoom_factor=1.5,
markersize=15)
Every time the user selects different data points, the "selection" column of our DataFrame is updated:
df["selection"]
0     False
1      True
2     False
3     False
4     False
      ...  
59     True
60     True
61     True
62     True
63     True
Name: selection, Length: 64, dtype: bool
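Because the selection lives in an ordinary DataFrame column, it can be used directly for further analysis, for example to compare the average features of the selected objects with those of the rest. A short pandas sketch on an excerpt of the table above; the selection flags in it are made up for illustration.

```python
import pandas as pd

# Excerpt of the feature table with an example selection
df = pd.DataFrame({
    "area": [433.0, 185.0, 658.0, 434.0, 477.0],
    "mean_intensity": [190.854503, 179.286486, 205.617021, 217.327189, 212.142558],
    "selection": [False, True, False, False, True],
})

# Rows of the currently selected objects only
selected = df[df["selection"]]

# Mean of each feature for unselected (False) vs. selected (True) objects
summary = df.groupby("selection")[["area", "mean_intensity"]].mean()
print(summary)
```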