Cluster plots

Using stackview.clusterplot we can visualize the contents of a pandas DataFrame and the corresponding segmented objects side-by-side. In such a plot you can select objects and visualize the selection. This can be useful for exploring the parameter space of extracted features.

In [1]:

import pandas as pd
import numpy as np
import stackview
from skimage.measure import regionprops_table
from skimage.io import imread
from skimage.filters import threshold_otsu
from skimage.measure import label
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler 
from umap import UMAP

stackview.__version__

Out[1]:

'0.12.0'

To demonstrate this, we need an image, a segmentation and a table of extracted features.

In [2]:

image = imread('data/blobs.tif')

# segment image
thresh = threshold_otsu(image)
binary_image = image > thresh
labeled_image = label(binary_image)

In [4]:

properties = regionprops_table(labeled_image, intensity_image=image, properties=[
    'mean_intensity', 'std_intensity',
    'centroid', 'area', 'feret_diameter_max', 
    'minor_axis_length', 'major_axis_length'])

df = pd.DataFrame(properties)

# Select numeric columns
numeric_cols = df.select_dtypes(include=[np.number]).columns

# Scale the data
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df[numeric_cols])

# Create UMAP embedding
umap = UMAP(n_components=2, random_state=42) 
umap_coords = umap.fit_transform(scaled_data)

# Add UMAP coordinates to dataframe 
df['UMAP1'] = umap_coords[:, 0]
df['UMAP2'] = umap_coords[:, 1]

df.head()

Out[4]:

	mean_intensity	std_intensity	centroid-0	centroid-1	area	feret_diameter_max	minor_axis_length	major_axis_length	UMAP1	UMAP2
0	190.854503	30.269911	13.212471	19.986143	433.0	36.055513	16.819060	34.957399	4.446589	0.901159
1	179.286486	21.824090	4.270270	62.945946	185.0	21.377558	11.803854	21.061417	2.342915	-0.930705
2	205.617021	29.358477	12.568389	108.329787	658.0	32.449961	28.278264	30.212552	4.911047	0.156550
3	217.327189	36.019565	9.806452	154.520737	434.0	26.925824	23.064079	24.535398	4.941196	-0.982479
4	212.142558	29.872907	13.545073	246.809224	477.0	31.384710	19.833058	31.162612	5.321925	-1.058476
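The scaling step above standardizes each feature column to zero mean and unit variance, so that features with large numeric ranges, such as area, do not dominate the embedding. As a rough sketch of what that computation amounts to, using NumPy only and made-up values (the array below is illustrative and not taken from the blobs image):

```python
import numpy as np

# Hypothetical feature table: rows are objects, columns are features
# (e.g. area and mean intensity). The values are made up for illustration.
features = np.array([[433.0, 190.9],
                     [185.0, 179.3],
                     [658.0, 205.6],
                     [434.0, 217.3]])

# Standardize column-wise: subtract the mean and divide by the
# population standard deviation (ddof=0), as StandardScaler does by default.
mean = features.mean(axis=0)
std = features.std(axis=0)
scaled = (features - mean) / std
```

After this step every column has mean 0 and standard deviation 1, which puts all features on a comparable scale before UMAP computes distances between objects.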

In [5]:

# Pre-select the first half of the objects
num_objects = df.shape[0]
pre_selection = np.zeros(num_objects)
pre_selection[:num_objects // 2] = 1

# Store the pre-selection in the dataframe
df["selection"] = pre_selection
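The pre-selection does not have to be positional. As a sketch, the same column could be filled from a feature threshold instead. The small table and the median cutoff here are purely illustrative assumptions, and the sketch assumes that the plot reads its initial selection from the "selection" column, as this walkthrough suggests:

```python
import pandas as pd

# Hypothetical feature table standing in for the regionprops output above.
df_demo = pd.DataFrame({"area": [433.0, 185.0, 658.0, 434.0, 477.0]})

# Mark all objects whose area exceeds the median area as selected.
df_demo["selection"] = df_demo["area"] > df_demo["area"].median()
```

This yields a boolean column, the same dtype the selection column has after interacting with the plot.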

Interaction

We can also draw the image and the scatter plot side-by-side and make them interact. When you select data points in the plot on the right, the visualization on the left is updated accordingly.

In [9]:

stackview.clusterplot(image=image,
                     labels=labeled_image,
                     df=df,
                     column_x="centroid-0",
                     column_y="centroid-1",
                     zoom_factor=1.5,
                     markersize=15)

Out[9]:

VBox(children=(HBox(children=(HBox(children=(VBox(children=(VBox(children=(HBox(children=(VBox(children=(Image…

Every time the user selects different data points, the selection column in our dataframe is updated.

In [10]:

df["selection"]

Out[10]:

0     False
1      True
2     False
3     False
4     False
      ...  
59     True
60     True
61     True
62     True
63     True
Name: selection, Length: 64, dtype: bool
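Because the selection lives in an ordinary DataFrame column, it can be used like any other boolean mask. The following sketch uses a tiny hand-made label image and table rather than the blobs data. It assumes that row i of the table describes the object with label i + 1, which should hold for sequentially labeled images passed to regionprops_table but is worth checking (or avoiding by adding 'label' to the requested properties):

```python
import numpy as np
import pandas as pd

# Hypothetical 4x4 label image with three objects (0 is background).
labels_demo = np.array([[1, 1, 0, 2],
                        [1, 0, 0, 2],
                        [0, 0, 3, 3],
                        [0, 0, 3, 3]])

# Hypothetical feature table with a selection column, one row per object.
df_demo = pd.DataFrame({"area": [3, 2, 4],
                        "selection": [True, False, True]})

# Summarize the features of the selected objects only.
selected = df_demo[df_demo["selection"].astype(bool)]
mean_selected_area = selected["area"].mean()

# Assumption: row i corresponds to label i + 1.
selected_labels = selected.index.to_numpy() + 1

# Binary mask of all pixels that belong to selected objects.
selection_mask = np.isin(labels_demo, selected_labels)
```

The resulting mask has the same shape as the label image and could, for example, be overlaid on the original image or used to restrict further measurements to the selected objects.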
