Exploring slices of stacks by plotting corresponding data

In this notebook we explore a stack of images using an interactive scatterplot of measurements taken on the individual slices of the stack. For demonstration purposes, we use a stack of teaching slides and an embedding generated using large language models.

In [1]:

import pandas as pd
import stackview
import os
import numpy as np
from skimage.io import imread
import yaml
import requests
import zipfile

First, we download the dataset, which is licensed CC-BY 4.0 by Robert Haase.

In [2]:

# URL of the zip file
url = "https://zenodo.org/records/14030307/files/png_umap.zip?download=1"
# Save the zip file locally
zip_path = "png_umap.zip"

if not os.path.exists(zip_path):
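    # Download the zip file and stop on HTTP errors,
    # so that an error page is never saved as the archive
    response = requests.get(url)
    response.raise_for_status()
    with open(zip_path, "wb") as f:
        f.write(response.content)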
    
    # Extract the contents
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall()

# Optionally, remove the zip file after extraction
#os.remove(zip_path)

In [3]:

# Read YAML file
with open('png_umap/png_umap.yml', 'r') as file:
    loaded_dict = yaml.safe_load(file)


df = pd.DataFrame(loaded_dict)

# Show first few rows of the loaded DataFrame
df.head()

Out[3]:

	UMAP0	UMAP1	filename	page_index	png_filename	text	url
0	2.785299	5.125338	12623730_14_Summary.pdf	0	12623730_14_Summary_0.png	Robert Haase\n@haesleinhuepf\nBIDS Lecture 14/...	https://zenodo.org/api/records/12623730/files/...
1	1.759109	5.196022	12623730_14_Summary.pdf	1	12623730_14_Summary_1.png	Robert Haase\n@haesleinhuepf\nBIDS Lecture 14/...	https://zenodo.org/api/records/12623730/files/...
2	1.605859	6.084491	12623730_14_Summary.pdf	2	12623730_14_Summary_2.png	Robert Haase\n@haesleinhuepf\nBIDS Lecture 14/...	https://zenodo.org/api/records/12623730/files/...
3	1.581907	6.084695	12623730_14_Summary.pdf	3	12623730_14_Summary_3.png	Robert Haase\n@haesleinhuepf\nBIDS Lecture 14/...	https://zenodo.org/api/records/12623730/files/...
4	2.163119	7.161102	12623730_14_Summary.pdf	4	12623730_14_Summary_4.png	Robert Haase\n@haesleinhuepf\nBIDS Lecture 14/...	https://zenodo.org/api/records/12623730/files/...

We also define a helper function that loads all images listed in a DataFrame into one big NumPy array.

In [4]:

def get_images(selected_rows):
    # Load images for selected pages
    images = []
    for _, row in selected_rows.iterrows():
        img_path = os.path.join('png_umap', row['png_filename'])
        img = imread(img_path)
        images.append(img)

    if len(images) == 0:
        # Nothing selected: return a small placeholder stack
        return np.zeros((2,2,2))
    else:
        # Stack the images along a new first axis
        return np.asarray(images)
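
As a minimal sketch of what this helper returns, the following cell stacks a few synthetic arrays in place of the PNG files. The image size and count are hypothetical, and the sketch assumes that all images share the same shape, which np.asarray needs in order to build a regular array.

```python
import numpy as np

# Three synthetic RGB "slides" of identical size stand in for the loaded PNG files
# (hypothetical shapes, not those of the real dataset).
slides = [np.full((4, 6, 3), fill_value=i, dtype=np.uint8) for i in range(3)]

# np.asarray stacks equally shaped images along a new first axis,
# giving an array of shape (n_images, height, width, channels).
stack = np.asarray(slides)
print(stack.shape)
```

The first axis of the result indexes the slides, so row i of the DataFrame corresponds to stack[i].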

We can use the sliceplot function of stackview to visualize the embedding next to selected slides.

In [5]:

stackview.sliceplot(df, get_images(df), column_x="UMAP0", column_y="UMAP1", zoom_factor=1.5, zoom_spline_order=2)

Out[5]:

HBox(children=(HBox(children=(VBox(children=(VBox(children=(HBox(children=(VBox(children=(ImageWidget(height=4…

Exercise

Explore the plot by drawing lines around islands of data points with the mouse. What content do these islands cover?
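
An island can also be inspected programmatically. The cell below is a sketch under stated assumptions: the small table demo is hypothetical and only reuses the column names UMAP0, UMAP1 and text from the DataFrame above, and the rectangle bounds are picked by hand rather than taken from the real embedding.

```python
import pandas as pd

# Hypothetical miniature table with the same column names as the real DataFrame.
demo = pd.DataFrame({
    "UMAP0": [2.8, 1.8, 9.5, 9.7],
    "UMAP1": [5.1, 5.2, 0.4, 0.6],
    "text": ["Summary slide", "Summary slide 2", "Intro to Python", "Python basics"],
})

# Keep only the points inside a hand-picked rectangle of the embedding.
in_x = demo["UMAP0"].between(9, 10)
in_y = demo["UMAP1"].between(0, 1)
island = demo[in_x & in_y]

# Reading the slide text of the selected rows hints at what the island is about.
print(island["text"].tolist())
```

The same filter applied to df, with bounds read off the scatterplot, would give the rows that can be passed to get_images.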

In [ ]: