In this notebook we explore a stack of images using an interactive scatterplot of measurements taken on the individual slices of the stack. For demonstration purposes, we explore a stack of teaching slides together with an embedding of them that was generated using large language models.
import pandas as pd
import stackview
import os
import numpy as np
from skimage.io import imread
import yaml
import requests
import zipfile
First, we download the dataset, which is licensed CC-BY 4.0 by Robert Haase.
# URL of the zip file
url = "https://zenodo.org/records/14030307/files/png_umap.zip?download=1"
# Save the zip file locally
zip_path = "png_umap.zip"
if not os.path.exists(zip_path):
    # Download the zip file and stop early if the request failed
    response = requests.get(url)
    response.raise_for_status()
    with open(zip_path, "wb") as f:
        f.write(response.content)
    # Extract the contents
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall()
    # Optionally, remove the zip file after extraction
    #os.remove(zip_path)
# Read YAML file
with open('png_umap/png_umap.yml', 'r') as file:
    loaded_dict = yaml.safe_load(file)
df = pd.DataFrame(loaded_dict)
# Show first few rows of the loaded DataFrame
df.head()
|   | UMAP0 | UMAP1 | filename | page_index | png_filename | text | url |
|---|---|---|---|---|---|---|---|
| 0 | 2.785299 | 5.125338 | 12623730_14_Summary.pdf | 0 | 12623730_14_Summary_0.png | Robert Haase\n@haesleinhuepf\nBIDS Lecture 14/... | https://zenodo.org/api/records/12623730/files/... |
| 1 | 1.759109 | 5.196022 | 12623730_14_Summary.pdf | 1 | 12623730_14_Summary_1.png | Robert Haase\n@haesleinhuepf\nBIDS Lecture 14/... | https://zenodo.org/api/records/12623730/files/... |
| 2 | 1.605859 | 6.084491 | 12623730_14_Summary.pdf | 2 | 12623730_14_Summary_2.png | Robert Haase\n@haesleinhuepf\nBIDS Lecture 14/... | https://zenodo.org/api/records/12623730/files/... |
| 3 | 1.581907 | 6.084695 | 12623730_14_Summary.pdf | 3 | 12623730_14_Summary_3.png | Robert Haase\n@haesleinhuepf\nBIDS Lecture 14/... | https://zenodo.org/api/records/12623730/files/... |
| 4 | 2.163119 | 7.161102 | 12623730_14_Summary.pdf | 4 | 12623730_14_Summary_4.png | Robert Haase\n@haesleinhuepf\nBIDS Lecture 14/... | https://zenodo.org/api/records/12623730/files/... |
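To get a feel for how the table is organized, it can help to count how many slides each PDF contributes. The following is a minimal sketch on a small stand-in table: `toy_df` and its values are invented for illustration and only mimic the `filename` and `page_index` columns shown above. With the real data, `df` would take the place of `toy_df`.

```python
import pandas as pd

# Hypothetical stand-in for the loaded DataFrame (values are invented)
toy_df = pd.DataFrame({
    "filename": ["a.pdf", "a.pdf", "a.pdf", "b.pdf", "b.pdf"],
    "page_index": [0, 1, 2, 0, 1],
})

# Count the number of slides (rows) per source PDF, largest first
slides_per_pdf = toy_df.groupby("filename").size().sort_values(ascending=False)
print(slides_per_pdf)
```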
We also define a helper function that loads all images mentioned in a dataframe into one big numpy array.
def get_images(selected_rows):
    # Load images for selected pages
    images = []
    for _, row in selected_rows.iterrows():
        img_path = os.path.join('png_umap', row['png_filename'])
        img = imread(img_path)
        images.append(img)
    if len(images) == 0:
        return np.zeros((2,2,2))
    else:
        return np.asarray(images)
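Combining the images into one array only gives a regular stack if all images have the same shape. Depending on the NumPy version, differently shaped images either raise an error or produce an object array. The sketch below shows one possible way to make that assumption explicit. `stack_same_shape` is a hypothetical helper that is not part of stackview or this notebook, and the slides are synthetic arrays rather than the downloaded PNG files.

```python
import numpy as np

def stack_same_shape(images):
    """Stack a non-empty list of equally shaped images (hypothetical helper)."""
    if not images:
        raise ValueError("No images to stack")
    shapes = {img.shape for img in images}
    if len(shapes) > 1:
        raise ValueError(f"Images have differing shapes: {sorted(shapes)}")
    return np.stack(images)

# Three synthetic RGB "slides" of 4 x 6 pixels
slides = [np.zeros((4, 6, 3), dtype=np.uint8) for _ in range(3)]
stack = stack_same_shape(slides)
print(stack.shape)  # (3, 4, 6, 3)
```

The first axis of the result indexes the slides, which is the layout a slice viewer steps through.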
We can use the sliceplot function of stackview to visualize the embedding next to the selected slides.
stackview.sliceplot(df, get_images(df), column_x="UMAP0", column_y="UMAP1", zoom_factor=1.5, zoom_spline_order=2)
Explore the plot by dragging lines around islands of datapoints with the mouse. What content are these islands about?
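An island can also be inspected without the mouse by selecting all rows inside a rectangle in UMAP space and reading their text. This is a minimal sketch under stated assumptions: `toy_df`, its coordinates and texts are invented, and `select_box` is a hypothetical helper, not a stackview function.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame (coordinates and texts are invented)
toy_df = pd.DataFrame({
    "UMAP0": [1.0, 1.2, 5.0, 5.1, 9.0],
    "UMAP1": [2.0, 2.1, 7.0, 7.2, 0.5],
    "text": ["intro", "intro 2", "segmentation", "segmentation 2", "summary"],
})

def select_box(table, x_min, x_max, y_min, y_max):
    """Return rows whose UMAP coordinates fall inside the given rectangle."""
    inside = (
        table["UMAP0"].between(x_min, x_max)
        & table["UMAP1"].between(y_min, y_max)
    )
    return table[inside]

# Pick the island around (5, 7) and list the text of its slides
island = select_box(toy_df, 4.5, 5.5, 6.5, 7.5)
print(island["text"].tolist())  # ['segmentation', 'segmentation 2']
```

With the real data, passing such a selection to `get_images` would load the matching slides for closer inspection.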