Cellpose 2.0 now allows you to train your own models in the GUI!
This notebook lets you load that custom model and run it on your images with a GPU.
In this notebook, you can also train a custom model using your labels, either `_seg.npy` files from the Cellpose GUI or other labels saved as `_masks.tif` files. If you already have a trained model, skip this part of the notebook.
For more details on Cellpose 2.0, check out the paper or the talk.
Mount your Google Drive to access all your image files, segmentations, and custom models. This also ensures that any models you train are saved to your Google Drive. If you'd like to try out the notebook without your own files, please download the sample images from TissueNet (optional step in Setup below).
This notebook was inspired by the Zero-Cost Deep-Learning to Enhance Microscopy project (https://github.com/HenriquesLab/DeepLearning_Collab/wiki). Jointly developed by the Jacquemet (link to https://cellmig.org/) and Henriques (https://henriqueslab.github.io/) laboratories. Please check out their great work!
We will first install cellpose 2.0, check that the GPU is working, and mount Google Drive to get your models and images.
Install cellpose -- by default the torch GPU version is installed in Colab notebooks.
!pip install "opencv-python-headless<4.3"
!pip install cellpose
Successfully installed cellpose-2.0.4 fastremap-1.12.2
Check CUDA version and that GPU is working in cellpose and import other libraries.
!nvcc --version
!nvidia-smi
import os, shutil
import numpy as np
import matplotlib.pyplot as plt
from cellpose import core, utils, io, models, metrics
from glob import glob
use_GPU = core.use_gpu()
yn = ['NO', 'YES']
print(f'>>> GPU activated? {yn[use_GPU]}')
nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2020 NVIDIA Corporation Built on Mon_Oct_12_20:09:46_PDT_2020 Cuda compilation tools, release 11.1, V11.1.105 Build cuda_11.1.TC455_06.29190527_0 Sat Apr 30 23:46:48 2022 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | | | | MIG M. | |===============================+======================+======================| | 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 | | N/A 42C P8 9W / 70W | 0MiB / 15109MiB | 0% Default | | | | N/A | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=============================================================================| | No running processes found | +-----------------------------------------------------------------------------+ >>> GPU activated? YES
Please mount your Google Drive and find your working folder with (if available) the model that you trained. If you want to train a model, create a folder in Google Drive with the images and the labels, either `_seg.npy` files from the Cellpose GUI or `_masks.tif` files.
#@markdown ###Run this cell to connect your Google Drive to Colab
#@markdown * Click on the URL.
#@markdown * Sign in to your Google Account.
#@markdown * Copy the authorization code.
#@markdown * Enter the authorization code.
#@markdown * Click on "Files" site on the right. Refresh the site. Your Google Drive folder should now be available here as "drive".
#mounts user's Google Drive to Google Colab.
from google.colab import drive
drive.mount('/content/gdrive')
If you don't mount your Google Drive and want to test cellpose 2.0, run the next code block to download the example data. This `human_in_the_loop` folder has a `train` folder with training images and manual segmentations (in this case created in the loop), and a `test` folder with test images and manual segmentations created from scratch.
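A quick way to sanity-check this layout before training is to confirm that every image has a matching label file. This is a minimal sketch; `find_unlabeled` is an illustrative helper, not part of cellpose:

```python
import os
from glob import glob

def find_unlabeled(folder, label_suffix='_seg.npy'):
    """Return the image files in `folder` that have no matching label file."""
    images = sorted(glob(os.path.join(folder, '*.tif')))
    missing = []
    for img in images:
        label = os.path.splitext(img)[0] + label_suffix
        if not os.path.exists(label):
            missing.append(img)
    return missing
```

Run it on both `human_in_the_loop/train` and `human_in_the_loop/test`; an empty list means every image is labeled.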
These images are from the breast vectra class of the TissueNet dataset. The full TissueNet dataset is available here, and for a description of the data see the paper.
We will convert the `_seg.npy` files to `_masks.tif` files.
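As a sketch of that conversion (`seg_to_masks` and `save_masks_tif` are illustrative helpers, assuming the standard `_seg.npy` layout with a `'masks'` key):

```python
import numpy as np

def seg_to_masks(seg_path):
    """Load a cellpose _seg.npy file and return its integer label masks."""
    seg = np.load(seg_path, allow_pickle=True).item()
    return seg['masks'].squeeze()

def save_masks_tif(seg_path):
    """Write the masks from a _seg.npy file to a _masks.tif alongside it."""
    import tifffile  # installed as a cellpose dependency
    masks = seg_to_masks(seg_path)
    out_path = seg_path.replace('_seg.npy', '_masks.tif')
    tifffile.imwrite(out_path, masks.astype(np.uint16))
    return out_path
```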
# !rm -rf human_in_the_loop/
from natsort import natsorted
url = 'https://drive.google.com/uc?id=1HXpLczf7TPCdI1yZY5KV3EkdWzRrgvhQ'
utils.download_url_to_file(url, 'human_in_the_loop.zip')
!unzip human_in_the_loop.zip
train_files = natsorted(glob('human_in_the_loop/train/*.tif'))
train_seg = natsorted(glob('human_in_the_loop/train/*.npy'))
test_files = natsorted(glob('human_in_the_loop/test/*.tif'))
test_seg = natsorted(glob('human_in_the_loop/test/*.npy'))
100%|██████████| 15.5M/15.5M [00:00<00:00, 77.4MB/s]
Archive: human_in_the_loop.zip creating: human_in_the_loop/test/ inflating: human_in_the_loop/test/breast_vectra_0.tif inflating: human_in_the_loop/test/breast_vectra_0_seg.npy inflating: human_in_the_loop/test/img_1213.tif inflating: human_in_the_loop/test/img_1213_seg.npy inflating: human_in_the_loop/test/img_1220.tif inflating: human_in_the_loop/test/img_1220_seg.npy creating: human_in_the_loop/train/ inflating: human_in_the_loop/train/img_0.tif inflating: human_in_the_loop/train/img_0_seg.npy inflating: human_in_the_loop/train/img_1.tif inflating: human_in_the_loop/train/img_1_seg.npy inflating: human_in_the_loop/train/img_3.tif inflating: human_in_the_loop/train/img_3_seg.npy inflating: human_in_the_loop/train/img_4.tif inflating: human_in_the_loop/train/img_4_seg.npy inflating: human_in_the_loop/train/img_5.tif inflating: human_in_the_loop/train/img_5_seg.npy
Here is what the training images look like, along with their labels:
plt.figure(figsize=(12,4), dpi=300)
for k,f in enumerate(train_files):
    img = io.imread(f)
    # show image (pad 2-channel image with a zero channel for RGB display)
    plt.subplot(2,len(train_files),k+1)
    img = np.vstack((img, np.zeros_like(img)[:1]))
    img = img.transpose(1,2,0)
    plt.imshow(img)
    plt.axis('off')
    # show manual segmentation
    plt.subplot(2,len(train_files),len(train_files) + k+1)
    seg = np.load(os.path.splitext(f)[0] + '_seg.npy', allow_pickle=True).item()
    masks = seg['masks'].squeeze()
    plt.imshow(masks)
    plt.axis('off')
Skip this step if you already have a pretrained model.
Fill out the form below with the paths to your data and the parameters to start training.
Paths for training, predictions and results
`train_dir`, `test_dir`: the paths to your folder of training images and masks (`train_dir`) and your folder of test images and masks (`test_dir`). You can leave `test_dir` blank, but it's recommended to have some test images to check the model's performance. To find a folder's path, open the Files tab on the left of the notebook, navigate to the folder, right-click it, select Copy path, and paste it into the box below.
`initial_model`: choose a model from the cellpose model zoo to start from.
`model_name`: the name under which your model will be saved once trained (it is written to a `models/` subfolder of `train_dir`).
Training parameters
`number_of_epochs`: how many epochs (rounds) the network will be trained for. At least 100 epochs are recommended, and sometimes 250 are necessary, particularly when training from scratch. Default value: 100
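Before launching training, it can help to fail fast on bad form inputs. A minimal sketch; `check_training_inputs` is a hypothetical helper, not part of cellpose:

```python
import os

def check_training_inputs(train_dir, test_dir, n_epochs):
    """Raise early if the form inputs cannot possibly work."""
    if not os.path.isdir(train_dir):
        raise ValueError(f'train_dir does not exist: {train_dir}')
    if test_dir and not os.path.isdir(test_dir):
        raise ValueError(f'test_dir does not exist: {test_dir}')
    if n_epochs < 1:
        raise ValueError('number_of_epochs must be at least 1')
    return True
```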
#@markdown ###Path to images and masks:
train_dir = "human_in_the_loop/train" #@param {type:"string"}
test_dir = "human_in_the_loop/test" #@param {type:"string"}
#Define where the patch file will be saved
base = "/content"
# model name and path
#@markdown ###Name of the pretrained model to start from and new model name:
from cellpose import models
initial_model = "cyto" #@param ['cyto','nuclei','tissuenet','livecell','cyto2','CP','CPx','TN1','TN2','TN3','LC1','LC2','LC3','LC4','scratch']
model_name = "CP_tissuenet" #@param {type:"string"}
# other parameters for training.
#@markdown ###Training Parameters:
#@markdown Number of epochs:
n_epochs = 100#@param {type:"number"}
Channel_to_use_for_training = "Green" #@param ["Grayscale", "Blue", "Green", "Red"]
# @markdown ###If you have a secondary channel that can be used for training, for instance nuclei, choose it here:
Second_training_channel= "Red" #@param ["None", "Blue", "Green", "Red"]
#@markdown ###Advanced Parameters
Use_Default_Advanced_Parameters = True #@param {type:"boolean"}
#@markdown ###If not, please input:
learning_rate = 0.1 #@param {type:"number"}
weight_decay = 0.0001 #@param {type:"number"}
if Use_Default_Advanced_Parameters:
    print("Default advanced parameters enabled")
    learning_rate = 0.1
    weight_decay = 0.0001
# check whether a model with the same name already exists; warn if so
model_path = os.path.join(train_dir, 'models')
if os.path.exists(os.path.join(model_path, model_name)):
    print("!! WARNING: "+model_name+" already exists and will be overwritten in the following cell !!")
if len(test_dir) == 0:
    test_dir = None
# Here we match the channel to a number
if Channel_to_use_for_training == "Grayscale":
    chan = 0
elif Channel_to_use_for_training == "Blue":
    chan = 3
elif Channel_to_use_for_training == "Green":
    chan = 2
elif Channel_to_use_for_training == "Red":
    chan = 1

if Second_training_channel == "Blue":
    chan2 = 3
elif Second_training_channel == "Green":
    chan2 = 2
elif Second_training_channel == "Red":
    chan2 = 1
elif Second_training_channel == "None":
    chan2 = 0

if initial_model == 'scratch':
    initial_model = 'None'
Default advanced parameters enabled
Here's what the training command would look like on the command line. If you run this locally, make sure to correct the paths for your computer.
run_str = f'python -m cellpose --use_gpu --verbose --train --dir {train_dir} --pretrained_model {initial_model} --chan {chan} --chan2 {chan2} --n_epochs {n_epochs} --learning_rate {learning_rate} --weight_decay {weight_decay}'
if test_dir is not None:
    run_str += f' --test_dir {test_dir}'
run_str += ' --mask_filter _seg.npy' # use _seg.npy files for training
print(run_str)
python -m cellpose --use_gpu --verbose --train --dir human_in_the_loop/train --pretrained_model cyto --chan 2 --chan2 1 --n_epochs 100 --learning_rate 0.1 --weight_decay 0.0001 --test_dir human_in_the_loop/test --mask_filter _seg.npy
Using the settings from the form above, train the model in the notebook.
# start logger (to see training across epochs)
logger = io.logger_setup()
# DEFINE CELLPOSE MODEL (without size model)
model = models.CellposeModel(gpu=use_GPU, model_type=initial_model)
# set channels
channels = [chan, chan2]
# get files
output = io.load_train_test_data(train_dir, test_dir, mask_filter='_seg.npy')
train_data, train_labels, _, test_data, test_labels, _ = output
new_model_path = model.train(train_data, train_labels,
                             test_data=test_data,
                             test_labels=test_labels,
                             channels=channels,
                             save_path=train_dir,
                             n_epochs=n_epochs,
                             learning_rate=learning_rate,
                             weight_decay=weight_decay,
                             nimg_per_epoch=8,
                             model_name=model_name)
# diameter of labels in training images
diam_labels = model.diam_labels.copy()
2022-04-30 23:49:32,053 [INFO] WRITING LOG OUTPUT TO /root/.cellpose/run.log 2022-04-30 23:49:32,070 [INFO] >> cyto << model set to be used 2022-04-30 23:49:32,075 [INFO] ** TORCH CUDA version installed and working. ** 2022-04-30 23:49:32,080 [INFO] >>>> using GPU 2022-04-30 23:49:32,588 [INFO] >>>> model diam_mean = 30.000 (ROIs rescaled to this size during training) 2022-04-30 23:49:32,606 [INFO] not all flows are present, running flow generation for all images 2022-04-30 23:49:32,677 [INFO] 5 / 5 images in human_in_the_loop/train folder have labels 2022-04-30 23:49:32,683 [INFO] not all flows are present, running flow generation for all images 2022-04-30 23:49:32,703 [INFO] 3 / 3 images in human_in_the_loop/test folder have labels 2022-04-30 23:49:32,796 [INFO] computing flows for labels
100%|██████████| 5/5 [00:00<00:00, 7.34it/s]
2022-04-30 23:49:33,503 [INFO] computing flows for labels
100%|██████████| 3/3 [00:00<00:00, 9.34it/s]
2022-04-30 23:49:33,876 [INFO] >>>> median diameter set to = 30 2022-04-30 23:49:33,881 [INFO] >>>> mean of training label mask diameters (saved to model) 19.259 2022-04-30 23:49:33,888 [INFO] >>>> training network with 2 channel input <<<< 2022-04-30 23:49:33,892 [INFO] >>>> LR: 0.10000, batch_size: 8, weight_decay: 0.00010 2022-04-30 23:49:33,897 [INFO] >>>> ntrain = 5, ntest = 3 2022-04-30 23:49:33,900 [INFO] >>>> nimg_per_epoch = 8 2022-04-30 23:49:35,131 [INFO] Epoch 0, Time 1.2s, Loss 2.5977, Loss Test 3.1540, LR 0.0000 2022-04-30 23:49:35,474 [INFO] saving network parameters to human_in_the_loop/train/models/CP_tissuenet 2022-04-30 23:49:36,835 [INFO] Epoch 5, Time 2.9s, Loss 2.5898, Loss Test 2.4993, LR 0.0556 2022-04-30 23:49:38,402 [INFO] Epoch 10, Time 4.5s, Loss 1.7924, Loss Test 2.5567, LR 0.1000 2022-04-30 23:49:41,475 [INFO] Epoch 20, Time 7.6s, Loss 1.4150, Loss Test 2.4996, LR 0.1000 2022-04-30 23:49:44,524 [INFO] Epoch 30, Time 10.6s, Loss 1.2365, Loss Test 2.4980, LR 0.1000 2022-04-30 23:49:47,595 [INFO] Epoch 40, Time 13.7s, Loss 1.1301, Loss Test 2.4164, LR 0.1000 2022-04-30 23:49:50,674 [INFO] Epoch 50, Time 16.8s, Loss 1.1060, Loss Test 2.4959, LR 0.1000 2022-04-30 23:49:53,747 [INFO] Epoch 60, Time 19.8s, Loss 1.0456, Loss Test 2.5608, LR 0.1000 2022-04-30 23:49:56,839 [INFO] Epoch 70, Time 22.9s, Loss 0.9907, Loss Test 2.5011, LR 0.1000 2022-04-30 23:49:59,948 [INFO] Epoch 80, Time 26.0s, Loss 0.9823, Loss Test 2.5206, LR 0.1000 2022-04-30 23:50:03,031 [INFO] Epoch 90, Time 29.1s, Loss 1.0361, Loss Test 2.4964, LR 0.1000 2022-04-30 23:50:05,777 [INFO] saving network parameters to human_in_the_loop/train/models/CP_tissuenet
If you have test data, check performance
# get files (during training, test_data is transformed so we will load it again)
output = io.load_train_test_data(test_dir, mask_filter='_seg.npy')
test_data, test_labels = output[:2]
# run model on test images
masks = model.eval(test_data,
                   channels=[chan, chan2],
                   diameter=diam_labels)[0]
# check performance using ground truth labels
ap = metrics.average_precision(test_labels, masks)[0]
print('')
print(f'>>> average precision at iou threshold 0.5 = {ap[:,0].mean():.3f}')
2022-04-30 23:52:13,228 [INFO] 3 / 3 images in human_in_the_loop/test folder have labels 2022-04-30 23:52:13,246 [INFO] 0%| | 0/3 [00:00<?, ?it/s] 2022-04-30 23:52:13,730 [INFO] 33%|###3 | 1/3 [00:00<00:00, 2.07it/s] 2022-04-30 23:52:14,140 [INFO] 67%|######6 | 2/3 [00:00<00:00, 2.27it/s] 2022-04-30 23:52:14,555 [INFO] 100%|##########| 3/3 [00:01<00:00, 2.34it/s] 2022-04-30 23:52:14,557 [INFO] 100%|##########| 3/3 [00:01<00:00, 2.29it/s] >>> average precision at iou threshold 0.5 = 0.693
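To make the metric concrete, here is a simplified, numpy-only sketch of average precision at a single IoU threshold. This is an illustration, not cellpose's exact implementation, which matches objects optimally rather than greedily:

```python
import numpy as np

def ap_at_iou(true_masks, pred_masks, thresh=0.5):
    """Simplified average precision at one IoU threshold:
    TP / (TP + FP + FN), with greedy one-to-one matching.
    true_masks / pred_masks are integer label images (0 = background)."""
    true_ids = np.unique(true_masks)[1:]   # drop background label 0
    pred_ids = np.unique(pred_masks)[1:]
    # IoU for every (true object, predicted object) pair
    iou = np.zeros((len(true_ids), len(pred_ids)))
    for i, t in enumerate(true_ids):
        tm = true_masks == t
        for j, p in enumerate(pred_ids):
            pm = pred_masks == p
            union = np.logical_or(tm, pm).sum()
            iou[i, j] = np.logical_and(tm, pm).sum() / union if union else 0.0
    # greedily match each true object to its best unused prediction
    tp, used = 0, set()
    for i in range(len(true_ids)):
        if len(pred_ids) == 0:
            break
        j = int(np.argmax(iou[i]))
        if iou[i, j] > thresh and j not in used:
            tp += 1
            used.add(j)
    fp = len(pred_ids) - tp
    fn = len(true_ids) - tp
    denom = tp + fp + fn
    return tp / denom if denom else 1.0
```

A perfect prediction gives 1.0; every missed or spurious object lowers the score, which is why the notebook's 0.693 indicates most but not all cells were recovered.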
Plot the predicted masks next to the ground-truth labels:
plt.figure(figsize=(12,8), dpi=150)
for k,im in enumerate(test_data):
    img = im.copy()
    # show image
    plt.subplot(3,len(train_files), k+1)
    img = np.vstack((img, np.zeros_like(img)[:1]))
    img = img.transpose(1,2,0)
    plt.imshow(img)
    plt.axis('off')
    if k==0:
        plt.title('image')

    # show predicted masks
    plt.subplot(3,len(train_files), len(train_files) + k+1)
    plt.imshow(masks[k])
    plt.axis('off')
    if k==0:
        plt.title('predicted labels')

    # show ground-truth masks
    plt.subplot(3,len(train_files), 2*len(train_files) + k+1)
    plt.imshow(test_labels[k])
    plt.axis('off')
    if k==0:
        plt.title('true labels')
plt.tight_layout()
# model name and path
#@markdown ###Custom model path (full path):
model_path = "human_in_the_loop/train/models/CP_tissuenet" #@param {type:"string"}
#@markdown ###Path to images:
dir = "human_in_the_loop/test" #@param {type:"string"}
#@markdown ###Channel Parameters:
Channel_to_use_for_segmentation = "Green" #@param ["Grayscale", "Blue", "Green", "Red"]
# @markdown If you have a secondary channel that can be used, for instance nuclei, choose it here:
Second_segmentation_channel= "Red" #@param ["None", "Blue", "Green", "Red"]
# Here we match the channel to a number
if Channel_to_use_for_segmentation == "Grayscale":
    chan = 0
elif Channel_to_use_for_segmentation == "Blue":
    chan = 3
elif Channel_to_use_for_segmentation == "Green":
    chan = 2
elif Channel_to_use_for_segmentation == "Red":
    chan = 1

if Second_segmentation_channel == "Blue":
    chan2 = 3
elif Second_segmentation_channel == "Green":
    chan2 = 2
elif Second_segmentation_channel == "Red":
    chan2 = 1
elif Second_segmentation_channel == "None":
    chan2 = 0
#@markdown ### Segmentation parameters:
#@markdown diameter of cells (set to zero to use diameter from training set):
diameter = 0#@param {type:"number"}
#@markdown threshold on flow error to accept a mask (set higher to get more cells, e.g. in the range (0.1, 3.0), or set to 0.0 to turn the check off so no cells are discarded):
flow_threshold = 0.4 #@param {type:"slider", min:0.0, max:3.0, step:0.1}
#@markdown threshold on cellprob output to seed cell masks (set lower to include more pixels or higher to include fewer, e.g. in range from (-6, 6)):
cellprob_threshold=0 #@param {type:"slider", min:-6, max:6, step:1}
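As a toy illustration of what `cellprob_threshold` controls: cellpose keeps only pixels whose predicted cell probability exceeds the threshold when seeding masks (the real pipeline also runs flow dynamics; the map below is made-up data):

```python
import numpy as np

# toy "cellprob" map: higher values mean more likely cell interior
cellprob = np.array([[-3.0, 0.5, 2.0],
                     [ 0.0, 1.5, -1.0],
                     [-2.0, 0.2, 3.0]])

# lowering the threshold admits more candidate pixels, raising it fewer
for t in (-1, 0, 1):
    n_pixels = int((cellprob > t).sum())
    print(f'cellprob_threshold={t}: {n_pixels} candidate cell pixels')
```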
If you're using the example test data, we'll copy it to a new folder:
src = 'human_in_the_loop/test'
if dir[:len(src)] == src:
    files = io.get_image_files(dir, '_masks')
    dir = 'human_in_the_loop/eval/'
    os.makedirs(dir, exist_ok=True)
    for f in files:
        dst = dir + os.path.split(f)[1]
        print(f'{f} > {dst}')
        shutil.copyfile(f, dst)
human_in_the_loop/test/breast_vectra_0.tif > human_in_the_loop/eval/breast_vectra_0.tif human_in_the_loop/test/img_1213.tif > human_in_the_loop/eval/img_1213.tif human_in_the_loop/test/img_1220.tif > human_in_the_loop/eval/img_1220.tif
Here's what the command to run the model would look like on the command line. If you run this locally, make sure to correct the paths for your computer.
run_str = f'python -m cellpose --use_gpu --verbose --dir {dir} --pretrained_model {model_path} --chan {chan} --chan2 {chan2} --diameter {diameter} --flow_threshold {flow_threshold} --cellprob_threshold {cellprob_threshold}'
print(run_str)
python -m cellpose --use_gpu --verbose --dir human_in_the_loop/eval/ --pretrained_model human_in_the_loop/train/models/CP_tissuenet --chan 2 --chan2 1 --diameter 0 --flow_threshold 0.4 --cellprob_threshold 0
How to run the custom model in a notebook:
# gets image files in dir (ignoring image files ending in _masks)
files = io.get_image_files(dir, '_masks')
print(files)
images = [io.imread(f) for f in files]
# declare model
model = models.CellposeModel(gpu=True,
                             pretrained_model=model_path)
# use model diameter if user diameter is 0
diameter = model.diam_labels if diameter==0 else diameter
# run model on test images
masks, flows, styles = model.eval(images,
                                  channels=[chan, chan2],
                                  diameter=diameter,
                                  flow_threshold=flow_threshold,
                                  cellprob_threshold=cellprob_threshold)
['human_in_the_loop/eval/breast_vectra_0.tif', 'human_in_the_loop/eval/img_1213.tif', 'human_in_the_loop/eval/img_1220.tif'] 2022-04-30 23:52:29,909 [INFO] >>>> loading model human_in_the_loop/train/models/CP_tissuenet 2022-04-30 23:52:29,911 [INFO] ** TORCH CUDA version installed and working. ** 2022-04-30 23:52:29,913 [INFO] >>>> using GPU 2022-04-30 23:52:30,084 [INFO] >>>> model diam_mean = 30.000 (ROIs rescaled to this size during training) 2022-04-30 23:52:30,085 [INFO] >>>> model diam_labels = 19.259 (mean diameter of training ROIs) 2022-04-30 23:52:30,088 [INFO] 0%| | 0/3 [00:00<?, ?it/s] 2022-04-30 23:52:30,497 [INFO] 33%|###3 | 1/3 [00:00<00:00, 2.46it/s] 2022-04-30 23:52:30,782 [INFO] 67%|######6 | 2/3 [00:00<00:00, 2.98it/s] 2022-04-30 23:52:31,071 [INFO] 100%|##########| 3/3 [00:00<00:00, 3.19it/s] 2022-04-30 23:52:31,072 [INFO] 100%|##########| 3/3 [00:00<00:00, 3.05it/s]
You will see the saved files in the Files tab, and you can download them from there.
from cellpose import io
io.masks_flows_to_seg(images,
                      masks,
                      flows,
                      diameter*np.ones(len(masks)),
                      files,
                      [chan, chan2])
io.save_masks(images,
              masks,
              flows,
              files,
              channels=[chan, chan2],
              png=True,            # save masks as PNGs and save example image
              tif=True,            # save masks as TIFFs
              save_txt=True,       # save txt outlines for ImageJ
              save_flows=False,    # save flows as TIFFs
              save_outlines=False, # save outlines as TIFFs
              )
f = files[0]
plt.figure(figsize=(12,4), dpi=300)
plt.imshow(io.imread(os.path.splitext(f)[0] + '_cp_output.png'))
plt.axis('off')
(-0.5, 3599.5, 899.5, -0.5)