Notebook

Image Segmentation using the idr0021 dataset¶

This notebook uses the idr0021 dataset (idr0021-lawo-pericentriolarmaterial/experimentA) and tries to reproduce some of the analysis published in 'Subdiffraction imaging of centrosomes reveals higher-order organizational features of pericentriolar material'; in particular to create a figure similar to Figure 1 of the article.

Tasks¶

Connect to an OMERO.server
Load data (images, datasets, projects, pixel data)
Image segmentation using EBImage
Create ROIs
Store segmentation results on the OMERO.server
Plot the results
Calculate basic statistics from the segmentation data

Exercise: Go to https://workshop.openmicroscopy.org , log in as your given user, and find your dataset called 'R-dataset'. Leave the window open, as you will use OMERO.web to check the results as we're going along.

How to use this notebook¶

The grey blocks with the square brackets to the left are code blocks. Click on a block to select it and click the play button in the toolbar to execute the block of code (alternatively hold SHIFT key and hit ENTER). The bracket will show a star to indicate that this code is currently running. The output is displayed below the block. Wait for it to finish. When the execution is finished the star will be replaced by a number.

As we go through the notebook you are expected to run the code blocks in order.

Here's an example:

In [ ]:

message("Just wait for a bit...")
Sys.sleep(3)
message("Done.")

You can change the code and run the block again.

Let's get started...¶

Load the necessary libraries:

romero-gateway for the OMERO <-> R communication
EBImage to perform the image thresholding.

In [ ]:

# Load the libraries
library(romero.gateway)
library(EBImage, warn.conflicts = FALSE)

Log in to the OMERO server¶

In [ ]:

user_name = readline('Username: ')
user_password <- getPass::getPass('OMERO password: ')

server <- OMEROServer(host = 'wss://workshop.openmicroscopy.org/omero-ws', port = 443L, username=user_name, password=user_password)
server <- connect(server)
paste('Successfully logged in as', server@user$getUserName())

Load and display an image¶

Exercise: Go to the image 'siControl_N20_Cep215_I_20110411_Mon-1509_0_SIR_PRJ.dv' in your 'R-dataset', find the 'Image ID', copy it and paste it below.

In [ ]:

imageId <- REPLACE_WITH_IMAGE_ID 
image <- loadObject(server, "ImageData", imageId)
paste("Image", imageId, "loaded.")

Load the pixel values and display the image

In [ ]:

# There is just one plane, so z = 1 and t = 1
z <- 1
t <- 1

# Load the second channel
channelIndex <- 2

pixels <- getPixelValues(image, z, t, channelIndex)

ebimage <- EBImage::Image(data = pixels, colormode = 'Grayscale')
ebimage <- normalize(ebimage)
EBImage::display(ebimage)

Image segmentation¶

Visual¶

We are using four steps for the segmentation:

Thresholding using the thresh function to get a binary image.
Filtering to remove noise, using medianFilter.
fillHull to close holes in the segmented objects.

Note: Up till now we still had a binary image (the pixel array only consists of 0's (background) and 1's (objects)), therefore we need:

bwlabel to distinguish the single objects from each other. It assigns a different number to each object (first object will get pixel values '1', the second pixel values '2', etc.)

In [ ]:

# Threshold
threshImg <- thresh(ebimage, w=10, h=10, offset=0.01)

# Remove noise
threshImg <- medianFilter(threshImg, size=3)

# Fill holes
threshImg <- fillHull(threshImg)          

# Distinguish objects
threshImg <- bwlabel(threshImg)

# Show the segmented image
EBImage::display(colorLabels(threshImg))

Exercise: Modify the parameters of the thresh function w, h and offset to get a better segmentation. Goal: The two centrioles should be detected as two separate objects. We don't have to get rid off all the noise, but the two centrioles should be the largest objects detected. (Hint: w=15, h=15, offset=0.1 seems to work pretty well)

Compute the features (measurements)¶

Use the computeFeatures methods to calculate some properties of the objects.

In [ ]:

shapes = computeFeatures.shape(threshImg)
moments = computeFeatures.moment(threshImg)

# merge the two dataframes together into one 'features' dataframe
features <- merge(shapes, moments, by=0, all=TRUE)
features

For further analysis we need the object sizes: 's.area', 's.perimeter', especially 's.radius.mean' (~ diameter).

For the visualisation we can use the location (m.cx, m.mcy), the major radius (m.majoraxis), the minor radius (can be calculated from m.eccentricity) and the rotation angle (m.theta) to draw an Ellipse ROI around each object.

Save the results back to OMERO¶

Create the ROIs from the features table¶

In order to be able to re-use this code later, we define a function for it:

In [ ]:

#' Create ROIs from a features table
#'
#' @param features The shape and moments generated by EBImage computeFeatures
#' @return A dataframe specifying the x, y, rx, ry, w, h, theta and text parameters of the ROIs
createROIs <- function(features) {
    rois <- data.frame(x = c(0), y = c(0), rx = c(0), ry = c(0), w = c(0), h = c(0), theta = c(0), text = c('remove'), stringsAsFactors = FALSE)
    for (index in 1:length(features[,1])) {
        x <- features[index,8]
        y <- features[index,9]
        r1 <- features[index,10]
        ecc <- features[index,11]
        r2 <- sqrt(r1^2 * (1 - ecc^2))
        theta <- features[index,12]
        rois <- rbind(rois, c(x, y, r1, r2, NA, NA, -theta, as.character(index)))
    }
    rois <- rois[-1,]
    rownames(rois) <- c()
    return(rois)
}
print("Function 'createROIs' defined.")

In [ ]:

rois <- createROIs(features)
rois

Save the ROIs back to OMERO

In [ ]:

# addROIs creates an ROI for each entry in the dataframe specified by the 'coords' parameter
addROIs(image, coords = rois)
print("ROIs successfully created.")

Exercise: Open the image in the full viewer in OMERO.web and check the ROIs.

Store the features table¶

Attach the whole 'features' dataframe to the image:

In [ ]:

# 'attachDataframe' directly attaches an R dataframe to an OMERO image (project, dataset, etc.)
# ('invisible' not strictly necessary, just suppresses some unnecessary output)
invisible(attachDataframe(image, features, "ROI features"))
print("Data frame successfully attached.")

The dataframe is now attached to the image as an HDF table with file name 'ROI features'. You can download and open it with software like HDF Compass, load it into python scripts using 'h5py' library, etc. or simply load it directly as R dataframe again.

Alternatively attach it as CSV file:

Use R's built-in write.csv method to create a csv file
And attach that to the image using attachFile function from the romero.gateway

In [ ]:

csvFile <- "/tmp/ROI_features.csv"
write.csv(features, file = csvFile)
fileannotation <- attachFile(image, csvFile)
print("CSV file successfully attached.")

Automate¶

Tasks:

Segment each image of a dataset.
For visualisation create an ROI for each of the objects detected.
For further analysis only take the largest object per image into account (assumption: this is a centriole). Parameters to evaluate for this object are area, perimeter and diameter

Note: We need to be able to specify the channel. Only one channel in each image is relevant for our analysis. The relevant channel is identified by its name.

We put all the pieces together and wrap them up in a function called 'analyzeImage':

Determine the relevant channel
Load the pixel values
Perform the segmentation
Calculate the features of the detected objects
Create ROIs to visualise the segmentation results (adding the ROI 'index' as text/comment so that we can identify the ROI in the features table)
Extract the valuable information for further statistical analysis (area, perimeter and diameter of the largest object)

In [ ]:

#' Performs an image segmentation to find the largest ROI of the image
#' and returns some features of interest (area, perimeter and diameter).
#' Optionally: Creates an ROI for each object detected by the segmentation.
#'
#' @param image The image to segment
#' @param channelName The channel to be taken into account
#' @param df The dataframe to add the features to (channelName, imageName, ImageID, ROIIndex, area, perimeter, diameter)
#' @param saveROIs Flag if ROIs should be created and attached to the image (default: FALSE)
#' @return The dataframe with the features
analyzeImage <- function(image, channelName, df, saveROIs = FALSE) {
    # Find the channel index
    chnames <- getChannelNames(image)
    chindex <- match(channelName, chnames, nomatch = 0)
    if (chindex == 0) {
      message (paste("Could not resolve channel name, skipping ", image@dataobject$getId()))
      return(df)
    }
    
    # Load the pixels
    pixels <- getPixelValues(image, 1, 1, chindex)
    ebi <- EBImage::Image(data = pixels, colormode = 'Grayscale')
    ebi <- normalize(ebi)
     
    # this is our segmentation workflow from above
    threshImg <- thresh(ebi, w=15, h=15, offset=0.1)
    threshImg <- medianFilter(threshImg, size=3)
    threshImg <- fillHull(threshImg)          
    threshImg <- bwlabel(threshImg)
    
    # Calculate the features
    shapes = suppressMessages(computeFeatures.shape(threshImg))
    moments = suppressMessages(computeFeatures.moment(threshImg))
    features <- merge(shapes, moments, by=0, all=TRUE)
    
    if (length(features[,1])>1) {
        # Add the ROIs to the image
        if (saveROIs) {
            rois <- createROIs(features)
            addROIs(image, coords = rois)
        }
        
        # Add the interesting properties (area, perimeter and diameter)
        # of the largest object together with channel name, image name, image id 
        # and roi index to the dataframe
        features <- features[order(-features[,2]),]
        diameter <- features[1,4]*2
        df <- rbind(df, c(channelName, image@dataobject$getName(), image@dataobject$getId(), features[1,1], features[1,2], features[1,3], diameter))
    }
    return(df)
}
print("Function 'analyzeImage' defined.")

Iterate over the dataset and call the analyze function for each of the images:

In [ ]:

datasetId <- REPLACE_WITH_DATASET_ID

channelName <- 'CDK5RAP2-C'

dataset <- loadObject(server, "DatasetData", datasetId)

# Keep the channel name, image name, image id, area, perimeter, and diameter of the largest ROIs
result <- data.frame(Channel = c('remove'), ImageName = c('remove'), Image = c(0), ROIIndex = c(0), Area = c(0), Perimeter = c(0), Diameter = c(0), stringsAsFactors = FALSE)

images <- getImages(dataset)
for (image in images) {
    result <- tryCatch({
        analyzeImage(image, channelName, result, saveROIs = TRUE)
    }, warning = function(war) {
        message(paste("WARNING:  ", image@dataobject$getId(),war))
        return(result)
    }, error = function(err) {
        message(paste("ERROR:  ", image@dataobject$getId() ,err))
        return(result)
    }, finally = {
    })
}

result <- result[-1,]
rownames(result) <- c()

# set the correct datatypes
result$Channel <- as.factor(result$Channel)
result$Area <- as.numeric(result$Area)
result$Perimeter <- as.numeric(result$Perimeter)
result$Diameter <- as.numeric(result$Diameter)

result

Exercise: Open some of the other images of the dataset in the full viewer to check the ROIs.

Optional Exercise: The measurements (Area, Perimeter and Diameter) are currently showing the number of pixels. Find out the pixel size in nanometers and apply it accordingly. Hint: The X and Y pixel sizes are the same for all images. Select one image in OMERO.web and check its metadata (right hand side panel). Solution: See bottom of this notebook.

In [ ]:

pxSizeInNM <- NA # REPLACE WITH PIXEL SIZE
if (!is.na(pxSizeInNM)) {
    # Replace result$Area, Perimeter, Diameter with the
    # size in nanometers, using the variable 'pxSizeInNM'
    # declared above.
    
    # Enter code here...
    
    # Just print out the data frame again to check result:
    result
}

Statistical analysis¶

The image segmentation has already been run over the whole idr0021 project and the results attached to the project as table 'Summary from R'. We are going to load this table from OMERO as dataframe "centrioles". In this manner, we will quickly obtain a sizeable segmentation dataset to perform a statistical analysis on.

Exercise: Go to trainer-1's idr0021 project, copy the ID and paste it below:

In [ ]:

projectId <- REPLACE_WITH_PROJECT_ID 
project <- loadObject(server, "ProjectData", projectId)
dataframes <- availableDataframes(project)
print(dataframes)

In [ ]:

# Load the dataframe
dfID <- dataframes$ID[1]
centrioles <- loadDataframe(project, dfID)

# Make sure the data types are correct
centrioles$Dataset <- as.factor(centrioles$Dataset)
centrioles$Diameter <- as.numeric(centrioles$Diameter)
centrioles

Plot the data¶

In [ ]:

# Order the datasets ascending by mean diameter 
ag <-aggregate(centrioles$Diameter ~ centrioles$Dataset, centrioles, median)
ordered <- factor(centrioles$Dataset, levels=ag[order(ag$`centrioles$Diameter`), 'centrioles$Dataset'])

# Draw the plot 
plot(centrioles$Diameter ~ ordered, ylab='Diameter (nm)', xlab="Protein", cex.axis=0.5)

Interpretation¶

One-way analysis of variance¶

Is there a significant difference between the proteins?

In [ ]:

fit <- aov(centrioles$Diameter ~ centrioles$Dataset)
summary(fit)

Two-sample Wilcoxon test of all pairwise combinations¶

Are all proteins significantly different from each other?

In [ ]:

# Two-sample Wilcoxon test ('Mann-Whitney') of all pairwise combinations:
combins <- combn(levels(centrioles$Dataset), 2)
params_list <- split(as.vector(combins), rep(1:ncol(combins), each = nrow(combins)))
testResults <- data.frame()
for (param in params_list) {
  testdf <- subset(centrioles, centrioles$Dataset %in% param)
  pval <- wilcox.test(formula = Diameter ~ Dataset, data = testdf)$p.value
  testResults <- rbind(testResults, data.frame(Protein_1=param[1], Protein_2=param[2], p_value=pval))
}
testResults

Finally close the connection to the OMERO.server again:

In [ ]:

disconnect(server)

Further exercise: Take a look at the plot again. Find the outliers in OMERO.web (using 'parade') and check what might have gone wrong there.

In [ ]:

# Solution for the optional exercise ('set pixel size in nanometer'):

pxSizeInNM <- 40
result$Area <- result$Area * pxSizeInNM ^ 2
result$Perimeter <- result$Perimeter * pxSizeInNM
result$Diameter <- result$Diameter * pxSizeInNM

License (BSD 2-Clause)¶

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.