Notebook

Introduction¶

Now we have an idea of three important components to analyzing neuroimaging data:

Data manipulation
Cleaning and confound regression
Parcellation and signal extraction

In this notebook the goal is to integrate these 3 basic components and perform a full analysis of group data using Intranetwork Functional Connectivity (FC).

Intranetwork functional connectivity is essentially a result of performing correlational analysis on mean signals extracted from two ROIs. Using this method we can examine how well certain resting state networks, such as the Default Mode Network (DMN), are synchronized across spatially distinct regions.

ROI-based correlational analysis forms the basis of many more sophisticated kinds of functional imaging analysis.

Using Nilearn's High-level functionality to compute correlation matrices¶

Nilearn has a built in function for extracting timeseries from functional files and doing all the extra signal processing at the same time. Let's walk through how this is done

First we'll grab our imports as usual

In [ ]:

from nilearn import image as nimg
from nilearn import plotting as nplot
import numpy as np
import pandas as pd
from bids import BIDSLayout

Let's grab the data that we want to perform our connectivity analysis on using PyBIDS:

In [ ]:

#Use PyBIDS to parse BIDS data structure
fmriprep_dir = "../data/ds000030/derivatives/fmriprep/"
layout = BIDSLayout(fmriprep_dir,
                   config=['bids','derivatives'])

In [ ]:

#Get resting state data (preprocessed, mask, and confounds file)
func_files = layout.get(datatype='func', task='rest',
                        desc='preproc',
                        space='MNI152NLin2009cAsym',
                        extension='nii.gz',
                        return_type='file')

mask_files = layout.get(datatype='func', task='rest',
                        desc='brain',
                        suffix="mask",
                        space='MNI152NLin2009cAsym',
                        extension='nii.gz',
                        return_type='file')

confound_files = layout.get(datatype='func',
                            task='rest',
                            desc='confounds',
                            extension='tsv',
                            return_type='file')

Now that we have a list of subjects to peform our analysis on, let's load up our parcellation template file

In [ ]:

#Load separated parcellation
parcel_file = '../resources/rois/yeo_2011/Yeo_JNeurophysiol11_MNI152/relabeled_yeo_atlas.nii.gz'
yeo_7 = nimg.load_img(parcel_file)

Now we'll import a package from nilearn, called input_data which allows us to pull data using the parcellation file, and at the same time applying data cleaning!

We first create an object using the parcellation file yeo_7 and our cleaning settings which are the following:

Settings to use:

Confounds: trans_x, trans_y, trans_z, rot_x, rot_y, rot_z, white_matter, csf, global_signal
Temporal Derivatives: Yes
high_pass = 0.009
low_pass = 0.08
detrend = True
standardize = True

In [ ]:

The object masker is now able to be used on any functional image of the same size. The input_data.NiftiLabelsMasker object is a wrapper that applies parcellation, cleaning and averaging to an functional image. For example let's apply this to our first subject:

In [ ]:

# Pull the first subject's data
func_file = func_files[0]
mask_file = mask_files[0]
confound_file = confound_files[0]

Before we go ahead and start using the masker that we've created, we have to do some preparatory steps. The following should be done prior to use the masker object:

Make your confounds matrix (as we've done in Episode 06)
Drop Dummy TRs that are to be excluded from our cleaning, parcellation, and averaging step

To help us with the first part, let's define a function to help extract our confound regressors from the .tsv file for us. Note that we've handled pulling the appropriate {confounds}_derivative1 columns for you! You just need to supply the base regressors!

In [ ]:

#Refer to part_06 for code + explanation
def extract_confounds(confound_tsv,confounds,dt=True):
    '''
    Arguments:
        confound_tsv                    Full path to confounds.tsv
        confounds                       A list of confounder variables to extract
        dt                              Compute temporal derivatives [default = True]
        
    Outputs:
        confound_mat                    
    '''
    
    if dt:    
        dt_names = ['{}_derivative1'.format(c) for c in confounds]
        confounds = confounds + dt_names
    
    #Load in data using Pandas then extract relevant columns
    confound_df = pd.read_csv(confound_tsv,delimiter='\t') 
    confound_df = confound_df[confounds]
    
 
    #Convert into a matrix of values (timepoints)x(variable)
    confound_mat = confound_df.values 
    
    #Return confound matrix
    return confound_mat

Finally we'll set up our image file for confound regression (as we did in Episode 6). To do this we'll drop 4 TRs from both our functional image and our confounds file. Note that our masker object will not do this for us!

In [ ]:

#Load functional image

#Remove the first 4 TRs

#Use the above function to pull out a confound matrix

#Drop the first 4 rows of the confounds matrix

Using the masker¶

Finally with everything set up, we can now use the masker to perform our:

Confounds cleaning
Parcellation
Averaging within a parcel

All in one step!

In [ ]:

#Apply cleaning, parcellation and extraction to functional data

Just to be clear, this data is automatically parcellated for you, and, in addition, is cleaned using the confounds you've specified!

The result of running masker.fit_transform is a matrix that has:

Rows matching the number of timepoints (148)
Columns, each for one of the ROIs that are extracted (43)

But wait!

We originally had 50 ROIs, what happened to 7 of them? It turns out that masker drops ROIs that are empty (i.e contain no brain voxels inside of them), this means that 7 of our atlas' parcels did not correspond to any region with signal! To see which ROIs are kept after computing a parcellation you can look at the labels_ property of masker:

In [ ]:

print(masker.labels_)
print("Number of labels", len(masker.labels_))

This means that our ROIs of interest (44 and 46) cannot be accessed using the 44th and 46th columns directly!

There are many strategies to deal with this weirdness. What we're going to do is to create a new array that fills in the regions that were removed with 0 values. It might seem a bit weird now, but it'll simplify things when we start working with multiple subjects!

First we'll identify all ROIs from the original atlas. We're going to use the numpy package which will provide us with functions to work with our image arrays:

In [ ]:

import numpy as np

In [ ]:

# Get the label numbers from the atlas

# Get number of labels that we have

Now we're going to create an array that contains:

A number of rows matching the number of timepoints
A number of columns matching the total number of regions

In [ ]:

# Remember fMRI images are of size (x,y,z,t)
# where t is the number of timepoints

# Create an array of zeros that has the correct size

# Get regions that are kept

# Fill columns matching labels with signal values

print(final_signal.shape)

It's a bit of work, but now we have an array where:

The column number matches the ROI label number
Any column that is lost during the masker.fit_transform is filled with 0 values!

To get the columns corresponding to the regions that we've kept, we can simply use the regions_kept variable to select columns corresponding to the regions that weren't removed:

In [ ]:

This is identical to the original output of masker.fit_transform

In [ ]:

This might seem unnecessary for now, but as you'll see in a bit, it'll come in handy when we deal with multiple subjects!

Calculating Connectivity¶

In fMRI imaging, connectivity typically refers to the correlation of the timeseries of 2 ROIs. Therefore we can calculate a full connectivity matrix by computing the correlation between all pairs of ROIs in our parcellation scheme!

We'll use another nilearn tool called ConnectivityMeasure from nilearn.connectome. This tool will perform the full set of pairwise correlations for us

In [ ]:

Like the masker, we need to make an object that will calculate connectivity for us.

In [ ]:

Try using SHIFT-TAB to see what options you can put into the kind argument of ConnectivityMeasure

Then we use correlation_measure.fit_transform() in order to calculate the full correlation matrix for our parcellated data!

In [ ]:

Note that we're using a list [final_signal], this is because correlation_measure works on a list of subjects. We'll take advantage of this later!

The result is a matrix which has:

A number of rows matching the number of ROIs in our parcellation atlas
A number of columns, that also matches the number of ROIs in our parcellation atlas

You can read this correlation matrix as follows:

Suppose we wanted to know the correlation between ROI 30 and ROI 40

Then Row 30, Column 40 gives us this correlation.
Row 40, Column 40 can also give us this correlation

This is because the correlation of $A \to B = B \to A$

NOTE

Remember we were supposed to lose 7 regions from the masker.fit_transform step. The correlations for these regions will be 0!

Let's try pulling the correlation for ROI 44 and 46!

In [ ]:

Note that it'll be the same if we swap the rows and columns!

In [ ]:

Exercise¶

Apply the data extract process shown above to all subjects in our subject list and collect the results. Your job is to fill in the blanks!

In [ ]:

# First we're going to create some empty lists to store all our data in!
ctrl_subjects = []
schz_subjects = []


# We're going to keep track of each of our subjects labels here
# pulled from masker.labels_
labels_list = []

# Get the number of unique labels in our parcellation
# We'll use this to figure out how many columns to make (as we did earlier)
atlas_labels = np.unique(yeo_7.get_fdata().astype(int))
NUM_LABELS = len(atlas_labels)

# Set the list of confound variables we'll be using
confound_variables = ['trans_x','trans_y','trans_z',
                               'rot_x','rot_y','rot_z',
                               'global_signal',
                               'white_matter','csf']

# Number of TRs we should drop
TR_DROP=4

# Lets get all the subjects we have
subjects = layout.get_subjects()
for sub in subjects:
    
    #Get the functional file for the subject (MNI space)
    func_file = layout.get(subject=??,
                           datatype='??', task='rest',
                           desc='??',
                           space='??'
                           extension="nii.gz",
                           return_type='file')[0]
    
    #Get the confounds file for the subject (MNI space)
    confound_file=layout.get(subject=??, datatype='??',
                             task='rest',
                             desc='??',
                             extension='tsv',
                             return_type='file')[0]
    
    #Load the functional file in
    func_img = nimg.load_img(??)
    
    #Drop the first 4 TRs
    func_img = func_img.slicer[??,??,??,??]
    
    
    #Extract the confound variables using the function
    confounds = extract_confounds(confound_file,
                                  confound_variables)
    
    #Drop the first 4 rows from the confound matrix
    confounds = confounds[??]
    
    # Make our array of zeros to fill out
    # Number of rows should match number of timepoints
    # Number of columns should match the total number of regions
    fill_array = np.zeros((func_img.shape[??], ??))
    
    #Apply the parcellation + cleaning to our data
    #What function of masker is used to clean and average data?
    time_series = masker.fit_transform(??,??)
    
    # Get the regions that were kept for this scan
    regions_kept = np.array(masker.labels_)
    
    # Fill the array, this is what we'll use
    # to make sure that all our array are of the same size
    fill_array[:, ??] = time_series
    
    #If the subject ID starts with a "1" then they are control
    if sub.startswith('1'):
        ctrl_subjects.append(fill_array)
    #If the subject ID starts with a "5" then they are case (case of schizophrenia)
    if sub.startswith('5'):
        schz_subjects.append(fill_array)
        
    labels_list.append(masker.labels_)

The result of all of this code is that:

Subjects who start with a "1" in their ID, are controls, and are placed into the ctrl_subjects list
Subjects who start with a "2" in their ID, have schizophrenia, and are placed into the schz_subjects list

What's actually being placed into the list? The cleaned, parcellated time series data for each subject (the output of masker.fit_transform)!

A helpful trick is that we can re-use the correlation_measure object we made earlier and apply it to a list of subject data!

In [ ]:

At this point, we have correlation matrices for each subject across two populations. The final step is to examine the differences between these groups in their correlation between ROI 44 and ROI 46.

Visualizing Correlation Matrices and Group Differences¶

An important step in any analysis is visualizing the data that we have. We've cleaned data, averaged data and calculated correlations but we don't actually know what it looks like! Visualizing data is important to ensure that we don't throw pure nonsense into our final statistical analysis

To visualize data we'll be using a python package called seaborn which will allow us to create statistical visualizations with not much effort.

In [ ]:

import seaborn as sns
import matplotlib.pyplot as plt

We can view a single subject's correlation matrix by using seaborn's heatmap function:

In [ ]:

Recall that cleaning and parcellating the data causes some ROIs to get dropped. We dealt with this by filling an array of zeros (fill_array) only for columns where the regions are kept (regions_kept). This means that we'll have some correlation values that are 0!

This is more apparent if we plot the data slightly differently. For demonstrative purposes we've:

Taken the absolute value of our correlations so that the 0's are the darkest color
Used a different color scheme

In [ ]:

sns.heatmap(np.abs(ctrl_correlation_matrices[0]), cmap='viridis')

The dark lines in the correlation matrix correspond to regions that were dropped and therefore have 0 correlation!

We can now pull our ROI 44 and 46 by indexing our list of correlation matrices as if it were a 3D array (kind of like an MR volume). Take a look at the shape:

In [ ]:

This is of form:

ctrl_correlation_matrices[subject_index, row_index, column_index]

Now we're going to pull out just the correlation values between ROI 44 and 46 across all our subjects. This can be done using standard array indexing:

In [ ]:

Next we're going to arrange this data into a table. We'll create two tables (called dataframes in the python package we're using, pandas)

In [ ]:

#Create control dataframe

In [ ]:

# Create the schizophrenia dataframe

The result is:

ctrl_df a table containing the correlation value for each control subject, with an additional column with the group label, which is 'control'
scz_df a table containing the correlation value for each schizophrenia group subject, with an additional column with the group label, which is 'schizophrenia'

For visualization we're going to stack the two tables together...

In [ ]:

#Stack the two dataframes together

# Show some random samples from dataframe

Finally we're going to visualize the results using the python package seaborn!

In [ ]:

#Visualize results

# Create a figure canvas of equal width and height

# Create a box plot, with the x-axis as group
#the y-axis as the correlation value

# Create a "swarmplot" as well, you'll see what this is..

# Set the title and labels of the figure

Although the results here aren't significant they seem to indicate that there might be three subclasses in our schizophrenia group - of course we'd need a lot more data to confirm this! The interpretation of these results should ideally be based on some a priori hypothesis!

Congratulations!¶

Hopefully now you understand that:

fMRI data needs to be pre-processed before analyzing
Manipulating images in python is easily done using nilearn and nibabel
You can also do post-processing like confound/nuisance regression using nilearn
Parcellating is a method of simplifying and "averaging" data. The type of parcellation reflect assumptions you make about the structure of your data
Functional Connectivity is really just time-series correlations between two signals!