In [ ]:
%matplotlib inline

Functional connectivity predicts age group

This example compares three kinds of functional connectivity between regions of interest: correlation, partial correlation, and tangent space embedding.

The resulting connectivity coefficients can be used to discriminate children from adults. In general, the tangent space embedding outperforms the standard correlations: see Dadi et al. 2019 for a careful study.
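To make the difference between the first two kinds concrete, here is a minimal NumPy sketch on synthetic signals. The sizes, the random data, and the use of the plain empirical covariance are assumptions made for illustration; nilearn's `ConnectivityMeasure` may estimate the covariance differently. The tangent space embedding is not covered by this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_timepoints, n_regions = 200, 4  # hypothetical sizes, not the MSDL atlas

# Synthetic signals forming a chain: region 1 follows region 0,
# and region 2 follows region 1.
signals = rng.standard_normal((n_timepoints, n_regions))
signals[:, 1] += signals[:, 0]
signals[:, 2] += signals[:, 1]

# Correlation: normalized covariance between every pair of regions.
correlation = np.corrcoef(signals, rowvar=False)

# Partial correlation: derived from the inverse covariance (precision)
# matrix, it measures the link between two regions once the influence
# of all other regions has been removed.
precision = np.linalg.inv(np.cov(signals, rowvar=False))
scale = np.sqrt(np.diag(precision))
partial_correlation = -precision / np.outer(scale, scale)
np.fill_diagonal(partial_correlation, 1.0)

print(np.round(correlation, 2))
print(np.round(partial_correlation, 2))
```

In this toy chain, regions 0 and 2 are clearly correlated, but their partial correlation should be much closer to zero because region 1 mediates the link.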

Load brain development fMRI dataset and MSDL atlas

We study only 60 subjects from the dataset, to save computation time.

In [ ]:
from nilearn import datasets

development_dataset = datasets.fetch_development_fmri(n_subjects=60)

We use probabilistic regions of interest (ROIs) from the MSDL atlas.

In [ ]:
from nilearn.input_data import NiftiMapsMasker

msdl_data = datasets.fetch_atlas_msdl()
msdl_coords = msdl_data.region_coords

masker = NiftiMapsMasker(
    msdl_data.maps, resampling_target="data", t_r=2, detrend=True,
    low_pass=.1, high_pass=.01, memory='nilearn_cache', memory_level=1).fit()
masked_data = [masker.transform(func, confounds) for
               (func, confounds) in zip(
                   development_dataset.func, development_dataset.confounds)]

What kind of connectivity is most powerful for classification?

We will use connectivity matrices as features to distinguish children from adults. We use cross-validation and measure classification accuracy to compare the different kinds of connectivity matrices.
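Using matrices as features means flattening each symmetric matrix into a vector, so that every subject becomes one row of a feature table. The sketch below shows the idea with NumPy on hypothetical random data; the exact ordering and scaling of entries produced by `ConnectivityMeasure(vectorize=True)` may differ.

```python
import numpy as np

n_subjects, n_timepoints, n_regions = 3, 100, 5  # hypothetical sizes
rng = np.random.default_rng(0)

# One correlation matrix per (synthetic) subject.
matrices = np.stack([
    np.corrcoef(rng.standard_normal((n_timepoints, n_regions)), rowvar=False)
    for _ in range(n_subjects)
])

# A symmetric matrix is fully described by its lower triangle
# (diagonal included), so only those entries are kept.
rows, cols = np.tril_indices(n_regions)
features = matrices[:, rows, cols]

# n_regions * (n_regions + 1) / 2 features per subject.
print(features.shape)
```

Each row of `features` can then be passed to a classifier like any other feature vector.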

In [ ]:
# prepare the classification pipeline
from sklearn.pipeline import Pipeline
from nilearn.connectome import ConnectivityMeasure
from sklearn.svm import LinearSVC
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import GridSearchCV

kinds = ['correlation', 'partial correlation', 'tangent']

pipe = Pipeline(
    [('connectivity', ConnectivityMeasure(vectorize=True)),
     ('classifier', GridSearchCV(LinearSVC(), {'C': [.1, 1., 10.]}, cv=5))])

param_grid = [
    # baseline: always predict the most frequent class
    {'classifier': [DummyClassifier(strategy='most_frequent')]},
    # one candidate per kind of connectivity
    {'connectivity__kind': kinds}
]

We use random splits of the subjects into training/testing sets. StratifiedShuffleSplit allows preserving the proportion of children in the test set.
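As an illustration of what stratification does, the following sketch applies the same splitter to hypothetical labels (48 children coded as 0 and 12 adults coded as 1; these counts are made up and are not the dataset's actual composition).

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical labels: 48 children (0) and 12 adults (1).
labels = np.array([0] * 48 + [1] * 12)
# The splitter only needs the number of samples, so the features
# can be a placeholder array here.
placeholder_features = np.zeros((len(labels), 1))

cv = StratifiedShuffleSplit(n_splits=30, random_state=0, test_size=10)

# Count the adults that end up in each of the 30 test sets.
adults_per_test_set = [
    int(labels[test_index].sum())
    for _, test_index in cv.split(placeholder_features, labels)
]
print(adults_per_test_set)
```

With these counts, adults make up 20% of the subjects, so every test set of 10 subjects contains 2 adults and 8 children.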

In [ ]:
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.preprocessing import LabelEncoder

groups = [pheno['Child_Adult'] for pheno in development_dataset.phenotypic]
classes = LabelEncoder().fit_transform(groups)

cv = StratifiedShuffleSplit(n_splits=30, random_state=0, test_size=10)
gs = GridSearchCV(pipe, param_grid, scoring='accuracy', cv=cv, verbose=1,
                  refit=False, n_jobs=8)
# fit on the per-subject time series, with the encoded age group as target
gs.fit(masked_data, classes)
mean_scores = gs.cv_results_['mean_test_score']
scores_std = gs.cv_results_['std_test_score']

Display the results

In [ ]:
from matplotlib import pyplot as plt

plt.figure(figsize=(6, 4))
positions = [.1, .2, .3, .4]
plt.barh(positions, mean_scores, align='center', height=.05, xerr=scores_std)
yticks = ['dummy'] + list(gs.cv_results_['param_connectivity__kind'].data[1:])
yticks = [t.replace(' ', '\n') for t in yticks]
plt.yticks(positions, yticks)
plt.xlabel('Classification accuracy')

This is a small example to showcase nilearn features. In practice, such comparisons need to be performed on much larger cohorts and several datasets. Dadi et al. 2019 showed that across many cohorts and clinical questions, the tangent kind should be preferred.
