All of the tutorial notebooks, as well as information about the supporting package (nma-ibl), can be found in the nma-ibl GitHub repository.
Please execute the cells below to install the necessary dependencies and prepare the environment.
# install IBL pipeline package to access and navigate the pipeline
!pip install --quiet nma-ibl
# Download data needed for plot recreation
!wget https://github.com/vathes/nma-ibl/raw/master/uuids_trained1.npy
One of the immense strengths of DataJoint pipelines lies in the tight data integrity and full tracking of all processing and computations captured by the data pipeline. Here we demonstrate how a study figure based on the IBL pipeline can be replicated using data freshly fetched from the pipeline.
In the study A standardized and reproducible method to measure decision-making in mice, the authors showed that animal behavior in a visual decision-making task is similar across 9 labs in 7 institutions across 3 countries when using standardized, reproducible experimental hardware, software, and procedures.
This notebook replicates Figure 2 from that work, which shows a similar learning rate of animals across different labs. This notebook was generated based on this repository, allowing us to perform figure replications on a local machine!
Let's connect to the database again, using the public credentials ibl-public:
import datajoint as dj
dj.config['database.host'] = 'datajoint-public.internationalbrainlab.org'
dj.config['database.user'] = 'ibl-public'
dj.config['database.password'] = 'ibl-public'
dj.conn() # explicitly verify that the connection to database can be established
To start with, we import some modules that will be used in the rest of the notebook:
import pandas as pd
import numpy as np
import os
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")
We also import modules that allow us to interact with the schemas and tables in the IBL DataJoint pipeline.
from nma_ibl import reference, subject, behavior_analyses
Here is an overview of what each schema contains:

- reference: lab, user, and project information
- subject: information about subjects
- behavior_analyses: results of standardized analyses, including the training status

Next we import pre-defined figure settings and helper functions:
from nma_ibl.paper_behavior_functions import (query_subjects, seaborn_style,
                                              group_colors, institution_map)
seaborn_style()
pal = group_colors()
institution_map, col_names = institution_map()
col_names = col_names[:-1]
We pre-selected the "trained" animals based on the study's training criteria and saved their UUIDs in the file uuids_trained1.npy:
uuids = np.load('uuids_trained1.npy', allow_pickle=True)
We can then restrict the Subject table to the animals matching these UUIDs:
subjects = subject.Subject & [{'subject_uuid': uuid} for uuid in uuids]
These are the 101 subjects reported in this study:
subjects
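As an aside, the `&` restriction used above treats a list of key dictionaries as a logical OR: a row is kept if it matches any of the dictionaries. A minimal sketch of that semantics in plain Python (the toy table and restrict() helper below are hypothetical, for intuition only, not part of DataJoint):

```python
# Toy illustration of DataJoint-style restriction by a list of key dicts.
# The table and restrict() helper are hypothetical, for intuition only.
table = [
    {'subject_uuid': 'u1', 'sex': 'F'},
    {'subject_uuid': 'u2', 'sex': 'M'},
    {'subject_uuid': 'u3', 'sex': 'F'},
]
keys = [{'subject_uuid': 'u1'}, {'subject_uuid': 'u3'}]

def restrict(rows, conditions):
    # keep a row if it matches at least one condition dict (logical OR)
    return [r for r in rows
            if any(all(r[k] == v for k, v in c.items()) for c in conditions)]

print([r['subject_uuid'] for r in restrict(table, keys)])  # ['u1', 'u3']
```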
To include all the subject information needed for the figure, we build the subject query with the function query_subjects:
use_subjects = query_subjects()
use_subjects
One important field in this table, used in Figure 2, is date_trained, which is the first date on which the animal reached the trained criteria.
The summary statistics of the behavior are processed and saved in behavior_analyses.BehavioralSummaryByDate:
behavior_analyses.BehavioralSummaryByDate()
Join the BehavioralSummaryByDate table with the subject query to gather the information together:
b = behavior_analyses.BehavioralSummaryByDate * use_subjects
b
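Conceptually, DataJoint's `*` operator performs a natural join: rows are matched on the attributes the two tables share (here the subject's primary key). A rough pandas analogy with toy data (not the actual tables):

```python
import pandas as pd

# Toy analogy for the natural join above: merge on the shared column.
sessions = pd.DataFrame({'subject_uuid': ['u1', 'u1', 'u2'],
                         'performance_easy': [0.55, 0.80, 0.90]})
subjects_df = pd.DataFrame({'subject_uuid': ['u1', 'u2'],
                            'institution_short': ['UCL', 'CSHL']})
joined = sessions.merge(subjects_df, on='subject_uuid')
print(len(joined))  # 3 rows: each session paired with its subject's info
```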
We can then fetch the contents of the table as a data frame:
behav = b.fetch(order_by='institution_short, subject_nickname, training_day',
format='frame').reset_index()
behav['institution_code'] = behav.institution_short.map(institution_map)
behav
Now we compute how many mice there are for each institution and add this as a column to the data frame:
N = behav.groupby(['institution_code'])['subject_nickname'].nunique().to_dict()
behav['n_mice'] = behav.institution_code.map(N)
behav['institution_name'] = behav.institution_code + \
': ' + behav.n_mice.apply(str) + ' mice'
behav
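The pattern above (groupby, then nunique, then to_dict, then map) broadcasts a per-group count back onto every row. A small self-contained example with made-up data:

```python
import pandas as pd

# Made-up data illustrating the per-institution mouse count above.
df = pd.DataFrame({'institution_code': ['Lab1', 'Lab1', 'Lab1', 'Lab2'],
                   'subject_nickname': ['m1', 'm1', 'm2', 'm3']})
counts = df.groupby('institution_code')['subject_nickname'].nunique().to_dict()
df['n_mice'] = df.institution_code.map(counts)
print(counts)  # {'Lab1': 2, 'Lab2': 1}
```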
In Fig 2a, we plot the performance on easy trials (performance_easy) as a function of training_day for each animal in each institution.

For plotting purposes, we create another column, performance_easy_trained, that contains performance only after the mouse is trained; performance before the trained date is marked as NaN:
behav2 = pd.DataFrame([])
for index, group in behav.groupby(['institution_code', 'subject_nickname']):
    group['performance_easy_trained'] = group.performance_easy
    # mask out performance before the trained date
    group.loc[group['session_date'] < pd.to_datetime(group['date_trained']),
              'performance_easy_trained'] = np.nan
    behav2 = pd.concat([behav2, group])
behav = behav2
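To see what the masking loop does, here is the same date comparison on a toy single-mouse group (the dates below are hypothetical):

```python
import pandas as pd
import numpy as np

# Toy single-mouse group with a hypothetical trained date of 2020-01-02.
g = pd.DataFrame({'session_date': pd.to_datetime(['2020-01-01', '2020-01-02',
                                                  '2020-01-03']),
                  'performance_easy': [0.55, 0.80, 0.92],
                  'date_trained': '2020-01-02'})
g['performance_easy_trained'] = g.performance_easy
# sessions before the trained date get NaN, so they plot in gray only
g.loc[g['session_date'] < pd.to_datetime(g['date_trained']),
      'performance_easy_trained'] = np.nan
print(g['performance_easy_trained'].tolist())  # [nan, 0.8, 0.92]
```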
Finally we generate the figure. The following cell may take some time to run.
behav['performance_easy'] = behav.performance_easy * 100
behav['performance_easy_trained'] = behav.performance_easy_trained * 100
# plot one curve for each animal, one panel per lab
fig = sns.FacetGrid(behav,
col="institution_code", col_wrap=4, col_order=col_names,
sharex=True, sharey=True, aspect=1, hue="subject_uuid", xlim=[-1, 41.5])
fig.map(sns.lineplot, "training_day",
"performance_easy", color='gray', alpha=0.3)
fig.map(sns.lineplot, "training_day",
"performance_easy_trained", color='darkblue', alpha=0.3)
fig.set_titles("{col_name}")
for axidx, ax in enumerate(fig.axes.flat):
ax.set_title(behav.institution_name.unique()[
axidx], color=pal[axidx], fontweight='bold')
# overlay the example mouse
sns.lineplot(ax=fig.axes[0], x='training_day', y='performance_easy', color='black',
data=behav[behav['subject_nickname'].str.contains('KS014')], legend=False)
fig.set_axis_labels('Training day', 'Performance (%) on easy trials')
fig.despine(trim=True)
Performance on easy contrast trials (50% and 100% contrast) across mice and laboratories. Each panel represents a different lab, and each gray curve represents a mouse. The transition from gray to blue indicates when the performance criteria for "trained" are met. Black: performance for the example mouse KS014.
# Plot all labs
fig, ax1 = plt.subplots(1, 1, figsize=(5, 4))
sns.lineplot(x='training_day', y='performance_easy', hue='institution_code', palette=pal,
ax=ax1, legend=False, data=behav, ci=None)
ax1.set_title('All labs', color='k', fontweight='bold')
ax1.set(xlabel='Training day',
ylabel='Performance (%) on easy trials', xlim=[-1, 41.5])
seaborn_style()
plt.tight_layout(pad=2)
behav_summary_std = behav.groupby(['training_day'])[
'performance_easy'].std().reset_index()
behav_summary = behav.groupby(['training_day'])[
'performance_easy'].mean().reset_index()
print('number of days to reach 80% accuracy on easy trials: ')
print(behav_summary.loc[behav_summary.performance_easy >
80, 'training_day'].min())
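The computation above picks the first training day whose across-mice mean exceeds 80%. A toy version with fabricated numbers, just to show the mechanics:

```python
import pandas as pd

# Toy data: mean performance per day is 65, 80, and 92.5;
# only day 3 strictly exceeds 80.
toy = pd.DataFrame({'training_day': [1, 1, 2, 2, 3, 3],
                    'performance_easy': [60, 70, 75, 85, 90, 95]})
summary = toy.groupby('training_day')['performance_easy'].mean().reset_index()
first_day = summary.loc[summary.performance_easy > 80, 'training_day'].min()
print(first_day)  # 3
```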
And that's it! You have now completed the introductory tutorials for navigating and accessing the IBL data pipeline, and hopefully this sets you on a good track to take a deeper dive into this rich and exciting dataset.
Be sure to visit DataJoint.io for further DataJoint learning resources. Also be sure to sign up for our DataJoint Slack group (link on the website) to join the vibrant DataJoint user community!