This tutorial will show you how to access the Synaptic Physiology Dataset from the Allen Institute for Brain Science. This dataset describes properties of synapses that were recorded using patch-clamp electrophysiology in mouse and human neocortical tissue. Its main purpose is to help understand the relationship between cell types and synapse properties in local microcircuits.
The experiments in the Synaptic Physiology Dataset are performed using patch-clamp electrophysiology in brain slices with up to 8 electrodes simultaneously. The resulting data are complex; understanding the limitations of the experimental methods is necessary to avoid analysis mistakes. For complete details on our methods, see our website and recent bioRxiv publication. When in doubt, you can ask questions on our forum with the tag synaptic-physiology.
In patch-clamp electrophysiology, we use glass electrodes to gain direct electrical access to the interior of a neuron. This allows us to precisely control the spiking of individual cells and to record synaptic currents that are too small to be observed by any other method (for now).
In a single experiment, we patch up to 8 neurons simultaneously. Each neuron is stimulated to fire patterns of action potentials while we record the membrane potential of the other neurons. If two neurons are connected by a synapse, we should be able to see synaptic currents within a few ms of each presynaptic spike (although many synapses require averaging to reduce noise before the response is visible).
Each synapse in our dataset is characterized by its responses to many presynaptic spikes -- each spike elicits a "postsynaptic potential" (PSP), and by looking at many responses we can determine properties such as the average amplitude, the latency from spike to response, the PSP shape, and the trial-to-trial variance.
We also rapidly stimulate each cell to determine how the synapse reacts dynamically -- many synapses will either facilitate or depress in this situation.
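Short-term dynamics like these are often summarized with a paired-pulse ratio: the amplitude of the second response in a rapid train divided by the first. The sketch below uses hypothetical amplitudes (not drawn from the dataset) just to illustrate the idea:

```python
import numpy as np

# Hypothetical trial-averaged PSP amplitudes (volts) for 8 spikes in a
# 50 Hz train -- a depressing synapse shrinks with each successive spike.
psp_amps = np.array([0.5e-3, 0.4e-3, 0.33e-3, 0.28e-3,
                     0.25e-3, 0.23e-3, 0.22e-3, 0.21e-3])

# Paired-pulse ratio: second response relative to the first.
# A value < 1 indicates depression; > 1 indicates facilitation.
ppr = psp_amps[1] / psp_amps[0]
print(f"paired-pulse ratio: {ppr:.2f}")
```

For this made-up depressing synapse the ratio is 0.8; a facilitating synapse would yield a ratio above 1.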
The Synaptic Physiology Dataset contains the results of thousands of multipatch experiments. For each experiment, we store three major types of information:
These data are stored in a relational database (an sqlite file) and spread out over many tables. It is possible to access these tables using SQL or sqlalchemy; however, for this tutorial we will use helper methods that handle most of the queries for us.
The diagram above shows a selection of the more commonly used tables in the relational database. The complete set of tables and columns is described in the schema documentation.
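For readers who prefer raw SQL, queries can also be written by hand against the sqlite file. The snippet below builds a toy in-memory database whose `pair` table mimics a small piece of the real schema; consult the schema documentation for the actual table and column definitions before querying the downloaded file:

```python
import sqlite3

# A tiny in-memory database shaped like a fragment of the 'pair' table
# (column names here are illustrative; check the schema documentation).
con = sqlite3.connect(':memory:')
con.execute("CREATE TABLE pair (id INTEGER, has_synapse BOOLEAN)")
con.executemany("INSERT INTO pair VALUES (?, ?)",
                [(1, True), (2, False), (3, True), (4, False)])

# The same style of query could be run against the real database file.
n_connected = con.execute(
    "SELECT count(*) FROM pair WHERE has_synapse = 1").fetchone()[0]
print(n_connected)
```

In practice the helper methods below are easier, since they join the many related tables for you.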
As a simple starting point, let's get a dataframe that describes the properties of all human synapses in the dataset. We will use the function db.pair_query, which returns one row for each cell pair in the database. Cell pairs are ordered, which means that the connections from cell A→B and cell B→A will have two different rows.
from aisynphys.database import SynphysDatabase
db = SynphysDatabase.load_current('small')
query = db.pair_query(
experiment_type='standard_multipatch', # filter: just multipatch experiments
species='human', # filter: only human data
synapse=True, # filter: only cell pairs connected by synapse
synapse_type='ex', # filter: only excitatory synapses
preload=['synapse', 'cell'], # include extra tables that contain synapse AND cell properties
)
synapses = query.dataframe() # query all records and convert to pandas dataframe
print(f"Loaded {len(synapses)} synapses")
synapses.head()
Downloading https://aisynphys.s3-us-west-1.amazonaws.com/synphys_r2.0-pre4_small.sqlite => /home/luke/ai_synphys_cache/database/synphys_r2.0-pre4_small.sqlite (160.5 MB) done.
Loaded 376 synapses
pair.id | pair.experiment_id | pair.pre_cell_id | pair.post_cell_id | pair.has_synapse | pair.has_polysynapse | pair.has_electrical | pair.crosstalk_artifact | pair.n_ex_test_spikes | pair.n_in_test_spikes | ... | synapse_model.ml_variability_stp_induced_state_50hz | synapse_model.ml_variability_change_initial_50hz | synapse_model.ml_variability_change_induction_50hz | synapse_model.ml_paired_event_correlation_1_2_r | synapse_model.ml_paired_event_correlation_1_2_p | synapse_model.ml_paired_event_correlation_2_4_r | synapse_model.ml_paired_event_correlation_2_4_p | synapse_model.ml_paired_event_correlation_4_8_r | synapse_model.ml_paired_event_correlation_4_8_p | synapse_model.meta | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 56768 | 1650 | 9834 | 9829 | True | False | False | NaN | 772 | 59 | ... | -0.537358 | 0.238305 | 0.522540 | 0.004416 | 0.921544 | -0.048564 | 0.278435 | 0.040824 | 0.362318 | {'pair_ext_id': '1525829536.548 7 2'} |
1 | 56794 | 1651 | 9841 | 9840 | True | False | False | NaN | 154 | 84 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | None |
2 | 56787 | 1651 | 9841 | 9836 | True | False | False | NaN | 84 | 84 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | None |
3 | 56733 | 1651 | 9837 | 9839 | True | False | False | NaN | 516 | 84 | ... | 0.551889 | 0.502588 | 1.633543 | 0.022635 | 0.613610 | -0.086233 | 0.053981 | -0.050557 | 0.259160 | {'pair_ext_id': '1525841351.681 2 4'} |
4 | 56718 | 1651 | 9836 | 9837 | True | False | False | NaN | 680 | 84 | ... | 1.532516 | 0.034194 | 1.450314 | 0.046353 | 0.300931 | -0.061078 | 0.172695 | 0.056326 | 0.208631 | {'pair_ext_id': '1525841351.681 1 2'} |
5 rows × 363 columns
synapses['pre_cortical_cell_location.cortical_layer'].unique()
array(['3', None, '2', '4', '5', '6', '1'], dtype=object)
This dataframe contains many columns that describe the properties of each synapse. These columns are described in the documentation, but a few common ones are:
Column | Description
---|---
synapse.psp_amplitude | Median amplitude of resting-state PSPs
synapse.latency | Time between presynaptic spike and PSP onset
synapse.psp_rise_time | Time from onset to peak of averaged PSP
synapse.psp_decay_tau | Time constant of PSP decay phase
dynamics.stp_induction_50hz | A metric of synaptic facilitation / depression induced by 50 Hz spike trains
dynamics.variability_resting_state | Adjusted coefficient of variation of PSP amplitudes
import seaborn
import matplotlib.pyplot as plt
seaborn.histplot(synapses['synapse.psp_amplitude'])
<AxesSubplot:xlabel='synapse.psp_amplitude', ylabel='Count'>
To make the most of this dataset, we will usually want to group synapses based on the cells they connect. For example, how do human synapses differ when their cells are in layer 2 versus layer 3? The dataset offers several sources of information for grouping cells together:
To begin grouping the data, we need to know where to find the relevant cell properties in our dataframe. Again, we could check the documentation, but the most commonly used cell features are:
Column | Description
---|---
cell.cell_class | 'ex' (excitatory) or 'in' (inhibitory)
cell.cre_type | Cre reporter observed in cell (mouse only for now)
cortical_cell_location.cortical_layer | Cortical layer of cell soma
morphology.dendrite_type | 'spiny' (excitatory cells), 'aspiny' (inhibitory cells), or 'sparsely spiny' (ambiguous)
Note that, in the dataframe, each of the column names above is prepended with either pre_ or post_, depending on which cell is described.
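As a quick illustration of this prefix convention, here is how you might select excitatory-to-inhibitory connections from a toy dataframe with the same prefixed column names (values made up for illustration):

```python
import pandas as pd

# Toy rows shaped like the real dataframe's prefixed cell columns
df = pd.DataFrame({
    'pre_cell.cell_class':  ['ex', 'ex', 'in'],
    'post_cell.cell_class': ['in', 'ex', 'ex'],
})

# Select excitatory -> inhibitory connections via the pre_/post_ prefixes
ei = df[(df['pre_cell.cell_class'] == 'ex') &
        (df['post_cell.cell_class'] == 'in')]
print(len(ei))
```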
fig,ax = plt.subplots()
# ax.set_yscale('log')
seaborn.swarmplot(
data=synapses,
x='pre_cortical_cell_location.cortical_layer',
y='dynamics.stp_induction_50hz',
order=['1', '2', '3', '4', '5', '6'],
ax=ax,
)
<AxesSubplot:xlabel='pre_cortical_cell_location.cortical_layer', ylabel='dynamics.stp_induction_50hz'>
Above we can see that most of the synapses in our human dataset come from layers 2 and 3, and also that layer 2 cells tend to show stronger synaptic facilitation than layer 3 cells.
The approach above can take us a long way, but eventually we will want to categorize cells based on more complex criteria, which can get a little messy. To make this easier, we provide tools that take care of the most common tasks in cell categorization. In the example below, we define a set of mouse cell classes based on a combination of transgenic (CRE) reporters, cortical layer, and dendrite morphology:
from aisynphys.cell_class import CellClass, classify_pair_dataframe
cell_categories = {
'L2/3 Excit.': CellClass(cell_class='ex', cortical_layer='2/3'),
'L2/3 Pvalb': CellClass(cre_type='pvalb', cortical_layer='2/3'),
'L2/3 Sst': CellClass(cre_type='sst', cortical_layer='2/3'),
'L2/3 Vip': CellClass(cre_type='vip', cortical_layer='2/3'),
}
We have defined 4 cell categories above. Three categories are layer 2/3 inhibitory cells that express either Pvalb, Sst, or Vip. The fourth category includes all excitatory cells in layer 2/3; these cells might be identified by any combination of excitatory transgenic markers, dendritic morphology, or excitatory synaptic projections. Next, we will load all mouse synapses from the database and create 2 new dataframe columns that indicate the categories of pre- and postsynaptic cells:
# load a dataframe of all mouse synapses
query = db.pair_query(
experiment_type='standard_multipatch', # filter: just multipatch experiments
species='mouse', # filter: only mouse data
synapse=True, # filter: only cell pairs connected by synapse
preload=['synapse', 'cell'], # include extra tables that contain synapse AND cell properties
)
synapses = query.dataframe() # query all records and convert to pandas dataframe
print(f"Loaded {len(synapses)} synapses")
Loaded 2555 synapses
# add columns giving categories for pre and postsynaptic cells
classify_pair_dataframe(cell_categories, synapses, col_names=('pre_category', 'post_category'))
# check:
synapses['pre_category'].unique()
array([None, 'L2/3 Pvalb', 'L2/3 Sst', 'L2/3 Vip', 'L2/3 Excit.'], dtype=object)
With our cells categorized, we can now display any metric grouped by those categories. We have 4 categories of presynaptic cells and 4 of postsynaptic cells, for a total of 16 pre/post combinations. If we ask pandas to make a pivot table based on these categories, we can easily visualize the matrix of results:
latency = synapses.pivot_table(
values='synapse.latency',
index='pre_category',
columns='post_category',
aggfunc='mean',
fill_value=float('nan'),
)
latency
post_category | L2/3 Excit. | L2/3 Pvalb | L2/3 Sst | L2/3 Vip |
---|---|---|---|---|
pre_category | ||||
L2/3 Excit. | 0.001930 | 0.001350 | 0.001548 | 0.001717 |
L2/3 Pvalb | 0.000936 | 0.001091 | 0.001009 | NaN |
L2/3 Sst | 0.001119 | 0.001420 | 0.001399 | 0.001054 |
L2/3 Vip | 0.001723 | 0.001438 | 0.001569 | 0.002233 |
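To see what pivot_table is doing here, the same operation can be run on a tiny hand-made dataframe (column names and values below are made up for illustration):

```python
import pandas as pd

# A miniature stand-in for the synapse dataframe
toy = pd.DataFrame({
    'pre_category':  ['A', 'A', 'B', 'B', 'B'],
    'post_category': ['A', 'B', 'A', 'A', 'B'],
    'latency':       [1.0, 2.0, 3.0, 5.0, 4.0],
})

# Same pivot as above: rows = presynaptic category, columns = postsynaptic
# category, each element the mean latency over its group of rows
table = toy.pivot_table(values='latency', index='pre_category',
                        columns='post_category', aggfunc='mean')
print(table)
# The B->A element averages the two B->A rows: (3.0 + 5.0) / 2 = 4.0
```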
Pandas generates the pivot table above by combining several steps:

1. Create an empty table with all possible pre_category values as rows, and all possible post_category values as columns.
2. Group the synapse rows by their pre_category and post_category.
3. Fill each table element with the mean synapse.latency of all rows that were grouped to that element.

With seaborn, we can visualize this table as a heatmap:
import seaborn as sns
hm = sns.heatmap(
latency * 1000,
cmap='viridis', vmin=1, vmax=2, square=True,
cbar_kws={"ticks":[1, 1.5, 2], 'label': 'Latency (ms)'}
)
hm.set_xlabel("postsynaptic", fontsize=14)
hm.set_ylabel("presynaptic", fontsize=14);
The procedure above is common enough that a single function exists to run the database query, cell categorization, and plotting all in one. Let's try this with a less restrictive set of cell categories:
cell_categories = {
'Excit.': CellClass(cell_class='ex'),
'Pvalb': CellClass(cre_type='pvalb'),
'Sst': CellClass(cre_type='sst'),
'Vip': CellClass(cre_type='vip'),
}
import matplotlib.pyplot as plt
from aisynphys.ui.notebook import cell_class_matrix
fig,ax = plt.subplots(1, 3, figsize=(16, 3))
metrics = ['pulse_amp_90th_percentile', 'stp_induction_50hz', 'psc_rise_time']
for i, metric in enumerate(metrics):
    cell_class_matrix(
        pre_classes=cell_categories,
        post_classes=cell_categories,
        metric=metric,
        class_labels=None, ax=ax[i],
        db=db, pair_query_args={
            'experiment_type': 'standard multipatch',
            'synapse': True,
            'species': 'mouse',
        }
    );
If we check two cells at random, what is the probability of finding a connection between them? If I want to build a network model of the cortex, which cells should I connect together? Measuring connectivity in a dataset like this can be surprisingly complex. Several factors can affect the connectivity we see:
Our first approach to measuring connectivity considers only the first point: we will look at the proportion of cell pairs that were connected, grouped by cell type.
mouse_pairs = db.pair_query(
experiment_type='standard multipatch', # filter: just multipatch experiments
species='mouse', # filter: only mouse data
synapse_probed=True, # filter: only cell pairs that were checked for connectivity
preload=['cell'] # include tables that describe cell properties
).dataframe()
print(f"Loaded {len(mouse_pairs)} cell pairs")
Loaded 23311 cell pairs
A single row in this dataframe contains information about a cell pair, which represents a possible connection from one cell to another.
Note that in prior queries we used synapse=True to select only connected cell pairs. Here we are interested in measuring connectivity, so we also need to know about pairs that were checked for connectivity but where no connection was found. So instead, we used synapse_probed=True, which returns all cell pairs that were checked for connectivity, regardless of whether a connection was found.
Next, we categorize all cells like before, and then measure the proportion of connected pairs for each group:
# add two new columns with pre- and postsynaptic cell categories
classify_pair_dataframe(cell_categories, mouse_pairs, col_names=('pre_category', 'post_category'))
# make a pivot table counting the number of pairs in each matrix element
probed = mouse_pairs.pivot_table(
values='pair.id', index='pre_category', columns='post_category',
aggfunc=len, fill_value=1
)
# pivot again, but this time count the connected pairs
connected = mouse_pairs.pivot_table(
values='pair.has_synapse', index='pre_category', columns='post_category',
aggfunc=sum, fill_value=0
)
# show the table of connected proportions
connected / probed
post_category | Excit. | Pvalb | Sst | Vip |
---|---|---|---|---|
pre_category | ||||
Excit. | NaN | NaN | NaN | NaN |
Pvalb | NaN | NaN | NaN | NaN |
Sst | NaN | NaN | NaN | NaN |
Vip | NaN | NaN | NaN | NaN |
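Connection proportions estimated from a limited number of probed pairs carry substantial sampling uncertainty. One standard way to express this uncertainty is a Wilson score interval on the proportion; the sketch below is hand-rolled for illustration (it is not part of aisynphys):

```python
import math

def wilson_interval(n_connected, n_probed, z=1.96):
    """Approximate 95% Wilson score interval for a connection probability."""
    if n_probed == 0:
        return (0.0, 1.0)
    p = n_connected / n_probed
    denom = 1 + z**2 / n_probed
    center = (p + z**2 / (2 * n_probed)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n_probed
                                   + z**2 / (4 * n_probed**2))
    return (center - half, center + half)

# e.g. 12 connections found out of 80 probed pairs (made-up numbers)
lo, hi = wilson_interval(12, 80)
print(f"p = {12/80:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Intervals like this make it clearer when an apparent difference between two matrix elements could simply reflect small sample sizes.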
Now let's define a larger list of cell classes that combine cortical layer, transgenic types, and morphology:
from aisynphys.cell_class import CellClass
import matplotlib.pyplot as plt
cell_category_criteria = {
'l23pyr': {'dendrite_type': 'spiny', 'cortical_layer': '2/3'},
'l23pv': {'cre_type': 'pvalb', 'cortical_layer': '2/3'},
'l23sst': {'cre_type': 'sst', 'cortical_layer': '2/3'},
'l23vip': {'cre_type': 'vip', 'cortical_layer': '2/3'},
'l4pyr': {'cre_type': ('nr5a1', 'rorb'), 'cortical_layer': '4'},
'l4pv': {'cre_type': 'pvalb', 'cortical_layer': '4'},
'l4sst': {'cre_type': 'sst', 'cortical_layer': '4'},
'l4vip': {'cre_type': 'vip', 'cortical_layer': '4'},
'l5et': {'cre_type': ('sim1', 'fam84b'), 'cortical_layer': '5'},
'l5it': {'cre_type': 'tlx3', 'cortical_layer': '5'},
'l5pv': {'cre_type': 'pvalb', 'cortical_layer': '5'},
'l5sst': {'cre_type': 'sst', 'cortical_layer': '5'},
'l5vip': {'cre_type': 'vip', 'cortical_layer': '5'},
'l6pyr': {'cre_type': 'ntsr1', 'cortical_layer': ('6a','6b')},
'l6pv': {'cre_type': 'pvalb', 'cortical_layer': ('6a','6b')},
'l6sst': {'cre_type': 'sst', 'cortical_layer': ('6a','6b')},
'l6vip': {'cre_type': 'vip', 'cortical_layer': ('6a','6b')},
}
cell_categories = {k:CellClass(name=k, **v) for k,v in cell_category_criteria.items()}
from aisynphys.ui.notebook import generate_connectivity_matrix
fig, ax = plt.subplots(figsize=(14,14))
generate_connectivity_matrix(
db=db,
cell_classes=cell_categories,
pair_query_args={
'experiment_type': 'standard multipatch',
'species': 'mouse',
'synapse_probed': True,
},
ax=ax,
);
The analysis above gives us an estimate of the relative connectivities between cell types, but leaves out some important details. In particular, we know that the probability of finding a connection between any two cells is strongly related to the spatial relationship between the cells and their axo-dendritic morphology.
As an approximation, we think of cell morphology as being cylindrically symmetrical around the axis perpendicular to the cortical surface. This means that the likelihood of two cells being connected by a synapse is strongly related to the lateral distance between their cell bodies (the distance parallel to the cortical surface).
from aisynphys.ui.notebook import show_connectivity_profile
from aisynphys.connectivity import GaussianModel
ei_mask = (
(mouse_pairs['pre_cell.cell_class'] == 'ex') &
(mouse_pairs['post_cell.cell_class'] == 'in') &
(mouse_pairs['pair.lateral_distance'] < 500e-6)
)
ee_mask = (
(mouse_pairs['pre_cell.cell_class'] == 'ex') &
(mouse_pairs['post_cell.cell_class'] == 'ex') &
(mouse_pairs['pair.lateral_distance'] < 500e-6)
)
fig,ax = plt.subplots(1, 2, figsize=(15, 5))
for i, (label, mask) in enumerate({'E->I': ei_mask, 'E->E': ee_mask}.items()):
    x_probed = mouse_pairs[mask]['pair.lateral_distance'].to_numpy(dtype=float)
    conn = mouse_pairs[mask]['pair.has_synapse'].to_numpy(dtype=bool)
    fit = GaussianModel.fit(x_probed, conn)
    show_connectivity_profile(x_probed, conn, ax[i], fit, ymax=0.25)
    ax[i].set_xlim(0, 250e-6)
    ax[i].set_title(f"{label} Gaussian fit pmax={fit.pmax:0.2f}, σ={fit.size*1e6:0.2f}μm")
We see above that connection probability falls off steeply with intersomatic distance. A consequence is that, if two groups of connections are sampled at different intersomatic distances, they may appear to have different rates of connectivity simply as an experimental artifact. This relationship is further shaped by cell type, in this case affecting E->I connections more strongly than E->E connections.
Beyond distance sampling, there are two other major sources of experimental artifacts:
These artifacts are present within our dataset, and they are especially prominent when comparing results across studies that may use very different experimental protocols. It is possible, however, to model these effects and estimate the unbiased connectivity.
For more on this, please see our manuscript on bioRxiv, as well as this supplementary notebook.
Above, we looked briefly at features that describe the dynamic behavior of synapses, including stochasticity and short-term plasticity. Those features are useful for comparing synapses, but they provide an incomplete description of synapse behavior. To capture a more complete picture of synaptic dynamics, we use a model of stochastic vesicle release.
Conceptually, the model is simple: given a list of presynaptic spike times, predict the distribution of likely response amplitudes after each spike. The model has several parameters that combine standard quantal release and short-term plasticity features:
Parameter | Description |
---|---|
n_release_sites | Number of synaptic release zones |
base_release_probability | Resting-state synaptic release probability (0.0-1.0) |
mini_amplitude | Mean PSP amplitude evoked by a single vesicle release |
mini_amplitude_cv | Coefficient of variation of PSP amplitude evoked from single vesicle releases |
depression_amount | Amount of depression (0.0-1.0) to apply per spike. -1 enables vesicle depletion rather than Pr depression. |
depression_tau | Time constant for recovery from depression or vesicle depletion |
facilitation_amount | Release probability facilitation per spike (0.0-1.0) |
facilitation_tau | Time constant for facilitated release probability to recover toward resting state |
measurement_stdev | Extra variance in PSP amplitudes purely as a result of membrane noise / measurement error |
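At its quantal core, a model like this draws a binomial number of released vesicles on each spike: each of n_release_sites releases with some probability, and each released vesicle contributes roughly mini_amplitude to the PSP. The sketch below illustrates only that core, ignoring the facilitation/depression dynamics, so it is not the library's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def release_amplitudes(n_sites, p_release, mini_amp, mini_cv,
                       meas_std, n_trials=10000):
    # Number of vesicles released on each trial: one binomial draw per trial
    n_released = rng.binomial(n_sites, p_release, size=n_trials)
    # Sum of n quantal events: mean n * mini_amp, std mini_amp * cv * sqrt(n)
    quantal = rng.normal(n_released * mini_amp,
                         mini_amp * mini_cv * np.sqrt(n_released))
    # Measurement noise is added on top of the quantal response
    return quantal + rng.normal(0, meas_std, size=n_trials)

amps = release_amplitudes(n_sites=3, p_release=0.5,
                          mini_amp=0.3e-3, mini_cv=0.25, meas_std=50e-6)
# Expected mean response: n_sites * p_release * mini_amp = 0.45 mV
print(amps.mean())
```

The real model additionally updates the release probability from spike to spike, which is what produces facilitation and depression in the trains simulated below.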
Many synapses in the database include maximum-likelihood parameters that can be used to seed this model and make predictions about how the synapse would respond to any input.
from aisynphys.stochastic_release_model import StochasticReleaseModel, StochasticModelRunner
synapses = db.pair_query(
experiment_type='standard multipatch', # filter: just multipatch experiments
species='human', # filter: only human data
synapse=True, # filter: only cell pairs that are connected by a synapse
preload=['cell', 'synapse'] # include tables that describe cell and synapse properties
).dataframe()
# select only synapses with max likelihood model parameters
mask = ~synapses['synapse_model.ml_n_release_sites'].isna()
# pick a random synapse
synapse = synapses[mask].iloc[123]
# make a dictionary of model parameters
model_params = {param:synapse[f'synapse_model.ml_{param}'] for param in StochasticReleaseModel.param_names}
model_params['n_release_sites'] = int(model_params['n_release_sites'])
model_params
{'n_release_sites': 3, 'base_release_probability': 0.762698585902344, 'mini_amplitude': 0.000333520496496931, 'mini_amplitude_cv': 0.251188643150958, 'depression_amount': 0.125, 'depression_tau': 0.107672015410588, 'facilitation_amount': 0.375, 'facilitation_tau': 2.56, 'measurement_stdev': 8.62167096775792e-05}
import numpy as np
# Instantiate a model with the ML parameters chosen above
model = StochasticReleaseModel(model_params)
# Make up a list of presynaptic spike times to test
spike_times = np.array([0.1, 0.11, 0.12, 0.14, 0.2, 0.21, 0.22, 0.23, 0.5, 0.51, 0.52])
# Run the model many times
n_trials = 500
psp_amps = np.empty((n_trials, len(spike_times)))
for i in range(n_trials):
    model_result = model.run_model(spike_times, amplitudes='random')
    psp_amps[i] = model_result.result['amplitude']
# Plot the results
fig,ax = plt.subplots(figsize=(10, 4))
for i in range(n_trials):
    ax.scatter(
        # jitter x-values by ~1 ms so overlapping points remain visible
        spike_times * 1000 + np.random.normal(size=len(spike_times)),
        psp_amps[i] * 1000,
        color=(0, 0, 0, 0.1)
    )
ax.plot(spike_times * 1000, psp_amps.mean(axis=0) * 1000)
ax.set_xlabel('spike time (ms)')
ax.set_ylabel('PSP amplitude (mV)');