scNT-seq

This tutorial uses data from Qiu, Hu, et al (2020). Special thanks also go to Hao for the tutorial improvement.

In this study, by integrating metabolic RNA labeling, droplet microfluidics (Drop-seq) and chemical conversion of 4sU to cytosine analogs with TimeLapse chemistry, we developed scNT-seq to jointly profile newly-transcribed (marked by T-to-C substitutions) and pre-existing mRNAs from the same cell at scale. scNT-seq was used to distinguish new from old mRNAs using neuronal activity-induced genes in primary mouse cortical neurons. By linking cis-regulatory sequences to new transcripts, scNT-seq reveals time-resolved TF activity at single-cell levels.

We analyzed the scNT-seq with Dynamo to obtain time-resolved metabolic labeling based RNA velocity (computed from new and total RNA levels) which outperforms splicing RNA velocity (due to sparsity of intronic reads for many activity-induced genes) on inferring cell state trajectory during initial neuronal activation (0-30 min). Since dynamo explicitly models the labeling process and considers real time, the velocity learned is intrinsically physically meaningful (with the unit of molecules / hour).

Using pulse-chase labeling and scNT-seq, we further determined rates of RNA biogenesis and decay to uncover RNA regulatory strategies during stepwise conversion between pluripotent and rare totipotent two-cell embryo (2C)-like stem cell states.

Finally, we demonstrated that the library complexity can be substantially improved with 2nd strand synthesis reaction. This allows scNT-seq to detect ~6k genes and ~20k transcripts at ~50k reads per cell. We hope that this approach will be useful to many of you out there.

In this tutorial, we will use the newest dynamo to reanalyze the data and highlight a few analysis that is enabled with our vector field reconstruction algorithm. Please note that we recommend the reader to first go through our zerafish tutorial before diving into this tutorial.

In [ ]:
# get the latest pypi version
# to get the latest version on github and other installations approaches, see:
# https://dynamo-release.readthedocs.io/en/latest/ten_minutes_to_dynamo.html#how-to-install
!pip install dynamo-release --upgrade --quiet
In [1]:
import warnings
warnings.filterwarnings('ignore')

import dynamo as dyn
In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt 

dyn.get_all_dependencies_version()
package dynamo-release umap-learn anndata cvxopt hdbscan loompy matplotlib numba numpy pandas pynndescent python-igraph scikit-learn scipy seaborn setuptools statsmodels tqdm trimap numdifftools colorcet
version 0.95.2 0.4.6 0.7.4 1.2.3 0.8.26 3.0.6 3.3.0 0.51.0 1.19.1 1.1.1 0.4.8 0.8.2 0.23.2 1.5.2 0.9.0 49.6.0 0.11.1 4.48.2 1.0.12 0.9.39 2.0.2

Load splicing and labeling data

We prefiltered good quality cells for the labeling data. The same set of high quality cells will also be used to subset the splicing data. We especially focus on cells from excitatory neurons as they have the strongest response to the KCL treatment.

In [3]:
neuron_labeling = dyn.read_h5ad('/Users/xqiu/Dropbox/Projects/dynamo-tutorials/data/scNT_seq.h5ad') # Neu\
neuron_splicing = dyn.read_h5ad('/Users/xqiu/Dropbox (Personal)/dynamo/dont_remove/neuron_splicing_4_11.h5ad') # Neu\
In [4]:
print(neuron_labeling)
print(neuron_splicing)
AnnData object with n_obs × n_vars = 3060 × 37007
    obs: 'nGene', 'nUMI', 'time', 'cluster'
    var: 'gene_short_name'
    obsm: 'X_umap_ori'
    layers: 'new', 'total'
AnnData object with n_obs × n_vars = 13476 × 44021
    obs: 'cellname'
    var: 'gene_short_name'
    layers: 'spliced', 'unspliced'

Subset datasets with neural activity genes

Note that this labeling experiment is a mixture labeling experiment (See discussion of labeling strategies here). In this experiment, the cells are all labeled for 2 hours (so we set label_time as 2) but Kcl was added at different time point prior to cell harvest. Therefore the neural cells will start from the steady state, followed by an activation stage characterized by the KCL-induced depolarization (except the 120 minutes time point).

Since the neural activation dynamics is intricate and happens on the scale of 0-30 minutes, we need to focus our analysis on the subset of neural activity genes (identified in Fig SI2d). Note that this operation is specific for this system and generally not required for systems with a longer time scale.

In [5]:
neuron_labeling.obs['label_time'] = 2 # this is the labeling time 
tkey = 'label_time'

peng_gene_list = pd.read_csv('/Users/xqiu/Dropbox (Personal)/dynamo/dont_remove/0408_grp_info.txt', sep='\t')

neuron_labeling = neuron_labeling[:, peng_gene_list.index]

Let us also subset the splicing data with the same set of cells and genes from the labeling data.

In [6]:
neuron_splicing = neuron_splicing[neuron_labeling.obs_names, neuron_labeling.var_names]
neuron_labeling.obs['time'] = neuron_labeling.obs.time.astype('category')
neuron_splicing.obs = neuron_labeling.obs.copy()
Trying to set attribute `.obs` of view, copying.

Run a typical dynamo analysis

This analysis consists the following five steps:

1. the preprocessing procedure; 
2. kinetic estimation and velocity calculation;
3. dimension reduction;
4. high dimension velocity projection;
5. vector field reconstruction 

Note that dynamo is smart enough to automatically detect your data type (splicing or labeling data) and choose the proper estimation method to analyze your data. The kinetic estimation and velocity calculation framework in dynamo is extremely sophisticated and comprehensive. See here for an overall description. Full details will be described in our final publication.

Note that setting calc_rnd_vel=True in dyn.tl.cell_velocities will enable us to calculate randomized velocity which provides support for the specificity of the observed time-resolved RNA velocity flow.

In [7]:
def dynamo_workflow(adata):
    adata = dyn.pp.recipe_monocle(adata)

    dyn.tl.dynamics(adata)

    dyn.tl.reduceDimension(adata)

    dyn.tl.cell_velocities(adata, calc_rnd_vel=True)
    
    dyn.vf.VectorField(adata, basis='umap')
    
In [8]:
%%capture 

dynamo_workflow(neuron_splicing)
dynamo_workflow(neuron_labeling)

Time resolved dynamo velocity outperforms splicing velocity

In the following, I will visualize the splicing (left) and labeling (right) velocity streamline plots side by side. Cells are color coded by time points. The streamlines indicate the integration paths that connect local projections from the observed state to the extrapolated future state. It is apparent that labeling velocity reveals a much more coherent trasition from cells at time 0 to 15, 30, 60 and 120 minutes than that of the splicing velocity. Interestingly, we can also see that the cells start with quick velocity until about 30 minutes and then slow down and gradually move backward to the original cell state at 120 minute, indicating a fast polorization phase followed a slow depolorization phase. Are those observed velocity results significant? Let us take a look at the randomized velocity results.

In [9]:
fig1, f1_axes = plt.subplots(ncols=2, nrows=1, constrained_layout=True, figsize=(12, 4))
f1_axes[0] = dyn.pl.streamline_plot(neuron_splicing, color='time', color_key_cmap = 'viridis', basis='umap', ax=f1_axes[0], show_legend='right', save_show_or_return='return')
f1_axes[0].set_title('splicing')
f1_axes[1] = dyn.pl.streamline_plot(neuron_labeling, color='time', color_key_cmap = 'viridis', basis='umap', ax=f1_axes[1], show_legend='right', save_show_or_return='return')
f1_axes[1].set_title('labeling')
plt.show()