Time-resolved scRNA-seq

RNA velocity

Although the seminal RNA velocity work is exciting, it has the following limitations:

  1. Because the intron reads are generated through mis-priming on polyA- or polyT- enriched intronic regions of nascent pre-RNA, it can be difficult to apply conventional RNA velocity to most transcription factors, which are typically expressed at low levels, and genes with no polyA/T-enriched intron regions;
  2. Although many biological systems, including hematopoiesis, involve rapid but coordinated changes of RNA transcription rates for a considerable number of genes (Barile et al., 2021), existing estimation methods of splicing RNA velocity (Bergen et al., 2020; La Manno et al., 2018) all assume constant transcription rates ($\alpha_{on}$ for the induction phase, $\alpha_{off}$ for the repression phase) and often lead to nonsensical backward RNA velocity flow.
  3. The linear regression methods used by conventional RNA velocity ignores the distribution of unspliced and spliced RNA, which can be used to improve the estimation of kinetic parameters;
  4. For systems far away from the pseudo-steady state, using cells with extreme RNA expression levels for linear regression may lead to inaccurate velocity calculations for most cells;
  5. The time scale for conventional RNA velocity ($v = u - \tilde{\gamma} s$) is scaled by $\beta$ (since $\tilde{\gamma}$ is the ratio of $\beta$ and $\gamma$), which makes the estimated velocity a relative quantity;
  6. Conventional RNA velocity only estimates velocity for observed cells. Thus, it is a discrete, sparse, and local measure of cell dynamics and often merely used as a descriptive instead of a predictive tool.

A great deal of effort has been devoted to improving conventional RNA velocity estimation (La Manno et al., 2018) in regard to challenges 3) and 4) and extend the concept to “protein velocity” (Gorin et al., 2020), but 1), 2), and 5) are fundamental limitations that cannot be resolved at the computational level without additional experimental information. In dynamo, we introduce our methods for analyzing conventional scRNA-seq data, addressing some of the issues with existing RNA velocity methods. More importantly, we also focus on computational methods for computing RNA velocity for metabolic labeling data, which reconciles the splicing- and labeling-based kinetics and overcomes other drawbacks of conventional RNA velocity methods. Finally, to address 6), we go beyond RNA velocity samples of single cells to map the continuous vector field functions in transcriptomic space and perform sophisticated differential geometry analyses to gain various functional vector field predictions and biological insights.

How does metabolic labeling work

How can we quantify nascent RNA via metabolic labeling? Overall there are two different methods, the biotin purification or chemical conversion based approach. Both approaches are quiet similar in that we first need to applying different labeling strategies to label the cells. For biotin purification, we need to use thiol-specific biotinylation to tag labeled mRNA. Then the streptavidin beads can be used to pull down and separate the pre-exisiting RNA and newly transcribed RNA. Then we will follow by preparing two separate libraries, old and new RNAs, for sequencing. There are a few very well known issue regarding this method:

  1. it often introduces 20-30% cross-contanimation between old and new RNAs,
  2. it also leads to some normalization issues between different libraries.

On the other hand, the chemical conversion based approaches avoid the labrious and error-prone procedure of separating old/old RNA and preparing two different libraries and emerged as the favored strategy recently. The key idea of chemical conversion based methods are that by some chemical reaction we can artificially introduce T to C mutation which can then be used to distinuigh labelled and thus new RNA from old RNA. There are about three different chemistry developed: IAA alkylation or hydrogen bond reconfiguration via TimeLapse-seq or TUC-seq chemistry.

In fact, metabolic labeling has been widely adapted for the past a few decades. We can use various nucleotides to label RNA, for example, BrU, Eu and Biotin-NTP. For 4sU based labeling, there are about three different strategies, namely, SLAM-seq, TUC-seq, and Time-lapse-seq.

Metabolic labeling based scRNA-seq

Recently a few groups adapted the bulk method to either the plate-based scRNA-seq with SMART-seq2 method, for example, scSLAM-seq or NASC-seq. scEU-seq is based on CEL-Seq2 and is also plate-based but uses UMI in contrast to scSLAM-seq or NASC-seq. The scEU-seq method is based on EU and required purification and it thus may involve cross-contanimation or normalization issues.

Cao, et al recently developed sci-fate which integrates 4sU labeling with combinatorial indexing based scRNA-seq, thus it can potentially enable measuring hundread thousands of single cells.

For the first time, Wu lab from Upenn developed the RNA metabolic labeling argumented scRNA-seq based on drop-seq, scNT-seq.

Comparison between different labeling based scRNA-seq methods

In Qiu, Hu, et. al, we performed a detailed comparison (Supplementary table 7) between scNT-seq with other available methods. Especially for the improved second-strand synthesis based strategy, we are able to obtain substantially high number of genes and UMIs per cell with relatively few number of reads. Thus scNT-seq is arguably one of the best metabolic labeling based scRNA-seq strategies.

In our study, we show that dynamo can be used to leverage scNT-seq datasets for time-resolved RNA-velocity analysis. Those results demonstrate the power of dynamo and scNT-seq in revealing the fine-grained transcriptomic dynamics that cannot be captured by coventional methods based solely on splicing data.

Labeling strategies

We can be very creative and smart in designing the metabolic labeling experiments. We suggest three general labeling strategies, namely one-shot, kinetics/pulse, and degradation/ chase experiments, which cover most labeling methods in published tscRNA-seq experiments, aimed at estimating different RNA kinetic parameters (See figure below). It is possible to extend or combine these three general labeling strategies to more complicated labeling schemes, e.g., the fourth type in (See figure below), which consists of a time-series of multiple kinetics experiments, or a mixture experiment as in scEU-seq study (Battich et al., 2020), in which a variable initial kinetics experiment and later accompanying degradation experiment are conducted but kept in a fixed time period.

For example, you can design an experiment where you can take different days and perform a kinetics experiment at each day. This can help you obtain transcription rate, splicing and degradation rate over time. But this is often time-consuming, so we may just choose a typical day for a single kinetics experiment. In addition, we may also perform a degradation experiment where we label the cells with 4sU for an extended time period to saturate the 4sU labeling in cells. Then we can wash out the 4sU and replaced with excess U, followed by chasing at different time points. This can help us to estimate the splicing and degradation rates (and half life) of RNA. We can also just design a one-shot labeling experiment to label cells at different time points. Since splicing and degradation rate of mRNA can be reasonably assumed to be constants, when combining one-shot experiments with degradation experiments, we are able to get even more accurate estimates of the transcription rates at each time point. We also want to note that we can combine different labeling strategies, for exmple, combining pulse chase in a single experiment or integrating metabolic labeling with drug treatment or genetic perturbations.

Dynamo's comprehensive model framework for analyzing lableing datasets

To develop a unified framework for extracting RNA dynamics information from cscRNA-seq and tscRNA-seq datasets, we constructed an inclusive model (Fig. 2A from Qiu, et. al) that considers on-off switching of promoter state, RNA metabolic labeling (when using tscRNA-seq data), RNA splicing and degradation, and even translation and protein degradation when transcriptomic– proteomic coassay data are available. To account for different data types and experiments, we further implement three reduced models: Model 1 considers RNA transcription, splicing and degradation, but not RNA metabolic labeling, and is tailored for cscRNA-seq, whereas both Models 2 and 3 are tailored for tscRNA-seq with metabolic labeling, with the difference that only Model 3 considers RNA splicing (Fig. SI2A from Qiu, et. al).

When only cscRNA-seq data are available, or when one needs to use splicing data from tscRNA-seq experiments (which also capture intrinsic splicing kinetics), dynamo can be used to estimate the relative degradation rate constant ($\hat{\gamma} = \gamma / \beta$) and relative spliced RNA velocity (Fig. 2B, top from Qiu, et. al). The estimation methods built upon Model 1 from Fig. SI2A from Qiu, et. al include both the original method (La Manno et al., 2018) and the generalized method of moments (GMM). The GMM, in turn, consists of the stochastic splicing method, which relies on a master equation formulation of RNA kinetics (see MATERIALS AND METHODS) and is equivalent to the stochastic method developed recently (Bergen et al., 2020), and a new approach, the negative binomial (NB) method, which additionally models the gene expression at steady state as a negative binomial (NB) distribution, in the same vein as in previous studies (Grün et al., 2014).

By comparison, from a tscRNA-seq experiment, one can estimate the absolute kinetic parameters ($\alpha, \beta, \gamma$) and calculate absolute unspliced, spliced, new, or total RNA velocity, depending on the exact labeling strategy and underlying model used (Fig. 2B, bottom from Qiu, et. al). We suggest three general labeling strategies, namely one-shot, kinetics/pulse, and degradation/chase experiments, which cover most labeling methods in published tscRNA-seq experiments, aimed at estimating different RNA kinetic parameters (Fig. 2C from Qiu, et. al). It is possible to extend or combine these three general labeling strategies to more complicated labeling schemes, e.g., the fourth type in (Fig. 2C from Qiu, et. al), which consists of a time-series of multiple kinetics experiments, or a mixture experiment as in scEU-seq study (Battich et al., 2020), in which a variable initial kinetics experiment and later accompanying degradation experiment are conducted but kept in a fixed time period.

Estimating the parameters and RNA velocities with labeling data involves some technical subtleties, which we took into account when developing the corresponding algorithms tailored for each labeling strategy. Overall, we estimate absolute splicing and degradation constants ($\beta, \gamma$) by first estimating the degradation rates from labeling data and then the scaled degradation rate constant ($\hat{\gamma} = \gamma / \beta$) from splicing data, followed by obtaining a confident splicing rate constant (See MATERIALS AND METHODS and SUPPLEMENTARY METHODS from Qiu, et. al for details). For kinetics/chase experiments, we designed a two-step method (Fig. 2D-I from Qiu, et. al), in which we first compute the slope, $k$, for the labeled ($l$) and total ($r$) RNA for different labeling time period $t$, based on the linear regression $l = k\gamma$ where the $k$ at a particular time follows $k(t) = (1-e^{-\gamma t})$ (see MATERIALS AND METHODS, Fig. 2E left, 2H from Qiu, et. al. This leads to a second linear relationship between the labeling time $t$ and $-ln(1-k)$, whose slope is $\gamma$ (Fig. 2E right, 2I from Qiu, et. al).

The estimation framework described above is generally applicable to various data types, labeling strategies and biological systems, which also reconciles the programmable labeling kinetics and intrinsic splicing kinetics to absolute achieve accurate quantification of RNA synthesis and turnover rates, enabling functional biological discoveries.