This tutorial demonstrates how to use CellRank's interface to Waddington OT (WOT) to work with time-series scRNA-seq data Schiebinger et al., Cell 2019. The interface is constructed via an external kernel. In CellRank, we call everyting that computes a cell-cell transition matrix a kernel, see our Kernels and estimators tutorial. An external kernel wraps around another method (like Waddington OT) so that you can use it from within CellRank, through the usual API. If you have your own method that you would like to integrate as an external kernel in CellRank, see our Creating a new kernel tutorial.

We'll demonstrate this using the original data from the WOT publication Schiebinger et al., Cell 2019, i.e. reprogramming of mouse embryonic fibroblasts (MEFs) towards induced pluripotent stem cells (iPSCs) across 39 time points spanning days 0-18 past reprogramming. We'll restrict our attention to the serum condition and subsample the data to speed up computations.

In [1]:

```
import sys
# the installation can take ~10mins
# ignore the messages: ERROR: Failed building wheel for <package>
# as long as it later print: Running setup.py install for <package> ... done
if "google.colab" in sys.modules:
!pip install -q git+https://github.com/theislab/[email protected]#egg=cellrank[external,krylov]
```

In [1]:

```
import matplotlib.pyplot as plt
import scvelo as scv
import scanpy as sc
import cellrank as cr
# import CellRank kernels and estimators
from cellrank.external.kernels import WOTKernel
from cellrank.tl.kernels import ConnectivityKernel
from cellrank.tl.estimators import GPCCA
# set verbosity levels
cr.settings.verbosity = 2
scv.settings.verbosity = 3
# figure settings
scv.settings.set_figure_params(
"scvelo", dpi_save=400, dpi=80, transparent=True, fontsize=20, color_map="viridis"
)
```

Import the data.

In [2]:

```
adata = cr.datasets.reprogramming_schiebinger()
adata
```

Out[2]:

This data contains cells cultured in both 2i as well as serum conditions - we'll focus on the serum condition here.

In [3]:

```
adata = adata[adata.obs["serum"] == "True"].copy()
```

Subsample to speed up the analysis - this tutorial is meant to run in a couple of minutes on a laptop. It's not a problem for CellRank to do any of the computations here on the full data, we'd just have to wait a bit longer.

In [4]:

```
sc.pp.subsample(adata, fraction=0.25, random_state=0)
adata
```

Out[4]:

Let's visualize this data, using the original force-directed layout as well as annotated cell sets. We simplifed the original cell-set annotations for our purposes here.

In [5]:

```
scv.pl.scatter(adata, basis="force_directed", c="day", legend_loc="none")
```

Each dot is a cell in the force-direction embedding, colored according to one of the 39 time-points of sequencing, from early (day 0, in black) to late (day 18, in yellow).

In [6]:

```
scv.pl.scatter(adata, basis="force_directed", c="cell_sets", legend_loc="right")
```

This datset has already been normalized by total counts and log2-transformed. Further, highly variable genes have already been annotated. We can thus direclty compute PCA and a KNN graph, which we'll need for later.

In [7]:

```
sc.pp.pca(adata)
```

In [8]:

```
sc.pp.neighbors(adata, random_state=0)
```

WOT incorporates cellular growth and death alongside differentiation. To initialze the computation of growth rates, we can use pre-compiled sets of proliferation- and apoptosis-related genes. Note that WOT uses the concept of unbalanced optimal transport which means our initial guess doesn't have to be exact - it's optimized in an iterative fashion internally. Please see the original WOT tutorial for more information on this.

Initialize the WOT kernel based on the `AnnData`

object

In [9]:

```
wk = WOTKernel(adata, time_key="day")
```

Estimate growth rates based on gene-sets for proliferation and apoptosis.

In [10]:

```
wk.compute_initial_growth_rates(organism="mouse", key_added="growth_rate_init")
scv.pl.scatter(
adata, c="growth_rate_init", legend_loc="right", basis="force_directed", s=10
)
```

During the computation of transport maps between adjacent time-points, WOT incorporates cellular growth and death. The output of the method is a list of transport maps, one for each pair of time points ($t_i, t_{i+1}$). These are saved in `wk.transport_maps`

. Rows in these transport maps do not need to sum to one because depending on whether a cell is likely to proliferate or die, it's outgoing probability mass will be larger or smaller than one, respectively.

CellRank, however, is based on Markov chains where we need one single transition matrix with rows that sum to one. For this reason, in addition to the saved `transport_maps`

, we accumulate the individual transport maps into one large transition matrix which we row-normalize.

Further, for `N`

time points, transition probabilities for cells in the Nth time point aren't defined since there is nowhere to transport their probability mass. However, to treat the system as a Markov chain, we need to assign transition probabilites for final-day cells as well. In this example, we'll use transcriptomic similarity for this purpose via the argument `last_time_point='connectivities'`

. That means we'll compute transition probabilities for earlier day cells to later day cells using optimal transport, however, once cells have reached the last time-point, we'll compute transition probabilities among final-day cells just based on their gene expression similarity.

In [11]:

```
wk.compute_transition_matrix(
growth_iters=3, growth_rate_key="growth_rate_init", last_time_point="connectivities"
)
```

Out[11]:

The method argument `growth_iters`

defines how many iterations we use to fine-tune the cellular growth rates. For other parameters, see the original WOT documentation.

To gain some intuition for the transition matrix we have just computed, we can simualte some random walks and plot them in the embedding.

In [12]:

```
wk.plot_random_walks(
n_sims=300,
max_iter=200,
start_ixs={"day": 0.0},
basis="force_directed",
c="day",
legend_loc="none",
linealpha=0.5,
dpi=150,
)
```

Black dots indicate the start of a random walk (can't really see them here, but they come from the lower right corner where day 0 MEFs are located) and yellow dots inicate the end. If we compare with the cluster labels below, we can observe that most random walks terminate in either the Trophoblast, Stromal, Epithelial, Neural or IPS clusters. We'll make this more quantitative below using pyGPCCA to compute macrostates. Generalized Perron Cluster Cluster Analysis (GPCCA) is a fast and mathematically sound way to define macrostates - sets of cells that have some level of metastability Reuter et al., JCP (2019).

In [13]:

```
scv.pl.scatter(adata, c="cell_sets", basis="force_directed", legend_loc="right")
```