In this notebook, we demonstrate how to use scVI models trained on our fetal immune atlas for query-to-reference mapping of external datasets, making use of the amazing scvi-tools.
Note: this analysis is much faster with a GPU!
We need a specific version of scvi-tools; otherwise, unexpected errors and inconsistencies with PyTorch might come up.
!pip install scvi-tools==0.14.0 scanpy leidenalg
Successfully installed docrep-0.3.2 fsspec-2022.1.0 pyDeprecate-0.3.0 pytorch-lightning-1.3.8 scvi-tools-0.14.0
import scanpy as sc
import scvi
import numpy as np
import pandas as pd
import numpy.random as random
import scipy
import anndata
import matplotlib.pyplot as plt
Set up use of GPU
import torch
# Fall back to the CPU when no GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
In this tutorial we use as query scRNA-seq data from the fetal gut generated by Elmentaite et al. 2020.
!wget https://cellgeni.cog.sanger.ac.uk/gutcellatlas/fetal_RAWCOUNTS_cellxgene.h5ad
2022-01-12 16:49:52 (101 MB/s) - ‘fetal_RAWCOUNTS_cellxgene.h5ad’ saved [1533655305/1533655305]
We load the anndata object and subset to the immune cells, to speed up this tutorial.
query_adata = sc.read_h5ad('./fetal_RAWCOUNTS_cellxgene.h5ad')
query_adata = query_adata[query_adata.obs.cell_type_group == 'immune'].copy()
Before mapping to the fetal immune reference model, we need to check two things: (A) the model takes raw gene counts as input, so we check that the data in query_adata.X is not normalized, and (B) the saved models use Ensembl IDs as variable names, so we need to make sure that query_adata.var_names contains Ensembl IDs rather than gene names.
def _verify_counts(adata):
    return all(not (i % 1) for i in adata.X[0, :].toarray()[0])

if not _verify_counts(query_adata):
    raise ValueError('`query_adata.X` does not contain raw counts.')
if not query_adata.var_names.str.startswith("ENS").all():
    raise ValueError('`query_adata.var_names` are not Ensembl geneIDs. Please convert')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_11162/3692709436.py in <module>
      5     raise ValueError('`query_adata.X` does not contain raw counts.')
      6 if not query_adata.var_names.str.startswith("ENS").all():
----> 7     raise ValueError('`query_adata.var_names` are not Ensembl geneIDs. Please convert')

ValueError: `query_adata.var_names` are not Ensembl geneIDs. Please convert
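The raw-count check in this notebook inspects only the first cell of the matrix. As a minimal sketch (not part of the original workflow), a stricter check could look at every stored value of the matrix instead. The function name has_integer_counts and the toy matrices are assumptions made for illustration.

```python
import numpy as np
from scipy import sparse

def has_integer_counts(X):
    """Return True if every value in X is a non-negative integer."""
    # For sparse matrices only the explicitly stored values need checking;
    # implicit zeros are integers by definition.
    values = X.data if sparse.issparse(X) else np.asarray(X).ravel()
    return bool(np.all(values >= 0) and np.all(np.mod(values, 1) == 0))

# Toy matrices standing in for query_adata.X (made-up values)
raw = sparse.csr_matrix(np.array([[0, 3, 1], [2, 0, 0]], dtype=np.float32))
normalized = sparse.csr_matrix(np.array([[0.0, 1.5, 0.5], [1.0, 0.0, 0.0]]))

print(has_integer_counts(raw))         # True
print(has_integer_counts(normalized))  # False
```

Checking the stored values of a sparse matrix avoids densifying it, so this should stay cheap even for large datasets.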
## Change var_names to ensemblIDs
query_adata.var['gene_names'] = query_adata.var_names.values.copy()
query_adata.var_names = query_adata.var['gene_ids'].values
We download the scVI model trained on all immune cells in the fetal immune atlas. Other models trained on lineage subsets (e.g. myeloid cells, NK/T cells) are also available.
!wget https://cellgeni.cog.sanger.ac.uk/developmentcellatlas/fetal-immune/scVI_models/scvi_HSC_IMMUNE_model.tar.gz
2022-01-12 16:50:03 (165 MB/s) - ‘scvi_HSC_IMMUNE_model.tar.gz’ saved [20084377/20084377]
!tar -xf scvi_HSC_IMMUNE_model.tar.gz
model_dir = './scvi_HSC_IMMUNE_model/'
Now we are ready to map the query to the reference.
We first need to identify the column in query_adata.obs that defines the technical batches from which cells come, and rename it to match the ID used during reference training (bbk). For simplicity, here we assume all cells come from the same technical batch.
query_adata.obs["bbk"] = 'fetal_gut'
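If the query does contain several technical batches, the batch column can be derived from an existing metadata column instead of a constant. The sketch below works on a toy DataFrame standing in for query_adata.obs. The helper set_batch_key, the donor labels, and the choice of Donor_id as the batch-defining column are assumptions for illustration; which column actually captures technical batches depends on the dataset.

```python
import pandas as pd

def set_batch_key(obs, source_col, batch_key="bbk"):
    """Copy an existing obs column into the batch column expected by the model."""
    obs = obs.copy()
    obs[batch_key] = obs[source_col].astype(str)
    return obs

# Toy stand-in for query_adata.obs (made-up donor labels)
obs = pd.DataFrame(
    {"Donor_id": ["donorA", "donorA", "donorB"]},
    index=["cell1", "cell2", "cell3"],
)
obs = set_batch_key(obs, "Donor_id")
print(obs["bbk"].value_counts())
```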
Next, we need to handle genes that are missing from the query. These reference models were trained on 7,500 highly variable genes from the reference dataset, excluding cell cycle genes and TCR/BCR genes. We need to make sure that the majority of the genes used for training are profiled in the query dataset.
var_names_model = pd.read_csv(model_dir + "var_names.csv", header=None)[0].values
is_in_query_var = pd.Series(var_names_model).isin(query_adata.var_names)
n_genes = len(var_names_model[~is_in_query_var])
print("% of genes missing from query: {p}%".format(p=np.round((n_genes/len(var_names_model))*100,2)))
% of genes missing from query: 1.36%
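The same check can be wrapped in a small guard that stops the workflow when too many model genes are absent. This is a sketch only: the function names, the made-up gene IDs, and the 10% default tolerance are assumptions, and the appropriate tolerance should be chosen for the dataset at hand.

```python
def missing_gene_fraction(model_genes, query_genes):
    """Fraction of model genes that are absent from the query."""
    query_set = set(query_genes)
    n_missing = sum(g not in query_set for g in model_genes)
    return n_missing / len(model_genes)

def check_gene_overlap(model_genes, query_genes, max_missing=0.10):
    """Raise if more than max_missing of the model genes are missing."""
    frac = missing_gene_fraction(model_genes, query_genes)
    if frac > max_missing:
        raise ValueError(f"{frac:.1%} of model genes are missing from the query")
    return frac

# Made-up IDs for illustration: one of four model genes is missing
model_genes = ["ENSG01", "ENSG02", "ENSG03", "ENSG04"]
query_genes = ["ENSG01", "ENSG02", "ENSG04", "ENSG99"]
print(check_gene_overlap(model_genes, query_genes, max_missing=0.5))  # 0.25
```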
We replace missing reference genes in the query with zeros, following the workflow proposed by Lotfollahi et al.:
[...] integration performance was robust when 10% (of 2,000 genes) were missing from query data.
## Zero-filling
empty_X = np.zeros(shape=[ query_adata.n_obs, n_genes])
empty_query_adata = anndata.AnnData(X=empty_X, obs=query_adata.obs)
empty_query_adata.var_names = var_names_model[~is_in_query_var]
empty_query_adata.var_names.names = ["index"]
query_adata_filled = anndata.concat([query_adata, empty_query_adata], axis=1)
query_adata_filled = query_adata_filled[:,var_names_model].copy()
query_adata_filled.obs = query_adata.obs.copy()
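To make the effect of zero-filling concrete, here is a minimal sketch on a plain NumPy array rather than an AnnData object. The helper zero_fill and the gene names are hypothetical. It reorders the query columns to the model's gene order and leaves all-zero columns for genes the query does not profile.

```python
import numpy as np

def zero_fill(X, query_genes, model_genes):
    """Return X with columns ordered as model_genes; absent genes become zero columns."""
    col_of = {g: j for j, g in enumerate(query_genes)}
    filled = np.zeros((X.shape[0], len(model_genes)), dtype=X.dtype)
    for j, gene in enumerate(model_genes):
        if gene in col_of:
            filled[:, j] = X[:, col_of[gene]]
    return filled

# Two cells, three query genes; "geneX" is in the model but not in the query
X = np.array([[1, 2, 3], [4, 5, 6]])
query_genes = ["geneA", "geneB", "geneC"]
model_genes = ["geneC", "geneX", "geneA"]

print(zero_fill(X, query_genes, model_genes))
# [[3 0 1]
#  [6 0 4]]
```

Query genes that are not in the model (geneB here) are simply dropped, which mirrors the subsetting to var_names_model in the cell above.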
Now we are all set to train our model on the query data. We follow the workflow described in the reference mapping tutorial of scvi-tools.
## Load new model with the query data
vae_q = scvi.model.SCVI.load_query_data(
    query_adata_filled,
    model_dir,
    inplace_subset_query_vars=True
)
INFO     .obs[_scvi_labels] not found in target, assuming every cell is same category
INFO     Using data from adata.X
INFO     Registered keys:['X', 'batch_indices', 'labels']
INFO     Successfully registered anndata object containing 1967 cells, 7500 vars, 34 batches, 1 labels, and 0 proteins. Also registered 0 extra categorical covariates and 0 extra continuous covariates.
/opt/conda/envs/ed6/lib/python3.8/site-packages/scvi/model/base/_archesmixin.py:95: UserWarning: Query integration should be performed using models trained with version >= 0.8
  warnings.warn(
/opt/conda/envs/ed6/lib/python3.8/site-packages/sklearn/utils/deprecation.py:87: FutureWarning: Function transfer_anndata_setup is deprecated; This method will be removed in 0.15.0. Please avoid building any new dependencies on it.
  warnings.warn(msg, category=FutureWarning)
## Train
vae_q.train(max_epochs=200, plan_kwargs=dict(weight_decay=0.0))
GPU available: True, used: True TPU available: False, using: 0 TPU cores LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Epoch 200/200: 100%|██████████| 200/200 [00:26<00:00, 7.65it/s, loss=988, v_num=1]
Now we can extract the latent representation learnt for the query dataset. We save this on the original dataset, which includes all the genes.
query_adata.obsm["X_scvi"] = vae_q.get_latent_representation()
We can use the learnt latent embedding for clustering and visualization of the query dataset. Here we can already see that the scArches mapping allows us to separate cells by cell type, rather than by donor identity.
sc.pp.neighbors(query_adata, use_rep="X_scvi")
sc.tl.leiden(query_adata)
sc.tl.umap(query_adata)
sc.pl.umap(query_adata, color=['cell_name_detailed','Donor_id'], wspace=0.3)
Looking at marker genes for immune cell populations, we can see that these distinguish different clusters of query cells.
query_adata.var_names = query_adata.var['gene_names'].values
sc.pp.normalize_per_cell(query_adata)
sc.pp.log1p(query_adata)
immune_markers = ['CD3D',
'MS4A1',
'CD14',
'KLF1', "HBD", "ITGA2B",
"GATA2", 'CD34',
'KIT', "IL1R1",
'NKG7','KLRD1',
]
sc.pl.umap(query_adata, color=immune_markers, cmap='magma')
We can also jointly analyse the mapped query data with the reference data. To do this we need to download the full dataset (the download might take a few minutes, and loading the object requires a sufficient amount of RAM).
!wget https://cellgeni.cog.sanger.ac.uk/developmentcellatlas/fetal-immune/PAN.A01.v01.raw_count.20210429.HSC_IMMUNE.embedding.h5ad
2022-01-12 17:07:30 (42.8 MB/s) - ‘PAN.A01.v01.raw_count.20210429.HSC_IMMUNE.embedding.h5ad’ saved [11174824039/11174824039]
ref_adata = sc.read_h5ad("./PAN.A01.v01.raw_count.20210429.HSC_IMMUNE.embedding.h5ad")
ref_adata
AnnData object with n_obs × n_vars = 593203 × 33538
    obs: 'n_counts', 'n_genes', 'file', 'mito', 'doublet_scores', 'predicted_doublets', 'old_annotation_uniform', 'organ', 'Sort_id', 'age', 'method', 'donor', 'sex', 'Sample', 'scvi_clusters', 'is_maternal_contaminant', 'anno_lvl_2_final_clean', 'celltype_annotation'
    var: 'GeneID', 'GeneName', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
    uns: 'leiden', 'scvi', 'umap'
    obsm: 'X_scvi', 'X_umap'
    obsp: 'scvi_connectivities', 'scvi_distances'
To speed up computations in this tutorial, we subsample the reference dataset to 20% of its cells for joint analysis (e.g. UMAP embedding can take several minutes on a dataset of > 500k cells).
sc.pp.subsample(ref_adata, fraction=0.2)
## Convert var_names to ensembl IDs
ref_adata.var_names = ref_adata.var.GeneID
query_adata.var_names = query_adata.var.gene_ids
## Merge
concat_adata = anndata.concat([ref_adata, query_adata], axis=0,
label="dataset", keys=["reference", "query"],
join="outer", merge="unique", uns_merge="unique")
concat_adata.obs_names = concat_adata.obs_names + "-" + concat_adata.obs["dataset"].astype("str")
concat_adata
AnnData object with n_obs × n_vars = 120607 × 34241
    obs: 'n_counts', 'n_genes', 'file', 'mito', 'doublet_scores', 'predicted_doublets', 'old_annotation_uniform', 'organ', 'Sort_id', 'age', 'method', 'donor', 'sex', 'Sample', 'scvi_clusters', 'is_maternal_contaminant', 'anno_lvl_2_final_clean', 'celltype_annotation', 'CRL', 'Enrichment_fraction', 'PCW', 'Donor_nb', 'Donor_id', 'Purification', 'Organ', 'doublet_scores_observed_cells', 'percent_mito', 'cell_type_group', 'cell_name', 'cell_name_detailed', 'bbk', 'leiden', 'dataset'
    var: 'GeneID', 'GeneName', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'gene_ids', 'gene_names'
    uns: 'leiden', 'scvi', 'neighbors', 'log1p'
    obsm: 'X_scvi', 'X_umap'
We can now compute a joint KNN graph and UMAP embedding.
## Compute UMAP
sc.pp.neighbors(concat_adata, n_neighbors=30, use_rep="X_scvi")
sc.tl.umap(concat_adata, min_dist = 0.01, spread = 2)
This allows us to visualize similarities between query and reference cells.
plt.rcParams['figure.figsize'] = [8,8]
sc.pl.umap(concat_adata, color=['celltype_annotation','cell_name_detailed'], size=10,
title=['annotation reference', 'annotation query'], legend_loc='on data')
Here we can already distinguish immune cells from different lineages in the query dataset. We can use similarity in the latent embedding to predict the identity of these cells. We implement a utility function to do this in Pan_fetal_immune/src/utils/scArches_utils/map_query_utils.py. This stores predicted annotations in adata.obs['predicted_anno'].
import sys
sys.path.append('../src/utils/scArches_utils/')
import map_query_utils
map_query_utils.predict_label2(concat_adata, anno_col='celltype_annotation', min_score=0.5)
0.3734002113342285
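To illustrate the general idea of predicting labels from similarity in a latent embedding, here is a minimal k-nearest-neighbour sketch in plain NumPy. It is not the implementation in map_query_utils.py, whose internals are not shown in this notebook. The function knn_transfer_labels, the majority-vote scoring, and the toy coordinates are all assumptions for illustration.

```python
import numpy as np
from collections import Counter

def knn_transfer_labels(ref_emb, ref_labels, query_emb, k=5, min_score=0.5):
    """Majority vote among the k nearest reference cells for each query cell.

    The score is the fraction of neighbours sharing the winning label;
    cells scoring below min_score are labelled 'low_confidence'.
    """
    ref_emb = np.asarray(ref_emb, dtype=float)
    query_emb = np.asarray(query_emb, dtype=float)
    # Pairwise Euclidean distances, shape (n_query, n_ref)
    dists = np.linalg.norm(query_emb[:, None, :] - ref_emb[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    labels, scores = [], []
    for idx in nearest:
        label, votes = Counter(ref_labels[i] for i in idx).most_common(1)[0]
        score = votes / k
        labels.append(label if score >= min_score else "low_confidence")
        scores.append(score)
    return labels, scores

# Toy 2-D "latent space": two pure clusters and one mixed neighbourhood
ref_emb = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [5, 5], [5, 6], [6, 5]]
ref_labels = ["T", "T", "T", "B", "B", "B", "NK", "T", "B"]
query_emb = [[0.2, 0.2], [10.2, 10.2], [5.3, 5.3]]

labels, scores = knn_transfer_labels(ref_emb, ref_labels, query_emb, k=3)
print(labels)  # ['T', 'B', 'low_confidence']
```

The third query cell sits in a neighbourhood where every neighbour carries a different label, so its score falls below min_score and it is flagged rather than force-assigned. The brute-force distance matrix is only practical for small examples; on real data a precomputed KNN graph would be the more realistic choice.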
n_cells = concat_adata.obs['predicted_anno'].value_counts()
n_cells
LATE_ERY    659
MACROPHAGE_MHCII_HIGH    242
low_confidence    189
MACROPHAGE_LYVE1_HIGH    132
DOUBLETS_FIBRO_ERY    118
DC2    87
MID_ERY    74
YS_ERY    52
MACROPHAGE_PROLIFERATING    46
DC1    40
MONOCYTE_I_CXCR4    37
PRO_B    30
PROMONOCYTE    29
MONOCYTE_II_CCR2    29
ILC3    21
MAST_CELL    19
CYCLING_DC    18
PRE_PRO_B    12
NK    12
CYCLING_YS_ERY    9
EARLY_MK    9
PDC    9
EOSINOPHIL_BASOPHIL    9
LOW_QUALITY_MID_ERY_(HIGH_RIBO)    8
MONOCYTE_III_IL1B    8
CYCLING_ILC    7
FIBROBLAST_XVII    7
LATE_MK    6
MOP    6
CYCLING_NK    4
LATE_PRO_B    3
LARGE_PRE_B    3
VSMC_PERICYTE_III    3
LMPP_MLP    3
MACROPHAGE_IRON_RECYCLING    3
DP(P)_T    2
MATURE_B    2
MACROPHAGE_KUPFFER_LIKE    2
MEP    2
EARLY_ERY    2
PROMYELOCYTE    2
GMP    2
CMP    2
HSC_MPP    1
LOW_QUALITY_MACROPHAGE    1
HIGH_MITO    1
DOUBLET_LYMPHOID_MACROPHAGE    1
CYCLING_T    1
DN(P)_T    1
CYCLING_PDC    1
DN(early)_T    1
Name: predicted_anno, dtype: int64
## Exclude low abundance predictions
low_ab_predictions = n_cells.index[n_cells < 10]
concat_adata.obs.loc[concat_adata.obs['predicted_anno'].isin(low_ab_predictions), 'predicted_anno'] = 'low_confidence'
sc.pl.umap(concat_adata, color=['predicted_anno','predicted_anno_prob'], size=10, legend_loc='on data')
/opt/conda/envs/ed6/lib/python3.8/site-packages/anndata/_core/anndata.py:1220: FutureWarning: The `inplace` parameter in pandas.Categorical.reorder_categories is deprecated and will be removed in a future version. Reordering categories will always return a new Categorical object.
  c.reorder_categories(natsorted(c.categories), inplace=True)
... storing 'predicted_anno' as categorical
/opt/conda/envs/ed6/lib/python3.8/site-packages/anndata/_core/anndata.py:1220: FutureWarning: The `inplace` parameter in pandas.Categorical.reorder_categories is deprecated and will be removed in a future version. Reordering categories will always return a new Categorical object.
  c.reorder_categories(natsorted(c.categories), inplace=True)
... storing 'predicted_anno_unfiltered' as categorical
plt.rcParams['figure.figsize'] = [10,4]
map_query_utils.plot_confusion_mat(concat_adata, query_anno_col='cell_name_detailed')
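For readers who want the numbers behind such a plot, a row-normalized contingency table between the original query annotation and the predicted label can be built with pandas. This is a sketch under assumptions: confusion_fractions is a hypothetical helper, the toy labels are made up, and it is not necessarily what plot_confusion_mat computes internally.

```python
import pandas as pd

def confusion_fractions(obs, query_anno_col, pred_col="predicted_anno"):
    """Fraction of cells of each original label (rows) assigned to each predicted label (columns)."""
    return pd.crosstab(obs[query_anno_col], obs[pred_col], normalize="index")

# Toy stand-in for the query rows of concat_adata.obs (made-up labels)
obs = pd.DataFrame({
    "cell_name_detailed": ["B cell", "B cell", "Macrophage", "Macrophage"],
    "predicted_anno": ["PRO_B", "PRO_B", "MACROPHAGE_LYVE1_HIGH", "low_confidence"],
})
conf_mat = confusion_fractions(obs, "cell_name_detailed")
print(conf_mat)
```

Each row sums to 1, so a row spread across many columns points to an original annotation that does not map cleanly onto a single reference label.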
Visualize predicted labels on query data only and compare with expression of immune cell markers.
query_adata.obs['predicted_anno'] = concat_adata.obs[concat_adata.obs['dataset']=='query']['predicted_anno'].values
plt.rcParams['figure.figsize'] = [8,8]
sc.pl.umap(query_adata, color=['predicted_anno'])
immune_markers = {
    "Progenitors": ["GATA2", "CD34"],
    "DC1": ["CLEC9A", "BATF3"],
    "DC2": ["CD1C", "CLEC10A"],
    "T cells": ["CD3D"],
    "B cells": ["MS4A1"],
    "Mono/Macs": ["CD14", "LYVE1", "CCR2"],
    "Ery": ["KLF1", "HBD"],
    "ILC": ["KIT", "IL1R1"],
    "NK": ["NKG7", "KLRD1"]
}
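## var_names were switched back to Ensembl IDs before merging, so look up markers by gene symbol
sc.pl.dotplot(query_adata, immune_markers, groupby='predicted_anno', gene_symbols='gene_names')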