To assemble our annotations, we'll read our clustered myeloid cell data and assign our expert annotations to those clusters. We'll then inspect the annotations in our UMAP projections and output final labels for these cells.
For myeloid cells, we have two groups of cells to label: most of the myeloid cells were assigned labels at a single clustering resolution, while the dendritic cells (DCs) were assigned labels after additional, iterative clustering. So we'll load both sets, remove the DCs from the rest of the myeloid cells, assign identities based on the clusters in each set, and finally concatenate the labels for all cell barcodes.
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
warnings.simplefilter(action='ignore', category=RuntimeWarning)
from datetime import date
import hisepy
import os
import pandas as pd
import scanpy as sc
This function makes it easy to read CSV files stored in HISE into a pandas DataFrame.
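def read_csv_uuid(csv_uuid):
    # Files are cached in a directory named after their UUID
    csv_path = '/home/jupyter/cache/{u}'.format(u = csv_uuid)
    # Download the file from HISE only if it is not already cached
    if not os.path.isdir(csv_path):
        hise_res = hisepy.reader.cache_files([csv_uuid])
    # Read the first file found in the cache directory
    csv_filename = os.listdir(csv_path)[0]
    csv_file = '{p}/{f}'.format(p = csv_path, f = csv_filename)
    df = pd.read_csv(csv_file, index_col = 0)
    return df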
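The same cache-then-read pattern recurs in this notebook for the .h5ad files, so it could be factored into one path helper. The sketch below is only an illustration: `cached_file_path` and its `fetch` argument are hypothetical names, and `fetch` stands in for `hisepy.reader.cache_files` so that the example runs without HISE access. It also sorts the directory listing before taking the first file, which is a small change from the function above.

```python
import os
import tempfile

def cached_file_path(file_uuid, fetch, cache_root='/home/jupyter/cache'):
    """Return the path of the first file cached under cache_root/file_uuid.

    `fetch` is a hypothetical callable that takes a list of UUIDs and
    populates the cache; in this notebook it would be
    hisepy.reader.cache_files.
    """
    cache_dir = os.path.join(cache_root, file_uuid)
    # Only fetch when the cache directory does not exist yet
    if not os.path.isdir(cache_dir):
        fetch([file_uuid])
    filenames = sorted(os.listdir(cache_dir))
    if not filenames:
        raise FileNotFoundError('No cached file found for {u}'.format(u=file_uuid))
    return os.path.join(cache_dir, filenames[0])

# Demonstration with a stand-in fetcher that writes a small CSV
demo_root = tempfile.mkdtemp()

def fake_fetch(uuids):
    for u in uuids:
        target_dir = os.path.join(demo_root, u)
        os.makedirs(target_dir)
        with open(os.path.join(target_dir, 'table.csv'), 'w') as fh:
            fh.write('idx,value\n0,1\n')

demo_path = cached_file_path('demo-uuid', fake_fetch, cache_root=demo_root)
```

With a helper like this, reading a table would reduce to `pd.read_csv(cached_file_path(uuid, hisepy.reader.cache_files), index_col = 0)`.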
cell_class = 'myeloid'
h5ad_uuid = 'c38df326-662d-459b-982d-0186c022f70d'
h5ad_path = '/home/jupyter/cache/{u}'.format(u = h5ad_uuid)
if not os.path.isdir(h5ad_path):
    hise_res = hisepy.reader.cache_files([h5ad_uuid])
h5ad_filename = os.listdir(h5ad_path)[0]
h5ad_file = '{p}/{f}'.format(p = h5ad_path, f = h5ad_filename)
adata = sc.read_h5ad(h5ad_file)
adata.shape
(397356, 1112)
dc_uuid = '892e4fb0-8dad-4cb6-bcec-8f29b3dcd15e'
dc_path = '/home/jupyter/cache/{u}'.format(u = dc_uuid)
if not os.path.isdir(dc_path):
    # cache_files() expects file UUIDs, not the local cache path
    hise_res = hisepy.reader.cache_files([dc_uuid])
dc_filename = os.listdir(dc_path)[0]
dc_file = '{p}/{f}'.format(p = dc_path, f = dc_filename)
dc_adata = sc.read_h5ad(dc_file)
dc_adata.shape
(34641, 2327)
# True for cells whose barcodes are not in the DC object, i.e. the cells we keep
drop_lgl = ~adata.obs['barcodes'].isin(dc_adata.obs['barcodes']).to_numpy()
nondc_adata = adata[drop_lgl].copy()
nondc_adata.shape
(362715, 1112)
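The shapes above are consistent with each other: 397,356 - 34,641 = 362,715, which is what we expect if every DC barcode is present in the full myeloid object. A minimal sketch of that check, using made-up toy barcodes rather than the real data:

```python
import pandas as pd

# Toy stand-ins for adata.obs and dc_adata.obs (hypothetical barcodes)
all_obs = pd.DataFrame({'barcodes': ['a', 'b', 'c', 'd', 'e']})
dc_obs = pd.DataFrame({'barcodes': ['b', 'e']})

# Flag DC cells, then keep everything else
is_dc = all_obs['barcodes'].isin(dc_obs['barcodes'])
nondc_obs = all_obs[~is_dc]

# Every DC barcode should be found in the full set ...
all_dcs_found = bool(dc_obs['barcodes'].isin(all_obs['barcodes']).all())
# ... so the two subsets should add up to the full set
counts_add_up = len(nondc_obs) + len(dc_obs) == len(all_obs)
```

If `counts_add_up` were False on the real data, some DC barcodes would be missing from the full object or duplicated in one of the two.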
anno_uuid = '9f7d59f2-7aa8-4c2a-86b9-fe6c46b1068f'
anno = read_csv_uuid(anno_uuid)
join_col = 'leiden_resolution_3'
anno[join_col] = anno[join_col].astype('string').astype('category')
obs = nondc_adata.obs
sum(obs[join_col].isin(anno[join_col]))
362715
nondc_anno = obs.merge(anno, how = 'left', on = join_col)
nondc_anno.head()
| | barcodes | batch_id | cell_name | cell_uuid | chip_id | hto_barcode | hto_category | n_genes | n_mito_umis | n_reads | ... | pct_counts_mito | leiden | leiden_resolution_1 | leiden_resolution_1.5 | leiden_resolution_2 | leiden_resolution_2.5 | leiden_resolution_3 | AIFI_L3 | AIFI_L1 | AIFI_L2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | cf71fa1048b611ea8957bafe6d70929e | B001 | impatient_familial_cuckoo | cf71fa1048b611ea8957bafe6d70929e | B001-P1C1 | TGATGGCCTATTGGG | singlet | 1246 | 204 | 11107 | ... | 5.935409 | 3 | 5 | 3 | 1 | 2 | 1 | Core CD14 monocyte | Monocyte | CD14 monocyte |
1 | cf71ffba48b611ea8957bafe6d70929e | B001 | dastardly_wintery_airedale | cf71ffba48b611ea8957bafe6d70929e | B001-P1C1 | TGATGGCCTATTGGG | singlet | 1965 | 363 | 15979 | ... | 6.871096 | 9 | 0 | 6 | 3 | 3 | 19 | Core CD14 monocyte | Monocyte | CD14 monocyte |
2 | cf721da648b611ea8957bafe6d70929e | B001 | silvery_uncouth_sturgeon | cf721da648b611ea8957bafe6d70929e | B001-P1C1 | TGATGGCCTATTGGG | singlet | 1322 | 93 | 9883 | ... | 2.932829 | 3 | 2 | 4 | 5 | 11 | 10 | Core CD14 monocyte | Monocyte | CD14 monocyte |
3 | cf7221e848b611ea8957bafe6d70929e | B001 | obtuse_visible_icefish | cf7221e848b611ea8957bafe6d70929e | B001-P1C1 | TGATGGCCTATTGGG | singlet | 1626 | 280 | 15824 | ... | 5.764875 | 3 | 5 | 3 | 1 | 2 | 1 | Core CD14 monocyte | Monocyte | CD14 monocyte |
4 | cf7223aa48b611ea8957bafe6d70929e | B001 | cosmologic_sisterlike_rattail | cf7223aa48b611ea8957bafe6d70929e | B001-P1C1 | TGATGGCCTATTGGG | singlet | 815 | 72 | 4830 | ... | 4.755614 | 9 | 2 | 4 | 17 | 16 | 13 | Core CD14 monocyte | Monocyte | CD14 monocyte |
5 rows × 58 columns
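The `isin()` count confirms that every cell's cluster has an annotation before the join. An alternative that surfaces the same information from the merge itself is to use the `validate` and `indicator` arguments of `DataFrame.merge()`. The sketch below uses toy data (hypothetical barcodes and a deliberately unannotated cluster '7') and casts both join keys to plain strings for simplicity:

```python
import pandas as pd

join_col = 'leiden_resolution_3'

# Toy per-cell metadata; cluster '7' has no annotation on purpose
obs = pd.DataFrame({
    'barcodes': ['a', 'b', 'c', 'd'],
    join_col: ['1', '19', '1', '7'],
})

# Toy annotation table as it might look after reading from CSV (integer clusters)
anno = pd.DataFrame({
    join_col: [1, 19],
    'AIFI_L3': ['Core CD14 monocyte', 'Core CD14 monocyte'],
})
# Join keys must have the same type on both sides
anno[join_col] = anno[join_col].astype(str)

merged = obs.merge(
    anno,
    how='left',
    on=join_col,
    validate='many_to_one',  # raises if a cluster is annotated more than once
    indicator=True,          # adds a '_merge' column: 'both' or 'left_only'
)

# Clusters that appear in the cells but not in the annotation table
unmatched = sorted(merged.loc[merged['_merge'] == 'left_only', join_col].unique())
```

On the toy data `unmatched` contains only cluster '7'; on a fully annotated dataset it would be empty.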
dc_anno_uuid = '98c74523-e518-49f3-a021-f30b87a8f565'
dc_anno = read_csv_uuid(dc_anno_uuid)
join_col = 'leiden_resolution_2_myeloid-dcs'
dc_anno[join_col] = dc_anno[join_col].astype('string').astype('category')
obs = dc_adata.obs
sum(obs[join_col].isin(dc_anno[join_col]))
34641
dc_anno = obs.merge(dc_anno, how = 'left', on = join_col)
dc_anno.head()
| | barcodes | batch_id | cell_name | cell_uuid | chip_id | hto_barcode | hto_category | n_genes | n_mito_umis | n_reads | ... | leiden_resolution_1 | leiden_resolution_1.5 | leiden_resolution_2 | leiden_resolution_2.5 | leiden_resolution_3 | leiden_resolution_1_myeloid-dcs | leiden_resolution_2_myeloid-dcs | AIFI_L3 | AIFI_L1 | AIFI_L2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | cf72153648b611ea8957bafe6d70929e | B001 | svelte_frenzied_kusimanse | cf72153648b611ea8957bafe6d70929e | B001-P1C1 | TGATGGCCTATTGGG | singlet | 1246 | 69 | 9481 | ... | 13 | 17 | 24 | 25 | 25 | 7 | 1 | pDC | DC | pDC |
1 | cf7273e648b611ea8957bafe6d70929e | B001 | camouflage_gentled_eskimodog | cf7273e648b611ea8957bafe6d70929e | B001-P1C1 | TGATGGCCTATTGGG | singlet | 2721 | 680 | 38114 | ... | 8 | 19 | 27 | 19 | 28 | 5 | 19 | CD14+ cDC2 | DC | cDC2 |
2 | cf764a6648b611ea8957bafe6d70929e | B001 | rightist_camerashy_volvox | cf764a6648b611ea8957bafe6d70929e | B001-P1C1 | TGATGGCCTATTGGG | singlet | 2344 | 139 | 24384 | ... | 9 | 24 | 34 | 35 | 39 | 8 | 20 | Doublet | Doublet | Doublet |
3 | cf7ac94c48b611ea8957bafe6d70929e | B001 | stumpy_charitable_flee | cf7ac94c48b611ea8957bafe6d70929e | B001-P1C1 | TGATGGCCTATTGGG | singlet | 2026 | 398 | 24221 | ... | 8 | 11 | 7 | 8 | 3 | 2 | 8 | HLA-DRhi cDC2 | DC | cDC2 |
4 | cf7f830648b611ea8957bafe6d70929e | B001 | putrid_patterned_atlasmoth | cf7f830648b611ea8957bafe6d70929e | B001-P1C1 | TGATGGCCTATTGGG | singlet | 2148 | 271 | 21160 | ... | 9 | 24 | 34 | 35 | 39 | 8 | 18 | Doublet | Doublet | Doublet |
5 rows × 60 columns
anno = pd.concat([nondc_anno, dc_anno], axis = 0)
anno = anno[['barcodes', 'AIFI_L1', 'AIFI_L2', 'AIFI_L3']]
anno = anno.set_index('barcodes')
obs = adata.obs
obs = obs.merge(anno, how = 'left', left_index = True, right_index = True)
adata.obs = obs
adata.obs.head()
barcodes | barcodes | batch_id | cell_name | cell_uuid | chip_id | hto_barcode | hto_category | n_genes | n_mito_umis | n_reads | ... | pct_counts_mito | leiden | leiden_resolution_1 | leiden_resolution_1.5 | leiden_resolution_2 | leiden_resolution_2.5 | leiden_resolution_3 | AIFI_L1 | AIFI_L2 | AIFI_L3 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
cf71fa1048b611ea8957bafe6d70929e | cf71fa1048b611ea8957bafe6d70929e | B001 | impatient_familial_cuckoo | cf71fa1048b611ea8957bafe6d70929e | B001-P1C1 | TGATGGCCTATTGGG | singlet | 1246 | 204 | 11107 | ... | 5.935409 | 3 | 5 | 3 | 1 | 2 | 1 | Monocyte | CD14 monocyte | Core CD14 monocyte |
cf71ffba48b611ea8957bafe6d70929e | cf71ffba48b611ea8957bafe6d70929e | B001 | dastardly_wintery_airedale | cf71ffba48b611ea8957bafe6d70929e | B001-P1C1 | TGATGGCCTATTGGG | singlet | 1965 | 363 | 15979 | ... | 6.871096 | 9 | 0 | 6 | 3 | 3 | 19 | Monocyte | CD14 monocyte | Core CD14 monocyte |
cf72153648b611ea8957bafe6d70929e | cf72153648b611ea8957bafe6d70929e | B001 | svelte_frenzied_kusimanse | cf72153648b611ea8957bafe6d70929e | B001-P1C1 | TGATGGCCTATTGGG | singlet | 1246 | 69 | 9481 | ... | 2.474005 | 24 | 13 | 17 | 24 | 25 | 25 | DC | pDC | pDC |
cf721da648b611ea8957bafe6d70929e | cf721da648b611ea8957bafe6d70929e | B001 | silvery_uncouth_sturgeon | cf721da648b611ea8957bafe6d70929e | B001-P1C1 | TGATGGCCTATTGGG | singlet | 1322 | 93 | 9883 | ... | 2.932829 | 3 | 2 | 4 | 5 | 11 | 10 | Monocyte | CD14 monocyte | Core CD14 monocyte |
cf7221e848b611ea8957bafe6d70929e | cf7221e848b611ea8957bafe6d70929e | B001 | obtuse_visible_icefish | cf7221e848b611ea8957bafe6d70929e | B001-P1C1 | TGATGGCCTATTGGG | singlet | 1626 | 280 | 15824 | ... | 5.764875 | 3 | 5 | 3 | 1 | 2 | 1 | Monocyte | CD14 monocyte | Core CD14 monocyte |
5 rows × 58 columns
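Because the labels are joined back onto the full object by barcode, two things are worth checking at this point: that barcodes are unique in the concatenated annotations (duplicates would add rows to the join) and that no cell is left without a label. A minimal sketch on toy data, with hypothetical barcodes and only the AIFI_L1 column:

```python
import pandas as pd

# Toy versions of the two annotation tables
nondc_anno = pd.DataFrame({'barcodes': ['a', 'c'], 'AIFI_L1': ['Monocyte', 'Monocyte']})
dc_anno = pd.DataFrame({'barcodes': ['b'], 'AIFI_L1': ['DC']})

anno = pd.concat([nondc_anno, dc_anno], axis=0).set_index('barcodes')

# Toy per-cell metadata indexed by barcode, in the original cell order
obs = pd.DataFrame({'n_genes': [1246, 1246, 1322]}, index=['a', 'b', 'c'])

# Each barcode should appear once across the non-DC and DC tables
barcodes_unique = anno.index.is_unique

labeled = obs.merge(anno, how='left', left_index=True, right_index=True)

# A left join keeps every cell; unlabeled cells would show up as NaN
n_missing = int(labeled['AIFI_L1'].isna().sum())
```

The left join keeps the row order of `obs`, so the labels land on the cells in their original order, as in the table above where the pDC appears between the monocytes.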
sc.pl.umap(adata, color = ['AIFI_L1', 'AIFI_L2', 'AIFI_L3'], ncols = 1)
/opt/conda/lib/python3.10/site-packages/scanpy/plotting/_tools/scatterplots.py:394: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  cax = scatter(
sc.pl.umap(adata,
color = ['leiden_resolution_1',
'leiden_resolution_1.5',
'leiden_resolution_2',
'leiden_resolution_2.5',
'leiden_resolution_3'],
ncols = 1)
/opt/conda/lib/python3.10/site-packages/scanpy/plotting/_tools/scatterplots.py:394: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored
  cax = scatter(
obs = adata.obs
obs = obs.reset_index(drop = True)
umap_mat = adata.obsm['X_umap']
umap_df = pd.DataFrame(umap_mat, columns = ['umap_1', 'umap_2'])
obs['umap_1'] = umap_df['umap_1']
obs['umap_2'] = umap_df['umap_2']
obs.head()
| | barcodes | batch_id | cell_name | cell_uuid | chip_id | hto_barcode | hto_category | n_genes | n_mito_umis | n_reads | ... | leiden_resolution_1 | leiden_resolution_1.5 | leiden_resolution_2 | leiden_resolution_2.5 | leiden_resolution_3 | AIFI_L1 | AIFI_L2 | AIFI_L3 | umap_1 | umap_2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | cf71fa1048b611ea8957bafe6d70929e | B001 | impatient_familial_cuckoo | cf71fa1048b611ea8957bafe6d70929e | B001-P1C1 | TGATGGCCTATTGGG | singlet | 1246 | 204 | 11107 | ... | 5 | 3 | 1 | 2 | 1 | Monocyte | CD14 monocyte | Core CD14 monocyte | 1.689215 | 1.722977 |
1 | cf71ffba48b611ea8957bafe6d70929e | B001 | dastardly_wintery_airedale | cf71ffba48b611ea8957bafe6d70929e | B001-P1C1 | TGATGGCCTATTGGG | singlet | 1965 | 363 | 15979 | ... | 0 | 6 | 3 | 3 | 19 | Monocyte | CD14 monocyte | Core CD14 monocyte | 2.061653 | -2.359979 |
2 | cf72153648b611ea8957bafe6d70929e | B001 | svelte_frenzied_kusimanse | cf72153648b611ea8957bafe6d70929e | B001-P1C1 | TGATGGCCTATTGGG | singlet | 1246 | 69 | 9481 | ... | 13 | 17 | 24 | 25 | 25 | DC | pDC | pDC | 14.148112 | 3.576505 |
3 | cf721da648b611ea8957bafe6d70929e | B001 | silvery_uncouth_sturgeon | cf721da648b611ea8957bafe6d70929e | B001-P1C1 | TGATGGCCTATTGGG | singlet | 1322 | 93 | 9883 | ... | 2 | 4 | 5 | 11 | 10 | Monocyte | CD14 monocyte | Core CD14 monocyte | 3.417083 | -0.073715 |
4 | cf7221e848b611ea8957bafe6d70929e | B001 | obtuse_visible_icefish | cf7221e848b611ea8957bafe6d70929e | B001-P1C1 | TGATGGCCTATTGGG | singlet | 1626 | 280 | 15824 | ... | 5 | 3 | 1 | 2 | 1 | Monocyte | CD14 monocyte | Core CD14 monocyte | 2.073276 | 1.557365 |
5 rows × 60 columns
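The column assignment above works because the rows of `adata.obsm['X_umap']` are in the same order as the rows of `adata.obs`, and after `reset_index()` both tables share a 0..n-1 index. A slightly more explicit variant, sketched here on toy data with hypothetical barcodes, is to give the coordinate table the same index as the metadata and join on it:

```python
import numpy as np
import pandas as pd

# Toy stand-ins for adata.obs and adata.obsm['X_umap']
obs = pd.DataFrame({'barcodes': ['a', 'b', 'c']}, index=['a', 'b', 'c'])
umap_mat = np.array([[1.0, 2.0],
                     [3.0, 4.0],
                     [5.0, 6.0]])

# Attach the cell index to the coordinates so the join is by cell, not by position
umap_df = pd.DataFrame(umap_mat, columns=['umap_1', 'umap_2'], index=obs.index)
obs_umap = obs.join(umap_df).reset_index(drop=True)
```

Both approaches give the same table here; the index-based join would also stay correct if the metadata were reordered before the coordinates were added.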
out_dir = 'output'
if not os.path.isdir(out_dir):
    os.makedirs(out_dir)
obs_out_csv = '{p}/ref_pbmc_{c}_labeled_meta_umap_{d}.csv'.format(p = out_dir, c = cell_class, d = date.today())
obs.to_csv(obs_out_csv, index = False)
obs_out_parquet = '{p}/ref_pbmc_{c}_labeled_meta_umap_{d}.parquet'.format(p = out_dir, c = cell_class, d = date.today())
obs.to_parquet(obs_out_parquet, index = False)
bc_anno = obs[['barcodes', 'AIFI_L1', 'AIFI_L2', 'AIFI_L3']]
label_out_csv = '{p}/ref_pbmc_{c}_barcode_labels_{d}.csv'.format(p = out_dir, c = cell_class, d = date.today())
bc_anno.to_csv(label_out_csv, index = False)
label_out_parquet = '{p}/ref_pbmc_{c}_barcode_labels_{d}.parquet'.format(p = out_dir, c = cell_class, d = date.today())
bc_anno.to_parquet(label_out_parquet, index = False)
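The four output paths above follow one naming pattern, and each calls `date.today()` separately. A small helper could build them from a single date, which also keeps the names identical if the notebook happens to run across midnight. This is only a sketch; `output_path` and its arguments are hypothetical names, not part of the original notebook.

```python
from datetime import date

def output_path(out_dir, stem, cell_class, ext, day=None):
    """Build '<out_dir>/ref_pbmc_<cell_class>_<stem>_<date>.<ext>'.

    `day` defaults to today's date; pass one date object to every call
    so that all output files share the same date stamp.
    """
    if day is None:
        day = date.today()
    return '{p}/ref_pbmc_{c}_{s}_{d}.{e}'.format(
        p=out_dir, c=cell_class, s=stem, d=day, e=ext
    )

# Reproduce one of the file names from this notebook's run date
example = output_path('output', 'barcode_labels', 'myeloid', 'csv',
                      day=date(2024, 2, 29))
```

Here `example` is 'output/ref_pbmc_myeloid_barcode_labels_2024-02-29.csv', matching the file name produced by this notebook.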
Finally, we'll use hisepy.upload.upload_files() to send a copy of our output to HISE for use in downstream analysis steps.
study_space_uuid = '64097865-486d-43b3-8f94-74994e0a72e0'
title = 'Myeloid cell barcode annotations {d}'.format(d = date.today())
in_files = [h5ad_uuid, dc_uuid, anno_uuid, dc_anno_uuid]
in_files
['c38df326-662d-459b-982d-0186c022f70d', '892e4fb0-8dad-4cb6-bcec-8f29b3dcd15e', '9f7d59f2-7aa8-4c2a-86b9-fe6c46b1068f', '98c74523-e518-49f3-a021-f30b87a8f565']
out_files = [obs_out_csv, obs_out_parquet,
label_out_csv, label_out_parquet]
out_files
['output/ref_pbmc_myeloid_labeled_meta_umap_2024-02-29.csv', 'output/ref_pbmc_myeloid_labeled_meta_umap_2024-02-29.parquet', 'output/ref_pbmc_myeloid_barcode_labels_2024-02-29.csv', 'output/ref_pbmc_myeloid_barcode_labels_2024-02-29.parquet']
hisepy.upload.upload_files(
files = out_files,
study_space_id = study_space_uuid,
title = title,
input_file_ids = in_files
)
Cannot determine the current notebook.
1) /home/jupyter/scRNA-Reference-IH-A/05-Assembly/18-Python_assign_Myeloid_cells.ipynb
2) /home/jupyter/scRNA-Reference-IH-A/05-Assembly/17-Python_assign_B_cells.ipynb
3) /home/jupyter/scRNA-Reference-IH-A/04-Annotation/16-Python_T_cell_annotations.ipynb
Please select (1-3)
you are trying to upload file_ids... ['output/ref_pbmc_myeloid_labeled_meta_umap_2024-02-29.csv', 'output/ref_pbmc_myeloid_labeled_meta_umap_2024-02-29.parquet', 'output/ref_pbmc_myeloid_barcode_labels_2024-02-29.csv', 'output/ref_pbmc_myeloid_barcode_labels_2024-02-29.parquet']. Do you truly want to proceed?
{'trace_id': 'af36dae2-5608-499b-9a8f-845162d31035', 'files': ['output/ref_pbmc_myeloid_labeled_meta_umap_2024-02-29.csv', 'output/ref_pbmc_myeloid_labeled_meta_umap_2024-02-29.parquet', 'output/ref_pbmc_myeloid_barcode_labels_2024-02-29.csv', 'output/ref_pbmc_myeloid_barcode_labels_2024-02-29.parquet']}
import session_info
session_info.show()
-----
anndata 0.10.3
hisepy 0.3.0
pandas 2.1.4
scanpy 1.9.6
session_info 1.0.0
-----
PIL 10.0.1
anyio NA
arrow 1.3.0
asttokens NA
attr 23.2.0
attrs 23.2.0
babel 2.14.0
beatrix_jupyterlab NA
brotli NA
cachetools 5.3.1
certifi 2023.11.17
cffi 1.16.0
charset_normalizer 3.3.2
cloudpickle 2.2.1
colorama 0.4.6
comm 0.1.4
cryptography 41.0.7
cycler 0.10.0
cython_runtime NA
dateutil 2.8.2
db_dtypes 1.1.1
debugpy 1.8.0
decorator 5.1.1
defusedxml 0.7.1
deprecated 1.2.14
exceptiongroup 1.2.0
executing 2.0.1
fastjsonschema NA
fqdn NA
google NA
greenlet 2.0.2
grpc 1.58.0
grpc_status NA
h5py 3.10.0
idna 3.6
igraph 0.10.8
importlib_metadata NA
ipykernel 6.28.0
ipython_genutils 0.2.0
ipywidgets 8.1.1
isoduration NA
jedi 0.19.1
jinja2 3.1.2
joblib 1.3.2
json5 NA
jsonpointer 2.4
jsonschema 4.20.0
jsonschema_specifications NA
jupyter_events 0.9.0
jupyter_server 2.12.1
jupyterlab_server 2.25.2
jwt 2.8.0
kiwisolver 1.4.5
leidenalg 0.10.1
llvmlite 0.41.0
lz4 4.3.2
markupsafe 2.1.3
matplotlib 3.8.0
matplotlib_inline 0.1.6
mpl_toolkits NA
mpmath 1.3.0
natsort 8.4.0
nbformat 5.9.2
numba 0.58.0
numpy 1.24.0
opentelemetry NA
overrides NA
packaging 23.2
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
pkg_resources NA
platformdirs 4.1.0
plotly 5.18.0
prettytable 3.9.0
prometheus_client NA
prompt_toolkit 3.0.42
proto NA
psutil NA
ptyprocess 0.7.0
pure_eval 0.2.2
pyarrow 13.0.0
pydev_ipython NA
pydevconsole NA
pydevd 2.9.5
pydevd_file_utils NA
pydevd_plugins NA
pydevd_tracing NA
pygments 2.17.2
pynvml NA
pyparsing 3.1.1
pyreadr 0.5.0
pythonjsonlogger NA
pytz 2023.3.post1
referencing NA
requests 2.31.0
rfc3339_validator 0.1.4
rfc3986_validator 0.1.1
rpds NA
scipy 1.11.4
send2trash NA
shapely 1.8.5.post1
six 1.16.0
sklearn 1.3.2
sniffio 1.3.0
socks 1.7.1
sql NA
sqlalchemy 2.0.21
sqlparse 0.4.4
stack_data 0.6.2
sympy 1.12
termcolor NA
texttable 1.7.0
threadpoolctl 3.2.0
torch 2.1.2+cu121
torchgen NA
tornado 6.3.3
tqdm 4.66.1
traitlets 5.9.0
typing_extensions NA
uri_template NA
urllib3 1.26.18
wcwidth 0.2.12
webcolors 1.13
websocket 1.7.0
wrapt 1.15.0
xarray 2023.12.0
yaml 6.0.1
zipp NA
zmq 25.1.2
zoneinfo NA
zstandard 0.22.0
-----
IPython 8.19.0
jupyter_client 8.6.0
jupyter_core 5.6.1
jupyterlab 4.0.10
notebook 6.5.4
-----
Python 3.10.13 | packaged by conda-forge | (main, Dec 23 2023, 15:36:39) [GCC 12.3.0]
Linux-5.15.0-1052-gcp-x86_64-with-glibc2.31
-----
Session information updated at 2024-02-29 23:38