In this notebook, we use marker gene detection to select clusters that contain Gamma-Delta T (gdT) cells, then subset our dataset and perform a round of iterative clustering.
Some clusters from our subclustering of other T cell populations also include gdT cells. We'll assemble the gdT-containing clusters from our subclustering analyses of CD8 CM, CD8 EM, and MAIT cells, together with the main cluster of gdT cells from clustering of all T cells.
The outputs of this analysis are used by our domain experts to assign cell type identities to our reference.
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
import concurrent.futures
from concurrent.futures import ProcessPoolExecutor
import copy
from datetime import date
import hisepy
import os
import pandas as pd
import re
import scanpy as sc
import scanpy.external as sce
The helper function below is used to select the clusters that we'll subset.
select_clusters_by_gene_frac() computes the fraction of cells in each cluster that express the provided gene (> 0 UMIs). This fraction is provided by scanpy's dotplot function, which calculates these fractions for use in display. We then filter clusters based on the cutoff provided as a parameter to this function.
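def select_clusters_by_gene_frac(adata, gene, cutoff, clusters = 'leiden'):
    # Fraction of cells per cluster with > 0 UMIs for the gene, as computed by dotplot
    gene_cl_frac = sc.pl.dotplot(
        adata,
        groupby = clusters,
        var_names = gene,
        return_fig = True
    ).dot_size_df
    # Keep clusters whose expressing fraction exceeds the cutoff
    select_cl = gene_cl_frac.index[gene_cl_frac[gene] > cutoff].tolist()
    return select_cl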
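To make the selection rule concrete, here is a minimal sketch of the same kind of computation on a toy table, using pandas directly rather than scanpy's dotplot. The function name `fraction_expressing`, the toy counts, and the cluster labels are illustrative assumptions and are not part of the analysis.

```python
import pandas as pd

def fraction_expressing(counts, clusters):
    """Fraction of cells per cluster with > 0 counts for one gene.

    counts:   per-cell counts for a single gene (hypothetical toy values)
    clusters: per-cell cluster labels, same length as counts
    """
    df = pd.DataFrame({'count': counts, 'cluster': clusters})
    # A cell "expresses" the gene if it has at least one count
    return (df['count'] > 0).groupby(df['cluster']).mean()

# Toy data: cluster 'a' has 3 of 4 cells expressing, cluster 'b' has 1 of 4
toy_counts = [2, 0, 1, 5, 0, 0, 3, 0]
toy_clusters = ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']

frac = fraction_expressing(toy_counts, toy_clusters)
cutoff = 0.5
selected = frac.index[frac > cutoff].tolist()
print(frac.to_dict())
print(selected)
```

With a cutoff of 0.5, only cluster 'a' passes, because the comparison is strictly greater than the cutoff.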
cell_class = 't-gd'
h5ad_uuid = 'd6ebc576-34ea-4394-a569-e35e16f20253'
h5ad_path = '/home/jupyter/cache/{u}'.format(u = h5ad_uuid)
if not os.path.isdir(h5ad_path):
    hise_res = hisepy.reader.cache_files([h5ad_uuid])
h5ad_filename = os.listdir(h5ad_path)[0]
h5ad_file = '{p}/{f}'.format(p = h5ad_path, f = h5ad_filename)
adata = sc.read_h5ad(h5ad_file)
adata
AnnData object with n_obs × n_vars = 1191327 × 1487 obs: 'barcodes', 'batch_id', 'cell_name', 'cell_uuid', 'chip_id', 'hto_barcode', 'hto_category', 'n_genes', 'n_mito_umis', 'n_reads', 'n_umis', 'original_barcodes', 'pbmc_sample_id', 'pool_id', 'well_id', 'sample.sampleKitGuid', 'cohort.cohortGuid', 'subject.subjectGuid', 'subject.biologicalSex', 'subject.race', 'subject.ethnicity', 'subject.birthYear', 'sample.visitName', 'sample.drawDate', 'file.id', 'subject.cmv', 'subject.bmi', 'celltypist.low', 'seurat.l1', 'seurat.l1.score', 'seurat.l2', 'seurat.l2.score', 'seurat.l2.5', 'seurat.l2.5.score', 'seurat.l3', 'seurat.l3.score', 'predicted_doublet', 'doublet_score', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_mito', 'log1p_total_counts_mito', 'pct_counts_mito', 'leiden', 'leiden_resolution_1', 'leiden_resolution_1.5', 'leiden_resolution_2' var: 'mito', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std' uns: 'celltypist.low_colors', 'hvg', 'leiden', 'leiden_colors', 'log1p', 'neighbors', 'pca', 'seurat.l2.5_colors', 'umap' obsm: 'X_pca', 'X_pca_harmony', 'X_umap' varm: 'PCs' obsp: 'connectivities', 'distances'
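The same check-cache-then-read pattern is repeated for each input file in this notebook. One way to factor it out is sketched below. The helper name `cached_file_path` and its `download` argument are hypothetical; in this notebook the download role is played by hisepy.reader.cache_files, which is not called here so that the sketch stays self-contained. Unlike the cells in this notebook, the sketch sorts the directory listing so that the chosen file is deterministic.

```python
import os

def cached_file_path(file_uuid, cache_dir, download = None):
    """Return the path of the first file cached under cache_dir/file_uuid.

    download: hypothetical callable that takes a list of UUIDs and
              populates cache_dir; only called when the cache is missing.
    """
    uuid_dir = os.path.join(cache_dir, file_uuid)
    if not os.path.isdir(uuid_dir):
        if download is None:
            raise FileNotFoundError(uuid_dir)
        download([file_uuid])
    # Sort so the same file is chosen on every run
    files = sorted(os.listdir(uuid_dir))
    if not files:
        raise FileNotFoundError('no files in {d}'.format(d = uuid_dir))
    return os.path.join(uuid_dir, files[0])
```

The returned path could then be passed to sc.read_h5ad() as in the cells above.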
To get an overview of cluster identity, we'll use a set of marker genes that are expressed in major classes of T cell types:
markers = [
'CD4', # CD4 T cells
'CD8A', # CD8 T cells
'FHIT', # Higher in CD4 Naive
'IKZF2', # Helios; Treg
'LGALS3', # Double-Negative
'SLC4A10', # MAIT
'TRDC', # Gamma-Delta
'MKI67' # Proliferating cells
]
sc.pl.dotplot(
    adata,
    groupby = 'leiden_resolution_1.5',
    var_names = markers,
    swap_axes = True
)
/opt/conda/lib/python3.10/site-packages/scanpy/plotting/_dotplot.py:747: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap', 'norm' will be ignored dot_ax.scatter(x, y, **kwds)
We'll use select_clusters_by_gene_frac() to select clusters for our desired cell type, here those in which more than 50% of cells express TRDC. The same function can also select clusters that express off-target genes (like HBB and PPBP), which can then be used to filter our list of clusters.
sc.pl.umap(adata, color = 'leiden_resolution_1.5', legend_loc = 'on data')
/opt/conda/lib/python3.10/site-packages/scanpy/plotting/_tools/scatterplots.py:394: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored cax = scatter(
trdc_pos_cl = select_clusters_by_gene_frac(
    adata, gene = 'TRDC', cutoff = 0.5, clusters = 'leiden_resolution_1.5'
)
sc.pl.umap(adata, color = 'leiden_resolution_1.5', groups = trdc_pos_cl)
Here, we keep all of the TRDC-positive clusters; no off-target clusters are removed, so the list is just sorted.
keep_cl = trdc_pos_cl
keep_cl.sort()
keep_cl
['15']
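When off-target clusters do need to be removed, Python's set type offers a compact way to do it. A minimal sketch, assuming hypothetical cluster labels and a hypothetical off-target list (neither comes from this dataset):

```python
# Hypothetical cluster labels, for illustration only
marker_pos_cl = ['15', '3', '22']   # clusters passing the on-target cutoff
off_target_cl = ['3']               # clusters flagged by an off-target gene

# Set difference keeps on-target clusters that are not flagged as off-target
keep_cl_example = sorted(set(marker_pos_cl) - set(off_target_cl))
print(keep_cl_example)
```

The result is a sorted list that can be passed to .isin() in the same way as keep_cl.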
Now, we can filter the dataset to get the subset we're after.
t_adata_subset = adata[adata.obs['leiden_resolution_1.5'].isin(keep_cl)]
t_adata_subset.shape
(33088, 1487)
cd8_cm_h5ad_uuid = '6c1dff43-ddc5-437b-8e3d-dd5a32553b16'
cd8_cm_h5ad_path = '/home/jupyter/cache/{u}'.format(u = cd8_cm_h5ad_uuid)
if not os.path.isdir(cd8_cm_h5ad_path):
    hise_res = hisepy.reader.cache_files([cd8_cm_h5ad_uuid])
cd8_cm_h5ad_filename = os.listdir(cd8_cm_h5ad_path)[0]
cd8_cm_h5ad_file = '{p}/{f}'.format(p = cd8_cm_h5ad_path, f = cd8_cm_h5ad_filename)
cd8_cm_adata = sc.read_h5ad(cd8_cm_h5ad_file)
cd8_cm_adata
AnnData object with n_obs × n_vars = 43289 × 1754 obs: 'barcodes', 'batch_id', 'cell_name', 'cell_uuid', 'chip_id', 'hto_barcode', 'hto_category', 'n_genes', 'n_mito_umis', 'n_reads', 'n_umis', 'original_barcodes', 'pbmc_sample_id', 'pool_id', 'well_id', 'sample.sampleKitGuid', 'cohort.cohortGuid', 'subject.subjectGuid', 'subject.biologicalSex', 'subject.race', 'subject.ethnicity', 'subject.birthYear', 'sample.visitName', 'sample.drawDate', 'file.id', 'subject.cmv', 'subject.bmi', 'celltypist.low', 'seurat.l1', 'seurat.l1.score', 'seurat.l2', 'seurat.l2.score', 'seurat.l2.5', 'seurat.l2.5.score', 'seurat.l3', 'seurat.l3.score', 'predicted_doublet', 'doublet_score', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_mito', 'log1p_total_counts_mito', 'pct_counts_mito', 'leiden', 'leiden_resolution_1', 'leiden_resolution_1.5', 'leiden_resolution_2', 'leiden_resolution_1.5_t-cd8-cm' var: 'mito', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std' uns: 'celltypist.low_colors', 'hvg', 'leiden', 'leiden_colors', 'leiden_resolution_1.5_colors', 'log1p', 'neighbors', 'pca', 'seurat.l2.5_colors', 'umap' obsm: 'X_pca', 'X_pca_harmony', 'X_umap' varm: 'PCs' obsp: 'connectivities', 'distances'
To distinguish alpha-beta from gamma-delta T cells among the CD8 CM clusters, we'll use T cell receptor constant-region genes as markers:
tcr_markers = [
'TRAC', # Alpha-Beta
'TRBC1', # Alpha-Beta
'TRBC2', # Alpha-Beta
'TRGC1', # Gamma-Delta
'TRGC2', # Gamma-Delta
'TRDC' # Gamma-Delta
]
sc.pl.dotplot(
    cd8_cm_adata,
    groupby = 'leiden_resolution_1.5_t-cd8-cm',
    var_names = tcr_markers,
    swap_axes = True
)
We'll use select_clusters_by_gene_frac() to select clusters for our desired cell type, here those in which more than 50% of cells express TRDC. The same function can also select clusters that express off-target genes (like HBB and PPBP), which can then be used to filter our list of clusters.
sc.pl.umap(cd8_cm_adata, color = 'leiden_resolution_1.5_t-cd8-cm', legend_loc = 'on data')
trdc_pos_cl = select_clusters_by_gene_frac(
    cd8_cm_adata, gene = 'TRDC', cutoff = 0.5, clusters = 'leiden_resolution_1.5_t-cd8-cm'
)
sc.pl.umap(cd8_cm_adata, color = 'leiden_resolution_1.5_t-cd8-cm', groups = trdc_pos_cl)
Here, we keep all of the TRDC-positive clusters; no off-target clusters are removed, so the list is just sorted.
keep_cl = trdc_pos_cl
keep_cl.sort()
keep_cl
['11', '4']
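Cluster labels are strings, so sort() orders them lexicographically, which is why '11' precedes '4' in the output above. This does not affect the .isin() filter. If numeric order is preferred for display, one option is to sort with an integer key; a minimal sketch:

```python
cluster_labels = ['4', '11']

# Default string sort compares character by character: '1' < '4'
lexicographic = sorted(cluster_labels)

# Sorting with key=int compares the labels as numbers instead
numeric = sorted(cluster_labels, key = int)

print(lexicographic)
print(numeric)
```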
Now, we can filter the dataset to get the subset we're after.
cd8_cm_adata_subset = cd8_cm_adata[cd8_cm_adata.obs['leiden_resolution_1.5_t-cd8-cm'].isin(keep_cl)]
cd8_cm_adata_subset.shape
(5721, 1754)
cd8_em_h5ad_uuid = 'b671c53a-2698-41c1-a886-9ab939306716'
cd8_em_h5ad_path = '/home/jupyter/cache/{u}'.format(u = cd8_em_h5ad_uuid)
if not os.path.isdir(cd8_em_h5ad_path):
    hise_res = hisepy.reader.cache_files([cd8_em_h5ad_uuid])
cd8_em_h5ad_filename = os.listdir(cd8_em_h5ad_path)[0]
cd8_em_h5ad_file = '{p}/{f}'.format(p = cd8_em_h5ad_path, f = cd8_em_h5ad_filename)
cd8_em_adata = sc.read_h5ad(cd8_em_h5ad_file)
cd8_em_adata
AnnData object with n_obs × n_vars = 118291 × 1659 obs: 'barcodes', 'batch_id', 'cell_name', 'cell_uuid', 'chip_id', 'hto_barcode', 'hto_category', 'n_genes', 'n_mito_umis', 'n_reads', 'n_umis', 'original_barcodes', 'pbmc_sample_id', 'pool_id', 'well_id', 'sample.sampleKitGuid', 'cohort.cohortGuid', 'subject.subjectGuid', 'subject.biologicalSex', 'subject.race', 'subject.ethnicity', 'subject.birthYear', 'sample.visitName', 'sample.drawDate', 'file.id', 'subject.cmv', 'subject.bmi', 'celltypist.low', 'seurat.l1', 'seurat.l1.score', 'seurat.l2', 'seurat.l2.score', 'seurat.l2.5', 'seurat.l2.5.score', 'seurat.l3', 'seurat.l3.score', 'predicted_doublet', 'doublet_score', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_mito', 'log1p_total_counts_mito', 'pct_counts_mito', 'leiden', 'leiden_resolution_1', 'leiden_resolution_1.5', 'leiden_resolution_2', 'leiden_resolution_3_t-cd8-em' var: 'mito', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std' uns: 'celltypist.low_colors', 'hvg', 'leiden', 'leiden_colors', 'leiden_resolution_1.5_colors', 'log1p', 'neighbors', 'pca', 'seurat.l2.5_colors', 'umap' obsm: 'X_pca', 'X_pca_harmony', 'X_umap' varm: 'PCs' obsp: 'connectivities', 'distances'
To distinguish alpha-beta from gamma-delta T cells among the CD8 EM clusters, we'll use T cell receptor constant-region genes as markers:
tcr_markers = [
'TRAC', # Alpha-Beta
'TRBC1', # Alpha-Beta
'TRBC2', # Alpha-Beta
'TRGC1', # Gamma-Delta
'TRGC2', # Gamma-Delta
'TRDC' # Gamma-Delta
]
sc.pl.dotplot(
    cd8_em_adata,
    groupby = 'leiden_resolution_3_t-cd8-em',
    var_names = tcr_markers,
    swap_axes = True
)
We'll use select_clusters_by_gene_frac() to select clusters for our desired cell type, here those in which more than 50% of cells express TRDC. The same function can also select clusters that express off-target genes (like HBB and PPBP), which can then be used to filter our list of clusters.
sc.pl.umap(cd8_em_adata, color = 'leiden_resolution_3_t-cd8-em', legend_loc = 'on data')
trdc_pos_cl = select_clusters_by_gene_frac(
    cd8_em_adata, gene = 'TRDC', cutoff = 0.5, clusters = 'leiden_resolution_3_t-cd8-em'
)
sc.pl.umap(cd8_em_adata, color = 'leiden_resolution_3_t-cd8-em', groups = trdc_pos_cl)
Here, we keep all of the TRDC-positive clusters; no off-target clusters are removed, so the list is just sorted.
keep_cl = trdc_pos_cl
keep_cl.sort()
keep_cl
['11', '13', '6']
Now, we can filter the dataset to get the subset we're after.
cd8_em_adata_subset = cd8_em_adata[cd8_em_adata.obs['leiden_resolution_3_t-cd8-em'].isin(keep_cl)]
cd8_em_adata_subset.shape
(12565, 1659)
mait_h5ad_uuid = '0f821486-866b-4c08-b0b8-508a5c544547'
mait_h5ad_path = '/home/jupyter/cache/{u}'.format(u = mait_h5ad_uuid)
if not os.path.isdir(mait_h5ad_path):
    hise_res = hisepy.reader.cache_files([mait_h5ad_uuid])
downloading fileID: 0f821486-866b-4c08-b0b8-508a5c544547 Files have been successfully downloaded!
mait_filename = os.listdir(mait_h5ad_path)[0]
mait_h5ad_file = '{p}/{f}'.format(p = mait_h5ad_path, f = mait_filename)
mait_adata = sc.read_h5ad(mait_h5ad_file)
mait_adata
AnnData object with n_obs × n_vars = 50823 × 1732 obs: 'barcodes', 'batch_id', 'cell_name', 'cell_uuid', 'chip_id', 'hto_barcode', 'hto_category', 'n_genes', 'n_mito_umis', 'n_reads', 'n_umis', 'original_barcodes', 'pbmc_sample_id', 'pool_id', 'well_id', 'sample.sampleKitGuid', 'cohort.cohortGuid', 'subject.subjectGuid', 'subject.biologicalSex', 'subject.race', 'subject.ethnicity', 'subject.birthYear', 'sample.visitName', 'sample.drawDate', 'file.id', 'subject.cmv', 'subject.bmi', 'celltypist.low', 'seurat.l1', 'seurat.l1.score', 'seurat.l2', 'seurat.l2.score', 'seurat.l2.5', 'seurat.l2.5.score', 'seurat.l3', 'seurat.l3.score', 'predicted_doublet', 'doublet_score', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'total_counts_mito', 'log1p_total_counts_mito', 'pct_counts_mito', 'leiden', 'leiden_resolution_1', 'leiden_resolution_1.5', 'leiden_resolution_2', 'leiden_resolution_3_t-mait' var: 'mito', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std' uns: 'celltypist.low_colors', 'hvg', 'leiden', 'leiden_colors', 'leiden_resolution_1.5_colors', 'log1p', 'neighbors', 'pca', 'seurat.l2.5_colors', 'umap' obsm: 'X_pca', 'X_pca_harmony', 'X_umap' varm: 'PCs' obsp: 'connectivities', 'distances'
To distinguish alpha-beta from gamma-delta T cells among the MAIT clusters, we'll use T cell receptor constant-region genes as markers:
tcr_markers = [
'TRAC', # Alpha-Beta
'TRBC1', # Alpha-Beta
'TRBC2', # Alpha-Beta
'TRGC1', # Gamma-Delta
'TRGC2', # Gamma-Delta
'TRDC' # Gamma-Delta
]
sc.pl.dotplot(
    mait_adata,
    groupby = 'leiden_resolution_3_t-mait',
    var_names = tcr_markers,
    swap_axes = True
)
We'll use select_clusters_by_gene_frac() to select clusters for our desired cell type, here those in which more than 50% of cells express TRDC. The same function can also select clusters that express off-target genes (like HBB and PPBP), which can then be used to filter our list of clusters.
sc.pl.umap(mait_adata, color = 'leiden_resolution_3_t-mait', legend_loc = 'on data')
trdc_pos_cl = select_clusters_by_gene_frac(
    mait_adata, gene = 'TRDC', cutoff = 0.5, clusters = 'leiden_resolution_3_t-mait'
)
sc.pl.umap(mait_adata, color = 'leiden_resolution_3_t-mait', groups = trdc_pos_cl)
Here, we keep all of the TRDC-positive clusters; no off-target clusters are removed, so the list is just sorted.
keep_cl = trdc_pos_cl
keep_cl.sort()
keep_cl
['1']
Now, we can filter the dataset to get the subset we're after.
mait_adata_subset = mait_adata[mait_adata.obs['leiden_resolution_3_t-mait'].isin(keep_cl)]
mait_adata_subset.shape
(2739, 1732)
t_gdt_bc = t_adata_subset.obs['barcodes'].tolist()
cd8_cm_gdt_bc = cd8_cm_adata_subset.obs['barcodes'].tolist()
cd8_em_gdt_bc = cd8_em_adata_subset.obs['barcodes'].tolist()
mait_gdt_bc = mait_adata_subset.obs['barcodes'].tolist()
all_gdt_bc = t_gdt_bc + cd8_cm_gdt_bc + cd8_em_gdt_bc + mait_gdt_bc
adata_subset = adata[adata.obs['barcodes'].isin(all_gdt_bc)]
adata_subset.shape
(54113, 1487)
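The four subset sizes reported above sum to 33088 + 5721 + 12565 + 2739 = 54113, which matches the number of cells in adata_subset. Assuming barcodes are unique within adata, this suggests that the four selections do not overlap. A minimal sketch of how such a check could be made explicit is shown below; the helper name `find_shared_barcodes` and the short barcode lists are hypothetical.

```python
from collections import Counter

def find_shared_barcodes(*barcode_lists):
    """Return barcodes that occur in more than one of the given lists."""
    counts = Counter()
    for bc_list in barcode_lists:
        # Use a set so repeats within a single list are not counted twice
        counts.update(set(bc_list))
    return sorted(bc for bc, n in counts.items() if n > 1)

# Hypothetical barcodes; 'bc2' appears in two selections
t_bc = ['bc1', 'bc2', 'bc3']
cm_bc = ['bc4']
em_bc = ['bc5', 'bc2']

shared = find_shared_barcodes(t_bc, cm_bc, em_bc)
all_unique = sorted(set(t_bc) | set(cm_bc) | set(em_bc))

# Sizes taken from the shapes printed in this notebook
subset_sizes = [33088, 5721, 12565, 2739]
print(shared, len(all_unique), sum(subset_sizes))
```

Because .isin() ignores repeated values, overlapping selections would not duplicate cells in the subset; a check like this only documents whether any overlap exists.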
As in the original analysis of this dataset, we'll need to normalize, select highly variable genes, and run Harmony to integrate across our cohorts.
It's important that we redo this step for our subset, as gene variability may differ when computed within our subset of cells rather than across the entire set of PBMCs. This key feature selection step will affect our ability to cluster and identify cell types, so we do this iteratively for the subset we're using now.
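A toy illustration of why the feature set can change: a gene that separates the subset from all other cells is highly variable across the full dataset but nearly constant within the subset. The sketch below uses plain variance on hypothetical values as a simplified stand-in; it is not the normalized dispersion that sc.pp.highly_variable_genes() computes.

```python
from statistics import pvariance

# Hypothetical expression values for two genes across six cells.
# Cells 0-2 belong to the subset of interest; cells 3-5 are other cell types.
gene_a = [5.0, 5.0, 5.0, 0.0, 0.0, 0.0]   # separates the subset from the rest
gene_b = [1.0, 3.0, 5.0, 3.0, 3.0, 3.0]   # varies only within the subset
subset_idx = [0, 1, 2]

def variances(idx):
    """Population variance of each toy gene over the selected cells."""
    return {
        'gene_a': pvariance([gene_a[i] for i in idx]),
        'gene_b': pvariance([gene_b[i] for i in idx]),
    }

full_var = variances(range(6))
subset_var = variances(subset_idx)
print(full_var)
print(subset_var)
```

Across all six cells gene_a is the more variable gene, but within the subset only gene_b varies, so the ranking of the two genes flips.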
We previously stored raw counts in adata.raw; we can now recover these original count data for analysis of the selected cells:
adata_subset = adata_subset.raw.to_adata()
adata_subset.shape
(54113, 33538)
adata_subset.raw = adata_subset
sc.pp.normalize_total(adata_subset, target_sum=1e4)
sc.pp.log1p(adata_subset)
sc.pp.highly_variable_genes(adata_subset)
adata_subset = adata_subset[:, adata_subset.var_names[adata_subset.var['highly_variable']]]
WARNING: adata.X seems to be already log-transformed.
sc.pp.scale(adata_subset)
/opt/conda/lib/python3.10/site-packages/scanpy/preprocessing/_simple.py:843: UserWarning: Received a view of an AnnData. Making a copy. view_to_actual(adata)
sc.tl.pca(adata_subset, svd_solver='arpack')
sce.pp.harmony_integrate(
    adata_subset,
    'cohort.cohortGuid',
    max_iter_harmony = 30
)
2024-03-04 23:07:02,171 - harmonypy - INFO - Computing initial centroids with sklearn.KMeans... 2024-03-04 23:07:26,282 - harmonypy - INFO - sklearn.KMeans initialization complete. 2024-03-04 23:07:26,664 - harmonypy - INFO - Iteration 1 of 30 2024-03-04 23:08:00,694 - harmonypy - INFO - Iteration 2 of 30 2024-03-04 23:08:35,966 - harmonypy - INFO - Converged after 2 iterations
sc.pp.neighbors(
    adata_subset,
    n_neighbors = 50,
    use_rep = 'X_pca_harmony',
    n_pcs = 30
)
sc.tl.umap(adata_subset, min_dist = 0.05)
out_dir = 'output'
if not os.path.isdir(out_dir):
    os.makedirs(out_dir)
subset_h5ad = 'output/ref_pbmc_{c}_subset_{d}.h5ad'.format(c = cell_class, d = date.today())
adata_subset.write_h5ad(subset_h5ad)
%%time
sc.tl.leiden(
    adata_subset,
    resolution = 1.5,
    key_added = 'leiden_resolution_1.5_{c}'.format(c = cell_class)
)
CPU times: user 5min 2s, sys: 2.17 s, total: 5min 5s Wall time: 5min 2s
clustered_h5ad = 'output/pbmc_ref_{c}_clustered_{d}.h5ad'.format(c = cell_class, d = date.today())
adata_subset.write_h5ad(clustered_h5ad)
Now that we've clustered, it's helpful to plot reference labels and clusters on our UMAP projection to see how they fall relative to each other.
sc.pl.umap(
    adata_subset,
    color = ['seurat.l2.5'],
    size = 2,
    show = False,
    ncols = 1,
    frameon = False
)
<Axes: title={'center': 'seurat.l2.5'}, xlabel='UMAP1', ylabel='UMAP2'>
sc.pl.umap(
    adata_subset,
    color = ['celltypist.low'],
    size = 2,
    show = False,
    ncols = 1,
    frameon = False
)
<Axes: title={'center': 'celltypist.low'}, xlabel='UMAP1', ylabel='UMAP2'>
CMV status is also helpful to view, as CMV can drive expansion of some cell types.
sc.pl.umap(
    adata_subset,
    color = ['subject.cmv'],
    size = 2,
    show = False,
    ncols = 1,
    frameon = False
)
<Axes: title={'center': 'subject.cmv'}, xlabel='UMAP1', ylabel='UMAP2'>
sc.pl.umap(
    adata_subset,
    color = 'leiden_resolution_1.5_{c}'.format(c = cell_class),
    size = 2,
    show = False,
    ncols = 1,
    frameon = False
)
<Axes: title={'center': 'leiden_resolution_1.5_t-gd'}, xlabel='UMAP1', ylabel='UMAP2'>
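umap_mat = adata_subset.obsm['X_umap']
# Index the coordinates by cell so that they align with obs on assignment
umap_df = pd.DataFrame(umap_mat, columns = ['umap_1', 'umap_2'], index = adata_subset.obs_names)
obs = adata_subset.obs
obs['umap_1'] = umap_df['umap_1']
obs['umap_2'] = umap_df['umap_2']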
out_csv = 'output/pbmc_ref_{c}_clustered_umap_meta_{d}.csv'.format(c = cell_class, d = date.today())
obs.to_csv(out_csv)
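pandas aligns on index labels when a Series is assigned as a column, so the UMAP coordinates need to carry the same index as obs to end up on the right rows. A minimal sketch with toy data (the cell names and coordinates are hypothetical):

```python
import numpy as np
import pandas as pd

toy_obs = pd.DataFrame({'cluster': ['0', '1']}, index = ['cell_a', 'cell_b'])
toy_umap = np.array([[0.1, 0.2], [0.3, 0.4]])

# Default RangeIndex (0, 1): labels do not match 'cell_a' / 'cell_b'
unaligned = pd.DataFrame(toy_umap, columns = ['umap_1', 'umap_2'])
misaligned_col = toy_obs.assign(umap_1 = unaligned['umap_1'])['umap_1']

# Reusing the obs index keeps each coordinate attached to its cell
aligned = pd.DataFrame(toy_umap, columns = ['umap_1', 'umap_2'], index = toy_obs.index)
aligned_col = toy_obs.assign(umap_1 = aligned['umap_1'])['umap_1']

print(misaligned_col.isna().all())
print(aligned_col.tolist())
```

In the first case every value is missing because no labels match; in the second the coordinates are carried over as expected.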
/opt/conda/lib/python3.10/site-packages/pandas/core/internals/blocks.py:2540: RuntimeWarning: invalid value encountered in cast values = values.astype(str)
out_parquet = 'output/pbmc_ref_{c}_clustered_umap_meta_{d}.parquet'.format(c = cell_class, d = date.today())
obs.to_parquet(out_parquet)
adata_subset = adata_subset.raw.to_adata()
sc.pp.normalize_total(adata_subset, target_sum=1e4)
sc.pp.log1p(adata_subset)
sc.tl.rank_genes_groups(adata_subset, 'leiden_resolution_1.5_{c}'.format(c = cell_class), method = 'wilcoxon')
df = sc.get.rank_genes_groups_df(adata_subset, group = None)
WARNING: adata.X seems to be already log-transformed.
res_csv = '{p}/pbmc_ref_{c}_res{n}_markers_{d}.csv'.format(p = out_dir, c = cell_class, n = 1.5, d = date.today())
df.to_csv(res_csv)
marker_files = res_csv
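For a quick look at the marker table before handing it off, one option is to keep the top few significant, positively scoring genes per cluster. The sketch below runs on a toy table with hypothetical values; it assumes the long-format columns 'group', 'names', 'scores', and 'pvals_adj', which should be checked against the actual columns of df. The helper name `top_markers` and its thresholds are illustrative.

```python
import pandas as pd

# Toy table mimicking long-format marker results; all values are hypothetical
toy_markers = pd.DataFrame({
    'group': ['0', '0', '0', '1', '1', '1'],
    'names': ['TRDC', 'TRGC1', 'CD8A', 'TRAC', 'TRDC', 'CD4'],
    'scores': [12.0, 9.5, 1.0, 8.0, -3.0, 2.5],
    'pvals_adj': [1e-30, 1e-20, 0.4, 1e-15, 1e-3, 0.02],
})

def top_markers(marker_df, n = 2, max_padj = 0.05):
    """Top n significant, positively scoring genes per cluster."""
    keep = (marker_df['pvals_adj'] < max_padj) & (marker_df['scores'] > 0)
    ranked = marker_df[keep].sort_values(['group', 'scores'], ascending = [True, False])
    # head(n) within each group keeps the n highest-scoring rows
    return ranked.groupby('group').head(n)

top = top_markers(toy_markers)
print(top[['group', 'names', 'scores']])
```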
Finally, we'll use hisepy.upload.upload_files() to send a copy of our output to HISE to use for downstream analysis steps.
study_space_uuid = '64097865-486d-43b3-8f94-74994e0a72e0'
title = 'PBMC Ref. gdT subclustering {d}'.format(d = date.today())
in_files = [h5ad_uuid]
in_files
['d6ebc576-34ea-4394-a569-e35e16f20253']
out_files = [clustered_h5ad, out_csv, out_parquet, marker_files]
out_files
['output/pbmc_ref_t-gd_clustered_2024-03-04.h5ad', 'output/pbmc_ref_t-gd_clustered_umap_meta_2024-03-04.csv', 'output/pbmc_ref_t-gd_clustered_umap_meta_2024-03-04.parquet', 'output/pbmc_ref_t-gd_res1.5_markers_2024-03-04.csv']
hisepy.upload.upload_files(
    files = out_files,
    study_space_id = study_space_uuid,
    title = title,
    input_file_ids = in_files
)
output/pbmc_ref_t-gd_clustered_2024-03-04.h5ad output/pbmc_ref_t-gd_clustered_umap_meta_2024-03-04.csv output/pbmc_ref_t-gd_clustered_umap_meta_2024-03-04.parquet output/pbmc_ref_t-gd_res1.5_markers_2024-03-04.csv you are trying to upload file_ids... ['output/pbmc_ref_t-gd_clustered_2024-03-04.h5ad', 'output/pbmc_ref_t-gd_clustered_umap_meta_2024-03-04.csv', 'output/pbmc_ref_t-gd_clustered_umap_meta_2024-03-04.parquet', 'output/pbmc_ref_t-gd_res1.5_markers_2024-03-04.csv']. Do you truly want to proceed?
{'trace_id': '18793c02-fb03-48fa-aea9-b8e9ad7def11', 'files': ['output/pbmc_ref_t-gd_clustered_2024-03-04.h5ad', 'output/pbmc_ref_t-gd_clustered_umap_meta_2024-03-04.csv', 'output/pbmc_ref_t-gd_clustered_umap_meta_2024-03-04.parquet', 'output/pbmc_ref_t-gd_res1.5_markers_2024-03-04.csv']}
import session_info
session_info.show()
----- anndata 0.10.3 hisepy 0.3.0 matplotlib 3.8.0 pandas 2.1.4 scanpy 1.9.6 session_info 1.0.0 -----
PIL 10.0.1 anyio NA arrow 1.3.0 asttokens NA attr 23.2.0 attrs 23.2.0 babel 2.14.0 beatrix_jupyterlab NA brotli NA cachetools 5.3.1 certifi 2024.02.02 cffi 1.16.0 charset_normalizer 3.3.2 cloudpickle 2.2.1 colorama 0.4.6 comm 0.1.4 cryptography 41.0.7 cycler 0.10.0 cython_runtime NA dateutil 2.8.2 db_dtypes 1.1.1 debugpy 1.8.0 decorator 5.1.1 defusedxml 0.7.1 deprecated 1.2.14 exceptiongroup 1.2.0 executing 2.0.1 fastjsonschema NA fqdn NA google NA greenlet 2.0.2 grpc 1.58.0 grpc_status NA h5py 3.10.0 harmonypy NA idna 3.6 igraph 0.10.8 importlib_metadata NA ipykernel 6.28.0 ipython_genutils 0.2.0 ipywidgets 8.1.1 isoduration NA jedi 0.19.1 jinja2 3.1.2 joblib 1.3.2 json5 NA jsonpointer 2.4 jsonschema 4.20.0 jsonschema_specifications NA jupyter_events 0.9.0 jupyter_server 2.12.1 jupyterlab_server 2.25.2 jwt 2.8.0 kiwisolver 1.4.5 leidenalg 0.10.1 llvmlite 0.41.0 lz4 4.3.2 markupsafe 2.1.3 matplotlib_inline 0.1.6 mpl_toolkits NA mpmath 1.3.0 natsort 8.4.0 nbformat 5.9.2 numba 0.58.0 numpy 1.24.0 opentelemetry NA overrides NA packaging 23.2 parso 0.8.3 patsy 0.5.3 pexpect 4.8.0 pickleshare 0.7.5 pkg_resources NA platformdirs 4.1.0 plotly 5.18.0 prettytable 3.9.0 prometheus_client NA prompt_toolkit 3.0.42 proto NA psutil NA ptyprocess 0.7.0 pure_eval 0.2.2 pyarrow 13.0.0 pycparser 2.21 pydev_ipython NA pydevconsole NA pydevd 2.9.5 pydevd_file_utils NA pydevd_plugins NA pydevd_tracing NA pygments 2.17.2 pynndescent 0.5.11 pynvml NA pyparsing 3.1.1 pyreadr 0.5.0 pythonjsonlogger NA pytz 2023.3.post1 referencing NA requests 2.31.0 rfc3339_validator 0.1.4 rfc3986_validator 0.1.1 rpds NA scipy 1.11.4 send2trash NA shapely 1.8.5.post1 six 1.16.0 sklearn 1.3.2 sniffio 1.3.0 socks 1.7.1 sparse 0.14.0 sql NA sqlalchemy 2.0.21 sqlparse 0.4.4 stack_data 0.6.2 statsmodels 0.14.0 sympy 1.12 termcolor NA texttable 1.7.0 threadpoolctl 3.2.0 torch 2.1.2+cu121 torchgen NA tornado 6.3.3 tqdm 4.66.1 traitlets 5.9.0 typing_extensions NA umap 0.5.5 uri_template NA urllib3 1.26.18 wcwidth 
0.2.12 webcolors 1.13 websocket 1.7.0 wrapt 1.15.0 xarray 2023.12.0 yaml 6.0.1 zipp NA zmq 25.1.2 zoneinfo NA zstandard 0.22.0
----- IPython 8.19.0 jupyter_client 8.6.0 jupyter_core 5.6.1 jupyterlab 4.1.2 notebook 6.5.4 ----- Python 3.10.13 | packaged by conda-forge | (main, Dec 23 2023, 15:36:39) [GCC 12.3.0] Linux-5.15.0-1052-gcp-x86_64-with-glibc2.31 ----- Session information updated at 2024-03-04 23:19