This notebook generates a heatmap for each tissue in the Cancer Cell Line Encyclopedia (CCLE).
from clustergrammer_widget import *
net = Network(clustergrammer_widget)
I will load the CCLE data and export it as a pandas DataFrame, which is then used to generate the tissue-specific heatmaps.
net.load_file('../original_data/CCLE.txt')
ccle = net.export_df()
print(ccle.shape)
(18874, 1037)
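cols = ccle.columns.tolist()
tissues = []
for inst_col in cols:
    # each column label is a tuple; element 1 holds the tissue category
    tissues.append(inst_col[1])
# unique tissue categories in alphabetical order
tissues = sorted(list(set(tissues)))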
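To make the column structure concrete, here is a minimal sketch. It assumes each column label is a tuple of the form (cell line name, 'tissue: <name>'), which is what indexing with `inst_col[1]` and the printed tissue names suggest. The cell line names below are hypothetical placeholders, not entries from the CCLE file.

```python
# Hypothetical column labels: (cell line name, tissue category string)
example_cols = [
    ('cell_line_A', 'tissue: lung'),
    ('cell_line_B', 'tissue: breast'),
    ('cell_line_C', 'tissue: lung'),
]

# The second element of each tuple holds the tissue category;
# a set removes duplicates and sorted() gives a stable order.
example_tissues = sorted({col[1] for col in example_cols})
print(example_tissues)

# The part after 'tissue: ' is the short name used later in file names.
short_names = [t.split(': ')[1] for t in example_tissues]
print(short_names)
```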
# intra-tissue normalization: filter, enrich, cluster, and save JSON
keep_tissues = []
for inst_tissue in tissues:
    net.load_df(ccle)
    net.filter_cat('col', 1, inst_tissue)
    num_cols = net.dat['mat'].shape[1]
    # only keep tissues that have more than one cell line
    if num_cols > 1:
        print(inst_tissue + ': ' + str(num_cols))
        # keep list of tissues with multiple cell lines
        keep_tissues.append(inst_tissue)
        # filter for top 250 genes in tissue based on variance
        net.filter_N_top('row', 250, 'var')
        # normalize gene expression across cell lines in tissue
        net.normalize(axis='row', norm_type='zscore')
        # pre-calculate enrichment analysis for Gene Ontology Biological Process
        net.enrichrgram('GO_Biological_Process_2015')
        # cluster and tell front-end to enable enrichrgram (do not calculate row-filtered views)
        net.cluster(views=[], enrichrgram=True)
        # save to JSON
        filename = '../json/intra-norm_' + inst_tissue.split(': ')[1] + '.json'
        net.write_json_to_file('viz', filename, indent='no-indent')
tissue: autonomic_ganglia: 17
tissue: biliary_tract: 8
tissue: bone: 29
tissue: breast: 59
tissue: central_nervous_system: 69
tissue: endometrium: 27
tissue: haematopoietic_and_lymphoid_tissue: 180
tissue: kidney: 36
tissue: large_intestine: 61
tissue: liver: 28
tissue: lung: 187
tissue: oesophagus: 26
tissue: ovary: 52
tissue: pancreas: 44
tissue: pleura: 11
tissue: prostate: 8
tissue: salivary_gland: 2
tissue: skin: 62
tissue: soft_tissue: 21
tissue: stomach: 38
tissue: thyroid: 12
tissue: upper_aerodigestive_tract: 32
tissue: urinary_tract: 27
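The two numerical steps in the loop above (keep the most variable genes, then Z-score each gene across the tissue's cell lines) can be illustrated with a small standard-library sketch. This is not clustergrammer's implementation: the toy matrix, the helper names `top_n_by_variance` and `zscore_rows`, and the use of population variance and standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical toy matrix: gene -> expression across cell lines of one tissue
toy = {
    'gene_1': [1.0, 5.0, 9.0],
    'gene_2': [4.0, 4.0, 4.0],
    'gene_3': [2.0, 3.0, 4.0],
}

def top_n_by_variance(mat, n):
    """Keep the n rows (genes) with the largest variance."""
    ranked = sorted(mat, key=lambda g: pvariance(mat[g]), reverse=True)
    return {g: mat[g] for g in ranked[:n]}

def zscore_rows(mat):
    """Z-score each row: subtract the row mean, divide by the row std."""
    out = {}
    for gene, vals in mat.items():
        mu, sd = mean(vals), pstdev(vals)
        # a constant row has zero std; map it to zeros instead of dividing
        out[gene] = [(v - mu) / sd if sd else 0.0 for v in vals]
    return out

filtered = top_n_by_variance(toy, 2)   # drops the constant gene_2
normalized = zscore_rows(filtered)
print(sorted(normalized))
```

The order of the two steps matters: after row-wise Z-scoring every non-constant gene has unit variance, so a variance filter applied afterwards could no longer distinguish between genes.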
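Here, we make tissue-specific heatmaps from the genes that are most consistently differentially expressed in each tissue relative to all cell lines in the CCLE. Expression is first Z-scored per gene across all cell lines; then, for each tissue, we keep the top 250 genes ranked by the sum of their Z-scores over that tissue's cell lines.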
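As a rough illustration of this ranking, the sketch below Z-scores each gene across all cell lines and then sums the Z-scores over one tissue's columns. The toy data and the helper names are hypothetical. Ranking by the absolute value of the sum, so that strongly down-regulated genes can also be selected, is an assumption of this sketch rather than a statement about how clustergrammer ranks rows.

```python
from statistics import mean, pstdev

# Hypothetical toy data: gene -> expression across ALL cell lines
expr = {
    'gene_1': [9.0, 8.0, 1.0, 2.0],
    'gene_2': [5.0, 5.0, 5.0, 6.0],
    'gene_3': [4.0, 5.0, 6.0, 5.0],
}
# tissue label of each column, in column order
col_tissue = ['lung', 'lung', 'skin', 'skin']

def zscore(vals):
    """Z-score one gene across all cell lines."""
    mu, sd = mean(vals), pstdev(vals)
    return [(v - mu) / sd if sd else 0.0 for v in vals]

# step 1: normalize each gene against all cell lines
z = {gene: zscore(vals) for gene, vals in expr.items()}

def tissue_sums(zmat, tissue):
    """Sum each gene's Z-scores over the columns of one tissue."""
    keep = [i for i, t in enumerate(col_tissue) if t == tissue]
    return {gene: sum(vals[i] for i in keep) for gene, vals in zmat.items()}

# step 2: rank genes for one tissue (assumption: by absolute sum)
lung = tissue_sums(z, 'lung')
top2 = sorted(lung, key=lambda g: abs(lung[g]), reverse=True)[:2]
print(top2)
```

A gene that is high in only one of a tissue's cell lines contributes a smaller sum than a gene that is moderately high in all of them, which is why the sum favors consistent differences.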
# make inter-tissue normalized ccle DataFrame
net.load_df(ccle)
net.normalize(axis='row', norm_type='zscore')
ccle_zscore = net.export_df()
for inst_tissue in keep_tissues:
    # load inter-tissue normalized data
    net.load_df(ccle_zscore)
    # filter for tissue of interest
    net.filter_cat('col', 1, inst_tissue)
    # count this tissue's cell lines after filtering; reusing num_cols from the
    # loop above would print the last tissue's count (27) for every tissue
    num_cols = net.dat['mat'].shape[1]
    print(inst_tissue + ': ' + str(num_cols))
    # keep the top 250 differentially expressed genes
    net.filter_N_top('row', 250, 'sum')
    # pre-calculate enrichment analysis for Gene Ontology Biological Process
    net.enrichrgram('GO_Biological_Process_2015')
    # cluster and tell front-end to enable enrichrgram
    net.cluster(enrichrgram=True)
    # save to JSON
    filename = '../json/inter-norm_' + inst_tissue.split(': ')[1] + '.json'
    net.write_json_to_file('viz', filename, indent='no-indent')
tissue: autonomic_ganglia: 27
tissue: biliary_tract: 27
tissue: bone: 27
tissue: breast: 27
tissue: central_nervous_system: 27
tissue: endometrium: 27
tissue: haematopoietic_and_lymphoid_tissue: 27
tissue: kidney: 27
tissue: large_intestine: 27
tissue: liver: 27
tissue: lung: 27
tissue: oesophagus: 27
tissue: ovary: 27
tissue: pancreas: 27
tissue: pleura: 27
tissue: prostate: 27
tissue: salivary_gland: 27
tissue: skin: 27
tissue: soft_tissue: 27
tissue: stomach: 27
tissue: thyroid: 27
tissue: upper_aerodigestive_tract: 27
tissue: urinary_tract: 27