Calculate CCLE Tissue Heatmaps

This notebook generates a heatmap for each tissue in the Cancer Cell Line Encyclopedia (CCLE) and saves each one as a JSON file for visualization.

In [1]:
from clustergrammer_widget import *
net = Network(clustergrammer_widget)

Load CCLE data

Load the CCLE data and export it as a pandas DataFrame (rows are genes, columns are cell lines). This DataFrame is the starting point for every tissue-specific heatmap below.

In [28]:
net.load_file('../original_data/CCLE.txt')
ccle = net.export_df()
print(ccle.shape)
(18874, 1037)

Get Unique Tissues

In [29]:
# each column label is a tuple; element 1 holds the tissue category, e.g. 'tissue: lung'
tissues = sorted({inst_col[1] for inst_col in ccle.columns.tolist()})
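
To make the column structure concrete, here is a minimal sketch on a toy DataFrame. The cell line names, gene names, and values are made up for illustration; only the 'tissue: <name>' format of the second tuple element is taken from this notebook's output.

```python
import pandas as pd

# Hypothetical toy data: column labels are (cell line, category) tuples
toy_cols = [
    ('cell_line_A', 'tissue: lung'),
    ('cell_line_B', 'tissue: breast'),
    ('cell_line_C', 'tissue: lung'),
]
toy_ccle = pd.DataFrame(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    index=['GENE1', 'GENE2'],
    columns=toy_cols,
)

# collect element 1 of every column tuple, de-duplicate, and sort
toy_tissues = sorted({col[1] for col in toy_ccle.columns.tolist()})
print(toy_tissues)
```

The set removes the repeated 'tissue: lung' entry, so each tissue appears once in the sorted list.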

Intra-Normalized Tissue-Specific Heatmaps

In [30]:
# intra-tissue normalization: filter, enrich, cluster, and save JSON
keep_tissues = []
for inst_tissue in tissues:
    net.load_df(ccle)
    net.filter_cat('col', 1, inst_tissue)
    num_cols = net.dat['mat'].shape[1]
    
    # only keep tissues that have more than one cell line 
    if num_cols > 1: 
        print(inst_tissue + ': ' + str(num_cols))
        
        # keep list of tissues with multiple cell lines
        keep_tissues.append(inst_tissue)
    
        # filter for top 250 genes in tissue based on variance
        net.filter_N_top('row', 250, 'var')
        
        # normalize gene expression across cell lines in tissue
        net.normalize(axis='row', norm_type='zscore')
        
        # pre-calculate enrichment analysis for Gene Ontology Biological Process
        net.enrichrgram('GO_Biological_Process_2015')
        
        # cluster and tell front-end to enable enrichrgram (do not calculate row-filtered views)
        net.cluster(views=[], enrichrgram=True)
        
        # save to JSON
        filename = '../json/intra-norm_' + inst_tissue.split(': ')[1] + '.json'
        net.write_json_to_file('viz', filename, indent='no-indent')
tissue: autonomic_ganglia: 17
tissue: biliary_tract: 8
tissue: bone: 29
tissue: breast: 59
tissue: central_nervous_system: 69
tissue: endometrium: 27
tissue: haematopoietic_and_lymphoid_tissue: 180
tissue: kidney: 36
tissue: large_intestine: 61
tissue: liver: 28
tissue: lung: 187
tissue: oesophagus: 26
tissue: ovary: 52
tissue: pancreas: 44
tissue: pleura: 11
tissue: prostate: 8
tissue: salivary_gland: 2
tissue: skin: 62
tissue: soft_tissue: 21
tissue: stomach: 38
tissue: thyroid: 12
tissue: upper_aerodigestive_tract: 32
tissue: urinary_tract: 27
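
The two numerical steps in the loop above (keep the highest-variance genes, then Z-score each gene across the tissue's cell lines) can be sketched in plain pandas. This is not the Clustergrammer implementation: the helper name `top_var_zscore`, the toy data, and the use of the population standard deviation (`ddof=0`) are assumptions made for illustration.

```python
import pandas as pd

def top_var_zscore(df, n_top):
    """Keep the n_top highest-variance rows, then z-score each row."""
    # rank rows (genes) by variance across columns (cell lines)
    keep = df.var(axis=1).sort_values(ascending=False).index[:n_top]
    sub = df.loc[keep]
    # z-score each row; ddof=0 is a choice made for this sketch
    centered = sub.sub(sub.mean(axis=1), axis=0)
    return centered.div(sub.std(axis=1, ddof=0), axis=0)

# hypothetical expression values for one tissue
toy_tissue = pd.DataFrame(
    [[1.0, 2.0, 3.0],     # moderate variance
     [5.0, 5.0, 5.1],     # nearly constant
     [0.0, 10.0, 20.0]],  # high variance
    index=['GENE1', 'GENE2', 'GENE3'],
    columns=['line_A', 'line_B', 'line_C'],
)

intra_z = top_var_zscore(toy_tissue, n_top=2)
print(intra_z)
```

Because the Z-score is computed only over the cell lines of one tissue, the resulting heatmap highlights variation within that tissue rather than differences from other tissues.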

Inter-Normalized Tissue-Specific Heatmaps

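These tissue-specific heatmaps use the genes that are most consistently differentially expressed in each tissue relative to all cell lines in the CCLE. Expression is first Z-scored across all cell lines, and the top 250 genes for each tissue are then selected by their summed Z-score within that tissue.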
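
As a rough sketch of this selection in plain pandas: Z-score every gene across all cell lines, restrict to one tissue's columns, and rank genes by their summed Z-score. The helper name `top_tissue_genes` and the toy data are hypothetical, and ranking by the absolute value of the sum (so that both over- and under-expressed genes can rank highly) is an assumption, not a confirmed description of how Clustergrammer's 'sum' ranking works.

```python
import pandas as pd

def top_tissue_genes(df, tissue_cols, n_top):
    """Z-score rows across all columns, then rank rows within one tissue."""
    # z-score each gene across ALL cell lines (ddof=0 chosen for this sketch)
    z = df.sub(df.mean(axis=1), axis=0).div(df.std(axis=1, ddof=0), axis=0)
    sub = z[tissue_cols]
    # assumption: rank by absolute summed z-score within the tissue
    order = sub.sum(axis=1).abs().sort_values(ascending=False).index[:n_top]
    return sub.loc[order]

# hypothetical data: two cell lines from tissue X, two from tissue Y
toy_all = pd.DataFrame(
    [[9.0, 8.0, 1.0, 2.0],   # consistently high in tissue X
     [5.0, 1.0, 5.0, 1.0],   # varies, but not tissue-specific
     [1.0, 1.0, 6.0, 8.0]],  # consistently low in tissue X
    index=['GENE1', 'GENE2', 'GENE3'],
    columns=['x_1', 'x_2', 'y_1', 'y_2'],
)

inter_top = top_tissue_genes(toy_all, ['x_1', 'x_2'], n_top=2)
print(inter_top)
```

In this toy case GENE2 is dropped: its Z-scores within tissue X have opposite signs and cancel in the sum, which is what "consistently" differentially expressed is meant to capture.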

In [31]:
# make inter-tissue normalized ccle DataFrame
net.load_df(ccle)
net.normalize(axis='row', norm_type='zscore')
ccle_zscore = net.export_df()

for inst_tissue in keep_tissues:
    # load inter-tissue normalized data
    net.load_df(ccle_zscore)

    # filter for tissue of interest
    net.filter_cat('col', 1, inst_tissue)

    # report the number of cell lines in this tissue (recomputed after filtering,
    # otherwise num_cols keeps its stale value from the previous cell)
    num_cols = net.dat['mat'].shape[1]
    print(inst_tissue + ': ' + str(num_cols))

    # keep the top 250 differentially expressed genes
    net.filter_N_top('row', 250, 'sum')

    # pre-calculate enrichment analysis for Gene Ontology Biological Process
    net.enrichrgram('GO_Biological_Process_2015')

    # cluster and tell front-end to enable enrichrgram
    net.cluster(enrichrgram=True)

    # save to JSON
    filename = '../json/inter-norm_' + inst_tissue.split(': ')[1] + '.json'
    net.write_json_to_file('viz', filename, indent='no-indent')
tissue: autonomic_ganglia: 17
tissue: biliary_tract: 8
tissue: bone: 29
tissue: breast: 59
tissue: central_nervous_system: 69
tissue: endometrium: 27
tissue: haematopoietic_and_lymphoid_tissue: 180
tissue: kidney: 36
tissue: large_intestine: 61
tissue: liver: 28
tissue: lung: 187
tissue: oesophagus: 26
tissue: ovary: 52
tissue: pancreas: 44
tissue: pleura: 11
tissue: prostate: 8
tissue: salivary_gland: 2
tissue: skin: 62
tissue: soft_tissue: 21
tissue: stomach: 38
tissue: thyroid: 12
tissue: upper_aerodigestive_tract: 32
tissue: urinary_tract: 27