#!/usr/bin/env python
# coding: utf-8

# # Introduction
# We provide a set of notebooks to show how the GDSCTools package can be used in IPython / the IPython notebook.
#
# The source code is available on GitHub: https://github.com/CancerRxGene/gdsctools
#
# If you run into any issues (bugs), please file an issue here: https://github.com/CancerRxGene/gdsctools/issues
#
# In this notebook, we simply give a flavour of what can be done. Other notebooks provide more detailed examples.
#
# Documentation for users and developers is also available on a dedicated entry page on PyPI and at http://gdsctools.readthedocs.org
#

#
# **Other notebooks:**
#
# - Settings of the analysis
# - Cancer specific analysis

# ### Overview
# The goal of this package is to provide tools related to the GDSC project
# (Genomics of Drug Sensitivity in Cancer): http://www.cancerrxgene.org/
#
# Currently, GDSCTools provides functionalities to identify associations between drugs and genomic features across a set of cell lines.
#
# The genomic features are provided within the package. Users need to provide IC50 values for a set of drugs and a set of cell lines.
#
# We provide an example to play with. First, let us get this IC50 test file and read it.

# In[1]:


get_ipython().run_line_magic('pylab', 'inline')
from gdsctools import ic50_test, GenomicFeatures, genomic_features_test


# In[2]:


print(ic50_test)


# This is just a file with a location and a description. It can be read using
# the IC50 class.

# In[3]:


from gdsctools import IC50
data = IC50(ic50_test)
print(data)


# As you can see, it contains 11 drugs across 988 cell lines.
#
# Similarly, a genomic features data set is provided, which can be read
# with the GenomicFeatures class.

# In[4]:


gf = GenomicFeatures(genomic_features_test)
print(gf)


# This file is downloaded automatically when an analysis
# is performed. However, you may provide your own file.
#
# Let us now perform the analysis using the ANOVA class.

# In[5]:


from gdsctools import ANOVA


# In[6]:


an = ANOVA(data, genomic_features=genomic_features_test.filename)


# In[7]:


print(an)


# So, we have 11 drugs and 677 features across 988 cell lines (27 tissues). This
# is a PANCAN analysis (across several cancer cell types).
#
# We can analyse the entire data set, which takes some time (still reasonable: about 1 minute, depending on your system).

# In[8]:


results = an.anova_all()


# All results are now stored in the variable results. Its df attribute is a Pandas DataFrame, which can be inspected directly.
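# To make the idea behind anova_all() more concrete, here is a minimal sketch of the kind of test run for a
# single drug/feature pair: a one-way ANOVA comparing IC50 values of cell lines that carry a genomic feature
# with those that do not. This is a simplification, not the GDSCTools implementation: its model may also
# include other factors (for instance the tissue type in a PANCAN analysis), so its statistics will differ.
# The helper one_way_anova_f and the IC50 values below are hypothetical and only serve as an illustration.

# In[ ]:


from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of values.

    Hypothetical helper for illustration only; it is not part of GDSCTools.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)      # number of groups
    n = len(all_values)  # total number of observations
    # Between-group sum of squares: spread of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of each value around its own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    # Mean squares use k - 1 and n - k degrees of freedom
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical IC50 values for one drug
mutant = [1.0, 2.0, 3.0]     # cell lines carrying the feature
wild_type = [4.0, 5.0, 6.0]  # cell lines without the feature
f_stat = one_way_anova_f([mutant, wild_type])
print(f_stat)  # 13.5


# A larger F means the feature separates the IC50 values more strongly. With two groups, F equals the square of
# the pooled-variance t statistic; a p-value would then come from the F distribution with (k - 1, n - k) degrees
# of freedom, e.g. scipy.stats.f.sf(f_stat, 1, 4) for the toy data above.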
# Each association can be accessed using a unique identifier, from 0 to len(results.df) - 1:

# In[9]:


results.df.loc[0]


# As an example, we can plot the histogram of the FDR column:

# In[10]:


get_ipython().run_line_magic('pylab', 'inline')
results.df['ANOVA_FEATURE_FDR'].hist(bins=20)


# In the next notebooks, we will investigate in more detail:
# - the input data sets
# - the analysis, and in particular how to look at
#     - one association
#     - associations for a given drug
#     - all associations (what we did here when we called the anova_all() function)
# - how to generate HTML reports
# - the settings
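
# Regarding the ANOVA_FEATURE_FDR histogram above: FDR values are p-values adjusted for the many
# associations tested at once. As a hedged illustration, here is the Benjamini-Hochberg procedure, one common
# FDR correction. We have not confirmed here that GDSCTools uses exactly this method (the correction may be
# configurable in the analysis settings), and this sketch returns values between 0 and 1, so check the scale of
# ANOVA_FEATURE_FDR before comparing. The helper benjamini_hochberg and the p-values are hypothetical.

# In[ ]:


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Hypothetical helper for illustration only; it is not part of GDSCTools.
    """
    m = len(pvalues)
    # Indices of the p-values, from the largest p-value to the smallest
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank of pvalues[i] in ascending order
        # Scale by m / rank, then keep a running minimum so adjusted values stay monotonic
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.20]  # hypothetical p-values
print(benjamini_hochberg(pvals))


# Walking from the largest p-value down with a running minimum guarantees that a smaller raw p-value never
# receives a larger adjusted value than a bigger one.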