#!/usr/bin/env python
# coding: utf-8
# # Introduction
# We provide a set of notebooks to show how the GDSCTools package can be used in ipython / ipython notebook.
#
# The source code is available on GitHub: https://github.com/CancerRxGene/gdsctools
# If you encounter a bug, please file an issue at https://github.com/CancerRxGene/gdsctools/issues
#
# In this notebook, we will simply give a flavour of what can be done. Other notebooks will provide more detailed examples.
#
#
# Documentation for users and developers is also available on the package's PyPI page and at http://gdsctools.readthedocs.org
#
# **Other notebooks:**
#
# ### Overview
# The goal of this package is to provide tools related to the GDSC project
# (Genomics of Drug Sensitivity in Cancer) http://www.cancerrxgene.org/
# Currently, GDSCTools provides functionality to identify associations between drugs and genomic features across a set of cell lines.
# The genomic features are provided within the package. Users need to provide IC50 values for a set of drugs and a set of cell lines.
# We provide an example to play with. First, let us get this IC50 test file and read it.
# In[1]:
get_ipython().run_line_magic('pylab', 'inline')
from gdsctools import ic50_test, GenomicFeatures, genomic_features_test
# In[2]:
print(ic50_test)
# This object simply holds the location of a file and its description.
# The file itself can be read using the IC50 class.
# In[3]:
from gdsctools import IC50
data = IC50(ic50_test)
print(data)
# As you can see, it contains 11 drugs across 988 cell lines.
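#
# To make the "drugs across cell lines" layout concrete, here is a minimal
# sketch that uses only the standard library and does not rely on GDSCTools.
# The column names (COSMIC_ID, Drug_1_IC50, ...) and the values are assumptions
# made for illustration; check the GDSCTools documentation for the exact
# file format expected by the IC50 class.

```python
import csv
import io

# Hypothetical toy data: one row per cell line, one column per drug.
# An empty field stands for a missing IC50 measurement.
toy_csv = """COSMIC_ID,Drug_1_IC50,Drug_2_IC50
1001,2.31,0.45
1002,,1.10
1003,3.05,0.98
"""


def summarise_ic50(text, id_column="COSMIC_ID"):
    """Return (number of drugs, number of cell lines, missing values per drug)."""
    reader = csv.DictReader(io.StringIO(text))
    drugs = [name for name in reader.fieldnames if name != id_column]
    rows = list(reader)
    missing = {d: sum(1 for r in rows if r[d] == "") for d in drugs}
    return len(drugs), len(rows), missing


n_drugs, n_cell_lines, missing = summarise_ic50(toy_csv)
print(n_drugs, "drugs across", n_cell_lines, "cell lines; missing:", missing)
```

# The summary printed by the IC50 class plays a similar role for the real data set.
#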
# Similarly, there is a genomic feature data set provided, which can be read
# with the GenomicFeatures class
# In[4]:
gf = GenomicFeatures(genomic_features_test)
print(gf)
# This file is going to be downloaded automatically when an analysis
# is performed. However, you may provide your own file.
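#
# If you provide your own genomic features, it can help to picture them as a
# binary table: one row per cell line, one column per feature, plus some
# metadata such as the tissue. The sketch below is a plain-Python illustration
# with hypothetical column names and values; it is not the file format
# required by GenomicFeatures.

```python
# Hypothetical toy table: 1 means the feature is present in the cell line.
cell_lines = [
    {"cell_line": "A", "tissue": "lung", "FEATURE_X_mut": 1, "FEATURE_Y_mut": 0},
    {"cell_line": "B", "tissue": "lung", "FEATURE_X_mut": 0, "FEATURE_Y_mut": 0},
    {"cell_line": "C", "tissue": "breast", "FEATURE_X_mut": 1, "FEATURE_Y_mut": 1},
]


def feature_counts(rows, meta=("cell_line", "tissue")):
    """Count, for each feature column, how many cell lines carry the feature."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            if name in meta:
                continue
            counts[name] = counts.get(name, 0) + value
    return counts


counts = feature_counts(cell_lines)
n_tissues = len({row["tissue"] for row in cell_lines})
print(len(counts), "features across", len(cell_lines), "cell lines,", n_tissues, "tissues")
```

# Features present in very few cell lines leave little data to test, which is
# one reason to look at such counts before running an analysis.
#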
# Let us now perform the analysis using the ANOVA class.
# In[5]:
from gdsctools import ANOVA
# In[6]:
an = ANOVA(data, genomic_features=genomic_features_test.filename)
# In[7]:
print(an)
# So, we have 11 drugs and 677 features across 988 cell lines (27 tissues). This
# is a PANCAN analysis (across several cancer cell types).
#
# We can analyse the entire data set, which takes some time (still reasonable: about 1 minute, depending on your system).
# In[8]:
results = an.anova_all()
# All results are now in the new variable results. Its df attribute is a Pandas dataframe with one row per association. Each association can be accessed through its unique identifier, the integer row label of the dataframe, starting at 0:
# In[9]:
results.df.loc[0]
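# To give an idea of the kind of test behind one association, the sketch below
# computes a one-way ANOVA F statistic comparing IC50 values of cell lines with
# and without a feature. This is a deliberately simplified illustration: the
# values are made up, the function name one_way_f is hypothetical, and the
# model fitted by the ANOVA class may include additional factors (for example
# tissue) that are not handled here.

```python
from statistics import mean


def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups of numeric values."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation explained by group membership
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Residual variation inside each group
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Made-up IC50 values for cell lines with and without a given feature
mutant = [1.0, 2.0, 3.0]
wild_type = [4.0, 5.0, 6.0]
f_stat = one_way_f([mutant, wild_type])
print("F statistic:", f_stat)
```

# A large F statistic means the feature explains a large share of the variation in drug response.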
# As an example, we can plot the histogram of the FDR column:
# In[10]:
get_ipython().run_line_magic('pylab', 'inline')
results.df['ANOVA_FEATURE_FDR'].hist(bins=20)
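# Because many associations are tested at once, p-values are usually corrected
# for multiple testing, which is what an FDR column reports. The sketch below
# implements the Benjamini-Hochberg procedure in plain Python as one common way
# to do this. It is an assumption that this matches the correction applied by
# GDSCTools; the package documentation describes the available methods and the
# scale used in the ANOVA_FEATURE_FDR column.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for four associations
pvals = [0.01, 0.04, 0.03, 0.20]
qvals = benjamini_hochberg(pvals)
print(qvals)
```

# Associations whose adjusted value falls below a chosen threshold are the ones usually reported as significant.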
# In the next notebooks, we will look more closely at:
# - the input data sets
# - the analysis, and in particular how to look at
#     - one association
#     - associations for a given drug
#     - all associations (what we did here by calling the anova_all() method)
# - how to generate HTML reports
# - the settings