Denis Torre
September 20, 2017
This notebook explains how to extract data from the Datasets2Tools API using Python. The notebook can be downloaded from the following GitHub page: https://github.com/denis-torre/datasets2tools/tree/master/api.
The Datasets2Tools API can be used to search three types of objects: canned analyses, datasets, and tools.
Searching each of these object types is explained in more detail below.
Here is an example search that returns five canned analyses.
# Import modules
import json
import requests
import pandas as pd
# Get API URL
url = 'http://amp.pharm.mssm.edu/datasets2tools/api/search'
# Search 5 analyses
data = {
    'object_type': 'canned_analysis',
    'page_size': 5
}
# Get response
response = requests.post(url, params=data)
# Read response
results = json.loads(response.text)
# Convert to dataframe
results_dataframe = pd.DataFrame(results)
results_dataframe
| | canned_analysis_accession | canned_analysis_description | canned_analysis_title | canned_analysis_url | datasets | date | metadata | tools |
---|---|---|---|---|---|---|---|---|
0 | DCA00000024 | Highly interactive web-based heatmap visualiza... | Interactive heatmap visualization of RNA-seq d... | http://amp.pharm.mssm.edu/datasets2tools/analy... | [GSE16256] | September 20, 2017 | {} | [ARCHS4] |
1 | DCA00000025 | Highly interactive web-based heatmap visualiza... | Interactive heatmap visualization of RNA-seq d... | http://amp.pharm.mssm.edu/datasets2tools/analy... | [GSE17312] | September 20, 2017 | {} | [ARCHS4] |
2 | DCA00000026 | Highly interactive web-based heatmap visualiza... | Interactive heatmap visualization of RNA-seq d... | http://amp.pharm.mssm.edu/datasets2tools/analy... | [GSE18927] | September 20, 2017 | {} | [ARCHS4] |
3 | DCA00000027 | Highly interactive web-based heatmap visualiza... | Interactive heatmap visualization of RNA-seq d... | http://amp.pharm.mssm.edu/datasets2tools/analy... | [GSE22959] | September 20, 2017 | {} | [ARCHS4] |
4 | DCA00000028 | Highly interactive web-based heatmap visualiza... | Interactive heatmap visualization of RNA-seq d... | http://amp.pharm.mssm.edu/datasets2tools/analy... | [GSE24565] | September 20, 2017 | {} | [ARCHS4] |
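Because the search options are passed through the params argument, requests appends them to the URL as a query string. A minimal sketch of the equivalent encoding, using only the standard library; the variable name full_url is illustrative and does not appear in the notebook:

```python
from urllib.parse import urlencode

# Same endpoint and search options as in the example above
url = 'http://amp.pharm.mssm.edu/datasets2tools/api/search'
data = {
    'object_type': 'canned_analysis',
    'page_size': 5
}

# Build the query string that the params argument produces
full_url = url + '?' + urlencode(data)
print(full_url)
```

Printing the full URL this way can help when checking which options a search actually sent.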
For convenience, we define a function to search the API and return a pandas DataFrame.
# Import modules
import json
import requests
import pandas as pd
def search_datasets2tools(search_options):
    # Get API URL
    url = 'http://amp.pharm.mssm.edu/datasets2tools/api/search'
    # Get response
    response = requests.post(url, params=search_options)
    try:
        # Read response
        results_dict = json.loads(response.text)
        # Convert to dataframe
        results_dataframe = pd.DataFrame(results_dict)
        # Set index to the accession column of the searched object type
        results_dataframe.set_index(search_options['object_type']+'_accession', inplace=True)
        return results_dataframe
    except Exception:
        # Invalid JSON or a missing accession column ends up here
        return 'Sorry, there has been an error.'
Search all canned analyses which contain the keyword prostate cancer.
results = search_datasets2tools({'object_type': 'canned_analysis',
                                 'q': 'prostate cancer'})
results.head()
canned_analysis_accession | canned_analysis_description | canned_analysis_title | canned_analysis_url | datasets | date | metadata | tools |
---|---|---|---|---|---|---|---|
DCA00000060 | Highly interactive web-based heatmap visualiza... | Interactive heatmap visualization of RNA-seq d... | http://amp.pharm.mssm.edu/datasets2tools/analy... | [GSE35126] | September 20, 2017 | {} | [ARCHS4] |
DCA00000123 | Highly interactive web-based heatmap visualiza... | Interactive heatmap visualization of RNA-seq d... | http://amp.pharm.mssm.edu/datasets2tools/analy... | [GSE39509] | September 20, 2017 | {} | [ARCHS4] |
DCA00000139 | Highly interactive web-based heatmap visualiza... | Interactive heatmap visualization of RNA-seq d... | http://amp.pharm.mssm.edu/datasets2tools/analy... | [GSE40050] | September 20, 2017 | {} | [ARCHS4] |
DCA00000262 | Highly interactive web-based heatmap visualiza... | Interactive heatmap visualization of RNA-seq d... | http://amp.pharm.mssm.edu/datasets2tools/analy... | [GSE43986] | September 20, 2017 | {} | [ARCHS4] |
DCA00000448 | Highly interactive web-based heatmap visualiza... | Interactive heatmap visualization of RNA-seq d... | http://amp.pharm.mssm.edu/datasets2tools/analy... | [GSE48403] | September 20, 2017 | {} | [ARCHS4] |
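Note that search_datasets2tools returns a plain string when something goes wrong, so a later call such as results.head() would fail with an AttributeError. A minimal sketch of a guard against this; the helper name ensure_results is hypothetical and not part of the notebook:

```python
def ensure_results(results):
    """Return results unchanged, or raise if the search returned an error message."""
    # search_datasets2tools signals failure by returning a string
    if isinstance(results, str):
        raise RuntimeError(results)
    return results
```

Under this assumption, a search could be wrapped as ensure_results(search_datasets2tools({...})) so that failures surface at the point of the search rather than later.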
Search all canned analyses associated with GEO dataset GSE775.
results = search_datasets2tools({'object_type': 'canned_analysis',
                                 'dataset_accession': 'GSE775'})
results.head()
canned_analysis_accession | canned_analysis_description | canned_analysis_title | canned_analysis_url | datasets | date | metadata | tools |
---|---|---|---|---|---|---|---|
DCA00000002 | An enrichment analysis was performed on the to... | Enrichment analysis of genes downregulated in ... | http://amp.pharm.mssm.edu/Enrichr/enrich?datas... | [GSE775] | September 19, 2017 | {u'do_id': u'DOID:9408', u'cell_type': u'Heart... | [Enrichr] |
DCA00000003 | An enrichment analysis was performed on the to... | Enrichment analysis of genes upregulated in ac... | http://amp.pharm.mssm.edu/Enrichr/enrich?datas... | [GSE775] | September 19, 2017 | {u'do_id': u'DOID:9408', u'cell_type': u'Heart... | [Enrichr] |
DCA00000004 | An enrichment analysis was performed on the to... | Enrichment analysis of genes downregulated in ... | http://amp.pharm.mssm.edu/Enrichr/enrich?datas... | [GSE775] | September 19, 2017 | {u'do_id': u'DOID:9408', u'cell_type': u'Heart... | [Enrichr] |
DCA00000005 | An enrichment analysis was performed on the to... | Enrichment analysis of genes upregulated in ac... | http://amp.pharm.mssm.edu/Enrichr/enrich?datas... | [GSE775] | September 19, 2017 | {u'do_id': u'DOID:9408', u'cell_type': u'Heart... | [Enrichr] |
DCA00000006 | The L1000 database was queried in order to ide... | Small molecules which mimic acute myocardial i... | http://amp.pharm.mssm.edu/L1000CDS2/#/result/5... | [GSE775] | September 19, 2017 | {u'do_id': u'DOID:9408', u'direction': u'mimic... | [L1000CDS2] |
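The metadata column holds one dictionary per canned analysis. A minimal sketch of collecting a single metadata field across results, assuming each record is a dictionary shaped like the rows above. The sample records are illustrative and contain only values visible in the outputs, and the helper name collect_metadata is hypothetical:

```python
def collect_metadata(records, key):
    """Map each accession to the value of one metadata key, skipping records without it."""
    values = {}
    for record in records:
        metadata = record.get('metadata', {})
        if key in metadata:
            values[record['canned_analysis_accession']] = metadata[key]
    return values

# Illustrative records (only the fields needed here)
sample_records = [
    {'canned_analysis_accession': 'DCA00000002', 'metadata': {'do_id': 'DOID:9408'}},
    {'canned_analysis_accession': 'DCA00000024', 'metadata': {}},
]
print(collect_metadata(sample_records, 'do_id'))
```

Records with empty metadata, such as the ARCHS4 analyses shown earlier, are simply left out of the result.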
Search all canned analyses generated by Enrichr.
results = search_datasets2tools({'object_type': 'canned_analysis',
                                 'tool_name': 'Enrichr'})
results.head()
canned_analysis_accession | canned_analysis_description | canned_analysis_title | canned_analysis_url | datasets | date | metadata | tools |
---|---|---|---|---|---|---|---|
DCA00000002 | An enrichment analysis was performed on the to... | Enrichment analysis of genes downregulated in ... | http://amp.pharm.mssm.edu/Enrichr/enrich?datas... | [GSE775] | September 19, 2017 | {u'do_id': u'DOID:9408', u'cell_type': u'Heart... | [Enrichr] |
DCA00000003 | An enrichment analysis was performed on the to... | Enrichment analysis of genes upregulated in ac... | http://amp.pharm.mssm.edu/Enrichr/enrich?datas... | [GSE775] | September 19, 2017 | {u'do_id': u'DOID:9408', u'cell_type': u'Heart... | [Enrichr] |
DCA00000004 | An enrichment analysis was performed on the to... | Enrichment analysis of genes downregulated in ... | http://amp.pharm.mssm.edu/Enrichr/enrich?datas... | [GSE775] | September 19, 2017 | {u'do_id': u'DOID:9408', u'cell_type': u'Heart... | [Enrichr] |
DCA00000005 | An enrichment analysis was performed on the to... | Enrichment analysis of genes upregulated in ac... | http://amp.pharm.mssm.edu/Enrichr/enrich?datas... | [GSE775] | September 19, 2017 | {u'do_id': u'DOID:9408', u'cell_type': u'Heart... | [Enrichr] |
DCA00059407 | An enrichment analysis was performed on the to... | Enrichment analysis of genes downregulated in ... | http://amp.pharm.mssm.edu/Enrichr/enrich?datas... | [GSE775] | September 20, 2017 | {u'do_id': u'DOID:9408', u'cell_type': u'Heart... | [Enrichr] |
Search all canned analyses annotated with the disease name colon cancer.
results = search_datasets2tools({'object_type': 'canned_analysis',
                                 'disease_name': 'colon cancer'})
results.head()
canned_analysis_accession | canned_analysis_description | canned_analysis_title | canned_analysis_url | datasets | date | metadata | tools |
---|---|---|---|---|---|---|---|
DCA00032919 | The analysis explores the gene interaction net... | Interaction network and enrichment analysis of... | http://genemania.org/#/search/mouse/Lgals6|Guc... | [GSE2178] | September 20, 2017 | {u'do_id': u'DOID:219', u'cell_type': u'Intest... | [GeneMANIA] |
DCA00032920 | The analysis explores the gene interaction net... | Interaction network and enrichment analysis of... | http://genemania.org/#/search/mouse/Slpi|Gcnt2... | [GSE2178] | September 20, 2017 | {u'do_id': u'DOID:219', u'cell_type': u'Intest... | [GeneMANIA] |
DCA00033223 | The analysis explores the gene interaction net... | Interaction network and enrichment analysis of... | http://genemania.org/#/search/human/RPS4Y1|NDR... | [GSE4107] | September 20, 2017 | {u'do_id': u'DOID:219', u'cell_type': u'Intest... | [GeneMANIA] |
DCA00033224 | The analysis explores the gene interaction net... | Interaction network and enrichment analysis of... | http://genemania.org/#/search/human/FOS|SH3KBP... | [GSE4107] | September 20, 2017 | {u'do_id': u'DOID:219', u'cell_type': u'Intest... | [GeneMANIA] |
DCA00033763 | The analysis explores the gene interaction net... | Interaction network and enrichment analysis of... | http://genemania.org/#/search/human/RPS26|RPL1... | [GSE34299] | September 20, 2017 | {u'do_id': u'DOID:219', u'cell_type': u'HT29 C... | [GeneMANIA] |
Search all analyses generated by Enrichr on dataset GSE31106, where the geneset is upregulated.
results = search_datasets2tools({'object_type': 'canned_analysis',
                                 'tool_name': 'Enrichr',
                                 'dataset_accession': 'GSE31106',
                                 'geneset': 'upregulated'})
results.head()
canned_analysis_accession | canned_analysis_description | canned_analysis_title | canned_analysis_url | datasets | date | metadata | tools |
---|---|---|---|---|---|---|---|
DCA00059528 | An enrichment analysis was performed on the to... | Enrichment analysis of genes upregulated in co... | http://amp.pharm.mssm.edu/Enrichr/enrich?datas... | [GSE31106] | September 20, 2017 | {u'do_id': u'DOID:0050861', u'cell_type': u'Co... | [Enrichr] |
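Several filters can be combined in one search_options dictionary, as in the search above. A minimal sketch of a helper that assembles such a dictionary and rejects object types not used in this notebook. The names OBJECT_TYPES and make_search_options are hypothetical, and the check is purely client-side; it says nothing about which parameters the API itself accepts:

```python
# Object types used in this notebook
OBJECT_TYPES = ('canned_analysis', 'dataset', 'tool')

def make_search_options(object_type, **filters):
    """Build a search_options dictionary, rejecting unknown object types."""
    if object_type not in OBJECT_TYPES:
        raise ValueError('Unknown object_type: {}'.format(object_type))
    search_options = {'object_type': object_type}
    # Keep only the filters that were actually given a value
    search_options.update({k: v for k, v in filters.items() if v is not None})
    return search_options

search_options = make_search_options('canned_analysis',
                                     tool_name='Enrichr',
                                     dataset_accession='GSE31106',
                                     geneset='upregulated')
print(search_options)
```

The resulting dictionary can be passed directly to search_datasets2tools.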
Search the dataset with accession GSE775.
results = search_datasets2tools({'object_type': 'dataset',
                                 'dataset_accession': 'GSE775'})
results.head()
dataset_accession | analyses | dataset_description | dataset_landing_url | dataset_title | repository_name |
---|---|---|---|---|---|
GSE775 | 42 | Temporal analysis of acute myocardial infarcti... | https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi... | Myocardial infarction time course | Gene Expression Omnibus |
Search all datasets which contain the keyword asthma.
results = search_datasets2tools({'object_type': 'dataset',
                                 'q': 'asthma'})
results.head()
dataset_accession | analyses | dataset_description | dataset_landing_url | dataset_title | repository_name |
---|---|---|---|---|---|
GSE43696 | 49 | Analysis of bronchial epithelial cells from pa... | https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi... | Severe asthma: bronchial epithelial cell | Gene Expression Omnibus |
GSE31773 | 33 | Analysis of circulating CD4+ and CD8+ T-cells ... | https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi... | Severe asthma: circulating CD4+ and CD8+ T-cells | Gene Expression Omnibus |
GSE27011 | 28 | Analysis of white blood cells from children wi... | https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi... | Asthma: white blood cells | Gene Expression Omnibus |
GSE6858 | 7 | Comparison of whole lungs of wild-type and rec... | https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi... | Asthma model: lungs | Gene Expression Omnibus |
GSE18965 | 7 | Analysis of airway epithelial cells (AEC) from... | https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi... | Asthmatic atopic epithelium | Gene Expression Omnibus |
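The analyses column appears to report how many canned analyses are associated with each dataset, which makes it easy to keep only well-covered datasets. A minimal sketch in plain Python, using the accessions and counts from the table above; the helper name datasets_with_min_analyses and the threshold of 10 are illustrative choices:

```python
# Accessions and analysis counts copied from the table above
datasets = [
    {'dataset_accession': 'GSE43696', 'analyses': 49},
    {'dataset_accession': 'GSE31773', 'analyses': 33},
    {'dataset_accession': 'GSE27011', 'analyses': 28},
    {'dataset_accession': 'GSE6858', 'analyses': 7},
    {'dataset_accession': 'GSE18965', 'analyses': 7},
]

def datasets_with_min_analyses(records, minimum):
    """Return accessions of datasets with at least `minimum` analyses."""
    return [r['dataset_accession'] for r in records if r['analyses'] >= minimum]

print(datasets_with_min_analyses(datasets, 10))
```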
Search all datasets which have been analyzed by Enrichr.
results = search_datasets2tools({'object_type': 'dataset',
                                 'tool_name': 'Enrichr'})
results.head()
dataset_accession | analyses | dataset_description | dataset_landing_url | dataset_title | repository_name |
---|---|---|---|---|---|
GSE50588 | 294 | One goal of human genetics is to understand ho... | https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi... | The Functional Consequences of Variation in Tr... | Gene Expression Omnibus |
GSE47856 | 119 | Chemo-resistance to platinum such as cisplatin... | https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi... | Expression data from cultured human ovarian ca... | Gene Expression Omnibus |
GSE6930 | 119 | Analysis of Ewings sarcoma A673 cells for up t... | https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi... | Cytosine arabinoside effect on Ewing's sarcoma... | Gene Expression Omnibus |
GSE7002 | 119 | Analysis of nasal epithelia exposed to various... | https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi... | Formaldehyde effect on nasal epithelium: dose ... | Gene Expression Omnibus |
GSE35366 | 112 | Analysis of brains from 4 models of human neur... | https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi... | Models of Neuronal Migration Defects: time course | Gene Expression Omnibus |
Search all datasets which have been used to generate canned analysis DCA00000002.
results = search_datasets2tools({'object_type': 'dataset',
                                 'canned_analysis_accession': 'DCA00000002'})
results.head()
dataset_accession | analyses | dataset_description | dataset_landing_url | dataset_title | repository_name |
---|---|---|---|---|---|
GSE775 | 42 | Temporal analysis of acute myocardial infarcti... | https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi... | Myocardial infarction time course | Gene Expression Omnibus |
Search the tool named ARCHS4.
results = search_datasets2tools({'object_type': 'tool',
                                 'tool_name': 'ARCHS4'})
results.head()
tool_accession | analyses | articles | tool_description | tool_name |
---|---|---|---|---|
DCT00010052 | 4645 | [https://doi.org/10.1101/189092] | ARCHS4 provides access to gene counts from HiS... | ARCHS4 |
Search all tools which contain the keyword enrichment.
results = search_datasets2tools({'object_type': 'tool',
                                 'q': 'enrichment'})
results.head()
tool_accession | analyses | articles | tool_description | tool_name |
---|---|---|---|---|
DCT00004702 | 7759 | [https://doi.org/10.1093/nar/gkw377] | A comprehensive gene set enrichment analysis w... | Enrichr |
DCT00010044 | 3879 | [] | Enrichment analysis tool implementing the prin... | PAEA |
DCT00000149 | 0 | [https://doi.org/10.1093/bioinformatics/btq503] | An R/C++ package to identify patterns and biol... | CoGAPS |
DCT00004852 | 0 | [https://doi.org/10.1093/nar/gkx295] | A web-based tool for comprehensive statistical... | MicrobiomeAnalyst |
DCT00002174 | 0 | [https://doi.org/10.1093/bioinformatics/btw511] | Translating PubMed and PMC texts to networks f... | HiPub |
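The articles column lists the publications linked to each tool. A minimal sketch of pulling the bare DOI out of each entry, assuming every entry is a URL string of the form shown above; the helper name extract_dois is hypothetical:

```python
from urllib.parse import urlparse

def extract_dois(article_urls):
    """Return the DOI for every https://doi.org/ URL in the list."""
    dois = []
    for article_url in article_urls:
        parsed = urlparse(article_url)
        # Only doi.org links carry the DOI as their path
        if parsed.netloc == 'doi.org':
            dois.append(parsed.path.lstrip('/'))
    return dois

print(extract_dois(['https://doi.org/10.1093/nar/gkw377']))
print(extract_dois([]))
```

Tools without linked articles, such as PAEA in the table above, yield an empty list.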
Search all tools which have analyzed GEO dataset GSE775.
results = search_datasets2tools({'object_type': 'tool',
                                 'dataset_accession': 'GSE775'})
results.head()
tool_accession | analyses | articles | tool_description | tool_name |
---|---|---|---|---|
DCT00004702 | 7759 | [https://doi.org/10.1093/nar/gkw377] | A comprehensive gene set enrichment analysis w... | Enrichr |
DCT00010043 | 7756 | [] | An ultra-fast LINCS L1000 Characteristic Direc... | L1000CDS2 |
DCT00003348 | 7435 | [https://doi.org/10.1093/nar/gkq537] | Biological network integration for gene priori... | GeneMANIA |
DCT00010044 | 3879 | [] | Enrichment analysis tool implementing the prin... | PAEA |
Search all tools which have been used to generate canned analysis DCA00000002.
results = search_datasets2tools({'object_type': 'tool',
                                 'canned_analysis_accession': 'DCA00000002'})
results.head()
tool_accession | analyses | articles | tool_description | tool_name |
---|---|---|---|---|
DCT00004702 | 7759 | [https://doi.org/10.1093/nar/gkw377] | A comprehensive gene set enrichment analysis w... | Enrichr |