This notebook searches for a term in IDR and in the Ontology Cross Reference service (OxO) https://www.ebi.ac.uk/spot/oxo/index. We use the APIs of both resources to find data and terms.
import requests
import csv
import os
import pandas as pd
from tempfile import NamedTemporaryFile
INDEX_PAGE = "https://idr.openmicroscopy.org/webclient/?experimenter=-1"
# create http session
with requests.Session() as session:
    request = requests.Request('GET', INDEX_PAGE)
    prepped = session.prepare_request(request)
    response = session.send(prepped)
    if response.status_code != 200:
        response.raise_for_status()
URL = "https://idr.openmicroscopy.org/mapr/api/{key}/?value={value}&case_sensitive=false&orphaned=true"
SCREENS_PROJECTS_URL = "https://idr.openmicroscopy.org/mapr/api/{key}/?value={value}"
PLATES_URL = "https://idr.openmicroscopy.org/mapr/api/{key}/plates/?value={value}&id={screen_id}"
DATASETS_URL = "https://idr.openmicroscopy.org/mapr/api/{key}/datasets/?value={value}&id={project_id}"
IMAGES_URL = "https://idr.openmicroscopy.org/mapr/api/{key}/images/?value={value}&node={parent_type}&id={parent_id}"
ATTRIBUTES_URL = "https://idr.openmicroscopy.org/webclient/api/annotations/?type=map&image={image_id}"
URL_OXO = "https://www.ebi.ac.uk/spot/oxo/api/search"
TERM = "HGNC_5009"
TYPE = "gene"
KEYS = {"phenotype":
    ("Phenotype",
     "Phenotype Term Name",
     "Phenotype Term Accession",
     "Phenotype Term Accession URL")
}
Using the specified TERM, search IDR for studies annotated with that term. No results should be found.
query = {'key': TYPE, 'value': TERM}
url = URL.format(**query)
json = session.get(url).json()
print(json)
{'maps': [], 'screens': [], 'projects': []}
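The empty response above can also be detected in code before falling back to OxO. This is a minimal sketch, assuming the response always carries the three keys shown in the printed output; `has_results` is a hypothetical helper name and not part of the IDR API.

```python
def has_results(payload):
    """Return True if any of the 'maps', 'screens' or 'projects' lists is non-empty."""
    return any(payload.get(key) for key in ("maps", "screens", "projects"))


# The response printed above, copied verbatim
empty = {'maps': [], 'screens': [], 'projects': []}
print(has_results(empty))  # False
```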
We use the Ontology Cross Reference service https://www.ebi.ac.uk/spot/oxo/index to find associated terms. We search using a mapping distance of 2.
# Prepare the query
headers = {
    'Accept': 'application/json',
    'Content-Type': 'application/json'
}
data = {
    "ids": [TERM.replace("_", ":")], "distance": 2
}
response = requests.post(URL_OXO, headers=headers, json=data)
resp_json = response.json()
df_oxo = pd.DataFrame.from_records(resp_json["_embedded"]["searchResults"][0]["mappingResponseList"])
df_oxo
| | curie | label | sourcePrefixes | targetPrefix | distance |
|---|---|---|---|---|---|
| 0 | OMIM:600698 | | [Orphanet] | OMIM | 2 |
| 1 | Genatlas:HMGA2 | | [Orphanet] | Genatlas | 2 |
| 2 | Reactome:P52926 | | [Orphanet] | Reactome | 2 |
| 3 | Orphanet:248487 | high mobility group AT-hook 2 | [Orphanet] | Orphanet | 1 |
| 4 | Ensembl:ENSG00000149948 | | [Orphanet] | Ensembl | 2 |
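The `distance` column tells how many mapping hops separate each term from the query. As a rough sketch of how the mappings could be narrowed to direct ones only, the snippet below filters records by distance. The records are copied from the table above (only the columns needed here); `within_distance` is a hypothetical helper, not an OxO function.

```python
# Records copied from the OxO result table above
mappings = [
    {"curie": "OMIM:600698", "targetPrefix": "OMIM", "distance": 2},
    {"curie": "Genatlas:HMGA2", "targetPrefix": "Genatlas", "distance": 2},
    {"curie": "Reactome:P52926", "targetPrefix": "Reactome", "distance": 2},
    {"curie": "Orphanet:248487", "targetPrefix": "Orphanet", "distance": 1},
    {"curie": "Ensembl:ENSG00000149948", "targetPrefix": "Ensembl", "distance": 2},
]


def within_distance(records, max_distance):
    """Return the CURIEs whose mapping distance is at most max_distance."""
    return [r["curie"] for r in records if r["distance"] <= max_distance]


print(within_distance(mappings, 1))  # ['Orphanet:248487']
```

With a maximum distance of 1 only the Orphanet entry remains, so a distance of 2 is what brings in the gene symbol `HMGA2` used in the rest of the notebook.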
Parse the result of the mapping query. The terms found will then be used to find studies in IDR.
queries = resp_json["_embedded"]["searchResults"]
terms = []
for i in range(len(queries)):
    mappings = queries[i]["mappingResponseList"]
    for j in range(len(mappings)):
        curie = mappings[j]["curie"]
        # keep the local identifier, i.e. the part after the prefix
        terms.append(curie.split(":")[1])
print(terms)
['600698', 'HMGA2', 'P52926', '248487', 'ENSG00000149948']
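Each CURIE has the form `prefix:identifier`, and the code above keeps only the identifier. A small sketch of an alternative that also retains the prefix is shown below; it uses `str.partition`, which splits on the first colon only, so an identifier that itself contains a colon would not be truncated. `split_curie` is a hypothetical helper name, and the CURIEs are the ones returned above.

```python
def split_curie(curie):
    """Split 'prefix:identifier' on the first colon and return (prefix, identifier)."""
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefix, local_id


curies = ["OMIM:600698", "Genatlas:HMGA2", "Reactome:P52926",
          "Orphanet:248487", "Ensembl:ENSG00000149948"]
by_prefix = dict(split_curie(c) for c in curies)
print(by_prefix["Genatlas"])  # HMGA2
```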
We search IDR for the terms returned by OxO. This time several studies are found, and we keep only the studies that are returned exactly once. We then query IDR for phenotypes associated with the genes.
ids = {}
genes = {}
for v in terms:
    qs1 = {'key': TYPE, 'value': v}
    url1 = URL.format(**qs1)
    json = session.get(url1).json()
    for m in json['maps']:
        qs2 = {'key': TYPE, 'value': v, 'compound_id': m['id']}
        url2 = SCREENS_PROJECTS_URL.format(**qs2)
        json = session.get(url2).json()
        # count how many times each screen or project is returned
        for s in json['screens']:
            id = s['id']
            ids[id] = ids.get(id, 0) + 1
        for p in json['projects']:
            id = p['id']
            ids[id] = ids.get(id, 0) + 1

# keep only the studies returned exactly once
to_keep = []
for id in ids:
    if ids[id] == 1:
        to_keep.append(id)
print(to_keep)
[1202, 2751]
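The "returned exactly once" filter is a counting step, so it can also be written with `collections.Counter`. The sketch below shows the idea on made-up ids; the values in `seen_ids` are purely illustrative and are not IDR results.

```python
from collections import Counter

# Hypothetical study ids, in the order they might be collected across several searches
seen_ids = [101, 102, 101, 103]

counts = Counter(seen_ids)
# keep the ids that were seen exactly once, in first-seen order
returned_once = [study_id for study_id, n in counts.items() if n == 1]
print(returned_once)  # [102, 103]
```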
Parse the JSON output and save it to the CSV file.
def parse_annotation(writer, json_data, name, data_type):
    screen_name = "-"
    plate_name = "-"
    project_name = "-"
    dataset_name = "-"
    if data_type == 'datasets':
        project_name = name
    else:
        screen_name = name
    for p in json_data[data_type]:
        parent_id = p['id']
        if data_type == 'datasets':
            dataset_name = p['name']
        else:
            plate_name = p['name']
        qs3 = {'key': TYPE, 'value': gene,
               'parent_type': data_type[:-1], 'parent_id': parent_id}
        url3 = IMAGES_URL.format(**qs3)
        for i in session.get(url3).json()['images']:
            image_id = i['id']
            url4 = ATTRIBUTES_URL.format(**{'image_id': image_id})
            for a in session.get(url4).json()['annotations']:
                ontologies = []  # for ontology terms for a phenotype
                row = {}
                for v in a['values']:
                    if str(v[0]) in KEYS['phenotype']:
                        if str(v[0]) in ['Phenotype']:  # has phenotype
                            row[str(v[0])] = v[1]  # so create row
                        # if there are ontology mappings for the
                        # phenotype, add them to the ontologies list
                        ontList = ['Phenotype Term Name',
                                   'Phenotype Term Accession',
                                   'Phenotype Term Accession URL']
                        if str(v[0]) in ontList:
                            ontologies.extend([str(v[0]), str(v[1])])
                if row:
                    if (len(ontologies) > 0):  # 1+ ontology mapping
                        row.update({'Gene': gene,
                                    'Screen': screen_name,
                                    'Plate': plate_name,
                                    'Image': image_id,
                                    'Project': project_name,
                                    'Dataset': dataset_name})
                        # we have the start of a row now
                        # but we want to print out as many rows
                        # as there are ontology mappings
                        # so if there is mapping to 1 ontology term
                        # print 1 row, if there are 2 ontology terms
                        # print 2 rows etc
                        numberOfRows = len(ontologies)/6
                        # this is 3 pairs of ontology values per
                        # mapping, add the ontology mappings and print
                        n = 1
                        while (n <= numberOfRows):
                            row.update({ontologies[0]: ontologies[1],
                                        ontologies[2]: ontologies[3],
                                        ontologies[4]: ontologies[5]})
                            # remove that set of ontology mappings
                            ontologies = ontologies[6:]
                            writer.writerow(row)
                            n = n + 1
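The last part of `parse_annotation` writes one CSV row per ontology mapping: the flat `ontologies` list holds key/value entries, six per mapping (three pairs), and each group of six is merged into the row before it is written. The sketch below isolates that step so it can be tried without querying IDR. `expand_rows` is a hypothetical helper, and the second mapping (`EXAMPLE_0000001`) is a made-up placeholder, not a real ontology term.

```python
def expand_rows(base_row, ontologies, group_size=6):
    """Return one row per complete group of key/value entries in a flat list."""
    rows = []
    # step through complete groups only; a trailing partial group is ignored
    for start in range(0, len(ontologies) - group_size + 1, group_size):
        chunk = ontologies[start:start + group_size]
        row = dict(base_row)
        # even positions are column names, odd positions are their values
        row.update(zip(chunk[0::2], chunk[1::2]))
        rows.append(row)
    return rows


base = {"Gene": "HMGA2",
        "Phenotype": "Increased or decreased total RNA production rate"}
flat = [
    "Phenotype Term Name", "transcription, DNA-templated",
    "Phenotype Term Accession", "GO_0006351",
    "Phenotype Term Accession URL", "http://purl.obolibrary.org/obo/GO_0006351",
    # made-up second mapping, for illustration only
    "Phenotype Term Name", "example term",
    "Phenotype Term Accession", "EXAMPLE_0000001",
    "Phenotype Term Accession URL", "http://example.org/EXAMPLE_0000001",
]
rows = expand_rows(base, flat)
print(len(rows))  # 2
print([r["Phenotype Term Accession"] for r in rows])
```

Unlike the loop in `parse_annotation`, which updates and writes the same dictionary repeatedly, this version returns independent copies, which is easier to inspect or test.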
home = os.path.expanduser("~")
csvfile = NamedTemporaryFile("w", delete=False, newline='', dir=home, suffix=".csv")
try:
    fieldnames = [
        'Gene', 'Screen', 'Plate', 'Project', 'Dataset', 'Image',
        'Phenotype', 'Phenotype Term Name', 'Phenotype Term Accession',
        'Phenotype Term Accession URL']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for v in terms:
        qs1 = {'key': TYPE, 'value': v}
        url1 = URL.format(**qs1)
        json = session.get(url1).json()
        for m in json['maps']:
            qs2 = {'key': TYPE, 'value': v, 'compound_id': m['id']}
            url2 = SCREENS_PROJECTS_URL.format(**qs2)
            json = session.get(url2).json()
            for s in json['screens']:
                id = s['id']
                # only process the screens kept in the previous step
                if id in to_keep:
                    gene = s['extra']['value']
                    qs3 = {'key': TYPE, 'value': gene, 'screen_id': s['id']}
                    url3 = PLATES_URL.format(**qs3)
                    parse_annotation(writer, session.get(url3).json(), s['name'], 'plates')
finally:
    csvfile.close()
Read the generated CSV file into a dataframe.
df = pd.read_csv(csvfile.name)
df = df.sort_values(by=['Gene'])
df
| | Gene | Screen | Plate | Project | Dataset | Image | Phenotype | Phenotype Term Name | Phenotype Term Accession | Phenotype Term Accession URL |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | HMGA2 | idr0093-mueller-perturbation/screenA (9) | 19 | - | - | 12608479 | Increased or decreased total RNA production rate | transcription, DNA-templated | GO_0006351 | http://purl.obolibrary.org/obo/GO_0006351 |
| 1 | HMGA2 | idr0093-mueller-perturbation/screenA (9) | 19 | - | - | 12608485 | Increased or decreased total RNA production rate | transcription, DNA-templated | GO_0006351 | http://purl.obolibrary.org/obo/GO_0006351 |
| 2 | HMGA2 | idr0093-mueller-perturbation/screenA (9) | 19 | - | - | 12608484 | Increased or decreased total RNA production rate | transcription, DNA-templated | GO_0006351 | http://purl.obolibrary.org/obo/GO_0006351 |
| 3 | HMGA2 | idr0093-mueller-perturbation/screenA (9) | 19 | - | - | 12608481 | Increased or decreased total RNA production rate | transcription, DNA-templated | GO_0006351 | http://purl.obolibrary.org/obo/GO_0006351 |
| 4 | HMGA2 | idr0093-mueller-perturbation/screenA (9) | 19 | - | - | 12608483 | Increased or decreased total RNA production rate | transcription, DNA-templated | GO_0006351 | http://purl.obolibrary.org/obo/GO_0006351 |
| 5 | HMGA2 | idr0093-mueller-perturbation/screenA (9) | 19 | - | - | 12608482 | Increased or decreased total RNA production rate | transcription, DNA-templated | GO_0006351 | http://purl.obolibrary.org/obo/GO_0006351 |
| 6 | HMGA2 | idr0093-mueller-perturbation/screenA (9) | 19 | - | - | 12608486 | Increased or decreased total RNA production rate | transcription, DNA-templated | GO_0006351 | http://purl.obolibrary.org/obo/GO_0006351 |
| 7 | HMGA2 | idr0093-mueller-perturbation/screenA (9) | 19 | - | - | 12608480 | Increased or decreased total RNA production rate | transcription, DNA-templated | GO_0006351 | http://purl.obolibrary.org/obo/GO_0006351 |
| 8 | HMGA2 | idr0093-mueller-perturbation/screenA (9) | 19 | - | - | 12608478 | Increased or decreased total RNA production rate | transcription, DNA-templated | GO_0006351 | http://purl.obolibrary.org/obo/GO_0006351 |
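The table has one row per image, so the same gene and phenotype term repeat many times. A possible way to condense it is to count distinct images per (gene, accession) pair. The sketch below does this with plain dictionaries on three rows copied from the table above; the variable names are illustrative only.

```python
# Three rows shaped like the table above (only the columns needed here)
rows = [
    {"Gene": "HMGA2", "Image": 12608479, "Phenotype Term Accession": "GO_0006351"},
    {"Gene": "HMGA2", "Image": 12608485, "Phenotype Term Accession": "GO_0006351"},
    {"Gene": "HMGA2", "Image": 12608484, "Phenotype Term Accession": "GO_0006351"},
]

# collect the distinct image ids seen for each (gene, accession) pair
images_by_term = {}
for row in rows:
    key = (row["Gene"], row["Phenotype Term Accession"])
    images_by_term.setdefault(key, set()).add(row["Image"])

summary = {key: len(images) for key, images in images_by_term.items()}
print(summary)  # {('HMGA2', 'GO_0006351'): 3}
```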
term = df["Phenotype Term Accession"][0]
term = term.replace("_", ":")
data = {
    "ids": [term], "distance": 2
}
response = requests.post(URL_OXO, headers=headers, json=data)
resp_json = response.json()
df_oxo = pd.DataFrame.from_records(resp_json["_embedded"]["searchResults"][0]["mappingResponseList"])
df_oxo
| | curie | label | sourcePrefixes | targetPrefix | distance |
|---|---|---|---|---|---|
| 0 | OMP:0007106 | transcription phenotype | [OMP] | OMP | 1 |
| 1 | Wikipedia:Transcription_(genetics) | | [PLANP, OGSF, CMPO, OMP, BAO, GO, FYPO, WBPhen... | Wikipedia | 1 |
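The query above converts the IDR accession `GO_0006351` into the CURIE form `GO:0006351` that OxO expects by replacing the underscore. A slightly more defensive sketch is given below: it replaces only the first underscore, so an identifier that contains further underscores keeps them. `accession_to_curie` is a hypothetical helper, and `PREFIX_ABC_1` is a made-up accession used only to show the difference.

```python
def accession_to_curie(accession):
    """Turn 'PREFIX_ID' into 'PREFIX:ID', replacing only the first underscore."""
    return accession.replace("_", ":", 1)


print(accession_to_curie("GO_0006351"))    # GO:0006351
print(accession_to_curie("PREFIX_ABC_1"))  # PREFIX:ABC_1
```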
Copyright (C) 2021 University of Dundee. All Rights Reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.