Cross resources example¶

This notebook searches for a term in IDR and in the Ontology Cross Reference service (OXO), https://www.ebi.ac.uk/spot/oxo/index. We use the APIs of both resources to find data and terms.

Libraries used¶

In [70]:

import requests
import csv
import os
import pandas as pd
from tempfile import NamedTemporaryFile

Set up to interact with IDR¶

In [71]:

INDEX_PAGE = "https://idr.openmicroscopy.org/webclient/?experimenter=-1"

# Create an HTTP session. The cells below reuse it, so it is not wrapped
# in a "with" block, which would close it at the end of this cell.
session = requests.Session()
response = session.get(INDEX_PAGE)
# Raise an exception if the index page could not be loaded
response.raise_for_status()

Set up base URLs so that we can use shorter variable names later on¶

In [72]:

URL = "https://idr.openmicroscopy.org/mapr/api/{key}/?value={value}&case_sensitive=false&orphaned=true"
SCREENS_PROJECTS_URL = "https://idr.openmicroscopy.org/mapr/api/{key}/?value={value}"
PLATES_URL = "https://idr.openmicroscopy.org/mapr/api/{key}/plates/?value={value}&id={screen_id}"
DATASETS_URL = "https://idr.openmicroscopy.org/mapr/api/{key}/datasets/?value={value}&id={project_id}"
IMAGES_URL = "https://idr.openmicroscopy.org/mapr/api/{key}/images/?value={value}&node={parent_type}&id={parent_id}"
ATTRIBUTES_URL = "https://idr.openmicroscopy.org/webclient/api/annotations/?type=map&image={image_id}"

URL_OXO = "https://www.ebi.ac.uk/spot/oxo/api/search"
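
The templates above are filled in with str.format in the cells that follow. As a minimal sketch of that step, the hypothetical helper build_url below (not part of the notebook) fills one template and percent-encodes each value first, on the assumption that a search value may contain spaces or other reserved characters. It also shows that str.format silently ignores keyword arguments the template does not use, which is why a later cell can pass compound_id to SCREENS_PROJECTS_URL without an error.

```python
from urllib.parse import quote

# Template copied from the cell above.
URL = "https://idr.openmicroscopy.org/mapr/api/{key}/?value={value}&case_sensitive=false&orphaned=true"


def build_url(template, **params):
    """Fill a URL template, percent-encoding each value (hypothetical helper)."""
    encoded = {name: quote(str(value), safe="") for name, value in params.items()}
    # Keyword arguments that the template does not mention are ignored.
    return template.format(**encoded)


# compound_id is not a placeholder in URL, so it has no effect here.
example_url = build_url(URL, key="gene", value="HMGA2", compound_id=42)
print(example_url)
```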

Context¶

In [73]:

TERM = "HGNC_5009"
TYPE = "gene"

KEYS = {"phenotype":
    ("Phenotype",
     "Phenotype Term Name",
     "Phenotype Term Accession",
     "Phenotype Term Accession URL")
}

Search for studies in IDR¶

Using the specified TERM, search for studies annotated with that term. No results are expected.

In [74]:

query = {'key': TYPE, 'value': TERM}
url = URL.format(**query)
json = session.get(url).json()
print(json)

{'maps': [], 'screens': [], 'projects': []}
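
The empty lists above are what "no result" looks like in practice. A small check like the following could make that explicit in code. The helper name has_results is hypothetical, and the non-empty example uses a made-up entry purely for illustration.

```python
def has_results(result, keys=("maps", "screens", "projects")):
    """Return True if any of the listed keys holds a non-empty list."""
    return any(result.get(key) for key in keys)


# Response printed by the cell above.
empty = {'maps': [], 'screens': [], 'projects': []}
# Made-up response with a single hit, for illustration only.
non_empty = {'maps': [{'id': 'HMGA2'}], 'screens': [], 'projects': []}

print(has_results(empty))      # False
print(has_results(non_empty))  # True
```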

Search for mapping in OXO¶

We use the Ontology Cross Reference service, https://www.ebi.ac.uk/spot/oxo/index, to find associated terms. We search using a mapping distance of 2.

In [75]:

# Prepare the query
headers = {
    'Accept': 'application/json',
    'Content-Type': 'application/json'
}

data = {
    "ids" : [ TERM.replace("_", ":") ], "distance" : 2
}
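
The query above turns the underscore form of the identifier (HGNC_5009) into the colon-separated form (HGNC:5009) that is sent to OXO. The sketch below wraps that conversion in two hypothetical helpers, to_curie and build_oxo_payload. It assumes that only the first underscore separates the prefix from the local identifier, so it uses partition rather than replace, which would rewrite every underscore.

```python
def to_curie(term):
    """Convert 'PREFIX_ID' to 'PREFIX:ID', leaving other strings unchanged."""
    prefix, separator, local_id = term.partition("_")
    return f"{prefix}:{local_id}" if separator else term


def build_oxo_payload(terms, distance=2):
    """Build the JSON body for the OXO search endpoint used in this notebook."""
    return {"ids": [to_curie(term) for term in terms], "distance": distance}


payload = build_oxo_payload(["HGNC_5009"])
print(payload)  # {'ids': ['HGNC:5009'], 'distance': 2}
```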

In [76]:

response = requests.post(URL_OXO, headers=headers, json=data)

In [77]:

resp_json = response.json()
df_oxo = pd.DataFrame.from_records(resp_json["_embedded"]["searchResults"][0]["mappingResponseList"])
df_oxo

Out[77]:

	curie	label	sourcePrefixes	targetPrefix	distance
0	OMIM:600698		[Orphanet]	OMIM	2
1	Genatlas:HMGA2		[Orphanet]	Genatlas	2
2	Reactome:P52926		[Orphanet]	Reactome	2
3	Orphanet:248487	high mobility group AT-hook 2	[Orphanet]	Orphanet	1
4	Ensembl:ENSG00000149948		[Orphanet]	Ensembl	2

Parse the mapping from OXO¶

Parse the results of the mapping query. The terms found will then be used to find studies in IDR.

In [78]:

queries = resp_json["_embedded"]["searchResults"]
terms = []
for query in queries:
    for mapping in query["mappingResponseList"]:
        # Keep only the local identifier, e.g. "OMIM:600698" -> "600698"
        terms.append(mapping["curie"].split(":")[1])
print(terms)

['600698', 'HMGA2', 'P52926', '248487', 'ENSG00000149948']
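
The flat list above no longer records which database each term came from. If that provenance is useful, the terms could be grouped by prefix instead. This is a minimal sketch: terms_by_prefix is a hypothetical helper, and sample_response reproduces only the curie values from the table above in the same nested structure as the OXO response.

```python
def terms_by_prefix(response_json):
    """Group local identifiers by the prefix of their CURIE."""
    grouped = {}
    for result in response_json["_embedded"]["searchResults"]:
        for mapping in result["mappingResponseList"]:
            prefix, _, local_id = mapping["curie"].partition(":")
            grouped.setdefault(prefix, []).append(local_id)
    return grouped


# Reduced copy of the response shown earlier (curie values only).
sample_response = {"_embedded": {"searchResults": [{"mappingResponseList": [
    {"curie": "OMIM:600698"},
    {"curie": "Genatlas:HMGA2"},
    {"curie": "Reactome:P52926"},
    {"curie": "Orphanet:248487"},
    {"curie": "Ensembl:ENSG00000149948"},
]}]}}

grouped_terms = terms_by_prefix(sample_response)
print(grouped_terms)
```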

Search for the terms in IDR¶

We search for the terms returned by OXO. This time several studies are found, and we keep only the studies that are returned exactly once. We then query IDR for phenotypes associated with the genes.

In [79]:

ids = {}
genes = {}
for v in terms:
    qs1 = {'key': TYPE, 'value': v}
    url1 = URL.format(**qs1)
    json = session.get(url1).json()
    for m in json['maps']:
        qs2 = {'key': TYPE, 'value': v, 'compound_id': m['id']}
        url2 = SCREENS_PROJECTS_URL.format(**qs2)
        json = session.get(url2).json()
        for s in json['screens']:
            id = s['id']
            ids[id] = ids.get(id, 0) + 1
        for p in json['projects']:
            id = p['id']
            ids[id] = ids.get(id, 0) + 1

to_keep = []
for id in ids:
    if ids[id] == 1:
        to_keep.append(id)

print(to_keep)

[1202, 2751]
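
The "returned exactly once" filter above can also be written with collections.Counter. The sketch below uses a hypothetical helper, seen_once, on a hand-made list of identifiers: 1202 and 2751 come from the output above, while 3 is a made-up identifier that appears twice.

```python
from collections import Counter


def seen_once(identifiers):
    """Return the identifiers that occur exactly once, in first-seen order."""
    counts = Counter(identifiers)
    return [identifier for identifier, count in counts.items() if count == 1]


# 3 is a made-up id that occurs twice and is therefore dropped.
print(seen_once([1202, 3, 2751, 3]))  # [1202, 2751]
```

One possible refinement, assuming screen and project identifiers are numbered independently, is to count (type, id) tuples such as ("screen", 1202), so that a screen and a project sharing the same number are not counted as one study.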

Helper method¶

Parse the JSON output and save it to the CSV file.

In [80]:

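def parse_annotation(writer, json_data, name, data_type):
    # Relies on the module-level names session, TYPE, KEYS and gene
    # (gene is set by the calling cell before each call).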
    screen_name = "-"
    plate_name = "-"
    project_name = "-"
    dataset_name = "-"
    if data_type == 'datasets':
        project_name = name
    else:
        screen_name = name
     
    for p in json_data[data_type]:
        parent_id = p['id']
        if data_type == 'datasets':
            dataset_name = p['name']
        else:
            plate_name = p['name']
        qs3 = {'key': TYPE, 'value': gene,
                'parent_type': data_type[:-1], 'parent_id': parent_id}
        url3 = IMAGES_URL.format(**qs3)
        for i in session.get(url3).json()['images']:

            image_id = i['id']
            url4 = ATTRIBUTES_URL.format(**{'image_id': image_id})
            for a in session.get(url4).json()['annotations']:
                ontologies = []  # for ontology terms for a phenotype
                row = {}
                for v in a['values']:
                    if str(v[0]) in KEYS['phenotype']:
                        if str(v[0]) in ['Phenotype']:  # has phenotype
                            row[str(v[0])] = v[1]  # so create row
   
                        # if there are ontology mappings for the
                        # phenotype, add them to the ontologies list
                        ontList = ['Phenotype Term Name',
                                   'Phenotype Term Accession',
                                   'Phenotype Term Accession URL']

                        if str(v[0]) in ontList:
                            ontologies.extend([str(v[0]), str(v[1])])
                    if row:
                        if (len(ontologies) > 0):  # 1+ ontology mapping
                            row.update({'Gene': gene,
                                        'Screen': screen_name,
                                        'Plate': plate_name,
                                        'Image': image_id,
                                        'Project' : project_name,
                                        'Dataset': dataset_name})
                            # we have the start of a row now
                            # but we want to print out as many rows
                            # as there are ontology mappings
                            # so if there is mapping to 1 ontology term
                            # print 1 row, if there are 2 ontology terms
                            # print 2 rows etc
                            numberOfRows = len(ontologies)/6
                            # this is 3 pairs of ontology values per
                            # mapping, add the ontology mappings and print
                            n = 1
                            while (n <= numberOfRows):
                                row.update({ontologies[0]: ontologies[1],
                                            ontologies[2]: ontologies[3],
                                            ontologies[4]: ontologies[5]})
                                # remove that set of ontology mappings
                                ontologies = ontologies[6:]
                                writer.writerow(row)
                                n = n + 1
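
The core of the helper is the step that turns a flat list of ontology key/value pairs into one CSV row per ontology term, six list items (three pairs) at a time. The sketch below isolates that step in a hypothetical function, expand_rows. Unlike the helper above, it returns new dictionaries instead of updating and writing a single row in place, so it is an illustration of the idea rather than a drop-in replacement.

```python
def expand_rows(base_row, ontologies, pairs_per_term=3):
    """Return one row per complete group of key/value pairs in ontologies."""
    step = 2 * pairs_per_term  # each pair takes two slots in the flat list
    rows = []
    for start in range(0, len(ontologies) - step + 1, step):
        chunk = ontologies[start:start + step]
        row = dict(base_row)
        # Even positions are column names, odd positions are their values.
        row.update(zip(chunk[0::2], chunk[1::2]))
        rows.append(row)
    return rows


base = {"Gene": "HMGA2",
        "Phenotype": "Increased or decreased total RNA production rate"}
ontologies = ["Phenotype Term Name", "transcription, DNA-templated",
              "Phenotype Term Accession", "GO_0006351",
              "Phenotype Term Accession URL",
              "http://purl.obolibrary.org/obo/GO_0006351"]

rows = expand_rows(base, ontologies)
print(len(rows))  # 1
```

An incomplete trailing group (fewer than six items) produces no row, which matches the behaviour of the while loop above.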

In [81]:

home = os.path.expanduser("~")
csvfile = NamedTemporaryFile("w", delete=False, newline='', dir=home, suffix=".csv")

try:
    fieldnames = [
        'Gene', 'Screen', 'Plate', 'Project', 'Dataset', 'Image',
        'Phenotype', 'Phenotype Term Name', 'Phenotype Term Accession',
        'Phenotype Term Accession URL']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    
    for v in terms:
        qs1 = {'key': TYPE, 'value': v}
        url1 = URL.format(**qs1)
        json = session.get(url1).json()
        for m in json['maps']:
            qs2 = {'key': TYPE, 'value': v, 'compound_id': m['id']}
            url2 = SCREENS_PROJECTS_URL.format(**qs2)
            json = session.get(url2).json()
            for s in json['screens']:
                id = s['id']
                if id in to_keep:
                    gene = s['extra']['value']
                    qs3 = {'key': TYPE, 'value': gene, 'screen_id': s['id']}
                    url3 = PLATES_URL.format(**qs3)
                    parse_annotation(writer, session.get(url3).json(), s['name'], 'plates')
finally:
    csvfile.close()
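
The cell above writes to a named temporary file created with delete=False, so the file survives being closed and can be read back by name in the next step. A minimal, offline sketch of the same write-then-read pattern is shown below. The two column names are a reduced subset chosen for illustration, and the file is removed at the end, which the notebook itself does not do.

```python
import csv
import os
from tempfile import NamedTemporaryFile

fieldnames = ["Gene", "Image"]

# delete=False keeps the file on disk after the "with" block closes it.
with NamedTemporaryFile("w", delete=False, newline="", suffix=".csv") as handle:
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow({"Gene": "HMGA2", "Image": 12608479})
    path = handle.name

# Reopen the closed file by name and read the rows back.
with open(path, newline="") as handle:
    rows = list(csv.DictReader(handle))

os.remove(path)  # clean up the temporary file
print(rows)  # [{'Gene': 'HMGA2', 'Image': '12608479'}]
```

Values read back through csv.DictReader are strings, so the image identifier comes back as '12608479' rather than as an integer.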

Explore the data¶

Read the generated CSV file into a dataframe.

In [82]:

df = pd.read_csv(csvfile.name)
df = df.sort_values(by=['Gene'])
df

Out[82]:

	Gene	Screen	Plate	Project	Dataset	Image	Phenotype	Phenotype Term Name	Phenotype Term Accession	Phenotype Term Accession URL
0	HMGA2	idr0093-mueller-perturbation/screenA (9)	19	-	-	12608479	Increased or decreased total RNA production rate	transcription, DNA-templated	GO_0006351	http://purl.obolibrary.org/obo/GO_0006351
1	HMGA2	idr0093-mueller-perturbation/screenA (9)	19	-	-	12608485	Increased or decreased total RNA production rate	transcription, DNA-templated	GO_0006351	http://purl.obolibrary.org/obo/GO_0006351
2	HMGA2	idr0093-mueller-perturbation/screenA (9)	19	-	-	12608484	Increased or decreased total RNA production rate	transcription, DNA-templated	GO_0006351	http://purl.obolibrary.org/obo/GO_0006351
3	HMGA2	idr0093-mueller-perturbation/screenA (9)	19	-	-	12608481	Increased or decreased total RNA production rate	transcription, DNA-templated	GO_0006351	http://purl.obolibrary.org/obo/GO_0006351
4	HMGA2	idr0093-mueller-perturbation/screenA (9)	19	-	-	12608483	Increased or decreased total RNA production rate	transcription, DNA-templated	GO_0006351	http://purl.obolibrary.org/obo/GO_0006351
5	HMGA2	idr0093-mueller-perturbation/screenA (9)	19	-	-	12608482	Increased or decreased total RNA production rate	transcription, DNA-templated	GO_0006351	http://purl.obolibrary.org/obo/GO_0006351
6	HMGA2	idr0093-mueller-perturbation/screenA (9)	19	-	-	12608486	Increased or decreased total RNA production rate	transcription, DNA-templated	GO_0006351	http://purl.obolibrary.org/obo/GO_0006351
7	HMGA2	idr0093-mueller-perturbation/screenA (9)	19	-	-	12608480	Increased or decreased total RNA production rate	transcription, DNA-templated	GO_0006351	http://purl.obolibrary.org/obo/GO_0006351
8	HMGA2	idr0093-mueller-perturbation/screenA (9)	19	-	-	12608478	Increased or decreased total RNA production rate	transcription, DNA-templated	GO_0006351	http://purl.obolibrary.org/obo/GO_0006351

Search in OXO using the Phenotype Term Accession¶

In [83]:

term = df["Phenotype Term Accession"][0]
term = term.replace("_", ":")
data = {
    "ids" : [ term ], "distance" : 2
}

In [84]:

response = requests.post(URL_OXO, headers=headers, json=data)

In [85]:

resp_json = response.json()
df_oxo = pd.DataFrame.from_records(resp_json["_embedded"]["searchResults"][0]["mappingResponseList"])
df_oxo

Out[85]:

	curie	label	sourcePrefixes	targetPrefix	distance
0	OMP:0007106	transcription phenotype	[OMP]	OMP	1
1	Wikipedia:Transcription_(genetics)		[PLANP, OGSF, CMPO, OMP, BAO, GO, FYPO, WBPhen...	Wikipedia	1
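
Some of the mappings returned by OXO have an empty label. If only labelled mappings are of interest, they could be filtered and ordered by distance as sketched below. The helper labelled_mappings is hypothetical, and the sample records copy only the curie, label and distance columns from the table above.

```python
def labelled_mappings(mappings):
    """Keep mappings with a non-empty label, closest first."""
    labelled = [mapping for mapping in mappings if mapping.get("label")]
    return sorted(labelled, key=lambda mapping: mapping["distance"])


# Reduced copy of the two mappings shown above.
sample_mappings = [
    {"curie": "OMP:0007106", "label": "transcription phenotype", "distance": 1},
    {"curie": "Wikipedia:Transcription_(genetics)", "label": "", "distance": 1},
]

selected = labelled_mappings(sample_mappings)
print([mapping["curie"] for mapping in selected])  # ['OMP:0007106']
```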

License (BSD 2-Clause)¶

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.