Since the decoded JSON is a Python dictionary, you can pull out a single data point using its key. This is the JSON returned by the lookup for BRAF (ENSG00000157764):
{
  "source": "ensembl_havana",
  "object_type": "Gene",
  "logic_name": "ensembl_havana_gene",
  "version": 12,
  "species": "homo_sapiens",
  "description": "B-Raf proto-oncogene, serine/threonine kinase [Source:HGNC Symbol;Acc:HGNC:1097]",
  "display_name": "BRAF",
  "assembly_name": "GRCh38",
  "biotype": "protein_coding",
  "end": 140924764,
  "seq_region_name": "7",
  "db_type": "core",
  "strand": -1,
  "id": "ENSG00000157764",
  "start": 140719327
}
We can add a key lookup to our previous script to print the gene symbol:
import sys
import json
import requests
from pprint import pprint

def fetch_endpoint(server, request, content_type):
    """Send a GET request and return decoded JSON or plain text."""
    r = requests.get(server + request, headers={"Accept": content_type})
    # Stop with an informative HTTPError if the server returned an error status
    if not r.ok:
        r.raise_for_status()
        sys.exit()
    if content_type == 'application/json':
        return r.json()
    else:
        return r.text

server = "http://rest.ensembl.org/"
ext = "lookup/id/ENSG00000157764?"
con = "application/json"
get_gene = fetch_endpoint(server, ext, con)

# The decoded JSON is a dictionary, so fetch the gene symbol by its key
symbol = get_gene['display_name']
print(symbol)
BRAF
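To experiment with key lookups without calling the server, a minimal sketch can parse a subset of the BRAF record shown above with json.loads from the standard library. The variable names and the length calculation are illustrative additions, not part of the original script; the length formula assumes Ensembl's 1-based, inclusive start and end coordinates.

```python
import json

# A subset of the BRAF lookup record shown above, stored as JSON text
braf_json = """
{
  "display_name": "BRAF",
  "id": "ENSG00000157764",
  "seq_region_name": "7",
  "start": 140719327,
  "end": 140924764,
  "strand": -1
}
"""

gene = json.loads(braf_json)  # decode the JSON text into a Python dict

# Square-bracket access raises KeyError if the key is absent
symbol = gene["display_name"]

# .get() returns a default instead; "no_such_key" is a hypothetical missing key
missing = gene.get("no_such_key", "not present")

# Assumes 1-based, inclusive start/end coordinates, so length = end - start + 1
length = gene["end"] - gene["start"] + 1

print(symbol, gene["id"])
print(f"chr{gene['seq_region_name']}:{gene['start']}-{gene['end']} (strand {gene['strand']})")
print("length:", length)
print("no_such_key:", missing)
```

Square brackets are the right choice when a missing key should be treated as an error; .get() suits optional keys that only some records carry.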
1. Write a script to look up the human gene ESPN and print its stable ID.
# Exercise 3.1
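Exercise 3.1 needs a different endpoint from the ID lookup used above. One possible starting point, assuming the Ensembl REST lookup/symbol/:species/:symbol endpoint, is a pair of small helpers: one builds the request path, the other reads the stable ID from the returned dictionary. Both helper names are hypothetical, and the network call itself is left to fetch_endpoint from the script above.

```python
from urllib.parse import quote

def symbol_lookup_ext(species, symbol):
    """Build the request path for a symbol lookup (assumes lookup/symbol/:species/:symbol)."""
    # quote() percent-encodes characters, such as spaces, that are not URL-safe
    return f"lookup/symbol/{quote(species)}/{quote(symbol)}?"

def stable_id(lookup_result):
    """Return the stable ID, which the lookup JSON stores under the 'id' key."""
    return lookup_result["id"]

ext = symbol_lookup_ext("homo_sapiens", "ESPN")
print(ext)
```

With the earlier script loaded, `get_gene = fetch_endpoint(server, ext, con)` followed by `print(stable_id(get_gene))` would complete the exercise, provided the endpoint behaves as assumed.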
2. Get all variants that are associated with the phenotype 'Coffee consumption'. For each variant, print:
a. the p-value for the association
b. the PMID of the publication that describes the association between that variant and 'Coffee consumption'
c. the risk allele and the associated gene
# Exercise 3.2
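For Exercise 3.2, here is a sketch under stated assumptions: it assumes the phenotype/term/:species/:term endpoint, that the decoded response is a list of records, and that each record keeps the variant name under 'Variation' and the association details in an 'attributes' dictionary with the keys 'p_value', 'external_reference' (holding the PMID), 'risk_allele' and 'associated_gene'. Check these key names against the pprint output before relying on them. The helper names are hypothetical.

```python
from urllib.parse import quote

def phenotype_term_ext(species, term):
    """Build the request path for a phenotype term query (assumed endpoint layout)."""
    # The term contains a space, so percent-encode it (the space becomes %20)
    return f"phenotype/term/{quote(species)}/{quote(term)}?"

def summarise_associations(records):
    """Pull the fields the exercise asks for out of each association record.

    Assumes each record has a 'Variation' key and an 'attributes' dict;
    .get() returns None for any key that a particular record lacks.
    """
    rows = []
    for record in records:
        attrs = record.get("attributes", {})
        rows.append({
            "variant": record.get("Variation"),
            "p_value": attrs.get("p_value"),
            "pmid": attrs.get("external_reference"),
            "risk_allele": attrs.get("risk_allele"),
            "gene": attrs.get("associated_gene"),
        })
    return rows

ext = phenotype_term_ext("homo_sapiens", "Coffee consumption")
print(ext)
```

With the earlier script, `records = fetch_endpoint(server, ext, con)` would fetch the data, and looping over `summarise_associations(records)` would print one row per variant. Using .get() means a record without, say, a risk allele prints None instead of stopping the loop with a KeyError.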
3. Get the mouse homologue of the human BRCA2 gene and print the ID and sequence of both.
Note that the JSON for the endpoint you need is several layers deep, containing nested lists (which appear as square brackets [ ] in the JSON) and key-value sets (dictionaries, which appear as curly brackets { } in the JSON). Pretty printing with pprint is very useful at the intermediate stage, while you are working out the structure of the JSON.
# Exercise 3.3
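Alongside pprint, a small recursive helper can summarise the shape of nested JSON, listing each dictionary key with its value type and the length of each list, which makes it easier to see which keys lead to the data you want. The outline function below is a hypothetical helper, not part of any library. The homology_pairs function is a further sketch for Exercise 3.3 that assumes the homology response nests as data, then homologies, then source/target, with each side carrying 'id' and 'align_seq' (an aligned sequence that may contain '-' gap characters); confirm those key names with outline or pprint first.

```python
def outline(obj, indent=0):
    """Print the nesting structure of decoded JSON: dict keys and list lengths."""
    pad = "  " * indent
    if isinstance(obj, dict):
        for key, value in obj.items():
            # Show the key and the type of its value, then descend into containers
            print(f"{pad}{key}: {type(value).__name__}")
            if isinstance(value, (dict, list)):
                outline(value, indent + 1)
    elif isinstance(obj, list):
        print(f"{pad}list of {len(obj)} item(s)")
        # Inspect only the first item; records in a list usually share a layout
        if obj:
            outline(obj[0], indent + 1)

def homology_pairs(response):
    """Collect ((source_id, source_seq), (target_id, target_seq)) tuples.

    Assumes response["data"] is a list whose items hold a "homologies" list,
    and that each homology has "source" and "target" dicts with "id" and
    "align_seq"; gap characters ("-") are stripped from the aligned sequences.
    """
    pairs = []
    for entry in response.get("data", []):
        for homology in entry.get("homologies", []):
            src, tgt = homology["source"], homology["target"]
            pairs.append((
                (src["id"], src["align_seq"].replace("-", "")),
                (tgt["id"], tgt["align_seq"].replace("-", "")),
            ))
    return pairs
```

For example, after `response = fetch_endpoint(server, "homology/symbol/human/BRCA2?target_species=mouse", con)` (an assumed request path), calling `outline(response)` shows the layout, and looping over `homology_pairs(response)` prints the IDs and sequences. Only the first item of each list is outlined, which keeps the summary short for long lists of similar records.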