Check out more notebooks at our Community Notebooks Repository!
Title: How to Use ISB-CGC APIs
Author: Lauren Hagen
Created: 2019-09-16
Purpose: Introduction to using ISB-CGC APIs with Python
URL: https://github.com/isb-cgc/Community-Notebooks/blob/master/Notebooks/How_to_use_ISB-CGC_APIs.ipynb
Notes:
This notebook is designed as a quick introduction to the ISB-CGC APIs and how to access them with Python.
Topic Covered:
ISB-CGC has created several APIs to interact with ISB-CGC and user data available on the Google Cloud Platform. They were created with Google’s OpenAPI Endpoints and can be accessed through a SwaggerUI interface. For more information on ISB-CGC APIs, please visit our documentation.
An API, or application programming interface, is a software intermediary that allows two applications to talk to each other. In other words, an API is the messenger that delivers your request to the provider that you’re requesting it from and then delivers the response back to you (Wikipedia). Each action that an API can take is called an "endpoint".
Some useful tutorials and quick start guides on APIs are:
An endpoint is the call for a specific functionality of an API. For example, /data/available
at the end of the API request URL https://api-dot-isb-cgc.appspot.com/v4/data/available
is an endpoint that returns (or GETs) information about the available programs and data sets.
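As a rough illustration of how an endpoint path combines with the base URL of the API, the sketch below joins the two pieces with plain string handling. The helper name `endpoint_url` is hypothetical and is not part of any ISB-CGC library; the base URL and endpoint paths are the ones used in this notebook.

```python
# Base URL of version 4 of the ISB-CGC API, as used throughout this notebook
BASE_URL = "https://api-dot-isb-cgc.appspot.com/v4"


def endpoint_url(path, base=BASE_URL):
    """Join the base URL and an endpoint path with exactly one slash.

    Hypothetical helper for illustration only.
    """
    return base.rstrip("/") + "/" + path.lstrip("/")


# The same helper works with or without a leading slash on the path
about_url = endpoint_url("/about")
available_url = endpoint_url("data/available")

print(about_url)
print(available_url)
```

Keeping the base URL in one place means that only the endpoint path changes from request to request.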
requests
In order to use the ISB-CGC APIs with Python, the requests library needs to be installed and then imported.
# Install requests if needed
# pip install requests
# Import the requests library
import requests
The ISB-CGC APIs can be used for a number of different tasks that involve the Google Cloud Platform (GCP) and BigQuery. They can be used to subset data into cohorts or to access cohorts that have been created using the WebApp. They can also be used to interact with the user's GCP account to retrieve available user projects and to register projects with ISB-CGC.
about Endpoint
We are first going to explore the about endpoint using a GET request to the API. This endpoint returns information about the ISB-CGC API, such as links to the SwaggerUI interface and the documentation.
# First submit the 'get' request to the API
about_req = requests.get('https://api-dot-isb-cgc.appspot.com/v4/about')
Now that we have the response, we are going to check whether the request was successful. If it was, the status code will come back as 200; if something went wrong, the status code may be something like 404 or 503. If you have received an error code, you can check out Google's Troubleshooting response errors guide.
# Check that there wasn't an error with the request
if about_req.status_code != 200:
    # Print the error code if something went wrong
    print(about_req.status_code)
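A bare number such as 404 is not always self-explanatory. As a small sketch, the `http.HTTPStatus` enumeration from the Python standard library can translate a status code into its standard reason phrase. The helper name `describe_status` is hypothetical and introduced here only for illustration; in the notebook it could be given `about_req.status_code`.

```python
from http import HTTPStatus


def describe_status(code):
    """Return the status code together with its standard reason phrase.

    Hypothetical helper for illustration only.
    """
    try:
        status = HTTPStatus(code)
    except ValueError:
        # The code is not one of the registered HTTP status codes
        return "{} (unrecognized status code)".format(code)
    return "{} {}".format(code, status.phrase)


# The status codes mentioned in the text above
for code in (200, 404, 503):
    print(describe_status(code))
```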
Finally, we will print out the information that we have received from the API. This response is returned as a dictionary, though responses can also be a combination of dictionaries and lists depending on which endpoint is called. This means that you can access data in the response the same way that you would access dictionaries and lists, as demonstrated below.
# Print the full response
print("Full response:\n")
print(about_req.json(), end='\n\n')
# Print the message portion of the response
print("Message:\n")
print(about_req.json()['message'], end='\n\n')
# Print the documentation portion of the response
print("Documentation:\n")
print(about_req.json()['documentation'])
Full response:

{'code': 200, 'documentation': 'SwaggerUI interface available at <https://api-dot-isb-cgc.appspot.com/v4/swagger/>.Documentation available at <https://isb-cancer-genomics-cloud.readthedocs.io/en/latest/sections/progapi/progAPI-v4/Programmatic-Demo.html>', 'message': 'Welcome to the ISB-CGC API, Version 4.'}

Message:

Welcome to the ISB-CGC API, Version 4.

Documentation:

SwaggerUI interface available at <https://api-dot-isb-cgc.appspot.com/v4/swagger/>.Documentation available at <https://isb-cancer-genomics-cloud.readthedocs.io/en/latest/sections/progapi/progAPI-v4/Programmatic-Demo.html>
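Because the parsed response is an ordinary Python dictionary, the usual dictionary tools apply to it. The sketch below works on a copy of the response shown above, so it runs without a network call. It contrasts square-bracket access with `.get()`, and pulls the two links out of the documentation string with a regular expression. The variable names and the `warning` key are illustrative assumptions; that key is not a field of the API.

```python
import re

# Copy of the 'about' response shown above, used here instead of a live request
about_response = {
    "code": 200,
    "documentation": (
        "SwaggerUI interface available at "
        "<https://api-dot-isb-cgc.appspot.com/v4/swagger/>."
        "Documentation available at "
        "<https://isb-cancer-genomics-cloud.readthedocs.io/en/latest/"
        "sections/progapi/progAPI-v4/Programmatic-Demo.html>"
    ),
    "message": "Welcome to the ISB-CGC API, Version 4.",
}

# Square brackets raise a KeyError if the key is missing
message = about_response["message"]

# .get() returns a default value instead of raising an error
warning = about_response.get("warning", "no warning field")

# The links are wrapped in angle brackets, so capture the text between them
links = re.findall(r"<(https?://[^>]+)>", about_response["documentation"])

print(message)
print(warning)
for link in links:
    print(link)
```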
That wasn't difficult at all! Next we will cover a few of the other information APIs.
/data/available Endpoint
The /data/available endpoint is designed to return the data sets and programs available on the WebApp, along with the projects or studies within those data sets and programs. This endpoint returns a more complicated JSON object that combines lists and dictionaries. We will first submit the request and then check whether the response contains an error code.
# Retrieve the response from the API endpoint
programs_req = requests.get('https://api-dot-isb-cgc.appspot.com/v4/data/available')
# Check that there wasn't an error with the request
if programs_req.status_code != 200:
    # Print the error code if something went wrong
    print(programs_req.status_code)
We are going to use the json library in order to view the response more easily.
# json is part of the Python standard library, so it does not need to be installed
import json
# Create a variable with the JSON output
program_json = json.dumps(programs_req.json(), sort_keys=True, indent=4)
# Print the program JSON text
print(program_json)
{ "code": 200, "datasets_for_registration": "None found", "programs_for_cohorts": [ { "description": null, "name": "TCGA", "program_privacy": "Public", "projects": [ { "description": null, "name": "ACC" }, { "description": null, "name": "DLBC" }, { "description": null, "name": "READ" }, { "description": null, "name": "GBM" }, { "description": null, "name": "LGG" }, { "description": null, "name": "THCA" }, { "description": null, "name": "STAD" }, { "description": null, "name": "UCEC" }, { "description": null, "name": "PCPG" }, { "description": null, "name": "CESC" }, { "description": null, "name": "UCS" }, { "description": null, "name": "TGCT" }, { "description": null, "name": "LIHC" }, { "description": null, "name": "CHOL" }, { "description": null, "name": "HNSC" }, { "description": null, "name": "UVM" }, { "description": null, "name": "SKCM" }, { "description": null, "name": "COAD" }, { "description": null, "name": "PAAD" }, { "description": null, "name": "THYM" }, { "description": null, "name": "LUSC" }, { "description": null, "name": "MESO" }, { "description": null, "name": "OV" }, { "description": null, "name": "ESCA" }, { "description": null, "name": "SARC" }, { "description": null, "name": "KIRP" }, { "description": null, "name": "BLCA" }, { "description": null, "name": "LAML" }, { "description": null, "name": "PRAD" }, { "description": null, "name": "LUAD" }, { "description": null, "name": "BRCA" }, { "description": null, "name": "KIRC" }, { "description": null, "name": "KICH" } ] }, { "description": null, "name": "CCLE", "program_privacy": "Public", "projects": [ { "description": "Controls", "name": "CNTL" }, { "description": "FFPE Pilot Phase II", "name": "FPPP" }, { "description": "Sarcoma", "name": "SARC" }, { "description": "Skin Cutaneous Melanoma", "name": "SKCM" }, { "description": "Mesothelioma", "name": "MESO" }, { "description": "Acute Myeloid Leukemia", "name": "LAML" }, { "description": "Pheochromocytoma and Paraganglioma", "name": "PCPG" }, { 
"description": "Prostate Adenocarcinoma", "name": "PRAD" }, { "description": "Kidney Renal Clear Cell Carcinoma", "name": "KIRC" }, { "description": "Esophageal Carcinoma", "name": "ESCA" }, { "description": "Brain Lower Grade Glioma", "name": "LGG" }, { "description": "Lung Adenocarcinoma", "name": "LUAD" }, { "description": "Pancreatic Adenocarcinoma", "name": "PAAD" }, { "description": "Kidney Chromophobe", "name": "KICH" }, { "description": "Chronic Lymphocytic Leukemia", "name": "LCLL" }, { "description": "Kidney Renal Papillary Cell Carcinoma", "name": "KIRP" }, { "description": "Glioblastoma Multiforme", "name": "GBM" }, { "description": "Miscellaneous", "name": "MISC" }, { "description": "Lung Squamous Cell Carcinoma", "name": "LUSC" }, { "description": "Thymoma", "name": "THYM" }, { "description": "Head and Neck Squamous Cell Carcinoma", "name": "HNSC" }, { "description": "Testicular Germ Cell Tumors", "name": "TGCT" }, { "description": "Bladder Urothelial Carcinoma", "name": "BLCA" }, { "description": "Thyroid Carcinoma", "name": "THCA" }, { "description": "Uterine Carcinosarcoma", "name": "UCS" }, { "description": "Cholangiocarcinoma", "name": "CHOL" }, { "description": "Multiple Myeloma", "name": "MM" }, { "description": "Breast Invasive Carcinoma", "name": "BRCA" }, { "description": "Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma", "name": "CESC" }, { "description": "Lymphoid Neoplasm Diffuse Large B-cell Lymphoma", "name": "DLBC" }, { "description": "Uveal Melanoma", "name": "UVM" }, { "description": "Liver Hepatocellular Carcinoma", "name": "LIHC" }, { "description": "Colon Adenocarcinoma", "name": "COAD" }, { "description": "Rectum Adenocarcinoma", "name": "READ" }, { "description": "Uterine Corpus Endometrial Carcinoma", "name": "UCEC" }, { "description": "Ovarian Serous Cystadenocarcinoma", "name": "OV" }, { "description": "Stomach Adenocarcinoma", "name": "STAD" }, { "description": "Adrenocortical Carcinoma", "name": "ACC" }, { 
"description": "Chronic Myelogenous Leukemia", "name": "LCML" } ] }, { "description": null, "name": "TARGET", "program_privacy": "Public", "projects": [ { "description": "Rhabdoid Tumor", "name": "RT" }, { "description": "Neuroblastoma", "name": "NBL" }, { "description": "Clear Cell Sarcoma of the Kidney", "name": "CCSK" }, { "description": "High-Risk Wilms Tumor", "name": "WT" }, { "description": "Osteosarcoma", "name": "OS" }, { "description": "Acute Lymphoblastic Leukemia - Phase II", "name": "ALL-P2" }, { "description": "Acute Lymphoblastic Leukemia - Phase I", "name": "ALL-P1" }, { "description": "Acute Myeloid Leukemia", "name": "AML" } ] } ] }
We can now easily see that the information we are interested in is a combination of dictionaries and lists. Next we will iterate over the JSON object to neatly return the data sets/programs along with the projects/studies that are available in each.
# Create a variable with the dataset information
datasets = programs_req.json()['programs_for_cohorts']
# Create an empty dictionary for the program names
programs = {}
# For each dataset, create a list of available projects and add it to the
# dictionary under the dataset name.
for data in datasets:
    # Create a blank list for the projects
    projects = []
    # For each project, add its name to the list of projects for the data set
    for project in data['projects']:
        projects.append(project['name'])
    # Add the program name and list of projects to the dictionary
    programs[data['name']] = projects
    # Print the name of the program and the number of projects available
    print("The {} program has {} projects.".format(data['name'], len(projects)))
The TCGA program has 33 projects.
The CCLE program has 39 projects.
The TARGET program has 8 projects.
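The loop above can also be written more compactly with a dictionary comprehension. The sketch below is one possible alternative, not the notebook's own method. To stay runnable without a network call, it uses a heavily trimmed sample with the same layout as the `programs_for_cohorts` list, so the project counts it prints are for the sample only.

```python
# Trimmed sample with the same layout as programs_req.json()['programs_for_cohorts'];
# only a few of the projects shown in the output above are included here
datasets = [
    {"name": "TCGA", "projects": [{"name": "ACC"}, {"name": "DLBC"}]},
    {"name": "TARGET", "projects": [{"name": "RT"}, {"name": "NBL"}, {"name": "OS"}]},
]

# Map each program name to the list of its project names
programs = {
    data["name"]: [project["name"] for project in data["projects"]]
    for data in datasets
}

# Report the number of projects in each program of the sample
for name, projects in programs.items():
    print("The {} program has {} projects.".format(name, len(projects)))
```

The explicit loop is easier to follow step by step, while the comprehension keeps the whole mapping in a single expression.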
We now have a simple dictionary that maps each program to a list of its projects. Let us look at which projects are available in the TCGA data set.
for project in programs['TCGA']:
    print(project)
ACC DLBC READ GBM LGG THCA STAD UCEC PCPG CESC UCS TGCT LIHC CHOL HNSC UVM SKCM COAD PAAD THYM LUSC MESO OV ESCA SARC KIRP BLCA LAML PRAD LUAD BRCA KIRC KICH
Wow, that is a lot of projects/studies available within the TCGA data set. Descriptions of the different data sets and programs can be found in our documentation.
cohort Endpoint
This last section covers the GET cohorts endpoint, which requires authorization before submitting the request to the API. This endpoint retrieves information about user-generated cohorts created within the WebApp or with the ISB-CGC API.
In order to use several of the ISB-CGC APIs, you need to have authorization with ISB-CGC.
The following steps are required to use an API that requires authorization:
*The 'Quick Start Guide to ISB-CGC' Notebook in the Community Notebook Repository and the How to Get Started on ISB-CGC can assist you with these steps.
# If you skipped earlier sections, uncomment the lines below to install and
# import the packages needed to run the code in this section
# Install requests if needed
#pip install requests
# Import the json library (part of the Python standard library)
#import json
# Import the requests library
#import requests
# Import files helper for Colab
from google.colab import files
# Upload your credentials to the cloud environment
uploaded = files.upload()
Saving .isb_credentials to .isb_credentials (2)
Now that we have the credentials file created and uploaded to the cloud environment, we can open the file to create the header information needed for the API to verify that you have authorization.
# Open the credentials file and parse its JSON contents
with open(".isb_credentials", "r") as credentials_file:
    token = json.load(credentials_file)
# Get the ID token from the credentials
creds = token['token_response']['id_token']
# Create the Authorization header for the request
head = {'Authorization': 'Bearer ' + creds}
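If the credentials file is malformed, the header construction above fails with a bare KeyError. As a sketch only, the same logic can be wrapped in a small function that reports the problem more clearly. The function name `build_auth_header` and the placeholder token are assumptions made for illustration; the key layout (`token_response`, then `id_token`) is the one used in the code above.

```python
import json


def build_auth_header(credentials):
    """Build the Authorization header from parsed credentials.

    Hypothetical helper; expects the ID token to be stored under
    credentials['token_response']['id_token'], as in the code above.
    """
    try:
        id_token = credentials["token_response"]["id_token"]
    except (KeyError, TypeError) as err:
        raise ValueError(
            "credentials do not contain token_response.id_token"
        ) from err
    return {"Authorization": "Bearer " + id_token}


# Placeholder credentials for illustration; this is not a real token
sample = json.loads('{"token_response": {"id_token": "EXAMPLE_TOKEN"}}')
head = build_auth_header(sample)
print(head)
```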
Note: the credentials file will expire after 1 hour and a new one will need to be generated. If a new file is not generated with the isb_auth script, you can delete the original file and try running the script again.
If you are having any issues, you can contact us at feedback@isb-cgc.org
Finally, we can make a GET request to the cohorts endpoint of the ISB-CGC API.
# Make API request
cohort_req = requests.get('https://api-dot-isb-cgc.appspot.com/v4/cohorts', headers=head)
Now we can format the response so that it is easier to read and then print it.
cohorts_json = json.dumps(cohort_req.json(), sort_keys=True, indent=4)
print(cohorts_json)
{ "code": 200, "data": [ { "filters": { "TCGA": [ { "name": "gender", "program": "TCGA", "value": "FEMALE" }, { "name": "age_at_diagnosis", "program": "TCGA", "value": "70 to 79" } ] }, "id": 1962, "name": "Test 1", "permission": "OWNER" }, { "filters": {}, "id": 1, "name": "All TCGA Data", "permission": "READER" }, { "filters": { "TCGA": [ { "name": "tumor_tissue_site", "program": "TCGA", "value": "Breast" }, { "name": "sample_type", "program": "TCGA", "value": "01" }, { "name": "vital_status", "program": "TCGA", "value": "Alive" } ] }, "id": 1972, "name": "Test 2", "permission": "OWNER" }, { "filters": { "TCGA": [ { "name": "disease_code", "program": "TCGA", "value": "PAAD" } ] }, "id": 2217, "name": "Test 4", "permission": "OWNER" } ] }
Then we can retrieve the contents of the response and view which cohorts have been created.
# Create a variable with the cohort information
cohorts = cohort_req.json()['data']
print(cohorts)
[{'filters': {'TCGA': [{'name': 'gender', 'program': 'TCGA', 'value': 'FEMALE'}, {'name': 'age_at_diagnosis', 'program': 'TCGA', 'value': '70 to 79'}]}, 'id': 1962, 'name': 'Test 1', 'permission': 'OWNER'}, {'filters': {}, 'id': 1, 'name': 'All TCGA Data', 'permission': 'READER'}, {'filters': {'TCGA': [{'name': 'tumor_tissue_site', 'program': 'TCGA', 'value': 'Breast'}, {'name': 'sample_type', 'program': 'TCGA', 'value': '01'}, {'name': 'vital_status', 'program': 'TCGA', 'value': 'Alive'}]}, 'id': 1972, 'name': 'Test 2', 'permission': 'OWNER'}, {'filters': {'TCGA': [{'name': 'disease_code', 'program': 'TCGA', 'value': 'PAAD'}]}, 'id': 2217, 'name': 'Test 4', 'permission': 'OWNER'}]
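Since the cohorts come back as a list of dictionaries, they can be reorganized with ordinary Python. The sketch below works on a copy of the cohort ids, names, and permissions shown in the output above (filters omitted for brevity). It builds a lookup table keyed by cohort id and collects the names of the cohorts with OWNER permission. The variable names `cohorts_by_id` and `owned` are illustrative assumptions.

```python
# Copy of the cohort ids, names, and permissions from the output above
cohorts = [
    {"id": 1962, "name": "Test 1", "permission": "OWNER"},
    {"id": 1, "name": "All TCGA Data", "permission": "READER"},
    {"id": 1972, "name": "Test 2", "permission": "OWNER"},
    {"id": 2217, "name": "Test 4", "permission": "OWNER"},
]

# Build a lookup table so a cohort can be found by its id
cohorts_by_id = {cohort["id"]: cohort for cohort in cohorts}

# Collect the names of the cohorts with OWNER permission
owned = [cohort["name"] for cohort in cohorts if cohort["permission"] == "OWNER"]

print(cohorts_by_id[1]["name"])
print(owned)
```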
We can then see which filters were applied to a cohort by selecting it by its position in the list (the first cohort is at index 0).
# View the names of the cohorts
for k in cohorts:
    print(k['name'])
Test 1
All TCGA Data
Test 2
Test 4
# View the filters that have been applied to the first cohort
cohorts[0]['filters']
{'TCGA': [{'name': 'gender', 'program': 'TCGA', 'value': 'FEMALE'}, {'name': 'age_at_diagnosis', 'program': 'TCGA', 'value': '70 to 79'}]}
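The nested filters structure can be flattened into short readable strings. The following is a minimal sketch that uses a copy of the first two cohorts from the output above; the function name `describe_filters` is hypothetical.

```python
# Copy of the first two cohorts from the output above
cohorts = [
    {
        "filters": {
            "TCGA": [
                {"name": "gender", "program": "TCGA", "value": "FEMALE"},
                {"name": "age_at_diagnosis", "program": "TCGA", "value": "70 to 79"},
            ]
        },
        "id": 1962,
        "name": "Test 1",
        "permission": "OWNER",
    },
    {"filters": {}, "id": 1, "name": "All TCGA Data", "permission": "READER"},
]


def describe_filters(cohort):
    """Return one 'program: name = value' string per filter in a cohort.

    Hypothetical helper for illustration only.
    """
    lines = []
    for program, filters in cohort["filters"].items():
        for item in filters:
            lines.append("{}: {} = {}".format(program, item["name"], item["value"]))
    return lines


# Print a one-line summary for each cohort
for cohort in cohorts:
    summary = describe_filters(cohort) or ["(no filters)"]
    print(cohort["name"], "->", "; ".join(summary))
```

A cohort with an empty filters dictionary, such as "All TCGA Data", produces an empty list, which the loop reports as "(no filters)".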