Creating an InterMine workflow using the API

We are going to use the Python API to re-create the workflow that we built earlier in the web interface.

We start by importing the Service class from InterMine's webservice module. You will need to access your account on HumanMine, and you do this through an API token. You can get your token by logging in to HumanMine and going to the account details tab within MyMine. Copy and paste your token into the code below.

In [ ]:

from intermine.webservice import Service
service = Service("http://www.humanmine.org/humanmine/service", token = "Your Token")
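If you would rather not paste the token into the notebook itself, one option is to read it from an environment variable. The following is a minimal sketch, not part of the InterMine client: the variable name HUMANMINE_TOKEN and the helper load_token are assumptions made for this example.

```python
import os

# HUMANMINE_TOKEN is a hypothetical variable name chosen for this sketch.
TOKEN_VAR = "HUMANMINE_TOKEN"


def load_token(env=None):
    """Return the API token from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    token = env.get(TOKEN_VAR, "").strip()
    if not token:
        raise RuntimeError(
            "Set the %s environment variable to your API token" % TOKEN_VAR
        )
    return token


# Usage with the Service class imported above (not run in this sketch):
# service = Service("http://www.humanmine.org/humanmine/service", token=load_token())
```

This keeps the token out of any notebook that you share or commit.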

Our first query looked at genes that are upregulated in adipose tissue. Using the API, we can do this with either a query object or a template object. The code below generates a query object; the "AtlasExpression" argument passed to new_query defines the query class. Running the template through the API is very similar, except that we create a template object rather than a query object: template = service.get_template('TissueAtlas_Expression'), where TissueAtlas_Expression is the name of the template.

In [ ]:

query = service.new_query("AtlasExpression")
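For comparison, the template route mentioned above could look roughly like the sketch below. It assumes that the template object returned by get_template can be iterated with rows() in the same way as a query object; the helper name preview_template is made up for this example, and itertools.islice is used only to stop after a few rows.

```python
from itertools import islice


def preview_template(service, name, size=10):
    """Fetch a template by name and return up to `size` of its result rows.

    Assumes the template object has a rows() method like a query object.
    """
    template = service.get_template(name)
    return list(islice(template.rows(), size))


# Usage with the service created above (not run in this sketch):
# for row in preview_template(service, "TissueAtlas_Expression"):
#     print(row)
```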

First we define the output columns that we want in our results, i.e. the view. Note that we started our query from the AtlasExpression class. "condition", "expression", "pValue" and "tStatistic" are attributes of this class. The Gene class is referenced from the AtlasExpression class, so to return gene information we give the path to it from AtlasExpression, e.g. gene.primaryIdentifier.

In [ ]:

query.add_view(
    "condition", "gene.primaryIdentifier", "gene.symbol", "gene.name",
    "expression", "pValue", "tStatistic", "dataSets.name"
)
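To make the idea of a path concrete, here is a small sketch that walks dotted paths through a toy data model. The TOY_MODEL dictionary is a heavily simplified stand-in written for this example, not the real HumanMine model, and resolve_path is a hypothetical helper rather than an InterMine function.

```python
# Toy model: class name -> {field: referenced class, or None for a plain attribute}.
# This is an illustrative simplification, not the real HumanMine data model.
TOY_MODEL = {
    "AtlasExpression": {
        "condition": None, "expression": None, "pValue": None,
        "tStatistic": None, "gene": "Gene", "dataSets": "DataSet",
    },
    "Gene": {"primaryIdentifier": None, "symbol": None, "name": None},
    "DataSet": {"name": None},
}


def resolve_path(root, path, model=TOY_MODEL):
    """Walk a dotted path from `root`; return the class that owns the last field."""
    current = root
    parts = path.split(".")
    for i, part in enumerate(parts):
        fields = model[current]
        if part not in fields:
            raise KeyError("%s has no field %r" % (current, part))
        target = fields[part]
        if i < len(parts) - 1:
            # Only references can be followed to another class.
            if target is None:
                raise KeyError("%r is an attribute, not a reference" % part)
            current = target
    return current


print(resolve_path("AtlasExpression", "pValue"))       # attribute on the root class
print(resolve_path("AtlasExpression", "gene.symbol"))  # attribute reached via a reference
```

A path with no dot names an attribute of the starting class, while each dot follows a reference into another class.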

Next, add the constraints to your query. We are only interested in genes expressed in Adipose tissue with a pValue <= 0.01.

In [ ]:

query.add_constraint("condition", "=", "Adipose tissue")
query.add_constraint("pValue", "<=", "0.01")

Now, let's check what the query returns by looping through the rows and printing the results:

In [ ]:

for row in query.rows():
    print (row["condition"], row["gene.primaryIdentifier"], row["gene.symbol"], row["gene.name"], 
        row["expression"], row["pValue"], row["tStatistic"], row["dataSets.name"])

Note that this gives a lot of rows. If we just want to check we are getting the right results we could print just the first 10 rows:

In [ ]:

for row in query.rows(start=0,size=10):
    print (row["condition"], row["gene.primaryIdentifier"], row["gene.symbol"], row["gene.name"], 
        row["expression"], row["pValue"], row["tStatistic"], row["dataSets.name"])
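The start and size arguments ask for only a window of the results. The same previewing idea can be sketched for any Python iterable with itertools.islice, which trims on the client side instead. The helper first_rows and the fake_rows generator below are stand-ins invented for this example, so that it runs without a connection to HumanMine.

```python
from itertools import islice


def first_rows(rows, size=10, start=0):
    """Return up to `size` rows, beginning at offset `start`, from any iterable."""
    return list(islice(rows, start, start + size))


def fake_rows(n):
    """Stand-in for a result iterator; yields n placeholder rows."""
    for i in range(n):
        yield {"gene.primaryIdentifier": "ID%d" % i}


preview = first_rows(fake_rows(1000), size=10)
print(len(preview), preview[0], preview[-1])
```

Because islice stops consuming the iterator once it has enough rows, the remaining rows are never generated.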

Remember that when we looked at the results table, we used the filter options to show just the genes that are "UP" expressed in adipose tissue. We can do the same here by adding another constraint to our query. (We could have added this straight away in our first set of constraints.)

In [ ]:

query.add_constraint("expression", "=", "UP", code = "A")
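By default the constraints on a query are combined with AND, so every added constraint narrows the results further. The sketch below imitates this on a handful of made-up rows; the gene symbols and values are placeholders, not real HumanMine data, and matches is a hypothetical helper.

```python
import operator

# Map the constraint operators used in this tutorial to Python comparisons.
OPS = {"=": operator.eq, "<=": operator.le}


def matches(row, constraints):
    """True if the row satisfies every (path, op, value) constraint (logical AND)."""
    return all(OPS[op](row[path], value) for path, op, value in constraints)


# Made-up rows for illustration only.
rows = [
    {"gene.symbol": "GENE1", "condition": "Adipose tissue", "pValue": 0.001, "expression": "UP"},
    {"gene.symbol": "GENE2", "condition": "Adipose tissue", "pValue": 0.005, "expression": "DOWN"},
    {"gene.symbol": "GENE3", "condition": "Adipose tissue", "pValue": 0.2, "expression": "UP"},
    {"gene.symbol": "GENE4", "condition": "Liver", "pValue": 0.001, "expression": "UP"},
]

constraints = [
    ("condition", "=", "Adipose tissue"),
    ("pValue", "<=", 0.01),
    ("expression", "=", "UP"),
]

selected = [r["gene.symbol"] for r in rows if matches(r, constraints)]
print(selected)
```

Since all three conditions must hold, the order in which the constraints are added does not change which rows are selected.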

Now let's check our results again.

In [ ]:

for row in query.rows(start=0, size=10):
    print (row["condition"], row["gene.primaryIdentifier"], row["gene.symbol"], row["gene.name"], 
        row["expression"], row["pValue"], row["tStatistic"], row["dataSets.name"])

We want to save this set of genes that are UP expressed in adipose tissue for further analysis. To do this, we create an empty Python list and loop through our results again. This time, instead of printing the results, we append just the primary identifier from each row to our list.

In [ ]:

UpinAdipose = list()
for row in query.rows():
    UpinAdipose.append(row["gene.primaryIdentifier"])

Then check that the list we have created looks correct:

In [ ]:

print(UpinAdipose)
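If the same gene turns up in more than one result row (for example, once per data set), the list will contain repeated identifiers. Whether that happens depends on the data, so treat the following as an optional tidy-up step. The helper unique_in_order is a name made up for this sketch, and the identifiers are placeholders.

```python
def unique_in_order(values):
    """Drop repeated values while keeping the first occurrence of each, in order."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(values))


# Placeholder identifiers for illustration only.
ids = ["ID1", "ID2", "ID1", "ID3", "ID2"]
print(unique_in_order(ids))

# Usage with the list built above (not run in this sketch):
# UpinAdipose = unique_in_order(UpinAdipose)
```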

We now need to save the list to our InterMine account so we can use it again in a later query. The ListManager class provides methods to manage list contents and operations.

In [ ]:

lm=service.list_manager()
lm.create_list(content=UpinAdipose, list_type="Gene", name="UpinAdipose")

Log in to HumanMine and check your list has been created.

Our second query looked at whether any of the genes that were UP expressed in adipose tissue interact with the PPARG gene. First, we define our new query object. This time we start our query from the Gene class:

In [ ]:

query2 = service.new_query("Gene")

Add the views and constraints:

In [ ]:

query2.add_view(
    "primaryIdentifier", "symbol",
    "interactions.participant2.primaryIdentifier",
    "interactions.participant2.symbol", "interactions.details.type",
    "interactions.details.role1", "interactions.details.role2",
    "interactions.details.experiment.interactionDetectionMethods.name",
    "interactions.details.experiment.publication.pubMedId",
    "interactions.details.dataSets.name"
)

In [ ]:

query2.add_constraint("Gene", "LOOKUP", "pparg", "H. sapiens", code = "A")
query2.add_constraint("interactions.participant2", "IN", "UpinAdipose", code = "B")

An interaction has two participants. The first participant is the Gene that our query starts from, which we have constrained to be PPARG. Note that the pparg constraint is a LOOKUP. The LOOKUP operator searches through all the fields of a particular class for the value specified. In the constraint above, it searches the Gene class for any field that contains an occurrence of "pparg". The advantage of this is that you do not need to remember whether pparg is a symbol, a name or a primaryIdentifier. The extra "H. sapiens" argument restricts the lookup to human genes. The second participant is reached through the gene's interactions and is called participant2. It is a bioentity, like Gene, and so shares some of its attributes, such as primaryIdentifier and symbol.
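The idea behind a lookup can be sketched in plain Python as a search over several fields at once. This is only an illustration: the server's matching rules may differ, the case-insensitive exact match used here is an assumption, and the records (including the "ID-1" and "ID-2" identifiers) are placeholders.

```python
def lookup(records, term, fields=("primaryIdentifier", "symbol", "name")):
    """Return the records in which any of the given fields equals `term`.

    Matching is case-insensitive; this is a simplification for illustration.
    """
    term = term.lower()
    return [
        r for r in records
        if any(str(r.get(f, "")).lower() == term for f in fields)
    ]


# Toy records; the identifiers are placeholders.
genes = [
    {"primaryIdentifier": "ID-1", "symbol": "PPARG",
     "name": "peroxisome proliferator activated receptor gamma"},
    {"primaryIdentifier": "ID-2", "symbol": "PPARA",
     "name": "peroxisome proliferator activated receptor alpha"},
]

print([g["symbol"] for g in lookup(genes, "pparg")])
print([g["symbol"] for g in lookup(genes, "ID-2")])
```

The caller does not have to say which field the search term belongs to, which is the convenience that LOOKUP offers.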

Check the results:

In [ ]:

for row in query2.rows():
    print (row["primaryIdentifier"], row["symbol"], 
        row["interactions.participant2.primaryIdentifier"], row["interactions.participant2.symbol"], 
        row["interactions.details.type"], row["interactions.details.role1"], 
        row["interactions.details.role2"], 
        row["interactions.details.experiment.interactionDetectionMethods.name"], 
        row["interactions.details.experiment.publication.pubMedId"], 
        row["interactions.details.dataSets.name"])

Collect the genes that interact with PPARG in a Python list and save this list to your InterMine account.

In [ ]:

UpinAdiposeInteractPparg = list()
for row in query2.rows():
    UpinAdiposeInteractPparg.append(row["interactions.participant2.primaryIdentifier"])

In [ ]:

lm=service.list_manager()
lm.create_list(content=UpinAdiposeInteractPparg, list_type="Gene", name="UpinAdiposeInteractPparg")

Next, run the third query (genes that are associated with the disease diabetes, which we originally created using the query builder) and, again, save the set of genes that are returned to your InterMine account.

In [ ]:

query3 = service.new_query("Gene")
query3.add_view("primaryIdentifier", "symbol")
query3.add_constraint("organism.name", "=", "Homo sapiens", code = "A")
query3.add_constraint("diseases.name", "CONTAINS", "diabetes", code = "B")

for row in query3.rows():
    print (row["primaryIdentifier"], row["symbol"])

In [ ]:

diabetesGenes = list()
for row in query3.rows():
    diabetesGenes.append(row["primaryIdentifier"])

In [ ]:

lm=service.list_manager()
lm.create_list(content=diabetesGenes, list_type="Gene", name="diabetesGenes")

Finally, we used a list intersection to find the genes that are upregulated in adipose tissue, interact with PPARG, and are also associated with the disease diabetes. We need to intersect the second (UpinAdiposeInteractPparg) and third (diabetesGenes) lists that we created. We can do this using the intersect method of the ListManager class.

In [ ]:

lm.intersect(["UpinAdiposeInteractPparg", "diabetesGenes"], "intersectedList")

In [ ]:

final = lm.get_list("intersectedList")

In [ ]:

print(final)

In [ ]:

for gene in final:
    print(gene.primaryIdentifier, gene.symbol)
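As a quick local cross-check of the server-side intersection, the two Python lists built earlier can also be intersected directly. This is a minimal sketch using Python sets; intersect_ids is a helper name invented here, and the identifiers in the example are placeholders.

```python
def intersect_ids(*id_lists):
    """Return the identifiers present in every list, in the order of the first list."""
    if not id_lists:
        return []
    common = set(id_lists[0]).intersection(*id_lists[1:])
    # dict.fromkeys removes repeats from the first list while keeping its order.
    return [i for i in dict.fromkeys(id_lists[0]) if i in common]


# Placeholder identifiers for illustration only.
interactors = ["ID1", "ID2", "ID3", "ID2"]
disease_genes = ["ID3", "ID4", "ID2"]
print(intersect_ids(interactors, disease_genes))

# Usage with the lists built above (not run in this sketch):
# local = intersect_ids(UpinAdiposeInteractPparg, diabetesGenes)
# print(sorted(local) == sorted(gene.primaryIdentifier for gene in final))
```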
