Welcome to your first intermine-python tutorial. Over a series of approximately 12 tutorials, we will go through the basics of writing Python code that queries an InterMine database.
This tutorial covers the basics of intermine-python queries and shows how to write your first query. To get started, install the intermine package with pip ("pip install intermine"). Once the package is installed, you are good to go!
We start by importing the Service class from InterMine's webservice module.
from intermine.webservice import Service
We first create a Service object that points at the FlyMine web service. The Service class has a method called "new_query" that creates a query object:
service = Service("www.flymine.org/flymine/service")
query = service.new_query()
A query object defines what we want to extract from the InterMine database. The first part of a query is referred to as the "views". The views define the output columns that we want in our result. Let's query the FlyMine database to extract the symbol, primaryIdentifier and length of all genes.
query.select("Gene.symbol", "Gene.primaryIdentifier", "Gene.length")
<intermine.query.Query at 0x7f63f5718320>
Now that we have added the output columns to our query, we can print its results.
for row in query.rows(start=0, size=10):
    print(row)
Gene: symbol='0610005C13Rik' primaryIdentifier='MGI:1918911' length=None
Gene: symbol='0610006L08Rik' primaryIdentifier='MGI:1923503' length=None
Gene: symbol='0610007P14Rik' primaryIdentifier='MGI:1915571' length=None
Gene: symbol='0610008J02Rik' primaryIdentifier='MGI:1925547' length=None
Gene: symbol='0610009B22Rik' primaryIdentifier='MGI:1913300' length=None
Gene: symbol='0610009E02Rik' primaryIdentifier='MGI:3698435' length=None
Gene: symbol='0610009F21Rik' primaryIdentifier='MGI:1918921' length=None
Gene: symbol='0610009K14Rik' primaryIdentifier='MGI:1918931' length=None
Gene: symbol='0610009L18Rik' primaryIdentifier='MGI:1914088' length=None
Gene: symbol='0610009O20Rik' primaryIdentifier='MGI:1914089' length=None
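The start and size arguments passed to rows select a window of the full result set: start is the zero-based offset of the first row and size is the maximum number of rows to return. The sketch below mimics that windowing on a local list so that it runs without a network connection. The page helper and the sample list are hypothetical stand-ins and are not part of intermine-python.

```python
from itertools import islice


def page(rows, start=0, size=10):
    """Return up to `size` items from `rows`, skipping the first `start`."""
    return list(islice(rows, start, start + size))


# Hypothetical stand-in for a result set of 25 rows.
all_rows = ["row%d" % i for i in range(25)]

first_page = page(all_rows, start=0, size=10)    # rows 0 to 9
second_page = page(all_rows, start=10, size=10)  # rows 10 to 19
last_page = page(all_rows, start=20, size=10)    # only 5 rows remain

print(len(first_page), len(second_page), len(last_page))
```

Moving start forward by size on each call walks through the results one page at a time, and the last page may simply be shorter than size.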
The query can also be rewritten by passing the root class "Gene" to new_query, so that the column names no longer need the "Gene." prefix.
query = service.new_query("Gene")
query.select("symbol", "primaryIdentifier", "length")
<intermine.query.Query at 0x7fcdd577b160>
for row in query.rows(start=0, size=10):
    print(row)
Gene: symbol='0610005C13Rik' primaryIdentifier='MGI:1918911' length=None
Gene: symbol='0610006L08Rik' primaryIdentifier='MGI:1923503' length=None
Gene: symbol='0610007P14Rik' primaryIdentifier='MGI:1915571' length=None
Gene: symbol='0610008J02Rik' primaryIdentifier='MGI:1925547' length=None
Gene: symbol='0610009B22Rik' primaryIdentifier='MGI:1913300' length=None
Gene: symbol='0610009E02Rik' primaryIdentifier='MGI:3698435' length=None
Gene: symbol='0610009F21Rik' primaryIdentifier='MGI:1918921' length=None
Gene: symbol='0610009K14Rik' primaryIdentifier='MGI:1918931' length=None
Gene: symbol='0610009L18Rik' primaryIdentifier='MGI:1914088' length=None
Gene: symbol='0610009O20Rik' primaryIdentifier='MGI:1914089' length=None
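One way to think about why both forms return the same rows: when a query is created with a root class, unqualified column names are read relative to that class. The sketch below illustrates the idea with a hypothetical qualify helper. It is an illustration under that assumption, not intermine-python's own code.

```python
def qualify(root, paths):
    """Prefix each path with the root class unless it already starts with it."""
    qualified = []
    for path in paths:
        if path == root or path.startswith(root + "."):
            qualified.append(path)  # already fully qualified
        else:
            qualified.append(root + "." + path)
    return qualified


long_form = qualify("Gene", ["Gene.symbol", "Gene.primaryIdentifier", "Gene.length"])
short_form = qualify("Gene", ["symbol", "primaryIdentifier", "length"])

print(short_form)
print(long_form == short_form)
```

Both calls end up with the same three fully qualified columns, which matches the identical outputs of the two queries.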
Feel free to use whichever method you find more comfortable. Now, let us try to write a new query that returns all organisms in the database.
query2 = service.new_query()
query2.select("Organism.name")
<intermine.query.Query at 0x7fcdd5714320>
If you want to add another column to the final output, you can use the add_view method instead of rewriting the query.
query2.add_view("Organism.taxonId")
<intermine.query.Query at 0x7fcdd5714320>
for row in query2.rows(start=0, size=10):
    print(row)
Organism: name='Anopheles gambiae' taxonId=7165
Organism: name='Caenorhabditis elegans' taxonId=6239
Organism: name='Danio rerio' taxonId=7955
Organism: name='Drosophila ananassae' taxonId=7217
Organism: name='Drosophila erecta' taxonId=7220
Organism: name='Drosophila grimshawi' taxonId=7222
Organism: name='Drosophila melanogaster' taxonId=7227
Organism: name='Drosophila mojavensis' taxonId=7230
Organism: name='Drosophila persimilis' taxonId=7234
Organism: name='Drosophila pseudoobscura' taxonId=7237
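Note that select and add_view both echoed a Query object at the same memory address, which suggests that each call modifies the query in place and hands the same object back. The toy class below sketches that pattern and shows why such calls can be chained. ToyQuery is a hypothetical stand-in that only tracks column names; it is not the real Query class.

```python
class ToyQuery:
    """Hypothetical stand-in that only tracks output columns."""

    def __init__(self):
        self.views = []

    def select(self, *paths):
        # Replace any existing output columns.
        self.views = list(paths)
        return self  # returning self is what allows chaining

    def add_view(self, *paths):
        # Append to the existing output columns.
        self.views.extend(paths)
        return self


q = ToyQuery()
same = q.select("Organism.name").add_view("Organism.taxonId")

print(q.views)
print(same is q)
```

In this sketch, select starts the column list over while add_view extends it, which is the distinction the text above relies on.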
By default, the results are sorted by the first column that you defined. If you want to sort by another column instead, use the add_sort_order method of the Query class.
query2.add_sort_order("Organism.taxonId")
<intermine.query.Query at 0x7fcdd5714320>
for row in query2.rows(start=0, size=10):
    print(row)
Organism: name='Saccharomyces cerevisiae' taxonId=4932
Organism: name='Caenorhabditis elegans' taxonId=6239
Organism: name='Anopheles gambiae' taxonId=7165
Organism: name='Drosophila ananassae' taxonId=7217
Organism: name='Drosophila erecta' taxonId=7220
Organism: name='Drosophila grimshawi' taxonId=7222
Organism: name='Drosophila melanogaster' taxonId=7227
Organism: name='Drosophila mojavensis' taxonId=7230
Organism: name='Drosophila persimilis' taxonId=7234
Organism: name='Drosophila pseudoobscura' taxonId=7237
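To see why Saccharomyces cerevisiae now comes first while Danio rerio has dropped out of the first 10 rows, it can help to reproduce the effect locally. In the real query the sorting is done by the server; the sketch below only mirrors it with Python's built-in sorted, using a handful of (name, taxonId) pairs copied from the outputs above.

```python
# (name, taxonId) pairs copied from the query outputs shown above.
organisms = [
    ("Anopheles gambiae", 7165),
    ("Caenorhabditis elegans", 6239),
    ("Danio rerio", 7955),
    ("Drosophila melanogaster", 7227),
    ("Saccharomyces cerevisiae", 4932),
]

by_name = sorted(organisms, key=lambda row: row[0])   # sort on the first column
by_taxon = sorted(organisms, key=lambda row: row[1])  # sort on the second column

for name, taxon_id in by_taxon:
    print(name, taxon_id)
```

Sorted by name, Danio rerio is third; sorted by taxonId, it moves to the end because 7955 is the largest ID in this sample.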
As you can see, I've limited the results to 10 rows, because the above queries would otherwise list all the organisms or all the genes in the database. You can change this number if you want to view more or fewer rows. Views, or output columns, are one part of a query. The second part is the constraints placed on it. We will take a look at adding constraints in our next tutorial.