In the previous tutorials we learnt about the basic backbone of a query: views and constraints. This short tutorial covers another feature of an InterMine query: outer and inner joins.
When we add a path to a query, even if it is only in the view, an implicit constraint comes with it by default. The results will contain only those records that have data in the fields or attributes described by that path.
Let's say that we want to get any genes involved in a biosynthetic process, along with any publications on them. If a gene has publication information available, we want to see it; if it does not, we still want to see the general information about the gene. By default, InterMine gives you an inner join, which means that a gene with no publications is dropped from the results entirely rather than appearing as a partial match. For this example we need an outer join instead. An outer join on Gene.publications keeps every matching gene and fills in the publication columns only where data exists. The code for this query is built up step by step below.
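Before turning to the InterMine code, the difference between the two joins can be sketched in plain Python. This is only a toy illustration: the gene and publication records are invented, and the helper functions inner_join and outer_join are hypothetical names, not part of the intermine library.

```python
# Toy data, invented for illustration only (not real FlyMine records).
genes = [
    {"id": "G1", "symbol": "aaa"},
    {"id": "G2", "symbol": "bbb"},  # this gene has no publications
]
publications = [
    {"gene_id": "G1", "title": "Paper one"},
    {"gene_id": "G1", "title": "Paper two"},
]


def inner_join(genes, publications):
    """Keep a gene only if it has at least one publication."""
    return [
        (g["symbol"], p["title"])
        for g in genes
        for p in publications
        if p["gene_id"] == g["id"]
    ]


def outer_join(genes, publications):
    """Keep every gene; use None where no publication exists."""
    rows = []
    for g in genes:
        matches = [p for p in publications if p["gene_id"] == g["id"]]
        if matches:
            rows.extend((g["symbol"], p["title"]) for p in matches)
        else:
            rows.append((g["symbol"], None))
    return rows


inner_rows = inner_join(genes, publications)
outer_rows = outer_join(genes, publications)
print(inner_rows)
print(outer_rows)
```

The inner join returns rows for the first gene only, while the outer join also returns a row for the second gene with an empty publication value. The outer join in an InterMine query plays roughly the same role for Gene.publications.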
We begin as usual by importing the Service class and creating a Query object.
from intermine.webservice import Service
service = Service("www.flymine.org/flymine/service")
query = service.new_query("Gene")
We then select the columns that we want in our final output and add the constraint.
query.select("primaryIdentifier","symbol","publications.year","publications.firstAuthor","publications.title")
<intermine.query.Query at 0x7f53e83555c0>
query.add_constraint("ontologyAnnotations.ontologyTerm.name","=","*biosynthetic process*")
<BinaryConstraint: Gene.ontologyAnnotations.ontologyTerm.name = *biosynthetic process*>
And finally, we add an outer join on publications and print the first ten rows.
query.outerjoin("publications")
<intermine.query.Query at 0x7f53e83555c0>
for row in query.rows(size=10):
    print(row)
Gene: primaryIdentifier='FBgn0000022' symbol='ac' publications.year=2010 publications.firstAuthor='Neely G Gregory' publications.title='A genome-wide Drosophila screen for heat nociception identifies α2δ3 as an evolutionarily conserved pain gene.'
Gene: primaryIdentifier='FBgn0000022' symbol='ac' publications.year=2005 publications.firstAuthor='Hoskins Roger A' publications.title='Rapid and efficient cDNA library screening by self-ligation of inverse PCR products (SLIP).'
Gene: primaryIdentifier='FBgn0000022' symbol='ac' publications.year=2015 publications.firstAuthor='Nitta Kazuhiro R' publications.title='Conservation of transcription factor binding specificities across 600 million years of bilateria evolution.'
Gene: primaryIdentifier='FBgn0000022' symbol='ac' publications.year=2010 publications.firstAuthor='Neely G Gregory' publications.title='A global in vivo Drosophila RNAi screen identifies NOT3 as a conserved regulator of heart function.'
Gene: primaryIdentifier='FBgn0000022' symbol='ac' publications.year=2001 publications.firstAuthor='Benos P V' publications.title='From first base: the sequence of the tip of the X chromosome of Drosophila melanogaster, a comparison of two sequencing strategies.'
Gene: primaryIdentifier='FBgn0000022' symbol='ac' publications.year=1999 publications.firstAuthor='Duronio R J' publications.title='Establishing links between developmental signaling pathways and cell-cycle regulation in Drosophila.'
Gene: primaryIdentifier='FBgn0000022' symbol='ac' publications.year=None publications.firstAuthor='Apitz Holger' publications.title='A challenge of numbers and diversity: neurogenesis in the Drosophila optic lobe.'
Gene: primaryIdentifier='FBgn0000022' symbol='ac' publications.year=2002 publications.firstAuthor='Nybakken Kent' publications.title='Heparan sulfate proteoglycan modulation of developmental signaling in Drosophila.'
Gene: primaryIdentifier='FBgn0000022' symbol='ac' publications.year=1986 publications.firstAuthor='Lefevre G' publications.title='The question of the total gene number in Drosophila melanogaster.'
Gene: primaryIdentifier='FBgn0000022' symbol='ac' publications.year=1998 publications.firstAuthor='Doe C Q' publications.title='Neural stem cells: from fly to vertebrates.'
Another query feature of InterMine is the ability to give paths shorter, more readable names for display. This is done with the add_path_description method. I'll show you a short example.
query.add_view("ontologyAnnotations.ontologyTerm.name")
<intermine.query.Query at 0x7f53e83555c0>
query.add_path_description("ontologyAnnotations.ontologyTerm","Ontology Term")
<PathDescription: Gene.ontologyAnnotations.ontologyTerm>
query.add_path_description("publications","Pub.")
<PathDescription: Gene.publications>
This helps when we want to write our tables to a file and want the column names in a more readable format.
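To make the idea concrete, here is a rough sketch of how a path description might turn long paths into short column headers for a file. The function describe_column and the " > " separator are assumptions made for this illustration; it mimics the general idea only and is not the code the intermine library or the web application uses.

```python
import csv
import io


def describe_column(path, descriptions):
    """Replace the longest described prefix of `path` with its label.

    Hypothetical helper for illustration; not part of the intermine library.
    """
    best = ""
    for prefix in descriptions:
        matches = path == prefix or path.startswith(prefix + ".")
        if matches and len(prefix) > len(best):
            best = prefix
    if not best:
        # No description applies, so show the full path.
        return path.replace(".", " > ")
    rest = path[len(best):].lstrip(".")
    label = descriptions[best]
    return label + " > " + rest.replace(".", " > ") if rest else label


# The same descriptions that were added to the query above.
descriptions = {
    "Gene.ontologyAnnotations.ontologyTerm": "Ontology Term",
    "Gene.publications": "Pub.",
}
columns = [
    "Gene.symbol",
    "Gene.publications.title",
    "Gene.ontologyAnnotations.ontologyTerm.name",
]
headers = [describe_column(c, descriptions) for c in columns]

# Write the header row to an in-memory CSV "file".
buffer = io.StringIO()
csv.writer(buffer).writerow(headers)
print(buffer.getvalue())
```

An in-memory buffer stands in for a real file here so that the sketch has no side effects; swapping it for an open file handle would write the same header row to disk.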
In the next tutorial we will look at dealing with the results that are generated.