In the previous tutorial, we learnt about adding constraints to our query so that we could filter the results. In this tutorial we will take a look at some more contraints and the different types of constraints.
from intermine.webservice import Service
service = Service("www.flymine.org/flymine/service")
query=service.new_query("Gene")
The first type of constraint that we will look at is a Unary Constraint. A Unary Constraint is one that does not take any value but can be used to check if a particular attirbute is absent or present. The Unary constraints are IS Null and IS NOT Null. We can look at a small example.
query.add_constraint("primaryIdentifier","IS NOT NULL")
<UnaryConstraint: Gene.primaryIdentifier IS NOT NULL>
for row in query.rows(size=10):
print(row)
Gene: briefDescription=None cytoLocation='-' description=None id=1000415 length=12653 name='zydeco' primaryIdentifier='FBgn0265767' score=None scoreType=None secondaryIdentifier='CG2893' symbol='zyd' Gene: briefDescription=None cytoLocation='-' description=None id=1004698 length=1951 name=None primaryIdentifier='FBgn0039942' score=None scoreType=None secondaryIdentifier='CG17163' symbol='CG17163' Gene: briefDescription=None cytoLocation='-' description=None id=1005938 length=12892 name='Rho GTPase activating protein at 1A' primaryIdentifier='FBgn0025836' score=None scoreType=None secondaryIdentifier='CG40494' symbol='RhoGAP1A' Gene: briefDescription=None cytoLocation='-' description=None id=1007519 length=21475 name='verthandi' primaryIdentifier='FBgn0260987' score=None scoreType=None secondaryIdentifier='CG17436' symbol='vtd' Gene: briefDescription=None cytoLocation='-' description=None id=1015398 length=14286 name='Maf1' primaryIdentifier='FBgn0267861' score=None scoreType=None secondaryIdentifier='CG40196' symbol='Maf1' Gene: briefDescription=None cytoLocation='-' description=None id=1018843 length=12844 name=None primaryIdentifier='FBgn0039941' score=None scoreType=None secondaryIdentifier='CG17167' symbol='CG17167' Gene: briefDescription=None cytoLocation='-' description=None id=1018936 length=11613 name=None primaryIdentifier='FBgn0040031' score=None scoreType=None secondaryIdentifier='CG12061' symbol='CG12061' Gene: briefDescription=None cytoLocation='-' description=None id=1019191 length=45356 name=None primaryIdentifier='FBgn0039959' score=None scoreType=None secondaryIdentifier='CG17514' symbol='CG17514' Gene: briefDescription=None cytoLocation='-' description=None id=1019474 length=21673 name=None primaryIdentifier='FBgn0039955' score=None scoreType=None secondaryIdentifier='CG41099' symbol='CG41099' Gene: briefDescription=None cytoLocation='-' description=None id=1020233 length=11099 name=None primaryIdentifier='FBgn0040056' score=None scoreType=None secondaryIdentifier='CG17698' symbol='CG17698'
The next type of constraint is a Binary Constraint. This refers to constraints that take a value. Most of the constraints that we looked at in the second tutorial were binary constraints. Binary constraints are the largest group of constraints. The operators are =,<=,>=,<,>,!=
query.add_constraint("length",">=","12000")
<BinaryConstraint: Gene.length >= 12000>
for row in query.rows(size=10):
print(row)
Gene: briefDescription=None cytoLocation='-' description=None id=1000415 length=12653 name='zydeco' primaryIdentifier='FBgn0265767' score=None scoreType=None secondaryIdentifier='CG2893' symbol='zyd' Gene: briefDescription=None cytoLocation='-' description=None id=1005938 length=12892 name='Rho GTPase activating protein at 1A' primaryIdentifier='FBgn0025836' score=None scoreType=None secondaryIdentifier='CG40494' symbol='RhoGAP1A' Gene: briefDescription=None cytoLocation='-' description=None id=1007519 length=21475 name='verthandi' primaryIdentifier='FBgn0260987' score=None scoreType=None secondaryIdentifier='CG17436' symbol='vtd' Gene: briefDescription=None cytoLocation='-' description=None id=1015398 length=14286 name='Maf1' primaryIdentifier='FBgn0267861' score=None scoreType=None secondaryIdentifier='CG40196' symbol='Maf1' Gene: briefDescription=None cytoLocation='-' description=None id=1018843 length=12844 name=None primaryIdentifier='FBgn0039941' score=None scoreType=None secondaryIdentifier='CG17167' symbol='CG17167' Gene: briefDescription=None cytoLocation='-' description=None id=1019191 length=45356 name=None primaryIdentifier='FBgn0039959' score=None scoreType=None secondaryIdentifier='CG17514' symbol='CG17514' Gene: briefDescription=None cytoLocation='-' description=None id=1019474 length=21673 name=None primaryIdentifier='FBgn0039955' score=None scoreType=None secondaryIdentifier='CG41099' symbol='CG41099' Gene: briefDescription=None cytoLocation='-' description=None id=1020460 length=32088 name='kelch like family member 10' primaryIdentifier='FBgn0040038' score=None scoreType=None secondaryIdentifier='CG12423' symbol='klhl10' Gene: briefDescription=None cytoLocation='-' description=None id=1058972 length=133933 name=None primaryIdentifier='FBgn0058006' score=None scoreType=None secondaryIdentifier='CG40006' symbol='CG40006' Gene: briefDescription=None cytoLocation='-' description=None id=1060939 length=76790 name='neverland' primaryIdentifier='FBgn0259697' score=None scoreType=None secondaryIdentifier='CG40050' symbol='nvd'
The above constraint is an example of a binary constraint.
We will now look at Ternary constraints. A ternary constraint is a type of constraint which has one required value and one optional value. Currently, intermine supports only one such type of operator: LOOKUP. The lookup operator searches through all the fields in a particular class for the value specified by the user. In the example given below, it will search through the entire gene class to find if any of the fields has an occurence of "zen". The advantage of this is that you do not need to remember if zen is a symbol or a name or a primaryIdentifier. However, this may lead to ambiguous results and so you can use the optional extra_value parameter to limit the search to the type of object (for example, organism in genes).
query2=service.new_query()
query2.add_constraint("Gene","LOOKUP","zen",extra_value="D. melanogaster")
<TernaryConstraint: Gene LOOKUP zen IN D. melanogaster>
for row in query2.rows():
print(row)
Gene: briefDescription=None cytoLocation='84A5-84A5' description=None id=1007678 length=1331 name='zerknullt' primaryIdentifier='FBgn0004053' score=None scoreType=None secondaryIdentifier='CG1046' symbol='zen'
The next constraint type that we will look at is Multi-Value constraints. This allows the constraint to take multiple values. The two operators that are allowed are ONE OF and NONE OF.
query3=service.new_query("Gene")
query3.add_constraint("symbol","NONE OF",['zen','eve'])
<MultiConstraint: Gene.symbol NONE OF ['zen', 'eve']>
for row in query3.rows(size=10):
print(row)
Gene: briefDescription=None cytoLocation='-' description=None id=1000415 length=12653 name='zydeco' primaryIdentifier='FBgn0265767' score=None scoreType=None secondaryIdentifier='CG2893' symbol='zyd' Gene: briefDescription=None cytoLocation='-' description=None id=1004698 length=1951 name=None primaryIdentifier='FBgn0039942' score=None scoreType=None secondaryIdentifier='CG17163' symbol='CG17163' Gene: briefDescription=None cytoLocation='-' description=None id=1005938 length=12892 name='Rho GTPase activating protein at 1A' primaryIdentifier='FBgn0025836' score=None scoreType=None secondaryIdentifier='CG40494' symbol='RhoGAP1A' Gene: briefDescription=None cytoLocation='-' description=None id=1007519 length=21475 name='verthandi' primaryIdentifier='FBgn0260987' score=None scoreType=None secondaryIdentifier='CG17436' symbol='vtd' Gene: briefDescription=None cytoLocation='-' description=None id=1015398 length=14286 name='Maf1' primaryIdentifier='FBgn0267861' score=None scoreType=None secondaryIdentifier='CG40196' symbol='Maf1' Gene: briefDescription=None cytoLocation='-' description=None id=1018843 length=12844 name=None primaryIdentifier='FBgn0039941' score=None scoreType=None secondaryIdentifier='CG17167' symbol='CG17167' Gene: briefDescription=None cytoLocation='-' description=None id=1018936 length=11613 name=None primaryIdentifier='FBgn0040031' score=None scoreType=None secondaryIdentifier='CG12061' symbol='CG12061' Gene: briefDescription=None cytoLocation='-' description=None id=1019191 length=45356 name=None primaryIdentifier='FBgn0039959' score=None scoreType=None secondaryIdentifier='CG17514' symbol='CG17514' Gene: briefDescription=None cytoLocation='-' description=None id=1019474 length=21673 name=None primaryIdentifier='FBgn0039955' score=None scoreType=None secondaryIdentifier='CG41099' symbol='CG41099' Gene: briefDescription=None cytoLocation='-' description=None id=1020233 length=11099 name=None primaryIdentifier='FBgn0040056' score=None scoreType=None secondaryIdentifier='CG17698' symbol='CG17698'
List Constraints: List constraints allow users to create a named list of objects and then use the operators IN and NOT IN to use those named lists in queries. An example for the same is below. The path in such a query must always be a Class (for example - Gene is a valid path). The available lists in intermine can be found at: http://www.flymine.org/flymine/bag.do?subtab=view .
query4=service.new_query()
query4.add_constraint("Gene","IN","PL FlyAtlas_brain_top")
<ListConstraint: Gene IN PL FlyAtlas_brain_top>
for row in query4.rows(size=10):
print(row)
Gene: briefDescription=None cytoLocation='10A3-10A3' description=None id=1107310 length=2075 name=None primaryIdentifier='FBgn0030259' score=None scoreType=None secondaryIdentifier='CG1545' symbol='CG1545' Gene: briefDescription=None cytoLocation='11D8-11D8' description=None id=1039485 length=90456 name='radish' primaryIdentifier='FBgn0265597' score=None scoreType=None secondaryIdentifier='CG44424' symbol='rad' Gene: briefDescription=None cytoLocation='14A1-14A3' description=None id=1040836 length=26224 name='mind-meld' primaryIdentifier='FBgn0259110' score=None scoreType=None secondaryIdentifier='CG42252' symbol='mmd' Gene: briefDescription=None cytoLocation='16F3-16F5' description=None id=1059829 length=138941 name='Shaker' primaryIdentifier='FBgn0003380' score=None scoreType=None secondaryIdentifier='CG12348' symbol='Sh' Gene: briefDescription=None cytoLocation='18C2-18C3' description=None id=1076481 length=21373 name='nicotinic Acetylcholine Receptor alpha7' primaryIdentifier='FBgn0086778' score=None scoreType=None secondaryIdentifier='CG32538' symbol='nAChRalpha7' Gene: briefDescription=None cytoLocation='19A4-19A4' description=None id=1174409 length=58027 name='Dopamine 2-like receptor' primaryIdentifier='FBgn0053517' score=None scoreType=None secondaryIdentifier='CG33517' symbol='Dop2R' Gene: briefDescription=None cytoLocation='24C9-24D1' description=None id=1008362 length=88786 name='friend of echinoid' primaryIdentifier='FBgn0051774' score=None scoreType=None secondaryIdentifier='CG31774' symbol='fred' Gene: briefDescription=None cytoLocation='25B4-25B4' description=None id=1699681 length=1564 name=None primaryIdentifier='FBgn0031650' score=None scoreType=None secondaryIdentifier='CG14044' symbol='CG14044' Gene: briefDescription=None cytoLocation='30D1-30E1' description=None id=1433896 length=92934 name='nicotinic Acetylcholine Receptor alpha6' primaryIdentifier='FBgn0032151' score=None scoreType=None secondaryIdentifier='CG4128' symbol='nAChRalpha6' Gene: briefDescription=None cytoLocation='42D4-42D6' description=None id=1181524 length=10005 name=None primaryIdentifier='FBgn0033108' score=None scoreType=None secondaryIdentifier='CG15236' symbol='CG15236'
The intermine database is a hierarchical database. Sub-class constraints allow you to specify a sub-class of a class to constrain a path to. This basically allows us to constrain our results to only those items of the sub class. The example below is an example of a sub-class constraint.
query5=service.new_query("Gene")
query5.add_constraint("ontologyAnnotations","GOAnnotation")
<SubClassConstraint: Gene.ontologyAnnotations ISA GOAnnotation>
for row in query5.rows(size=10):
print(row)
Gene: briefDescription=None cytoLocation='-' description=None id=1000415 length=12653 name='zydeco' primaryIdentifier='FBgn0265767' score=None scoreType=None secondaryIdentifier='CG2893' symbol='zyd' Gene: briefDescription=None cytoLocation='-' description=None id=1005938 length=12892 name='Rho GTPase activating protein at 1A' primaryIdentifier='FBgn0025836' score=None scoreType=None secondaryIdentifier='CG40494' symbol='RhoGAP1A' Gene: briefDescription=None cytoLocation='-' description=None id=1007519 length=21475 name='verthandi' primaryIdentifier='FBgn0260987' score=None scoreType=None secondaryIdentifier='CG17436' symbol='vtd' Gene: briefDescription=None cytoLocation='-' description=None id=1015398 length=14286 name='Maf1' primaryIdentifier='FBgn0267861' score=None scoreType=None secondaryIdentifier='CG40196' symbol='Maf1' Gene: briefDescription=None cytoLocation='-' description=None id=1018843 length=12844 name=None primaryIdentifier='FBgn0039941' score=None scoreType=None secondaryIdentifier='CG17167' symbol='CG17167' Gene: briefDescription=None cytoLocation='-' description=None id=1018936 length=11613 name=None primaryIdentifier='FBgn0040031' score=None scoreType=None secondaryIdentifier='CG12061' symbol='CG12061' Gene: briefDescription=None cytoLocation='-' description=None id=1019191 length=45356 name=None primaryIdentifier='FBgn0039959' score=None scoreType=None secondaryIdentifier='CG17514' symbol='CG17514' Gene: briefDescription=None cytoLocation='-' description=None id=1019474 length=21673 name=None primaryIdentifier='FBgn0039955' score=None scoreType=None secondaryIdentifier='CG41099' symbol='CG41099' Gene: briefDescription=None cytoLocation='-' description=None id=1020233 length=11099 name=None primaryIdentifier='FBgn0040056' score=None scoreType=None secondaryIdentifier='CG17698' symbol='CG17698' Gene: briefDescription=None cytoLocation='-' description=None id=1020460 length=32088 name='kelch like family member 10' primaryIdentifier='FBgn0040038' score=None scoreType=None secondaryIdentifier='CG12423' symbol='klhl10'
Unlike most constraints, Sub-class constraints do not have an operator that is specified as a parameter to a constraint.
This tutorial summed up some of the important constraint types. In the next tutorial we will look at some of the other features of a query.