Querying¶

This notebook demonstrates Nexus Forge data querying features.

In [1]:

from kgforge.core import KnowledgeGraphForge

A configuration file is needed in order to create a KnowledgeGraphForge session. A configuration can be generated using the notebook 00-Initialization.ipynb.

In [162]:

forge = KnowledgeGraphForge("../../configurations/forge.yml")

Imports¶

In [66]:

from kgforge.core import Resource
from kgforge.specializations.resources import Dataset
from kgforge.core.wrappings.paths import Filter, FilterOperator

Retrieval¶

latest version¶

In [4]:

jane = Resource(type="Person", name="Jane Doe")

In [5]:

forge.register(jane)

<action> _register_one
<succeeded> True

In [6]:

resource = forge.retrieve(jane.id)

In [7]:

resource == jane

Out[7]:

False

specific version¶

In [17]:

jane = Resource(type="Person", name="Jane Doe")

In [18]:

forge.register(jane)

<action> _register_one
<succeeded> True

In [19]:

forge.tag(jane, "v1")

<action> _tag_one
<succeeded> True

In [20]:

jane.email = "jane.doe@epfl.ch"

In [21]:

forge.update(jane)

<action> _update_one
<succeeded> True

In [22]:

try:
    # DemoStore
    print(jane._store_metadata.version)
except AttributeError:
    # BlueBrainNexus
    print(jane._store_metadata._rev)

In [23]:

jane_v1 = forge.retrieve(jane.id, version=1)

In [24]:

jane_v1_tag = forge.retrieve(jane.id, version="v1")

In [25]:

jane_v1 == jane_v1_tag

Out[25]:

True
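The equality above holds because the tag "v1" was attached while the resource was at its first revision, so both lookups resolve to the same content. The following is a minimal in-memory sketch of that idea. It is not how Nexus Forge or any of its stores is implemented; TinyVersionedStore and its methods are hypothetical names used only for illustration.

```python
from typing import Dict, List, Optional, Tuple, Union


class TinyVersionedStore:
    """Hypothetical in-memory store: revisions are 1-based, a tag names a revision."""

    def __init__(self) -> None:
        self._revisions: Dict[str, List[dict]] = {}
        self._tags: Dict[Tuple[str, str], int] = {}

    def register(self, resource_id: str, payload: dict) -> None:
        # The first payload becomes revision 1.
        self._revisions[resource_id] = [dict(payload)]

    def update(self, resource_id: str, payload: dict) -> None:
        # Every update appends a new revision; older ones stay retrievable.
        self._revisions[resource_id].append(dict(payload))

    def tag(self, resource_id: str, name: str) -> None:
        # The tag points at the revision that is current when it is created.
        self._tags[(resource_id, name)] = len(self._revisions[resource_id])

    def retrieve(self, resource_id: str, version: Optional[Union[int, str]] = None) -> dict:
        revisions = self._revisions[resource_id]
        if version is None:
            return revisions[-1]  # latest revision
        if isinstance(version, str):
            version = self._tags[(resource_id, version)]  # resolve tag to a number
        return revisions[version - 1]


store = TinyVersionedStore()
store.register("jane", {"name": "Jane Doe"})
store.tag("jane", "v1")
store.update("jane", {"name": "Jane Doe", "email": "jane.doe@epfl.ch"})

by_number = store.retrieve("jane", version=1)
by_tag = store.retrieve("jane", version="v1")
latest = store.retrieve("jane")
```

In this sketch, by_number and by_tag are the same revision, while latest carries the email added by the update.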

crossbucket retrieval¶

It is possible to retrieve resources stored in buckets other than the configured one, provided that the configured store supports it.

In [26]:

resource = forge.retrieve(jane.id, cross_bucket=True) # cross_bucket defaults to False

error handling¶

In [27]:

resource = forge.retrieve("123")

<action> retrieve
<error> RetrievalError: 404 Client Error: Not Found for url: https://sandbox.bluebrainnexus.io/v1/resources/github-users/mfsy/_/%3A%2F%2F123

In [28]:

resource is None

Out[28]:

True
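As the cells above show, a failed retrieval reports the error and returns None rather than raising. Calling code may want to make that case explicit. Below is a minimal sketch under stated assumptions: retrieve_or_raise is a hypothetical helper, and fake_retrieve is a stand-in for forge.retrieve so that the example runs without a store.

```python
from typing import Callable, Optional


def retrieve_or_raise(retrieve: Callable[[str], Optional[object]], resource_id: str) -> object:
    """Call `retrieve` and turn a None result into an explicit exception."""
    resource = retrieve(resource_id)
    if resource is None:
        raise LookupError(f"No resource found for id {resource_id!r}")
    return resource


def fake_retrieve(resource_id: str) -> Optional[dict]:
    # Stand-in for forge.retrieve: knows a single id and returns None otherwise.
    known = {"jane": {"type": "Person", "name": "Jane Doe"}}
    return known.get(resource_id)


found = retrieve_or_raise(fake_retrieve, "jane")

try:
    retrieve_or_raise(fake_retrieve, "123")
    missing_raised = False
except LookupError:
    missing_raised = True
```

With a real session, forge.retrieve could be passed as the first argument in place of fake_retrieve.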

Searching¶

Note: DemoModel and RdfModel schemas have not been synchronized yet. This section is to be run with RdfModel. Commented lines are for DemoModel.

In [29]:

jane = Resource(type="Person", name="Jane Doe")
contribution_jane = Resource(type="Contribution", agent=jane)

In [30]:

john = Resource(type="Person", name="John Smith")
contribution_john = Resource(type="Contribution", agent=john)

In [31]:

dataset = Dataset(forge, type="Dataset", contribution=[contribution_jane, contribution_john])
dataset.add_distribution("../../data/associations.tsv")

In [32]:

forge.register(dataset)

<action> _register_one
<succeeded> True

In [ ]:

forge.as_json(dataset)

Paths as filters¶

The paths method loads the template or property paths for a given type.

Please refer to the Modeling.ipynb notebook to learn about templates and types.

In [34]:

p = forge.paths("Dataset")

You have autocompletion on p and this can be used to create search filters.

Note: There is a known issue for RdfModel which requires using p.type.id instead of p.type.

All Python comparison operators are supported.
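One way to picture how an expression such as p.type.id == "Person" turns into a search filter is operator overloading on a path object. The sketch below is an illustration only and makes no claim about the library's internals; SketchPath and SketchFilter are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class SketchFilter:
    path: Tuple[str, ...]
    operator: str
    value: Any


class SketchPath:
    """Hypothetical path: attribute access extends the path, comparisons build filters."""

    def __init__(self, parts: Tuple[str, ...] = ()) -> None:
        self._parts = parts

    def __getattr__(self, name: str) -> "SketchPath":
        # Only invoked for attributes that do not exist on the instance.
        # Private names are refused so introspection does not create bogus paths.
        if name.startswith("_"):
            raise AttributeError(name)
        return SketchPath(self._parts + (name,))

    def _build(self, operator: str, value: Any) -> SketchFilter:
        return SketchFilter(self._parts, operator, value)

    def __eq__(self, other: Any) -> SketchFilter:
        return self._build("__eq__", other)

    def __ne__(self, other: Any) -> SketchFilter:
        return self._build("__ne__", other)

    def __lt__(self, other: Any) -> SketchFilter:
        return self._build("__lt__", other)

    def __le__(self, other: Any) -> SketchFilter:
        return self._build("__le__", other)

    def __gt__(self, other: Any) -> SketchFilter:
        return self._build("__gt__", other)

    def __ge__(self, other: Any) -> SketchFilter:
        return self._build("__ge__", other)


p = SketchPath()
type_filter = p.type.id == "Person"
size_filter = p.distribution.contentSize.value >= 100
```

Each comparison returns a filter description instead of a boolean, which is what allows the expression to be passed to a search call as an argument.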

In [80]:

resources = forge.search(p.type.id=="Person", limit=3)

In [81]:

type(resources)

Out[81]:

list

In [37]:

len(resources)

Out[37]:

In [38]:

forge.as_dataframe(resources)

Out[38]:

	id	type	_schemaProject	name	distribution.type	distribution.atLocation.type	distribution.atLocation.store.id	distribution.contentSize.unitCode	distribution.contentSize.value	distribution.contentUrl	distribution.digest.algorithm	distribution.digest.value	distribution.encodingFormat	distribution.name
0	https://sandbox.bluebrainnexus.io/v1/resources...	Person	https://sandbox.bluebrainnexus.io/v1/projects/...	Jane Doe	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	https://sandbox.bluebrainnexus.io/v1/resources...	Person	https://sandbox.bluebrainnexus.io/v1/projects/...	John Smith	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	https://sandbox.bluebrainnexus.io/v1/resources...	Person	https://sandbox.bluebrainnexus.io/v1/projects/...	Jane Doe	DataDownload	Location	https://bluebrain.github.io/nexus/vocabulary/d...	bytes	52.0	https://sandbox.bluebrainnexus.io/v1/files/git...	SHA-256	1dacd765946963fda4949753659089c5f532714b418d30...	text/csv	persons.csv
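The dotted column names in the table above (for example distribution.contentSize.value) correspond to nested properties. The idea can be sketched in plain Python as follows. flatten is a hypothetical helper and the record is made-up sample data; this is not how as_dataframe is implemented.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], parent: str = "") -> Dict[str, Any]:
    """Flatten nested dictionaries into a single level with dotted keys."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{parent}.{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the dotted prefix along.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


record = {
    "type": "Person",
    "name": "Jane Doe",
    "distribution": {
        "contentSize": {"unitCode": "bytes", "value": 52},
        "name": "persons.csv",
    },
}
row = flatten(record)
```

Properties missing from a given record simply produce no key, which is consistent with the NaN cells shown for resources without a distribution.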

In [ ]:

forge.as_json(resources[2])

In [48]:

forge.as_dataframe(resources, store_metadata=True)

Out[48]:

	id	type	_schemaProject	name	_constrainedBy	_createdAt	_createdBy	_deprecated	_incoming	_outgoing	...	distribution.type	distribution.atLocation.type	distribution.atLocation.store.id	distribution.contentSize.unitCode	distribution.contentSize.value	distribution.contentUrl	distribution.digest.algorithm	distribution.digest.value	distribution.encodingFormat	distribution.name
0	https://sandbox.bluebrainnexus.io/v1/resources...	Person	https://sandbox.bluebrainnexus.io/v1/projects/...	Jane Doe	https://bluebrain.github.io/nexus/schemas/unco...	2022-01-06T15:46:40.285Z	https://sandbox.bluebrainnexus.io/v1/realms/gi...	False	https://sandbox.bluebrainnexus.io/v1/resources...	https://sandbox.bluebrainnexus.io/v1/resources...	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	https://sandbox.bluebrainnexus.io/v1/resources...	Person	https://sandbox.bluebrainnexus.io/v1/projects/...	Jane Doe	https://bluebrain.github.io/nexus/schemas/unco...	2022-01-06T15:47:00.719Z	https://sandbox.bluebrainnexus.io/v1/realms/gi...	False	https://sandbox.bluebrainnexus.io/v1/resources...	https://sandbox.bluebrainnexus.io/v1/resources...	...	DataDownload	Location	https://bluebrain.github.io/nexus/vocabulary/d...	bytes	52.0	https://sandbox.bluebrainnexus.io/v1/files/git...	SHA-256	1dacd765946963fda4949753659089c5f532714b418d30...	text/csv	persons.csv
2	https://sandbox.bluebrainnexus.io/v1/resources...	Person	https://sandbox.bluebrainnexus.io/v1/projects/...	Jane Doe	https://bluebrain.github.io/nexus/schemas/unco...	2022-01-07T11:26:11.330Z	https://sandbox.bluebrainnexus.io/v1/realms/gi...	False	https://sandbox.bluebrainnexus.io/v1/resources...	https://sandbox.bluebrainnexus.io/v1/resources...	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

3 rows × 25 columns

Nested property querying¶

Property autocompletion is available on a path p even for nested properties like p.contribution.

In [41]:

# Search for resources of type Person and with text/tab-separated-values as distribution.encodingFormat
resources = forge.search(p.type.id == "Person", p.distribution.encodingFormat == "text/tab-separated-values", limit=3)

In [42]:

len(resources)

Out[42]:

In [43]:

forge.as_dataframe(resources)

Out[43]:

	id	type	_schemaProject	distribution.type	distribution.atLocation.type	distribution.atLocation.store.id	distribution.contentSize.unitCode	distribution.contentSize.value	distribution.contentUrl	distribution.digest.algorithm	distribution.digest.value	distribution.encodingFormat	distribution.name	name
0	https://sandbox.bluebrainnexus.io/v1/resources...	Person	https://sandbox.bluebrainnexus.io/v1/projects/...	DataDownload	Location	https://bluebrain.github.io/nexus/vocabulary/d...	bytes	506	https://sandbox.bluebrainnexus.io/v1/files/git...	SHA-256	9639abc864e91c645779f510ae5c06a1618941d569eb1a...	text/tab-separated-values	associations.tsv	Jane Doe
1	https://sandbox.bluebrainnexus.io/v1/resources...	Person	https://sandbox.bluebrainnexus.io/v1/projects/...	DataDownload	Location	https://bluebrain.github.io/nexus/vocabulary/d...	bytes	506	https://sandbox.bluebrainnexus.io/v1/files/git...	SHA-256	9639abc864e91c645779f510ae5c06a1618941d569eb1a...	text/tab-separated-values	associations.tsv	Jane Doe
2	https://sandbox.bluebrainnexus.io/v1/resources...	Person	https://sandbox.bluebrainnexus.io/v1/projects/...	DataDownload	Location	https://bluebrain.github.io/nexus/vocabulary/d...	bytes	506	https://sandbox.bluebrainnexus.io/v1/files/git...	SHA-256	9639abc864e91c645779f510ae5c06a1618941d569eb1a...	text/tab-separated-values	associations.tsv	Jane Doe

Dict as filters¶

A dictionary can be provided for filters:

{'type': {'id':'Dataset'}} is equivalent to p.type.id=="Dataset".
Only the '==' operator is supported.
Nested dicts are supported.
The provided properties and values do not have to be defined in the forge model. Results are returned whenever matching data exist in the store.

This feature is not supported when using the DemoStore.
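The equivalence between nested dictionaries and path filters can be pictured as a recursive walk that emits one equality filter per leaf value. dict_to_filters below is a hypothetical helper written for illustration, not the library's own conversion routine.

```python
from typing import Any, Dict, List, Tuple

FilterTuple = Tuple[List[str], str, Any]


def dict_to_filters(filters: Dict[str, Any], prefix: Tuple[str, ...] = ()) -> List[FilterTuple]:
    """Turn a (possibly nested) dict into (path, operator, value) triples."""
    result: List[FilterTuple] = []
    for key, value in filters.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            # A nested dict extends the path instead of producing a filter.
            result.extend(dict_to_filters(value, path))
        else:
            # Only equality can be expressed with a dict.
            result.append((list(path), "__eq__", value))
    return result


triples = dict_to_filters(
    {"type": "Person", "distribution": {"encodingFormat": "text/tab-separated-values"}}
)
```

Because a dictionary can only pair a key with a value, there is no place to state another operator, which matches the '==' restriction listed above.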

In [49]:

# Search for resources of type Person and with text/tab-separated-values as distribution.encodingFormat
filters = {"type": "Person", "distribution":{"encodingFormat":"text/tab-separated-values"}}
resources = forge.search(filters, limit=3)

In [50]:

type(resources)

Out[50]:

list

In [51]:

len(resources)

Out[51]:

In [52]:

forge.as_dataframe(resources, store_metadata=True)

Out[52]:

	id	type	_schemaProject	distribution.type	distribution.atLocation.type	distribution.atLocation.store.id	distribution.contentSize.unitCode	distribution.contentSize.value	distribution.contentUrl	distribution.digest.algorithm	...	_createdAt	_createdBy	_deprecated	_incoming	_outgoing	_project	_rev	_self	_updatedAt	_updatedBy
0	https://sandbox.bluebrainnexus.io/v1/resources...	Person	https://sandbox.bluebrainnexus.io/v1/projects/...	DataDownload	Location	https://bluebrain.github.io/nexus/vocabulary/d...	bytes	506	https://sandbox.bluebrainnexus.io/v1/files/git...	SHA-256	...	2021-08-17T11:00:14.662Z	https://sandbox.bluebrainnexus.io/v1/realms/gi...	False	https://sandbox.bluebrainnexus.io/v1/resources...	https://sandbox.bluebrainnexus.io/v1/resources...	https://sandbox.bluebrainnexus.io/v1/projects/...	1	https://sandbox.bluebrainnexus.io/v1/resources...	2021-08-17T11:00:14.662Z	https://sandbox.bluebrainnexus.io/v1/realms/gi...
1	https://sandbox.bluebrainnexus.io/v1/resources...	Person	https://sandbox.bluebrainnexus.io/v1/projects/...	DataDownload	Location	https://bluebrain.github.io/nexus/vocabulary/d...	bytes	506	https://sandbox.bluebrainnexus.io/v1/files/git...	SHA-256	...	2021-08-23T09:12:24.049Z	https://sandbox.bluebrainnexus.io/v1/realms/gi...	False	https://sandbox.bluebrainnexus.io/v1/resources...	https://sandbox.bluebrainnexus.io/v1/resources...	https://sandbox.bluebrainnexus.io/v1/projects/...	1	https://sandbox.bluebrainnexus.io/v1/resources...	2021-08-23T09:12:24.049Z	https://sandbox.bluebrainnexus.io/v1/realms/gi...
2	https://sandbox.bluebrainnexus.io/v1/resources...	Person	https://sandbox.bluebrainnexus.io/v1/projects/...	DataDownload	Location	https://bluebrain.github.io/nexus/vocabulary/d...	bytes	506	https://sandbox.bluebrainnexus.io/v1/files/git...	SHA-256	...	2021-08-23T09:18:43.327Z	https://sandbox.bluebrainnexus.io/v1/realms/gi...	False	https://sandbox.bluebrainnexus.io/v1/resources...	https://sandbox.bluebrainnexus.io/v1/resources...	https://sandbox.bluebrainnexus.io/v1/projects/...	1	https://sandbox.bluebrainnexus.io/v1/resources...	2021-08-23T09:18:43.327Z	https://sandbox.bluebrainnexus.io/v1/realms/gi...

3 rows × 25 columns

Built-in Filter objects¶

Supported filter operators¶

In [83]:

[f"{op.value} ({op.name})" for op in FilterOperator] # These are equivalent to the Python comparison operators

Out[83]:

['__eq__ (EQUAL)',
 '__ne__ (NOT_EQUAL)',
 '__lt__ (LOWER_THAN)',
 '__le__ (LOWER_OR_Equal_Than)',
 '__gt__ (GREATER_Than)',
 '__ge__ (GREATER_OR_Equal_Than)']
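Since the operator values are the names of Python's rich comparison methods, the matching functions can be found in the standard operator module. The sketch below evaluates such a filter against a local dictionary. record_matches is a hypothetical helper and the record is sample data; an actual search is evaluated by the store, not on the client.

```python
import operator
from typing import Any, Dict, List


def record_matches(record: Dict[str, Any], path: List[str], op: str, value: Any) -> bool:
    """Apply a comparison such as '__ge__' to the value found at `path`."""
    current: Any = record
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return False  # the property is absent, so the filter cannot match
        current = current[key]
    # '__ge__' -> operator.ge, '__eq__' -> operator.eq, and so on.
    compare = getattr(operator, op.strip("_"))
    return compare(current, value)


record = {"type": "Person", "distribution": {"contentSize": {"value": 506}}}

bigger_than_100 = record_matches(record, ["distribution", "contentSize", "value"], "__gt__", 100)
not_a_person = record_matches(record, ["type"], "__ne__", "Person")
no_such_path = record_matches(record, ["distribution", "name"], "__eq__", "persons.csv")
```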

In [84]:

# Search for resources of type Person and with text/tab-separated-values as distribution.encodingFormat

filter_1 = Filter(operator="__eq__", path=["type"], value="Person")
filter_2 = Filter(operator="__eq__", path=["distribution","encodingFormat"], value="text/tab-separated-values")
resources = forge.search(filter_1,filter_2, limit=3)

In [85]:

type(resources)

Out[85]:

list

In [86]:

len(resources)

Out[86]:

In [87]:

forge.as_dataframe(resources, store_metadata=True)

Out[87]:

	id	type	_schemaProject	distribution.type	distribution.atLocation.type	distribution.atLocation.store.id	distribution.contentSize.unitCode	distribution.contentSize.value	distribution.contentUrl	distribution.digest.algorithm	...	_createdAt	_createdBy	_deprecated	_incoming	_outgoing	_project	_rev	_self	_updatedAt	_updatedBy
0	https://sandbox.bluebrainnexus.io/v1/resources...	Person	https://sandbox.bluebrainnexus.io/v1/projects/...	DataDownload	Location	https://bluebrain.github.io/nexus/vocabulary/d...	bytes	506	https://sandbox.bluebrainnexus.io/v1/files/git...	SHA-256	...	2021-08-17T11:00:14.662Z	https://sandbox.bluebrainnexus.io/v1/realms/gi...	False	https://sandbox.bluebrainnexus.io/v1/resources...	https://sandbox.bluebrainnexus.io/v1/resources...	https://sandbox.bluebrainnexus.io/v1/projects/...	1	https://sandbox.bluebrainnexus.io/v1/resources...	2021-08-17T11:00:14.662Z	https://sandbox.bluebrainnexus.io/v1/realms/gi...
1	https://sandbox.bluebrainnexus.io/v1/resources...	Person	https://sandbox.bluebrainnexus.io/v1/projects/...	DataDownload	Location	https://bluebrain.github.io/nexus/vocabulary/d...	bytes	506	https://sandbox.bluebrainnexus.io/v1/files/git...	SHA-256	...	2021-08-23T09:12:24.049Z	https://sandbox.bluebrainnexus.io/v1/realms/gi...	False	https://sandbox.bluebrainnexus.io/v1/resources...	https://sandbox.bluebrainnexus.io/v1/resources...	https://sandbox.bluebrainnexus.io/v1/projects/...	1	https://sandbox.bluebrainnexus.io/v1/resources...	2021-08-23T09:12:24.049Z	https://sandbox.bluebrainnexus.io/v1/realms/gi...
2	https://sandbox.bluebrainnexus.io/v1/resources...	Person	https://sandbox.bluebrainnexus.io/v1/projects/...	DataDownload	Location	https://bluebrain.github.io/nexus/vocabulary/d...	bytes	506	https://sandbox.bluebrainnexus.io/v1/files/git...	SHA-256	...	2021-08-23T09:18:43.327Z	https://sandbox.bluebrainnexus.io/v1/realms/gi...	False	https://sandbox.bluebrainnexus.io/v1/resources...	https://sandbox.bluebrainnexus.io/v1/resources...	https://sandbox.bluebrainnexus.io/v1/projects/...	1	https://sandbox.bluebrainnexus.io/v1/resources...	2021-08-23T09:18:43.327Z	https://sandbox.bluebrainnexus.io/v1/realms/gi...

3 rows × 25 columns

Search Endpoints¶

Two types of search endpoints are supported: 'sparql' for graph queries and 'elastic' for document-oriented queries. The available search endpoints can be configured (see 00-Initialization.ipynb for an example of search endpoints config) or set when creating a KnowledgeGraphForge session using the 'searchendpoints' argument.

The search endpoint to hit when calling forge.search(...) is 'sparql' by default but can be specified using the 'search_endpoint' argument.

SPARQL Search Endpoint¶

In [184]:

# Search for resources of type Person and with text/tab-separated-values as distribution.encodingFormat
filters = {"type": "Person", "distribution":{"encodingFormat":"text/tab-separated-values"}}
resources = forge.search(filters, limit=3, search_endpoint='sparql')

In [172]:

type(resources)

Out[172]:

list

In [173]:

len(resources)

Out[173]:

In [174]:

forge.as_dataframe(resources, store_metadata=True)

Out[174]:

	id	type	_schemaProject	distribution.type	distribution.atLocation.type	distribution.atLocation.store.id	distribution.contentSize.unitCode	distribution.contentSize.value	distribution.contentUrl	distribution.digest.algorithm	...	_createdAt	_createdBy	_deprecated	_incoming	_outgoing	_project	_rev	_self	_updatedAt	_updatedBy
0	https://sandbox.bluebrainnexus.io/v1/resources...	Person	https://sandbox.bluebrainnexus.io/v1/projects/...	DataDownload	Location	https://bluebrain.github.io/nexus/vocabulary/d...	bytes	506	https://sandbox.bluebrainnexus.io/v1/files/git...	SHA-256	...	2021-08-17T11:00:14.662Z	https://sandbox.bluebrainnexus.io/v1/realms/gi...	False	https://sandbox.bluebrainnexus.io/v1/resources...	https://sandbox.bluebrainnexus.io/v1/resources...	https://sandbox.bluebrainnexus.io/v1/projects/...	1	https://sandbox.bluebrainnexus.io/v1/resources...	2021-08-17T11:00:14.662Z	https://sandbox.bluebrainnexus.io/v1/realms/gi...
1	https://sandbox.bluebrainnexus.io/v1/resources...	Person	https://sandbox.bluebrainnexus.io/v1/projects/...	DataDownload	Location	https://bluebrain.github.io/nexus/vocabulary/d...	bytes	506	https://sandbox.bluebrainnexus.io/v1/files/git...	SHA-256	...	2021-08-23T09:12:24.049Z	https://sandbox.bluebrainnexus.io/v1/realms/gi...	False	https://sandbox.bluebrainnexus.io/v1/resources...	https://sandbox.bluebrainnexus.io/v1/resources...	https://sandbox.bluebrainnexus.io/v1/projects/...	1	https://sandbox.bluebrainnexus.io/v1/resources...	2021-08-23T09:12:24.049Z	https://sandbox.bluebrainnexus.io/v1/realms/gi...
2	https://sandbox.bluebrainnexus.io/v1/resources...	Person	https://sandbox.bluebrainnexus.io/v1/projects/...	DataDownload	Location	https://bluebrain.github.io/nexus/vocabulary/d...	bytes	506	https://sandbox.bluebrainnexus.io/v1/files/git...	SHA-256	...	2021-08-23T09:18:43.327Z	https://sandbox.bluebrainnexus.io/v1/realms/gi...	False	https://sandbox.bluebrainnexus.io/v1/resources...	https://sandbox.bluebrainnexus.io/v1/resources...	https://sandbox.bluebrainnexus.io/v1/projects/...	1	https://sandbox.bluebrainnexus.io/v1/resources...	2021-08-23T09:18:43.327Z	https://sandbox.bluebrainnexus.io/v1/realms/gi...

3 rows × 25 columns

ElasticSearch Endpoint¶

In [ ]:

# Search for resources of type Person and retrieve their ids and names.

filters = {"@type": "http://schema.org/Person"}
resources = forge.search(filters, limit=3, search_endpoint='elastic', debug=True, includes=["@id","name"]) # fields can also be excluded with 'excludes'

In [194]:

type(resources)

Out[194]:

list

In [195]:

len(resources)

Out[195]:

In [196]:

forge.as_dataframe(resources, store_metadata=True)

Out[196]:

	@id	name
0	https://sandbox.bluebrainnexus.io/v1/resources...	Jane Doe
1	https://sandbox.bluebrainnexus.io/v1/resources...	Jane Doe
2	https://sandbox.bluebrainnexus.io/v1/resources...	Jane Doe

Crossbucket search¶

It is possible to search for resources stored in buckets different than the configured one. The configured store should of course support it.

In [88]:

resources = forge.search(p.type.id == "Association", limit=3, cross_bucket=True)  # cross_bucket defaults to False

In [89]:

type(resources)

Out[89]:

list

In [90]:

len(resources)

Out[90]:

In [91]:

forge.as_dataframe(resources)

Out[91]:

	id	type	_schemaProject	agent.type	agent.gender.id	agent.gender.type	agent.gender.label	agent.name	distribution.type	distribution.atLocation.type	distribution.atLocation.store.id	distribution.contentSize.unitCode	distribution.contentSize.value	distribution.contentUrl	distribution.digest.algorithm	distribution.digest.value	distribution.encodingFormat	distribution.name	name
0	https://kg.example.ch/associations/123	Association	https://sandbox.bluebrainnexus.io/v1/projects/...	Person	http://purl.obolibrary.org/obo/PATO_0000383	LabeledOntologyEntity	female	Marie Curie	DataDownload	Location	https://bluebrain.github.io/nexus/vocabulary/d...	bytes	46.0	https://sandbox.bluebrainnexus.io/v1/files/git...	SHA-256	e0fe65f725bf28fe2b88c7bafb51fb5ef1df0ab14c68a3...	text/plain	marie_curie.txt	Curie Association
1	https://sandbox.bluebrainnexus.io/v1/resources...	Association	https://sandbox.bluebrainnexus.io/v1/projects/...	Person	http://purl.obolibrary.org/obo/PATO_0000384	LabeledOntologyEntity	male	Albert Einstein	DataDownload	Location	https://bluebrain.github.io/nexus/vocabulary/d...	bytes	50.0	https://sandbox.bluebrainnexus.io/v1/files/git...	SHA-256	91a5ce5c84dc5bead730a4b49d0698b4aaef4bc06ce164...	text/plain	albert_einstein.txt	Einstein Association
2	https://sandbox.bluebrainnexus.io/v1/resources...	Association	https://sandbox.bluebrainnexus.io/v1/projects/...	Person	NaN	NaN	NaN	Jane Doe	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

In [ ]:

# It is also possible to filter by bucket when cross_bucket is set to True. Setting a bucket value when cross_bucket is False raises a not_supported exception.
resources = forge.search(p.type.id == "Person", limit=3, cross_bucket=True, bucket=<str>)  # add a bucket

In [ ]:

type(resources)

In [ ]:

len(resources)

In [ ]:

forge.as_dataframe(resources)

Graph traversing¶

SPARQL is used as a query language to perform graph traversing.

Nexus Forge implements a SPARQL query rewriting strategy leveraging a configured RdfModel that lets users write SPARQL queries without adding prefix declarations, prefix names or long IRIs. With this strategy, only type and property names need to be provided.

Please refer to the Modeling.ipynb notebook to learn about templates.

Note: DemoStore doesn't implement SPARQL operations yet. Please use another store for this section.

Note: DemoModel and RdfModel schemas have not been synchronized yet. This section is to be run with RdfModel.

In [96]:

jane = Resource(type="Person", name="Jane Doe")
contribution_jane = Resource(type="Contribution", agent=jane)

In [97]:

john = Resource(type="Person", name="John Smith")
contribution_john = Resource(type="Contribution", agent=john)

In [98]:

association = Resource(type="Dataset", contribution=[contribution_jane, contribution_john])

In [99]:

forge.register(association)

<action> _register_one
<succeeded> True

In [124]:

forge.template("Dataset") # Templates help identify which properties to use when writing a query

{
    id: ""
    type:
    {
        id: ""
    }
    annotation:
    {
        id: ""
        type: Annotation
        hasBody:
        {
            id: ""
            type:
            {
                id: ""
            }
            label: ""
            note: ""
        }
        hasTarget:
        {
            id: ""
            type: AnnotationTarget
        }
        note: ""
    }
    brainLocation:
    {
        id: ""
        type: BrainLocation
        atlasSpatialReferenceSystem:
        {
            id: ""
            type: AtlasSpatialReferenceSystem
        }
        brainRegion:
        {
            id: ""
            label: ""
        }
        coordinatesInBrainAtlas:
        {
            id: ""
            valueX: 0.0
            valueY: 0.0
            valueZ: 0.0
        }
        coordinatesInSlice:
        {
            spatialReferenceSystem:
            {
                id: ""
                type: SpatialReferenceSystem
            }
            valueX: 0.0
            valueY: 0.0
            valueZ: 0.0
        }
        distanceToBoundary:
        {
            boundary:
            {
                id: ""
                label: ""
            }
            distance:
            {
                unitCode: ""
                value:
                [
                    0.0
                    0
                ]
            }
        }
        layer:
        {
            id: ""
            label: ""
        }
        longitudinalAxis:
        [
            Dorsal
            Ventral
        ]
        positionInLayer:
        [
            Deep
            Superficial
        ]
    }
    contribution:
    {
        id: ""
    }
    distribution:
    {
        id: ""
        type: DataDownload
        contentSize:
        {
            unitCode: ""
            value:
            [
                0.0
                0
            ]
        }
        digest:
        {
            algorithm: ""
            value: ""
        }
        encodingFormat: ""
        license: ""
        name: ""
    }
    objectOfStudy:
    {
        id: ""
        type: ObjectOfStudy
    }
    releaseDate: 9999-12-31T00:00:00
    subject:
    {
        id: ""
        type: Subject
    }
}

Prefix and namespace free SPARQL query¶

When a forge RdfModel is configured, there is no need to provide prefixes and namespaces when writing a SPARQL query. Prefixes and namespaces are automatically inferred from the provided schemas and/or JSON-LD context, and the query is rewritten accordingly.

In [101]:

query = """
    SELECT ?id ?name
    WHERE {
        ?id a Dataset ;
        contribution/agent ?contributor.
        ?contributor name ?name.
    }
"""

In [102]:

resources = forge.sparql(query, limit=3)

In [103]:

type(resources)

Out[103]:

list

In [104]:

len(resources)

Out[104]:

In [105]:

type(resources[0])

Out[105]:

kgforge.core.resource.Resource

In [106]:

forge.as_dataframe(resources)

Out[106]:

	id	name
0	https://sandbox.bluebrainnexus.io/v1/resources...	John Smith
1	https://sandbox.bluebrainnexus.io/v1/resources...	Jane Doe
2	https://sandbox.bluebrainnexus.io/v1/resources...	John Smith

display rewritten SPARQL query¶

In [107]:

resources = forge.sparql(query, limit=3, debug=True)

Submitted query:
   PREFIX dc: <http://purl.org/dc/elements/1.1/>
   PREFIX dcat: <http://www.w3.org/ns/dcat#>
   PREFIX dcterms: <http://purl.org/dc/terms/>
   PREFIX mba: <http://api.brain-map.org/api/v2/data/Structure/>
   PREFIX nsg: <https://neuroshapes.org/>
   PREFIX owl: <http://www.w3.org/2002/07/owl#>
   PREFIX prov: <http://www.w3.org/ns/prov#>
   PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
   PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
   PREFIX schema: <http://schema.org/>
   PREFIX sh: <http://www.w3.org/ns/shacl#>
   PREFIX shsh: <http://www.w3.org/ns/shacl-shacl#>
   PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
   PREFIX vann: <http://purl.org/vocab/vann/>
   PREFIX void: <http://rdfs.org/ns/void#>
   PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
   PREFIX : <https://neuroshapes.org/>
   
       SELECT ?id ?name
       WHERE {
           ?id a schema:Dataset ;
           nsg:contribution/prov:agent ?contributor.
           ?contributor schema:name ?name.
       }
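The rewriting visible in the debug output above can be approximated with a term lookup plus generated PREFIX declarations. This is a rough sketch, not the library's rewriting code: CONTEXT and PREFIXES are hand-written here to mirror the output above (the real mapping is derived from the configured model), and rewrite is a hypothetical function. String literals and IRIs inside the query are not protected by this simplified version.

```python
import re
from typing import Dict

# Assumed term-to-prefixed-name mapping, copied by hand from the debug output.
CONTEXT: Dict[str, str] = {
    "Dataset": "schema:Dataset",
    "contribution": "nsg:contribution",
    "agent": "prov:agent",
    "name": "schema:name",
}
PREFIXES: Dict[str, str] = {
    "nsg": "https://neuroshapes.org/",
    "prov": "http://www.w3.org/ns/prov#",
    "schema": "http://schema.org/",
}

# A bare term: not part of a variable (?x, $x), a longer word, or a prefixed name.
TERM = re.compile(r"(?<![?$\w:])([A-Za-z_]\w*)(?![\w:])")


def rewrite(query: str, context: Dict[str, str], prefixes: Dict[str, str]) -> str:
    """Replace known bare terms and prepend PREFIX declarations."""
    body = TERM.sub(lambda match: context.get(match.group(1), match.group(1)), query)
    header = "\n".join(f"PREFIX {name}: <{iri}>" for name, iri in sorted(prefixes.items()))
    return f"{header}\n{body}"


query = """
    SELECT ?id ?name
    WHERE {
        ?id a Dataset ;
        contribution/agent ?contributor.
        ?contributor name ?name.
    }
"""
rewritten = rewrite(query, CONTEXT, PREFIXES)
```

Variables such as ?name are left untouched because the pattern refuses any term that directly follows a question mark, while the property name in the same triple is expanded.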

Full SPARQL query¶

A regular SPARQL query can also be provided.

In [108]:

query = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
   PREFIX dcat: <http://www.w3.org/ns/dcat#>
   PREFIX dcterms: <http://purl.org/dc/terms/>
   PREFIX mba: <http://api.brain-map.org/api/v2/data/Structure/>
   PREFIX nsg: <https://neuroshapes.org/>
   PREFIX owl: <http://www.w3.org/2002/07/owl#>
   PREFIX prov: <http://www.w3.org/ns/prov#>
   PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
   PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
   PREFIX schema: <http://schema.org/>
   PREFIX sh: <http://www.w3.org/ns/shacl#>
   PREFIX shsh: <http://www.w3.org/ns/shacl-shacl#>
   PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
   PREFIX vann: <http://purl.org/vocab/vann/>
   PREFIX void: <http://rdfs.org/ns/void#>
   PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
   PREFIX : <https://neuroshapes.org/>
   SELECT ?id ?name
   WHERE {
       ?id a schema:Dataset ;
       nsg:contribution/prov:agent ?contributor.
       ?contributor schema:name ?name.
   }
"""

In [109]:

resources = forge.sparql(query, limit=3)

In [110]:

type(resources)

Out[110]:

list

In [111]:

len(resources)

Out[111]:

In [112]:

type(resources[0])

Out[112]:

kgforge.core.resource.Resource

In [113]:

forge.as_dataframe(resources)

Out[113]:

	id	name
0	https://sandbox.bluebrainnexus.io/v1/resources...	John Smith
1	https://sandbox.bluebrainnexus.io/v1/resources...	Jane Doe
2	https://sandbox.bluebrainnexus.io/v1/resources...	John Smith

ElasticSearch DSL Query¶

ElasticSearch DSL can be used as a query language to search for resources, provided that the configured store supports it. The 'BlueBrainNexusStore' supports ElasticSearch.

Note: DemoStore doesn't implement ElasticSearch DSL operations.

In [125]:

jane = Resource(type="Person", name="Jane Doe")
contribution_jane = Resource(type="Contribution", agent=jane)

In [126]:

john = Resource(type="Person", name="John Smith")
contribution_john = Resource(type="Contribution", agent=john)

In [127]:

association = Resource(type="Dataset", contribution=[contribution_jane, contribution_john])

In [128]:

forge.register(association)

<action> _register_one
<succeeded> True

In [129]:

forge.template("Dataset") # Templates help identify which properties to use when writing a query

{
    id: ""
    type:
    {
        id: ""
    }
    annotation:
    {
        id: ""
        type: Annotation
        hasBody:
        {
            id: ""
            type:
            {
                id: ""
            }
            label: ""
            note: ""
        }
        hasTarget:
        {
            id: ""
            type: AnnotationTarget
        }
        note: ""
    }
    brainLocation:
    {
        id: ""
        type: BrainLocation
        atlasSpatialReferenceSystem:
        {
            id: ""
            type: AtlasSpatialReferenceSystem
        }
        brainRegion:
        {
            id: ""
            label: ""
        }
        coordinatesInBrainAtlas:
        {
            id: ""
            valueX: 0.0
            valueY: 0.0
            valueZ: 0.0
        }
        coordinatesInSlice:
        {
            spatialReferenceSystem:
            {
                id: ""
                type: SpatialReferenceSystem
            }
            valueX: 0.0
            valueY: 0.0
            valueZ: 0.0
        }
        distanceToBoundary:
        {
            boundary:
            {
                id: ""
                label: ""
            }
            distance:
            {
                unitCode: ""
                value:
                [
                    0.0
                    0
                ]
            }
        }
        layer:
        {
            id: ""
            label: ""
        }
        longitudinalAxis:
        [
            Dorsal
            Ventral
        ]
        positionInLayer:
        [
            Deep
            Superficial
        ]
    }
    contribution:
    {
        id: ""
    }
    distribution:
    {
        id: ""
        type: DataDownload
        contentSize:
        {
            unitCode: ""
            value:
            [
                0.0
                0
            ]
        }
        digest:
        {
            algorithm: ""
            value: ""
        }
        encodingFormat: ""
        license: ""
        name: ""
    }
    objectOfStudy:
    {
        id: ""
        type: ObjectOfStudy
    }
    releaseDate: 9999-12-31T00:00:00
    subject:
    {
        id: ""
        type: Subject
    }
}

Plain ElasticSearch DSL¶

In [155]:

query = """
        {
          "_source": {
            "includes": [
              "@id",
              "name"
            ]
          },
          "query": {
            "term": {
              "@type": "http://schema.org/Dataset"
            }
          }
        }
"""

In [156]:

resources = forge.elastic(query, limit=3) # limit and offset (when provided in this method call) supersede 'size' and 'from' values provided in the query
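The precedence described in the comment above can be sketched as follows: paging arguments, when given, overwrite the 'size' and 'from' keys of the query body. apply_paging is a hypothetical helper for illustration; how the library applies these values internally is not shown in this notebook.

```python
import json
from typing import Optional


def apply_paging(query: str, limit: Optional[int] = None, offset: Optional[int] = None) -> dict:
    """Parse an ElasticSearch DSL string and let limit/offset override size/from."""
    body = json.loads(query)
    if limit is not None:
        body["size"] = limit  # the argument wins over the value in the query
    if offset is not None:
        body["from"] = offset
    return body


query = """
{
  "size": 10,
  "from": 5,
  "query": {"term": {"@type": "http://schema.org/Dataset"}}
}
"""
paged = apply_paging(query, limit=3)
```

Here only limit is passed, so 'size' becomes 3 while 'from' keeps the value written in the query.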

In [157]:

type(resources)

Out[157]:

list

In [158]:

len(resources)

Out[158]:

In [159]:

type(resources[0])

Out[159]:

kgforge.core.resource.Resource

In [160]:

forge.as_dataframe(resources)

Out[160]:

	@id	name
0	https://bbp.epfl.ch/neurosciencegraph/data/neu...	Scnn1a-Tg3-Cre;Ai14-187849.06.01.01
1	https://bbp.epfl.ch/neurosciencegraph/data/neu...	H17.06.004.11.05.04
2	https://bbp.epfl.ch/neurosciencegraph/data/neu...	H16.06.009.01.01.15.01

Downloading¶

Note: DemoStore doesn't implement file operations yet. Please use another store for this section.

In [114]:

jane = Resource(type="Person", name="Jane Doe")

In [115]:

! ls -p ../../data | egrep -v /$

associations.tsv
my_data.xwz
my_data_derived.txt
persons.csv
tfidfvectorizer_model_schemaorg_linking

In [116]:

distribution = forge.attach("../../data")

In [117]:

association = Resource(type="Association", agent=jane, distribution=distribution)

In [118]:

forge.register(association)

<action> _register_one
<succeeded> True

In [122]:

# The argument overwrite: bool can be provided to decide whether to overwrite (True) existing files with the same name or
# to create new ones (False) with their names suffixed with a timestamp.
# A cross_bucket argument can be provided to download data from the configured bucket (cross_bucket=False - the default value) 
# or from a bucket different than the configured one (cross_bucket=True). The configured store should support crossing buckets for this to work.
forge.download(association, "distribution.contentUrl", "./downloaded/")
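The overwrite behaviour described in the comments above can be sketched as a small path-selection function. target_path is a hypothetical helper, and the exact timestamp format and its placement in the file name are assumptions made for this example; the notebook only states that names are suffixed with a timestamp.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def target_path(directory: Path, filename: str, overwrite: bool) -> Path:
    """Pick where a downloaded file would be written."""
    path = directory / filename
    if overwrite or not path.exists():
        return path  # reuse the name: either nothing is there or replacing is allowed
    # Assumed naming scheme: append a timestamp to the existing name.
    stamp = datetime.now().strftime("%Y%m%d%H%M%S")
    return path.with_name(f"{path.name}.{stamp}")


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "persons.csv").write_text("name\nJane Doe\n")

    kept = target_path(folder, "persons.csv", overwrite=True)
    renamed = target_path(folder, "persons.csv", overwrite=False)
    fresh = target_path(folder, "associations.tsv", overwrite=False)
```

In this sketch only the second call changes the name, because it is the only case where a file already exists and overwriting is not allowed.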

In [123]:

! ls -l ./downloaded/

total 440
-rw-r--r--  1 mfsy  staff     477 Jan  7 13:51 associations.tsv
-rw-r--r--  1 mfsy  staff      16 Jan  7 13:51 my_data.xwz
-rw-r--r--  1 mfsy  staff      24 Jan  7 13:51 my_data_derived.txt
-rw-r--r--  1 mfsy  staff      52 Jan  7 13:51 persons.csv
-rw-r--r--  1 mfsy  staff  204848 Jan  7 13:51 tfidfvectorizer_model_schemaorg_linking

In [121]:

#! rm -R ./downloaded/