This notebook demonstrates Nexus Forge data querying features.
from kgforge.core import KnowledgeGraphForge
A configuration file is needed to create a KnowledgeGraphForge session. One can be generated using the 00-Initialization.ipynb notebook.
forge = KnowledgeGraphForge("../../configurations/forge.yml")
from kgforge.core import Resource
from kgforge.specializations.resources import Dataset
jane = Resource(type="Person", name="Jane Doe")
forge.register(jane)
resource = forge.retrieve(jane.id)
resource == jane
jane = Resource(type="Person", name="Jane Doe")
forge.register(jane)
forge.tag(jane, "v1")
jane.email = "jane.doe@epfl.ch"
forge.update(jane)
try:
    # DemoStore
    print(jane._store_metadata.version)
except AttributeError:
    # BlueBrainNexus
    print(jane._store_metadata._rev)
jane_v1 = forge.retrieve(jane.id, version=1)
jane_v1_tag = forge.retrieve(jane.id, version="v1")
jane_v1 == jane_v1_tag
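The two retrievals above return the same snapshot: an integer selects a version number directly, while a string is resolved as a tag pointing at a version. The correspondence can be illustrated with a minimal in-memory sketch (`TinyVersionedStore` is hypothetical, not part of kgforge):

```python
# Illustrative sketch of tag-to-version resolution; not kgforge code.
class TinyVersionedStore:
    def __init__(self):
        self.versions = {}  # version number -> resource snapshot
        self.tags = {}      # tag name -> version number

    def register(self, version, resource):
        self.versions[version] = resource

    def tag(self, name, version):
        self.tags[name] = version

    def retrieve(self, version):
        # A string is treated as a tag and resolved to a version number.
        if isinstance(version, str):
            version = self.tags[version]
        return self.versions[version]

store = TinyVersionedStore()
store.register(1, {"name": "Jane Doe"})
store.tag("v1", 1)
store.register(2, {"name": "Jane Doe", "email": "jane.doe@epfl.ch"})

assert store.retrieve(1) == store.retrieve("v1")  # tag and number resolve alike
```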
It is possible to retrieve resources stored in buckets other than the configured one, provided the configured store supports it.
resource = forge.retrieve(jane.id, cross_bucket=True) # cross_bucket defaults to False
resource = forge.retrieve("123")
resource is None
Note: DemoModel and RdfModel schemas have not been synchronized yet. This section is to be run with RdfModel. Commented lines are for DemoModel.
jane = Resource(type="Person", name="Jane Doe")
contribution_jane = Resource(type="Contribution", agent=jane)
john = Resource(type="Person", name="John Smith")
contribution_john = Resource(type="Contribution", agent=john)
dataset = Dataset(forge, type="Dataset", contribution=[contribution_jane, contribution_john])
dataset.add_distribution("../../data/associations.tsv")
forge.register(dataset)
forge.as_json(dataset)
The `paths` method loads the data structure for the given type.
Please refer to the Modeling.ipynb notebook to learn about modeling and types.
p = forge.paths("Dataset")
Autocompletion is available on `p`, and `p` can be used to create search filters.
Note: There is a known issue with RdfModel that requires using `p.type.id` instead of `p.type`.
resources = forge.search(p.type.id=="Dataset", limit=3)
type(resources)
len(resources)
forge.as_dataframe(resources)
forge.as_json(resources[2])
forge.as_dataframe(resources, store_metadata=True)
Property autocompletion is available on a path `p`, even for nested properties like `p.contribution`.
# Search for resources of type Dataset with attached files of content type text/tab-separated-values
resources = forge.search(p.type.id == "Dataset", p.distribution.encodingFormat == "text/tab-separated-values", limit=3)
len(resources)
forge.as_dataframe(resources)
A dictionary can also be provided for filters.
Note: This feature is not supported when using DemoStore.
# Search for resources of type Dataset with a contribution from an agent named "Jane Doe"
filters = {"type": "Dataset", "contribution":{"type":"Contribution", "agent":{"name":"Jane Doe"}}}
resources = forge.search(filters, limit=3)
type(resources)
len(resources)
forge.as_dataframe(resources, store_metadata=True)
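The nested filter dictionary above is conceptually a conjunction of dotted-path conditions, matching the path-based filters built from `p`. The correspondence can be sketched with a small helper (`flatten_filters` is hypothetical, not part of the kgforge API):

```python
# Illustrative sketch: flatten a nested filter dictionary into
# (dotted-path, value) pairs, mirroring path-based filters like
# p.contribution.agent.name == "Jane Doe". Not kgforge code.
def flatten_filters(filters, prefix=""):
    pairs = []
    for key, value in filters.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            pairs.extend(flatten_filters(value, path))
        else:
            pairs.append((path, value))
    return pairs

filters = {"type": "Dataset",
           "contribution": {"type": "Contribution",
                            "agent": {"name": "Jane Doe"}}}
print(flatten_filters(filters))
# → [('type', 'Dataset'), ('contribution.type', 'Contribution'),
#    ('contribution.agent.name', 'Jane Doe')]
```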
It is possible to search for resources stored in buckets other than the configured one, provided the configured store supports it.
resources = forge.search(p.type.id == "Dataset", limit=3, cross_bucket=True) # cross_bucket defaults to False
type(resources)
len(resources)
forge.as_dataframe(resources)
# Furthermore, it is possible to filter by bucket when cross_bucket is set to True. Setting a bucket value when cross_bucket is False triggers a not_supported exception.
resources = forge.search(p.type.id == "Dataset", limit=3, cross_bucket=True, bucket=<str>) # add a bucket
type(resources)
len(resources)
forge.as_dataframe(resources)
SPARQL is used as the query language.
A SPARQL query rewriting strategy lets users write simplified queries without prefix declarations, prefixed names, or full IRIs. With this strategy, the user only provides type and property names. For a given entity type, these names can be seen in its template.
Please refer to the Modeling.ipynb notebook to learn about templates.
Note: DemoStore doesn't implement SPARQL operations yet. Please use another store for this section.
Note: DemoModel and RdfModel schemas have not been synchronized yet. This section is to be run with RdfModel.
jane = Resource(type="Person", name="Jane Doe")
contribution_jane = Resource(type="Contribution", agent=jane)
john = Resource(type="Person", name="John Smith")
contribution_john = Resource(type="Contribution", agent=john)
association = Resource(type="Dataset", contribution=[contribution_jane, contribution_john])
forge.register(association)
forge.template("Dataset")
When a forge RdfModel is configured, there is no need to provide prefixes and namespaces when writing a SPARQL query. They are automatically inferred from the provided schemas and/or JSON-LD context, and the query is rewritten accordingly.
query = """
SELECT ?id ?name
WHERE {
?id a Dataset ;
contribution/agent ?contributor.
?contributor name ?name.
}
"""
resources = forge.sparql(query, limit=3)
type(resources)
len(resources)
type(resources[0])
forge.as_dataframe(resources)
When a forge Model is configured, the prefix-free SPARQL query above is rewritten before execution. Setting debug=True displays the rewritten query.
resources = forge.sparql(query, limit=3, debug=True)
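The rewriting idea can be sketched in a few lines: qualify each bare name found in the model's context with its prefix, then prepend the corresponding PREFIX declarations. This is a simplified illustration, not kgforge internals; the `CONTEXT` mapping is an assumption standing in for what the model derives from its schemas and JSON-LD context.

```python
import re

# Assumed name -> (prefix, namespace) mapping; in kgforge this information
# comes from the configured model, not from a hand-written dictionary.
CONTEXT = {
    "Dataset": ("schema", "http://schema.org/"),
    "name": ("schema", "http://schema.org/"),
    "contribution": ("nsg", "https://neuroshapes.org/"),
    "agent": ("prov", "http://www.w3.org/ns/prov#"),
}

def rewrite(query):
    """Qualify known bare names and prepend PREFIX declarations (sketch)."""
    used = {}

    def qualify(match):
        name = match.group(0)
        if name in CONTEXT:
            prefix, ns = CONTEXT[name]
            used[prefix] = ns
            return f"{prefix}:{name}"
        return name

    # Skip tokens that are SPARQL variables (?x) or already prefixed (p:x).
    body = re.sub(r"(?<![?:\w])[A-Za-z_]\w*", qualify, query)
    header = "\n".join(f"PREFIX {p}: <{ns}>" for p, ns in sorted(used.items()))
    return header + "\n" + body

q = "SELECT ?id ?name WHERE { ?id a Dataset ; contribution/agent ?c . ?c name ?name . }"
print(rewrite(q))
```

Running this prints the query with `Dataset`, `contribution`, `agent`, and `name` qualified as `schema:Dataset`, `nsg:contribution`, `prov:agent`, and `schema:name`, preceded by the three PREFIX declarations, while variables like `?name` are left untouched.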
A regular SPARQL query, with prefix declarations, can also be provided.
query = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX mba: <http://api.brain-map.org/api/v2/data/Structure/>
PREFIX nsg: <https://neuroshapes.org/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>
PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX shsh: <http://www.w3.org/ns/shacl-shacl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX vann: <http://purl.org/vocab/vann/>
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX : <https://neuroshapes.org/>
SELECT ?id ?name
WHERE {
?id a schema:Dataset ;
nsg:contribution/prov:agent ?contributor.
?contributor schema:name ?name.
}
"""
resources = forge.sparql(query, limit=3)
type(resources)
len(resources)
type(resources[0])
forge.as_dataframe(resources)
Note: DemoStore doesn't implement file operations yet. Please use another store for this section.
jane = Resource(type="Person", name="Jane Doe")
! ls -p ../../data | egrep -v /$
distribution = forge.attach("../../data")
association = Resource(type="Association", agent=jane, distribution=distribution)
forge.register(association)
forge.download(association, "distribution.contentUrl", "./downloaded/")
! ls -l ./downloaded/
# ! rm -R ./downloaded/