Searching and Downloading Data from the Blue Brain Knowledge Graph using the Knowledge Graph Forge¶

Initialize and configure¶

Get an authentication token¶

For now, a token can be obtained from the Nexus web application. Simpler alternatives are being investigated.

Step 1: On the opened web page, click the login button in the right corner and follow the instructions.


Step 2: Once logged in, you will see a token button in the right corner. Click it to copy the token.


Once you have a token, paste it at the prompt below.

In [1]:

import getpass

In [ ]:

TOKEN = getpass.getpass()
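
Tokens copied from a web application usually expire after some time, so it can help to check the expiry before configuring the forge. The sketch below assumes the token is a JSON Web Token with an `exp` claim, which is common for OpenID Connect access tokens but is not confirmed here for Nexus. `token_expiry` is a hypothetical helper, not part of kgforge, and it does not verify the token signature.

```python
import base64
import json
from datetime import datetime, timezone


def token_expiry(token):
    """Return the expiry time of a JWT as an aware UTC datetime, or None.

    Assumption: the token has three dot-separated parts and the middle part
    is base64url-encoded JSON. The signature is NOT checked.
    """
    parts = token.split(".")
    if len(parts) != 3:
        return None
    payload = parts[1]
    # base64url payloads usually drop their padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    try:
        claims = json.loads(base64.urlsafe_b64decode(payload))
    except ValueError:
        # Covers invalid base64, invalid UTF-8 and invalid JSON.
        return None
    exp = claims.get("exp") if isinstance(claims, dict) else None
    if not isinstance(exp, (int, float)):
        return None
    return datetime.fromtimestamp(exp, tz=timezone.utc)
```

With the cell above, `token_expiry(TOKEN)` returns `None` when the token does not look like a JWT, so a `None` result says nothing about whether the token is valid.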

Configure a client (forge) to access the knowledge graph¶

In [3]:

from kgforge.core import KnowledgeGraphForge

In [4]:

# Let's target the sscx dissemination project in Nexus
ORG = "public"
PROJECT = "sscx"

In [ ]:

forge = KnowledgeGraphForge("prod-forge-nexus.yml", bucket=f"{ORG}/{PROJECT}", token=TOKEN)

Search and Download¶

In [ ]:

forge.types()

Ontologies¶

Set filters¶

In [7]:

from kgforge.core.commons.strategies import ResolvingStrategy

# Supported filters for the time being are:
text = "somatosensory"
limit = 10

In [8]:

# Other resolving strategies are ResolvingStrategy.BEST_MATCH and ResolvingStrategy.EXACT_MATCH
brain_region = forge.resolve(text, scope="ontology", target="terms", strategy=ResolvingStrategy.ALL_MATCHES, limit=limit)

In [ ]:

forge.as_dataframe(brain_region).head(100)

Neuron Morphologies¶

Set filters¶

In [10]:

# Supported filters for the time being are:
_type = "ReconstructedCell"
classification_type="nsg:MType"
mType="L4_NBC"
brainRegion = "primary somatosensory cortex"
layer = "layer 4"
encodingFormat="application/swc"
limit=2

In [ ]:

forge.template("Dataset")

Run Query¶

In [ ]:

path = forge.paths("Dataset") # to have autocompletion on the properties

In [ ]:

data = forge.search(path.type.id == _type,
                    path.annotation.hasBody.type.id ==classification_type, # Known issue: use path.annotation.hasBody.type.id in case of error: AttributeError: 'PathWrapper' object has no attribute '_path'
                    path.annotation.hasBody.label ==mType,
                    path.brainLocation.brainRegion.label == brainRegion,
                    path.brainLocation.layer.label == layer,
                    path.distribution.encodingFormat == encodingFormat,
                    limit=limit)

print(str(len(data))+" dataset of type '"+_type+"' found.")

Display the results¶

In [ ]:

DISPLAY_LIMIT = 10
reshaped_data = forge.reshape(data, keep=["id","name","subject","brainLocation.brainRegion.id","brainLocation.brainRegion.label",
                                          "brainLocation.layer.id","brainLocation.layer.label", "contribution",
                                          "distribution.name","distribution.contentUrl","distribution.encodingFormat"])

forge.as_dataframe(reshaped_data[:DISPLAY_LIMIT])

Download¶

In [16]:

dirpath = "./downloaded/"
forge.download(data, "distribution.contentUrl", dirpath)
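
To see what the call above wrote to disk, the target directory can be listed with the standard library. This is a minimal sketch: `list_downloaded` is a hypothetical helper, not part of kgforge, and it makes no assumption about how the downloaded files are named.

```python
from pathlib import Path


def list_downloaded(dirpath):
    """Return (relative path, size in bytes) for every file under dirpath, sorted by path."""
    root = Path(dirpath)
    if not root.is_dir():
        # Nothing was downloaded yet, or the path is wrong.
        return []
    files = [p for p in root.rglob("*") if p.is_file()]
    return sorted((p.relative_to(root).as_posix(), p.stat().st_size) for p in files)
```

For example, `list_downloaded("./downloaded/")` can be called after any of the download cells in this notebook, since they all write to the same directory.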

Get storage path¶

It is also possible to get the location of a file and the storage that holds it (e.g. Blue Brain Nexus Store, GPFS).

In [ ]:

forge.as_json(data[0].distribution[0].atLocation)

In [ ]:

data[0].distribution[0].atLocation.location
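
If the location is a `file://` URI, it can be turned into a plain filesystem path with the standard library. The sketch below assumes that form, which is not confirmed by this notebook; other schemes are returned as `None`. `local_path_from_location` is a hypothetical helper, and the path in the usage note is made up.

```python
from urllib.parse import unquote, urlparse


def local_path_from_location(location):
    """Return the filesystem path for a file:// URI, or None for any other scheme."""
    parsed = urlparse(location)
    if parsed.scheme != "file":
        return None
    # Percent-encoded characters (e.g. %20) are decoded back to plain text.
    return unquote(parsed.path)
```

For a made-up location such as `file:///gpfs/example/morph%20one.swc`, the helper returns `/gpfs/example/morph one.swc`. Whether that path is readable depends on having access to the storage itself.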

Electrophysiology Traces¶

Set filters¶

In [19]:

# Supported filters for the time being are:
_type = "Trace"
classification_type="nsg:EType"
eType="cADpyr"
brainRegion = "primary somatosensory cortex"
layer = "layer 5"
encodingFormat="application/nwb"
limit=10

Run Query¶

In [ ]:

path = forge.paths("Dataset") # to have autocompletion on the properties
data = forge.search(path.type.id == _type,
                    path.annotation.hasBody.type.id ==classification_type,
                    path.annotation.hasBody.label ==eType,
                    path.brainLocation.brainRegion.label == brainRegion,
                    path.brainLocation.layer.label == layer,
                    path.distribution.encodingFormat == encodingFormat,
                    limit=limit)

print(str(len(data))+" data of type '"+_type+"' found.")

Display the results¶

In [ ]:

DISPLAY_LIMIT = 10
reshaped_data = forge.reshape(data, keep=["id","name","subject","brainLocation.brainRegion.id","brainLocation.brainRegion.label",
                                          "brainLocation.layer.id","brainLocation.layer.label", "contribution",
                                          "distribution.name","distribution.contentUrl","distribution.encodingFormat"])

forge.as_dataframe(reshaped_data[:DISPLAY_LIMIT])

Download¶

In [22]:

dirpath = "./downloaded/"
forge.download(data, "distribution.contentUrl", dirpath)

LayerThickness¶

Set filters¶

In [23]:

# Supported filters for the time being are:
_type = "LayerThickness"
brainRegion = "primary somatosensory cortex"
layer = "layer 2"
encodingFormat="application/xlsx"
limit=10

Run query¶

In [ ]:

path = forge.paths("Dataset") # to have autocompletion on the properties
data = forge.search(path.type.id == _type,
                    path.brainLocation.layer.label == layer,
                    path.brainLocation.brainRegion.label == brainRegion,
                    path.distribution.encodingFormat == encodingFormat,
                    limit=limit)

print(str(len(data))+" data of type '"+_type+"' found.")

Display Results¶

In [ ]:

DISPLAY_LIMIT = 10
reshaped_data = forge.reshape(data, keep=["id","name","subject","brainLocation.brainRegion.id","brainLocation.brainRegion.label",
                                          "brainLocation.layer.id","brainLocation.layer.label", "contribution",
                                          "distribution.name","distribution.contentUrl","distribution.encodingFormat"])

forge.as_dataframe(reshaped_data[:DISPLAY_LIMIT])

Download¶

In [26]:

dirpath = "./downloaded/"
forge.download(data, "distribution.contentUrl", dirpath)

Neuron Density¶

Set filters¶

In [27]:

# Supported filters for the time being are:
_type = "NeuronDensity"
brainRegion = "primary somatosensory cortex"
layer = "layer 2"
encodingFormat="application/xlsx"
limit=10

Run query¶

In [29]:

path = forge.paths("Dataset") # to have autocompletion on the properties
data = forge.search(path.type.id == _type,
                    path.brainLocation.layer.label == layer,
                    path.brainLocation.brainRegion.label == brainRegion,
                    path.distribution.encodingFormat == encodingFormat,
                    limit=limit)

print(str(len(data))+" data of type '"+_type+"' found.")

1 data of type 'NeuronDensity' found.

Display Results¶

In [ ]:

DISPLAY_LIMIT = 10
reshaped_data = forge.reshape(data, keep=["id","name","subject","brainLocation.brainRegion.id","brainLocation.brainRegion.label",
                                          "brainLocation.layer.id","brainLocation.layer.label", "contribution",
                                          "distribution.name","distribution.contentUrl","distribution.encodingFormat"])

forge.as_dataframe(reshaped_data[:DISPLAY_LIMIT])

Download¶

In [31]:

dirpath = "./downloaded/"
forge.download(data, "distribution.contentUrl", dirpath)

Atlas Release¶

In [ ]:

# Let's target the bbp/atlas project in Nexus

forge_atlas = KnowledgeGraphForge("prod-forge-nexus.yml", bucket="bbp/atlas", token=TOKEN)

Atlas-related types: AtlasRelease, BrainParcellationDataLayer, CellDensityDataLayer, GeneExpressionVolumetricDataLayer, GliaCellDensity, NISSLImageDataLayer

Set filters¶

In [35]:

# Supported filters for the time being are:
_type = "BrainParcellationDataLayer"
limit=10

Run query¶

In [ ]:

path = forge_atlas.paths("Dataset") # to have autocompletion on the properties
data = forge_atlas.search(path.type.id == _type,
                    limit=limit)

print(str(len(data))+" data of type '"+_type+"' found.")

Display Results¶

In [ ]:

DISPLAY_LIMIT = 10
reshaped_data = forge_atlas.reshape(data, keep=["id","name","brainLocation.brainRegion.id","brainLocation.brainRegion.label", "contribution","distribution.name","distribution.contentUrl","distribution.encodingFormat"])

forge_atlas.as_dataframe(reshaped_data[:DISPLAY_LIMIT])

In [38]:

dirpath = "./downloaded/"
forge_atlas.download(data, "distribution.contentUrl", dirpath)

Data at a given tag¶

Tagged data are data with immutable identifiers. Such an identifier guarantees that the data can be retrieved in the state they were in when the tag was created. A tag here is similar to a git tag.

Choose a bucket (or project) to query¶

In [39]:

bucket = "bbp/lnmce"

In [ ]:

forge_tag = KnowledgeGraphForge("prod-forge-nexus.yml", bucket=bucket, token=TOKEN)

Set tag value¶

In [41]:

tag = "LNMCE2020"

Set filters¶

In [46]:

# Let's search for Electrophysiology Traces
_type = "Trace"
classification_type="EType"
eType="bIR"
brainRegion = "primary somatosensory cortex"
encodingFormat="application/nwb"
limit=10

Run Query¶

In [47]:

path = forge_tag.paths("Dataset") # to have autocompletion on the properties
data = forge_tag.search(path.type.id == _type,
                    path.annotation.hasBody.type.id ==classification_type, # Known issue: use path.annotation.hasBody.type.id in case of error: AttributeError: 'PathWrapper' object has no attribute '_path'
                    path.annotation.hasBody.label ==eType,
                    path.brainLocation.brainRegion.label == brainRegion,
                    path.distribution.encodingFormat == encodingFormat,
                    limit=limit)

print(str(len(data))+" data of type '"+_type+"' found.")

10 data of type 'Trace' found.

Retrieve results at the set tag¶

In [ ]:

results = [forge_tag.retrieve(d.id, version=tag) for d in data]
print(f"{len(results)} data of type '{_type}' at tag {tag} found.")
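
If some of the resources in `data` were never tagged, the list built above may contain entries that cannot be displayed or downloaded. The sketch below separates the two cases. It assumes that `retrieve` yields `None` for a resource it cannot fetch at the given tag rather than raising, which should be checked against the kgforge version in use. `retrieve_at_tag` is a hypothetical helper that only relies on the `retrieve(id, version=...)` call already used in this notebook.

```python
def retrieve_at_tag(forge, resources, tag):
    """Retrieve each resource at `tag`; return (found, missing_ids).

    Assumption: forge.retrieve returns None when a resource is not
    available at the requested tag.
    """
    found, missing = [], []
    for resource in resources:
        result = forge.retrieve(resource.id, version=tag)
        if result is None:
            missing.append(resource.id)
        else:
            found.append(result)
    return found, missing
```

A possible use is `results, missing = retrieve_at_tag(forge_tag, data, tag)`, after which `missing` lists the identifiers that were skipped.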

Display the results¶

In [ ]:

DISPLAY_LIMIT = 10
reshaped_data = forge_tag.reshape(results, keep=["id","name","subject","brainLocation.brainRegion.id","brainLocation.brainRegion.label",
                                                 "brainLocation.layer.id","brainLocation.layer.label", "contribution",
                                                 "distribution.name","distribution.contentUrl","distribution.encodingFormat"])

forge_tag.as_dataframe(reshaped_data[:DISPLAY_LIMIT])

Download¶

In [50]:

dirpath = "./downloaded/"
forge_tag.download(results, "distribution.contentUrl", dirpath)

Data in a given view¶

A view exposes a subset of data for query and access in specialised indices (SPARQL, ElasticSearch).

In [51]:

# Here is an example of view url
view_url = "https://bluebrain.github.io/nexus/vocabulary/lnmce2020SparqlIndex"

In [ ]:

searchendpoints = {"sparql": {"endpoint": view_url}}
forge_view = KnowledgeGraphForge("prod-forge-nexus.yml", bucket="bbp/lnmce", token=TOKEN, searchendpoints=searchendpoints)

Set filters¶

In [64]:

# Let's search for Electrophysiology Traces
_type = "Trace"
classification_type=":EType"
eType="bIR"
brainRegion = "primary somatosensory cortex"
encodingFormat="application/nwb"
limit=10

Run Query¶

In [66]:

path = forge_view.paths("Dataset") # to have autocompletion on the properties
data = forge_view.search(path.type.id == _type,
                    path.annotation.hasBody.type.id ==classification_type, # Known issue: use path.annotation.hasBody.type.id in case of error: AttributeError: 'PathWrapper' object has no attribute '_path'
                    path.annotation.hasBody.label ==eType,
                    path.brainLocation.brainRegion.label == brainRegion,
                    path.distribution.encodingFormat == encodingFormat,
                    limit=limit)

print(str(len(data))+" data of type '"+_type+"' found.")

10 data of type 'Trace' found.

Display the results¶

In [ ]:

DISPLAY_LIMIT = 10
reshaped_data = forge_view.reshape(data, keep=["id","name","subject","brainLocation.brainRegion.id","brainLocation.brainRegion.label",
                                               "brainLocation.layer.id","brainLocation.layer.label", "contribution",
                                               "distribution.name","distribution.contentUrl","distribution.encodingFormat"])

forge_view.as_dataframe(reshaped_data[:DISPLAY_LIMIT])

Download¶

In [68]:

dirpath = "./downloaded/"
forge_view.download(data, "distribution.contentUrl", dirpath)