This notebook presents a step-by-step approach for registering datasets (any resource with files attached), optionally with metadata and provenance, in a configured project.
Log in to the Nexus production deployment to get a token.
import getpass
TOKEN = getpass.getpass()
nexus_staging_endpoint = "https://staging.nise.bbp.epfl.ch/nexus/v1" # use staging to try and test.
nexus_endpoint = nexus_staging_endpoint
ORG = "bbp"
PROJECT = "MyProject"
from kgforge.core import KnowledgeGraphForge
from kgforge.core import Resource
from kgforge.specializations.resources import Dataset
import pandas as pd
A KnowledgeGraphForge session is a Python object that exposes all the functions needed to register datasets with metadata, search them and download them. A configuration file is needed to create a KnowledgeGraphForge session; a ready-to-use configuration file is made available here.
forge = KnowledgeGraphForge(
    "https://raw.githubusercontent.com/BlueBrain/nexus-forge/master/examples/notebooks/use-cases/prod-forge-nexus.yml",
    endpoint=nexus_endpoint,
    bucket=f"{ORG}/{PROJECT}",
    token=TOKEN,
)
You are all set up!
A resource is anything that can be identified and that can have metadata associated with it. The following cell creates two resources, Jane of type Person and John of types Person and Agent, each with a name, a given name and a family name as metadata.
Any 'property=value' can be given here as metadata.
jane = Resource(type="Person", name="Jane Doe", givenName="Jane", familyName="Doe")
john = Resource(type=["Person","Agent"], name="John Smith", givenName="John", familyName="Smith")
persons = [jane, john]
forge.register(persons)
<count> 2 <action> _register_many <succeeded> True
# A resource can be retrieved by its id
result = forge.retrieve(id= john.id)
# Note that Blue Brain Nexus automatically generated the ids (id property) of the resources. The generated ids are unique within the selected project.
forge.as_json(result)
{'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/5748c182-8390-4d2a-b290-9fa174a5e274', 'type': ['Person', 'Agent'], 'familyName': 'Smith', 'givenName': 'John', 'name': 'John Smith'}
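The generated id is a URL that encodes the organisation, the project and a UUID. A minimal sketch for splitting such an id into its parts, assuming the `<endpoint>/resources/<org>/<project>/<schema>/<uuid>` layout seen in the output above (the `_` segment sits where the schema goes); `split_nexus_resource_id` is a hypothetical helper, not part of forge.

```python
from urllib.parse import urlparse


def split_nexus_resource_id(resource_id: str) -> dict:
    """Split a Nexus-generated resource id into org, project, schema segment and uuid.

    Assumes the layout observed in this notebook:
    <endpoint>/resources/<org>/<project>/<schema>/<uuid>
    """
    segments = urlparse(resource_id).path.strip("/").split("/")
    # The four segments after 'resources' are org, project, schema and uuid
    i = segments.index("resources")
    org, project, schema, uuid = segments[i + 1:i + 5]
    return {"org": org, "project": project, "schema": schema, "uuid": uuid}


parts = split_nexus_resource_id(
    "https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/5748c182-8390-4d2a-b290-9fa174a5e274"
)
print(parts)
```

Ids that were provided by the user (such as the Wikidata ids used further down) do not follow this layout, so the helper only applies to ids generated by Nexus.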
# Add store_metadata=True to see extra metadata added by Blue Brain Nexus (e.g. _rev, _createdBy, _updatedAt, _deprecated, ...)
forge.as_json(result, store_metadata=True)
{'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/5748c182-8390-4d2a-b290-9fa174a5e274', 'type': ['Person', 'Agent'], 'familyName': 'Smith', 'givenName': 'John', 'name': 'John Smith', '_constrainedBy': 'https://bluebrain.github.io/nexus/schemas/unconstrained.json', '_createdAt': '2022-03-22T16:38:47.861Z', '_createdBy': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/realms/bbp/users/sy', '_deprecated': False, '_incoming': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/5748c182-8390-4d2a-b290-9fa174a5e274/incoming', '_outgoing': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/5748c182-8390-4d2a-b290-9fa174a5e274/outgoing', '_project': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/projects/bbp/MyProject', '_rev': 1, '_schemaProject': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/projects/bbp/MyProject', '_self': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/5748c182-8390-4d2a-b290-9fa174a5e274', '_updatedAt': '2022-03-22T16:38:47.861Z', '_updatedBy': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/realms/bbp/users/sy'}
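In the output above, every property added by Blue Brain Nexus starts with an underscore. A minimal sketch that separates such a dictionary into user metadata and store metadata, assuming that underscore convention holds for all store-added keys; `split_store_metadata` is a hypothetical helper for illustration.

```python
from typing import Dict, Tuple


def split_store_metadata(resource_json: Dict) -> Tuple[Dict, Dict]:
    """Split a resource dict into (user metadata, store metadata).

    Assumes every key added by the store starts with '_' (e.g. _rev, _createdBy).
    """
    user = {k: v for k, v in resource_json.items() if not k.startswith("_")}
    store = {k: v for k, v in resource_json.items() if k.startswith("_")}
    return user, store


example = {
    "id": "https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/5748c182-8390-4d2a-b290-9fa174a5e274",
    "type": ["Person", "Agent"],
    "name": "John Smith",
    "_rev": 1,
    "_deprecated": False,
}
user, store = split_store_metadata(example)
print(sorted(store))  # ['_deprecated', '_rev']
```

Within forge itself, calling `forge.as_json(result)` without `store_metadata=True` already returns only the user part, as the earlier output shows.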
# Each person has a file attached. We'll use forge.attach to be able to register the files in the knowledge graph
scientists_df = pd.read_csv("../../data/persons-with-id.csv")
scientists_df
| | id | type | name |
|---|---|---|---|
| 0 | https://www.wikidata.org/wiki/Q7186 | Person | Marie Curie |
| 1 | https://www.wikidata.org/wiki/Q937 | Person | Albert Einstein |
# Resources can be created from a pandas dataframe
scientists = forge.from_dataframe(scientists_df)
forge.as_json(scientists[0])
{'id': 'https://www.wikidata.org/wiki/Q7186', 'type': 'Person', 'name': 'Marie Curie'}
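The mapping used by `forge.from_dataframe` in this example is one resource per row, with each column becoming a property. The sketch below reproduces that row-to-properties shape with the standard `csv` module. It is an illustration of the idea only, not how forge implements the conversion, and the inlined CSV text is reconstructed from the dataframe shown above (the actual file may be formatted differently).

```python
import csv
import io

# Values taken from the dataframe displayed above, inlined so the sketch runs on its own
CSV_TEXT = """id,type,name
https://www.wikidata.org/wiki/Q7186,Person,Marie Curie
https://www.wikidata.org/wiki/Q937,Person,Albert Einstein
"""

# One dict per row, keyed by column name: the same shape as forge.as_json(scientists[0])
records = list(csv.DictReader(io.StringIO(CSV_TEXT)))
print(records[0])
```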
# Note that registering a resource that already exists in a given project fails with a 'RegistrationError: resource already exists' error.
# forge.retrieve(id=...) can be used to fetch the registered resource, as shown in the next cell.
forge.register(scientists)
<count> 2 <action> _register_many <succeeded> True
scientists = []
Marie_Curie = forge.retrieve(id="https://www.wikidata.org/wiki/Q7186")  # Retrieve the Marie Curie resource
Albert_Einstein = forge.retrieve(id="https://www.wikidata.org/wiki/Q937")  # Retrieve the Albert Einstein resource
scientists.append(Marie_Curie)
scientists.append(Albert_Einstein)
# Note that the Blue Brain Nexus Store has kept the provided id
forge.as_json(Marie_Curie)
{'id': 'https://www.wikidata.org/wiki/Q7186', 'type': 'Person', 'name': 'Marie Curie'}
See the notebook DataFrame IO.ipynb for more details on converting Pandas DataFrame to forge Resources and the other way around.
Even though any type can be provided for a Resource, there is a set of available types that can be obtained programmatically with the following command or by looking at the schemas documentation (note that these schemas may change in the future).
forge.types()
This use case is about registering files with metadata in the knowledge graph. A specific type of Resource, called Dataset, will be used. Since a Dataset is also a Resource, everything that applies to a Resource also applies to a Dataset.
# List the files that will be used and capture the start time of this part
import time
startedAtTime = time.strftime("%Y%m%d%H%M%S")
! ls -p ../../data | egrep -v /$
associations.tsv my_data.xwz my_data_derived.txt persons-with-id.csv persons.csv tfidfvectorizer_model_schemaorg_linking
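The shell command above lists the regular files of the data folder while hiding sub-directories. A roughly equivalent sketch in plain Python with `pathlib`, useful where shell escapes are not available; `list_files` is a hypothetical helper name.

```python
from pathlib import Path


def list_files(directory: str) -> list:
    """Return the sorted names of regular files in directory, skipping sub-directories.

    Roughly equivalent to `ls -p <dir> | egrep -v /$`.
    """
    return sorted(p.name for p in Path(directory).iterdir() if p.is_file())


# Usage with the notebook's relative path:
# list_files("../../data")
```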
Any 'property=value' can be given here as metadata. We recommend using properties from the Dataset schema. A Dataset is a Resource with a distribution property describing where the data (files) are stored and where they can be accessed.
# The file content type can be provided with the content_type argument; it is omitted here.
my_data_distribution = forge.attach("../../data/my_data.xwz")
my_dataset = Dataset(forge, type=["Entity", "Dataset", "MyOtherType"], name="Interesting Dataset", distribution=my_data_distribution)
forge.register(my_dataset)
<action> _register_one <succeeded> True
# Visualise the metadata. Note the distribution property with file related metadata automatically added (contentSize, digest, encodingFormat, ...)
forge.as_json(my_dataset)
{'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/1cab4ebb-6016-4c61-aa6e-d3c1411e8b5a', 'type': ['Entity', 'Dataset', 'MyOtherType'], 'distribution': {'type': 'DataDownload', 'atLocation': {'type': 'Location', 'store': {'id': 'https://bluebrain.github.io/nexus/vocabulary/diskStorageDefault', 'type': 'DiskStorage', '_rev': 1}}, 'contentSize': {'unitCode': 'bytes', 'value': 16}, 'contentUrl': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/files/bbp/MyProject/5e1ce30f-adb4-469e-998d-f7a9fe4f10b6', 'digest': {'algorithm': 'SHA-256', 'value': 'df03e7e93f870c6731540b3cae26391670da682c7a8dbdd18448cbcfc4fb7981'}, 'encodingFormat': 'application/octet-stream', 'name': 'my_data.xwz'}, 'name': 'Interesting Dataset'}
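No content_type was passed to forge.attach for my_data.xwz, and the stored encodingFormat is application/octet-stream. A plausible reason is that .xwz is not a registered extension; whether forge guesses the type from the extension is an assumption. The sketch below uses Python's standard `mimetypes` module to show what an extension-based guess gives, and that the registered type for .txt files is text/plain. `guess_content_type` is a hypothetical helper.

```python
import mimetypes


def guess_content_type(path: str, fallback: str = "application/octet-stream") -> str:
    """Guess a media type from the file extension, falling back to a generic binary type."""
    media_type, _encoding = mimetypes.guess_type(path)
    return media_type or fallback


for name in ["my_data.xwz", "my_data_derived.txt"]:
    print(name, "->", guess_content_type(name))

# Hypothetical use when attaching a file (content_type is the forge.attach argument used in this notebook):
# forge.attach("../../data/my_data_derived.txt", content_type=guess_content_type("my_data_derived.txt"))
```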
result = forge.retrieve(id= my_dataset.id)
forge.as_json(result)
{'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/1cab4ebb-6016-4c61-aa6e-d3c1411e8b5a', 'type': ['Entity', 'Dataset', 'MyOtherType'], 'distribution': {'type': 'DataDownload', 'atLocation': {'type': 'Location', 'store': {'id': 'https://bluebrain.github.io/nexus/vocabulary/diskStorageDefault', 'type': 'DiskStorage', '_rev': 1}}, 'contentSize': {'unitCode': 'bytes', 'value': 16}, 'contentUrl': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/files/bbp/MyProject/5e1ce30f-adb4-469e-998d-f7a9fe4f10b6', 'digest': {'algorithm': 'SHA-256', 'value': 'df03e7e93f870c6731540b3cae26391670da682c7a8dbdd18448cbcfc4fb7981'}, 'encodingFormat': 'application/octet-stream', 'name': 'my_data.xwz'}, 'name': 'Interesting Dataset'}
See the notebook BBP KG Search and Download.ipynb for more search details and options.
filters = {"type":"Dataset", "name":"Interesting Dataset"}
results = forge.search(filters, limit=3)
print(f"{len(results)} results found")
1 results found
forge.as_json(results[0])
{'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/1cab4ebb-6016-4c61-aa6e-d3c1411e8b5a', 'type': ['Entity', 'Dataset', 'MyOtherType'], 'distribution': {'type': 'DataDownload', 'atLocation': {'type': 'Location', 'store': {'id': 'https://bluebrain.github.io/nexus/vocabulary/diskStorageDefault', 'type': 'DiskStorage', '_rev': 1}}, 'contentSize': {'unitCode': 'bytes', 'value': 16}, 'contentUrl': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/files/bbp/MyProject/5e1ce30f-adb4-469e-998d-f7a9fe4f10b6', 'digest': {'algorithm': 'SHA-256', 'value': 'df03e7e93f870c6731540b3cae26391670da682c7a8dbdd18448cbcfc4fb7981'}, 'encodingFormat': 'application/octet-stream', 'name': 'my_data.xwz'}, 'name': 'Interesting Dataset'}
# A list of resources can be transformed into a pandas DataFrame
forge.as_dataframe(results)
| | id | type | distribution.type | distribution.atLocation.type | distribution.atLocation.store.id | distribution.atLocation.store.type | distribution.atLocation.store._rev | distribution.contentSize.unitCode | distribution.contentSize.value | distribution.contentUrl | distribution.digest.algorithm | distribution.digest.value | distribution.encodingFormat | distribution.name | name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | https://staging.nexus.ocp.bbp.epfl.ch/v1/resou... | [Entity, Dataset, MyOtherType] | DataDownload | Location | https://bluebrain.github.io/nexus/vocabulary/d... | DiskStorage | 1 | bytes | 16 | https://staging.nexus.ocp.bbp.epfl.ch/v1/files... | SHA-256 | df03e7e93f870c6731540b3cae26391670da682c7a8dbd... | application/octet-stream | my_data.xwz | Interesting Dataset |
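The column names in the DataFrame are the paths of the nested JSON properties joined with dots (e.g. distribution.contentSize.value). A minimal sketch of that flattening on a plain dictionary, for illustration only; it is not forge's implementation, and `flatten` is a hypothetical helper.

```python
def flatten(obj: dict, prefix: str = "", sep: str = ".") -> dict:
    """Flatten nested dicts into one level with dot-separated keys.

    Lists (such as a multi-valued 'type') are kept as values, as in the table above.
    """
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


row = flatten({
    "type": ["Entity", "Dataset", "MyOtherType"],
    "distribution": {
        "type": "DataDownload",
        "contentSize": {"unitCode": "bytes", "value": 16},
        "digest": {"algorithm": "SHA-256"},
    },
    "name": "Interesting Dataset",
})
print(list(row))
# ['type', 'distribution.type', 'distribution.contentSize.unitCode', 'distribution.contentSize.value', 'distribution.digest.algorithm', 'name']
```

For JSON exported with `forge.as_json`, pandas offers the same kind of flattening through `pandas.json_normalize(data, sep=".")`.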
# The argument overwrite: bool can be provided to decide whether to overwrite (True) existing files with the same name or
# to create new ones (False) with their names suffixed with a timestamp
my_dataset.download(path="./downloaded/", source="distributions")
! ls -l ./downloaded
total 8
-rw-r--r--  1 mfsy  staff  16 Mar 22 17:43 my_data.xwz
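The distribution metadata records the file size and its SHA-256 digest, so a downloaded copy can be checked against them. A minimal sketch using the standard `hashlib` module; the expected values are copied from the distribution of my_dataset shown above, and `file_checksum` and `matches_distribution` are hypothetical helper names.

```python
import hashlib


def file_checksum(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 16):
    """Return (size in bytes, hex digest) of a file, reading it in chunks."""
    h = hashlib.new(algorithm)
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
            size += len(chunk)
    return size, h.hexdigest()


def matches_distribution(path: str, expected_size: int, expected_sha256: str) -> bool:
    """Check a local file against the contentSize and digest values of a distribution."""
    size, digest = file_checksum(path)
    return size == expected_size and digest == expected_sha256


# Values from the distribution metadata of my_data.xwz shown earlier:
# matches_distribution("./downloaded/my_data.xwz", 16,
#                      "df03e7e93f870c6731540b3cae26391670da682c7a8dbdd18448cbcfc4fb7981")
```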
#! rm -R ./downloaded/
If the dataset files are stored in external storage (e.g. GPFS), their location can be retrieved.
forge.as_json(my_dataset.distribution.atLocation)
{'type': 'Location', 'store': {'id': 'https://bluebrain.github.io/nexus/vocabulary/diskStorageDefault', 'type': 'DiskStorage', '_rev': 1}}
# This fails on staging, where no GPFS storage is used: the default disk storage has no location property (see atLocation above)
my_dataset.distribution.atLocation.location
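To avoid the failure on storages that expose no location, the property can be read with a default value. A minimal sketch using Python's built-in `getattr`, assuming that a missing property on a forge Resource raises AttributeError like on any plain Python object; the stand-in objects and the GPFS path are hypothetical.

```python
from types import SimpleNamespace


def storage_location(at_location, default=None):
    """Return at_location.location if the storage exposes it, otherwise default."""
    return getattr(at_location, "location", default)


# Hypothetical stand-ins for the atLocation of a disk-stored and a GPFS-stored distribution
disk = SimpleNamespace(type="Location")
gpfs = SimpleNamespace(type="Location", location="file:///gpfs/some/path/my_data.xwz")

print(storage_location(disk))  # None
print(storage_location(gpfs))  # file:///gpfs/some/path/my_data.xwz

# In the notebook: storage_location(my_dataset.distribution.atLocation)
```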
Provenance is specific metadata accounting for (among other things) data lineage (derivation), who contributed to generating the dataset (contribution), how the dataset was generated (generation) and the subject of the dataset, if any (subject).
Let's consider the file ../../data/my_data_derived.txt to be derived from ../../data/my_data.xwz.
# The file content type can be provided with the content_type argument.
my_derived_data_distribution = forge.attach("../../data/my_data_derived.txt", content_type="application/txt")
my_derived_dataset = Dataset(forge, name="Derived Dataset from my_dataset", distribution=my_derived_data_distribution)
forge.register(my_derived_dataset)
<action> _register_one <succeeded> True
result = forge.retrieve(id=my_derived_dataset.id)
# Note the added distribution property
forge.as_json(result)
{'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/03b6023a-96ab-4e6d-8d76-4027588c548d', 'type': 'Dataset', 'distribution': {'type': 'DataDownload', 'atLocation': {'type': 'Location', 'store': {'id': 'https://bluebrain.github.io/nexus/vocabulary/diskStorageDefault', 'type': 'DiskStorage', '_rev': 1}}, 'contentSize': {'unitCode': 'bytes', 'value': 24}, 'contentUrl': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/files/bbp/MyProject/520d4f3e-d3c8-457f-983f-0d5c63e8818d', 'digest': {'algorithm': 'SHA-256', 'value': '0cb8f5c19ee618551fb27872cd24578eeb298963596a10fe6fccfd55c11258a4'}, 'encodingFormat': 'application/txt', 'name': 'my_data_derived.txt'}, 'name': 'Derived Dataset from my_dataset'}
# Record that my_derived_dataset was derived from my_dataset
my_derived_dataset.add_derivation(my_dataset)
# Since my_derived_dataset is already registered, it can be updated to store its derivation information.
# If nothing changed (i.e. there is nothing to update), forge.update(...) fails with an 'UpdatingError: resource should not be synchronized' error.
forge.update(my_derived_dataset)
<action> _update_one <succeeded> True
# Note the increased _rev number because of the update
forge.as_json(my_derived_dataset, store_metadata=True)
{'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/03b6023a-96ab-4e6d-8d76-4027588c548d', 'type': 'Dataset', 'derivation': {'type': 'Derivation', 'entity': {'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/1cab4ebb-6016-4c61-aa6e-d3c1411e8b5a?rev=1', 'type': ['Entity', 'Dataset', 'MyOtherType'], 'name': 'Interesting Dataset'}}, 'distribution': {'type': 'DataDownload', 'atLocation': {'type': 'Location', 'store': {'id': 'https://bluebrain.github.io/nexus/vocabulary/diskStorageDefault', 'type': 'DiskStorage', '_rev': 1}}, 'contentSize': {'unitCode': 'bytes', 'value': 24}, 'contentUrl': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/files/bbp/MyProject/520d4f3e-d3c8-457f-983f-0d5c63e8818d', 'digest': {'algorithm': 'SHA-256', 'value': '0cb8f5c19ee618551fb27872cd24578eeb298963596a10fe6fccfd55c11258a4'}, 'encodingFormat': 'application/txt', 'name': 'my_data_derived.txt'}, 'name': 'Derived Dataset from my_dataset', '_constrainedBy': 'https://bluebrain.github.io/nexus/schemas/unconstrained.json', '_createdAt': '2022-03-22T16:43:48.958Z', '_createdBy': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/realms/bbp/users/sy', '_deprecated': False, '_incoming': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/03b6023a-96ab-4e6d-8d76-4027588c548d/incoming', '_outgoing': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/03b6023a-96ab-4e6d-8d76-4027588c548d/outgoing', '_project': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/projects/bbp/MyProject', '_rev': 2, '_schemaProject': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/projects/bbp/MyProject', '_self': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/03b6023a-96ab-4e6d-8d76-4027588c548d', '_updatedAt': '2022-03-22T16:44:26.248Z', '_updatedBy': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/realms/bbp/users/sy'}
Adding contributors to the dataset. The contributors are John, Jane and the persons stored in the ../../data/persons-with-id.csv file. All persons from the file were registered as resources in the knowledge graph so that they can be referenced as contributors.
# An id can also be provided to add_contribution(). By default, references are versioned (a ?rev=<n> suffix is added) so that they are
# not affected by later changes and keep the state the resource had when it was referenced. versioned=False disables this, as for John below.
for contributor in scientists:
    my_derived_dataset.add_contribution(contributor)
my_derived_dataset.add_contribution(john.id, versioned=False)
my_derived_dataset.add_contribution(jane)
forge.update(my_derived_dataset)
<action> _update_one <succeeded> True
result = forge.retrieve(id= my_derived_dataset.id)
forge.as_json(result)
{'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/03b6023a-96ab-4e6d-8d76-4027588c548d', 'type': 'Dataset', 'contribution': [{'type': 'Contribution', 'agent': {'id': 'https://www.wikidata.org/wiki/Q7186?rev=1', 'type': 'Person'}}, {'type': 'Contribution', 'agent': {'id': 'https://www.wikidata.org/wiki/Q937?rev=1', 'type': 'Person'}}, {'type': 'Contribution', 'agent': {'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/5748c182-8390-4d2a-b290-9fa174a5e274', 'type': 'Agent'}}, {'type': 'Contribution', 'agent': {'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/88652303-dec0-4501-8352-bb33630e1ef3?rev=1', 'type': 'Person'}}], 'derivation': {'type': 'Derivation', 'entity': {'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/1cab4ebb-6016-4c61-aa6e-d3c1411e8b5a?rev=1', 'type': ['Entity', 'Dataset', 'MyOtherType'], 'name': 'Interesting Dataset'}}, 'distribution': {'type': 'DataDownload', 'atLocation': {'type': 'Location', 'store': {'id': 'https://bluebrain.github.io/nexus/vocabulary/diskStorageDefault', 'type': 'DiskStorage', '_rev': 1}}, 'contentSize': {'unitCode': 'bytes', 'value': 24}, 'contentUrl': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/files/bbp/MyProject/520d4f3e-d3c8-457f-983f-0d5c63e8818d', 'digest': {'algorithm': 'SHA-256', 'value': '0cb8f5c19ee618551fb27872cd24578eeb298963596a10fe6fccfd55c11258a4'}, 'encodingFormat': 'application/txt', 'name': 'my_data_derived.txt'}, 'name': 'Derived Dataset from my_dataset'}
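The output shows what a versioned reference looks like: the referenced id carries a `?rev=<n>` suffix, while John's, added with `versioned=False`, does not. A minimal sketch for building such an id by hand, e.g. to pin a specific revision in a custom property; `versioned_id` is a hypothetical helper, not forge's own code.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def versioned_id(resource_id: str, rev: int) -> str:
    """Append (or replace) a rev=<n> query parameter on a resource id."""
    parts = urlsplit(resource_id)
    # Keep any other query parameters, drop an existing rev
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "rev"]
    query.append(("rev", str(rev)))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(versioned_id("https://www.wikidata.org/wiki/Q7186", 1))
# https://www.wikidata.org/wiki/Q7186?rev=1
```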
# By adding store_metadata=True, the revision number of a resource can be introspected
forge.as_json(result, store_metadata=True)
{'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/03b6023a-96ab-4e6d-8d76-4027588c548d', 'type': 'Dataset', 'contribution': [{'type': 'Contribution', 'agent': {'id': 'https://www.wikidata.org/wiki/Q7186?rev=1', 'type': 'Person'}}, {'type': 'Contribution', 'agent': {'id': 'https://www.wikidata.org/wiki/Q937?rev=1', 'type': 'Person'}}, {'type': 'Contribution', 'agent': {'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/5748c182-8390-4d2a-b290-9fa174a5e274', 'type': 'Agent'}}, {'type': 'Contribution', 'agent': {'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/88652303-dec0-4501-8352-bb33630e1ef3?rev=1', 'type': 'Person'}}], 'derivation': {'type': 'Derivation', 'entity': {'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/1cab4ebb-6016-4c61-aa6e-d3c1411e8b5a?rev=1', 'type': ['Entity', 'Dataset', 'MyOtherType'], 'name': 'Interesting Dataset'}}, 'distribution': {'type': 'DataDownload', 'atLocation': {'type': 'Location', 'store': {'id': 'https://bluebrain.github.io/nexus/vocabulary/diskStorageDefault', 'type': 'DiskStorage', '_rev': 1}}, 'contentSize': {'unitCode': 'bytes', 'value': 24}, 'contentUrl': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/files/bbp/MyProject/520d4f3e-d3c8-457f-983f-0d5c63e8818d', 'digest': {'algorithm': 'SHA-256', 'value': '0cb8f5c19ee618551fb27872cd24578eeb298963596a10fe6fccfd55c11258a4'}, 'encodingFormat': 'application/txt', 'name': 'my_data_derived.txt'}, 'name': 'Derived Dataset from my_dataset', '_constrainedBy': 'https://bluebrain.github.io/nexus/schemas/unconstrained.json', '_createdAt': '2022-03-22T16:43:48.958Z', '_createdBy': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/realms/bbp/users/sy', '_deprecated': False, '_incoming': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/03b6023a-96ab-4e6d-8d76-4027588c548d/incoming', '_outgoing': 
'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/03b6023a-96ab-4e6d-8d76-4027588c548d/outgoing', '_project': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/projects/bbp/MyProject', '_rev': 3, '_schemaProject': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/projects/bbp/MyProject', '_self': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/03b6023a-96ab-4e6d-8d76-4027588c548d', '_updatedAt': '2022-03-22T16:44:38.758Z', '_updatedBy': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/realms/bbp/users/sy'}
An activity uses some entities to generate new ones and can follow a Protocol. It has a start time and an end time and is associated with agents (Person, Organization and/or SoftwareAgent).
# Was a protocol followed?
protocol = Resource(type="Protocol", name="Protocol used to generate the dataset", description="Description of the protocol")
activity = Resource(
    type=["Activity", "MyCustomActivity"],
    description="Activity",
    used=Resource(id=my_dataset.id, type=my_dataset.type),  # can be a list of any datasets or entities (e.g. config files) used to generate my_derived_dataset
    hadProtocol=protocol,
    startedAtTime=startedAtTime,
    endedAtTime=time.strftime("%Y%m%d%H%M%S"),
    wasAssociatedWith=Resource(id=jane.id, type=jane.type),  # can be a list of any agents
)
forge.register(activity)
<action> _register_one <succeeded> True
forge.as_json(activity)
{'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/bf727e72-a131-4b0d-a305-15b0cd902a47', 'type': ['Activity', 'MyCustomActivity'], 'description': 'Activity', 'endedAtTime': '20220322174505', 'hadProtocol': {'type': 'Protocol', 'description': 'Description of the protocol', 'name': 'Protocol used to generate the dataset'}, 'startedAtTime': '20220322174225', 'used': {'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/1cab4ebb-6016-4c61-aa6e-d3c1411e8b5a', 'type': ['Entity', 'Dataset', 'MyOtherType']}, 'wasAssociatedWith': {'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/88652303-dec0-4501-8352-bb33630e1ef3', 'type': 'Person'}}
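The start and end times above are stored in the compact `%Y%m%d%H%M%S` form. If these properties are interpreted as PROV-O terms (an assumption; the configured context is not shown here), their expected range is xsd:dateTime, whose lexical form looks like 2022-03-22T17:42:25. A minimal sketch converting the compact form with the standard `datetime` module; `compact_to_iso` is a hypothetical helper.

```python
from datetime import datetime, timezone

COMPACT = "%Y%m%d%H%M%S"  # format passed to time.strftime in this notebook


def compact_to_iso(stamp: str) -> str:
    """Convert a compact local timestamp such as '20220322174225' to ISO 8601."""
    return datetime.strptime(stamp, COMPACT).isoformat()


print(compact_to_iso("20220322174225"))  # 2022-03-22T17:42:25

# For new activities, a timezone-aware ISO 8601 string can be produced directly
now_iso = datetime.now(timezone.utc).isoformat(timespec="seconds")
```

The compact stamps carry no timezone, so the converted values are naive local times; generating timezone-aware values at capture time avoids that ambiguity.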
my_derived_dataset.add_generation(activity)
forge.update(my_derived_dataset)
<action> _update_one <succeeded> True
forge.as_json(my_derived_dataset)
{'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/03b6023a-96ab-4e6d-8d76-4027588c548d', 'type': 'Dataset', 'contribution': [{'type': 'Contribution', 'agent': {'id': 'https://www.wikidata.org/wiki/Q7186?rev=1', 'type': 'Person'}}, {'type': 'Contribution', 'agent': {'id': 'https://www.wikidata.org/wiki/Q937?rev=1', 'type': 'Person'}}, {'type': 'Contribution', 'agent': {'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/5748c182-8390-4d2a-b290-9fa174a5e274', 'type': 'Agent'}}, {'type': 'Contribution', 'agent': {'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/88652303-dec0-4501-8352-bb33630e1ef3?rev=1', 'type': 'Person'}}], 'derivation': {'type': 'Derivation', 'entity': {'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/1cab4ebb-6016-4c61-aa6e-d3c1411e8b5a?rev=1', 'type': ['Entity', 'Dataset', 'MyOtherType'], 'name': 'Interesting Dataset'}}, 'distribution': {'type': 'DataDownload', 'atLocation': {'type': 'Location', 'store': {'id': 'https://bluebrain.github.io/nexus/vocabulary/diskStorageDefault', 'type': 'DiskStorage', '_rev': 1}}, 'contentSize': {'unitCode': 'bytes', 'value': 24}, 'contentUrl': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/files/bbp/MyProject/520d4f3e-d3c8-457f-983f-0d5c63e8818d', 'digest': {'algorithm': 'SHA-256', 'value': '0cb8f5c19ee618551fb27872cd24578eeb298963596a10fe6fccfd55c11258a4'}, 'encodingFormat': 'application/txt', 'name': 'my_data_derived.txt'}, 'generation': {'type': 'Generation', 'activity': {'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/bf727e72-a131-4b0d-a305-15b0cd902a47?rev=1', 'type': ['Activity', 'MyCustomActivity']}}, 'name': 'Derived Dataset from my_dataset'}
The subject on which the study was performed can be added, if any. See the Subject schema for more information.
# Note that a Resource can be used as the value of a property
my_derived_dataset.subject = Resource(
    type=["Subject", "Entity"],
    name="P14-12 Rattus norvegicus Wistar Han",
    species=Resource(id="http://purl.obolibrary.org/obo/NCBITaxon_10116", label="Rattus norvegicus"),
    strain=Resource(id="http://purl.obolibrary.org/obo/RS_0001833", label="Wistar Han"),
    age=Resource(period="Post-natal", value=14, unitCode="days"),
    sex=Resource(id="http://purl.obolibrary.org/obo/PATO_0000384", label="male"),
)
forge.update(my_derived_dataset)
<action> _update_one <succeeded> True
forge.as_json(my_derived_dataset)
{'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/03b6023a-96ab-4e6d-8d76-4027588c548d', 'type': 'Dataset', 'contribution': [{'type': 'Contribution', 'agent': {'id': 'https://www.wikidata.org/wiki/Q7186?rev=1', 'type': 'Person'}}, {'type': 'Contribution', 'agent': {'id': 'https://www.wikidata.org/wiki/Q937?rev=1', 'type': 'Person'}}, {'type': 'Contribution', 'agent': {'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/5748c182-8390-4d2a-b290-9fa174a5e274', 'type': 'Agent'}}, {'type': 'Contribution', 'agent': {'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/88652303-dec0-4501-8352-bb33630e1ef3?rev=1', 'type': 'Person'}}], 'derivation': {'type': 'Derivation', 'entity': {'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/1cab4ebb-6016-4c61-aa6e-d3c1411e8b5a?rev=1', 'type': ['Entity', 'Dataset', 'MyOtherType'], 'name': 'Interesting Dataset'}}, 'distribution': {'type': 'DataDownload', 'atLocation': {'type': 'Location', 'store': {'id': 'https://bluebrain.github.io/nexus/vocabulary/diskStorageDefault', 'type': 'DiskStorage', '_rev': 1}}, 'contentSize': {'unitCode': 'bytes', 'value': 24}, 'contentUrl': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/files/bbp/MyProject/520d4f3e-d3c8-457f-983f-0d5c63e8818d', 'digest': {'algorithm': 'SHA-256', 'value': '0cb8f5c19ee618551fb27872cd24578eeb298963596a10fe6fccfd55c11258a4'}, 'encodingFormat': 'application/txt', 'name': 'my_data_derived.txt'}, 'generation': {'type': 'Generation', 'activity': {'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/bf727e72-a131-4b0d-a305-15b0cd902a47?rev=1', 'type': ['Activity', 'MyCustomActivity']}}, 'name': 'Derived Dataset from my_dataset', 'subject': {'type': ['Subject', 'Entity'], 'age': {'period': 'Post-natal', 'unitCode': 'days', 'value': 14}, 'name': 'P14-12 Rattus norvegicus Wistar Han', 'sex': {'id': 'http://purl.obolibrary.org/obo/PATO_0000384', 'label': 'male'}, 'species': 
{'id': 'http://purl.obolibrary.org/obo/NCBITaxon_10116', 'label': 'Rattus norvegicus'}, 'strain': {'id': 'http://purl.obolibrary.org/obo/RS_0001833', 'label': 'Wistar Han'}}}
my_derived_dataset.license = Resource(  # this is just an example
    id="https://creativecommons.org/licenses/by/4.0",
    label="CC BY 4.0",
    description="You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.",
)
forge.update(my_derived_dataset)
<action> _update_one <succeeded> True
forge.as_json(my_derived_dataset)
{'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/03b6023a-96ab-4e6d-8d76-4027588c548d', 'type': 'Dataset', 'contribution': [{'type': 'Contribution', 'agent': {'id': 'https://www.wikidata.org/wiki/Q7186?rev=1', 'type': 'Person'}}, {'type': 'Contribution', 'agent': {'id': 'https://www.wikidata.org/wiki/Q937?rev=1', 'type': 'Person'}}, {'type': 'Contribution', 'agent': {'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/5748c182-8390-4d2a-b290-9fa174a5e274', 'type': 'Agent'}}, {'type': 'Contribution', 'agent': {'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/88652303-dec0-4501-8352-bb33630e1ef3?rev=1', 'type': 'Person'}}], 'derivation': {'type': 'Derivation', 'entity': {'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/1cab4ebb-6016-4c61-aa6e-d3c1411e8b5a?rev=1', 'type': ['Entity', 'Dataset', 'MyOtherType'], 'name': 'Interesting Dataset'}}, 'distribution': {'type': 'DataDownload', 'atLocation': {'type': 'Location', 'store': {'id': 'https://bluebrain.github.io/nexus/vocabulary/diskStorageDefault', 'type': 'DiskStorage', '_rev': 1}}, 'contentSize': {'unitCode': 'bytes', 'value': 24}, 'contentUrl': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/files/bbp/MyProject/520d4f3e-d3c8-457f-983f-0d5c63e8818d', 'digest': {'algorithm': 'SHA-256', 'value': '0cb8f5c19ee618551fb27872cd24578eeb298963596a10fe6fccfd55c11258a4'}, 'encodingFormat': 'application/txt', 'name': 'my_data_derived.txt'}, 'generation': {'type': 'Generation', 'activity': {'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/bf727e72-a131-4b0d-a305-15b0cd902a47?rev=1', 'type': ['Activity', 'MyCustomActivity']}}, 'license': {'id': 'https://creativecommons.org/licenses/by/4.0', 'label': 'CC BY 4.0', 'description': 'You must give appropriate credit, provide a link to the license, and indicate if changes were made. 
You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.'}, 'name': 'Derived Dataset from my_dataset', 'subject': {'type': ['Subject', 'Entity'], 'age': {'period': 'Post-natal', 'unitCode': 'days', 'value': 14}, 'name': 'P14-12 Rattus norvegicus Wistar Han', 'sex': {'id': 'http://purl.obolibrary.org/obo/PATO_0000384', 'label': 'male'}, 'species': {'id': 'http://purl.obolibrary.org/obo/NCBITaxon_10116', 'label': 'Rattus norvegicus'}, 'strain': {'id': 'http://purl.obolibrary.org/obo/RS_0001833', 'label': 'Wistar Han'}}}
Tagging a dataset is similar to a git tag: it gives a name to the current revision of the dataset so that this version can be retrieved later.
forge.tag(my_derived_dataset, value="releaseV112")
<action> _tag_one <succeeded> True
my_derived_dataset.description="Derived Dataset description"
forge.update(my_derived_dataset)
<action> _update_one <succeeded> True
forge.as_json(my_derived_dataset)
{'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/03b6023a-96ab-4e6d-8d76-4027588c548d', 'type': 'Dataset', 'contribution': [{'type': 'Contribution', 'agent': {'id': 'https://www.wikidata.org/wiki/Q7186?rev=1', 'type': 'Person'}}, {'type': 'Contribution', 'agent': {'id': 'https://www.wikidata.org/wiki/Q937?rev=1', 'type': 'Person'}}, {'type': 'Contribution', 'agent': {'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/5748c182-8390-4d2a-b290-9fa174a5e274', 'type': 'Agent'}}, {'type': 'Contribution', 'agent': {'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/88652303-dec0-4501-8352-bb33630e1ef3?rev=1', 'type': 'Person'}}], 'derivation': {'type': 'Derivation', 'entity': {'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/1cab4ebb-6016-4c61-aa6e-d3c1411e8b5a?rev=1', 'type': ['Entity', 'Dataset', 'MyOtherType'], 'name': 'Interesting Dataset'}}, 'description': 'Derived Dataset description', 'distribution': {'type': 'DataDownload', 'atLocation': {'type': 'Location', 'store': {'id': 'https://bluebrain.github.io/nexus/vocabulary/diskStorageDefault', 'type': 'DiskStorage', '_rev': 1}}, 'contentSize': {'unitCode': 'bytes', 'value': 24}, 'contentUrl': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/files/bbp/MyProject/520d4f3e-d3c8-457f-983f-0d5c63e8818d', 'digest': {'algorithm': 'SHA-256', 'value': '0cb8f5c19ee618551fb27872cd24578eeb298963596a10fe6fccfd55c11258a4'}, 'encodingFormat': 'application/txt', 'name': 'my_data_derived.txt'}, 'generation': {'type': 'Generation', 'activity': {'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/bf727e72-a131-4b0d-a305-15b0cd902a47?rev=1', 'type': ['Activity', 'MyCustomActivity']}}, 'license': {'id': 'https://creativecommons.org/licenses/by/4.0', 'label': 'CC BY 4.0', 'description': 'You must give appropriate credit, provide a link to the license, and indicate if changes were made. 
You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.'}, 'name': 'Derived Dataset from my_dataset', 'subject': {'type': ['Subject', 'Entity'], 'age': {'period': 'Post-natal', 'unitCode': 'days', 'value': 14}, 'name': 'P14-12 Rattus norvegicus Wistar Han', 'sex': {'id': 'http://purl.obolibrary.org/obo/PATO_0000384', 'label': 'male'}, 'species': {'id': 'http://purl.obolibrary.org/obo/NCBITaxon_10116', 'label': 'Rattus norvegicus'}, 'strain': {'id': 'http://purl.obolibrary.org/obo/RS_0001833', 'label': 'Wistar Han'}}}
# The version argument can be specified to retrieve the dataset at a given tag.
result = forge.retrieve(id=my_derived_dataset.id, version="releaseV112")
result != my_derived_dataset
True
# Note that the description is missing because it was added after the tag
forge.as_json(result)
{'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/03b6023a-96ab-4e6d-8d76-4027588c548d', 'type': 'Dataset', 'contribution': [{'type': 'Contribution', 'agent': {'id': 'https://www.wikidata.org/wiki/Q7186?rev=1', 'type': 'Person'}}, {'type': 'Contribution', 'agent': {'id': 'https://www.wikidata.org/wiki/Q937?rev=1', 'type': 'Person'}}, {'type': 'Contribution', 'agent': {'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/5748c182-8390-4d2a-b290-9fa174a5e274', 'type': 'Agent'}}, {'type': 'Contribution', 'agent': {'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/88652303-dec0-4501-8352-bb33630e1ef3?rev=1', 'type': 'Person'}}], 'derivation': {'type': 'Derivation', 'entity': {'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/1cab4ebb-6016-4c61-aa6e-d3c1411e8b5a?rev=1', 'type': ['Entity', 'Dataset', 'MyOtherType'], 'name': 'Interesting Dataset'}}, 'distribution': {'type': 'DataDownload', 'atLocation': {'type': 'Location', 'store': {'id': 'https://bluebrain.github.io/nexus/vocabulary/diskStorageDefault', 'type': 'DiskStorage', '_rev': 1}}, 'contentSize': {'unitCode': 'bytes', 'value': 24}, 'contentUrl': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/files/bbp/MyProject/520d4f3e-d3c8-457f-983f-0d5c63e8818d', 'digest': {'algorithm': 'SHA-256', 'value': '0cb8f5c19ee618551fb27872cd24578eeb298963596a10fe6fccfd55c11258a4'}, 'encodingFormat': 'application/txt', 'name': 'my_data_derived.txt'}, 'generation': {'type': 'Generation', 'activity': {'id': 'https://staging.nexus.ocp.bbp.epfl.ch/v1/resources/bbp/MyProject/_/bf727e72-a131-4b0d-a305-15b0cd902a47?rev=1', 'type': ['Activity', 'MyCustomActivity']}}, 'license': {'id': 'https://creativecommons.org/licenses/by/4.0', 'label': 'CC BY 4.0', 'description': 'You must give appropriate credit, provide a link to the license, and indicate if changes were made. 
You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.'}, 'name': 'Derived Dataset from my_dataset', 'subject': {'type': ['Subject', 'Entity'], 'age': {'period': 'Post-natal', 'unitCode': 'days', 'value': 14}, 'name': 'P14-12 Rattus norvegicus Wistar Han', 'sex': {'id': 'http://purl.obolibrary.org/obo/PATO_0000384', 'label': 'male'}, 'species': {'id': 'http://purl.obolibrary.org/obo/NCBITaxon_10116', 'label': 'Rattus norvegicus'}, 'strain': {'id': 'http://purl.obolibrary.org/obo/RS_0001833', 'label': 'Wistar Han'}}}
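The tag behaves like a name pointing at one revision: later updates create new revisions, and retrieving by tag returns the state at the tagged revision. The toy in-memory model below reproduces that behaviour to make it concrete. It is a conceptual sketch with hypothetical names (`RevisionLog`), not how Blue Brain Nexus stores resources.

```python
import copy


class RevisionLog:
    """Toy model of revisions and tags (conceptual only, not the Nexus implementation)."""

    def __init__(self, initial: dict):
        self.revisions = [copy.deepcopy(initial)]  # revisions[0] is revision 1
        self.tags = {}

    @property
    def rev(self) -> int:
        return len(self.revisions)

    def update(self, changes: dict) -> int:
        """Create a new revision from the latest one plus the given changes."""
        new = copy.deepcopy(self.revisions[-1])
        new.update(changes)
        self.revisions.append(new)
        return self.rev

    def tag(self, name: str) -> None:
        """Point a tag name at the current revision number."""
        self.tags[name] = self.rev

    def retrieve(self, version=None) -> dict:
        """Latest state, or the state at a tag name (str) or revision number (int) in this toy."""
        if version is None:
            return copy.deepcopy(self.revisions[-1])
        rev = self.tags[version] if isinstance(version, str) else version
        return copy.deepcopy(self.revisions[rev - 1])


log = RevisionLog({"name": "Derived Dataset from my_dataset"})
log.tag("releaseV112")
log.update({"description": "Derived Dataset description"})
print("description" in log.retrieve())               # True
print("description" in log.retrieve("releaseV112"))  # False
```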