This notebook demonstrates how to integrate Allen datasets into the Blue Brain Knowledge Graph.
The tasks to be demonstrated are the following:
import getpass
import allensdk
from kgforge.core import KnowledgeGraphForge
from kgforge.version import __version__
Check versions
print("Allensdk is", allensdk.__version__, ", and Nexus Forge is", __version__)
Allensdk is 1.7.1 , and Nexus Forge is 0.2.1.dev82+g0b66b09
Please enter your BBP token:
# Prompt for the token instead of hardcoding it: a token pasted into the notebook is leaked with it.
token = getpass.getpass()
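When the notebook is run non-interactively, a prompt is not always practical. The following is a minimal sketch of one way to read the token without writing it into the notebook. The environment variable name `BBP_TOKEN` and the helper `read_token` are hypothetical and not part of Nexus Forge.

```python
import getpass
import os


def read_token(env=os.environ, prompt=getpass.getpass):
    """Return the access token from the environment, or prompt for it.

    BBP_TOKEN is a hypothetical variable name used for this sketch only.
    """
    token = env.get("BBP_TOKEN", "").strip()
    if not token:
        # getpass hides the typed value, so it never shows up in the cell output.
        token = prompt("BBP token: ").strip()
    if not token:
        raise ValueError("an access token is required")
    return token
```

With this helper, the cell above would become `token = read_token()`.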
Note: Initializing the forge may take a few seconds if the source is a directory.
forge = KnowledgeGraphForge("../../configurations/demo-forge-nexus-neuroshapes.yml", token=token)
from allensdk.core.cell_types_cache import CellTypesCache
from allensdk.api.queries.cell_types_api import CellTypesApi
ALLEN_DIR = "allen_cell_types_database"
ctc = CellTypesCache(manifest_file=f"{ALLEN_DIR}/manifest.json")
human_cells = ctc.get_cells(species=[CellTypesApi.HUMAN], require_reconstruction=True)
!ls allen_cell_types_database
cells.json manifest.json
len(human_cells)
152
FROM = 8
TO = 10
human_cell_ids = [x["id"] for x in human_cells][FROM:TO]
human_cell_ids
[527942865, 529807751]
For the picked cells, we check that they are not already integrated. If they are, pick another couple of ids from human_cell_ids in step 2.B.
Note that the forge.format method is used to build the identifier of each patched cell, which we then attempt to retrieve from Nexus.
for id_ in human_cell_ids:
    kg_id = forge.format("identifier", "patchedcells", id_)
    print(kg_id)
    resource = forge.retrieve(kg_id)
    if resource:
        print("> already integrated")
https://bbp.epfl.ch/neurosciencegraph/data/patchedcells/527942865
<action> retrieve
<error> RetrievalError: resource 'https://bbp.epfl.ch/neurosciencegraph/data/patchedcells/527942865' not found for schema 'https://bluebrain.github.io/nexus/schemas/unconstrained.json'
https://bbp.epfl.ch/neurosciencegraph/data/patchedcells/529807751
<action> retrieve
<error> RetrievalError: resource 'https://bbp.epfl.ch/neurosciencegraph/data/patchedcells/529807751' not found for schema 'https://bluebrain.github.io/nexus/schemas/unconstrained.json'
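Picking replacement ids by hand can be avoided by scanning the candidate list until enough unintegrated ids are found. The sketch below illustrates the idea with a stand-in for the Knowledge Graph lookup. The helper `pick_unintegrated` and the `already_registered` set are hypothetical and only simulate the check done above.

```python
def pick_unintegrated(candidate_ids, is_integrated, count=2):
    """Return the first `count` ids for which is_integrated(id) is falsy."""
    picked = []
    for id_ in candidate_ids:
        if len(picked) >= count:
            break
        if not is_integrated(id_):
            picked.append(id_)
    return picked


# Stand-in for the Knowledge Graph: pretend the first id is already registered.
already_registered = {527942865}
candidates = [527942865, 529807751, 596020931]
print(pick_unintegrated(candidates, lambda id_: id_ in already_registered))
```

In the notebook, the `is_integrated` callable could wrap the same `forge.format` and `forge.retrieve` calls used in the loop above, assuming `forge.retrieve` returns a falsy value when the resource is not found, as the `if resource:` test suggests.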
human_cell_reconstructions = [ctc.get_reconstruction(x) for x in human_cell_ids]
2020-06-02 23:31:21,701 allensdk.api.api.retrieve_file_over_http INFO Downloading URL: http://api.brain-map.org/api/v2/well_known_file_download/601947673
2020-06-02 23:31:23,344 allensdk.api.api.retrieve_file_over_http INFO Downloading URL: http://api.brain-map.org/api/v2/well_known_file_download/667320244
import json
with open(f"{ALLEN_DIR}/cells.json") as f:
    allen_cell_types_metadata = json.load(f)
human_cell_metadata = [x for x in allen_cell_types_metadata if x["specimen__id"] in human_cell_ids]
Have a look at a single record
human_cell_metadata[0]
{'cell_reporter_status': None, 'csl__normalized_depth': 0.694465802153644, 'csl__x': 67.0, 'csl__y': 256.0, 'csl__z': 110.0, 'donor__age': '24 yrs', 'donor__disease_state': 'epilepsy', 'donor__id': 527747035, 'donor__name': 'H16.06.008', 'donor__race': 'Hispanic', 'donor__sex': 'Female', 'donor__species': 'Homo Sapiens', 'donor__years_of_seizure_history': '7', 'ef__adaptation': 0.975858514620517, 'ef__avg_firing_rate': 2.71252644713286, 'ef__avg_isi': 368.66, 'ef__f_i_curve_slope': 0.0785947712418301, 'ef__fast_trough_v_long_square': -53.1875, 'ef__peak_t_ramp': 4.21818666666667, 'ef__ri': 136.718824505806, 'ef__tau': 26.8857603438525, 'ef__threshold_i_long_square': 70.0, 'ef__upstroke_downstroke_ratio_long_square': 3.90618458032267, 'ef__vrest': -67.7251434326172, 'ephys_inst_thresh_thumb_path': '/api/v2/well_known_file_download/529903821', 'ephys_thumb_path': '/api/v2/well_known_file_download/529903819', 'erwkf__id': 616980497, 'line_name': '', 'm__biophys': 1, 'm__biophys_all_active': 0, 'm__biophys_perisomatic': 1, 'm__glif': 0, 'morph_thumb_path': '/api/v2/well_known_file_download/618683378', 'nr__average_contraction': 0.845900877489379, 'nr__average_parent_daughter_ratio': 0.930432608365231, 'nr__max_euclidean_distance': 1042.45317093294, 'nr__number_bifurcations': 21, 'nr__number_stems': 6, 'nr__reconstruction_type': 'full', 'nrwkf__id': 601947673, 'si__height': 9632, 'si__path': '/external/humancelltypes/prod44/specimen_527942865/min_xy_527942865.aff', 'si__width': 14842, 'specimen__hemisphere': 'left', 'specimen__id': 527942865, 'specimen__name': 'H16.06.008.01.26.04', 'structure__acronym': 'MTG', 'structure__id': 12141, 'structure__layer': '5', 'structure__name': '"middle temporal gyrus"', 'structure_parent__acronym': 'MTG', 'structure_parent__id': 12141, 'tag__apical': 'truncated', 'tag__dendrite_type': 'spiny'}
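The keys of the record follow a `prefix__field` naming pattern (`donor__`, `specimen__`, `structure__`, and so on). When inspecting a record, it can help to group the fields by prefix. The sketch below is one way to do that. The helper `group_by_prefix` is hypothetical, and the sample holds only a few fields copied from the record above.

```python
from collections import defaultdict


def group_by_prefix(record, separator="__"):
    """Group 'prefix__field' keys of a flat record into nested dictionaries."""
    grouped = defaultdict(dict)
    for key, value in record.items():
        prefix, found, field = key.partition(separator)
        if found:
            grouped[prefix][field] = value
        else:
            # Keys without a separator (e.g. 'line_name') are kept under "".
            grouped[""][key] = value
    return dict(grouped)


# A few fields copied from the record shown above.
sample = {
    "donor__id": 527747035,
    "donor__name": "H16.06.008",
    "donor__sex": "Female",
    "specimen__id": 527942865,
    "structure__acronym": "MTG",
    "line_name": "",
}
nested = group_by_prefix(sample)
print(sorted(nested))
print(nested["donor"])
```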
The forge provides a Dictionary Mapper, which uses mapping files that describe the required transformation from one dictionary to another. The Dictionary Mappings are HJSON files. This notebook has three mappings: Subject, Patched Cell and Neuron Morphology.
!ls -l ../../mappings/allen-database-mappings
total 24
-rw-r--r-- 1 agarcia INTRANET\Domain Users 2059 Jun 2 23:23 NeuronMorphology.hjson
-rw-r--r-- 1 agarcia INTRANET\Domain Users 1043 Jun 2 23:23 PatchedCell.hjson
-rw-r--r-- 1 agarcia INTRANET\Domain Users 277 Jun 2 23:23 Subject.hjson
DIR = "../../mappings/allen-database-mappings"
subject_mapping_file = f"{DIR}/Subject.hjson"
patchedcell_mapping_file = f"{DIR}/PatchedCell.hjson"
neuronmorphology_mapping_file = f"{DIR}/NeuronMorphology.hjson"
from kgforge.specializations.mappings import DictionaryMapping
subject_mapping = DictionaryMapping.load(subject_mapping_file)
patchedcell_mapping = DictionaryMapping.load(patchedcell_mapping_file)
neuronmorphology_mapping = DictionaryMapping.load(neuronmorphology_mapping_file)
The content of one of the mapping files is shown next:
print(subject_mapping)
{
    id: forge.format("identifier", "subjects", x.donor__id)
    type: Subject
    identifier: x.donor__id
    name: x.donor__name
    sex: forge.resolve(x.donor__sex, scope="terms", target="sex")
    species: forge.resolve(x.donor__species, scope="terms", target="species")
}
Inside mapping files, it is possible to use methods from the Forge, such as:
- forge.format : used to format a string using a preconfigured string format (used previously in 2.C)
- forge.resolve: used to retrieve identifiers using a string that is part of the name of the desired resource
An example of the resolver is shown next:
from kgforge.core.commons.strategies import ResolvingStrategy
Check available resolvers
forge.resolvers()
Available scopes:
 - agent :
   - resolver: AgentResolver
     - targets: agents
 - ontology :
   - resolver: OntologyResolver
     - targets: terms
 - terms :
   - resolver: DemoResolver
     - targets: species,sex,structure-layer
Resolve the identifier for male in the terms scope and sex as target
print(forge.resolve("male", scope="terms", target="sex"))
{
    id: http://purl.obolibrary.org/obo/PATO_0000384
    label: male
}
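To make the rules of the Subject mapping more concrete, the sketch below applies the same transformation to one record in plain Python. It is not how the Dictionary Mapper is implemented. The functions `format_identifier`, `resolve_sex` and `map_subject` are hypothetical stand-ins for `forge.format`, `forge.resolve` and the mapping itself, and the species rule is left out because the resolved species identifier does not appear in this notebook.

```python
# Hypothetical stand-ins for forge.format and forge.resolve, for illustration only.
IDENTIFIER_BASE = "https://bbp.epfl.ch/neurosciencegraph/data"

# "male" is the identifier printed by the resolver example above; "female" is the
# corresponding PATO term and is an assumption about what the resolver returns.
SEX_TERMS = {
    "male": "http://purl.obolibrary.org/obo/PATO_0000384",
    "female": "http://purl.obolibrary.org/obo/PATO_0000383",
}


def format_identifier(kind, value):
    """Build an identifier such as <base>/subjects/<value>."""
    return f"{IDENTIFIER_BASE}/{kind}/{value}"


def resolve_sex(text):
    """Look up a sex label, ignoring case and surrounding whitespace."""
    label = text.strip().lower()
    return {"id": SEX_TERMS[label], "label": label}


def map_subject(x):
    """Mirror the rules of the Subject mapping for one metadata record."""
    return {
        "id": format_identifier("subjects", x["donor__id"]),
        "type": "Subject",
        "identifier": x["donor__id"],
        "name": x["donor__name"],
        "sex": resolve_sex(x["donor__sex"]),
    }


# Donor fields copied from the metadata record shown earlier.
record = {"donor__id": 527747035, "donor__name": "H16.06.008", "donor__sex": "Female"}
subject = map_subject(record)
print(subject["id"])
```

The printed identifier has the same form as the Subject identifiers that appear in the mapped resources.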
It is possible to provide a list of mappings to be applied to a single dataset.
mappings = [subject_mapping, patchedcell_mapping, neuronmorphology_mapping]
resources = forge.map(human_cell_metadata, mappings)
Check the created resources
len(resources)
6
print(resources[2])
{
    id: https://bbp.epfl.ch/neurosciencegraph/data/neuronmorphologies/527942865
    type: NeuronMorphology
    apicalDendrite: truncated
    brainLocation:
    {
        type: BrainLocation
        brainRegion:
        {
            id: http://api.brain-map.org/api/v2/data/Structure/12141
            label: MTG
        }
        coordinatesInBrainAtlas:
        {
            valueX: 67.0
            valueY: 256.0
            valueZ: 110.0
        }
        layer:
        {
            id: http://purl.obolibrary.org/obo/UBERON_0005394
            label: layer 5
        }
    }
    contribution:
    {
        type: Contribution
        agent:
        {
            id: https://www.grid.ac/institutes/grid.417881.3
            type: Organization
        }
    }
    derivation:
    [
        {
            type: Derivation
            entity:
            {
                id: https://bbp.epfl.ch/neurosciencegraph/data/subjects/527747035
                type: Subject
            }
        }
        {
            type: Derivation
            entity:
            {
                id: https://bbp.epfl.ch/neurosciencegraph/data/patchedcells/527942865
                type: PatchedCell
            }
        }
    ]
    distribution: LazyAction(operation=Store.upload, args=['./allen_cell_types_database/specimen_527942865/reconstruction.swc', 'application/swc'])
    identifier: 527942865
    name: H16.06.008.01.26.04
    subject:
    {
        id: https://bbp.epfl.ch/neurosciencegraph/data/subjects/527747035
        type: Subject
    }
}
forge.register(resources)
<count> 6
<action> _register_many
<succeeded> True
print(resources[1])
{
    context: https://bbp.neuroshapes.org
    id: https://bbp.epfl.ch/neurosciencegraph/data/patchedcells/527942865
    type: PatchedCell
    brainLocation:
    {
        type: BrainLocation
        brainRegion:
        {
            id: http://api.brain-map.org/api/v2/data/Structure/12141
            label: MTG
        }
    }
    contribution:
    {
        type: Contribution
        agent:
        {
            id: https://www.grid.ac/institutes/grid.417881.3
            type: Organization
        }
    }
    derivation:
    {
        type: Derivation
        entity:
        {
            id: https://bbp.epfl.ch/neurosciencegraph/data/subjects/527747035
            type: Subject
        }
    }
    identifier: 527942865
    name: H16.06.008.01.26.04
    subject:
    {
        id: https://bbp.epfl.ch/neurosciencegraph/data/subjects/527747035
        type: Subject
    }
}
If you know the exact ID of a resource, you can retrieve it as done in 2.C. Otherwise, use the search() method. To search for resources, start by picking a type from the available types.
To build a search based on the PatchedCell structure, use the paths() method, which loads the structure of the given type into a Python object whose fields can be accessed with auto-completion.
Next, the p object holds the properties of PatchedCell and is used to create a search.
p = forge.paths("PatchedCell")
results = forge.search(p.type == "PatchedCell")
len(results)
10
DISPLAY_LIMIT = 25
forge.as_dataframe(results[:DISPLAY_LIMIT])
| | id | type | brainLocation.type | brainLocation.brainRegion.id | brainLocation.brainRegion.label | contribution.type | contribution.agent.id | contribution.agent.type | derivation.type | derivation.entity.id | identifier | name | subject.id | subject.type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | https://bbp.epfl.ch/neurosciencegraph/data/pat... | PatchedCell | BrainLocation | http://api.brain-map.org/api/v2/data/Structure... | MTG | Contribution | https://www.grid.ac/institutes/grid.417881.3 | Organization | Derivation | https://bbp.epfl.ch/neurosciencegraph/data/sub... | 596020931 | H17.06.009.11.04.02 | https://bbp.epfl.ch/neurosciencegraph/data/sub... | Subject |
1 | https://bbp.epfl.ch/neurosciencegraph/data/pat... | PatchedCell | BrainLocation | http://api.brain-map.org/api/v2/data/Structure... | MTG | Contribution | https://www.grid.ac/institutes/grid.417881.3 | Organization | Derivation | https://bbp.epfl.ch/neurosciencegraph/data/sub... | 519832676 | H16.03.001.01.09.01 | https://bbp.epfl.ch/neurosciencegraph/data/sub... | Subject |
2 | https://bbp.epfl.ch/neurosciencegraph/data/pat... | PatchedCell | BrainLocation | http://api.brain-map.org/api/v2/data/Structure... | AnG | Contribution | https://www.grid.ac/institutes/grid.417881.3 | Organization | Derivation | https://bbp.epfl.ch/neurosciencegraph/data/sub... | 569095789 | H17.06.004.11.05.04 | https://bbp.epfl.ch/neurosciencegraph/data/sub... | Subject |
3 | https://bbp.epfl.ch/neurosciencegraph/data/pat... | PatchedCell | BrainLocation | http://api.brain-map.org/api/v2/data/Structure... | MTG | Contribution | https://www.grid.ac/institutes/grid.417881.3 | Organization | Derivation | https://bbp.epfl.ch/neurosciencegraph/data/sub... | 528706755 | H16.06.009.01.01.15.01 | https://bbp.epfl.ch/neurosciencegraph/data/sub... | Subject |
4 | https://bbp.epfl.ch/neurosciencegraph/data/pat... | PatchedCell | BrainLocation | http://api.brain-map.org/api/v2/data/Structure... | MTG | Contribution | https://www.grid.ac/institutes/grid.417881.3 | Organization | Derivation | https://bbp.epfl.ch/neurosciencegraph/data/sub... | 616647103 | H17.03.005.11.09.02 | https://bbp.epfl.ch/neurosciencegraph/data/sub... | Subject |
5 | https://bbp.epfl.ch/neurosciencegraph/data/pat... | PatchedCell | BrainLocation | http://api.brain-map.org/api/v2/data/Structure... | FroL | Contribution | https://www.grid.ac/institutes/grid.417881.3 | Organization | Derivation | https://bbp.epfl.ch/neurosciencegraph/data/sub... | 531520637 | H16.06.007.01.05.03 | https://bbp.epfl.ch/neurosciencegraph/data/sub... | Subject |
6 | https://bbp.epfl.ch/neurosciencegraph/data/pat... | PatchedCell | BrainLocation | http://api.brain-map.org/api/v2/data/Structure... | MTG | Contribution | https://www.grid.ac/institutes/grid.417881.3 | Organization | Derivation | https://bbp.epfl.ch/neurosciencegraph/data/sub... | 542143598 | H16.03.008.11.11.03 | https://bbp.epfl.ch/neurosciencegraph/data/sub... | Subject |
7 | https://bbp.epfl.ch/neurosciencegraph/data/pat... | PatchedCell | BrainLocation | http://api.brain-map.org/api/v2/data/Structure... | FroL | Contribution | https://www.grid.ac/institutes/grid.417881.3 | Organization | Derivation | https://bbp.epfl.ch/neurosciencegraph/data/sub... | 531520401 | H16.06.007.01.07.02 | https://bbp.epfl.ch/neurosciencegraph/data/sub... | Subject |
8 | https://bbp.epfl.ch/neurosciencegraph/data/pat... | PatchedCell | BrainLocation | http://api.brain-map.org/api/v2/data/Structure... | MTG | Contribution | https://www.grid.ac/institutes/grid.417881.3 | Organization | Derivation | https://bbp.epfl.ch/neurosciencegraph/data/sub... | 527942865 | H16.06.008.01.26.04 | https://bbp.epfl.ch/neurosciencegraph/data/sub... | Subject |
9 | https://bbp.epfl.ch/neurosciencegraph/data/pat... | PatchedCell | BrainLocation | http://api.brain-map.org/api/v2/data/Structure... | MTG | Contribution | https://www.grid.ac/institutes/grid.417881.3 | Organization | Derivation | https://bbp.epfl.ch/neurosciencegraph/data/sub... | 529807751 | H16.06.010.01.03.04.01 | https://bbp.epfl.ch/neurosciencegraph/data/sub... | Subject |
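Once the results are in tabular form, a quick summary helps to check what was integrated. As a minimal sketch, the snippet below counts the patched cells per brain region. The labels are copied by hand from the ten rows above rather than read from the search results, so the snippet runs without access to Nexus.

```python
from collections import Counter

# Brain region labels copied from the ten rows of the table above.
region_labels = ["MTG", "MTG", "AnG", "MTG", "MTG", "FroL", "MTG", "FroL", "MTG", "MTG"]

counts = Counter(region_labels)
for label, count in counts.most_common():
    print(label, count)
```

With the data frame returned by `forge.as_dataframe`, the same labels should be available in the `brainLocation.brainRegion.label` column.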
! rm -R allen_cell_types_database