{panels_fairplus}
:identifier_text: FCB066
:identifier_link: 'https://w3id.org/faircookbook/FCB066'
:difficulty_level: 2
:recipe_type: hands_on
:reading_time_minutes: 15
:intended_audience: principal_investigator, data_manager, data_scientist
:maturity_level: 4
:maturity_indicator: 4
:has_executable_code: yeah
:recipe_name: Minimal Metadata Maturity with ISA
The goal of this tutorial is to show how to package a dataset as a minimal Research Object crate (RO-Crate). In this example, the dataset consists of an ISA JSON-LD document, the associated raw data files, and a computational workflow available as a CWL file.
To do so, we will be using a subset of the Research Object Crate specifications.
Let's get started by getting all necessary modules:
import os
import json
import datetime
import isatools
import uuid
import hashlib
from json import load
from rocrate.rocrate import ROCrate
from rocrate.model.person import Person
from rocrate.model.dataset import Dataset
from rocrate.model.softwareapplication import SoftwareApplication
from rocrate.model.computationalworkflow import ComputationalWorkflow
from rocrate.model.computerlanguage import ComputerLanguage
from rocrate import rocrate_api
With the previous notebooks (recipes FCBXY1 and FCBXY2), we generated several distinct ISA documents.
Here we will be using the RDF serialization, the associated raw data files (dummy FASTQ files), and a computational workflow available as a CWL file.
Instantiating a Research Object and providing basic metadata
ontology = "obo"
a_crate_for_isa = ROCrate()
# a_crate_for_isa.id = "#research_object/" + str(ro_id)
a_crate_for_isa.name = "ISA JSON-LD representation of BII-S-3"
a_crate_for_isa.description = "ISA study serialized as JSON-LD using " + ontology + " ontology mapping"
a_crate_for_isa.keywords = ["ISA", "JSON-LD"]
a_crate_for_isa.license = "https://creativecommons.org/licenses/by/4.0/"
# a_crate_for_isa.creator = Person(a_crate_for_isa, "https://www.orcid.org/0000-0001-9853-5668", {"name": "Philippe Rocca-Serra"})
Here we show how to identify the creator of the crate with an ORCID, by building a Person object and assigning it to the creator property of the RO-Crate object:
a_crate_for_isa.creator = Person(a_crate_for_isa,"https://www.orcid.org/0000-0001-9853-5668")
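To make the link between the crate and its creator more concrete, the sketch below builds, with the standard library only, the kind of JSON-LD that an RO-Crate uses to describe a person and to reference that person from the root dataset. It is an illustration of the data shape rather than the output of the library: the helper names `person_entity` and `set_creator` are hypothetical, and the ORCID and name are the ones used in this recipe.

```python
import json

# The ORCID URL and name are the ones shown in this recipe.
ORCID_URL = "https://www.orcid.org/0000-0001-9853-5668"


def person_entity(identifier, name):
    """Hypothetical helper: describe a person as a JSON-LD entity."""
    return {"@id": identifier, "@type": "Person", "name": name}


def set_creator(root_dataset, person):
    """Hypothetical helper: reference the person from the root dataset.

    Only the identifier is stored on the dataset; the full description
    of the person lives in its own entity.
    """
    root_dataset["creator"] = {"@id": person["@id"]}
    return root_dataset


person = person_entity(ORCID_URL, "Philippe Rocca-Serra")
root = set_creator({"@id": "./", "@type": "Dataset"}, person)

# Both entities would sit side by side in the "@graph" of the metadata file.
print(json.dumps([root, person], indent=2))
```

The design point is that the dataset carries a reference (`@id`) and not a copy of the person record, so the same person can be linked from several entities without duplication.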
Adding the ISA documents to the Research Object crate
# instance_path = os.path.join("./output/BII-S-3-synth/", "isa-new_ids.json")
#
# with open(instance_path, 'r') as instance_file:
# instance = load(instance_file)
# instance_file.close()
isa_json_ld_path = os.path.join("./output/BII-S-3-synth/", "isa-new_ids-BII-S-3-ld-" + ontology + "-v1.json")
isa_nquads_path = os.path.join("./output/BII-S-3-synth/", "isa.ttl")
files = [isa_json_ld_path, isa_nquads_path]
# add each ISA serialization to the crate
for file in files:
    a_crate_for_isa.add_file(file)
ds = Dataset(a_crate_for_isa, "raw_images")
ds.format_id = "http://edamontology.org/format_3604"
ds.datePublished = datetime.datetime.now()
ds.as_jsonld = isa_json_ld_path
a_crate_for_isa.add(ds)
Next, we create a Computational Workflow object and add it to the Research Object
{admonition}
Note that the Computational Workflow may also be represented as an ISA Protocol object.
wf = ComputationalWorkflow(a_crate_for_isa, "metagenomics-sequence-analysis.cwl")
wf.language = "http://edamontology.org/format_3857"
wf.datePublished = datetime.datetime.now()
# compute a SHA-256 checksum of the workflow file
with open("metagenomics-sequence-analysis.cwl", "rb") as f:
    content = f.read()
new_hash = hashlib.sha256(content).hexdigest()
wf.hash = new_hash
a_crate_for_isa.add(wf)
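The checksum above reads the whole workflow file into memory, which is fine for a small CWL document. For larger files, such as real FASTQ data, a chunked variant may be preferable. The following is a minimal sketch under that assumption; the function name `sha256_of_file`, the chunk size, and the throwaway demo file are illustrative and not part of the recipe.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a throwaway file (hypothetical content, not the recipe's workflow).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.cwl")
    with open(demo_path, "wb") as handle:
        handle.write(b"cwlVersion: v1.2\n")
    # A tiny chunk size forces several read() calls, exercising the loop.
    demo_hash = sha256_of_file(demo_path, chunk_size=4)

print(demo_hash)
```

The digest is identical to the one obtained by hashing the full content in one call, so the result could be assigned to `wf.hash` in the same way.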
Writing the Research Object to file
ro_outpath = "./output/BII-S-3-synth/ISA_in_a_ROcrate"
a_crate_for_isa.write_crate(ro_outpath)
with open(os.path.join(ro_outpath, "ro-crate-metadata.json"), "r") as handle:
    # print(handle)
    parsed = json.load(handle)
print(json.dumps(parsed, indent=4, sort_keys=True))
a_crate_for_isa.write_zip(ro_outpath)
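Printing the whole metadata file can be hard to scan. As a rough sketch, the parsed JSON can instead be summarized by grouping the entities of its `@graph` by type. The function name `entities_by_type` is hypothetical, and the `sample` dictionary below is a hand-written stand-in for the `parsed` object, not the actual output of the recipe.

```python
from collections import defaultdict


def entities_by_type(metadata):
    """Group the identifiers found in '@graph' by their '@type'."""
    grouped = defaultdict(list)
    for entity in metadata.get("@graph", []):
        types = entity.get("@type", "Unknown")
        # '@type' may be a single string or a list of strings
        if isinstance(types, str):
            types = [types]
        for type_name in types:
            grouped[type_name].append(entity.get("@id"))
    return dict(grouped)


# Illustrative stand-in for the parsed ro-crate-metadata.json (assumption).
sample = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
         "about": {"@id": "./"}},
        {"@id": "./", "@type": "Dataset",
         "hasPart": [{"@id": "isa.ttl"},
                     {"@id": "metagenomics-sequence-analysis.cwl"}]},
        {"@id": "isa.ttl", "@type": "File"},
        {"@id": "metagenomics-sequence-analysis.cwl",
         "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"]},
    ],
}

summary = entities_by_type(sample)
for type_name, identifiers in sorted(summary.items()):
    print(type_name, identifiers)
```

In the notebook, calling `entities_by_type(parsed)` would give a quick overview of which files, datasets and workflows ended up in the crate.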
Et voilà!
In this recipe, we have briefly introduced RO-Crate as a mechanism to package data together with its associated metadata, using a Python library that offers a minimal implementation of the specifications.
The current iteration of the Python library has certain limitations. For instance, it does not provide the functionality needed to record provenance information. However, this can be accomplished by extending the code.
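One possible direction, sketched here with the standard library only, is to post-process the metadata document and append a schema.org `CreateAction` entity that links a workflow (as `instrument`) to the files it produced (as `result`). This is an assumption about how provenance could be recorded, not a feature of the library used in this recipe; the function name `add_create_action` and the identifiers `#run-1` and `results/output.txt` are hypothetical.

```python
import copy
import datetime


def add_create_action(metadata, action_id, instrument_id, result_ids, end_time):
    """Return a copy of the metadata with a CreateAction entity appended."""
    updated = copy.deepcopy(metadata)  # leave the caller's document untouched
    action = {
        "@id": action_id,
        "@type": "CreateAction",
        "instrument": {"@id": instrument_id},
        "result": [{"@id": result_id} for result_id in result_ids],
        "endTime": end_time.isoformat(),
    }
    updated.setdefault("@graph", []).append(action)
    return updated


# Minimal stand-in for a metadata document (assumption, not the recipe output).
base = {"@graph": [{"@id": "metagenomics-sequence-analysis.cwl", "@type": "File"}]}

with_provenance = add_create_action(
    base,
    action_id="#run-1",                      # hypothetical identifier
    instrument_id="metagenomics-sequence-analysis.cwl",
    result_ids=["results/output.txt"],       # hypothetical output file
    end_time=datetime.datetime(2022, 1, 1, 12, 0, 0),
)

print(with_provenance["@graph"][-1])
```

Working on a deep copy keeps the original document intact, which makes it easier to compare the crate before and after the provenance record is added.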
The key message behind this recipe is that RO-Crate improves on simply zipping a bunch of files together, by adding some semantics describing the different parts that make up an archive.
Also, it is important to bear in mind that the Research Object crate is nascent, and more work is needed to define best practices for its use as well as implementation profiles.
What to read next?