The goal of this tutorial is to show how to go from an ISA document to an equivalent RDF representation using Python tools. It also highlights some limitations of existing libraries and points to alternative options for completing a meaningful conversion to the RDF Turtle format.
This notebook mainly highlights new functionality in the latest ISA-API release (rc10.3), which converts ISA-JSON to ISA-JSON-LD and offers a choice of 3 popular ontological frameworks for semantic anchoring. These frameworks, identified in the code by the keys obo, sdo and wdt, have been chosen for interoperability.
A companion notebook explores the resulting RDF representations using a set of SPARQL queries.
import os
import json
from json import load
import datetime
import isatools
from isatools.convert.json2jsonld import ISALDSerializer
json.load() function
Prior to invoking the ISALDSerializer function, we need to do 3 things. One of them is to select the ontology framework, using one of the following keys:
obo
sdo
wdt
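Because the framework is selected with a plain string, a typo in the key is easy to make. The following is a minimal sketch of a guard that could run before the conversion; `check_ontology_key` and `SUPPORTED_ONTOLOGIES` are hypothetical names introduced here for illustration and are not part of isatools.

```python
# Keys listed in this notebook for the three supported frameworks.
SUPPORTED_ONTOLOGIES = ("obo", "sdo", "wdt")


def check_ontology_key(key):
    """Return the normalised key, or raise ValueError if it is not supported.

    Hypothetical helper for illustration; it is not part of isatools.
    """
    normalised = key.strip().lower()
    if normalised not in SUPPORTED_ONTOLOGIES:
        raise ValueError(
            f"Unknown ontology key {key!r}; expected one of {SUPPORTED_ONTOLOGIES}"
        )
    return normalised


ontology = check_ontology_key("wdt")
print(ontology)
```

Failing early with a clear message is usually preferable to discovering the problem later, once the conversion has started.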
instance_path = os.path.join("./output/BII-S-3-synth/", "isa-new_ids.json")
# The with block closes the file automatically, so no explicit close() is needed.
with open(instance_path, 'r') as instance_file:
    instance = load(instance_file)
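Once the document is loaded, a quick sanity check can confirm that it looks like an ISA investigation before it is converted. The sketch below assumes the usual ISA-JSON layout, in which an investigation holds a list of `studies` and each study holds a list of `assays`. The toy document is a made-up stand-in, not the BII-S-3 content, and `summarise_isa_json` is a hypothetical helper.

```python
import json
import os
import tempfile

# Toy stand-in for an ISA-JSON investigation (NOT the real BII-S-3 content).
toy_investigation = {
    "identifier": "toy-investigation",
    "studies": [
        {
            "identifier": "toy-study",
            "assays": [{"filename": "a_toy_1.txt"}, {"filename": "a_toy_2.txt"}],
        }
    ],
}


def summarise_isa_json(path):
    """Load an ISA-JSON file and count its studies and assays."""
    with open(path, "r", encoding="utf-8") as handle:
        instance = json.load(handle)
    studies = instance.get("studies", [])
    n_assays = sum(len(study.get("assays", [])) for study in studies)
    return {"studies": len(studies), "assays": n_assays}


# Write the toy document to a temporary file, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy-isa.json")
    with open(toy_path, "w", encoding="utf-8") as handle:
        json.dump(toy_investigation, handle)
    summary = summarise_isa_json(toy_path)

print(summary)
```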
ISALDSerializer function
# we now invoke the ISALDSerializer function
ontology = "wdt"
serializer = ISALDSerializer(instance)
serializer.set_ontology(ontology)
serializer.set_instance(instance)
jsonldcontent = serializer.output
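To give an intuition of what semantic anchoring means, the toy sketch below attaches a JSON-LD `@context` to a flat document and then replaces each mapped key with its IRI. It is not how ISALDSerializer works internally: the two-term mapping from ISA-style keys to schema.org properties is an assumption made for illustration, and `add_context` and `expand_flat` are hypothetical helpers.

```python
# Hypothetical term mapping for illustration only; the contexts shipped with
# isatools are far more detailed and are not reproduced here.
TOY_CONTEXT = {
    "title": "https://schema.org/name",
    "description": "https://schema.org/description",
}


def add_context(document, context):
    """Return a copy of `document` with a JSON-LD @context attached."""
    return {"@context": dict(context), **document}


def expand_flat(document):
    """Replace mapped keys by their IRIs (flat documents only).

    A real JSON-LD processor does much more; this only shows the idea.
    """
    context = document.get("@context", {})
    return {
        context.get(key, key): value
        for key, value in document.items()
        if key != "@context"
    }


toy = {"title": "Toy study", "description": "Not real data"}
anchored = add_context(toy, TOY_CONTEXT)
expanded = expand_flat(anchored)
print(expanded)
```

Choosing a different framework amounts to swapping the context, which is why the same ISA-JSON instance can be anchored to several vocabularies.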
Now that the conversion is performed, we can write the resulting ISA-JSON-LD to file:
isa_json_ld_path = os.path.join("./output/BII-S-3-synth/", "BII-S-3-isa-rdf-" + ontology + "-v3.jsonld")
with open(isa_json_ld_path, 'w') as outfile:
    json.dump(jsonldcontent, outfile, ensure_ascii=False, indent=4)
from rdflib import Graph
graph = Graph()
graph.parse(isa_json_ld_path)
print(f"Graph g has {len(graph)} statements.")
# Write turtle file
rdf_path = os.path.join("./output/BII-S-3-synth/", "BII-S-3-isa-rdf-" + ontology + "-v3.ttl")
with open(rdf_path, 'w') as rdf_file:
    rdf_file.write(graph.serialize(format='turtle'))
from rocrate.rocrate import ROCrate
from rocrate.model.person import Person
from rocrate.model.dataset import Dataset
from rocrate.model.softwareapplication import SoftwareApplication
from rocrate.model.computationalworkflow import ComputationalWorkflow
from rocrate.model.computerlanguage import ComputerLanguage
from rocrate import rocrate_api
import uuid
import hashlib
ro_id = uuid.uuid4()
print(ro_id)
a_crate_for_isa = ROCrate()
# a_crate_for_isa.id = "#research_object/" + str(ro_id)
a_crate_for_isa.name = "ISA JSON-LD representation of BII-S-3"
a_crate_for_isa.description = "ISA study serialized as JSON-LD using " + ontology + " ontology mapping"
a_crate_for_isa.keywords = ["ISA", "JSON-LD"]
a_crate_for_isa.license = "https://creativecommons.org/licenses/by/4.0/"
a_crate_for_isa.creator = Person(a_crate_for_isa, "https://www.orcid.org/0000-0001-9853-5668", {"name": "Philippe Rocca-Serra"})
files = [isa_json_ld_path]
# Register each file with the crate.
for file in files:
    a_crate_for_isa.add_file(file)
ds = Dataset(a_crate_for_isa, "raw_images")
ds.format_id="http://edamontology.org/format_3604"
ds.datePublished=datetime.datetime.now()
ds.as_jsonld=isa_json_ld_path
a_crate_for_isa.add(ds)
wf = ComputationalWorkflow(a_crate_for_isa, "metagenomics-sequence-analysis.cwl")
wf.language="http://edamontology.org/format_3857"
wf.datePublished=datetime.datetime.now()
# Read the workflow file and record its SHA-256 digest.
with open("metagenomics-sequence-analysis.cwl", "rb") as f:
    content = f.read()
new_hash = hashlib.sha256(content).hexdigest()
wf.hash = new_hash
a_crate_for_isa.add(wf)
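Reading a whole file into memory to hash it is fine for a small workflow description, but it may not suit large data files. A minimal sketch of a chunked alternative follows; `sha256_of_file` is a hypothetical helper, and the temporary file stands in for a real workflow file.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file, reading it chunk by chunk."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Toy file standing in for a workflow description.
with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy-workflow.cwl")
    payload = b"cwlVersion: v1.0\nclass: Workflow\n"
    with open(toy_path, "wb") as handle:
        handle.write(payload)
    streamed = sha256_of_file(toy_path, chunk_size=8)
    in_memory = hashlib.sha256(payload).hexdigest()

# Both approaches yield the same digest.
print(streamed == in_memory)
```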
ro_outpath = "./output/BII-S-3-synth/ISA_in_a_ROcrate"
a_crate_for_isa.write_crate(ro_outpath)
with open(os.path.join(ro_outpath, "ro-crate-metadata.json"), 'r') as handle:
    parsed = json.load(handle)
print(json.dumps(parsed, indent=4, sort_keys=True))
a_crate_for_isa.write_zip(ro_outpath)
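A zipped crate can be inspected without unpacking it, because the metadata sits in `ro-crate-metadata.json` at the root of the archive. The sketch below builds a toy archive with the standard library and lists the entities declared in its `@graph`. The toy metadata is hand-written for illustration, not the output of the cells above, and `list_crate_entities` is a hypothetical helper.

```python
import json
import os
import tempfile
import zipfile

# Hand-written toy metadata following the RO-Crate 1.1 layout.
toy_metadata = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {"@id": "ro-crate-metadata.json", "@type": "CreativeWork", "about": {"@id": "./"}},
        {"@id": "./", "@type": "Dataset", "name": "Toy crate", "hasPart": [{"@id": "toy.jsonld"}]},
        {"@id": "toy.jsonld", "@type": "File"},
    ],
}


def list_crate_entities(zip_path):
    """Return (@id, @type) pairs read from the metadata file inside a zipped crate."""
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open("ro-crate-metadata.json") as handle:
            metadata = json.load(handle)
    return [(entity["@id"], entity["@type"]) for entity in metadata["@graph"]]


with tempfile.TemporaryDirectory() as tmp_dir:
    toy_zip = os.path.join(tmp_dir, "toy-crate.zip")
    with zipfile.ZipFile(toy_zip, "w") as archive:
        archive.writestr("ro-crate-metadata.json", json.dumps(toy_metadata))
        archive.writestr("toy.jsonld", "{}")
    entities = list_crate_entities(toy_zip)

for entity_id, entity_type in entities:
    print(entity_id, entity_type)
```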
With this recipe, we have briefly introduced RO-Crate as a mechanism for packaging data and associated metadata, using a Python library that offers a minimal implementation of the specifications.
The current iteration of the Python library has certain limitations. For instance, it does not provide the functionality needed to record provenance information. However, this can be accomplished by extending the code.
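One possible way to record provenance, sketched here under stated assumptions, is to post-process the metadata and append a `CreateAction` entity that links an agent, an instrument and a result, following the pattern described in the RO-Crate specification. The code works on a plain dictionary rather than on the Python library's objects. `add_create_action` is a hypothetical helper, and every identifier and date below is a placeholder.

```python
import copy

# Minimal toy metadata; identifiers are placeholders, not real crate content.
toy_metadata = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {"@id": "ro-crate-metadata.json", "@type": "CreativeWork", "about": {"@id": "./"}},
        {"@id": "./", "@type": "Dataset", "hasPart": [{"@id": "toy.jsonld"}]},
        {"@id": "toy.jsonld", "@type": "File"},
    ],
}


def add_create_action(metadata, action_id, agent_id, instrument_id, result_id, end_time):
    """Return a copy of the metadata with a CreateAction entity appended."""
    action = {
        "@id": action_id,
        "@type": "CreateAction",
        "endTime": end_time,
        "agent": {"@id": agent_id},
        "instrument": {"@id": instrument_id},
        "result": {"@id": result_id},
    }
    updated = copy.deepcopy(metadata)  # leave the input untouched
    updated["@graph"].append(action)
    return updated


with_provenance = add_create_action(
    toy_metadata,
    action_id="#toy-conversion",
    agent_id="#toy-agent",
    instrument_id="#toy-converter",
    result_id="toy.jsonld",
    end_time="2021-01-01T00:00:00",
)
print(with_provenance["@graph"][-1]["@type"])
```

In a complete crate, the agent and instrument referenced here would also need their own entries in the `@graph`.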
The key message behind this recipe is that RO-Crate improves on simply zipping a bunch of files together by attaching some semantics to the different parts that make up an archive.
It is also important to bear in mind that the Research Object Crate is nascent and that more work is needed to define best practices and implementation profiles.