i've got a hypothesis that notebooks should have more metadata than display data. further, we can demonstrate that rdf data can be stored in notebooks and consumed by open source tools, specifically rdflib.
from schemata import *; import rdflib, json
we'll demonstrate these points on the github user data. the snippet below makes a request for my public github data.
user = Uri.cache()["https://api.github.com/users{/user}"]
data = user("tonyfast").get().json()
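under the hood, the `{/user}` form in the uri above is an RFC 6570 path-segment expansion. a minimal, hypothetical sketch of that expansion (schemata's `Uri` handles this for real; `expand_template` here is just an illustration) looks like:

```python
# a toy sketch of rfc 6570 `{/name}` path-segment expansion;
# real implementations handle reserved characters, lists, etc.
def expand_template(template, **params):
    for key, value in params.items():
        template = template.replace("{/%s}" % key, "/" + value)
    return template

url = expand_template("https://api.github.com/users{/user}", user="tonyfast")
print(url)  # https://api.github.com/users/tonyfast
```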
now we need to annotate this information with rdf metadata. i'd appreciate it if you ignore the messy roundabout way we are writing the context. it will get better.
class Context(Dict):
    avatar_url: Type__["@id"] + Id__[SCHEMA.image]
    html_url: str(Id__)
    description: SCHEMA.description
    company: SCHEMA.affiliation
    name: SCHEMA.name
ctx = Context__[Context.schema().pop("properties")]; ctx.schema()
{'@context': {'avatar_url': {'@type': '@id', '@id': 'http://schema.org/image'}, 'html_url': '@id', 'description': 'http://schema.org/description', 'company': 'http://schema.org/affiliation', 'name': 'http://schema.org/name'}}
nevertheless, we derive a ctx that can expand our data into json-ld.
jsonld_graph = ctx.expand(data); jsonld_graph
[{'http://schema.org/image': [{'@id': 'https://avatars.githubusercontent.com/u/4236275?v=4'}], 'http://schema.org/affiliation': [{'@value': 'data scientist @Quansight '}], '@id': 'https://github.com/tonyfast', 'http://schema.org/name': [{'@value': 'Tony Fast'}]}]
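to build intuition for what the expansion did, here is a toy sketch of json-ld expansion with a plain dict context. it only renames keys and wraps values, ignoring most of the real json-ld expansion algorithm; `expand_node` and its context are illustrative, not schemata's api.

```python
# a toy json-ld expansion: map each key through the context,
# wrapping iri values as {"@id": ...} and literals as {"@value": ...}.
context = {
    "avatar_url": {"@type": "@id", "@id": "http://schema.org/image"},
    "html_url": "@id",
    "name": "http://schema.org/name",
}

def expand_node(doc, context):
    expanded = {}
    for key, value in doc.items():
        term = context.get(key)
        if term == "@id":              # this key names the subject node
            expanded["@id"] = value
        elif isinstance(term, dict):   # the value is itself an iri
            expanded[term["@id"]] = [{"@id": value}]
        elif isinstance(term, str):    # a plain literal value
            expanded[term] = [{"@value": value}]
    return [expanded]

doc = {"html_url": "https://github.com/tonyfast", "name": "Tony Fast"}
print(expand_node(doc, context))
```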
now we get funky with some rdflib. the expanded json-ld from our ctx is parsed into a rdflib.Graph
g = rdflib.Graph()
g.parse(data=json.dumps(jsonld_graph), format=mediatypes.LD.value(mediatypes.ContentMediaType))
<Graph identifier=N63d22d984fff449aa4181440b1185f56 (<class 'rdflib.graph.Graph'>)>
next we'll embed the metadata into our document as turtle that rdflib will discover. _this encoding will be hidden from the user, i am printing it for demonstration._
html = String.html()(F"""<script type="text/turtle">
{g.serialize(format="ttl").decode()}</script>"""); print(html); html
<script type="text/turtle">
@prefix ns1: <http://schema.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://github.com/tonyfast> ns1:affiliation "data scientist @Quansight " ;
    ns1:image <https://avatars.githubusercontent.com/u/4236275?v=4> ;
    ns1:name "Tony Fast" .
</script>
to ensure we've captured the metadata we will parse the rdflib.Graph in a few ways.
snippet_graph is derived from just the minimal script tag.
snippet_graph = rdflib.Graph()
snippet_graph.parse(data=html, format="html")
<Graph identifier=Nfde680e65278402a9dbc7c5c46d1bc6c (<class 'rdflib.graph.Graph'>)>
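conceptually, parsing the html boils down to finding `<script type="text/turtle">` tags and handing their bodies to a turtle parser. the discovery half can be sketched with the stdlib alone (`TurtleExtractor` is a hypothetical name, not rdflib's machinery):

```python
from html.parser import HTMLParser

# collect the text content of every <script type="text/turtle"> tag,
# which is the turtle rdflib would go on to parse as triples.
class TurtleExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_turtle, self.blocks = False, []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "text/turtle") in attrs:
            self.in_turtle = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_turtle = False

    def handle_data(self, data):
        if self.in_turtle:
            self.blocks[-1] += data

parser = TurtleExtractor()
parser.feed(
    '<p>hi</p><script type="text/turtle">@prefix s: <http://schema.org/> .</script>'
)
print(parser.blocks)  # ['@prefix s: <http://schema.org/> .']
```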
nbconvert_graph is derived from a full-fledged nbconvert webpage of this notebook.
nbconvert_graph = rdflib.Graph()
nbconvert_graph.parse(data=__import__("nbconvert").get_exporter("html")().from_filename("embed-rdf.ipynb")[0], format="html")
<Graph identifier=N8cd4e26b26c74d80b0e0c590d96880f0 (<class 'rdflib.graph.Graph'>)>
nbviewer_graph is loaded from a single url on the nbviewer service.
nbviewer_graph = rdflib.Graph()
nbviewer_graph.parse("https://nbviewer.jupyter.org/gist/tonyfast/16d3bc82d69890949212b46040bd86e1")
<Graph identifier=Nb40aae7c876c475d84a355cfd86db566 (<class 'rdflib.graph.Graph'>)>
finally we verify we got all the datas
def normalize(graph):
    """to something we can compare"""
    return sorted(json.loads(graph.serialize(format="json-ld")), key=len)
the content from the snippet_graph and the local nbconvert translation is the same.
assert normalize(snippet_graph) == normalize(nbconvert_graph)
we'll filter the primary metadata from the graphs.
metadata = {
    x: sorted(normalize(x), key=len)[-1]
    for x in (snippet_graph, nbconvert_graph, nbviewer_graph)
}
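the `key=len` trick works because `len` of a dict is its number of keys, so the last node after sorting is the most richly described subject. a stdlib-only illustration on hand-written nodes:

```python
# sorting json-ld nodes by key count and taking the last one
# picks the node with the most metadata attached.
nodes = [
    {"@id": ""},
    {"@id": "https://github.com/tonyfast",
     "http://schema.org/name": [{"@value": "Tony Fast"}]},
]
primary = sorted(nodes, key=len)[-1]
print(primary["@id"])  # https://github.com/tonyfast
```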
and notice that they all contain that good, good metadata bout me.
assert metadata[snippet_graph] == metadata[nbconvert_graph] == metadata[nbviewer_graph]
normalize(snippet_graph)
[{'@id': '', 'http://www.w3.org/ns/md#item': [{'@list': []}]}, {'@id': 'https://github.com/tonyfast', 'http://schema.org/affiliation': [{'@value': 'data scientist @Quansight '}], 'http://schema.org/image': [{'@id': 'https://avatars.githubusercontent.com/u/4236275?v=4'}], 'http://schema.org/name': [{'@value': 'Tony Fast'}]}]
normalize(nbconvert_graph)
[{'@id': '', 'http://www.w3.org/ns/md#item': [{'@list': []}]}, {'@id': 'https://github.com/tonyfast', 'http://schema.org/affiliation': [{'@value': 'data scientist @Quansight '}], 'http://schema.org/image': [{'@id': 'https://avatars.githubusercontent.com/u/4236275?v=4'}], 'http://schema.org/name': [{'@value': 'Tony Fast'}]}]
normalize(nbviewer_graph)
[{'@id': 'https://nbviewer.jupyter.org/gist/tonyfast/16d3bc82d69890949212b46040bd86e1', 'http://www.w3.org/ns/md#item': [{'@list': []}]}, {'@id': 'https://github.com/tonyfast', 'http://schema.org/affiliation': [{'@value': 'data scientist @Quansight '}], 'http://schema.org/image': [{'@id': 'https://avatars.githubusercontent.com/u/4236275?v=4'}], 'http://schema.org/name': [{'@value': 'Tony Fast'}]}]