i've got a hypothesis that notebooks should have more metadata than display data. further, we can demonstrate that rdf data can be stored in notebooks and consumed by open source tools, specifically rdflib.
from schemata import *; import rdflib, json
we'll demonstrate these points on the github user data. the snippet below makes a request for my public github data.
user = Uri.cache()["https://api.github.com/users{/user}"]
data = user("tonyfast").get().json()
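the `{/user}` part of that url is a uri template expression. a minimal sketch of how it expands, using only the standard library; `expand_path_segments` is a hypothetical helper for illustration, not part of schemata, and it only handles the `{/name}` operator.

```python
import re
from urllib.parse import quote


def expand_path_segments(template, **values):
    """expand only `{/name}` expressions; a toy, not a full uri template engine"""

    def replace(match):
        value = values.get(match.group(1))
        # an undefined variable expands to nothing
        return "" if value is None else "/" + quote(str(value), safe="")

    return re.sub(r"\{/(\w+)\}", replace, template)


url = expand_path_segments("https://api.github.com/users{/user}", user="tonyfast")
print(url)
```

calling the helper without a `user` leaves the bare collection url, which is why one template can cover both cases.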
now we need to annotate this information with rdf metadata. i'd appreciate it if you ignore the messy, roundabout way we are writing the context. it will get better.
class Context(Dict):
    avatar_url: Type__["@id"] + Id__[SCHEMA.image]
    html_url: str(Id__)
    description: SCHEMA.description
    company: SCHEMA.affiliation
    name: SCHEMA.name
ctx = Context__[Context.schema().pop("properties")]; ctx.schema()
{'@context': {'avatar_url': {'@type': '@id', '@id': 'http://schema.org/image'}, 'html_url': '@id', 'description': 'http://schema.org/description', 'company': 'http://schema.org/affiliation', 'name': 'http://schema.org/name'}}
nevertheless, we derive a ctx that expands the github data into json-ld.
jsonld_graph = ctx.expand(data); jsonld_graph
[{'http://schema.org/image': [{'@id': 'https://avatars.githubusercontent.com/u/4236275?v=4'}], 'http://schema.org/affiliation': [{'@value': 'data scientist @Quansight '}], '@id': 'https://github.com/tonyfast', 'http://schema.org/name': [{'@value': 'Tony Fast'}]}]
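to make the expansion less magical, here is a rough sketch of what it does for a flat context like ours. `expand_flat` is a hypothetical helper, not the json-ld expansion algorithm and not how schemata implements `ctx.expand`; it only covers the three shapes of term used above. the record values are copied from the output above, plus two extra keys i've added for illustration.

```python
context = {
    "avatar_url": {"@type": "@id", "@id": "http://schema.org/image"},
    "html_url": "@id",
    "description": "http://schema.org/description",
    "company": "http://schema.org/affiliation",
    "name": "http://schema.org/name",
}


def expand_flat(record, context):
    """expand a flat record with a flat context; a toy for illustration only"""
    node = {}
    for key, value in record.items():
        term = context.get(key)
        if term is None or value is None:
            continue  # keys the context doesn't map, and nulls, are dropped
        if term == "@id":
            node["@id"] = value  # this key names the subject itself
            continue
        if isinstance(term, dict):
            iri, is_reference = term["@id"], term.get("@type") == "@id"
        else:
            iri, is_reference = term, False
        item = {"@id": value} if is_reference else {"@value": value}
        node.setdefault(iri, []).append(item)
    return [node]


record = {
    "login": "tonyfast",  # illustrative key with no mapping in the context
    "avatar_url": "https://avatars.githubusercontent.com/u/4236275?v=4",
    "html_url": "https://github.com/tonyfast",
    "company": "data scientist @Quansight ",
    "name": "Tony Fast",
    "description": None,  # illustrative null value
}
expanded = expand_flat(record, context)
```

the short keys disappear and only full iris remain, which is what lets a generic rdf tool read the data without knowing anything about github.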
now we get funky with some rdflib. jsonld_graph, the output of ctx.expand(data), is json-ld that we parse into an rdflib.Graph.
g = rdflib.Graph()
g.parse(data=json.dumps(jsonld_graph), format=mediatypes.LD.value(mediatypes.ContentMediaType))
<Graph identifier=N63d22d984fff449aa4181440b1185f56 (<class 'rdflib.graph.Graph'>)>
next we'll embed the metadata into our document as turtle that rdflib will discover. this encoding will be hidden from the user; i am printing it for demonstration.
html = String.html()(F"""<script type="text/turtle">
{g.serialize(format="ttl").decode()}</script>"""); print(html); html
<script type="text/turtle"> @prefix ns1: <http://schema.org/> . @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . @prefix xml: <http://www.w3.org/XML/1998/namespace> . @prefix xsd: <http://www.w3.org/2001/XMLSchema#> . <https://github.com/tonyfast> ns1:affiliation "data scientist @Quansight " ; ns1:image <https://avatars.githubusercontent.com/u/4236275?v=4> ; ns1:name "Tony Fast" . </script>
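a browser ignores a script tag whose type it doesn't recognize, so the turtle rides along invisibly until a tool goes looking for it. a minimal standard library sketch of that lookup; `TurtleScripts` and the tiny `page` are made up for illustration, and this is not how rdflib's html parser is implemented.

```python
from html.parser import HTMLParser


class TurtleScripts(HTMLParser):
    """collect the text of every <script type="text/turtle"> element"""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "text/turtle":
            self._inside = True
            self.blocks.append("")

    def handle_data(self, data):
        # script text may arrive in several chunks, so accumulate it
        if self._inside:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False


page = """<html><body><p>hello</p>
<script type="text/turtle">
@prefix ns1: <http://schema.org/> .
<https://github.com/tonyfast> ns1:name "Tony Fast" .
</script>
<script>console.log("not rdf")</script>
</body></html>"""

finder = TurtleScripts()
finder.feed(page)
turtle_blocks = [block.strip() for block in finder.blocks]
print(turtle_blocks[0])
```

the plain javascript tag is skipped and only the turtle block comes back, ready to hand to any turtle parser.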
to ensure we've captured the metadata we will parse it back into an rdflib.Graph in a few ways.
snippet_graph is derived from just the minimal script tag.
snippet_graph = rdflib.Graph()
snippet_graph.parse(data=html, format="html")
<Graph identifier=Nfde680e65278402a9dbc7c5c46d1bc6c (<class 'rdflib.graph.Graph'>)>
nbconvert_graph is derived from a full-fledged webpage of this notebook.
nbconvert_graph = rdflib.Graph()
nbconvert_graph.parse(data=__import__("nbconvert").get_exporter("html")().from_filename("embed-rdf.ipynb")[0], format="html")
<Graph identifier=N8cd4e26b26c74d80b0e0c590d96880f0 (<class 'rdflib.graph.Graph'>)>
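nbviewer_graph is loaded with a single url from the nbviewer service.
nbviewer_graph = rdflib.Graph()
nbviewer_graph.parse("https://nbviewer.jupyter.org/gist/tonyfast/16d3bc82d69890949212b46040bd86e1", format="html")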
<Graph identifier=Nb40aae7c876c475d84a355cfd86db566 (<class 'rdflib.graph.Graph'>)>
finally we verify we got all the data.
def normalize(graph):
    """to something we can compare"""
    return sorted(json.loads(graph.serialize(format="json-ld")), key=len)
the context from the snippet_graph and the local nbconvert translation are the same.
assert normalize(snippet_graph)\
== normalize(nbconvert_graph)
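sorting is needed because a graph has no inherent order, so two serializations of the same triples can list nodes and keys differently. sorting by length works here but could tie on bigger graphs; a sketch of one alternative is to sort on a canonical json string. `canonical` is a hypothetical helper, and it assumes the value lists inside each node already agree in order.

```python
import json


def canonical(nodes):
    """order-independent form of a list of json-ld node objects"""
    return sorted(json.dumps(node, sort_keys=True) for node in nodes)


# the same two nodes written in different orders; values come from the outputs in this post
first = [
    {"@id": "", "http://www.w3.org/ns/md#item": [{"@list": []}]},
    {
        "@id": "https://github.com/tonyfast",
        "http://schema.org/name": [{"@value": "Tony Fast"}],
    },
]
second = [
    {
        "http://schema.org/name": [{"@value": "Tony Fast"}],
        "@id": "https://github.com/tonyfast",
    },
    {"http://www.w3.org/ns/md#item": [{"@list": []}], "@id": ""},
]

same = canonical(first) == canonical(second)
```

comparing `first == second` directly would fail on list order alone, even though both describe the same graph.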
we'll filter the primary metadata from the graphs
metadata = {
    x: sorted(normalize(x), key=len)[-1] for x in (snippet_graph, nbconvert_graph, nbviewer_graph)
}
and notice that they all contain that good, good metadata bout me.
assert metadata[snippet_graph] == metadata[nbconvert_graph] == metadata[nbviewer_graph]
[{'@id': '', 'http://www.w3.org/ns/md#item': [{'@list': []}]}, {'@id': 'https://github.com/tonyfast', 'http://schema.org/affiliation': [{'@value': 'data scientist @Quansight '}], 'http://schema.org/image': [{'@id': 'https://avatars.githubusercontent.com/u/4236275?v=4'}], 'http://schema.org/name': [{'@value': 'Tony Fast'}]}]
[{'@id': '', 'http://www.w3.org/ns/md#item': [{'@list': []}]}, {'@id': 'https://github.com/tonyfast', 'http://schema.org/affiliation': [{'@value': 'data scientist @Quansight '}], 'http://schema.org/image': [{'@id': 'https://avatars.githubusercontent.com/u/4236275?v=4'}], 'http://schema.org/name': [{'@value': 'Tony Fast'}]}]
[{'@id': 'https://nbviewer.jupyter.org/gist/tonyfast/16d3bc82d69890949212b46040bd86e1', 'http://www.w3.org/ns/md#item': [{'@list': []}]}, {'@id': 'https://github.com/tonyfast', 'http://schema.org/affiliation': [{'@value': 'data scientist @Quansight '}], 'http://schema.org/image': [{'@id': 'https://avatars.githubusercontent.com/u/4236275?v=4'}], 'http://schema.org/name': [{'@value': 'Tony Fast'}]}]
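the three outputs differ only in the first node, whose `@id` is the address the document was parsed from (empty for local strings). picking the longest node happens to skip it; a sketch of a more explicit filter selects the node by its subject. `pick` and `SUBJECT` are hypothetical names, and the data is copied from the outputs above.

```python
SUBJECT = "https://github.com/tonyfast"


def pick(nodes, subject=SUBJECT):
    """return the node describing `subject`, or None when it is absent"""
    return next((node for node in nodes if node.get("@id") == subject), None)


person = {
    "@id": "https://github.com/tonyfast",
    "http://schema.org/affiliation": [{"@value": "data scientist @Quansight "}],
    "http://schema.org/image": [{"@id": "https://avatars.githubusercontent.com/u/4236275?v=4"}],
    "http://schema.org/name": [{"@value": "Tony Fast"}],
}
local_nodes = [{"@id": "", "http://www.w3.org/ns/md#item": [{"@list": []}]}, person]
nbviewer_nodes = [
    {
        "@id": "https://nbviewer.jupyter.org/gist/tonyfast/16d3bc82d69890949212b46040bd86e1",
        "http://www.w3.org/ns/md#item": [{"@list": []}],
    },
    dict(person),
]

matches = pick(local_nodes) == pick(nbviewer_nodes)
```

selecting by `@id` keeps working if the page later grows a node with more properties than the one we care about.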