i've got a hypothesis that notebooks should have more metadata than display data. further, we can demonstrate that rdf data can be stored in notebooks and consumed by open source tools, specifically rdflib.
from schemata import *; import rdflib, json
we'll demonstrate these points on the github user data. the snippet below makes a request for my public github data.
user = Uri.cache()["https://api.github.com/users{/user}"]
data = user("tonyfast").get().json()
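under the hood, the `{/user}` form in the uri above is an RFC 6570 path-segment expansion. a minimal, hypothetical sketch of that expansion (schemata's `Uri` handles this for real; `expand_template` here is just an illustration) looks like:

```python
# a toy sketch of rfc 6570 `{/name}` path-segment expansion;
# real implementations handle reserved characters, lists, etc.
def expand_template(template, **params):
    for key, value in params.items():
        template = template.replace("{/%s}" % key, "/" + value)
    return template

url = expand_template("https://api.github.com/users{/user}", user="tonyfast")
print(url)  # https://api.github.com/users/tonyfast
```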
now we need to annotate this information with rdf metadata. i'd appreciate it if you ignore the messy roundabout way we are writing the context. it will get better.
class Context(Dict):
    avatar_url: Type__["@id"] + Id__[SCHEMA.image]
    html_url: str(Id__)
    description: SCHEMA.description
    company: SCHEMA.affiliation
    name: SCHEMA.name
ctx = Context__[Context.schema().pop("properties")]; ctx.schema()
{'@context': {'avatar_url': {'@type': '@id', '@id': 'http://schema.org/image'}, 'html_url': '@id', 'description': 'http://schema.org/description', 'company': 'http://schema.org/affiliation', 'name': 'http://schema.org/name'}}
nevertheless, we derive a ctx that can expand our data into json-ld.
jsonld_graph = ctx.expand(data); jsonld_graph
[{'http://schema.org/image': [{'@id': 'https://avatars.githubusercontent.com/u/4236275?v=4'}], 'http://schema.org/affiliation': [{'@value': 'data scientist @Quansight '}], '@id': 'https://github.com/tonyfast', 'http://schema.org/name': [{'@value': 'Tony Fast'}]}]
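to build intuition for what the expansion did, here is a toy sketch of json-ld expansion with a plain dict context. it only renames keys and wraps values, ignoring most of the real json-ld expansion algorithm; `expand_node` and its context are illustrative, not schemata's api.

```python
# a toy json-ld expansion: map each key through the context,
# wrapping iri values as {"@id": ...} and literals as {"@value": ...}.
context = {
    "avatar_url": {"@type": "@id", "@id": "http://schema.org/image"},
    "html_url": "@id",
    "name": "http://schema.org/name",
}

def expand_node(doc, context):
    expanded = {}
    for key, value in doc.items():
        term = context.get(key)
        if term == "@id":              # this key names the subject node
            expanded["@id"] = value
        elif isinstance(term, dict):   # the value is itself an iri
            expanded[term["@id"]] = [{"@id": value}]
        elif isinstance(term, str):    # a plain literal value
            expanded[term] = [{"@value": value}]
    return [expanded]

doc = {"html_url": "https://github.com/tonyfast", "name": "Tony Fast"}
print(expand_node(doc, context))
```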
now we get funky with some rdflib. the expanded json-ld from our ctx is parsed into a rdflib.Graph
g = rdflib.Graph()
g.parse(data=json.dumps(jsonld_graph), format=mediatypes.LD.value(mediatypes.ContentMediaType))
<Graph identifier=N63d22d984fff449aa4181440b1185f56 (<class 'rdflib.graph.Graph'>)>
next we'll embed the metadata into our document as turtle that rdflib will discover. _this encoding will be hidden from the user, i am printing it for demonstration._
html = String.html()(F"""<script type="text/turtle">
{g.serialize(format="ttl").decode()}</script>"""); print(html); html
<script type="text/turtle">
@prefix ns1: <http://schema.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://github.com/tonyfast> ns1:affiliation "data scientist @Quansight " ;
    ns1:image <https://avatars.githubusercontent.com/u/4236275?v=4> ;
    ns1:name "Tony Fast" .
</script>
to ensure we've captured the metadata we will parse the rdflib.Graph in a few ways.
snippet_graph is derived from just the minimal script tag.
snippet_graph = rdflib.Graph()
snippet_graph.parse(data=html, format="html")
<Graph identifier=Nfde680e65278402a9dbc7c5c46d1bc6c (<class 'rdflib.graph.Graph'>)>
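conceptually, parsing the html boils down to finding `<script type="text/turtle">` tags and handing their bodies to a turtle parser. the discovery half can be sketched with the stdlib alone (`TurtleExtractor` is a hypothetical name, not rdflib's machinery):

```python
from html.parser import HTMLParser

# collect the text content of every <script type="text/turtle"> tag,
# which is the turtle rdflib would go on to parse as triples.
class TurtleExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_turtle, self.blocks = False, []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "text/turtle") in attrs:
            self.in_turtle = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_turtle = False

    def handle_data(self, data):
        if self.in_turtle:
            self.blocks[-1] += data

parser = TurtleExtractor()
parser.feed(
    '<p>hi</p><script type="text/turtle">@prefix s: <http://schema.org/> .</script>'
)
print(parser.blocks)  # ['@prefix s: <http://schema.org/> .']
```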
nbconvert_graph is derived from a full-fledged nbconvert webpage of this notebook.
nbconvert_graph = rdflib.Graph()
nbconvert_graph.parse(data=__import__("nbconvert").get_exporter("html")().from_filename("embed-rdf.ipynb")[0], format="html")
<Graph identifier=N8cd4e26b26c74d80b0e0c590d96880f0 (<class 'rdflib.graph.Graph'>)>
nbviewer_graph is loaded from a single url on the nbviewer service.
nbviewer_graph = rdflib.Graph()
nbviewer_graph.parse("https://nbviewer.jupyter.org/gist/tonyfast/16d3bc82d69890949212b46040bd86e1")
<Graph identifier=Nb40aae7c876c475d84a355cfd86db566 (<class 'rdflib.graph.Graph'>)>
finally we verify we got all the datas
def normalize(graph):
    """to something we can compare"""
    return sorted(json.loads(graph.serialize(format="json-ld")), key=len)
the content from the snippet_graph and the local nbconvert translation is the same.
assert normalize(snippet_graph) == normalize(nbconvert_graph)
we'll filter the primary metadata from the graphs.
metadata = {
    x: sorted(normalize(x), key=len)[-1]
    for x in (snippet_graph, nbconvert_graph, nbviewer_graph)
}
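the `key=len` trick works because `len` of a dict is its number of keys, so the last node after sorting is the most richly described subject. a stdlib-only illustration on hand-written nodes:

```python
# sorting json-ld nodes by key count and taking the last one
# picks the node with the most metadata attached.
nodes = [
    {"@id": ""},
    {"@id": "https://github.com/tonyfast",
     "http://schema.org/name": [{"@value": "Tony Fast"}]},
]
primary = sorted(nodes, key=len)[-1]
print(primary["@id"])  # https://github.com/tonyfast
```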
and notice that they all contain that good, good metadata bout me.
assert metadata[snippet_graph] == metadata[nbconvert_graph] == metadata[nbviewer_graph]
normalize(snippet_graph)
[{'@id': '', 'http://www.w3.org/ns/md#item': [{'@list': []}]}, {'@id': 'https://github.com/tonyfast', 'http://schema.org/affiliation': [{'@value': 'data scientist @Quansight '}], 'http://schema.org/image': [{'@id': 'https://avatars.githubusercontent.com/u/4236275?v=4'}], 'http://schema.org/name': [{'@value': 'Tony Fast'}]}]
normalize(nbconvert_graph)
[{'@id': '', 'http://www.w3.org/ns/md#item': [{'@list': []}]}, {'@id': 'https://github.com/tonyfast', 'http://schema.org/affiliation': [{'@value': 'data scientist @Quansight '}], 'http://schema.org/image': [{'@id': 'https://avatars.githubusercontent.com/u/4236275?v=4'}], 'http://schema.org/name': [{'@value': 'Tony Fast'}]}]
normalize(nbviewer_graph)
[{'@id': 'https://nbviewer.jupyter.org/gist/tonyfast/16d3bc82d69890949212b46040bd86e1', 'http://www.w3.org/ns/md#item': [{'@list': []}]}, {'@id': 'https://github.com/tonyfast', 'http://schema.org/affiliation': [{'@value': 'data scientist @Quansight '}], 'http://schema.org/image': [{'@id': 'https://avatars.githubusercontent.com/u/4236275?v=4'}], 'http://schema.org/name': [{'@value': 'Tony Fast'}]}]