You might want to work through the start of this tutorial first.
%load_ext autoreload
%autoreload 2

import os

from tf.app import use
from tf.fabric import Fabric

# the lines below are only needed for low-level work outside the app API;
# they are kept here, commented out, for reference
# from tf.volumes import extract, collect
# from tf.core.helpers import unexpanduser as ux

# GH = os.path.expanduser("~/github")
# BH = f"{GH}/CLARIAH/wp6-missieven"

VERSION = "1.0"

# SOURCE = f"{BH}/tf/{VERSION}"
# TARGET = f"{BH}/tf/{VERSION}/_local"
We load the corpus in the usual way:
Aw = use("CLARIAH/wp6-missieven")
Note: we are going to load several volumes and collections too, so instead of storing the handle to the API in a variable with the name A, we choose one with the name Aw.
And for the same reason, we do not use the hoist=globals() argument, so that we do not pollute our namespace.
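Concretely, that means we reach API members such as F and T explicitly through Aw.api. A minimal sketch (F.otype.maxSlot is the standard TF way to get the slot count):

# without hoisting, API members are reached explicitly:
print(Aw.api.F.otype.maxSlot)  # number of slots in the whole work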
We can now extract volumes:
Aw.extract(overwrite=True, show=True, silent="auto")
0.00s Check volumes ...
 | Work consists of 14 volumes:
 | volume 1 : with 352727 slots
 | volume 2 : with 400901 slots
 | volume 3 : with 439609 slots
 | volume 4 : with 366207 slots
 | volume 5 : with 412436 slots
 | volume 6 : with 434227 slots
 | volume 7 : with 393559 slots
 | volume 8 : with 125206 slots
 | volume 9 : with 464578 slots
 | volume 10 : with 637655 slots
 | volume 11 : with 499612 slots
 | volume 12 : with 342699 slots
 | volume 13 : with 371337 slots
 | volume 14 : with 736607 slots
Volume 1 exists and will be recreated
Volume 2 exists and will be recreated
Volume 3 exists and will be recreated
Volume 4 exists and will be recreated
Volume 5 exists and will be recreated
Volume 6 exists and will be recreated
Volume 7 exists and will be recreated
Volume 8 exists and will be recreated
Volume 9 exists and will be recreated
Volume 10 exists and will be recreated
Volume 11 exists and will be recreated
Volume 12 exists and will be recreated
Volume 13 exists and will be recreated
Volume 14 exists and will be recreated
1.20s volumes ok
1.20s Distribute nodes over volumes ...
 | 0.00s volume 1 ...
 |  | 0.00s volume 1 with 352727 slots
 | 0.17s volume 1 with 352727 slots and 392938 nodes ...
 | 0.17s volume 2 ...
 |  | 0.00s volume 2 with 400901 slots
 | 0.36s volume 2 with 400901 slots and 445525 nodes ...
 | 0.36s volume 3 ...
 |  | 0.00s volume 3 with 439609 slots
 | 0.57s volume 3 with 439609 slots and 491639 nodes ...
 | 0.57s volume 4 ...
 |  | 0.00s volume 4 with 366207 slots
 | 0.75s volume 4 with 366207 slots and 412419 nodes ...
 | 0.75s volume 5 ...
 |  | 0.00s volume 5 with 412436 slots
 | 0.95s volume 5 with 412436 slots and 460482 nodes ...
 | 0.95s volume 6 ...
 |  | 0.00s volume 6 with 434227 slots
 | 1.16s volume 6 with 434227 slots and 482936 nodes ...
 | 1.16s volume 7 ...
 |  | 0.00s volume 7 with 393559 slots
 | 1.34s volume 7 with 393559 slots and 434429 nodes ...
 | 1.34s volume 8 ...
 |  | 0.00s volume 8 with 125206 slots
 | 1.40s volume 8 with 125206 slots and 138395 nodes ...
 | 1.40s volume 9 ...
 |  | 0.00s volume 9 with 464578 slots
 | 1.63s volume 9 with 464578 slots and 514626 nodes ...
 | 1.63s volume 10 ...
 |  | 0.00s volume 10 with 637655 slots
 | 1.94s volume 10 with 637655 slots and 700106 nodes ...
 | 1.94s volume 11 ...
 |  | 0.00s volume 11 with 499612 slots
 | 2.17s volume 11 with 499612 slots and 549911 nodes ...
 | 2.17s volume 12 ...
 |  | 0.00s volume 12 with 342699 slots
 | 2.34s volume 12 with 342699 slots and 378918 nodes ...
 | 2.34s volume 13 ...
 |  | 0.00s volume 13 with 371337 slots
 | 2.52s volume 13 with 371337 slots and 413620 nodes ...
 | 2.52s volume 14 ...
 |  | 0.00s volume 14 with 736607 slots
 | 2.88s volume 14 with 736607 slots and 823039 nodes ...
4.08s distribution done
4.08s Remap features ...
 | 0.00s volume 1 with 392938 nodes ...
 | 1.96s volume 2 with 445525 nodes ...
 | 4.12s volume 3 with 491639 nodes ...
 | 6.54s volume 4 with 412419 nodes ...
 | 8.74s volume 5 with 460482 nodes ...
 | 11s volume 6 with 482936 nodes ...
 | 13s volume 7 with 434429 nodes ...
 | 15s volume 8 with 138395 nodes ...
 | 16s volume 9 with 514626 nodes ...
 | 18s volume 10 with 700106 nodes ...
 | 22s volume 11 with 549911 nodes ...
 | 24s volume 12 with 378918 nodes ...
 | 26s volume 13 with 413620 nodes ...
 | 28s volume 14 with 823039 nodes ...
37s remapping done
37s Write volumes as TF datasets
 | 0.00s Writing volume 1
 | 1.52s Writing volume 2
 | 3.20s Writing volume 3
 | 5.07s Writing volume 4
 | 6.63s Writing volume 5
 | 8.38s Writing volume 6
 | 10s Writing volume 7
 | 12s Writing volume 8
 | 12s Writing volume 9
 | 14s Writing volume 10
 | 17s Writing volume 11
 | 19s Writing volume 12
 | 21s Writing volume 13
 | 22s Writing volume 14
1m 02s writing done
1m 02s All done
1 (new) @ ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0/_local/1
2 (new) @ ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0/_local/2
3 (new) @ ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0/_local/3
4 (new) @ ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0/_local/4
5 (new) @ ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0/_local/5
6 (new) @ ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0/_local/6
7 (new) @ ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0/_local/7
8 (new) @ ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0/_local/8
9 (new) @ ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0/_local/9
10 (new) @ ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0/_local/10
11 (new) @ ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0/_local/11
12 (new) @ ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0/_local/12
13 (new) @ ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0/_local/13
14 (new) @ ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0/_local/14
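Now we load a single volume, volume 3, by passing the volume= argument: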
A3 = use("CLARIAH/wp6-missieven", volume=3)
We see it reported that a single volume has been loaded instead of the whole work.
The volume info can be obtained separately by reading the attribute volumeInfo:
print(A3.volumeInfo)
3
When volumes are created, some extra features are generated. They have to do with the relation between the original work and the volume, and with what happens at the boundaries of volumes.
for (feat, info) in A3.isLoaded("owork ointerfrom ointerto", pretty=False).items():
    print(f"\t{feat}: {info['meta']['description']}")
owork: mapping from nodes in the volume to nodes in the work
ointerfrom: all outgoing inter-volume edges
ointerto: all incoming inter-volume edges
Note that each volume has an extra feature: owork. Its value for each node in a volume dataset is the corresponding node in the original work from which the volume is taken.
If you use the volume to compute annotations, and you want to publish these annotations against the original work, the feature owork provides the necessary information to do so.
Suppose annotvx is a dict, mapping some nodes in the volume x to interesting values. Then you apply them to the original work as follows:

{F.owork.v(n): value for (n, value) in annotvx.items()}
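Here is a slightly fuller, runnable sketch against volume 3; annotvx is a hypothetical annotation dict and the node numbers are arbitrary:

F3 = A3.api.F

# hypothetical annotations against some nodes of volume 3
annotvx = {1: "interesting", 2: "check later"}

# remap the volume nodes to the corresponding nodes in the work
annotWork = {F3.owork.v(n): value for (n, value) in annotvx.items()}
print(annotWork)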
There is another important function of owork: when collecting volumes, we may encounter nodes in different volumes that come from a single node in the work. We want to merge these nodes in the collected work, and owork provides the necessary information for that.
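Conceptually, the merge keys nodes by their owork value. A toy illustration in plain Python (not the actual collect() implementation):

# toy data: (volume, node-in-volume, owork value); both nodes stem from work node 42
volumeNodes = [("vol1", 10, 42), ("vol2", 7, 42)]

merged = {}
for (vol, node, oworkValue) in volumeNodes:
    merged.setdefault(oworkValue, []).append((vol, node))

print(merged)  # {42: [('vol1', 10), ('vol2', 7)]}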
ointerto, ointerfrom
Note that we do have features ointerto and ointerfrom.
They are used to store information that spans different volumes: edges from nodes in one volume to nodes in another volume.
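We can peek at these features in the loaded volume. This is a sketch, not part of the original tutorial; whether volume 3 actually carries inter-volume edges depends on the corpus:

Fs3 = A3.api.Fs
N3 = A3.api.N

# count the nodes that carry outgoing resp. incoming inter-volume edge info
for feat in ("ointerfrom", "ointerto"):
    nodes = [n for n in N3.walk() if Fs3(feat).v(n)]
    print(f"{feat}: {len(nodes)} nodes")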
We can collect volumes into new works by means of the collect() method on Aw.
We define three collections out of the volumes of the General Missives:
COLLECTIONS = dict(
    middle=(8,),
    beginning=(1, 2),
    end=(13, 14),
)

for (name, volumes) in COLLECTIONS.items():
    Aw.collect(
        volumes,
        name,
        overwrite=True,
        silent="terse",
    )
We can load the collection in the same way as a volume, but now using the collection= argument:
Ab = use("CLARIAH/wp6-missieven", collection="beginning")
Which volumes have we got?
for b in Ab.api.F.otype.s("volume"):
    print(Ab.api.T.sectionFromNode(b)[0])
1
2
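As a quick sanity check, a sketch assuming the collection simply concatenates the slots of volumes 1 and 2:

Fb = Ab.api.F

# volumes 1 and 2 have 352727 and 400901 slots in the work,
# so the collection should have 352727 + 400901 = 753628 slots
print(Fb.otype.maxSlot)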
There are more ways to work with volumes and collections, and more complexity is handled behind the scenes. To see that at work, consult the volume tutorial of the Hebrew Bible.
CC-BY Dirk Roorda