You might want to consider the start of this tutorial first. Short introductions to other TF datasets are available as well.
%load_ext autoreload
%autoreload 2
import os
from tf.app import use
from tf.fabric import Fabric
from tf.volumes import extract, collect
from tf.core.files import unexpanduser as ux
GH = os.path.expanduser("~/github")
BH = f"{GH}/ETCBC/bhsa"
VERSION = "2021"
SOURCE = f"{BH}/tf/{VERSION}"
TARGET = f"{BH}/tf/{VERSION}/_local"
We use the Hebrew Bible as our work. The volumes of a work are lists of its top-level sections; volumes may have a name.
We define three volumes out of the smallest books of the Bible:
VOLUMES = dict(
tiny=("Obadiah", "Nahum", "Haggai", "Habakkuk", "Jonah", "Micah"),
small=("Malachi", "Joel"),
medium=("Ezra",),
)
COLLECTION = "prophets"
We can work with works through several TF APIs: A = use(work), TF = Fabric(locations, modules), and the functions extract() and collect().
We show all ways of doing it.
A = use()
If we load the BHSA with the advanced API, as in A = use("ETCBC/bhsa", ...), we also get some standard modules, such as phono for phonological transcriptions and parallels for cross-references between similar passages.
When we split the BHSA into volumes, these features end up in the volumes as well.
We load the BHSA in the advanced way:
Aw = use("ETCBC/bhsa:clone", checkout="clone")
Locating corpus resources ...
Name | # of nodes | # slots/node | % coverage |
---|---|---|---|
book | 39 | 10938.21 | 100 |
chapter | 929 | 459.19 | 100 |
lex | 9230 | 46.22 | 100 |
verse | 23213 | 18.38 | 100 |
half_verse | 45179 | 9.44 | 100 |
sentence | 63717 | 6.70 | 100 |
sentence_atom | 64514 | 6.61 | 100 |
clause | 88131 | 4.84 | 100 |
clause_atom | 90704 | 4.70 | 100 |
phrase | 253203 | 1.68 | 100 |
phrase_atom | 267532 | 1.59 | 100 |
subphrase | 113850 | 1.42 | 38 |
word | 426590 | 1.00 | 100 |
We check that the features of interest are loaded:
Aw.isLoaded(features="lex phono crossref")
crossref edge (int) 🆗 links between similar passages
lex node (str) ✅ lexeme consonantal-transliterated (B R>CJT/ BR>[ >LHJM/)
phono node (str) 🆗 phonological transcription (bᵊ rēšˌîṯ bārˈā ʔᵉlōhˈîm)
We can now extract volumes by using the extract() method on the app object, which is held in the variable Aw.

Note: we are going to load several volumes and collections too, so instead of storing the handle to the API in a variable named A, we choose one named Aw.
volumes = Aw.extract(VOLUMES, overwrite=True)
0.00s Check volumes ... | Volume tiny exists and will be recreated | Volume small exists and will be recreated | Volume medium exists and will be recreated | Work consists of 39 books: | book Genesis : with 28764 slots | book Exodus : with 23748 slots | book Leviticus : with 17099 slots | book Numbers : with 23188 slots | book Deuteronomy : with 20128 slots | book Joshua : with 14526 slots | book Judges : with 14086 slots | book 1_Samuel : with 18929 slots | book 2_Samuel : with 15612 slots | book 1_Kings : with 18685 slots | book 2_Kings : with 17307 slots | book Isaiah : with 22931 slots | book Jeremiah : with 29736 slots | book Ezekiel : with 26182 slots | book Hosea : with 3146 slots | book Joel : with 1318 slots | book Amos : with 2780 slots | book Obadiah : with 392 slots | book Jonah : with 985 slots | book Micah : with 1895 slots | book Nahum : with 746 slots | book Habakkuk : with 897 slots | book Zephaniah : with 1037 slots | book Haggai : with 877 slots | book Zechariah : with 4471 slots | book Malachi : with 1187 slots | book Psalms : with 25372 slots | book Job : with 10912 slots | book Proverbs : with 8859 slots | book Ruth : with 1802 slots | book Song_of_songs : with 1682 slots | book Ecclesiastes : with 4233 slots | book Lamentations : with 1945 slots | book Esther : with 4621 slots | book Daniel : with 8072 slots | book Ezra : with 5268 slots | book Nehemiah : with 7842 slots | book 1_Chronicles : with 15566 slots | book 2_Chronicles : with 19764 slots 0.09s volumes ok 0.09s Distribute nodes over volumes ... | 0.00s volume tiny ... | | 0.00s book Obadiah with 392 slots | | 0.00s book Nahum with 746 slots | | 0.00s book Haggai with 877 slots | | 0.00s book Habakkuk with 897 slots | | 0.00s book Jonah with 985 slots | | 0.00s book Micah with 1895 slots | 0.01s volume tiny with 5792 slots and 21779 nodes ... | 0.01s volume small ... | | 0.00s book Malachi with 1187 slots | | 0.00s book Joel with 1318 slots | 0.01s volume small with 2505 slots and 9495 nodes ... | 0.01s volume medium ... | | 0.00s book Ezra with 5268 slots | 0.02s volume medium with 5268 slots and 17286 nodes ... 0.11s distribution done 0.11s Remap features ... | 0.00s volume tiny with 21779 nodes ... | 0.17s volume small with 9495 nodes ... | 0.24s volume medium with 17286 nodes ... 0.45s remapping done 0.45s Write volumes as TF datasets | 0.00s Writing volume tiny | 0.14s Writing volume small | 0.20s Writing volume medium 0.77s writing done 0.77s All done
The extract() method returns basic information about the volumes: their location on disk.
if volumes:
for (name, info) in volumes.items():
loc = info["location"]
new = "(new) " if info["new"] else "(existing)"
print(f"volume {name:<7}: {new} at {ux(loc)}")
else:
print(volumes)
volume medium : (new) at ~/github/ETCBC/bhsa/tf/2021/_local/medium
volume small  : (new) at ~/github/ETCBC/bhsa/tf/2021/_local/small
volume tiny   : (new) at ~/github/ETCBC/bhsa/tf/2021/_local/tiny
We load the volumes separately.
For each volume we get a handle, which we store in a dictionary As, keyed by the volume name.
As = {}
for name in volumes:
As[name] = use("ETCBC/bhsa:clone", checkout="clone", version="2021", volume=name)
Locating corpus resources ...
Name | # of nodes | # slots/node | % coverage |
---|---|---|---|
book | 1 | 5268.00 | 100 |
chapter | 10 | 526.80 | 100 |
verse | 280 | 18.81 | 100 |
sentence | 491 | 10.73 | 100 |
half_verse | 492 | 10.71 | 100 |
sentence_atom | 506 | 10.41 | 100 |
clause | 824 | 6.39 | 100 |
clause_atom | 870 | 6.06 | 100 |
lex | 991 | 5.32 | 100 |
phrase | 2385 | 2.21 | 100 |
phrase_atom | 2730 | 1.93 | 100 |
subphrase | 2438 | 1.40 | 65 |
word | 5268 | 1.00 | 100 |
Locating corpus resources ...
Name | # of nodes | # slots/node | % coverage |
---|---|---|---|
book | 2 | 1252.50 | 100 |
chapter | 7 | 357.86 | 100 |
verse | 128 | 19.57 | 100 |
half_verse | 253 | 9.90 | 100 |
sentence | 450 | 5.57 | 100 |
sentence_atom | 461 | 5.43 | 100 |
clause | 582 | 4.30 | 100 |
lex | 587 | 4.27 | 100 |
clause_atom | 600 | 4.17 | 100 |
phrase | 1641 | 1.53 | 100 |
phrase_atom | 1681 | 1.49 | 100 |
subphrase | 598 | 1.36 | 32 |
word | 2505 | 1.00 | 100 |
Locating corpus resources ...
Name | # of nodes | # slots/node | % coverage |
---|---|---|---|
book | 6 | 965.33 | 100 |
chapter | 20 | 289.60 | 100 |
verse | 315 | 18.39 | 100 |
half_verse | 623 | 9.30 | 100 |
sentence | 1032 | 5.61 | 100 |
sentence_atom | 1046 | 5.54 | 100 |
lex | 1173 | 4.94 | 100 |
clause | 1399 | 4.14 | 100 |
clause_atom | 1426 | 4.06 | 100 |
phrase | 3774 | 1.53 | 100 |
phrase_atom | 3911 | 1.48 | 100 |
subphrase | 1262 | 1.30 | 28 |
word | 5792 | 1.00 | 100 |
We see it reported that single volumes have been loaded instead of the whole work.
The volume info can also be obtained separately by reading the attribute volumeInfo, either on the A object or on the TF object:
for name in volumes:
print(As[name].volumeInfo)
medium:Ezra
small:Malachi-Joel
tiny:Obadiah-Nahum-Haggai-Habakkuk-Jonah-Micah
When volumes are created, some extra features are generated, which have to do with the relation between the original work and the volume, and what happens at the boundaries of volumes.
for name in volumes:
print(name)
for (feat, info) in As[name].isLoaded("owork ointerfrom ointerto", pretty=False).items():
print(f"\t{feat}: {info['meta']['description']}")
medium
	owork: mapping from nodes in the volume to nodes in the work
	ointerfrom: all outgoing inter-volume edges
	ointerto: all incoming inter-volume edges
small
	owork: mapping from nodes in the volume to nodes in the work
	ointerfrom: all outgoing inter-volume edges
	ointerto: all incoming inter-volume edges
tiny
	owork: mapping from nodes in the volume to nodes in the work
	ointerfrom: all outgoing inter-volume edges
	ointerto: all incoming inter-volume edges
owork
Note that each volume has an extra feature: owork. Its value for each node in a volume dataset is the corresponding node in the original work from which the volume is taken.
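A tiny illustration (a sketch, using the volume handles in As defined above): slot 1 of volume tiny is its first word, which lies in Obadiah, and owork tells us which node that word is in the full work.

Ft = As["tiny"].api.F
# owork gives, for a node in the volume, its node number in the full work
print(Ft.owork.v(1))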
If you use the volume to compute annotations,
and you want to publish these annotations against the original work,
the feature owork
provides the necessary information to do so.
Suppose annotvx is a dict, mapping some nodes in the volume x to interesting values.
Then you can apply them to the original work as follows:

{F.owork.v(n): value for (n, value) in annotvx.items()}
There is another important function of owork: when collecting volumes, we may encounter nodes in different volumes that come from a single node in the work. We want to merge these nodes in the collected work, and owork provides the information needed for that.
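We can see this sharing concretely with a small sketch over the volume handles in As: group the lexeme nodes of all volumes by the work node that owork assigns them; every group with more than one member is a lexeme that occurs in several volumes and will be merged upon collecting.

fromWork = {}
for (name, Av) in As.items():
    Fv = Av.api.F
    for lx in Fv.otype.s("lex"):
        # group volume lexeme nodes by their node in the original work
        fromWork.setdefault(Fv.owork.v(lx), []).append((name, lx))
shared = {w: members for (w, members) in fromWork.items() if len(members) > 1}
print(f"{len(shared)} lexemes occur in more than one volume")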
ointerto, ointerfrom
Note that we do have the features ointerto and ointerfrom.
We'll come back to them later.
We can collect volumes into new works by means of the collect() method on Aw.
Let's collect all volumes just created.
Aw.collect(
tuple(volumes),
COLLECTION,
overwrite=True,
)
Collection prophets exists and will be recreated 0.00s Loading volume medium from ~/github/ETCBC/bhsa/tf/2021/_local/medium ... 0.03s Feature overview: 85 for nodes; 3 for edges; 2 configs; 9 computed 0.05s Loading volume small from ~/github/ETCBC/bhsa/tf/2021/_local/small ... 0.02s Feature overview: 85 for nodes; 3 for edges; 2 configs; 9 computed 0.08s Loading volume tiny from ~/github/ETCBC/bhsa/tf/2021/_local/tiny ... 0.04s Feature overview: 85 for nodes; 3 for edges; 2 configs; 9 computed 0.14s inspect metadata ... 0.14s metadata sorted out 0.14s check nodetypes ... | volume medium | volume small | volume tiny 0.14s node types ok 0.14s Collect nodes from volumes ... | 0.00s Check against overlapping slots ... | | medium : 5268 slots | | small : 2505 slots | | tiny : 5792 slots | 0.01s no overlap | 0.01s Group non-slot nodes by type | | medium : 5269- 17286 | | small : 2506- 9495 | | tiny : 5793- 21779 | 0.01s Mapping nodes from volume to/from work ... | | book : 13566 - 13574 | | chapter : 13575 - 13611 | | clause : 13612 - 16416 | | clause_atom : 16417 - 19312 | | half_verse : 19313 - 20680 | | phrase : 20681 - 28480 | | phrase_atom : 28481 - 36802 | | sentence : 36803 - 38775 | | sentence_atom : 38776 - 40788 | | subphrase : 40789 - 45086 | | verse : 45087 - 45809 | | lex : 45810 - 47884 | 0.02s The new work has 47884 nodes of which 13565 slots 0.17s collection done 0.17s remap features ... 0.42s remapping done 0.42s write work as TF data set 0.72s writing done 0.72s done
True
We can load the collection in the same way as a volume, but now using the collection= parameter:
Ac = use("ETCBC/bhsa:clone", checkout="clone", version="2021", collection=COLLECTION)
Locating corpus resources ...
| 0.03s T otype from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.30s T oslots from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T book@ar from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T book@he from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.04s T lex from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T qere_utf8 from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T qere from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T chapter from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.04s T phono from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.04s T g_word from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T book@ur from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T book@yo from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T book@pt from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T verse from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T book@en from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T book@am from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.03s T trailer from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T book@tr from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.03s T g_lex from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T book@da from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.03s T phono_trailer from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T book@el from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.04s T voc_lex_utf8 from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T qere_trailer from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T qere_trailer_utf8 from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T book@bn from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T book@hi from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T book@ru from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.04s T lex_utf8 from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T book@de from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T book@fa from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.04s T g_word_utf8 from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T book@ja from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.03s T g_lex_utf8 from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T book from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T book@nl from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T book@id from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T book@syc from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.03s T g_cons_utf8 from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T book@fr from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T book@pa from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T book@es from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.03s T g_cons from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T book@sw from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.03s T trailer_utf8 from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T book@zh from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T book@ko from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T book@la from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | | 0.01s C __levels__ from otype, oslots, otext | | 0.23s C __order__ from otype, oslots, __levels__ | | 0.01s C __rank__ from otype, __order__ | | 0.50s C __levUp__ from otype, oslots, __rank__ | | 0.32s C __levDown__ from otype, __levUp__, __rank__ | | 0.04s C __characters__ from otext | | 0.10s C __boundary__ from otype, oslots, __rank__ | | 0.00s C __sections__ from otype, oslots, otext, __levUp__, __levels__, book, chapter, verse | 0.01s T code from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T crossref from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.04s T det from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.01s T domain from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.03s T freq_lex from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.02s T function from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.04s T gloss from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.03s T gn from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.01s T label from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.03s T language from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.03s T ls from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.03s T mother from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.00s T nametype from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.03s T nme from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.03s T nu from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.07s T number from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.13s T ovolume from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.09s T owork from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.01s T pargr from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.03s T pdp from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.03s T pfm from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.03s T prs from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.03s T prs_gn from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.03s T prs_nu from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.03s T prs_ps from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.03s T ps from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.03s T rank_lex from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.05s T rela from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.03s T sp from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.03s T st from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.01s T tab from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.01s T txt from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.05s T typ from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.03s T uvf from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.03s T vbe from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.03s T vbs from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.04s T voc_lex from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.03s T vs from ~/github/ETCBC/bhsa/tf/2021/_local/prophets | 0.03s T vt from ~/github/ETCBC/bhsa/tf/2021/_local/prophets
Name | # of nodes | # slots/node | % coverage |
---|---|---|---|
book | 9 | 1507.22 | 100 |
chapter | 37 | 366.62 | 100 |
verse | 723 | 18.76 | 100 |
half_verse | 1368 | 9.92 | 100 |
sentence | 1973 | 6.88 | 100 |
sentence_atom | 2013 | 6.74 | 100 |
clause | 2805 | 4.84 | 100 |
clause_atom | 2896 | 4.68 | 100 |
lex | 2075 | 3.38 | 52 |
phrase | 7800 | 1.74 | 100 |
phrase_atom | 8322 | 1.63 | 100 |
subphrase | 4298 | 1.36 | 43 |
word | 13565 | 1.00 | 100 |
Which books have we got?
Fc = Ac.api.F
Tc = Ac.api.T
for b in Fc.otype.s("book"):
print(Tc.sectionFromNode(b)[0])
Ezra
Malachi
Joel
Obadiah
Nahum
Haggai
Habakkuk
Jonah
Micah
b = Tc.nodeFromSection(("Obadiah",))
Tc.text(b, fmt="text-trans-plain")
'XZWN <BDJH KH&>MR >DNJ JHWH L>DWM CMW<H CM<NW M>T JHWH WYJR BGWJM CLX QWMW WNQWMH <LJH LMLXMH00 HNH QVN NTTJK BGWJM BZWJ >TH M>D00 ZDWN LBK HCJ>K CKNJ BXGWJ&SL< MRWM CBTW >MR BLBW MJ JWRDNJ >RY00 >M&TGBJH KNCR W>M&BJN KWKBJM FJM QNK MCM >WRJDK N>M&JHWH00 >M&GNBJM B>W&LK >M&CWDDJ LJLH >JK NDMJTH HLW> JGNBW DJM >M&BYRJM B>W LK HLW> JC>JRW <LLWT00 >JK NXPFW <FW NB<W MYPNJW00 <D&HGBWL CLXWK KL >NCJ BRJTK HCJ>WK JKLW LK >NCJ CLMK LXMK JFJMW MZWR TXTJK >JN TBWNH BW00 HLW> BJWM HHW> N>M JHWH WH>BDTJ XKMJM M>DWM WTBWNH MHR <FW00 WXTW GBWRJK TJMN LM<N JKRT&>JC MHR <FW MQVL00 MXMS >XJK J<QB TKSK BWCH WNKRT L<WLM00 BJWM <MDK MNGD BJWM CBWT ZRJM XJLW WNKRJM B>W C<RW W<L&JRWCLM JDW GWRL GM&>TH K>XD MHM00 W>L&TR> BJWM&>XJK BJWM NKRW W>L&TFMX LBNJ&JHWDH BJWM >BDM W>L&TGDL PJK BJWM YRH00 >L&TBW> BC<R&<MJ BJWM >JDM >L&TR> GM&>TH BR<TW BJWM >JDW W>L&TCLXNH BXJLW BJWM >JDW00 W>L&T<MD <L&HPRQ LHKRJT >T&PLJVJW W>L&TSGR FRJDJW BJWM YRH00 KJ&QRWB JWM&JHWH <L&KL&HGWJM K>CR <FJT J<FH LK GMLK JCWB BR>CK00 KJ K>CR CTJTM <L&HR QDCJ JCTW KL&HGWJM TMJD WCTW WL<W WHJW KLW> HJW00 WBHR YJWN THJH PLJVH WHJH QDC WJRCW BJT J<QB >T MWRCJHM00 WHJH BJT&J<QB >C WBJT JWSP LHBH WBJT <FW LQC WDLQW BHM W>KLWM WL>&JHJH FRJD LBJT <FW KJ JHWH DBR00 WJRCW HNGB >T&HR <FW WHCPLH >T&PLCTJM WJRCW >T&FDH >PRJM W>T FDH CMRWN WBNJMN >T&HGL<D00 WGLT HXL&HZH LBNJ JFR>L >CR&KN<NJM <D&YRPT WGLT JRWCLM >CR BSPRD JRCW >T <RJ HNGB00 W<LW MWC<JM BHR YJWN LCPV >T&HR <FW WHJTH LJHWH HMLWKH00 '
First we count the lexeme nodes in the original work, as far as they occur in the books contained in the volumes.
books = set()
for parts in VOLUMES.values():
books |= set(parts)
books
{'Ezra', 'Habakkuk', 'Haggai', 'Joel', 'Jonah', 'Malachi', 'Micah', 'Nahum', 'Obadiah'}
lexNodesWork = set()
Fw = Aw.api.F
Tw = Aw.api.T
Lw = Aw.api.L
for b in Fw.otype.s("book"):
if Tw.sectionFromNode(b)[0] not in books:
continue
for w in Lw.d(b, otype="word"):
lx = Lw.u(w, otype="lex")[0]
lexNodesWork.add(lx)
len(lexNodesWork)
2075
Let's count the lexeme nodes in the individual volumes and add up the numbers. Each volume has its own lexemes, so lexemes that occur in multiple volumes correspond to multiple nodes. We expect more lexeme nodes.
total = 0
for (name, Av) in As.items():
Fv = Av.api.F
nLex = len(Fv.otype.s("lex"))
total += nLex
print(f"{name:<10} has {nLex:>5} lexeme nodes")
print(f"{'Total':<10} {total:>5} lexeme nodes")
medium     has   991 lexeme nodes
small      has   587 lexeme nodes
tiny       has  1173 lexeme nodes
Total       2751 lexeme nodes
Now let's count the lexemes in the new collection.
lexNodesCollection = Fc.otype.s("lex")
len(lexNodesCollection)
2075
Exactly the same amount as in the original work.
Let's make absolutely sure that we have the same lexeme set:
lexNodesWork == lexNodesCollection
False
Of course, because the node numbers in the original work are almost guaranteed to be different from the node numbers in the collection.
But the information attached to the nodes in the collection should be identical to the information attached to the corresponding nodes in the work.
lexemesWork = {Fw.lex.v(n) for n in lexNodesWork}
lexemesCollection = {Fc.lex.v(n) for n in lexNodesCollection}
lexemesWork == lexemesCollection
True
Another way of verifying this is to map the lexeme nodes of the collection back to those of the work and see whether they are equal sets.
lexNodesCollectionToWork = {Fc.owork.v(n) for n in lexNodesCollection}
lexNodesWork == lexNodesCollectionToWork
True
crossrefs
The edge feature crossref has inter-volume edges.
We explore the situation in the original work, inside the volumes, and in the new collection.
We count the incoming and outgoing edges with respect to the nodes in the relevant material.
crossref edges run between verses, so we first collect all relevant verses in the original work.
We want the verses in all the books of all the volumes, and we want those verses per volume.
books = dict(all=set())
for (name, parts) in VOLUMES.items():
partsSet = set(parts)
books[name] = partsSet
books["all"] |= partsSet
books
{'all': {'Ezra', 'Habakkuk', 'Haggai', 'Joel', 'Jonah', 'Malachi', 'Micah', 'Nahum', 'Obadiah'}, 'tiny': {'Habakkuk', 'Haggai', 'Jonah', 'Micah', 'Nahum', 'Obadiah'}, 'small': {'Joel', 'Malachi'}, 'medium': {'Ezra'}}
verseNodesWork = {}
for (name, heads) in books.items():
for b in Fw.otype.s("book"):
if Tw.sectionFromNode(b)[0] not in heads:
continue
for vs in Lw.d(b, otype="verse"):
verseNodesWork.setdefault(name, set()).add(vs)
for (name, verses) in verseNodesWork.items():
print(f"{name:<10} {len(verses):>3} verses")
all        723 verses
tiny       315 verses
small      128 verses
medium     280 verses
Now we determine the number of incoming and outgoing edges w.r.t. these portions, and we split them into inter-portion and intra-portion edges.
Ew = Aw.api.E
incomingWorkTotal = {}
incomingWorkIntra = {}
incomingWorkInter = {}
outgoingWorkTotal = {}
outgoingWorkIntra = {}
outgoingWorkInter = {}
for (name, verses) in verseNodesWork.items():
inct = set()
inca = set()
incr = set()
ougt = set()
ouga = set()
ougr = set()
for vs in verses:
wvs = Ew.crossref.t(vs)
if wvs:
for wv in wvs:
ws = wv[0]
inct.add((ws, vs))
if ws in verses:
inca.add((ws, vs))
else:
incr.add((ws, vs))
wvs = Ew.crossref.f(vs)
if wvs:
for wv in wvs:
ws = wv[0]
ougt.add((vs, ws))
if ws in verses:
ouga.add((vs, ws))
else:
ougr.add((vs, ws))
incomingWorkTotal[name] = inct
incomingWorkIntra[name] = inca
incomingWorkInter[name] = incr
outgoingWorkTotal[name] = ougt
outgoingWorkIntra[name] = ouga
outgoingWorkInter[name] = ougr
for name in verseNodesWork:
print(f"{name:<10}: total: incoming: {len(incomingWorkTotal[name]):>3} outgoing: {len(outgoingWorkTotal[name]):>3}")
print(f"{name:<10}: intra: incoming: {len(incomingWorkIntra[name]):>3} outgoing: {len(outgoingWorkIntra[name]):>3}")
print(f"{name:<10}: inter: incoming: {len(incomingWorkInter[name]):>3} outgoing: {len(outgoingWorkInter[name]):>3}")
all       : total: incoming: 400 outgoing: 400
all       : intra: incoming:  64 outgoing:  64
all       : inter: incoming: 336 outgoing: 336
tiny      : total: incoming: 245 outgoing: 245
tiny      : intra: incoming:   8 outgoing:   8
tiny      : inter: incoming: 237 outgoing: 237
small     : total: incoming:   3 outgoing:   3
small     : intra: incoming:   0 outgoing:   0
small     : inter: incoming:   3 outgoing:   3
medium    : total: incoming: 152 outgoing: 152
medium    : intra: incoming:  56 outgoing:  56
medium    : inter: incoming:  96 outgoing:  96
Ah, the crossref edges are symmetric, so there are as many incoming as outgoing edges.
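We can verify that symmetry quickly (a sketch on the sets just computed): reversing every outgoing pair should reproduce the incoming pairs exactly.

# if crossref is symmetric, the reversed outgoing pairs are the incoming pairs
mirrored = {(t, f) for (f, t) in outgoingWorkTotal["all"]}
print(mirrored == incomingWorkTotal["all"])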
Inside the volumes we only see the intra edges; they should coincide with the incomingWorkIntra[volume] edges.
First the number of edges:
incomingVolumeTotal = {}
outgoingVolumeTotal = {}
for name in volumes:
Av = As[name]
Fv = Av.api.F
Ev = Av.api.E
verses = Fv.otype.s("verse")
inct = set()
ougt = set()
for vs in verses:
wvs = Ev.crossref.t(vs)
if wvs:
for wv in wvs:
ws = wv[0]
inct.add((ws, vs))
wvs = Ev.crossref.f(vs)
if wvs:
for wv in wvs:
ws = wv[0]
ougt.add((vs, ws))
incomingVolumeTotal[name] = inct
outgoingVolumeTotal[name] = ougt
We have gathered the data.
Now we make the comparisons: first the number of edges, then the identity of the edges, modulo the node mapping.
for name in volumes:
Av = As[name]
Fv = Av.api.F
inVolTotal = incomingVolumeTotal[name]
outVolTotal = outgoingVolumeTotal[name]
inWorkIntra = incomingWorkIntra[name]
outWorkIntra = outgoingWorkIntra[name]
print(f"{name:<10}: total: incoming: {len(inVolTotal):>3} outgoing: {len(outVolTotal):>3}")
eqamountIncoming = len(inWorkIntra) == len(inVolTotal)
eqamountOutgoing = len(outWorkIntra) == len(outVolTotal)
print(f"equal amount of incoming inter-edges as in work? {eqamountIncoming}")
print(f"equal amount of outgoing inter-edges as in work? {eqamountOutgoing}")
inVolToWork = {(Fv.owork.v(f), Fv.owork.v(t)) for (f, t) in inVolTotal}
outVolToWork = {(Fv.owork.v(f), Fv.owork.v(t)) for (f, t) in outVolTotal}
sameIncoming = inWorkIntra == inVolToWork
sameOutgoing = outWorkIntra == outVolToWork
print(f"same incoming inter-edges as in work? {sameIncoming}")
print(f"same outgoing inter-edges as in work? {sameOutgoing}")
medium    : total: incoming:  56 outgoing:  56
equal amount of incoming inter-edges as in work? True
equal amount of outgoing inter-edges as in work? True
same incoming inter-edges as in work? True
same outgoing inter-edges as in work? True
small     : total: incoming:   0 outgoing:   0
equal amount of incoming inter-edges as in work? True
equal amount of outgoing inter-edges as in work? True
same incoming inter-edges as in work? True
same outgoing inter-edges as in work? True
tiny      : total: incoming:   8 outgoing:   8
equal amount of incoming inter-edges as in work? True
equal amount of outgoing inter-edges as in work? True
same incoming inter-edges as in work? True
same outgoing inter-edges as in work? True
The final test is whether the collection has the right edges.
When the collection was created, inter-volume edges were added on the basis of the ointerto and ointerfrom features in the individual volumes.
Now we check whether that went well.
Ec = Ac.api.E
verses = Fc.otype.s("verse")
inct = set()
ougt = set()
for vs in verses:
wvs = Ec.crossref.t(vs)
if wvs:
for wv in wvs:
ws = wv[0]
inct.add((ws, vs))
wvs = Ec.crossref.f(vs)
if wvs:
for wv in wvs:
ws = wv[0]
ougt.add((vs, ws))
incomingCollectionTotal = inct
outgoingCollectionTotal = ougt
We have gathered the data.
Now we make the comparisons: first the number of edges, then the identity of the edges, modulo the node mapping.
inColTotal = incomingCollectionTotal
outColTotal = outgoingCollectionTotal
inWorkIntra = incomingWorkIntra["all"]
outWorkIntra = outgoingWorkIntra["all"]
print(f"collection: total: incoming: {len(inColTotal):>3} outgoing: {len(outColTotal):>3}")
eqamountIncoming = len(inWorkIntra) == len(inColTotal)
eqamountOutgoing = len(outWorkIntra) == len(outColTotal)
print(f"equal amount of incoming inter-edges as in work? {eqamountIncoming}")
print(f"equal amount of outgoing inter-edges as in work? {eqamountOutgoing}")
inColToWork = {(Fc.owork.v(f), Fc.owork.v(t)) for (f, t) in inColTotal}
outColToWork = {(Fc.owork.v(f), Fc.owork.v(t)) for (f, t) in outColTotal}
sameIncoming = inWorkIntra == inColToWork
sameOutgoing = outWorkIntra == outColToWork
print(f"same incoming inter-edges as in work? {sameIncoming}")
print(f"same outgoing inter-edges as in work? {sameOutgoing}")
collection: total: incoming:  64 outgoing:  64
equal amount of incoming inter-edges as in work? True
equal amount of outgoing inter-edges as in work? True
same incoming inter-edges as in work? True
same outgoing inter-edges as in work? True
We have seen that when we collect volumes, the identification of lexeme nodes of the same lexeme across volumes works out perfectly.
The collection of inter-volume edges works as well!
TF=Fabric()
We now load the data through Fabric().
You do not have to load the work before extracting volumes, but you may do so. The advantage of pre-loading is that after the extraction of volumes you still have a handle to the work.
TFw = Fabric(locations=SOURCE)
apiw = TFw.loadAll()
apiw.makeAvailableIn(globals())
2.20s Feature overview: 109 for nodes; 6 for edges; 1 configs; 9 computed
[('Computed', 'computed-data', ('C Computed', 'Call AllComputeds', 'Cs ComputedString')), ('Features', 'edge-features', ('E Edge', 'Eall AllEdges', 'Es EdgeString')), ('Fabric', 'loading', ('TF',)), ('Locality', 'locality', ('L Locality',)), ('Nodes', 'navigating-nodes', ('N Nodes',)), ('Features', 'node-features', ('F Feature', 'Fall AllFeatures', 'Fs FeatureString')), ('Search', 'search', ('S Search',)), ('Text', 'text', ('T Text',))]
We use the same specification as before.
volumes = TFw.extract(VOLUMES, overwrite=True)
0.00s Check volumes ... | Volume tiny exists and will be recreated | Volume small exists and will be recreated | Volume medium exists and will be recreated | Work consists of 39 books: | book Genesis : with 28764 slots | book Exodus : with 23748 slots | book Leviticus : with 17099 slots | book Numbers : with 23188 slots | book Deuteronomy : with 20128 slots | book Joshua : with 14526 slots | book Judges : with 14086 slots | book 1_Samuel : with 18929 slots | book 2_Samuel : with 15612 slots | book 1_Kings : with 18685 slots | book 2_Kings : with 17307 slots | book Isaiah : with 22931 slots | book Jeremiah : with 29736 slots | book Ezekiel : with 26182 slots | book Hosea : with 3146 slots | book Joel : with 1318 slots | book Amos : with 2780 slots | book Obadiah : with 392 slots | book Jonah : with 985 slots | book Micah : with 1895 slots | book Nahum : with 746 slots | book Habakkuk : with 897 slots | book Zephaniah : with 1037 slots | book Haggai : with 877 slots | book Zechariah : with 4471 slots | book Malachi : with 1187 slots | book Psalms : with 25372 slots | book Job : with 10912 slots | book Proverbs : with 8859 slots | book Ruth : with 1802 slots | book Song_of_songs : with 1682 slots | book Ecclesiastes : with 4233 slots | book Lamentations : with 1945 slots | book Esther : with 4621 slots | book Daniel : with 8072 slots | book Ezra : with 5268 slots | book Nehemiah : with 7842 slots | book 1_Chronicles : with 15566 slots | book 2_Chronicles : with 19764 slots 0.09s volumes ok 0.09s Distribute nodes over volumes ... | 0.00s volume tiny ... | | 0.00s book Obadiah with 392 slots | | 0.00s book Nahum with 746 slots | | 0.00s book Haggai with 877 slots | | 0.00s book Habakkuk with 897 slots | | 0.00s book Jonah with 985 slots | | 0.00s book Micah with 1895 slots | 0.01s volume tiny with 5792 slots and 21779 nodes ... | 0.01s volume small ... | | 0.00s book Malachi with 1187 slots | | 0.00s book Joel with 1318 slots | 0.01s volume small with 2505 slots and 9495 nodes ... | 0.01s volume medium ... | | 0.00s book Ezra with 5268 slots | 0.02s volume medium with 5268 slots and 17286 nodes ... 0.11s distribution done 0.11s Remap features ... | 0.00s volume tiny with 21779 nodes ... | 0.25s volume small with 9495 nodes ... | 0.35s volume medium with 17286 nodes ... 0.60s remapping done 0.60s Write volumes as TF datasets | 0.00s Writing volume tiny | 0.20s Writing volume small | 0.30s Writing volume medium 1.07s writing done 1.07s All done
TFs = {}
for name in volumes:
TFs[name] = Fabric(locations=SOURCE, volume=name)
TFs[name].loadAll(silent="deep")
for name in volumes:
TFv = TFs[name]
Fsv = TFv.api.Fs
print(TFv.volumeInfo)
for (feat, info) in TFv.isLoaded("owork ointerfrom ointerto", pretty=False).items():
n = 0
for x in Fsv(feat).items():
n += 1
print(f" {feat:<10}: {n:>7} values\n {info['meta']['description']}")
medium:Ezra
 owork     :   17286 values
 mapping from nodes in the volume to nodes in the work
 ointerfrom:       0 values
 all outgoing inter-volume edges
 ointerto  :       0 values
 all incoming inter-volume edges
small:Malachi-Joel
 owork     :    9495 values
 mapping from nodes in the volume to nodes in the work
 ointerfrom:       0 values
 all outgoing inter-volume edges
 ointerto  :       0 values
 all incoming inter-volume edges
tiny:Obadiah-Nahum-Haggai-Habakkuk-Jonah-Micah
 owork     :   21779 values
 mapping from nodes in the volume to nodes in the work
 ointerfrom:       0 values
 all outgoing inter-volume edges
 ointerto  :       0 values
 all incoming inter-volume edges
ointerto, ointerfrom
Note that in our volumes the features ointerfrom and ointerto are empty.
These features collect edge data for edges between a node inside the volume and a node outside the volume.
In this work we do not have such edges, because we did not load the parallels module explicitly,
and Fabric(locations, modules) only looks in the directories specified in its locations and modules parameters.
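If you do want the cross-references in a Fabric-based workflow, you can pass the module's data directory as an extra location. A minimal sketch, assuming you have cloned the ETCBC/parallels repository so that its TF data sits at ~/github/ETCBC/parallels/tf/2021:

# assumed location of a local clone of the parallels module data
PARALLELS = f"{GH}/ETCBC/parallels/tf/{VERSION}"
TFwp = Fabric(locations=[SOURCE, PARALLELS])
apiwp = TFwp.loadAll()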
We use the same collection specification as before.
TFw.collect(
tuple(volumes),
COLLECTION,
overwrite=True,
)
Collection prophets exists and will be recreated 0.00s Loading volume medium from ~/github/ETCBC/bhsa/tf/2021/_local/medium ... 0.04s Feature overview: 112 for nodes; 4 for edges; 1 configs; 9 computed 0.08s Loading volume small from ~/github/ETCBC/bhsa/tf/2021/_local/small ... 0.02s Feature overview: 112 for nodes; 4 for edges; 1 configs; 9 computed 0.13s Loading volume tiny from ~/github/ETCBC/bhsa/tf/2021/_local/tiny ... 0.04s Feature overview: 112 for nodes; 4 for edges; 1 configs; 9 computed 0.22s inspect metadata ... 0.22s metadata sorted out 0.22s check nodetypes ... | volume medium | volume small | volume tiny 0.22s node types ok 0.22s Collect nodes from volumes ... | 0.00s Check against overlapping slots ... | | medium : 5268 slots | | small : 2505 slots | | tiny : 5792 slots | 0.00s no overlap | 0.01s Group non-slot nodes by type | | medium : 5269- 17286 | | small : 2506- 9495 | | tiny : 5793- 21779 | 0.01s Mapping nodes from volume to/from work ... | | book : 13566 - 13574 | | chapter : 13575 - 13611 | | clause : 13612 - 16416 | | clause_atom : 16417 - 19312 | | half_verse : 19313 - 20680 | | phrase : 20681 - 28480 | | phrase_atom : 28481 - 36802 | | sentence : 36803 - 38775 | | sentence_atom : 38776 - 40788 | | subphrase : 40789 - 45086 | | verse : 45087 - 45809 | | lex : 45810 - 47884 | 0.02s The new work has 47884 nodes of which 13565 slots 0.24s collection done 0.24s remap features ... 0.61s remapping done 0.61s write work as TF data set 1.06s writing done 1.06s done
True
TFc = Fabric(locations=SOURCE, collection=COLLECTION)
TFc.loadAll(silent="deep")
<tf.core.api.Api at 0x315ed1b90>
We can pass either the location of a work, or the API handle to the loaded features of a work. We do the latter here.
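For completeness, a sketch of the former variant: if you omit api=, extract() loads the work itself from the location you pass.

# extract() will load the work from SOURCE since no api= is given
volumesByLocation = extract(SOURCE, TARGET, VOLUMES, overwrite=True)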
VOLUMES_WRONG = dict(
tiny=("Obadiah", "Nahum", "Haggai", "Habakkuk", "Jonah", "Micah"),
small=("Obadiah", "Malachi", "Joel"),
medium=("Ezra",),
)
This will turn out to be wrong because there is a book that occurs in several volumes.
volumes = extract(SOURCE, TARGET, VOLUMES_WRONG, api=apiw, overwrite=True)
0.00s Check volumes ...
| 17s Section Obadiah of volume tiny reoccurs in volume small
It is not allowed to extract volumes that have material in common!
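You can also catch such an overlap yourself before calling extract(); a minimal sketch in plain Python:

from collections import Counter

# count how often each book is claimed by a volume in the specification
bookCounts = Counter(book for parts in VOLUMES_WRONG.values() for book in parts)
print([book for (book, n) in bookCounts.items() if n > 1])  # expect: ['Obadiah']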
volumes = extract(SOURCE, TARGET, VOLUMES, api=apiw, overwrite=True)
0.00s Check volumes ... | Volume tiny exists and will be recreated | Volume small exists and will be recreated | Volume medium exists and will be recreated | Work consists of 39 books: | book Genesis : with 28764 slots | book Exodus : with 23748 slots | book Leviticus : with 17099 slots | book Numbers : with 23188 slots | book Deuteronomy : with 20128 slots | book Joshua : with 14526 slots | book Judges : with 14086 slots | book 1_Samuel : with 18929 slots | book 2_Samuel : with 15612 slots | book 1_Kings : with 18685 slots | book 2_Kings : with 17307 slots | book Isaiah : with 22931 slots | book Jeremiah : with 29736 slots | book Ezekiel : with 26182 slots | book Hosea : with 3146 slots | book Joel : with 1318 slots | book Amos : with 2780 slots | book Obadiah : with 392 slots | book Jonah : with 985 slots | book Micah : with 1895 slots | book Nahum : with 746 slots | book Habakkuk : with 897 slots | book Zephaniah : with 1037 slots | book Haggai : with 877 slots | book Zechariah : with 4471 slots | book Malachi : with 1187 slots | book Psalms : with 25372 slots | book Job : with 10912 slots | book Proverbs : with 8859 slots | book Ruth : with 1802 slots | book Song_of_songs : with 1682 slots | book Ecclesiastes : with 4233 slots | book Lamentations : with 1945 slots | book Esther : with 4621 slots | book Daniel : with 8072 slots | book Ezra : with 5268 slots | book Nehemiah : with 7842 slots | book 1_Chronicles : with 15566 slots | book 2_Chronicles : with 19764 slots 0.10s volumes ok 0.10s Distribute nodes over volumes ... | 0.00s volume tiny ... | | 0.00s book Obadiah with 392 slots | | 0.00s book Nahum with 746 slots | | 0.00s book Haggai with 877 slots | | 0.00s book Habakkuk with 897 slots | | 0.00s book Jonah with 985 slots | | 0.00s book Micah with 1895 slots | 0.01s volume tiny with 5792 slots and 21779 nodes ... | 0.01s volume small ... | | 0.00s book Malachi with 1187 slots | | 0.00s book Joel with 1318 slots | 0.01s volume small with 2505 slots and 9495 nodes ... | 0.01s volume medium ... | | 0.00s book Ezra with 5268 slots | 0.02s volume medium with 5268 slots and 17286 nodes ... 0.12s distribution done 0.12s Remap features ... | 0.00s volume tiny with 21779 nodes ... | 0.23s volume small with 9495 nodes ... | 0.33s volume medium with 17286 nodes ... 0.60s remapping done 0.60s Write volumes as TF datasets | 0.00s Writing volume tiny | 0.20s Writing volume small | 0.30s Writing volume medium 1.06s writing done 1.06s All done
Now we make the same collection as before, but first we make a few deliberate mistakes.
collect(
(("tiny", f"{TARGET}/tiny"), ("tiny", f"{TARGET}/small")),
f"{TARGET}/bible",
overwrite=True,
)
Collection bible exists and will be recreated
25s Volume tiny is already part of the collection
False
collect(
(("tiny", f"{TARGET}/tiny"), ("small", f"{TARGET}/tiny")),
f"{TARGET}/bible",
overwrite=True,
)
28s Volume tiny at location ~/github/ETCBC/bhsa/tf/2021/_local/tiny reoccurs as volume small
False
collect(
{name: info["location"] for (name, info) in volumes.items()},
f"{TARGET}/bible",
overwrite=True,
)
Collection bible exists and will be recreated 0.00s Loading volume medium from ~/github/ETCBC/bhsa/tf/2021/_local/medium ... 0.04s Feature overview: 112 for nodes; 4 for edges; 1 configs; 9 computed 0.08s Loading volume small from ~/github/ETCBC/bhsa/tf/2021/_local/small ... 0.03s Feature overview: 112 for nodes; 4 for edges; 1 configs; 9 computed 0.13s Loading volume tiny from ~/github/ETCBC/bhsa/tf/2021/_local/tiny ... 0.04s Feature overview: 112 for nodes; 4 for edges; 1 configs; 9 computed 0.22s inspect metadata ... 0.22s metadata sorted out 0.22s check nodetypes ... | volume medium | volume small | volume tiny 0.22s node types ok 0.22s Collect nodes from volumes ... | 0.00s Check against overlapping slots ... | | medium : 5268 slots | | small : 2505 slots | | tiny : 5792 slots | 0.01s no overlap | 0.01s Group non-slot nodes by type | | medium : 5269- 17286 | | small : 2506- 9495 | | tiny : 5793- 21779 | 0.01s Mapping nodes from volume to/from work ... | | book : 13566 - 13574 | | chapter : 13575 - 13611 | | clause : 13612 - 16416 | | clause_atom : 16417 - 19312 | | half_verse : 19313 - 20680 | | phrase : 20681 - 28480 | | phrase_atom : 28481 - 36802 | | sentence : 36803 - 38775 | | sentence_atom : 38776 - 40788 | | subphrase : 40789 - 45086 | | verse : 45087 - 45809 | | lex : 45810 - 47884 | 0.02s The new work has 47884 nodes of which 13565 slots 0.24s collection done 0.24s remap features ... 0.62s remapping done 0.62s write work as TF data set 1.07s writing done 1.07s done
True
CC-BY Dirk Roorda