The ETCBC has a few other repositories with data that work in conjunction with the BHSA data. One of them you have already seen: phono, for phonetic transcriptions. There is also parallels for detecting parallel passages, and valence for studying patterns around verbs that determine their meanings.
If you study the additional data, you can observe how that data is created and also how it is turned into a Text-Fabric data module. The last step is incredibly easy. You can write out every Python dictionary where the keys are numbers and the values are strings or numbers as a Text-Fabric feature. When you are creating data, you have already constructed those dictionaries, so writing them out is just one method call. See for example how the flowchart notebook in valence writes out verb sense data.
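A minimal sketch of that pattern: a node feature is just a dictionary mapping node numbers to values. The feature name, the node numbers, and the values below are invented for illustration; the commented-out save call follows the Fabric API.

```python
# A node feature is a dict from node numbers to string or numeric values.
# These node numbers and sense labels are made up for illustration.
senseData = {
    38331: 'd-',
    38335: '-p',
}

# Each feature needs a bit of metadata, at least its value type.
metaData = {
    'sense': {
        'valueType': 'str',
        'description': 'verb sense label (illustrative values only)',
    },
}

# With a Fabric instance TF (as set up later in this notebook), writing the
# feature out is the one method call mentioned above:
#
#   TF.save(nodeFeatures={'sense': senseData}, metaData=metaData)
#
# which produces a sense.tf file in the output location of TF.
```

The point is that the expensive part is constructing `senseData`; serializing it to a `.tf` file is a single call.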
You can then easily share your new features on GitHub, so that your colleagues everywhere can try them out for themselves.
Here is how you pull in other data. You can add such data on the fly, by passing a mod={org}/{repo}/{path} parameter, or a bunch of them separated by commas.
If the data is there, it will be auto-downloaded and stored on your machine.
Let's do it.
%load_ext autoreload
%autoreload 2
The ins and outs of installing Text-Fabric, getting the corpus, and initializing a notebook are explained in the start tutorial.
from tf.app import use
A = use(
    'bhsa',
    mod=(
        'etcbc/valence/tf,'
        'etcbc/lingo/heads/tf,'
        'ch-jensen/Semantic-mapping-of-participants/actor/tf'
    ),
    hoist=globals(),
)
Using TF-app in /Users/dirk/github/annotation/app-bhsa/code: repo clone offline under ~/github (local github)
connecting to online GitHub repo etcbc/bhsa ... connected
Using data in /Users/dirk/text-fabric-data/etcbc/bhsa/tf/c: rv1.6 (latest release)
connecting to online GitHub repo etcbc/phono ... connected
Using data in /Users/dirk/text-fabric-data/etcbc/phono/tf/c: r1.2 (latest release)
connecting to online GitHub repo etcbc/parallels ... connected
Using data in /Users/dirk/text-fabric-data/etcbc/parallels/tf/c: r1.2 (latest release)
connecting to online GitHub repo etcbc/valence ... connected
Using data in /Users/dirk/text-fabric-data/etcbc/valence/tf/c: r1.1 (latest release)
connecting to online GitHub repo etcbc/lingo ... connected
Using data in /Users/dirk/text-fabric-data/etcbc/lingo/heads/tf/c: r0.1 (latest release)
connecting to online GitHub repo ch-jensen/Semantic-mapping-of-participants ... connected
downloading https://github.com/ch-jensen/participants/releases/download/1.3/actor-tf-c.zip ...
unzipping ...
saving data
could not save data to /Users/dirk/text-fabric-data/ch-jensen/Semantic-mapping-of-participants/actor/tf/c
Will try something else
actor/tf/c/actor.tf...downloaded
actor/tf/c/coref.tf...downloaded
actor/tf/c/prs_actor.tf...downloaded
OK
Using data in /Users/dirk/text-fabric-data/ch-jensen/Semantic-mapping-of-participants/actor/tf/c: r1.3=#1c17398f92c0836c06de5e1798687c3fa18133cf (latest release)
BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis: book book@ll chapter code det freq_lex function g_word g_word_utf8 gloss gn label language lex lex_utf8 ls nametype nu number otype pdp prs_gn prs_nu prs_ps ps qere qere_trailer qere_trailer_utf8 qere_utf8 rank_lex rela sp st trailer trailer_utf8 txt typ verse voc_lex voc_lex_utf8 vs vt mother oslots
ch-jensen/Semantic-mapping-of-participants/actor/tf: actor prs_actor coref
etcbc/lingo/heads/tf: heads noun_heads prep_obj
Parallel Passages: crossref
Phonetic Transcriptions: phono phono_trailer
etcbc/valence/tf: cfunction f_correction grammatical lexical original predication s_manual semantic sense valence
You see that the features from the etcbc/valence/tf and etcbc/lingo/heads/tf modules, as well as those from ch-jensen/Semantic-mapping-of-participants/actor/tf, have been added to the mix.
If you want to check for data updates, you can add a check=True argument.
Note that edge features are in bold italic.
Let's find out about sense.
F.sense.freqList()
(('--', 17999), ('d-', 9979), ('-p', 6193), ('-c', 4250), ('-i', 2869), ('dp', 1853), ('dc', 1073), ('di', 889), ('l.', 876), ('i.', 629), ('n.', 533), ('-b', 66), ('db', 61), ('c.', 57), ('k.', 54))
Which nodes have a sense feature?
{F.otype.v(n) for n in N() if F.sense.v(n)}
{'word'}
results = A.search('''
word sense
''')
0.32s 47381 results
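As a cross-check, the frequencies printed by F.sense.freqList() above add up exactly to the number of results of this query, since every word with a sense value carries exactly one of those values. A small verification on the printed pairs:

```python
# The (value, frequency) pairs printed by F.sense.freqList() above
senseFreqs = (
    ('--', 17999), ('d-', 9979), ('-p', 6193), ('-c', 4250), ('-i', 2869),
    ('dp', 1853), ('dc', 1073), ('di', 889), ('l.', 876), ('i.', 629),
    ('n.', 533), ('-b', 66), ('db', 61), ('c.', 57), ('k.', 54),
)

# Summing the frequencies gives the total number of words with a sense value
total = sum(freq for (value, freq) in senseFreqs)
print(total)  # 47381, the same as the `word sense` query result count
```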
Let's show some of the rarer sense values:
results = A.search('''
word sense=k.
''')
0.39s 54 results
A.table(results, end=5)
If we do a pretty display, the sense feature shows up.
A.show(results, start=1, end=1, withNodes=True)
result 1
Let's find out about actor.
fl = F.actor.freqList()
len(fl)
411
fl[0:10]
(('JHWH', 358), ('BN JFR>L', 203), ('>JC', 103), ('2sm"YOUSgmas"', 66), ('MCH', 61), ('>RY', 58), ('>TM', 45), ('JFR>L', 35), ('NPC', 35), ('>X "YOUSgmas"', 34))
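The pairs returned by freqList() are ordinary Python tuples, so you can post-process them directly. A small illustration on the ten pairs shown above, picking out the actors that occur more than 100 times:

```python
# The first ten (actor, frequency) pairs shown above
topActors = (
    ('JHWH', 358), ('BN JFR>L', 203), ('>JC', 103), ('2sm"YOUSgmas"', 66),
    ('MCH', 61), ('>RY', 58), ('>TM', 45), ('JFR>L', 35), ('NPC', 35),
    ('>X "YOUSgmas"', 34),
)

# Keep only the actors attested more than 100 times in this slice
frequent = [actor for (actor, n) in topActors if n > 100]
print(frequent)  # ['JHWH', 'BN JFR>L', '>JC']
```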
Which nodes have an actor feature?
{F.otype.v(n) for n in N() if F.actor.v(n)}
{'phrase_atom', 'subphrase'}
results = A.search('''
phrase_atom actor
''')
0.18s 2073 results
Let's show some of the rarer actor values:
results = A.search('''
phrase_atom actor=KHN
''')
0.27s 30 results
A.table(results)
Now, heads is an edge feature: we cannot directly make it visible in pretty displays, but we can use it in queries.
We also want to make the feature sense visible, so we mention the feature in the query, without restricting the results.
results = A.search('''
book book=Genesis
chapter chapter=1
clause
phrase
-heads> word sense*
'''
)
0.57s 402 results
We make the feature sense visible:
A.show(results, start=1, end=3, withNodes=True)
result 1
result 2
result 3
Note how the words that are heads of their phrases are highlighted within their phrases.
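Besides querying it, an edge feature can also be followed programmatically, through the edge-feature API: E.heads.f(n) yields the nodes that n points to via a heads edge. A minimal sketch; the helper head_words is invented for illustration and assumes a session loaded as above:

```python
# Programmatic traversal of the `heads` edge feature (a sketch; assumes a
# Text-Fabric session like the one loaded above, where E comes from
# hoist=globals()).
def head_words(E, phrase_node):
    """Return the word nodes that are the heads of the given phrase.

    E.heads.f(n) yields the nodes that n points to via a `heads` edge
    (empty when the phrase has no outgoing `heads` edges).
    """
    return tuple(E.heads.f(phrase_node))
```

With the session above you would call head_words(E, p) for a phrase node p obtained from a query or from L.d(...).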
Here is a query that shows results with all features.
results = A.search('''
book book=Leviticus
phrase sense*
phrase_atom actor=KHN
-heads> word
''')
0.74s 30 results
A.displaySetup(condensed=True, condenseType='verse')
A.show(results, start=8, end=8)
A.displaySetup()
If you want to load your features from your own local github repositories, instead of from the data that TF has downloaded for you into ~/text-fabric-data, you can do so by passing the parameter checkout='clone'.
A = use('bhsa', checkout='clone', hoist=globals())
Using TF-app in /Users/dirk/github/annotation/app-bhsa/code: repo clone offline under ~/github (local github)
Using data in /Users/dirk/github/etcbc/bhsa/tf/c: repo clone offline under ~/github (local github)
Using data in /Users/dirk/github/etcbc/phono/tf/c: repo clone offline under ~/github (local github)
Using data in /Users/dirk/github/etcbc/parallels/tf/c: repo clone offline under ~/github (local github)
BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis: book book@ll chapter code det freq_lex function g_word g_word_utf8 gloss gn label language lex lex_utf8 ls nametype nu number otype pdp prs_gn prs_nu prs_ps ps qere qere_trailer qere_trailer_utf8 qere_utf8 rank_lex rela sp st trailer trailer_utf8 txt typ verse voc_lex voc_lex_utf8 vs vt mother oslots
Parallel Passages: crossref
Phonetic Transcriptions: phono phono_trailer
Hover over the features to see where they come from, and you'll see they come from your local github repo.
You may load extra features by specifying locations and modules manually.
Here we get the valence features, not as a module, but in a custom way.
A = use('bhsa', locations='~/text-fabric-data/etcbc/valence/tf', modules='c', hoist=globals())
Using TF-app in /Users/dirk/github/annotation/app-bhsa/code: repo clone offline under ~/github (local github)
connecting to online GitHub repo etcbc/bhsa ... connected
Using data in /Users/dirk/text-fabric-data/etcbc/bhsa/tf/c: rv1.6 (latest release)
connecting to online GitHub repo etcbc/phono ... connected
Using data in /Users/dirk/text-fabric-data/etcbc/phono/tf/c: r1.2 (latest release)
connecting to online GitHub repo etcbc/parallels ... connected
Using data in /Users/dirk/text-fabric-data/etcbc/parallels/tf/c: r1.2 (latest release)
BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis: book book@ll chapter code det freq_lex function g_word g_word_utf8 gloss gn label language lex lex_utf8 ls nametype nu number otype pdp prs_gn prs_nu prs_ps ps qere qere_trailer qere_trailer_utf8 qere_utf8 rank_lex rela sp st trailer trailer_utf8 txt typ verse voc_lex voc_lex_utf8 vs vt mother oslots
Parallel Passages: crossref
Phonetic Transcriptions: phono phono_trailer
etcbc/valence/tf/c: cfunction f_correction grammatical lexical original predication s_manual semantic sense valence
Still, all features of the main corpus and the standard modules have been loaded.
Using locations and modules is useful if you want to load extra features from custom locations on your computer.
If you want to load fewer features, you can set up TF in the traditional way first, and then wrap the app API around it.
Here we load just the minimal set of features to get going.
from tf.fabric import Fabric
TF = Fabric(locations='~/github/etcbc/bhsa/tf', modules='c')
This is Text-Fabric 7.5.4
Api reference : https://annotation.github.io/text-fabric/Api/Fabric/
114 features found and 0 ignored
api = TF.load('pdp vs vt gn nu ps lex')
0.00s loading features ...
 | 0.10s B lex from /Users/dirk/github/etcbc/bhsa/tf/c
 | 0.11s B pdp from /Users/dirk/github/etcbc/bhsa/tf/c
 | 0.11s B vs from /Users/dirk/github/etcbc/bhsa/tf/c
 | 0.11s B vt from /Users/dirk/github/etcbc/bhsa/tf/c
 | 0.08s B gn from /Users/dirk/github/etcbc/bhsa/tf/c
 | 0.10s B nu from /Users/dirk/github/etcbc/bhsa/tf/c
 | 0.10s B ps from /Users/dirk/github/etcbc/bhsa/tf/c
4.58s All features loaded/computed - for details use loadLog()
And finally we wrap the app around it:
A = use('bhsa', api=api, hoist=globals())
Using TF-app in /Users/dirk/github/annotation/app-bhsa/code: repo clone offline under ~/github (local github)
This loads much quicker.
A small test: what are the verbal stems?
F.vs.freqList()
(('NA', 352874), ('qal', 50205), ('hif', 9407), ('piel', 6811), ('nif', 4145), ('hit', 960), ('peal', 654), ('pual', 492), ('hof', 427), ('hsht', 172), ('haf', 163), ('pael', 88), ('htpe', 53), ('peil', 40), ('htpa', 30), ('shaf', 15), ('etpa', 8), ('hotp', 8), ('pasq', 6), ('poel', 5), ('tif', 5), ('afel', 4), ('etpe', 3), ('htpo', 3), ('nit', 3), ('poal', 3))