You might want to start with the `start` tutorial of this corpus first. There are also short introductions to other TF datasets.
Consider the semantic actor features in ch-jensen/participants/actor/tf.
There we see only features for version `c` of the BHSA, but we prefer to work with version `2021` of the BHSA.
When we try to load the features by simply saying
A = use("ETCBC/bhsa", mod="ch-jensen/participants/actor/tf")
we have no luck, because there is no ch-jensen/participants/actor/tf/2021
on GitHub.
But one of the features in the BHSA is `omap@c-2021.tf`, which contains the information to map all nodes in version `c` to the nodes of version `2021`, as faithfully as is reasonably possible.
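Conceptually, migrating a feature through such a node map is a relabelling of annotated nodes. Here is a toy sketch in plain Python; it is not the actual `tf.dataset.nodemaps` implementation, and `migrate_feature` plus the node numbers are made up for illustration:

```python
def migrate_feature(feature_old, node_map):
    """Carry a node feature from an old version to a new one.

    feature_old: dict old_node -> value
    node_map:    dict old_node -> list of counterpart nodes in the new version
    Returns dict new_node -> value. Old nodes without a counterpart are
    silently dropped, which is exactly where annotations get lost.
    """
    feature_new = {}
    for old_node, value in feature_old.items():
        for new_node in node_map.get(old_node, ()):
            feature_new[new_node] = value
    return feature_new

# toy example: node 3 has no counterpart in the new version
old_actor = {1: "ISR", 2: "JHWH", 3: "CNH"}
mapping = {1: [11], 2: [12, 13], 3: []}
print(migrate_feature(old_actor, mapping))
# {11: 'ISR', 12: 'JHWH', 13: 'JHWH'} - the value of node 3 is lost
```

Note that a single old node may fan out to several new nodes, and an unmapped old node silently drops its annotation; both effects show up later in this notebook.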
My homework as Text-Fabric developer is to make the statement above work, by steering Text-Fabric to download version `c` and to use the mapping feature to produce upgraded data in the right place.
But I have not got round to that yet.
So, here is what you can do about it 😎.
Below we take you through the upgrade process by hand and evaluate how well it fares.
%load_ext autoreload
%autoreload 2
The ins and outs of installing Text-Fabric, getting the corpus, and initializing a notebook are explained in the start tutorial.
import collections
from tf.app import use
from tf.fabric import Fabric
from tf.dataset.nodemaps import Versions
We need the current version (`2021`) of the BHSA anyway, so we are going to load it.
We will have two versions of the corpus in our notebook and in our variables, so it is handy to have a consistent naming scheme:
N (the now version): 2021
P (the previous version): c
N = use("ETCBC/bhsa")
We have forked Christian's repo to etcbc/participants, so make sure to clone it to your computer:
cd ~/github/etcbc
git clone https://github.com/ETCBC/participants
LOCATION = "data:~/github/etcbc/participants/actor/tf"
Now we can load the actor features for version `c`.
P = use(LOCATION, version="c")
By clicking the triangles you can find more information about these features.
We are going to upgrade the participant features from version `c` to version `2021`.
For that, we use tf.dataset.nodemaps.Versions.
We initialize the Versions object with two text-fabric API objects:
apis = {"2021": N.api, "c": P.api}
V = Versions(apis, "c", "2021")
Finally we migrate the features from "c" to "2021" and save them in the correct location.
We skip the `otext` feature, since it is a special config feature, not a data feature made by Christian.
V.migrateFeatures(("actor", "coref", "prs_actor"), location=LOCATION)
49s start migrating
0.03s Done
Here it is handy to make the migration a bit more verbose. We do it again:
V.migrateFeatures(("actor", "coref", "prs_actor"), location=LOCATION, silent="auto")
57s start migrating
0.32s All additional features loaded - for details use TF.isLoaded()
0.32s Mapping actor (node)
0.33s Mapping coref (edge)
0.40s Mapping prs_actor (node)
0.00s Exporting 2 node and 1 edge and 0 config features to data:~/github/etcbc/participants/actor/tf/2021:
| 0.00s T actor to data:~/github/etcbc/participants/actor/tf/2021
| 0.00s T prs_actor to data:~/github/etcbc/participants/actor/tf/2021
| 0.03s T coref to data:~/github/etcbc/participants/actor/tf/2021
0.03s Exported 2 node features and 1 edge features and 0 config features to data:~/github/etcbc/participants/actor/tf/2021
0.03s Done
Now we are in a position that we can load version 2021 of the BHSA together with the migrated module of participant features.
Note that we point Text-Fabric to the forked repo (`etcbc` instead of `ch-jensen`) and then to our local clone (`:clone`).
We increase the verbosity, in order to display more metadata of the features.
N = use("etcbc/bhsa", mod="etcbc/participants/actor/tf:clone", silent="verbose")
This is Text-Fabric 10.2.0
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html
125 features found and 0 ignored
0.67s Dataset without structure sections in otext:no structure functions in the T-API
2.18s All features loaded/computed - for details use TF.isLoaded()
1.48s All additional features loaded - for details use TF.isLoaded()
If you click the triangles and navigate to the full metadata of the participants features, you see a line
upgraded: ‼️ from version c to 2021
Let's do a few checks to see how well the upgrade process has worked.
First we load the `c` version of the BHSA and Christian's original features.
P = use("etcbc/bhsa", mod="ch-jensen/participants/actor/tf", version="c")
Below we are going to peek into the corpus by means of pretty displays. Here we tweak what is displayed and in what style.
N.load("omap@c-2021", silent="deep")
N.isLoaded("omap@c-2021")
hiddenTypes="half_verse,sentence_atom,clause,clause_atom"
N.displaySetup(hiddenTypes=hiddenTypes, condenseType="sentence", withNodes=True, fmt="text-phono-full")
P.displaySetup(hiddenTypes=hiddenTypes, condenseType="sentence", withNodes=True, fmt="text-phono-full")
omap@c-2021 edge (int) ⚠️ Maps the nodes of version c to 2021
What are the node types that have an actor value?
{P.api.F.otype.v(n) for n in P.api.N.walk() if P.api.F.actor.v(n) is not None}
{'phrase_atom', 'subphrase'}
{N.api.F.otype.v(n) for n in N.api.N.walk() if N.api.F.actor.v(n) is not None}
{'phrase_atom', 'subphrase'}
Let's inspect the frequency lists of actor, per node type.
for otype in ("phrase_atom", "subphrase"):
    frequenciesN = N.api.F.actor.freqList(nodeTypes={otype})
    frequenciesP = P.api.F.actor.freqList(nodeTypes={otype})
    freqDictN = {v: f for (v, f) in frequenciesN}
    freqDictP = {v: f for (v, f) in frequenciesP}
    goodOnes = []
    badOnes = []
    for v in sorted(set(freqDictN) | set(freqDictP)):
        fN = freqDictN.get(v, 0)
        fP = freqDictP.get(v, 0)
        if fN == fP:
            goodOnes.append(v)
        else:
            badOnes.append((v, fN, fP))
    print(f"\nComparing frequencies on {otype}s: {len(goodOnes)} OK; {len(badOnes)} discrepancies")
    for (v, fN, fP) in badOnes[0:100]:
        print(f"{fN:>3} {fP:>3} {v}")
Comparing frequencies on phrase_atoms: 361 OK; 2 discrepancies
 91  94 >JC
  7   9 CNH

Comparing frequencies on subphrases: 135 OK; 0 discrepancies
Most actors on phrase atoms carry over well. But e.g. CNH has discrepancies.
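The comparison loop above can be packaged as a small reusable helper. This is merely a convenience sketch over plain (value, frequency) lists of the shape that `freqList()` returns; the helper name and the toy data are our own:

```python
def compare_freqs(freqsN, freqsP):
    """Compare two (value, frequency) lists.

    Returns (good, bad): values with equal counts in both lists,
    and (value, fN, fP) triples for the discrepancies.
    """
    dN = dict(freqsN)
    dP = dict(freqsP)
    good, bad = [], []
    for v in sorted(set(dN) | set(dP)):
        fN, fP = dN.get(v, 0), dP.get(v, 0)
        if fN == fP:
            good.append(v)
        else:
            bad.append((v, fN, fP))
    return good, bad

# toy data mimicking the CNH discrepancy
good, bad = compare_freqs([("CNH", 7), ("ISR", 5)], [("CNH", 9), ("ISR", 5)])
print(good, bad)
# ['ISR'] [('CNH', 7, 9)]
```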
Let's get a feel of why we get the discrepancies.
actorCNH = """
phrase_atom
actor=CNH
"""
resultsN = N.search(actorCNH)
resultsP = P.search(actorCNH)
0.09s 7 results
0.09s 9 results
N.table(resultsN)
P.table(resultsP)
n | p | phrase_atom |
---|---|---|
1 | Leviticus 25:11 | 945873tihyˈeh |
2 | Leviticus 25:12 | 945886yôvˈēl |
3 | Leviticus 25:12 | 945887hˈiw |
4 | Leviticus 25:12 | 945888qˌōḏeš |
5 | Leviticus 25:12 | 945889tihyˈeh |
6 | Leviticus 25:51 | 946353baššānˈîm |
7 | Leviticus 25:52 | 946362baššānˈîm |
n | p | phrase_atom |
---|---|---|
1 | Leviticus 25:10 | 945830šānˈā |
2 | Leviticus 25:11 | 945851šānˌā |
3 | Leviticus 25:11 | 945852tihyˈeh |
4 | Leviticus 25:12 | 945865yôvˈēl |
5 | Leviticus 25:12 | 945866hˈiw |
6 | Leviticus 25:12 | 945867qˌōḏeš |
7 | Leviticus 25:12 | 945868tihyˈeh |
8 | Leviticus 25:51 | 946332baššānˈîm |
9 | Leviticus 25:52 | 946341baššānˈîm |
Clearly, there is something interesting in Leviticus 25, verses 10 and 11.
We compare verse 10 in both versions.
Here are the original actors in version `c`:
P.show(resultsP, start=1, end=1, condensed=True)
sentence 1
Let's find the same sentence in version `2021`.
sP = 1181939
mappedSb = N.api.Es("omap@c-2021").f(sP)
mappedSb
((1181957, None),)
N.pretty(mappedSb[0][0])
Aha: in version 2021 there is no counterpart of the phrase atom 945830, the one which carried `actor=CNH`.
This phrase atom has morphed into a subphrase, and hence we lose the connection and this particular annotation.
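Losses of this kind can be detected up front: any old node that carries an annotation but maps to nothing will drop its annotation. A toy sketch (the helper name and the data shapes are hypothetical, mirroring the `omap` edge as a plain dict of old node to counterpart list):

```python
def unmigratable(feature_old, node_map):
    """Old nodes that carry an annotation but have no counterpart
    in the new version: their annotations are lost in migration."""
    return sorted(n for n in feature_old if not node_map.get(n))

# node 945830 (actor=CNH) maps to nothing, as in the case above
actor_c = {945830: "CNH", 945851: "CNH"}
omap = {945830: [], 945851: [945873]}
print(unmigratable(actor_c, omap))
# [945830]
```

Running such a check over all annotated nodes before migrating gives an inventory of the annotations that cannot survive the upgrade.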
coref
We also have an edge feature in the module: `coref`. Let's test that as well.
First we explore the edge feature a little bit: from which node type to which node type do the edges go?
We constrain our displays to phrases from now on.
N.displaySetup(condenseType="phrase")
P.displaySetup(condenseType="phrase")
nodeTypes = collections.Counter()
for (f, ts) in P.api.E.coref.items():
    fromType = P.api.F.otype.v(f)
    for t in ts:
        toType = P.api.F.otype.v(t)
        nodeTypes[(fromType, toType)] += 1
nodeTypes
Counter({('word', 'subphrase'): 471, ('word', 'phrase_atom'): 20254, ('word', 'word'): 19884, ('phrase_atom', 'phrase_atom'): 34404, ('phrase_atom', 'subphrase'): 1621, ('phrase_atom', 'word'): 20254, ('subphrase', 'word'): 471, ('subphrase', 'subphrase'): 1086, ('subphrase', 'phrase_atom'): 1621})
The coref
relation seems to be symmetrical, so when we check cases, we can skip a number
of pairs.
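Whether the relation really is symmetric can be verified directly on the edge data. A sketch over a plain dict in the shape that `E.coref.items()` yields (the helper name is our own):

```python
def is_symmetric(edges):
    """Check that an edge relation (node -> iterable of targets)
    contains the reverse of every link."""
    pairs = {(f, t) for (f, ts) in edges.items() for t in ts}
    return all((t, f) in pairs for (f, t) in pairs)

print(is_symmetric({1: [2], 2: [1, 3], 3: [2]}))  # True
print(is_symmetric({1: [2], 2: []}))              # False: 2 -> 1 is missing
```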
done = set()
for (fromType, toType) in nodeTypes:
    if (fromType, toType) in done:
        continue
    done.add((fromType, toType))
    done.add((toType, fromType))
    print(f"{fromType:<15} - {toType:<15}")
    template = f"""
{fromType}
-coref> {toType}
"""
    resultsN = N.search(template)
    resultsP = P.search(template)
    goodOnes = []
    badOnes = []
    phonoN = lambda n: N.api.T.text(n, fmt="text-phono-full")
    phonoP = lambda n: P.api.T.text(n, fmt="text-phono-full")
    for ((fN, tN), (fP, tP)) in zip(resultsN, resultsP):
        fNp = phonoN(fN)
        fPp = phonoP(fP)
        tNp = phonoN(tN)
        tPp = phonoP(tP)
        if fNp == fPp and tNp == tPp:
            goodOnes.append(f"{fNp} => {tNp}")
        else:
            fDif = fNp if fNp == fPp else f"{fNp} != {fPp}"
            tDif = tNp if tNp == tPp else f"{tNp} != {tPp}"
            badOnes.append((f"{fDif} => {tDif}", fN, fP, tN, tP))
    print(f"good: {len(goodOnes):>5}\nbad : {len(badOnes):>5}")
    if len(goodOnes):
        print("Good:")
        for rep in goodOnes[0:3]:
            print(f"\t{rep}")
    if len(badOnes):
        print("Bad:")
        for (rep, fN, fP, tN, tP) in badOnes[0:3]:
            print(f"\t{rep} {fN} {fP} => {tN} {tP}")
    print("-" * 40)
    print("")
word            - subphrase
0.09s 471 results
0.08s 471 results
good:   471
bad :     0
Good:
	bānˈāʸw => ʔˈel-ʔahᵃrˈōn
	zivḥêhem => bᵊnˈê yiśrāʔˈēl
	zivḥêhem => mibbᵊnˈê yiśrāʔˈēl
----------------------------------------
word            - phrase_atom
0.17s 20188 results
0.16s 20254 results
good:  3785
bad : 16403
Good:
	ʔᵃlêhˈem => ʔˈel-ʔahᵃrˈōn wᵊʔel-bānˈāʸw wᵊʔˌel kol-bᵊnˈê yiśrāʔˈēl
	hᵉvîʔˌô => šˌôr ʔô-ḵˈeśev ʔô-ʕˌēz
	ʕammˈô . => ʔˌîš ʔîš
Bad:
	zzarʕˈô => ʔˈîš ʔîš != ʔˈîš 64423 64422 => 944121 944096
	zzarʕˈô => yittˈēn != ʔîš 64423 64422 => 944127 944097
	zzarʕˈô => yûmˈāṯ != yittˈēn 64423 64422 => 944131 944103
----------------------------------------
word            - word
0.22s 19884 results
0.22s 19884 results
good: 19884
bad :     0
Good:
	zivḥêhem => zivḥêhˈem
	zivḥêhem => lāhˌem
	zivḥêhem => ḏōrōṯˈām .
----------------------------------------
phrase_atom     - phrase_atom
0.16s 34215 results
0.16s 34404 results
good:   745
bad : 33470
Good:
	yᵊḏabbˌēr => [yᵊhwˌāh]
	yᵊḏabbˌēr => llēʔmˈōr .
	yᵊḏabbˌēr => ṣiwwˌā
Bad:
	ʔˌîš ʔˈîš != ʔˌîš => ʔˌîš ʔˈîš != ʔˈîš 943311 943285 => 943311 943286
	mibbˈêṯ yiśrāʔˈēl ûmin-haggˌēr != ʔˈîš => mibbˈêṯ yiśrāʔˈēl ûmin-haggˌēr != ʔˌîš 943312 943286 => 943292 943285
	ggˈār != mibbˈêṯ yiśrāʔˈēl ûmin-haggˌēr => yāḡˈûr != mibbˈêṯ yiśrāʔˈēl ûmin-haggˌēr 943314 943287 => 943294 943266
----------------------------------------
phrase_atom     - subphrase
0.06s 1599 results
0.07s 1621 results
good:   220
bad :  1379
Good:
	yᵊḏabbˌēr => [yᵊhwˈāh]
	yᵊḏabbˌēr => [yᵊhwˈāh]
	[yᵊhwˌāh] => [yᵊhwˈāh]
Bad:
	ʔˌîš ʔˈîš != ʔˌîš => ʔîš 943311 943285 => 1317262 1317261
	ʔˌîš ʔˈîš != ʔˌîš => ʔˈîš 943311 943285 => 1317334 1317331
	ggˈār != ʔˈîš => min-haggˌēr != ʔîš 943314 943286 => 1317308 1317261
----------------------------------------
subphrase       - subphrase
0.05s 1086 results
0.04s 1086 results
good:  1086
bad :     0
Good:
	bᵊnˈê yiśrāʔˈēl => mibbᵊnˈê yiśrāʔˈēl
	yiśrāʔˈēl => yiśrāʔˈēl
	yiśrāʔˈēl => yiśrāʔˈēl
----------------------------------------
All `coref` links between words and subphrases match perfectly.
But where phrase atoms are involved, we get bad ones, sometimes more bad ones than good ones.
We inspect a few bad cases.
zzarʕˈô => ʔˈîš ʔîš != ʔˈîš 64423 64422 => 944121 944096
fP = 64422
tP = 944096
pfP = P.api.L.u(fP, otype="phrase")[0]
ptP = P.api.L.u(tP, otype="phrase")[0]
highlightsP = {fP: "orange", tP: "cyan"}
fN = 64423
tN = 944121
pfN = N.api.L.u(fN, otype="phrase")[0]
ptN = N.api.L.u(tN, otype="phrase")[0]
highlightsN = {fN: "orange", tN: "cyan"}
# original `coref` link
P.pretty(pfP, highlights=highlightsP)
if pfP != ptP:
    P.pretty(ptP, highlights=highlightsP)
# mapped `coref` link
N.pretty(pfN, highlights=highlightsN)
if pfN != ptN:
    N.pretty(ptN, highlights=highlightsN)
Force majeure! The phrase atom in the original has changed. In the new version it is combined with its neighbour, and the two constituting parts are now subphrases.
ʔˌîš ʔˈîš != ʔˌîš => ʔˌîš ʔˈîš != ʔˈîš 943311 943285 => 943311 943286
fP = 943285
tP = 943286
pfP = P.api.L.u(fP, otype="phrase")[0]
ptP = P.api.L.u(tP, otype="phrase")[0]
highlightsP = {fP: "orange", tP: "cyan"}
fN = 943311
tN = 943311
pfN = N.api.L.u(fN, otype="phrase")[0]
ptN = N.api.L.u(tN, otype="phrase")[0]
highlightsN = {fN: "orange", tN: "cyan"}
P.pretty(pfP, highlights=highlightsP)
if pfP != ptP:
    P.pretty(ptP, highlights=highlightsP)
N.pretty(pfN, highlights=highlightsN)
if pfN != ptN:
    N.pretty(ptN, highlights=highlightsN)
The same kind of force majeure.
In this case the link was between two original phrase atoms.
In the new version these have merged into one phrase atom, and now there is a `coref` self-link!
ʔˌîš ʔˈîš != ʔˌîš => ʔîš 943311 943285 => 1317262 1317261
fP = 943285
tP = 1317261
pfP = P.api.L.u(fP, otype="phrase")[0]
ptP = P.api.L.u(tP, otype="phrase")[0]
highlightsP = {fP: "orange", tP: "cyan"}
fN = 943311
tN = 1317262
pfN = N.api.L.u(fN, otype="phrase")[0]
ptN = N.api.L.u(tN, otype="phrase")[0]
highlightsN = {fN: "orange", tN: "cyan"}
# original `coref` link
P.pretty(pfP, highlights=highlightsP)
if pfP != ptP:
    P.pretty(ptP, highlights=highlightsP)
# mapped `coref` link
N.pretty(pfN, highlights=highlightsN)
if pfN != ptN:
    N.pretty(ptN, highlights=highlightsN)
The same kind of force majeure.
Clearly, there has been a massive reorganization of phrase atoms in version `2021` as compared to version `c`.
It is great to be able to upgrade features from a version against which they have been created to a newer version. But the corpus may have been changed in unforeseen ways, and not every node in the old corpus can be necessarily matched with a unique node in the new corpus. If there are annotations on such nodes, then they either do not carry over to the new version, or they may carry over to unintended extra nodes in the new version.
We saw a lot of "bad" cases. And yet, all these discrepancies are really not that bad: the mapping has always picked the closest node in the new version that corresponds to the original node in the old version.
There are ways to detect such discrepancies, and the node mapping already has relevant information about the quality of the mapping.
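As the output `((1181957, None),)` above showed, mapping candidates come back as (node, quality) pairs, where None marks a clean, unique match. A hypothetical sketch of picking the best candidate from such a list; the assumption that a higher numeric value means a better match is ours and should be checked against the `tf.dataset.nodemaps` documentation:

```python
def best_candidate(candidates):
    """Pick the best target node from omap candidates [(node, quality), ...].

    Convention assumed here: quality None marks a perfect, unique match;
    otherwise a higher numeric value is taken to mean a better match.
    Returns None when the old node has no counterpart at all.
    """
    if not candidates:
        return None
    perfect = [n for (n, q) in candidates if q is None]
    if perfect:
        return perfect[0]
    return max(candidates, key=lambda c: c[1])[0]

print(best_candidate([(1181957, None)]))   # a clean match wins outright
print(best_candidate([(10, 1), (11, 5)]))  # otherwise the best-scoring node
print(best_candidate([]))                  # no counterpart: None
```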
In fact, the `migrateFeatures` function of Text-Fabric uses this quality information when it assigns feature values to nodes.
But nothing beats generating the features against the new version by the same code that generated them against the old version. If there are issues due to important version differences, the author of the generated feature knows best how to handle that.
CC-BY Dirk Roorda