# Notebook setup: enable IPython autoreload so edited modules are re-imported
# on every cell execution (mode 2 = reload all modules except those excluded).
%load_ext autoreload
%autoreload 2
# Load the BHSA (Hebrew Bible) corpus via Text-Fabric.
# hoist=globals() injects the TF API objects (F, L, N, T, ...) into the
# notebook's global namespace for convenient interactive use.
from tf.app import use
A = use("ETCBC/bhsa", hoist=globals())
Locating corpus resources ...
Name | # of nodes | # slots/node | % coverage |
---|---|---|---|
book | 39 | 10938.21 | 100 |
chapter | 929 | 459.19 | 100 |
lex | 9230 | 46.22 | 100 |
verse | 23213 | 18.38 | 100 |
half_verse | 45179 | 9.44 | 100 |
sentence | 63717 | 6.70 | 100 |
sentence_atom | 64514 | 6.61 | 100 |
clause | 88131 | 4.84 | 100 |
clause_atom | 90704 | 4.70 | 100 |
phrase | 253203 | 1.68 | 100 |
phrase_atom | 267532 | 1.59 | 100 |
subphrase | 113850 | 1.42 | 38 |
word | 426590 | 1.00 | 100 |
from functools import cmp_to_key
bhsa = A.api
def convert_to_chunk(node):
    """Pair a Text-Fabric node with the set of word slots it occupies.

    Returns a ``(node, slot_set)`` tuple; the slot set is what the
    canonical-order comparator operates on.
    """
    return (node, set(bhsa.L.d(node, 'word')))
def _canonical_order(chunk_a, chunk_b):
(n1, slotsA) = chunk_a
(n2, slotsB) = chunk_b
if slotsA == slotsB:
return 0
aWithoutB = slotsA - slotsB
if not aWithoutB:
return 1
bWithoutA = slotsB - slotsA
if not bWithoutA:
return -1
aMin = min(aWithoutB)
bMin = min(bWithoutA)
return -1 if aMin < bMin else 1
# Adapt the 3-way comparator into a sort key for use with sorted().
canonical_order = cmp_to_key(_canonical_order)

otype = 'subphrase'

# Sort every subphrase chunk by the canonical comparator ...
canon_order = sorted(
    (convert_to_chunk(sp) for sp in bhsa.F.otype.s(otype)),
    key=canonical_order,
)
# ... and also take the chunks in Text-Fabric's native iteration order.
iter_order = [convert_to_chunk(sp) for sp in bhsa.F.otype.s(otype)]

# TF's node iteration order is already canonical, so both must agree.
assert canon_order == iter_order
# Interactive inspection: peek at a window of chunks where an embedder
# (5-slot subphrase) is immediately followed by the two subphrases it embeds.
canon_order[32:35]
# (cell output)
[(1300573, {256, 257, 258, 259, 260}), (1300571, {256, 257}), (1300572, {259, 260})]
# Same window in iteration order — identical, confirming the assert above.
iter_order[32:35]
# (cell output)
[(1300573, {256, 257, 258, 259, 260}), (1300571, {256, 257}), (1300572, {259, 260})]
# The raw node numbers in TF's native order match the chunked views.
F.otype.s("subphrase")[32:35]
# (cell output)
(1300573, 1300571, 1300572)
c1 = F.otype.s("chapter")[0]
for (node, boundary) in N.walk(nodes=L.d(c1), events=True):
boundaryRep = "slot" if boundary is None else "end" if boundary else "start"
nodeRep = f"{F.otype.v(node):<20} {node}"
print(f"{boundaryRep:<5} {nodeRep}")