# Auto-reload edited modules on every cell execution (IPython magic).
%load_ext autoreload
%autoreload 2
# Text-Fabric: load the BHSA corpus from a local clone; hoist=globals()
# injects the standard API names (F, Fs, Fall, L, T, ...) into this notebook.
from tf.app import use
A = use("ETCBC/bhsa:clone", hoist=globals())
Locating corpus resources ...
Name | # of nodes | # slots / node | % coverage |
---|---|---|---|
book | 39 | 10938.21 | 100 |
chapter | 929 | 459.19 | 100 |
lex | 9230 | 46.22 | 100 |
verse | 23213 | 18.38 | 100 |
half_verse | 45179 | 9.44 | 100 |
sentence | 63717 | 6.70 | 100 |
sentence_atom | 64514 | 6.61 | 100 |
clause | 88131 | 4.84 | 100 |
clause_atom | 90704 | 4.70 | 100 |
phrase | 253203 | 1.68 | 100 |
phrase_atom | 267532 | 1.59 | 100 |
subphrase | 113850 | 1.42 | 38 |
word | 426590 | 1.00 | 100 |
3
ETCBC/bhsa
/Users/me/github/ETCBC/bhsa/app
''
<code>Genesis 1:1</code> (use <a href="https://github.com/{org}/{repo}/blob/master/tf/{version}/book%40en.tf" target="_blank">English book names</a>)
g_uvf_utf8
g_vbs
kq_hybrid
languageISO
g_nme
lex0
is_root
g_vbs_utf8
g_uvf
dist
root
suffix_person
g_vbe
dist_unit
suffix_number
distributional_parent
kq_hybrid_utf8
crossrefSET
instruction
g_prs
lexeme_count
rank_occ
g_pfm_utf8
freq_occ
crossrefLCS
functional_parent
g_pfm
g_nme_utf8
g_vbe_utf8
kind
g_prs_utf8
suffix_gender
mother_object_type
absent
n/a
none
unknown
NA
{docRoot}/{repo}
''
''
https://{org}.github.io
0_home
{}
True
clone
/Users/me/github/ETCBC/bhsa/_temp
BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis
10.5281/zenodo.1007624
ner
Phonetic Transcriptions
https://nbviewer.jupyter.org/github/etcbc/phono/blob/master/programs/phono.ipynb
10.5281/zenodo.1007636
ETCBC
/tf
phono
Parallel Passages
https://nbviewer.jupyter.org/github/ETCBC/parallels/blob/master/programs/parallels.ipynb
10.5281/zenodo.1007642
ETCBC
/tf
parallels
ETCBC
/tf
bhsa
2021
https://shebanq.ancient-data.org/hebrew
Show this on SHEBANQ
la
True
{webBase}/text?book=<1>&chapter=<2>&verse=<3>&version={version}&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt
{webBase}/word?version={version}&id=<lid>
{typ} {rela}
''
True
{code}
1
''
True
{label}
''
True
gloss
{voc_lex_utf8}
word
orig
{voc_lex_utf8}
{typ} {function}
''
True
{typ} {rela}
1
''
{number}
''
True
{number}
1
''
True
{number}
''
pdp vs vt
lex:gloss
hbo
# Show the corpus overview again, now including the full metadata of every feature.
A.header(allMeta=True)
Name | # of nodes | # slots / node | % coverage |
---|---|---|---|
book | 39 | 10938.21 | 100 |
chapter | 929 | 459.19 | 100 |
lex | 9230 | 46.22 | 100 |
verse | 23213 | 18.38 | 100 |
half_verse | 45179 | 9.44 | 100 |
sentence | 63717 | 6.70 | 100 |
sentence_atom | 64514 | 6.61 | 100 |
clause | 88131 | 4.84 | 100 |
clause_atom | 90704 | 4.70 | 100 |
phrase | 253203 | 1.68 | 100 |
phrase_atom | 267532 | 1.59 | 100 |
subphrase | 113850 | 1.42 | 38 |
word | 426590 | 1.00 | 100 |
3
ETCBC/bhsa
/Users/me/github/ETCBC/bhsa/app
''
<code>Genesis 1:1</code> (use <a href="https://github.com/{org}/{repo}/blob/master/tf/{version}/book%40en.tf" target="_blank">English book names</a>)
g_uvf_utf8
g_vbs
kq_hybrid
languageISO
g_nme
lex0
is_root
g_vbs_utf8
g_uvf
dist
root
suffix_person
g_vbe
dist_unit
suffix_number
distributional_parent
kq_hybrid_utf8
crossrefSET
instruction
g_prs
lexeme_count
rank_occ
g_pfm_utf8
freq_occ
crossrefLCS
functional_parent
g_pfm
g_nme_utf8
g_vbe_utf8
kind
g_prs_utf8
suffix_gender
mother_object_type
absent
n/a
none
unknown
NA
{docRoot}/{repo}
''
''
https://{org}.github.io
0_home
{}
True
clone
/Users/me/github/ETCBC/bhsa/_temp
BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis
10.5281/zenodo.1007624
ner
Phonetic Transcriptions
https://nbviewer.jupyter.org/github/etcbc/phono/blob/master/programs/phono.ipynb
10.5281/zenodo.1007636
ETCBC
/tf
phono
Parallel Passages
https://nbviewer.jupyter.org/github/ETCBC/parallels/blob/master/programs/parallels.ipynb
10.5281/zenodo.1007642
ETCBC
/tf
parallels
ETCBC
/tf
bhsa
2021
https://shebanq.ancient-data.org/hebrew
Show this on SHEBANQ
la
True
{webBase}/text?book=<1>&chapter=<2>&verse=<3>&version={version}&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt
{webBase}/word?version={version}&id=<lid>
{typ} {rela}
''
True
{code}
1
''
True
{label}
''
True
gloss
{voc_lex_utf8}
word
orig
{voc_lex_utf8}
{typ} {function}
''
True
{typ} {rela}
1
''
{number}
''
True
{number}
1
''
True
{number}
''
pdp vs vt
lex:gloss
hbo
import collections
The following snippet is thanks to Marek Polášek:
# Variant 1: for every feature, probe each node type with freqList.
# A feature "applies" to a type when its value frequency list, restricted
# to that type, is non-empty.
A.indent(reset=True)
data = collections.defaultdict(list)
allTypes = F.otype.all
for prop in Fall():
    print(prop)
    if prop == "otype":
        # otype is defined on every node; skip it.
        continue
    for t in allTypes:
        # Non-empty frequency list == the feature has a value on this type.
        if Fs(prop).freqList({t}):
            data[prop].append(t)
    print(f"\t{', '.join(data[prop])}")
A.info("done")
book book, chapter, verse book@am book book@ar book book@bn book book@da book book@de book book@el book book@en book book@es book book@fa book book@fr book book@he book book@hi book book@id book book@ja book book@ko book book@la book book@nl book book@pa book book@pt book book@ru book book@sw book book@syc book book@tr book book@ur book book@yo book book@zh book chapter chapter, verse code clause_atom det phrase, phrase_atom domain clause freq_lex lex, word function phrase g_cons word g_cons_utf8 word g_lex word g_lex_utf8 word g_word word g_word_utf8 word gloss lex, word gn word label verse, half_verse language lex, word lex lex, word lex_utf8 lex, word ls lex, word nametype lex, word nme word nu word number sentence, sentence_atom, clause, clause_atom, phrase, phrase_atom, word otype pargr clause_atom pdp word pfm word phono word phono_trailer word prs word prs_gn word prs_nu word prs_ps word ps word qere word qere_trailer word qere_trailer_utf8 word qere_utf8 word rank_lex lex, word rela clause, phrase, phrase_atom, subphrase sp lex, word st word tab clause_atom trailer word trailer_utf8 word txt clause typ clause, clause_atom, phrase, phrase_atom uvf word vbe word vbs word verse verse voc_lex lex, word voc_lex_utf8 lex, word vs word vt word 21s done
Here I explore whether it can be done a bit faster.
# Variant 2: iterate the nodes of each type once and ask every feature
# whether any node of that type carries a value.
A.indent(reset=True)
data1 = collections.defaultdict(list)
allProps = [p for p in Fall() if p != "otype"]
for t in F.otype.all:
    print(t)
    type_nodes = F.otype.s(t)
    for prop in allProps:
        # The feature applies to this type if at least one node has a value.
        if any(Fs(prop).v(node) is not None for node in type_nodes):
            data1[prop].append(t)
for prop in allProps:
    print(prop)
    print(f"\t{', '.join(data1[prop])}")
A.info("done")
book chapter lex verse half_verse sentence sentence_atom clause clause_atom phrase phrase_atom subphrase word book book, chapter, verse book@am book book@ar book book@bn book book@da book book@de book book@el book book@en book book@es book book@fa book book@fr book book@he book book@hi book book@id book book@ja book book@ko book book@la book book@nl book book@pa book book@pt book book@ru book book@sw book book@syc book book@tr book book@ur book book@yo book book@zh book chapter chapter, verse code clause_atom det phrase, phrase_atom domain clause freq_lex lex, word function phrase g_cons word g_cons_utf8 word g_lex word g_lex_utf8 word g_word word g_word_utf8 word gloss lex, word gn word label verse, half_verse language lex, word lex lex, word lex_utf8 lex, word ls lex, word nametype lex, word nme word nu word number sentence, sentence_atom, clause, clause_atom, phrase, phrase_atom, word pargr clause_atom pdp word pfm word phono word phono_trailer word prs word prs_gn word prs_nu word prs_ps word ps word qere word qere_trailer word qere_trailer_utf8 word qere_utf8 word rank_lex lex, word rela clause, phrase, phrase_atom, subphrase sp lex, word st word tab clause_atom trailer word trailer_utf8 word txt clause typ clause, clause_atom, phrase, phrase_atom uvf word vbe word vbs word verse verse voc_lex lex, word voc_lex_utf8 lex, word vs word vt word 14s done
data == data1
True
Marek responded with this algorithm:
# Marek's algorithm: collect, per feature, the set of nodes that carry it,
# then intersect that set with the nodes of each type.
A.indent(reset=True)
data3 = collections.defaultdict(list)
nodes_type = collections.defaultdict(list)
allProps = [p for p in Fall() if p != "otype"]
for t in F.otype.all:
    print(t)
    nodes_type[t] = F.otype.s(t)
for prop in allProps:
    # All nodes on which this feature has a value.
    nodes_with_prop = {item[0] for item in Fs(prop).items()}
    for t in F.otype.all:
        # Non-empty intersection: the feature occurs on this node type.
        if nodes_with_prop.intersection(nodes_type[t]):
            data3[prop].append(t)
for prop in allProps:
    print(prop)
    print(f"\t{', '.join(data3[prop])}")
A.info("done")
book chapter lex verse half_verse sentence sentence_atom clause clause_atom phrase phrase_atom subphrase word book book, chapter, verse book@am book book@ar book book@bn book book@da book book@de book book@el book book@en book book@es book book@fa book book@fr book book@he book book@hi book book@id book book@ja book book@ko book book@la book book@nl book book@pa book book@pt book book@ru book book@sw book book@syc book book@tr book book@ur book book@yo book book@zh book chapter chapter, verse code clause_atom det phrase, phrase_atom domain clause freq_lex lex, word function phrase g_cons word g_cons_utf8 word g_lex word g_lex_utf8 word g_word word g_word_utf8 word gloss lex, word gn word label verse, half_verse language lex, word lex lex, word lex_utf8 lex, word ls lex, word nametype lex, word nme word nu word number sentence, sentence_atom, clause, clause_atom, phrase, phrase_atom, word pargr clause_atom pdp word pfm word phono word phono_trailer word prs word prs_gn word prs_nu word prs_ps word ps word qere word qere_trailer word qere_trailer_utf8 word qere_utf8 word rank_lex lex, word rela clause, phrase, phrase_atom, subphrase sp lex, word st word tab clause_atom trailer word trailer_utf8 word txt clause typ clause, clause_atom, phrase, phrase_atom uvf word vbe word vbs word verse verse voc_lex lex, word voc_lex_utf8 lex, word vs word vt word 2.24s done
data3 == data1
True
Marvellous!
Now I make that code a tad more pythonic.
# Same algorithm, made a bit more pythonic: pre-materialize the per-type
# node sets and use the & operator for the intersection.
A.indent(reset=True)
data4 = collections.defaultdict(list)
nodes_type = collections.defaultdict(list)
allProps = [p for p in Fall() if p != "otype"]
allTypes = F.otype.all
for t in allTypes:
    print(t)
    nodes_type[t] = set(F.otype.s(t))
for prop in allProps:
    # Nodes on which this feature has a value.
    nodes_with_prop = {item[0] for item in Fs(prop).items()}
    for t in allTypes:
        # Truthiness of the intersection replaces the len(...) > 0 test.
        if nodes_with_prop & nodes_type[t]:
            data4[prop].append(t)
for prop in allProps:
    print(prop)
    print(f"\t{', '.join(data4[prop])}")
A.info("done")
book chapter lex verse half_verse sentence sentence_atom clause clause_atom phrase phrase_atom subphrase word book book, chapter, verse book@am book book@ar book book@bn book book@da book book@de book book@el book book@en book book@es book book@fa book book@fr book book@he book book@hi book book@id book book@ja book book@ko book book@la book book@nl book book@pa book book@pt book book@ru book book@sw book book@syc book book@tr book book@ur book book@yo book book@zh book chapter chapter, verse code clause_atom det phrase, phrase_atom domain clause freq_lex lex, word function phrase g_cons word g_cons_utf8 word g_lex word g_lex_utf8 word g_word word g_word_utf8 word gloss lex, word gn word label verse, half_verse language lex, word lex lex, word lex_utf8 lex, word ls lex, word nametype lex, word nme word nu word number sentence, sentence_atom, clause, clause_atom, phrase, phrase_atom, word pargr clause_atom pdp word pfm word phono word phono_trailer word prs word prs_gn word prs_nu word prs_ps word ps word qere word qere_trailer word qere_trailer_utf8 word qere_utf8 word rank_lex lex, word rela clause, phrase, phrase_atom, subphrase sp lex, word st word tab clause_atom trailer word trailer_utf8 word txt clause typ clause, clause_atom, phrase, phrase_atom uvf word vbe word vbs word verse verse voc_lex lex, word voc_lex_utf8 lex, word vs word vt word 1.73s done
data4 == data3
True
This takes 25% off the execution time.
The crux of Marek's optimization is to start with the items in the feature, which for most features is smaller than the number of nodes of a given type.
Knowing that, I wonder whether I can use the iterator `any` again instead of set construction.
Will that improve the speed? No set has to be constructed. But set construction is very fast. Let's see.
# Variant 5: same starting point (the nodes that carry the feature), but
# membership tests via any() instead of building a set intersection.
A.indent(reset=True)
data5 = collections.defaultdict(list)
nodes_type = collections.defaultdict(list)
allProps = [p for p in Fall() if p != "otype"]
allTypes = F.otype.all
for t in allTypes:
    print(t)
    nodes_type[t] = set(F.otype.s(t))
for prop in allProps:
    nodes_with_prop = [item[0] for item in Fs(prop).items()]
    for t in allTypes:
        # Hoist the set lookup out of the generator expression.
        this_type_nodes = nodes_type[t]
        if any(node in this_type_nodes for node in nodes_with_prop):
            data5[prop].append(t)
for prop in allProps:
    print(prop)
    print(f"\t{', '.join(data5[prop])}")
A.info("done")
book chapter lex verse half_verse sentence sentence_atom clause clause_atom phrase phrase_atom subphrase word book book, chapter, verse book@am book book@ar book book@bn book book@da book book@de book book@el book book@en book book@es book book@fa book book@fr book book@he book book@hi book book@id book book@ja book book@ko book book@la book book@nl book book@pa book book@pt book book@ru book book@sw book book@syc book book@tr book book@ur book book@yo book book@zh book chapter chapter, verse code clause_atom det phrase, phrase_atom domain clause freq_lex lex, word function phrase g_cons word g_cons_utf8 word g_lex word g_lex_utf8 word g_word word g_word_utf8 word gloss lex, word gn word label verse, half_verse language lex, word lex lex, word lex_utf8 lex, word ls lex, word nametype lex, word nme word nu word number sentence, sentence_atom, clause, clause_atom, phrase, phrase_atom, word pargr clause_atom pdp word pfm word phono word phono_trailer word prs word prs_gn word prs_nu word prs_ps word ps word qere word qere_trailer word qere_trailer_utf8 word qere_utf8 word rank_lex lex, word rela clause, phrase, phrase_atom, subphrase sp lex, word st word tab clause_atom trailer word trailer_utf8 word txt clause typ clause, clause_atom, phrase, phrase_atom uvf word vbe word vbs word verse verse voc_lex lex, word voc_lex_utf8 lex, word vs word vt word 8.12s done
data5 == data4
True
Important lesson: the overhead of the set construction is far less than using `any`.
In the set intersection there is no loop with python code involved, so it proceeds with C-speed.
In the `any` iterator there is a Python expression (`n in these_nodes_type`), and that will be executed at Python speed.
That is where we lose the time.
So the second crux of Marek's method is to use set intersection instead of an iterator.
showFeatureTypes()
A.featureTypes()
feature | node types |
---|---|
book | book, chapter, verse |
book@am | book |
book@ar | book |
book@bn | book |
book@da | book |
book@de | book |
book@el | book |
book@en | book |
book@es | book |
book@fa | book |
book@fr | book |
book@he | book |
book@hi | book |
book@id | book |
book@ja | book |
book@ko | book |
book@la | book |
book@nl | book |
book@pa | book |
book@pt | book |
book@ru | book |
book@sw | book |
book@syc | book |
book@tr | book |
book@ur | book |
book@yo | book |
book@zh | book |
chapter | chapter, verse |
code | clause_atom |
det | phrase, phrase_atom |
domain | clause |
freq_lex | lex, word |
function | phrase |
g_cons | word |
g_cons_utf8 | word |
g_lex | word |
g_lex_utf8 | word |
g_word | word |
g_word_utf8 | word |
gloss | lex, word |
gn | word |
label | verse, half_verse |
language | lex, word |
lex | lex, word |
lex_utf8 | lex, word |
ls | lex, word |
nametype | lex, word |
nme | word |
nu | word |
number | sentence, sentence_atom, clause, clause_atom, phrase, phrase_atom, word |
pargr | clause_atom |
pdp | word |
pfm | word |
phono | word |
phono_trailer | word |
prs | word |
prs_gn | word |
prs_nu | word |
prs_ps | word |
ps | word |
qere | word |
qere_trailer | word |
qere_trailer_utf8 | word |
qere_utf8 | word |
rank_lex | lex, word |
rela | clause, phrase, phrase_atom, subphrase |
sp | lex, word |
st | word |
tab | clause_atom |
trailer | word |
trailer_utf8 | word |
txt | clause |
typ | clause, clause_atom, phrase, phrase_atom |
uvf | word |
vbe | word |
vbs | word |
verse | verse |
voc_lex | lex, word |
voc_lex_utf8 | lex, word |
vs | word |
vt | word |
To get this overview in a dict, call A.featureTypes(show=False)
# The same feature -> node-types overview, returned as a dict instead of rendered.
A.featureTypes(show=False)
{'book': ['book', 'chapter', 'verse'], 'book@am': ['book'], 'book@ar': ['book'], 'book@bn': ['book'], 'book@da': ['book'], 'book@de': ['book'], 'book@el': ['book'], 'book@en': ['book'], 'book@es': ['book'], 'book@fa': ['book'], 'book@fr': ['book'], 'book@he': ['book'], 'book@hi': ['book'], 'book@id': ['book'], 'book@ja': ['book'], 'book@ko': ['book'], 'book@la': ['book'], 'book@nl': ['book'], 'book@pa': ['book'], 'book@pt': ['book'], 'book@ru': ['book'], 'book@sw': ['book'], 'book@syc': ['book'], 'book@tr': ['book'], 'book@ur': ['book'], 'book@yo': ['book'], 'book@zh': ['book'], 'chapter': ['chapter', 'verse'], 'code': ['clause_atom'], 'det': ['phrase', 'phrase_atom'], 'domain': ['clause'], 'freq_lex': ['lex', 'word'], 'function': ['phrase'], 'g_cons': ['word'], 'g_cons_utf8': ['word'], 'g_lex': ['word'], 'g_lex_utf8': ['word'], 'g_word': ['word'], 'g_word_utf8': ['word'], 'gloss': ['lex', 'word'], 'gn': ['word'], 'label': ['verse', 'half_verse'], 'language': ['lex', 'word'], 'lex': ['lex', 'word'], 'lex_utf8': ['lex', 'word'], 'ls': ['lex', 'word'], 'nametype': ['lex', 'word'], 'nme': ['word'], 'nu': ['word'], 'number': ['sentence', 'sentence_atom', 'clause', 'clause_atom', 'phrase', 'phrase_atom', 'word'], 'pargr': ['clause_atom'], 'pdp': ['word'], 'pfm': ['word'], 'phono': ['word'], 'phono_trailer': ['word'], 'prs': ['word'], 'prs_gn': ['word'], 'prs_nu': ['word'], 'prs_ps': ['word'], 'ps': ['word'], 'qere': ['word'], 'qere_trailer': ['word'], 'qere_trailer_utf8': ['word'], 'qere_utf8': ['word'], 'rank_lex': ['lex', 'word'], 'rela': ['clause', 'phrase', 'phrase_atom', 'subphrase'], 'sp': ['lex', 'word'], 'st': ['word'], 'tab': ['clause_atom'], 'trailer': ['word'], 'trailer_utf8': ['word'], 'txt': ['clause'], 'typ': ['clause', 'clause_atom', 'phrase', 'phrase_atom'], 'uvf': ['word'], 'vbe': ['word'], 'vbs': ['word'], 'verse': ['verse'], 'voc_lex': ['lex', 'word'], 'voc_lex_utf8': ['lex', 'word'], 'vs': ['word'], 'vt': ['word']}
# List every lexeme whose transliteration starts with "NHR", with its
# language: the same consonantal lexeme can occur in both Hebrew and Aramaic.
for lx in F.otype.s("lex"):
    lex = F.lex.v(lx)
    lan = F.language.v(lx)
    # str.startswith is the idiomatic (and slice-free) form of lex[0:3] == "NHR".
    if lex.startswith("NHR"):
        print(f"{lx=} {lex=} {lan=}")
lx=1437743 lex='NHR/' lan='Hebrew' lx=1442740 lex='NHR[' lan='Hebrew' lx=1443420 lex='NHR=[' lan='Hebrew' lx=1444637 lex='NHRH/' lan='Hebrew' lx=1445749 lex='NHR/' lan='Aramaic'
The lexemes in question belong to a different language!
The `freq` and `rank` features

from collections import Counter
import pandas as pd
# Recompute lexeme frequencies from the word nodes, separately per language.
lex_count = {}
for lang in ("Hebrew", "Aramaic"):
    words_of_lang = (n for n in F.otype.s("word") if F.language.v(n) == lang)
    lex_count[lang] = Counter(F.lex.v(n) for n in words_of_lang)
# Build a per-lexeme table comparing the stored freq_lex / rank_lex features
# with the frequency recomputed above (lex_count), then sort by rank.
data = []
for i in F.otype.s("lex"):
    lang = F.language.v(i)
    # Hoisted: F.lex.v(i) was computed twice per row (column value + counter key).
    lex = F.lex.v(i)
    data.append(
        {
            "id": i,
            "lex": lex,
            "lang": lang,
            "freq_lex": F.freq_lex.v(i),
            "rank_lex": F.rank_lex.v(i),
            # Frequency recomputed from the word nodes, per language.
            "freq_cnt": lex_count[lang][lex],
        }
    )
df = pd.DataFrame(data)
df = df.set_index(["id"])
df.sort_values("rank_lex")
lex | lang | freq_lex | rank_lex | freq_cnt | |
---|---|---|---|---|---|
id | |||||
1443538 | W | Aramaic | 731 | 0 | 731 |
1437609 | W | Hebrew | 50272 | 0 | 50272 |
1443534 | L | Aramaic | 378 | 1 | 378 |
1437607 | H | Hebrew | 30386 | 1 | 30386 |
1437629 | L | Hebrew | 20069 | 2 | 20069 |
... | ... | ... | ... | ... | ... |
1444624 | >LP=[ | Hebrew | 1 | 5713 | 1 |
1444625 | JWY>T/ | Hebrew | 1 | 5713 | 1 |
1441980 | >MH====/ | Hebrew | 1 | 5713 | 1 |
1444616 | MDXPH/ | Hebrew | 1 | 5713 | 1 |
1446831 | JCC/ | Hebrew | 1 | 5713 | 1 |
9230 rows × 5 columns
# The ten most frequent lexemes among the word nodes.
top_ten = F.lex.freqList(nodeTypes={"word"})[0:10]
for value, frequency in top_ten:
    print(f"{value} {frequency}")
W 51003 H 30392 L 20447 B 15768 >T 10987 MN 7681 JHWH/ 6828 <L 5870 >L 5521 >CR 5500
# Node ids of the two "W" lexemes found above; wa/wh are reused by later cells.
wa = 1443538  # Aramaic
wh = 1437609  # Hebrew
for lx in (wa, wh):
    lexeme = F.lex.v(lx)
    language = F.language.v(lx)
    print(f"{lexeme} {language}")
W Aramaic W Hebrew
# Stored frequency of the Aramaic W lexeme.
F.freq_lex.v(wa)
731
# Stored frequency of the Hebrew W lexeme.
F.freq_lex.v(wh)
50272
# Stored rank of the Aramaic W lexeme.
F.rank_lex.v(wa)
0
# Stored rank of the Hebrew W lexeme — both rank 0, each within its own language.
F.rank_lex.v(wh)
0