You might want to consider the start of this tutorial.

In [1]:
%load_ext autoreload
%autoreload 2
In [2]:
from tf.app import use
In [3]:
A = use("etcbc/bhsa", hoist=globals())
TF-app: ~/text-fabric-data/etcbc/bhsa/app
data: ~/text-fabric-data/etcbc/bhsa/tf/2021
data: ~/text-fabric-data/etcbc/phono/tf/2021
data: ~/text-fabric-data/etcbc/parallels/tf/2021
This is Text-Fabric 9.3.2
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

122 features found and 0 ignored
Text-Fabric: Text-Fabric API 9.3.2, etcbc/bhsa/app v3, Search Reference
Data: BHSA, Character table, Feature docs
Features:
Parallel Passages
int
🆗 links between similar passages
author:
BHSA Data: Constantijn Sikkel; Parallels Notebook: Dirk Roorda, Martijn Naaijer
coreData:
BHSA
dateWritten:
2021-12-09T14:40:46Z
provenance:
Parallels notebook, see https://github.com/ETCBC/parallels
version:
2021
writtenBy:
Text-Fabric
BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis
str
✅ book name in Latin (Genesis; Numeri; Reges1; ...)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:17:55Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
str
✅ book name in amharic (ኣማርኛ)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:20:27Z
email:
shebanq@ancient-data.org
encoders:
Dirk Roorda (TF)
language:
ኣማርኛ
languageCode:
am
languageEnglish:
amharic
provenance:
book names from wikipedia and other sources
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
int
✅ chapter number (1; 2; 3; ...)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:17:55Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
int
✅ identifier of a clause atom relationship (0; 74; 367; ...)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:17:56Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
det
str
✅ determinedness of phrase(atom) (det; und; NA.)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:17:56Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
str
✅ text type of clause (? (Unknown); N (narrative); D (discursive); Q (Quotation).)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:17:57Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
int
✅ frequency of lexemes
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:24:45Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
provenance:
computed on the basis of the ETCBC core set of features
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
str
✅ syntactic function of phrase (Cmpl; Objc; Pred; ...)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:17:57Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
str
✅ word consonantal-transliterated (B R>CJT BR> >LHJM ...)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:17:57Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
str
✅ word consonantal-Hebrew (ב ראשׁית ברא אלהים)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:17:58Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
str
✅ lexeme pointed-transliterated (B.:- R;>CIJT B.@R@> >:ELOH ...)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:17:58Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
str
✅ lexeme pointed-Hebrew (בְּ רֵאשִׁית בָּרָא אֱלֹה)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:17:59Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
str
✅ word pointed-transliterated (B.:- R;>CI73JT B.@R@74> >:ELOHI92JM)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:18:04Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
str
✅ word pointed-Hebrew (בְּ רֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:18:04Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
str
🆗 english translation of lexeme (beginning create god(s))
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:21:13Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
provenance:
from additional lexicon file provided by the ETCBC
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
gn
str
✅ grammatical gender (m; f; NA; unknown.)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:18:05Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
str
✅ (half-)verse label (half verses: A; B; C; verses: GEN 01,02)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:18:06Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
str
✅ of word or lexeme (Hebrew; Aramaic.)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:21:13Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
provenance:
from additional lexicon file provided by the ETCBC
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
lex
str
✅ lexeme consonantal-transliterated (B R>CJT/ BR>[ >LHJM/)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:21:14Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
provenance:
from additional lexicon file provided by the ETCBC
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
str
✅ lexeme consonantal-Hebrew (ב ראשׁית֜ ברא אלהים֜)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:21:15Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
provenance:
from additional lexicon file provided by the ETCBC
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
ls
str
✅ lexical set, subclassification of part-of-speech (card; ques; mult)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:21:15Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
provenance:
from additional lexicon file provided by the ETCBC
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
str
⚠️ named entity type (pers; mens; gens; topo; ppde.)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:21:15Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
provenance:
from additional lexicon file provided by the ETCBC
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
nme
str
✅ nominal ending consonantal-transliterated (absent; n/a; JM, ...)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:18:08Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
nu
str
✅ grammatical number (sg; du; pl; NA; unknown.)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:18:08Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
int
✅ sequence number of an object within its context
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:18:09Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
str
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:21:15Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
str
🆗 hierarchical paragraph number (1; 1.2; 1.2.3.4; ...)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:22:50Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
provenance:
from additional paragraph file provided by the ETCBC
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
pdp
str
✅ phrase dependent part-of-speech (art; verb; subs; nmpr, ...)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:18:10Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
pfm
str
✅ preformative consonantal-transliterated (absent; n/a; J, ...)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:18:11Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
prs
str
✅ pronominal suffix consonantal-transliterated (absent; n/a; W; ...)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:18:11Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
str
✅ pronominal suffix gender (m; f; NA; unknown.)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:18:11Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
str
✅ pronominal suffix number (sg; du; pl; NA; unknown.)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:18:12Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
str
✅ pronominal suffix person (p1; p2; p3; NA; unknown.)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:18:12Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
ps
str
✅ grammatical person (p1; p2; p3; NA; unknown.)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:18:12Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
str
✅ word pointed-transliterated masoretic reading correction
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:23:29Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
provenance:
from additional ketiv/qere file provided by the ETCBC
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
str
✅ interword material -pointed-transliterated (Masoretic correction)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:23:29Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
provenance:
from additional ketiv/qere file provided by the ETCBC
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
str
✅ interword material -pointed-transliterated (Masoretic correction)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:23:29Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
provenance:
from additional ketiv/qere file provided by the ETCBC
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
str
✅ word pointed-Hebrew masoretic reading correction
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:23:29Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
provenance:
from additional ketiv/qere file provided by the ETCBC
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
int
✅ ranking of lexemes based on freqnuecy
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:24:46Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
provenance:
computed on the basis of the ETCBC core set of features
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
str
✅ linguistic relation between clause/(sub)phrase(atom) (ADJ; MOD; ATR; ...)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:18:13Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
sp
str
✅ part-of-speech (art; verb; subs; nmpr, ...)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:21:16Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
provenance:
from additional lexicon file provided by the ETCBC
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
st
str
✅ state of a noun (a (absolute); c (construct); e (emphatic).)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:18:14Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
tab
int
✅ clause atom: its level in the linguistic embedding
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:18:16Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
str
✅ interword material pointed-transliterated (& 00 05 00_P ...)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:18:01Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
str
✅ interword material pointed-Hebrew (־ ׃)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:18:01Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
txt
str
✅ text type of clause and surrounding (repetion of ? N D Q as in feature domain)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:18:16Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
typ
str
✅ clause/phrase(atom) type (VP; NP; Ellp; Ptcp; WayX)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:18:16Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
uvf
str
✅ univalent final consonant consonantal-transliterated (absent; N; J; ...)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:18:17Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
vbe
str
✅ verbal ending consonantal-transliterated (n/a; W; ...)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:18:17Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
vbs
str
✅ root formation consonantal-transliterated (absent; n/a; H; ...)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:18:17Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
int
✅ verse number
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:18:18Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
str
✅ vocalized lexeme pointed-transliterated (B.: R;>CIJT BR> >:ELOHIJM)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:21:16Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
provenance:
from additional lexicon file provided by the ETCBC
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
str
✅ vocalized lexeme pointed-Hebrew (בְּ רֵאשִׁית ברא אֱלֹהִים)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:21:17Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
provenance:
from additional lexicon file provided by the ETCBC
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
vs
str
✅ verbal stem (qal; piel; hif; apel; pael)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:18:18Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
vt
str
✅ verbal tense (perf; impv; wayq; infc)
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:18:18Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
none
✅ linguistic dependency between textual objects
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:18:22Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
none
author:
Eep Talstra Centre for Bible and Computer
dataset:
BHSA
datasetName:
Biblia Hebraica Stuttgartensia Amstelodamensis
dateWritten:
2021-12-09T14:21:17Z
email:
shebanq@ancient-data.org
encoders:
Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)
version:
2021
website:
https://shebanq.ancient-data.org
writtenBy:
Text-Fabric
Phonetic Transcriptions
str
🆗 phonological transcription (bᵊ rēšˌîṯ bārˈā ʔᵉlōhˈîm)
author:
BHSA Data: Constantijn Sikkel; Phono Notebook: Dirk Roorda
coreData:
BHSA
dateWritten:
2021-12-09T14:25:55Z
provenance:
computed by the phono notebook, see https://github.com/ETCBC/phono
version:
2021
writtenBy:
Text-Fabric
str
🆗 interword material in phonological transcription
author:
BHSA Data: Constantijn Sikkel; Phono Notebook: Dirk Roorda
coreData:
BHSA
dateWritten:
2021-12-09T14:25:55Z
provenance:
computed by the phono notebook, see https://github.com/ETCBC/phono
version:
2021
writtenBy:
Text-Fabric
Text-Fabric API: names N F E L T S C TF directly usable

Gaps and spans

Searches often do not deliver the results you expect. Besides typos, lack of familiarity with the template formalism and bugs in the system, there is another cause: difficult semantics of the data.

Most users reason about phrases, clauses and sentences as if they are consecutive blocks of words. But in the BHSA this is not the case: each of these objects may have gaps.

Most of the time, verse boundaries coincide with the boundaries of sentences, clauses, and phrases. But not always: there are verse-spanning sentences.

Note: These phenomena may wreak havoc with your intuitive reasoning about what search templates should deliver. Query templates do not require the objects to be consecutive, and they still make sense. But that might not be your sense, unless you Mind the gap!

We are going to show these issues in depth.

Gaps

TF-search has no primitives to deal with gaps directly. Nodes correspond to textual objects such as words, phrases, clauses, verses, books. Usually these are consecutive sequences of one or more words, but in theory they can be arbitrary sets of slots.

And, as far as the BHSA corpus is concerned, in practice too. If we look at phrases, the overwhelming majority are consecutive, without gaps. But there is also a substantial number of phrases with gaps.
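To make the notion of a gap concrete, here is a minimal sketch in plain Python. It treats a node as nothing more than a collection of slot numbers and lists the runs of slots that fall between its first and last slot without belonging to it. The helper `find_gaps` and the slot numbers are invented for illustration; in a live session the slots of a node would presumably come from the corpus itself (for example via `E.oslots.s(node)`).

```python
def find_gaps(slots):
    """Return the gaps of a node as a list of (first, last) slot ranges.

    `slots` is any iterable of integer slot numbers (word positions).
    A gap is a maximal run of slots that lies strictly between the first
    and the last slot of the node but does not belong to the node.
    """
    ordered = sorted(set(slots))
    gaps = []
    for left, right in zip(ordered, ordered[1:]):
        if right - left > 1:
            gaps.append((left + 1, right - 1))
    return gaps


# Made-up slot numbers, purely for illustration.
consecutive_phrase = [10, 11, 12]
gapped_phrase = [20, 21, 24, 25, 28]

print(find_gaps(consecutive_phrase))  # []
print(find_gaps(gapped_phrase))       # [(22, 23), (26, 27)]
```

A node is consecutive exactly when this list is empty.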

People who are familiar with MQL (see fromMQL) may remember that in MQL you can search for a gap. The MQL query

SELECT ALL OBJECTS WHERE

[phrase FOCUS
    [word lex='L']
    [gap]
]

looks for a phrase with a gap in it (i.e. one or more consecutive words between the start and the end of the phrase that do not belong to the phrase). The query then asks additionally for those gap-containing phrases that have a certain word in front of the gap.

We want this too!

Find the gap

We start with a query that aims to get the same results as the MQL query above.

In our template, we require a word wPreGap in the phrase that sits just before the gap, and a word wGap that comes right after it. So wGap is in the gap, and hence does not belong to the phrase. All of this must happen before the last word wLast of the phrase.
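The constraints of this template can be mimicked on plain slot sets. The sketch below is not how TF search is implemented; it only spells out, under my reading of the relational operators, what each condition demands: `<:` (immediately before), `<` (before), `||` (no slots in common), `:=` (ending at the same slot), and indentation (embedding). All function names and slot numbers are hypothetical.

```python
def adjacent_before(a, b):
    """a <: b  -- the last slot of a is immediately followed by the first slot of b."""
    return max(a) + 1 == min(b)


def disjoint(a, b):
    """a || b  -- a and b have no slot in common."""
    return not (set(a) & set(b))


def same_end(a, b):
    """a := b  -- a and b end at the same slot."""
    return max(a) == max(b)


def embeds(a, b):
    """Indentation in a template: every slot of b belongs to a."""
    return set(b) <= set(a)


def is_gap_match(phrase, w_pre_gap, w_gap, w_last):
    """Check the gap conditions; words are single slots, the phrase a set of slots."""
    return (
        embeds(phrase, {w_pre_gap})
        and embeds(phrase, {w_last})
        and same_end(phrase, {w_last})
        and adjacent_before({w_pre_gap}, {w_gap})
        and w_gap < w_last
        and disjoint(phrase, {w_gap})
    )


# Made-up slots: a phrase occupying 20, 21, 24, 25, with a gap at 22-23.
phrase = {20, 21, 24, 25}
print(is_gap_match(phrase, w_pre_gap=21, w_gap=22, w_last=25))  # True
print(is_gap_match(phrase, w_pre_gap=20, w_gap=21, w_last=25))  # False: 21 is in the phrase
```

Note that only the first word of a gap can play the role of wGap, because it has to be adjacent to a word of the phrase.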

In [4]:
query = """
verse
    p:phrase
      wPreGap:word lex=L
      wLast:word
      :=

wGap:word
wPreGap <: wGap
wGap < wLast
p || wGap
"""
In [5]:
results = A.search(query)
  0.40s 12 results

Nice and quick. Let's see the results.

In [6]:
A.table(results, skipCols="1")
n  p  phrase  word  word  word
1  Genesis 17:7  לְךָ֙ וּֽלְזַרְעֲךָ֖ אַחֲרֶֽיךָ׃ לְךָ֙ אַחֲרֶֽיךָ׃ לֵֽ
2  Genesis 28:4  לְךָ֙ לְךָ֖ וּלְזַרְעֲךָ֣ אִתָּ֑ךְ לְךָ֙ אִתָּ֑ךְ אֶת־
3  Exodus 30:21  לָהֶ֧ם לֹ֥ו וּלְזַרְעֹ֖ו לָהֶ֧ם זַרְעֹ֖ו חָק־
4  Leviticus 25:6  לָכֶם֙ לְךָ֖ וּלְעַבְדְּךָ֣ וְלַאֲמָתֶ֑ךָ וְלִשְׂכִֽירְךָ֙ וּלְתֹושָׁ֣בְךָ֔ לָכֶם֙ תֹושָׁ֣בְךָ֔ לְ
5  Numbers 20:15  לָ֛נוּ וְלַאֲבֹתֵֽינוּ׃ לָ֛נוּ אֲבֹתֵֽינוּ׃ מִצְרַ֖יִם
6  Numbers 32:33  לָהֶ֣ם׀ לִבְנֵי־גָד֩ וְלִבְנֵ֨י רְאוּבֵ֜ן וְלַחֲצִ֣י׀ שֵׁ֣בֶט׀ מְנַשֶּׁ֣ה בֶן־יֹוסֵ֗ף לָהֶ֣ם׀ יֹוסֵ֗ף מֹשֶׁ֡ה
7  Deuteronomy 1:36  לֹֽו־וּלְבָנָ֑יו לֹֽו־בָנָ֑יו אֶתֵּ֧ן
8  Deuteronomy 26:11  לְךָ֛ וּלְבֵיתֶ֑ךָ לְךָ֛ בֵיתֶ֑ךָ יְהוָ֥ה
9  1_Samuel 25:31  לְךָ֡ לַאדֹנִ֗י לְךָ֡ אדֹנִ֗י לְ
10  2_Kings 25:24  לָהֶ֤ם וּלְאַנְשֵׁיהֶ֔ם לָהֶ֤ם אַנְשֵׁיהֶ֔ם גְּדַלְיָ֨הוּ֙
11  Jeremiah 40:9  לָהֶ֜ם וּלְאַנְשֵׁיהֶ֣ם לָהֶ֜ם אַנְשֵׁיהֶ֣ם גְּדַלְיָ֨הוּ
12  Daniel 9:8  לָ֚נוּ לִמְלָכֵ֥ינוּ לְשָׂרֵ֖ינוּ וְלַאֲבֹתֵ֑ינוּ לָ֚נוּ אֲבֹתֵ֑ינוּ בֹּ֣שֶׁת

Let's color the word in the gap differently.

In [7]:
A.displaySetup(
    skipCols="1", colorMap={1: "aqua", 2: "yellow", 4: "magenta"}, condenseType="clause"
)
In [8]:
A.table(results, condensed=False)
n  p  phrase  word  word  word
1  Genesis 17:7  לְךָ֙ וּֽלְזַרְעֲךָ֖ אַחֲרֶֽיךָ׃ לְךָ֙ אַחֲרֶֽיךָ׃ לֵֽ
2  Genesis 28:4  לְךָ֙ לְךָ֖ וּלְזַרְעֲךָ֣ אִתָּ֑ךְ לְךָ֙ אִתָּ֑ךְ אֶת־
3  Exodus 30:21  לָהֶ֧ם לֹ֥ו וּלְזַרְעֹ֖ו לָהֶ֧ם זַרְעֹ֖ו חָק־
4  Leviticus 25:6  לָכֶם֙ לְךָ֖ וּלְעַבְדְּךָ֣ וְלַאֲמָתֶ֑ךָ וְלִשְׂכִֽירְךָ֙ וּלְתֹושָׁ֣בְךָ֔ לָכֶם֙ תֹושָׁ֣בְךָ֔ לְ
5  Numbers 20:15  לָ֛נוּ וְלַאֲבֹתֵֽינוּ׃ לָ֛נוּ אֲבֹתֵֽינוּ׃ מִצְרַ֖יִם
6  Numbers 32:33  לָהֶ֣ם׀ לִבְנֵי־גָד֩ וְלִבְנֵ֨י רְאוּבֵ֜ן וְלַחֲצִ֣י׀ שֵׁ֣בֶט׀ מְנַשֶּׁ֣ה בֶן־יֹוסֵ֗ף לָהֶ֣ם׀ יֹוסֵ֗ף מֹשֶׁ֡ה
7  Deuteronomy 1:36  לֹֽו־וּלְבָנָ֑יו לֹֽו־בָנָ֑יו אֶתֵּ֧ן
8  Deuteronomy 26:11  לְךָ֛ וּלְבֵיתֶ֑ךָ לְךָ֛ בֵיתֶ֑ךָ יְהוָ֥ה
9  1_Samuel 25:31  לְךָ֡ לַאדֹנִ֗י לְךָ֡ אדֹנִ֗י לְ
10  2_Kings 25:24  לָהֶ֤ם וּלְאַנְשֵׁיהֶ֔ם לָהֶ֤ם אַנְשֵׁיהֶ֔ם גְּדַלְיָ֨הוּ֙
11  Jeremiah 40:9  לָהֶ֜ם וּלְאַנְשֵׁיהֶ֣ם לָהֶ֜ם אַנְשֵׁיהֶ֣ם גְּדַלְיָ֨הוּ
12  Daniel 9:8  לָ֚נוּ לִמְלָכֵ֥ינוּ לְשָׂרֵ֖ינוּ וְלַאֲבֹתֵ֑ינוּ לָ֚נוּ אֲבֹתֵ֑ינוּ בֹּ֣שֶׁת
In [9]:
A.show(results, end=3, condensed=False)

[Output: pretty displays of results 1, 2 and 3. Each shows a clause with its phrases. The surviving feature labels are lex=L (result 1), lex=W (result 2), and lex=W, lex=XQ/, lex=W, lex=L (result 3).]
In [10]:
A.displayReset()

All gapped phrases

These were particular gaps. Now we want to get all gapped phrases.

We can simply lift the requirement that wPreGap has to satisfy a lexical condition.

In [11]:
query = """
p:phrase
  wPreGap:word
  wLast:word
  :=

wGap:word
wPreGap <: wGap
wGap < wLast

p || wGap
"""
In [12]:
results = A.search(query)
  0.91s 716 results

Not too bad! We could wait for it. Here are some results.

In [13]:
A.table(results, start=5, end=10)
n  p  phrase  word  word  word
5  Genesis 2:25  שְׁנֵיהֶם֙ הָֽאָדָ֖ם וְאִשְׁתֹּ֑ו שְׁנֵיהֶם֙ אִשְׁתֹּ֑ו עֲרוּמִּ֔ים
6  Genesis 4:4  הֶ֨בֶל גַם־ה֛וּא הֶ֨בֶל ה֛וּא הֵבִ֥יא
7  Genesis 7:8  מִן־הַבְּהֵמָה֙ הַטְּהֹורָ֔ה וּמִן־הַ֨בְּהֵמָ֔ה וּמִ֨ן־הָעֹ֔וף וְכֹ֥ל בְּהֵמָ֔ה כֹ֥ל אֲשֶׁ֥ר
8  Genesis 7:14  הֵ֜מָּה וְכָל־הַֽחַיָּ֣ה לְמִינָ֗הּ וְכָל־הַבְּהֵמָה֙ לְמִינָ֔הּ וְכָל־הָרֶ֛מֶשׂ לְמִינֵ֑הוּ וְכָל־הָעֹ֣וף לְמִינֵ֔הוּ כֹּ֖ל צִפֹּ֥ור כָּל־כָּנָֽף׃ רֶ֛מֶשׂ כָּנָֽף׃ הָ
9  Genesis 7:21  כָּל־בָּשָׂ֣ר׀ בָּעֹ֤וף וּבַבְּהֵמָה֙ וּבַ֣חַיָּ֔ה וּבְכָל־הַשֶּׁ֖רֶץ וְכֹ֖ל הָאָדָֽם׃ בָּשָׂ֣ר׀ אָדָֽם׃ הָ
10  Genesis 7:21  כָּל־בָּשָׂ֣ר׀ בָּעֹ֤וף וּבַבְּהֵמָה֙ וּבַ֣חַיָּ֔ה וּבְכָל־הַשֶּׁ֖רֶץ וְכֹ֖ל הָאָדָֽם׃ שֶּׁ֖רֶץ אָדָֽם׃ הַ

If a phrase has multiple gaps, we encounter it multiple times in our results.
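A small sketch of how one might count this multiplicity, assuming each result is a tuple in template order (p, wPreGap, wLast, wGap). The node numbers below are invented and stand in for real search results.

```python
from collections import Counter

# Made-up result tuples in the order (p, wPreGap, wLast, wGap).
results = [
    (900001, 101, 110, 102),
    (900002, 205, 214, 206),
    (900002, 209, 214, 210),  # same phrase, second gap
    (900003, 330, 333, 331),
]

# One result per gap, so counting by phrase gives the number of gaps found in it.
gaps_per_phrase = Counter(phrase for (phrase, *_words) in results)
multi_gap_phrases = sorted(p for p, n in gaps_per_phrase.items() if n > 1)

print(gaps_per_phrase[900002])  # 2
print(multi_gap_phrases)        # [900002]
```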

We show the two results in Genesis 7:21.

Excursion

Sometimes there are two subphrases with exactly the same words in them. They differ only in their values for the feature rela. Here we have such a case.

We do have to show the subphrases, though.

Some types are hidden, let's find out which ones:

In [15]:
A.displayShow()

current display options

 1. baseTypes: word
 2. colorMap: None
 3. condenseType: verse
 4. condensed: False
 5. end: None
 6. extraFeatures: (), {}
 7. fmt: None
 8. full: False
 9. hiddenTypes: clause_atom, half_verse, phrase_atom, sentence_atom, subphrase
10. hideTypes: True
11. highlights: {}
12. lineNumbers: None
13. noneValues: none, unknown, None, NA
14. plainGaps: True
15. prettyTypes: True
16. queryFeatures: True
17. showGraphics: None
18. skipCols: set()
19. standardFeatures: False
20. start: None
21. suppress: set()
22. tupleFeatures: 0, (), 1, (), 2, (), 3, ()
23. withNodes: False
24. withPassage: True
25. withTypes: False

Let's pass a different set of hidden types:

In [16]:
highlights = {1301449: "lightsalmon", 1301450: "lightblue"}
A.pretty(
    1301452,
    withNodes=True,
    extraFeatures="rela",
    highlights=highlights,
    hiddenTypes="clause_atom half_verse phrase_atom",
)

The red one has the feature rela='par'; the blue one does not.

Two nodes with the same node type and the same slots. Yet: different nodes, different feature annotations.

At the moment I do not know why the encoders of the BHSA have chosen to do this.

If we want just the phrases, and only once, we can run the query in shallow mode, see advanced:

In [17]:
gapQueryResults = A.search(query, shallow=True)
  1.11s 672 results
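
As a rough illustration of what shallow mode does for us here, the sketch below collapses full result tuples to the set of their first members. It is plain Python on invented node numbers, not the Text-Fabric implementation; it only mirrors the effect seen above, where 716 result tuples reduce to 672 distinct phrases.

```python
# Hypothetical result tuples in the shape (phrase, wPreGap, wLast, wGap).
# All node numbers are invented for illustration.
toy_results = [
    (900, 11, 18, 12),
    (900, 15, 18, 16),  # same phrase again, reported for its second gap
    (901, 30, 35, 31),
]

# Keep only the first member of each tuple; the set removes the repeats.
toy_phrases = {r[0] for r in toy_results}

print(len(toy_results), "tuples,", len(toy_phrases), "distinct phrases")
```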

A different query

We can make an equivalent query to get the gaps.

In [18]:
query = """
p:phrase
    =: wFirst:word
    wLast:word
    :=

wGap:word
wFirst < wGap
wLast > wGap

p || wGap
"""

Experience has shown that this is a slow query, so we handle it with care.

In [19]:
S.study(query)
S.showPlan(details=True)
  0.00s Checking search template ...
  0.00s Setting up search space for 4 objects ...
  0.09s Constraining search space with 7 relations ...
  0.40s 	2 edges thinned
  0.40s Setting up retrieval plan with strategy small_choice_multi ...
  0.43s Ready to deliver results from 1186199 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
Search with 4 objects and 6 relations
Results are instantiations of the following objects:
node  0-phrase                                        253203   choices
node  1-word                                          253203   choices
node  2-word                                          253203   choices
node  3-word                                          426590   choices
Performance parameters:
	yarnRatio            =    1.25
	tryLimitFrom         =      40
	tryLimitTo           =      40
Instantiations are computed along the following relations:
node                                  0-phrase        253203   choices
edge        0-phrase           [[     2-word               1.0 choices
edge        0-phrase           :=     2-word               0   choices
edge        0-phrase           =:     1-word               1.0 choices (thinned)
edge        1-word             ]]     0-phrase             0   choices
edge      1,2-word            <,>     3-word           21329.5 choices
edge        3-word             ||     0-phrase             0   choices
  0.44s The results are connected to the original search template as follows:
 0     
 1 R0  p:phrase
 2 R1      =: wFirst:word
 3 R2      wLast:word
 4         :=
 5     
 6 R3  wGap:word
 7     wFirst < wGap
 8     wLast > wGap
 9     
10     p || wGap
11     
In [20]:
S.count(progress=1, limit=4)
  0.00s Counting results per 1 up to 4 ...
   |     5.45s 1
   |     5.45s 2
   |     5.45s 3
   |     5.45s 4
  5.45s Done: 5 results

This is a good example of a query that is slow to deliver even its first result. And that is bad, because it is such a straightforward query.

Why is this one so slow, while the previous one went so smoothly?

The crucial thing is the wGap word. In the latter template, wGap is not embedded in anything. It is constrained by wFirst < wGap and wGap < wLast. However, the way the search strategy works is by examining all possibilities for wFirst < wGap and only then checking whether wGap < wLast. The algorithm cannot check both conditions at the same time.

With embedding relations, things are better. Text-Fabric is heavily optimized to deal with embedding relationships.

In the former template, we see that the wGap is required to be adjacent to wPreGap, and this one is embedded in the phrase. Hence there are few cases to consider for wPreGap, and per instance there is only one wGap.

Lesson: avoid free-floating nodes in your template that are constrained only by spatial relationships other than embedding.
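
A back-of-the-envelope sketch of the difference, in plain Python. It does not reproduce Text-Fabric's search strategy; it merely counts, for a made-up corpus of `n_words` words, how many candidate gap words remain for one given starting word under each kind of constraint. The function names and the corpus size are assumptions for illustration only.

```python
def candidates_before(n_words, w_first):
    """Number of words wGap with w_first < wGap in a corpus with slots 1..n_words."""
    return n_words - w_first


def candidates_adjacent(n_words, w_pre_gap):
    """Number of words wGap with w_pre_gap <: wGap (immediately following)."""
    return 1 if w_pre_gap < n_words else 0


n_words = 1000  # toy corpus size, not the real size of the BHSA
w = 250         # some word in the middle of the toy corpus

print(candidates_before(n_words, w))    # every later word is a candidate
print(candidates_adjacent(n_words, w))  # only the next word is a candidate
```

This matches the picture in the plan printed earlier, where the `<,>` edge is estimated at 21329.5 choices while the embedding edges are estimated at 1.0 choices or fewer.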

To the rescue

The former template had it right. Can we rescue the latter template?

We can assume that the phrase and the gap each contain a word in one and the same verse. Note that phrase and gap may belong to different clauses and sentences. We assume that a phrase cannot belong to more than two verses, so either the first or the last word of the phrase is in the same verse as a word in the gap.

In [21]:
query = """
p:phrase
    =: wFirst:word
    wLast:word
    :=

wGap:word
wFirst < wGap
wLast > wGap

p || wGap

v:verse

v [[ wFirst
v [[ wGap
"""
In [22]:
S.study(query)
S.showPlan(details=True)
S.count(progress=100, limit=3000)
  0.00s Checking search template ...
  0.00s Setting up search space for 5 objects ...
  0.10s Constraining search space with 9 relations ...
  0.41s 	2 edges thinned
  0.42s Setting up retrieval plan with strategy small_choice_multi ...
  0.45s Ready to deliver results from 1209412 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
Search with 5 objects and 8 relations
Results are instantiations of the following objects:
node  0-phrase                                        253203   choices
node  1-word                                          253203   choices
node  2-word                                          253203   choices
node  3-word                                          426590   choices
node  4-verse                                          23213   choices
Performance parameters:
	yarnRatio            =    1.25
	tryLimitFrom         =      40
	tryLimitTo           =      40
Instantiations are computed along the following relations:
node                                  4-verse          23213   choices
edge        4-verse            [[     1-word              10.2 choices
edge        1-word             ]]     0-phrase             1.0 choices
edge        0-phrase           =:     1-word               0   choices
edge        0-phrase           :=     2-word               1.0 choices (thinned)
edge        0-phrase           [[     2-word               0   choices
edge        4-verse            [[     3-word              15.8 choices
edge      1,2-word            <,>     3-word               0   choices
edge        3-word             ||     0-phrase             0   choices
  0.49s The results are connected to the original search template as follows:
 0     
 1 R0  p:phrase
 2 R1      =: wFirst:word
 3 R2      wLast:word
 4         :=
 5     
 6 R3  wGap:word
 7     wFirst < wGap
 8     wLast > wGap
 9     
10     p || wGap
11     
12 R4  v:verse
13     
14     v [[ wFirst
15     v [[ wGap
16     
  0.00s Counting results per 100 up to 3000 ...
   |     0.10s 100
   |     0.21s 200
   |     0.28s 300
   |     0.32s 400
   |     0.35s 500
   |     0.43s 600
   |     0.44s 700
   |     0.53s 800
   |     0.56s 900
   |     0.61s 1000
   |     0.65s 1100
   |     0.70s 1200
   |     0.74s 1300
   |     0.86s 1400
   |     1.10s 1500
   |     1.17s 1600
   |     1.31s 1700
   |     1.36s 1800
   |     1.44s 1900
   |     1.55s 2000
   |     1.62s 2100
   |     1.69s 2200
   |     1.82s 2300
   |     2.23s 2400
   |     2.34s 2500
   |     2.45s 2600
   |     2.58s 2700
  2.65s Done: 2739 results
In [23]:
# ignore this
# S.tweakPerformance(yarnRatio=1)

We are going to run this query in shallow mode.

In [24]:
results = A.search(query, shallow=True)
  4.53s 672 results

Shallow mode tends to be quicker, but that advantage does not always materialize. The number of results agrees with the first query. Yet we have been lucky, because we required the word in the gap to be in the same verse as the first word in the phrase. What if we require instead that it is in the same verse as the last word in the phrase?

In [25]:
query = """
p:phrase
    =: wFirst:word
    wLast:word
    :=

wGap:word
wFirst < wGap
wLast > wGap

p || wGap

v:verse

v [[ wLast
v [[ wGap
"""
In [26]:
results = A.search(query, shallow=True)
  4.55s 661 results

Then we would not have found all results.
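
One way a gapped phrase can escape the second variant is sketched below with toy data in plain Python. The verse ranges, slot numbers and helper names are all hypothetical, and we have not checked that the missing phrases in the BHSA have exactly this shape; the sketch only shows that the two anchoring choices are not interchangeable.

```python
def verse_of(slot, verses):
    """Return the name of the verse whose slot range contains the slot."""
    for name, (start, end) in verses.items():
        if start <= slot <= end:
            return name
    return None


def gap_words(phrase_slots):
    """Slots between the first and last word of the phrase that are not part of it."""
    occupied = set(phrase_slots)
    return [s for s in range(phrase_slots[0], phrase_slots[-1] + 1) if s not in occupied]


def found_via(anchor_slot, phrase_slots, verses):
    """True if some gap word lies in the same verse as the anchor word."""
    anchor_verse = verse_of(anchor_slot, verses)
    return any(verse_of(g, verses) == anchor_verse for g in gap_words(phrase_slots))


# Two toy verses and a phrase that starts in verse A and ends in verse B.
# Its only gap (slots 3 and 4) lies entirely in verse A.
toy_verses = {"A": (1, 5), "B": (6, 10)}
toy_phrase = [1, 2, 5, 6, 7]

print(found_via(toy_phrase[0], toy_phrase, toy_verses))   # anchored on the first word
print(found_via(toy_phrase[-1], toy_phrase, toy_verses))  # anchored on the last word
```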

So, this road, although doable, is much less comfortable, performance-wise and logic-wise.

Check the gaps

In this misty landscape of gaps we need some corroboration that we found the right results.

  1. is every node in gapQueryResults a phrase?
  2. does every phrase in the gapQueryResults have a gap?
  3. is every gapped phrase contained in gapQueryResults?

We check all this by hand coding.

Here is a function that checks whether a phrase has a gap. If the distance between its end points is greater than the number of words it contains, it must have a gap.

In [27]:
def hasGap(p):
    words = L.d(p, otype="word")
    return words[-1] - words[0] + 1 > len(words)
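
To make the arithmetic concrete, here is the same test applied to plain lists of slot numbers, without Text-Fabric. The name `has_gap_slots` and the toy lists are made up for illustration.

```python
def has_gap_slots(words):
    """Same test as hasGap, but on a sorted list of slot numbers instead of a node."""
    return words[-1] - words[0] + 1 > len(words)


# Distance between end points is 4 and there are 4 words.
print(has_gap_slots([3, 4, 5, 6]))

# Distance between end points is 6 but there are only 4 words (5 and 6 are missing).
print(has_gap_slots([3, 4, 7, 8]))
```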

Now we can perform the checks.

In [28]:
otypesGood = True
haveGaps = True

for p in gapQueryResults:
    otype = F.otype.v(p)
    if otype != "phrase":
        print(f"Non phrase detected: {p} is a {otype}")
        otypesGood = False
        break

    if not hasGap(p):
        print(f"Phrase without a gap: {p}")
        A.pretty(p)
        haveGaps = False
        break

print(f"{len(gapQueryResults)} nodes in query result")
if otypesGood:
    print("1. all nodes are phrases")
if haveGaps:
    print("2. all nodes have gaps")

inResults = True
for p in F.otype.s("phrase"):
    if hasGap(p):
        if p not in gapQueryResults:
            print(f"Gapped phrase outside query results: {p}")
            A.pretty(p)
            inResults = False
            break

if inResults:
    print("3. all gapped phrases are contained in the results")
672 nodes in query result
1. all nodes are phrases
2. all nodes have gaps
3. all gapped phrases are contained in the results

Note that by hand coding we can get the gapped phrases much more quickly and securely!

Custom sets for (non-)gapped phrases

We have obtained a set with all gapped phrases, and we have paid a price:

  • either an expensive query,
  • or an inconvenient bit of hand coding.

It would be nice if we could kick-start our queries using this set as a given. And that is exactly what we are going to do now.

We make two custom sets and give them a name, gapphrase for gapped phrases and conphrase for non-gapped phrases (consecutive phrases).

In [29]:
customSets = dict(
    gapphrase=gapQueryResults,
    conphrase=set(F.otype.s("phrase")) - gapQueryResults,
)

Suppose we want all verbs that occur in a gapped phrase.

In [30]:
query = """
gapphrase
  word sp=verb
"""

Note that we have used the foreign name gapphrase in our search template, instead of phrase.

But we can still run search(), provided we tell it what we mean by gapphrase. We do that by passing the sets parameter to search(), which should be a dictionary of sets. Search will look up gapphrase in this dictionary, and will use its value, which should be a node set. That way, it understands that the expression gapphrase stands for the nodes in the given node set.
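
The lookup can be pictured with a toy model in plain Python. This is not how Text-Fabric implements it internally; the function `candidate_nodes`, the mini corpus, and the rule that a name found in the dictionary takes precedence over a node type are all assumptions made for the sake of the illustration.

```python
def candidate_nodes(name, node_types, sets=None):
    """Toy lookup: resolve a name used in a template to a set of nodes.

    If the name is a key in `sets`, use the node set stored there;
    otherwise treat the name as a node type.
    """
    if sets is not None and name in sets:
        return set(sets[name])
    return {n for n, otype in node_types.items() if otype == name}


# Hypothetical mini corpus: node number -> node type
toy_node_types = {1: "word", 2: "word", 3: "word", 10: "phrase", 11: "phrase", 12: "phrase"}

toy_gapped = {11}
toy_sets = dict(
    gapphrase=toy_gapped,
    conphrase=candidate_nodes("phrase", toy_node_types) - toy_gapped,
)

print(candidate_nodes("phrase", toy_node_types))
print(candidate_nodes("gapphrase", toy_node_types, sets=toy_sets))
print(candidate_nodes("conphrase", toy_node_types, sets=toy_sets))
```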

Here we go:

In [31]:
results = A.search(query, sets=customSets)
  0.18s 94 results
In [32]:
A.show(results, start=1, end=3, condenseType="clause")

result 1

clause
phrase
phrase

result 2

clause
phrase
sp=conj
phrase
phrase
sp=prep
sp=art
sp=art
clause
phrase
sp=conj
sp=subs
sp=prep
sp=art

result 3

clause
phrase
sp=conj
phrase
phrase
sp=prep
sp=art
sp=art
clause
phrase
sp=conj
sp=subs
sp=prep
sp=art

That looks good.

We can also apply feature conditions to gapphrase:

In [33]:
query = """
gapphrase function=Subj
"""
results = A.search(query, sets=customSets)
A.table(results, start=1, end=3)
  0.00s 177 results
n | p | phrase
1 | Genesis 2:25 | שְׁנֵיהֶם֙ הָֽאָדָ֖ם וְאִשְׁתֹּ֑ו
2 | Genesis 4:4 | הֶ֨בֶל גַם־ה֛וּא
3 | Genesis 7:14 | הֵ֜מָּה וְכָל־הַֽחַיָּ֣ה לְמִינָ֗הּ וְכָל־הַבְּהֵמָה֙ לְמִינָ֔הּ וְכָל־הָרֶ֛מֶשׂ לְמִינֵ֑הוּ וְכָל־הָעֹ֣וף לְמִינֵ֔הוּ כֹּ֖ל צִפֹּ֥ור כָּל־כָּנָֽף׃

We reduce the details by setting the baseType to phrase. The highlighted phrases will now get a yellow background.

In [35]:
A.show(results, start=3, end=3, baseTypes="phrase")

result 3

verse
sentence
clause
phrase הֵ֜מָּה וְכָל־הַֽחַיָּ֣ה לְמִינָ֗הּ וְכָל־הַבְּהֵמָה֙ לְמִינָ֔הּ וְכָל־הָרֶ֛מֶשׂ
function=Subj
clause
phrase הָ
function=Rela
phrase רֹמֵ֥שׂ
function=PreC
phrase עַל־הָאָ֖רֶץ
function=Cmpl
clause
phrase לְמִינֵ֑הוּ וְכָל־הָעֹ֣וף לְמִינֵ֔הוּ כֹּ֖ל צִפֹּ֥ור כָּל־כָּנָֽף׃
function=Subj

We reduce the details by setting the baseType to phrase_atom. The highlighted phrases will not get a yellow background now.

Two-phrase clauses

We can find the gaps, but do our minds always reckon with gaps? Gaps cause unexpected semantics. Here is a little puzzle.

Suppose we want to count the clauses consisting of exactly two phrases.

Here follows a little journey. We use a query to find the clauses, check the result with hand-coding, scratch our heads, refine the query, the hand-coding and our question until we are satisfied.

Attempt 1

By query

The following template should do it: a clause, starting with a phrase, followed by an adjacent phrase, which terminates the clause.

In [37]:
query = """
clause
    =: phrase
    <: phrase
    :=
"""
In [38]:
# ignore this
# S.tweakPerformance(yarnRatio=1.2)
In [39]:
S.study(query)
  0.00s Checking search template ...
  0.00s Setting up search space for 3 objects ...
  0.04s Constraining search space with 5 relations ...
  0.24s 	2 edges thinned
  0.24s Setting up retrieval plan with strategy small_choice_multi ...
  0.25s Ready to deliver results from 264393 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
In [40]:
S.showPlan(details=True)
Search with 3 objects and 5 relations
Results are instantiations of the following objects:
node  0-clause                                         88131   choices
node  1-phrase                                         88131   choices
node  2-phrase                                         88131   choices
Performance parameters:
	yarnRatio            =    1.25
	tryLimitFrom         =      40
	tryLimitTo           =      40
Instantiations are computed along the following relations:
node                                  0-clause         88131   choices
edge        0-clause           [[     2-phrase             1.0 choices
edge        2-phrase           :=     0-clause             0   choices
edge        2-phrase           :>     1-phrase             0.2 choices
edge        1-phrase           ]]     0-clause             0   choices
edge        0-clause           =:     1-phrase             0   choices
  2.92s The results are connected to the original search template as follows:
 0     
 1 R0  clause
 2 R1      =: phrase
 3 R2      <: phrase
 4         :=
 5     
In [41]:
results = A.search(query)
A.table(results, end=7)
  0.49s 23486 results
npclausephrasephrase
1Genesis 1:3יְהִ֣י אֹ֑ור יְהִ֣י אֹ֑ור
2Genesis 1:4כִּי־טֹ֑וב כִּי־טֹ֑וב
3Genesis 1:7אֲשֶׁר֙ מִתַּ֣חַת לָרָקִ֔יעַ אֲשֶׁר֙ מִתַּ֣חַת לָרָקִ֔יעַ
4Genesis 1:7אֲשֶׁ֖ר מֵעַ֣ל לָרָקִ֑יעַ אֲשֶׁ֖ר מֵעַ֣ל לָרָקִ֑יעַ
5Genesis 1:10כִּי־טֹֽוב׃ כִּי־טֹֽוב׃
6Genesis 1:11מַזְרִ֣יעַ זֶ֔רַע מַזְרִ֣יעַ זֶ֔רַע
7Genesis 1:12כִּי־טֹֽוב׃ כִּי־טֹֽוב׃

If we want to have the clauses only, we run it in shallow mode:

In [42]:
clausesByQuery = sorted(A.search(query, shallow=True))
  0.43s 23486 results

Note result 3 above: it seems we have 3 phrases. Yet there are only 2. We take a closer look:

In [43]:
focus = results[2][0]
A.pretty(focus)

One phrase is chunked into two phrase atoms, which are hidden by default. Let's make that more clear:

In [44]:
A.pretty(focus, hideTypes=False)
clause
clause_atom
phrase
phrase_atom
phrase

By hand

Let us check this with a piece of hand-written code. We want clauses that consist of exactly two phrases.

In [45]:
A.indent(reset=True)
A.info("counting ...")

clausesByHand = []
for clause in F.otype.s("clause"):
    phrases = L.d(clause, otype="phrase")
    if len(phrases) == 2:
        clausesByHand.append(clause)
clausesByHand = sorted(clausesByHand)
A.info(f"Done: found {len(clausesByHand)}")
  0.00s counting ...
  0.22s Done: found 23864

The difference

Strange, we end up with more cases. What is happening? Let us compare the results. We look at the first result where both methods diverge.

We put the difference finding in a little function.

In [46]:
def showDiff(queryResults, handResults):
    diff = [x for x in zip(queryResults, handResults) if x[0] != x[1]]
    if not diff:
        print(
            f"""
{len(queryResults):>6} queryResults
         are identical with
{len(handResults):>6} handResults
"""
        )
        return
    (rQuery, rHand) = diff[0]
    if rQuery < rHand:
        print(f"clause {rQuery} is a query result but not found by hand")
        toShow = rQuery
    else:
        print(f"clause {rHand} is not a query result but has been found by hand")
        toShow = rHand
    colors = ["aqua", "aquamarine", "khaki", "lavender", "yellow"]
    highlights = {}
    for (i, phrase) in enumerate(L.d(toShow, otype="phrase")):
        highlights[phrase] = colors[i % len(colors)]
        for atom in L.d(phrase, otype="phrase_atom"):
            highlights[atom] = colors[i % len(colors)]
    A.pretty(
        toShow,
        hideTypes=False,
        withNodes=True,
        suppress={"lex", "sp", "vt", "vs"},
        highlights=highlights,
        baseTypes="phrase_atom",
    )
In [47]:
showDiff(clausesByQuery, clausesByHand)
clause 427937 is not a query result but has been found by hand
clause:427937
clause_atom:516080
phrase:652701
phrase_atom:905945 כָל־
clause:427937
clause_atom:516082
phrase:652703
phrase_atom:905947 יַֽהַרְגֵֽנִי׃

Lo and behold:

  • the hand-written code is right in a sense: this is a clause that consists exactly of two phrases.
  • the query is also right in a sense: the two phrases are not adjacent: there is a gap in the clause between them!
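
The showDiff function pairs up the two sorted lists position by position. As an alternative sketch (not part of the original notebook), a set-based comparison finds the same first point of divergence and also reports a difference when one list is merely a prefix of the other, a case that a zip-based comparison passes over. The function name and the toy node numbers are invented.

```python
def first_difference(query_results, hand_results):
    """Return (node, origin) for the smallest node present in only one of the lists.

    origin is "query" or "hand"; the result is None when both lists
    contain exactly the same nodes.
    """
    only_query = set(query_results) - set(hand_results)
    only_hand = set(hand_results) - set(query_results)
    if not only_query and not only_hand:
        return None
    node = min(only_query | only_hand)
    return (node, "query" if node in only_query else "hand")


# Toy node numbers, invented for illustration
print(first_difference([1, 2, 4, 5], [1, 2, 3, 4, 5]))
print(first_difference([1, 2, 3], [1, 2, 3, 4]))  # first list is a prefix of the second
print(first_difference([1, 2, 3], [1, 2, 3]))
```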

Attempt 2

By hand

We modify the hand-written code so that a clause qualifies only if its two phrases are adjacent.

In [48]:
A.indent(reset=True)
A.info("counting ...")

clausesByHand2 = []
for clause in F.otype.s("clause"):
    phrases = L.d(clause, otype="phrase")
    if len(phrases) == 2:
        if L.d(phrases[0], otype="word")[-1] + 1 == L.d(phrases[1], otype="word")[0]:
            clausesByHand2.append(clause)
clausesByHand2 = sorted(clausesByHand2)
A.info(f"Done: found {len(clausesByHand2)}")
  0.00s counting ...
  0.26s Done: found 23403

The difference

Now we have fewer cases. What is going on?

In [49]:
showDiff(clausesByQuery, clausesByHand2)
clause 428698 is a query result but not found by hand
clause:428698
clause_atom:516896
phrase:655130
phrase_atom:908523 וְ
phrase:655131
phrase_atom:908524 גַם֩ אֶת־לֹ֨וט
phrase_atom:908525 אָחִ֤יו
phrase_atom:908526 וּ
phrase_atom:908527 רְכֻשֹׁו֙
phrase:655132
phrase_atom:908528 הֵשִׁ֔יב
phrase:655131
phrase_atom:908529 וְ
phrase_atom:908530 גַ֥ם אֶת־הַנָּשִׁ֖ים וְאֶת־הָעָֽם׃

Observe:

This clause has three phrases, but the third one lies inside the second one.

  • the hand-written code is right in a sense: this clause has three phrases.
  • the query is right in a sense: it contains two adjacent phrases that together span the whole clause.

Attempt 3

By query

Can we adjust the pattern to exclude cases like this? Yes, with custom sets, see advanced.

Instead of looking through all phrases, we consider only the non-gapped phrases.

Earlier in this notebook we have constructed the set of non-gapped phrases and put it under the name conphrase in the custom sets.

In [50]:
query = """
clause
    =: conphrase
    <: conphrase
    :=
"""

clausesByQuery2 = sorted(A.search(query, sets=customSets, shallow=True))
  0.46s 23330 results

The difference

There is still a difference.

In [51]:
showDiff(clausesByQuery2, clausesByHand2)
clause 428380 is not a query result but has been found by hand
clause:428380
clause_atom:516560
phrase:654133
phrase_atom:907448 וְֽ
phrase:654134
phrase_atom:907449 אֶת־פַּתְרֻסִ֞ים וְאֶת־כַּסְלֻחִ֗ים
clause:428380
clause_atom:516562
phrase:654134
phrase_atom:907454 וְ
phrase_atom:907455 אֶת־כַּפְתֹּרִֽים׃ ס

Observe:

This clause has two phrases, the second one has a gap, which coincides with a gap in the clause.

  • the hand-written code is right in a sense: this clause has two phrases, adjacent, and they span the whole clause, nothing left out.
  • the query is right in a sense: the second phrase is not consecutive.

Attempt 4

By hand

We modify the hand-written code, so that only consecutive clauses qualify.

In [52]:
A.indent(reset=True)
A.info("counting ...")

clausesByHand3 = []
for clause in F.otype.s("clause"):
    if hasGap(clause):
        continue
    phrases = L.d(clause, otype="phrase")
    if len(phrases) == 2:
        if L.d(phrases[0], otype="word")[-1] + 1 == L.d(phrases[1], otype="word")[0]:
            clausesByHand3.append(clause)
clausesByHand3 = sorted(clausesByHand3)
A.info(f"Done: found {len(clausesByHand3)}")
  0.00s counting ...
  0.35s Done: found 23330

The difference

Now the numbers of results agree. But are the results really the same?

In [53]:
showDiff(clausesByQuery2, clausesByHand3)
 23330 queryResults
         are identical with
 23330 handResults

Conclusion

It took four attempts to arrive at the final concept of things that we were looking for.

Sometimes the search template had to be modified, sometimes the hand-written code.

The interplay and systematic comparison between the attempts helped to spot all relevant configurations of phrases within clauses.
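
The configurations we ran into can be replayed on toy data in plain Python. The sketch below models only the hand-coded criteria of attempts 1, 2 and 4, not the search templates, and every clause, phrase and slot number in it is invented. A clause is a dictionary with its own slots and a list of phrases, each phrase being a sorted list of slots.

```python
def has_gap(slots):
    """True if the sorted slot list skips at least one slot number."""
    return slots[-1] - slots[0] + 1 > len(slots)


def two_phrases(clause):
    """Attempt 1 by hand: the clause has exactly two phrases."""
    return len(clause["phrases"]) == 2


def two_adjacent_phrases(clause):
    """Attempt 2 by hand: two phrases, the second starting right after the first ends."""
    if not two_phrases(clause):
        return False
    first, second = clause["phrases"]
    return first[-1] + 1 == second[0]


def two_adjacent_phrases_no_gap(clause):
    """Attempt 4 by hand: as attempt 2, and the clause itself has no gap."""
    return not has_gap(clause["slots"]) and two_adjacent_phrases(clause)


toy_clauses = {
    "plain": {"slots": [1, 2, 3], "phrases": [[1], [2, 3]]},
    "gap between phrases": {"slots": [4, 6], "phrases": [[4], [6]]},
    "gap inside second phrase": {"slots": [7, 8, 11], "phrases": [[7], [8, 11]]},
    "phrase nested in a gap": {
        "slots": [12, 13, 14, 15, 16],
        "phrases": [[12], [13, 14, 16], [15]],
    },
}

for criterion in (two_phrases, two_adjacent_phrases, two_adjacent_phrases_no_gap):
    passing = [name for name, clause in toy_clauses.items() if criterion(clause)]
    print(f"{criterion.__name__}: {passing}")
```

Each stricter criterion drops one more toy clause, which is the same narrowing we saw in the counts 23864, 23403 and 23330.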

Spans

Here is another cause of wrong query results: there are sentences that span multiple verses. Such sentences are not contained in any verse, so they are easily missed in queries.

We describe a scenario where that happens.

Mother clauses

A clause and its mother do not have to be in the same verse. We are going to fetch the cases where they are in different verses.

All mother clauses

But first we fetch all pairs of clauses connected by a mother edge.

In [54]:
query = """
clause
-mother> clause
"""
allMotherPairs = A.search(query)
A.table(allMotherPairs, end=7)
  0.07s 13917 results

Mother in another verse

Now we modify the query to the effect that mother and daughter must sit in distinct verses.

In [55]:
query = """
cm:clause
-mother> cd:clause

v1:verse
v2:verse
v1 # v2

cm ]] v1
cd ]] v2
"""
diffMotherPairs = A.search(query)
A.table(diffMotherPairs, end=7, skipCols="3 4", withPassage="1 2")
  0.11s 710 results
n | clause | clause
1 | Genesis 1:18  וְלִמְשֹׁל֙ בַּיֹּ֣ום וּבַלַּ֔יְלָה | Genesis 1:17  לְהָאִ֖יר עַל־הָאָֽרֶץ׃
2 | Genesis 2:7  וַיִּיצֶר֩ יְהוָ֨ה אֱלֹהִ֜ים אֶת־הָֽאָדָ֗ם עָפָר֙ מִן־הָ֣אֲדָמָ֔ה | Genesis 2:4  בְּיֹ֗ום
3 | Genesis 7:3  לְחַיֹּ֥ות זֶ֖רַע עַל־פְּנֵ֥י כָל־הָאָֽרֶץ׃ | Genesis 7:2  מִכֹּ֣ל׀ הַבְּהֵמָ֣ה הַטְּהֹורָ֗ה תִּֽקַּח־לְךָ֛ שִׁבְעָ֥ה שִׁבְעָ֖ה אִ֣ישׁ וְאִשְׁתֹּ֑ו
4 | Genesis 22:17  כִּֽי־בָרֵ֣ךְ אֲבָרֶכְךָ֗ | Genesis 22:16  כִּ֗י
5 | Genesis 24:44  הִ֣וא הָֽאִשָּׁ֔ה | Genesis 24:43  הָֽעַלְמָה֙
6 | Genesis 27:45  עַד־שׁ֨וּב אַף־אָחִ֜יךָ מִמְּךָ֗ | Genesis 27:44  עַ֥ד אֲשֶׁר־תָּשׁ֖וּב חֲמַ֥ת אָחִֽיךָ׃
7 | Genesis 36:16  אַלּֽוּף־קֹ֛רַח אַלּ֥וּף גַּעְתָּ֖ם אַלּ֣וּף עֲמָלֵ֑ק | Genesis 36:15  בְּנֵ֤י אֱלִיפַז֙ בְּכֹ֣ור עֵשָׂ֔ו אַלּ֤וּף תֵּימָן֙ אַלּ֣וּף אֹומָ֔ר אַלּ֥וּף צְפֹ֖ו אַלּ֥וּף קְנַֽז׃

Mother in same verse

As a check, we modify the latter query and require v1 and v2 to be the same verse, to get the mother pairs of which both members are in the same verse.

In [56]:
query = """
cm:clause
-mother> cd:clause

v1:verse
v2:verse
v1 = v2

cm ]] v1
cd ]] v2
"""
sameMotherPairs = A.search(query)
A.table(sameMotherPairs, end=7, skipCols="3 4", withPassage="1 2")
  0.13s 13181 results
n | clause | clause
1 | Genesis 1:4  כִּי־טֹ֑וב | Genesis 1:4  וַיַּ֧רְא אֱלֹהִ֛ים אֶת־הָאֹ֖ור
2 | Genesis 1:10  כִּי־טֹֽוב׃ | Genesis 1:10  וַיַּ֥רְא אֱלֹהִ֖ים
3 | Genesis 1:12  כִּי־טֹֽוב׃ | Genesis 1:12  וַיַּ֥רְא אֱלֹהִ֖ים
4 | Genesis 1:14  לְהַבְדִּ֕יל בֵּ֥ין הַיֹּ֖ום וּבֵ֣ין הַלָּ֑יְלָה | Genesis 1:14  יְהִ֤י מְאֹרֹת֙ בִּרְקִ֣יעַ הַשָּׁמַ֔יִם
5 | Genesis 1:15  לְהָאִ֖יר עַל־הָאָ֑רֶץ | Genesis 1:15  וְהָי֤וּ לִמְאֹורֹת֙ בִּרְקִ֣יעַ הַשָּׁמַ֔יִם
6 | Genesis 1:17  לְהָאִ֖יר עַל־הָאָֽרֶץ׃ | Genesis 1:17  וַיִּתֵּ֥ן אֹתָ֛ם אֱלֹהִ֖ים בִּרְקִ֣יעַ הַשָּׁמָ֑יִם
7 | Genesis 1:18  וּֽלֲהַבְדִּ֔יל בֵּ֥ין הָאֹ֖ור וּבֵ֣ין הַחֹ֑שֶׁךְ | Genesis 1:18  וְלִמְשֹׁל֙ בַּיֹּ֣ום וּבַלַּ֔יְלָה

The difference

Let's check if the numbers add up:

  • the first query asked for all pairs
  • the second query asked for pairs with members in different verses
  • the third query asked for pairs with members in the same verse

Then the results of the second and third query combined should equal the results of the first query.

That makes sense.

Still, let's check:

In [57]:
discrepancy = len(allMotherPairs) - len(diffMotherPairs) - len(sameMotherPairs)
print(discrepancy)
26

The numbers do not add up. We are missing cases. Why?

Clauses may cross verse boundaries. In that case they are not part of a verse, and hence our latter two queries do not detect them. Let's count how many verse boundary crossing clauses there are.

In [58]:
query = """
clause
/with/
v1:verse
&& ..
v2:verse
&& ..
v1 < v2
/-/
"""
results = A.search(query)
  0.57s 50 results
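
Why does a verse-crossing clause drop out of the earlier queries? A minimal sketch in plain Python, with invented verse ranges and slot numbers, shows the containment test behind an embedding relation: a node is inside a verse only if all of its slots are. The helper name `containing_verses` is hypothetical.

```python
def containing_verses(node_slots, verses):
    """Names of the verses whose slot range contains every slot of the node."""
    return [
        name
        for name, (start, end) in verses.items()
        if all(start <= s <= end for s in node_slots)
    ]


toy_verses = {"v1": (1, 4), "v2": (5, 8)}
clause_inside = [2, 3, 4]    # lies completely within v1
clause_crossing = [4, 5, 6]  # starts in v1 and ends in v2

print(containing_verses(clause_inside, toy_verses))
print(containing_verses(clause_crossing, toy_verses))
```

The crossing clause overlaps both toy verses but is contained in neither, so a template that requires the clause to be embedded in a verse cannot match it.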

You might think we can speed up the query by requiring v1 <: v2 (both verses are adjacent). There are fewer possibilities to consider, so maybe we gain something.

In [59]:
query = """
clause
/with/
v1:verse
&& ..
v2:verse
&& ..
v1 <: v2
/-/
"""
results = A.search(query)
  0.53s 49 results

Indeed, slightly faster, but one result fewer! How can that be?

There must be a clause that spans at least two verses and in doing so, skips at least one verse.

Let's find that one:

In [60]:
query = """
clause
/with/
v1:verse
&& ..
v2:verse
|| ..
v3:verse
&& ..
v1 < v2
v2 < v3
v1 < v3
/-/
"""
resultsX = A.search(query)
  0.93s 1 result
In [61]:
A.table(resultsX)
A.show(resultsX, baseTypes="clause_atom")
n | p | clause
1 | 1_Kings 8:41 | וְגַם֙ אֶל־הַנָּכְרִ֔י אַתָּ֞ה תִּשְׁמַ֤ע הַשָּׁמַ֨יִם֙ מְכֹ֣ון שִׁבְתֶּ֔ךָ

result 1

verse
sentence
clause
phrase וְ
phrase גַם֙ אֶל־הַנָּכְרִ֔י
clause
phrase אֲשֶׁ֛ר
phrase לֹא־
phrase מֵעַמְּךָ֥ יִשְׂרָאֵ֖ל
phrase ה֑וּא
clause
phrase וּ
phrase בָ֛א
phrase מֵאֶ֥רֶץ רְחֹוקָ֖ה
phrase לְמַ֥עַן שְׁמֶֽךָ׃
verse
sentence
clause
phrase אַתָּ֞ה
phrase תִּשְׁמַ֤ע
phrase הַשָּׁמַ֨יִם֙ מְכֹ֣ון שִׁבְתֶּ֔ךָ
sentence
clause
phrase וְ
phrase עָשִׂ֕יתָ
phrase כְּכֹ֛ל
clause
phrase אֲשֶׁר־
phrase יִקְרָ֥א
phrase אֵלֶ֖יךָ
phrase הַנָּכְרִ֑י
sentence
clause
phrase לְמַ֣עַן
phrase יֵדְעוּן֩
phrase כָּל־עַמֵּ֨י הָאָ֜רֶץ
phrase אֶת־שְׁמֶ֗ךָ
clause
phrase לְיִרְאָ֤ה
phrase אֹֽתְךָ֙
phrase כְּעַמְּךָ֣ יִשְׂרָאֵ֔ל
clause
phrase וְ
phrase לָדַ֕עַת
clause
phrase כִּי־
phrase שִׁמְךָ֣
phrase נִקְרָ֔א
phrase עַל־הַבַּ֥יִת הַזֶּ֖ה
clause
phrase אֲשֶׁ֥ר
phrase בָּנִֽיתִי׃
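
The difference between `v1 < v2` and `v1 <: v2` for such a clause can be replayed on toy data. The sketch below is plain Python with invented verse ranges and slot numbers; the helper names are hypothetical, and the overlap test is only a stand-in for the `&&` relation in the templates above.

```python
from itertools import combinations


def overlapping_verses(node_slots, verses):
    """Indices of the verses (in textual order) that share a slot with the node."""
    return [
        i
        for i, (start, end) in enumerate(verses)
        if any(start <= s <= end for s in node_slots)
    ]


def spans_some_pair(node_slots, verses):
    """Counterpart of v1 < v2: the node overlaps two distinct verses."""
    return len(overlapping_verses(node_slots, verses)) >= 2


def spans_adjacent_pair(node_slots, verses):
    """Counterpart of v1 <: v2: the node overlaps two verses that are neighbours."""
    hits = overlapping_verses(node_slots, verses)
    return any(b - a == 1 for a, b in combinations(hits, 2))


toy_verses = [(1, 4), (5, 8), (9, 12)]
skipping_clause = [3, 4, 9, 10]  # gapped clause: first and third verse only
ordinary_spanner = [4, 5, 6]     # crosses one verse boundary

for clause in (skipping_clause, ordinary_spanner):
    print(spans_some_pair(clause, toy_verses), spans_adjacent_pair(clause, toy_verses))
```

The gapped toy clause is found by the first test but not by the second, which is the behaviour of the single clause in 1_Kings 8:41 that the adjacency variant missed.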

A more roundabout way to find the same clauses:

In [62]:
query = """
clause
    =: first:word
    last:word
    :=
v1:verse
    w1:word
v2:verse
    w2:word

first = w1
last = w2
v1 # v2
"""
results = A.search(query)
  1.01s 50 results

Some of these verse spanning clauses do not have mothers or are not mothers. Let's count the cases where two clauses are in a mother relation and at least one of them spans a verse.

We need two queries for that. These queries are almost identical. One retrieves the clause pairs where the mother crosses verse boundaries, and the other where the daughter does so.

But we are programmers. We do not have to repeat ourselves:

In [63]:
queryCommon = """
c1:clause
-mother> c2:clause

c3:clause
/with/
v1:verse
&& ..
v2:verse
&& ..
v1 < v2
/-/
"""

query1 = f"""
{queryCommon}
c1 = c3
"""
query2 = f"""
{queryCommon}
c2 = c3
"""

results1 = A.search(query1, silent=True)
results2 = A.search(query2, silent=True)
spannersByQuery = {(r[0], r[1]) for r in results1 + results2}
print(f"{len(spannersByQuery):>3} spanners are missing")
print(f"{discrepancy:>3} missing cases were detected before")
print(f"{discrepancy - len(spannersByQuery):>3} is the resulting disagreement")
 26 spanners are missing
 26 missing cases were detected before
  0 is the resulting disagreement

We can find the mother clause pairs in which at least one member spans verses more easily by hand-coding:

Starting with the set of all mother pairs, we pick out every pair that has a verse spanner.

In [64]:
spannersByHand = set()

for (c1, c2) in allMotherPairs:
    if not (L.u(c1, otype="verse") and L.u(c2, otype="verse")):
        spannersByHand.add((c1, c2))

len(spannersByHand)
Out[64]:
26

And, to be completely sure:

In [65]:
spannersByHand == spannersByQuery
Out[65]:
True

By custom sets

If we are content with the clauses that do not span verses, we can put them in a set, and modify the queries by replacing clause by conclause and bind the right set to it.

Here we go. In one cell we run the queries to get all pairs, the mother-daughter-in-separate-verses pairs, and the mother-daughter-in-same-verse pairs, and we check that the numbers add up.

In [66]:
conClauses = {c for c in F.otype.s("clause") if L.u(c, otype="verse")}
customSets = dict(conclause=conClauses)

print("All pairs")
allPairs = A.search(
    """
conclause
-mother> conclause
""",
    sets=customSets,
)

print("Different verse pairs")
diffPairs = A.search(
    """
cm:conclause
-mother> cd:conclause

v1:verse
v2:verse
v1 # v2

cm ]] v1
cd ]] v2
""",
    sets=customSets,
)

print("Same verse pairs")
samePairs = A.search(
    """
cm:conclause
-mother> cd:conclause

v1:verse
v2:verse
v1 = v2

cm ]] v1
cd ]] v2
""",
    sets=customSets,
)

allPairSet = set(allPairs)
diffPairSet = {(r[0], r[1]) for r in diffPairs}
samePairSet = {(r[0], r[1]) for r in samePairs}

print(f"Intersection same-verse/different-verse pairs: {samePairSet & diffPairSet}")
print(
    f"All pairs is union of same-verse/different-verse pairs: {allPairSet == (samePairSet | diffPairSet)}"
)
All pairs
  0.06s 13891 results
Different verse pairs
  0.09s 710 results
Same verse pairs
  0.12s 13181 results
Intersection same-verse/different-verse pairs: set()
All pairs is union of same-verse/different-verse pairs: True
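
As a final sanity check, the counts reported in the outputs above can be put side by side. The sketch only copies those numbers into variables (the variable names are ours) and redoes the arithmetic.

```python
# Counts copied from the outputs earlier in this section
all_mother_pairs = 13917   # all pairs of clauses connected by a mother edge
diff_verse_pairs = 710     # members in different verses
same_verse_pairs = 13181   # members in the same verse
spanner_pairs = 26         # pairs with at least one verse-spanning member
conclause_pairs = 13891    # all pairs among clauses that do not span verses

# Pairs missed by the same-verse and different-verse queries on plain clauses
missing = all_mother_pairs - diff_verse_pairs - same_verse_pairs
print(missing)

# With verse spanners excluded up front, the two partial counts cover everything
print(diff_verse_pairs + same_verse_pairs == conclause_pairs)
print(all_mother_pairs - spanner_pairs == conclause_pairs)
```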

Lessons

  • mix programming with composing queries;
  • a good way to do so is custom sets;
  • use programming for processing results;
  • find the balance between queries and hand-coding.

All steps

  • start: your first step in mastering the bible computationally
  • display: become an expert in creating pretty displays of your text structures
  • search: turbo charge your hand-coding with search templates

advanced sets relations quantifiers fromMQL rough gaps

You have now finished the search tutorial.

Share the work!


  • exportExcel: make tailor-made spreadsheets out of your results
  • share: draw in other people's data and let them use yours
  • export: export your dataset as an Emdros database
  • annotate: annotate plain text by means of other tools and import the annotations as TF features
  • map: map somebody else's annotations to a new version of the corpus
  • volumes: work with selected books only
  • trees: work with the BHSA data as syntax trees

CC-BY Dirk Roorda