You might want to consider the start of this tutorial.


In [1]:
%load_ext autoreload
%autoreload 2
In [2]:
from tf.app import use
In [3]:
VERSION = "2021"
In [4]:
A = use("etcbc/bhsa", hoist=globals())
TF-app: ~/text-fabric-data/etcbc/bhsa/app
data: ~/text-fabric-data/etcbc/bhsa/tf/2021
data: ~/text-fabric-data/etcbc/phono/tf/2021
data: ~/text-fabric-data/etcbc/parallels/tf/2021
This is Text-Fabric 9.3.2
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

122 features found and 0 ignored
Text-Fabric: Text-Fabric API 9.3.2, etcbc/bhsa/app v3, Search Reference
Data: BHSA, Character table, Feature docs
Features:
Parallel Passages
BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis
Phonetic Transcriptions
Text-Fabric API: names N F E L T S C TF directly usable

Rough edges

It might be helpful to peek under the hood, especially when exploring searches that run slowly.

If you went through the previous parts of the tutorial you have encountered cases where things come to a grinding halt.

Yet we can get a hunch of what is going on, even in those cases. For that, we use the lower-level search API S of Text-Fabric, and not the wrappers that the high-level A API provides.

The main difference is that S.search() returns a generator of the results, whereas A.search() returns a list of the results. In fact, A.search() calls the generator function delivered by S.search() as often as needed.

For some queries, the fetching of results is quite costly, so costly that we do not want to fetch all results up-front. Rather we want to fetch a few, to see how it goes. In these cases, directly using S.search() is preferred over A.search().
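The difference between a lazy generator and an eager list can be sketched in plain Python. The generator costly_results below is a hypothetical stand-in for a search that is expensive per result; it is not part of Text-Fabric, and the sketch only illustrates why fetching a few results can be much cheaper than fetching all of them.

```python
from itertools import islice

work_done = []  # records every result that was actually computed


def costly_results(n):
    """Hypothetical stand-in for an expensive result generator."""
    for i in range(n):
        work_done.append(i)  # pretend this step is expensive
        yield (i, i + 1)


# Lazy: ask for three results out of a potential million.
first_three = list(islice(costly_results(1_000_000), 3))
computed_lazily = len(work_done)
print(first_three, computed_lazily)

# Eager: turning the generator into a list computes every result up front.
everything = list(costly_results(1000))
computed_in_total = len(work_done)
print(len(everything), computed_in_total)
```

Only the results that are actually requested get computed, which is the reason to prefer a generator when a query might be pathological.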

In [5]:
query = """
book
  chapter
    verse
      phrase det=und
        word lex=>LHJM/
"""

Study

First we call S.study(query).

The syntax will be checked, features loaded, the search space will be set up, narrowed down, and the fetching of results will be prepared, but not yet executed.

In order to make the query a bit more interesting, we lift the constraint that the results must be in Genesis 1-2.

In [6]:
S.study(query)
  0.00s Checking search template ...
  0.00s Setting up search space for 5 objects ...
  0.24s Constraining search space with 4 relations ...
  0.28s 	2 edges thinned
  0.28s Setting up retrieval plan with strategy small_choice_multi ...
  0.31s Ready to deliver results from 3345 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results

Before we rush to the results, let's have a look at the plan.

In [7]:
S.showPlan()
  3.50s The results are connected to the original search template as follows:
 0     
 1 R0  book
 2 R1    chapter
 3 R2      verse
 4 R3        phrase det=und
 5 R4          word lex=>LHJM/
 6     

Here you see already what your results will look like. Each result r is a tuple of nodes:

(R0, R1, R2, R3, R4)

that instantiate the objects in your template.
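Because every result has the same shape, it can be convenient to give the positions names. The sketch below uses Python's namedtuple; the name Result and its field names are illustrative choices, not something Text-Fabric provides. The node numbers are copied from the first row of the fetch output shown later in this section.

```python
from collections import namedtuple

# One field per object in the search template, in template order (R0..R4).
Result = namedtuple("Result", ["book", "chapter", "verse", "phrase", "word"])

raw = (426626, 427478, 1435353, 882995, 381820)
r = Result(*raw)

print(r.word)          # same node as raw[4]
print(r.book, r.verse)
```

A named result is still an ordinary tuple, so indexing and unpacking keep working as before.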

In case you are curious, you can get details about the search space as well:

In [8]:
S.showPlan(details=True)
Search with 5 objects and 4 relations
Results are instantiations of the following objects:
node  0-book                                              39   choices
node  1-chapter                                          929   choices
node  2-verse                                            754   choices
node  3-phrase                                           805   choices
node  4-word                                             818   choices
Performance parameters:
	yarnRatio            =    1.25
	tryLimitFrom         =      40
	tryLimitTo           =      40
Instantiations are computed along the following relations:
node                                  0-book              39   choices
edge        0-book             [[     1-chapter           23.8 choices
edge        1-chapter          [[     2-verse              0.2 choices
edge        2-verse            [[     3-phrase             1.0 choices (thinned)
edge        3-phrase           [[     4-word               1.0 choices (thinned)
  6.27s The results are connected to the original search template as follows:
 0     
 1 R0  book
 2 R1    chapter
 3 R2      verse
 4 R3        phrase det=und
 5 R4          word lex=>LHJM/
 6     

The part about the nodes shows you how many possible instantiations for each object in your template have been found. These are not results yet, because only combinations of instantiations that satisfy all constraints are results.

The constraints come from the relations between the objects that you specified. In this case, there is only an implicit relation: embedding [[. Later on we'll examine all spatial relations.

The part about the edges shows you the constraints, and in what order they will be computed when stitching results together. In this case the order is exactly the order by which the relations appear in the template, but that will not always be the case. Text-Fabric spends some time and ingenuity to find out an optimal stitch plan. Fetching results is like selecting a node, stitching it to another node with an edge, and so on, until a full stitch of nodes intersects with all the node sets from which they must be chosen (the yarns).
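The stitching idea can be illustrated with a toy backtracking search. This is only a sketch of the general technique under simplifying assumptions (edges are followed in the given order, and each edge starts at an object that is already filled in). It is not Text-Fabric's actual implementation; the function stitch, the data layout and the tiny node numbers are all made up for illustration.

```python
def stitch(yarns, edges, start=0):
    """Yield tuples with one node per yarn that satisfy every edge.

    yarns: list of candidate node sets, one per template object
    edges: list of (src, dst, neighbours); neighbours(node) returns the
           nodes related to `node`; edges are followed in list order
    start: index of the yarn from which stitching begins
    """
    def extend(partial, remaining):
        if not remaining:
            yield tuple(partial[i] for i in range(len(yarns)))
            return
        (src, dst, neighbours), rest = remaining[0], remaining[1:]
        for candidate in neighbours(partial[src]):
            if candidate in yarns[dst]:  # the stitch must land on the yarn
                partial[dst] = candidate
                yield from extend(partial, rest)
        partial.pop(dst, None)  # backtrack

    for node in sorted(yarns[start]):
        yield from extend({start: node}, edges)


# Toy embedding structure: book -> chapters -> verses.
children = {1: [10, 11], 2: [20], 10: [100, 101], 11: [110], 20: [200]}


def embeds(node):
    return children.get(node, [])


# Verse 101 is absent from its yarn, as if a feature constraint removed it.
yarns = [{1, 2}, {10, 11, 20}, {100, 110, 200}]
edges = [(0, 1, embeds), (1, 2, embeds)]

results = list(stitch(yarns, edges))
print(results)
```

In this toy, the attempt to stitch chapter 10 to verse 101 fails because 101 is not on its yarn. Many failed attempts of that kind are what can make real queries slow.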

Fetching results may take time.

For some queries, it can take a large amount of time to walk through all results. Even worse, it may happen that it takes a large amount of time before getting the first result. During stitching, many stitchings will be tried and fail before they can be completed.

This has to do with search strategies on the one hand, and on the other hand with the real possibility of running into pathological search patterns, which have billions of results, mostly unintended. For example, a simple query that asks for 5 words in the Hebrew Bible without further constraints will have 425,000 to the power of 5 results. That is about 1.4 × 10^28 (roughly a one with 28 zeros), about the number of molecules in a few hundred cubic meters of air. That may not sound like much, but it is 10,000 times the amount of bytes that can be currently stored on the whole Internet.
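The size of that result set is easy to check with plain integer arithmetic. The figure of 425,000 words is the rounded number used in the text, not an exact count.

```python
words = 425_000          # rounded number of words, as used in the text
combos = words ** 5      # unconstrained choices for 5 word positions

print(f"{combos:,}")
print(f"{combos:.2e}")
print(len(str(combos)), "digits")
```

The result has 29 digits, so it lies between 10^28 and 10^29.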

Text-Fabric search is not yet done with finding optimal search strategies, and I hope to refine its arsenal of methods in the future, depending on what you report.

Counting results

It is always a good idea to get a feel for the number of results before you dive into them head-on.

In [9]:
S.count(progress=1, limit=5)
  0.00s Counting results per 1 up to 5 ...
   |     0.00s 1
   |     0.00s 2
   |     0.00s 3
   |     0.00s 4
   |     0.01s 5
  0.01s Done: 6 results

We asked for 5 results in total, with a progress message for every one. That was a bit conservative.

In [10]:
S.count(progress=100, limit=500)
  0.00s Counting results per 100 up to 500 ...
   |     0.01s 100
   |     0.02s 200
   |     0.03s 300
   |     0.04s 400
   |     0.05s 500
  0.05s Done: 501 results

Still pretty quick. Now we want to count all results.

In [11]:
S.count(progress=200, limit=None)
  0.00s Counting results per 200 ...
   |     0.02s 200
   |     0.04s 400
   |     0.06s 600
   |     0.07s 800
  0.07s Done: 818 results

Fetching results

It is time to see something of those results.

In [12]:
S.fetch(limit=10)
Out[12]:
((426626, 427478, 1435353, 882995, 381820),
 (426626, 427478, 1435364, 883090, 382059),
 (426627, 427485, 1435532, 884992, 385801),
 (426627, 427486, 1435548, 885229, 386188),
 (426627, 427492, 1435804, 887032, 390487),
 (426627, 427493, 1435830, 887367, 391119),
 (426627, 427493, 1435831, 887394, 391159),
 (426628, 427497, 1435979, 888253, 392968),
 (426628, 427498, 1436032, 888574, 393786),
 (426628, 427498, 1436037, 888618, 393895))

Not very informative. Just a quick observation: look at the last column. These are the result nodes for the word part in the query, indicated as R4 by showPlan() before. And indeed, they are all below 425,000, roughly the number of words in the Hebrew Bible.
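That observation can be checked mechanically by slicing the results column-wise. The tuples below are copied from the output above; the threshold of 425,000 is the rounded word count used in the text.

```python
results = (
    (426626, 427478, 1435353, 882995, 381820),
    (426626, 427478, 1435364, 883090, 382059),
    (426627, 427485, 1435532, 884992, 385801),
    (426627, 427486, 1435548, 885229, 386188),
    (426627, 427492, 1435804, 887032, 390487),
    (426627, 427493, 1435830, 887367, 391119),
    (426627, 427493, 1435831, 887394, 391159),
    (426628, 427497, 1435979, 888253, 392968),
    (426628, 427498, 1436032, 888574, 393786),
    (426628, 427498, 1436037, 888618, 393895),
)

# zip(*rows) turns a sequence of rows into a sequence of columns.
columns = list(zip(*results))
word_nodes = columns[-1]

print(max(word_nodes))
print(all(node < 425_000 for node in word_nodes))
```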

Nevertheless, we want to glean a bit more information off them.

In [13]:
for r in S.fetch(limit=10):
    print(S.glean(r))
  Ezra 8:17 phrase[מְשָׁרְתִ֖ים לְבֵ֥ית אֱלֹהֵֽינוּ׃ ] אֱלֹהֵֽינוּ׃ 
  Ezra 8:28 phrase[נְדָבָ֔ה לַיהוָ֖ה אֱלֹהֵ֥י אֲבֹתֵיכֶֽם׃ ] אֱלֹהֵ֥י 
  Nehemiah 5:15 phrase[מִפְּנֵ֖י יִרְאַ֥ת אֱלֹהִֽים׃ ] אֱלֹהִֽים׃ 
  Nehemiah 6:12 phrase[אֱלֹהִ֖ים ] אֱלֹהִ֖ים 
  Nehemiah 12:46 phrase[לֵֽאלֹהִֽים׃ ] אלֹהִֽים׃ 
  Nehemiah 13:25 phrase[בֵּֽאלֹהִ֗ים ] אלֹהִ֗ים 
  Nehemiah 13:26 phrase[אֱלֹהִ֔ים ] אֱלֹהִ֔ים 
  1_Chronicles 4:10 phrase[אֱלֹהִ֖ים ] אֱלֹהִ֖ים 
  1_Chronicles 5:20 phrase[לֵאלֹהִ֤ים ] אלֹהִ֤ים 
  1_Chronicles 5:25 phrase[אֱלֹהִ֖ים ] אֱלֹהִ֖ים 
Caution

It is not possible to do len(S.fetch()), because fetch() is a generator, not a list. It delivers a result every time it is asked, for as long as there are results, but it does not know in advance how many there will be.

Fetching a result can be costly: because of the constraints, a lot of possibilities may have to be tried and rejected before the next result is found.

That is why you often see results coming in at varying speeds when counting them.
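
The behaviour of generators can be tried out without Text-Fabric. The sketch below uses a stand-in generator, fake_fetch (a hypothetical name, not part of the Text-Fabric API), to show why len() fails and what can be done instead: take a slice with itertools.islice, or count by consuming all results.

```python
from itertools import islice


def fake_fetch():
    """Hypothetical stand-in for a result generator such as S.fetch()."""
    for n in range(1, 8):
        # each result is a tuple of node numbers (made up here)
        yield (n, n + 100)


try:
    len(fake_fetch())
    len_works = True
except TypeError:
    # generators have no length: they do not know how many items will come
    len_works = False

first_three = list(islice(fake_fetch(), 3))  # take only the first 3 results
total = sum(1 for _ in fake_fetch())  # counting means consuming everything

print(len_works, first_three, total)
```

Note that counting consumes the generator, so a fresh one is needed to look at the results afterwards.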

We can also use A.table() to make a list of results. This function is part of the BHSA API, not of the generic Text-Fabric machinery, as opposed to S.glean().

So, you can use S.glean() for every Text-Fabric corpus, but the output is still not very nice. A.table() gives much nicer output.

In [14]:
A.table(S.fetch(limit=5))
n p book chapter verse phrase word
1Ezra 8:17EzraEzra 8ืžึฐืฉึธืืจึฐืชึดึ–ื™ื ืœึฐื‘ึตึฅื™ืช ืึฑืœึนื”ึตึฝื™ื ื•ึผืƒ ืึฑืœึนื”ึตึฝื™ื ื•ึผืƒ
2Ezra 8:28EzraEzra 8ื ึฐื“ึธื‘ึธึ”ื” ืœึทื™ื”ื•ึธึ–ื” ืึฑืœึนื”ึตึฅื™ ืึฒื‘ึนืชึตื™ื›ึถึฝืืƒ ืึฑืœึนื”ึตึฅื™
3Nehemiah 5:15NehemiahNehemiah 5ืžึดืคึฐึผื ึตึ–ื™ ื™ึดืจึฐืึทึฅืช ืึฑืœึนื”ึดึฝื™ืืƒ ืึฑืœึนื”ึดึฝื™ืืƒ
4Nehemiah 6:12NehemiahNehemiah 6ืึฑืœึนื”ึดึ–ื™ื ืึฑืœึนื”ึดึ–ื™ื
5Nehemiah 12:46NehemiahNehemiah 12ืœึตึฝืืœึนื”ึดึฝื™ืืƒ ืืœึนื”ึดึฝื™ืืƒ

Queries with abundant results

Above we mentioned that there are queries with astronomically many results. Here we present one:

In [15]:
query = """
word
# word
"""

We are asking for any pair of different words. That gives roughly 425,000 * 425,000 results, which is about 180 billion. This is a lot to produce: it will take time on even the best of computers, and once you have the results, what would you do with them? Let's see what happens if we count these results.
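
As a quick check of that estimate, here is the arithmetic in Python. The figure of 425,000 is the rounded word count used in the text, not the exact number of word nodes.

```python
n_words = 425_000  # rounded number of word nodes, as used in the text

# every ordered pair (w1, w2) with w1 different from w2
ordered_pairs = n_words * (n_words - 1)

print(f"{ordered_pairs:,}")
```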

In [16]:
S.study(query)
  0.00s Checking search template ...
  0.00s Setting up search space for 2 objects ...
  0.05s Constraining search space with 1 relations ...
  0.05s 	0 edges thinned
  0.05s Setting up retrieval plan with strategy small_choice_multi ...
  0.06s Ready to deliver results from 853180 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
In [17]:
S.count(progress=500000)
  0.00s Counting results per 500000 ...
   |     0.16s 500000
   |     0.30s 1000000
   |     0.45s 1500000
   |     0.59s 2000000
   |     0.74s 2500000
   |     0.88s 3000000
   |     1.03s 3500000
   |     1.17s 4000000
   |     1.31s 4500000
   |     1.45s 5000000
   |     1.59s 5500000
  1.68s cut off at 5787324 results. There are more ...

Text-Fabric has cut off the process at a certain limit. This limit is a fixed multiple of the maximum node in your corpus:

In [18]:
5787324 / F.otype.maxNode
Out[18]:
4.0
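
The idea of a cut-off can be sketched in plain Python. This is not how Text-Fabric implements it; count_up_to is a hypothetical helper, and max_node is the value implied by the division above (5787324 / 4).

```python
def count_up_to(results, limit):
    """Count items from an iterable, stopping once `limit` is reached.

    Returns (count, cut_off), where cut_off tells whether more results existed.
    """
    count = 0
    for _ in results:
        if count == limit:
            # there is at least one more result than we are willing to count
            return count, True
        count += 1
    return count, False


max_node = 1_446_831  # assumption: derived from 5787324 / 4.0 in the output above
default_limit = 4 * max_node

print(default_limit)
print(count_up_to(range(10), 4))  # more results than the limit
print(count_up_to(range(3), 4))  # fewer results than the limit
```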

If you really need more results than this limit, you can specify a higher limit:

In [19]:
S.count(progress=500000, limit=8 * F.otype.maxNode)
  0.00s Counting results per 500000 up to 11574648 ...
   |     0.16s 500000
   |     0.30s 1000000
   |     0.45s 1500000
   |     0.59s 2000000
   |     0.74s 2500000
   |     0.89s 3000000
   |     1.03s 3500000
   |     1.17s 4000000
   |     1.32s 4500000
   |     1.46s 5000000
   |     1.61s 5500000
   |     1.75s 6000000
   |     1.90s 6500000
   |     2.04s 7000000
   |     2.18s 7500000
   |     2.33s 8000000
   |     2.47s 8500000
   |     2.62s 9000000
   |     2.76s 9500000
   |     2.91s 10000000
   |     3.05s 10500000
   |     3.20s 11000000
   |     3.34s 11500000
  3.36s Done: 11574649 results

Now you do not get a cut-off message, because you got what you asked for.

Or, in the advanced interface, let's fetch the standard maximum of results:

In [20]:
results = A.search(query)
  3.03s cut off at 5787324 results. There are more ...
  4.85s 5787324 results

Or, with a modified limit:

In [22]:
results = A.search(query, limit=5 * F.otype.maxNode)
  6.32s 7234155 results

Again, you do not get a cut-off message, because you got what you asked for.

Slow queries

The search template above has some pretty tight constraints on one of its objects, so the amount of data to deal with is pretty limited.

If the constraints are weak, search may become slow.

For example, here is a query that looks for pairs of phrases in the same clause in such a way that one is engulfed by the other.

In [21]:
query = """
% test
% verse book=Genesis chapter=2 verse=25
verse
  clause

    p1:phrase
      w1:word
      w3:word
      w1 < w3

    p2:phrase
      w2:word
      w1 < w2
      w3 > w2

    p1 < p2
"""

A couple of remarks you may have encountered before.

  • some objects have got a name
  • there are additional relations specified between named objects
  • < means: comes before, and >: comes after in the canonical order for nodes, which for words means: comes textually before/after, but for other nodes the meaning is explained here
  • later on we describe those relations in more detail

Note on order: Look at the words w1 and w3 below phrase p1. Although in the template w1 comes before w3, this is not translated into a search constraint of the same nature.

Order between objects in a template is never significant, only embedding is.

Because order is not significant, you have to specify order yourself, using relations.

It turns out that this is better than the other way around. In MQL order is significant, and it is very difficult to search for w1 and w2 in any order. Especially if you are looking for more than 2 complex objects with lots of feature conditions, your search template would explode if you had to spell out all possible permutations. See the example of Reinoud Oosting below.
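
To get a feeling for that explosion, the sketch below lists the orderings that would have to be spelled out for three objects and shows how the number grows. The object names are hypothetical.

```python
from itertools import permutations
from math import factorial

objects = ["w1", "w2", "w3"]  # hypothetical names of template objects

# every ordering would need its own copy of the template
orderings = list(permutations(objects))
for o in orderings:
    print(" < ".join(o))

# the number of orderings grows factorially with the number of objects
growth = [factorial(k) for k in range(2, 7)]
print(growth)
```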

Note on gaps: Look at the phrases p1 and p2. We do not specify an order here, only that they are different. To prevent duplicate results with p1 and p2 interchanged, we even stipulate that p1 < p2. There are many spatial relationships possible between different objects. In many cases, neither does the one come before the other, nor vice versa. They can overlap, one can occur in a gap of the other, they can be completely disjoint and interleaved, etc.
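
These spatial relationships can be made concrete by treating each object as a set of slot (word) numbers. The function below is a minimal sketch with made-up labels; it does not reproduce the relation operators of Text-Fabric search, nor its canonical node order.

```python
def relation(a, b):
    """Describe how two non-empty sets of slot numbers relate to each other."""
    a, b = set(a), set(b)
    if a & b:
        return "overlap"
    if max(a) < min(b):
        return "a before b"
    if max(b) < min(a):
        return "b before a"
    # from here on: disjoint, but neither lies completely before the other
    if min(a) < min(b) and max(b) < max(a):
        return "b in gap(s) of a"
    if min(b) < min(a) and max(a) < max(b):
        return "a in gap(s) of b"
    return "interleaved"


print(relation({1, 2}, {3, 4}))
print(relation({1, 2, 3}, {3, 4}))
print(relation({1, 2, 5}, {3, 4}))
print(relation({1, 3}, {2, 4}))
```

Only the first two of the disjoint cases correspond to a plain before/after order; the others are the situations that make gaps tricky.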

In [22]:
# ignore this
# S.tweakPerformance(yarnRatio=2)
In [23]:
S.study(query)
  0.00s Checking search template ...
  0.00s Setting up search space for 7 objects ...
  0.12s Constraining search space with 10 relations ...
  0.77s 	6 edges thinned
  0.77s Setting up retrieval plan with strategy small_choice_multi ...
  0.81s Ready to deliver results from 1894471 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results

Text-Fabric knows that narrowing down the search space in this case would take ages, without resulting in a significantly shrunken space. So it skips doing so for most constraints.

Let us see the plan, with details.

In [24]:
S.showPlan(details=True)
Search with 7 objects and 9 relations
Results are instantiations of the following objects:
node  0-verse                                          23207   choices
node  1-clause                                         88081   choices
node  2-phrase                                        252998   choices
node  3-word                                          425729   choices
node  4-word                                          425729   choices
node  5-phrase                                        252998   choices
node  6-word                                          425729   choices
Performance parameters:
	yarnRatio            =    1.25
	tryLimitFrom         =      40
	tryLimitTo           =      40
Instantiations are computed along the following relations:
node                                  0-verse          23207   choices
edge        0-verse            [[     1-clause             3.7 choices (thinned)
edge        1-clause           [[     2-phrase             2.8 choices (thinned)
edge        2-phrase           [[     3-word               1.6 choices (thinned)
edge        2-phrase           [[     4-word               1.8 choices (thinned)
edge        3-word             <      4-word               0   choices
edge        1-clause           [[     5-phrase             3.0 choices (thinned)
edge        5-phrase           >      2-phrase             0   choices
edge        5-phrase           [[     6-word               1.3 choices (thinned)
edge      3,4-word            <,>     6-word               0   choices
  7.45s The results are connected to the original search template as follows:
 0     
 1     % test
 2     % verse book=Genesis chapter=2 verse=25
 3 R0  verse
 4 R1    clause
 5     
 6 R2      p1:phrase
 7 R3        w1:word
 8 R4        w3:word
 9           w1 < w3
10     
11 R5      p2:phrase
12 R6        w2:word
13           w1 < w2
14           w3 > w2
15     
16         p1 < p2
17     

As you see, we have a hefty search space here. Let us play with the count() function.

In [25]:
S.count(progress=10, limit=100)
  0.00s Counting results per 10 up to 100 ...
   |     0.04s 10
   |     0.04s 20
   |     0.04s 30
   |     0.05s 40
   |     0.05s 50
   |     0.05s 60
   |     0.05s 70
   |     0.05s 80
   |     0.05s 90
   |     0.05s 100
  0.05s Done: 101 results

We can be bolder than this!

In [26]:
S.count(progress=100, limit=1000)
  0.00s Counting results per 100 up to 1000 ...
   |     0.05s 100
   |     0.06s 200
   |     0.06s 300
   |     0.09s 400
   |     0.11s 500
   |     0.11s 600
   |     0.12s 700
   |     0.16s 800
   |     0.17s 900
   |     0.21s 1000
  0.22s Done: 1001 results

OK, not too bad, but note that it takes a sizeable fraction of a second to get just 1000 results.

Now let us go for all of them by the thousand.

In [28]:
S.count(progress=1000, limit=None)
  0.00s Counting results per 1000 ...
   |     0.21s 1000
   |     0.36s 2000
   |     0.50s 3000
   |     0.64s 4000
   |     0.77s 5000
   |     1.02s 6000
   |     1.50s 7000
  1.88s Done: 7593 results

See? This is substantial work.

In [29]:
A.table(S.fetch(limit=5))
n p verse clause phrase word word phrase word
1Genesis 2:25ื•ึทื™ึดึผึฝื”ึฐื™ึคื•ึผ ืฉึฐืื ึตื™ื”ึถืึ™ ืขึฒืจื•ึผืžึดึผึ”ื™ื ื”ึธึฝืึธื“ึธึ–ื ื•ึฐืึดืฉึฐืืชึนึผึ‘ื• ืฉึฐืื ึตื™ื”ึถืึ™ ื”ึธึฝืึธื“ึธึ–ื ื•ึฐืึดืฉึฐืืชึนึผึ‘ื• ืฉึฐืื ึตื™ื”ึถืึ™ ื”ึธึฝืขึฒืจื•ึผืžึดึผึ”ื™ื ืขึฒืจื•ึผืžึดึผึ”ื™ื
2Genesis 2:25ื•ึทื™ึดึผึฝื”ึฐื™ึคื•ึผ ืฉึฐืื ึตื™ื”ึถืึ™ ืขึฒืจื•ึผืžึดึผึ”ื™ื ื”ึธึฝืึธื“ึธึ–ื ื•ึฐืึดืฉึฐืืชึนึผึ‘ื• ืฉึฐืื ึตื™ื”ึถืึ™ ื”ึธึฝืึธื“ึธึ–ื ื•ึฐืึดืฉึฐืืชึนึผึ‘ื• ืฉึฐืื ึตื™ื”ึถืึ™ ืึธื“ึธึ–ื ืขึฒืจื•ึผืžึดึผึ”ื™ื ืขึฒืจื•ึผืžึดึผึ”ื™ื
3Genesis 2:25ื•ึทื™ึดึผึฝื”ึฐื™ึคื•ึผ ืฉึฐืื ึตื™ื”ึถืึ™ ืขึฒืจื•ึผืžึดึผึ”ื™ื ื”ึธึฝืึธื“ึธึ–ื ื•ึฐืึดืฉึฐืืชึนึผึ‘ื• ืฉึฐืื ึตื™ื”ึถืึ™ ื”ึธึฝืึธื“ึธึ–ื ื•ึฐืึดืฉึฐืืชึนึผึ‘ื• ืฉึฐืื ึตื™ื”ึถืึ™ ื•ึฐืขึฒืจื•ึผืžึดึผึ”ื™ื ืขึฒืจื•ึผืžึดึผึ”ื™ื
4Genesis 2:25ื•ึทื™ึดึผึฝื”ึฐื™ึคื•ึผ ืฉึฐืื ึตื™ื”ึถืึ™ ืขึฒืจื•ึผืžึดึผึ”ื™ื ื”ึธึฝืึธื“ึธึ–ื ื•ึฐืึดืฉึฐืืชึนึผึ‘ื• ืฉึฐืื ึตื™ื”ึถืึ™ ื”ึธึฝืึธื“ึธึ–ื ื•ึฐืึดืฉึฐืืชึนึผึ‘ื• ืฉึฐืื ึตื™ื”ึถืึ™ ืึดืฉึฐืืชึนึผึ‘ื• ืขึฒืจื•ึผืžึดึผึ”ื™ื ืขึฒืจื•ึผืžึดึผึ”ื™ื
5Genesis 4:4ื•ึฐื”ึถึจื‘ึถืœ ื”ึตื‘ึดึฅื™ื ื’ึทืึพื”ึ›ื•ึผื ืžึดื‘ึฐึผื›ึนืจึนึฅื•ืช ืฆึนืื ึนึ–ื• ื•ึผืžึตึฝื—ึถืœึฐื‘ึตื”ึถึ‘ืŸ ื”ึถึจื‘ึถืœ ื’ึทืึพื”ึ›ื•ึผื ื”ึถึจื‘ึถืœ ื’ึทืึพื”ึตื‘ึดึฅื™ื ื”ึตื‘ึดึฅื™ื

Hand-coding

As a check, here is some code that looks for basically the same phenomenon: a phrase within the gap of another phrase. It does not use search, and it gets a bit more focused results, in half the time compared to the search with the template.

Hint: If you are comfortable with programming, and what you look for is fairly generic, you may be better off without search, provided you can translate your insight into the data into an effective procedure within Text-Fabric. But wait till we are completely done with this example!

In [30]:
TF.indent(reset=True)
TF.info("Getting gapped phrases")
results = []
for v in F.otype.s("verse"):
    for c in L.d(v, otype="clause"):
        ps = L.d(c, otype="phrase")
        first = {}
        last = {}
        slots = {}
        # make index of phrase boundaries
        for p in ps:
            words = L.d(p, otype="word")
            first[p] = words[0]
            last[p] = words[-1]
            slots[p] = set(words)
        for p1 in ps:
            for p2 in ps:
                if p2 < p1:
                    continue
                if len(slots[p1] & slots[p2]) != 0:
                    continue
                if first[p1] < first[p2] and last[p2] < last[p1]:
                    results.append(
                        (v, c, p1, p2, first[p1], first[p2], last[p2], last[p1])
                    )
TF.info("{} results".format(len(results)))
  0.00s Getting gapped phrases
  0.84s 368 results
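
The core of this procedure can be tried on toy data, without loading the corpus. The sketch below keeps the same two tests (no shared slots, and the inner phrase lying between the first and last slot of the outer one) but omits the p2 < p1 shortcut and works on a made-up clause; gapped_pairs and toy_clause are hypothetical names.

```python
def gapped_pairs(phrases):
    """Find pairs (outer, inner) where inner lies between the first and last
    slot of outer without sharing any slot with it.

    `phrases` maps a phrase id to the sorted list of its slot numbers.
    """
    first = {p: slots[0] for p, slots in phrases.items()}
    last = {p: slots[-1] for p, slots in phrases.items()}
    slot_sets = {p: set(slots) for p, slots in phrases.items()}
    pairs = []
    for p1 in phrases:
        for p2 in phrases:
            if slot_sets[p1] & slot_sets[p2]:
                continue  # shared slots; this also skips p1 == p2
            if first[p1] < first[p2] and last[p2] < last[p1]:
                pairs.append((p1, p2))
    return pairs


# made-up clause: phrase 10 has a gap at slot 3, which phrase 11 fills
toy_clause = {10: [1, 2, 4, 5], 11: [3], 12: [6, 7]}
print(gapped_pairs(toy_clause))
```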

Pretty printing

We can use the pretty printing of A.table() and A.show() here as well, even though we have not used search!

Note that you can show the node numbers. In this case it helps to see where the gaps are.

In [31]:
A.table(results, withNodes=True, end=5)
A.show(results, start=1, end=1)
n p verse clause phrase phrase word word word word
1Genesis 2:251414444427773ื•ึทื™ึดึผึฝื”ึฐื™ึคื•ึผ 6522171159ืฉึฐืื ึตื™ื”ึถืึ™ 6522181160ืขึฒืจื•ึผืžึดึผึ”ื™ื 652217ื”ึธึฝืึธื“ึธึ–ื ื•ึฐ1164ืึดืฉึฐืืชึนึผึ‘ื• 6522171159ืฉึฐืื ึตื™ื”ึถืึ™ 652217ื”ึธึฝืึธื“ึธึ–ื ื•ึฐ1164ืึดืฉึฐืืชึนึผึ‘ื• 6522181160ืขึฒืจื•ึผืžึดึผึ”ื™ื 1159ืฉึฐืื ึตื™ื”ึถืึ™ 1160ืขึฒืจื•ึผืžึดึผึ”ื™ื 1160ืขึฒืจื•ึผืžึดึผึ”ื™ื 1164ืึดืฉึฐืืชึนึผึ‘ื•
2Genesis 4:41414472427895ื•ึฐ6525741720ื”ึถึจื‘ึถืœ 6525751721ื”ึตื‘ึดึฅื™ื 652574ื’ึทืึพ1723ื”ึ›ื•ึผื ืžึดื‘ึฐึผื›ึนืจึนึฅื•ืช ืฆึนืื ึนึ–ื• ื•ึผืžึตึฝื—ึถืœึฐื‘ึตื”ึถึ‘ืŸ 6525741720ื”ึถึจื‘ึถืœ 652574ื’ึทืึพ1723ื”ึ›ื•ึผื 6525751721ื”ึตื‘ึดึฅื™ื 1720ื”ึถึจื‘ึถืœ 1721ื”ึตื‘ึดึฅื™ื 1721ื”ึตื‘ึดึฅื™ื 1723ื”ึ›ื•ึผื
3Genesis 10:2114146444283926541724819ื’ึทึผืึพื”ึ‘ื•ึผื 6541734821ืึฒื‘ึดื™ึ™ ื›ึธึผืœึพื‘ึฐึผื ึตื™ึพ4824ืขึตึ”ื‘ึถืจ 654172ืึฒื—ึดึ–ื™ ื™ึถึฅืคึถืช ื”ึท4828ื’ึธึผื“ึนึฝื•ืœืƒ 6541724819ื’ึทึผืึพื”ึ‘ื•ึผื 654172ืึฒื—ึดึ–ื™ ื™ึถึฅืคึถืช ื”ึท4828ื’ึธึผื“ึนึฝื•ืœืƒ 6541734821ืึฒื‘ึดื™ึ™ ื›ึธึผืœึพื‘ึฐึผื ึตื™ึพ4824ืขึตึ”ื‘ึถืจ 4819ื’ึทึผืึพ4821ืึฒื‘ึดื™ึ™ 4824ืขึตึ”ื‘ึถืจ 4828ื’ึธึผื“ึนึฝื•ืœืƒ
4Genesis 12:171414704428575ื•ึทื™ึฐื ึทื’ึทึผึจืข ื™ึฐื”ื•ึธึงื”ื€ 6547485803ืึถืชึพืคึทึผืจึฐืขึนึ›ื” 6547495805ื ึฐื’ึธืขึดึฅื™ื 5806ื’ึฐึผื“ึนืœึดึ–ื™ื 654748ื•ึฐืึถืชึพ5809ื‘ึตึผื™ืชึนึ‘ื• ืขึทืœึพื“ึฐึผื‘ึทึฅืจ ืฉึธื‚ืจึทึ–ื™ ืึตึฅืฉึถืืช ืึทื‘ึฐืจึธึฝืืƒ 6547485803ืึถืชึพืคึทึผืจึฐืขึนึ›ื” 654748ื•ึฐืึถืชึพ5809ื‘ึตึผื™ืชึนึ‘ื• 6547495805ื ึฐื’ึธืขึดึฅื™ื 5806ื’ึฐึผื“ึนืœึดึ–ื™ื 5803ืึถืชึพ5805ื ึฐื’ึธืขึดึฅื™ื 5806ื’ึฐึผื“ึนืœึดึ–ื™ื 5809ื‘ึตึผื™ืชึนึ‘ื•
5Genesis 13:11414708428591ื•ึทื™ึทึผืขึทืœึฉ 6547955868ืึทื‘ึฐืจึธึจื 6547965869ืžึด5870ืžึดึผืฆึฐืจึทึœื™ึดื 654795ื”ึ ื•ึผื ื•ึฐืึดืฉึฐืืชึนึผึงื• ื•ึฐ5875ื›ึธืœึพ428591ื”ึทื ึถึผึฝื’ึฐื‘ึธึผื”ืƒ 6547955868ืึทื‘ึฐืจึธึจื 654795ื”ึ ื•ึผื ื•ึฐืึดืฉึฐืืชึนึผึงื• ื•ึฐ5875ื›ึธืœึพ6547965869ืžึด5870ืžึดึผืฆึฐืจึทึœื™ึดื 5868ืึทื‘ึฐืจึธึจื 5869ืžึด5870ืžึดึผืฆึฐืจึทึœื™ึดื 5875ื›ึธืœึพ

result 1

NB: Gaps are a tricky phenomenon. In the gaps tutorial we will deal with them cruelly.

Performance tuning

Here is an example by Yanniek van der Schans (2018-09-21).

In [32]:
query = """
c:clause
  PreGap:phrase_atom
  LastPhrase:phrase_atom
  :=

Gap:clause_atom
  :: word

PreGap < Gap
Gap < LastPhrase
c || Gap
"""

Here are the current settings of the performance parameters:

In [33]:
S.tweakPerformance()
Performance parameters, current values:
	tryLimitFrom         =      40
	tryLimitTo           =      40
	yarnRatio            =    1.25
In [34]:
S.study(query)
S.showPlan(details=True)
  0.00s Checking search template ...
  0.00s Setting up search space for 5 objects ...
  0.07s Constraining search space with 8 relations ...
  0.25s 	2 edges thinned
  0.25s Setting up retrieval plan with strategy small_choice_multi ...
  0.26s Ready to deliver results from 454184 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
Search with 5 objects and 8 relations
Results are instantiations of the following objects:
node  0-clause                                         88131   choices
node  1-phrase_atom                                   267532   choices
node  2-phrase_atom                                    88131   choices
node  3-clause_atom                                     5195   choices
node  4-word                                            5195   choices
Performance parameters:
	yarnRatio            =    1.25
	tryLimitFrom         =      40
	tryLimitTo           =      40
Instantiations are computed along the following relations:
node                                  3-clause_atom     5195   choices
edge        3-clause_atom      [[     4-word               1.0 choices
edge        3-clause_atom      ::     4-word               0   choices
edge        3-clause_atom      <      2-phrase_atom    44065.5 choices
edge        2-phrase_atom      :=     0-clause             1.0 choices (thinned)
edge        2-phrase_atom      ]]     0-clause             0   choices
edge        0-clause           ||     3-clause_atom        0   choices
edge        0-clause           [[     1-phrase_atom        2.9 choices
edge        1-phrase_atom      <      3-clause_atom        0   choices
  0.27s The results are connected to the original search template as follows:
 0     
 1 R0  c:clause
 2 R1    PreGap:phrase_atom
 3 R2    LastPhrase:phrase_atom
 4       :=
 5     
 6 R3  Gap:clause_atom
 7 R4    :: word
 8     
 9     PreGap < Gap
10     Gap < LastPhrase
11     c || Gap
12     
In [35]:
S.count(progress=1, limit=3)
  0.00s Counting results per 1 up to 3 ...
   |     0.00s 1
   |     0.00s 2
   |     1.78s 3
  3.65s Done: 4 results
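
The output above shows the first two results arriving instantly, while the third takes well over a second. To measure such waiting times for any result generator, a helper along the following lines could be used. It is a sketch: latencies and slow_third are hypothetical names, and the slow generator only simulates a search that has to reject many candidates before the next hit.

```python
import time


def latencies(results, limit):
    """Return the waiting time in seconds before each of the first `limit` results."""
    waits = []
    iterator = iter(results)
    for _ in range(limit):
        start = time.perf_counter()
        try:
            next(iterator)
        except StopIteration:
            break  # fewer results than requested
        waits.append(time.perf_counter() - start)
    return waits


def slow_third():
    """Hypothetical generator whose third result takes longer to find."""
    yield 1
    yield 2
    time.sleep(0.05)  # simulate a long search before the next hit
    yield 3


waits = latencies(slow_third(), 3)
print([f"{w:.3f}s" for w in waits])
```

With a studied query, one would pass S.fetch() instead of slow_third().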

Can we do better?

The performance parameter yarnRatio can be used to increase the amount of preprocessing, and we can increase the number of random samples that we make with tryLimitFrom and tryLimitTo.

We start with increasing the amount of up-front edge-spinning.

In [36]:
S.tweakPerformance(yarnRatio=0.2, tryLimitFrom=10000, tryLimitTo=10000)
Performance parameters, current values:
	tryLimitFrom         =   10000
	tryLimitTo           =   10000
	yarnRatio            =     0.2
In [37]:
S.study(query)
S.showPlan(details=True)
  0.00s Checking search template ...
  0.00s Setting up search space for 5 objects ...
  0.07s Constraining search space with 8 relations ...
  0.38s 	2 edges thinned
  0.38s Setting up retrieval plan with strategy small_choice_multi ...
  0.48s Ready to deliver results from 454184 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
Search with 5 objects and 8 relations
Results are instantiations of the following objects:
node  0-clause                                         88131   choices
node  1-phrase_atom                                   267532   choices
node  2-phrase_atom                                    88131   choices
node  3-clause_atom                                     5195   choices
node  4-word                                            5195   choices
Performance parameters:
	yarnRatio            =     0.2
	tryLimitFrom         =   10000
	tryLimitTo           =   10000
Instantiations are computed along the following relations:
node                                  3-clause_atom     5195   choices
edge        3-clause_atom      [[     4-word               1.0 choices
edge        3-clause_atom      ::     4-word               0   choices
edge        3-clause_atom      <      2-phrase_atom    44065.5 choices
edge        2-phrase_atom      :=     0-clause             1.0 choices (thinned)
edge        2-phrase_atom      ]]     0-clause             0   choices
edge        0-clause           ||     3-clause_atom        0   choices
edge        0-clause           [[     1-phrase_atom        3.0 choices
edge        1-phrase_atom      <      3-clause_atom        0   choices
  0.49s The results are connected to the original search template as follows:
 0     
 1 R0  c:clause
 2 R1    PreGap:phrase_atom
 3 R2    LastPhrase:phrase_atom
 4       :=
 5     
 6 R3  Gap:clause_atom
 7 R4    :: word
 8     
 9     PreGap < Gap
10     Gap < LastPhrase
11     c || Gap
12     

It seems to be the same plan.

In [38]:
S.count(progress=1, limit=3)
  0.00s Counting results per 1 up to 3 ...
   |     0.01s 1
   |     0.01s 2
   |     1.78s 3
  3.67s Done: 4 results

No improvement.

What if we decrease the amount of edge spinning?

In [39]:
S.tweakPerformance(yarnRatio=5, tryLimitFrom=10000, tryLimitTo=10000)
Performance parameters, current values:
	tryLimitFrom         =   10000
	tryLimitTo           =   10000
	yarnRatio            =       5
In [40]:
S.study(query)
S.showPlan(details=True)
  0.00s Checking search template ...
  0.00s Setting up search space for 5 objects ...
  0.07s Constraining search space with 8 relations ...
  0.27s 	2 edges thinned
  0.27s Setting up retrieval plan with strategy small_choice_multi ...
  0.37s Ready to deliver results from 454184 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
Search with 5 objects and 8 relations
Results are instantiations of the following objects:
node  0-clause                                         88131   choices
node  1-phrase_atom                                   267532   choices
node  2-phrase_atom                                    88131   choices
node  3-clause_atom                                     5195   choices
node  4-word                                            5195   choices
Performance parameters:
	yarnRatio            =       5
	tryLimitFrom         =   10000
	tryLimitTo           =   10000
Instantiations are computed along the following relations:
node                                  3-clause_atom     5195   choices
edge        3-clause_atom      [[     4-word               1.0 choices
edge        3-clause_atom      ::     4-word               0   choices
edge        3-clause_atom      <      2-phrase_atom    44065.5 choices
edge        2-phrase_atom      :=     0-clause             1.0 choices (thinned)
edge        2-phrase_atom      ]]     0-clause             0   choices
edge        0-clause           ||     3-clause_atom        0   choices
edge        0-clause           [[     1-phrase_atom        3.1 choices
edge        1-phrase_atom      <      3-clause_atom        0   choices
  0.39s The results are connected to the original search template as follows:
 0     
 1 R0  c:clause
 2 R1    PreGap:phrase_atom
 3 R2    LastPhrase:phrase_atom
 4       :=
 5     
 6 R3  Gap:clause_atom
 7 R4    :: word
 8     
 9     PreGap < Gap
10     Gap < LastPhrase
11     c || Gap
12     
In [41]:
S.count(progress=1, limit=3)
  0.00s Counting results per 1 up to 3 ...
   |     0.00s 1
   |     0.00s 2
   |     1.78s 3
  3.67s Done: 4 results

Again, no improvement.

We'll look for queries where the parameters matter more in the future.

Here is how to reset the performance parameters:

In [42]:
S.tweakPerformance(yarnRatio=None, tryLimitFrom=None, tryLimitTo=None)
Performance parameters, current values:
	tryLimitFrom         =      40
	tryLimitTo           =      40
	yarnRatio            =    1.25

Next

You have seen cases where the implementation is to blame.

Now I want to point to gaps in your understanding: gaps


All steps

  • start your first step in mastering the bible computationally
  • display become an expert in creating pretty displays of your text structures
  • search turbo charge your hand-coding with search templates

advanced sets relations quantifiers fromMQL rough


  • exportExcel make tailor-made spreadsheets out of your results
  • share draw in other people's data and let them use yours
  • export export your dataset as an Emdros database
  • annotate annotate plain text by means of other tools and import the annotations as TF features
  • map map somebody else's annotations to a new version of the corpus
  • volumes work with selected books only
  • trees work with the BHSA data as syntax trees

CC-BY Dirk Roorda