The 'center' of the Torah (BHSA)

book Leviticus

book=Leviticus

Leviticus 8:8

verse

book=Leviticus

sentence 16

clause Way0 NA

phrase CP Conj

phrase VP Pred

phrase PP Cmpl

phrase PP Objc

sentence 17

clause Way0 NA

phrase CP Conj

phrase VP Pred

phrase PP Cmpl

phrase PP Objc

This verse in the King James Version:

And he put the mitre upon his head; also upon the mitre, even upon his forefront, did he put the golden plate, the holy crown; as the Lord commanded Moses.

3.4 - Center sentence

The following method is based on the center sentence. Sentences are delimited as in the ETCBC database, whose sentence boundaries differ in places from those of other databases.

In [13]:

# number of sentences in Torah
SentenceQuery = '''
book book=Genesis|Exodus|Leviticus|Numeri|Deuteronomium
  sentence 
'''

SentenceResults = BHS.search(SentenceQuery)

  0.10s 15088 results

Determining the interval of sentence node-numbers.

In [14]:

F.otype.sInterval('sentence')

Out[14]:

(1172308, 1236024)

In [15]:

# start + delta: 1172308 + int(15088/2) = 1172308 + 7544 = 1179852
T.sectionFromNode(1179852)

Out[15]:

('Exodus', 36, 11)

In [16]:

T.text(1179852)

Out[16]:

'וַיַּ֜עַשׂ לֻֽלְאֹ֣ת תְּכֵ֗לֶת עַ֣ל שְׂפַ֤ת הַיְרִיעָה֙ הָֽאֶחָ֔ת מִקָּצָ֖ה בַּמַּחְבָּ֑רֶת '

In [17]:

# 15088  results / 2 = 7544 
BHS.show(SentenceResults,start=7544,end=7544,multiFeatures=False)

result 7544

Exodus

book Exodus

book=Exodus

Exodus 36:10

verse

book=Exodus

sentence 19

clause Way0 NA

phrase CP Conj

phrase VP Pred

phrase PP Objc

clause Ellp Adju

phrase NP Objc

אַחַ֖ת

phrase PP Cmpl

אֶל־

אֶחָ֑ת

sentence 20

clause WxQ0 NA

phrase CP Conj

phrase NP Objc

phrase VP Pred

clause Ellp Adju

phrase NP Objc

אַחַ֖ת

phrase PP Cmpl

אֶל־

אֶחָֽת׃

This sentence in the King James Version:

and the other five curtains he coupled one unto another.

Note that in the KJV this is a subsentence.
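The two lookups in this section do not address the same position: adding half the result count to the first sentence node gives the sentence at 1-based rank 7545, while `BHS.show(..., start=7544, ...)` displays result 7544. That may explain why one cell reports Exodus 36:11 and the other Exodus 36:10. The sketch below only reproduces the arithmetic. It assumes that the sentence nodes of the Torah are numbered consecutively from the first sentence node, which is what the offset calculation above relies on. The function names are hypothetical helpers, not part of Text-Fabric.

```python
def middle_by_offset(first_node: int, count: int) -> int:
    """Node reached by adding half the result count to the first node."""
    return first_node + count // 2


def rank_of(node: int, first_node: int) -> int:
    """1-based position of a node, assuming consecutive node numbers."""
    return node - first_node + 1


# Values taken from the cells above.
first_sentence = 1172308
sentence_count = 15088

node = middle_by_offset(first_sentence, sentence_count)
print(node)                           # 1179852
print(rank_of(node, first_sentence))  # 7545, one past result 7544
```

To address the same sentence as result 7544, the node offset would be `count // 2 - 1`.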

3.5 - Center clause

The following method is based on the center clause. Clauses are delimited as in the ETCBC database; other implementations may differ slightly.

In [18]:

# number of clauses in Torah
ClauseQuery = '''
book book=Genesis|Exodus|Leviticus|Numeri|Deuteronomium
  clause 
'''

ClauseResults = BHS.search(ClauseQuery)

  0.11s 21181 results

Determining the interval of clause node-numbers.

In [19]:

F.otype.sInterval('clause')

Out[19]:

(427559, 515689)

In [20]:

# start + delta: 427559 + int(21181/2) = 427559 + 10590 = 438149
T.sectionFromNode(438149)

Out[20]:

('Leviticus', 4, 35)

In [21]:

T.text(438149)

Out[21]:

'וְאֶת־כָּל־חֶלְבָּ֣ה יָסִ֗יר '

In [22]:

# 21181 results / 2 = 10590.5 -> midpoint = 10590 (rounded down)
BHS.show(ClauseResults,start=10590,end=10590, multiFeatures=False)

result 10590

book Leviticus

book=Leviticus

Leviticus 4:34

verse

book=Leviticus

sentence 75

clause WQtX NA

phrase CP Conj

phrase VP Pred

phrase NP Subj

phrase PP Cmpl

phrase PP Cmpl

sentence 76

clause WQt0 NA

phrase CP Conj

phrase VP Pred

phrase PP Cmpl

sentence 77

clause WxY0 NA

phrase CP Conj

phrase PP Objc

phrase VP Pred

phrase PP Cmpl

In the King James Version:

and shall pour out all the blood thereof at the bottom of the altar

Note that while in the ETCBC BHSA sentences often contain multiple clauses, this clause constitutes a full sentence.

3.6 - Center phrase

The following method is based on the center phrase. Phrases are delimited as in the ETCBC database, which follows a more or less general understanding of what constitutes a phrase.

In [23]:

# number of phrases in Torah
PhraseQuery = '''
book book=Genesis|Exodus|Leviticus|Numeri|Deuteronomium
  phrase 
'''

PhraseResults = BHS.search(PhraseQuery)

  0.30s 64195 results

Determining the interval of phrase node-numbers.

In [24]:

F.otype.sInterval('phrase')

Out[24]:

(651573, 904775)

In [25]:

# start + delta: 651573 + int(64195/2) = 651573 + 32097 = 683670
T.sectionFromNode(683670)

Out[25]:

('Leviticus', 4, 32)

In [26]:

T.text(683670)

Out[26]:

'נְקֵבָ֥ה תְמִימָ֖ה '

In [27]:

# 64195 results / 2 = 32097.5 -> midpoint = 32098 (rounded up)
BHS.show(PhraseResults,start=32098,end=32098,multiFeatures=False)

result 32098

book Leviticus

book=Leviticus

Leviticus 4:32

verse

book=Leviticus

sentence 71

clause WxY0 NA

phrase CP Conj

phrase CP Conj

phrase NP Objc

phrase VP Pred

phrase NP Objc

sentence 72

clause xYq0 NA

phrase NP PrAd

נְקֵבָ֥ה

תְמִימָ֖ה

phrase VP PreO

יְבִיאֶֽנָּה׃

In the King James Version:

a female without blemish
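The sentence, clause and phrase sections each derive a midpoint from a result count, and the rounding is not the same in every cell. As a rough sketch of one consistent convention: an odd count has a single middle rank, an even count has two. The helper `middle_ranks` is hypothetical and uses 1-based ranks, as `BHS.show(start=...)` appears to do.

```python
from typing import Tuple


def middle_ranks(count: int) -> Tuple[int, ...]:
    """Return the 1-based middle rank(s) of a list with `count` items.

    Odd counts have exactly one middle item; even counts have two.
    """
    if count < 1:
        raise ValueError("count must be positive")
    if count % 2 == 1:
        return ((count + 1) // 2,)
    return (count // 2, count // 2 + 1)


# Result counts reported by the queries above.
for label, count in [("sentence", 15088), ("clause", 21181), ("phrase", 64195)]:
    print(label, middle_ranks(count))
# sentence (7544, 7545)
# clause (10591,)
# phrase (32098,)
```

Under this convention the clause midpoint would be rank 10591, one higher than the 10590 used above. Which rounding to apply is a choice; the point is only to apply the same one to every unit.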

3.7 - Center word - based upon center word node

This method assumes that the midpoint of the list of word nodes marks the center of the Torah.

In [28]:

# number of words in Torah (WARNING: as per ETCBC definition!) 
WordQuery = '''
book book=Genesis|Exodus|Leviticus|Numeri|Deuteronomium
  word 
'''

WordResults = BHS.search(WordQuery)

  0.48s 112927 results

The following code validates that the word nodes are numbered starting from '1'.

In [29]:

F.otype.sInterval('word')

Out[29]:

(1, 426590)

Find the middle word node:

In [30]:

# start + delta: 1 + int(112927/2) = 1 + 56463 = 56464
T.sectionFromNode(56464)

Out[30]:

('Leviticus', 8, 21)

In [31]:

T.text(56464)

Out[31]:

'בַּ'

In [32]:

# 112927 results / 2 = 56463.5 -> midpoint = 56464 (rounded up)
BHS.show(WordResults,start=56464,end=56464,multiFeatures=False)

result 56464

book Leviticus

book=Leviticus

Leviticus 8:21

verse

book=Leviticus

sentence 48

clause WxQ0 NA

phrase CP Conj

phrase PP Objc

phrase VP Pred

phrase PP Cmpl

sentence 49

clause WayX NA

phrase CP Conj

phrase VP Pred

phrase PrNP Subj

phrase PP Objc

phrase AdvP Cmpl

sentence 50

clause NmCl NA

phrase NP PreC

phrase PPrP Subj

phrase PP Cmpl

sentence 51

clause NmCl NA

phrase NP PreC

phrase PPrP Subj

phrase PP Cmpl

clause xQtX Adju

phrase CP Conj

phrase VP Pred

phrase PrNP Subj

phrase PP Objc

If this were 'translated' into a meaningful 'center' clause, it could be:

'wash in the water'.

3.8 - Center word based on spaces and maqaf

Here the number of words in the Torah is determined by counting items separated by spaces OR a maqaf (a mark indicating a strong connection between words).

First check what can follow an individual word:

In [33]:

# note: this is for the full TeNaCH!
F.trailer.freqList()

Out[33]:

((' ', 236930),
 ('', 121801),
 ('&', 42275),
 ('00 ', 20146),
 ('05 ', 2266),
 ('00_S ', 1892),
 ('00_P ', 1165),
 ('_S ', 76),
 (' 05 ', 17),
 ('_P ', 13),
 ('00_N ', 7),
 ('00_N_P ', 1),
 ('00_N_S ', 1))

In this list, the ' ' value (a space) marks a word that is followed by a space, '&' marks a maqqef (־), which indicates a strong connection between words, and the empty string '' marks a word that is written together with the next one. We treat both the space and the maqqef as word separators. The frequency list suggests two ways to find the word boundaries. The first relies on the fact that every trailer value that marks a word boundary is at least one character long, so the regex (.+) excludes exactly the empty trailers. The second looks explicitly for whitespace or a maqqef, using the regex [\s&]. As expected, both produce the same outcome. The following query counts the words in the Torah using this definition.
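The claim that both regular expressions select the same trailer values can be checked against the frequency list itself. The sketch below copies the values from Out[33] (full Tanakh, not only the Torah) and applies both patterns with Python's `re.search`. That Text-Fabric's `~` operator behaves like `re.search` is an assumption here, and `count_matching` is a hypothetical helper.

```python
import re

# Trailer frequencies copied from Out[33] (whole Tanakh).
TRAILER_FREQ = (
    (' ', 236930), ('', 121801), ('&', 42275), ('00 ', 20146),
    ('05 ', 2266), ('00_S ', 1892), ('00_P ', 1165), ('_S ', 76),
    (' 05 ', 17), ('_P ', 13), ('00_N ', 7), ('00_N_P ', 1),
    ('00_N_S ', 1),
)


def count_matching(pattern: str, freq) -> int:
    """Sum the frequencies of all trailer values the pattern finds a match in."""
    rx = re.compile(pattern)
    return sum(n for value, n in freq if rx.search(value))


non_empty = count_matching(r".+", TRAILER_FREQ)     # any trailer of length >= 1
explicit = count_matching(r"[\s&]", TRAILER_FREQ)   # whitespace or maqqef
total = sum(n for _, n in TRAILER_FREQ)

print(non_empty, explicit, total)  # 304789 304789 426590
```

Both patterns agree because every non-empty trailer in the list contains either a space or '&'. The total equals the number of word nodes in the corpus.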

In [34]:

# define query template
# The 'r' prefix makes the template a raw string, which prevents Python from interpreting the backslash in the regex.

WordQuery2 = r'''
book book=Genesis|Exodus|Leviticus|Numeri|Deuteronomium
  word trailer~[\s&]
'''

WordResults2 = BHS.search(WordQuery2)

  0.68s 79886 results

Find the midpoint: 79886/2 = 39943. Note that the cells below use 39948 and 39949 instead.

In [36]:

T.text(39949)

Out[36]:

'תַעֲשׂ֖וּן '

In [37]:

BHS.show(WordResults2,start=39948,end=39948,multiFeatures=False)

result 39948

book Leviticus

book=Leviticus

Leviticus 8:15

verse

book=Leviticus

sentence 33

clause Way0 NA

phrase CP Conj

trailer=

phrase VP Pred

יִּשְׁחָ֗ט

trailer=

sentence 34

clause WayX NA

phrase CP Conj

trailer=

phrase VP Pred

יִּקַּ֨ח

trailer=

phrase PrNP Subj

מֹשֶׁ֤ה

trailer=

phrase PP Objc

trailer=&

הַ

trailer=

דָּם֙

trailer=

sentence 35

clause Way0 NA

phrase CP Conj

וַ֠

trailer=

phrase VP Pred

יִּתֵּן

trailer=

phrase PP Cmpl

trailer=&

trailer=

trailer=

trailer=

phrase AdvP Modi

סָבִיב֙

trailer=

phrase PP Adju

בְּ

trailer=

אֶצְבָּעֹ֔ו

trailer=

sentence 36

clause Way0 NA

phrase CP Conj

trailer=

phrase VP Pred

יְחַטֵּ֖א

trailer=

phrase PP Objc

trailer=&

הַ

trailer=

מִּזְבֵּ֑חַ

trailer=

sentence 37

clause WxQ0 NA

phrase CP Conj

וְ

trailer=

phrase PP Objc

trailer=&

הַ

trailer=

דָּ֗ם

trailer=

phrase VP Pred

יָצַק֙

trailer=

phrase PP Cmpl

trailer=&

trailer=

trailer=

trailer=

sentence 38

clause Way0 NA

phrase CP Conj

וַֽ

trailer=

phrase VP PreO

יְקַדְּשֵׁ֖הוּ

trailer=

clause InfC Adju

phrase VP Pred

לְ

trailer=

כַפֵּ֥ר

trailer=

phrase PP Cmpl

עָלָֽיו׃

trailer=00

Following this method, the center would be:

and be holy

3.9 - Center word based upon the feature 'wordboundary'

In this section we will use some of the additional features made available by the BHSaddons dataset.

In [61]:

# load the app and data with additional features (removed the hoist here)
BHSAadd = use("etcbc/BHSA", mod="tonyjurg/BHSaddons/tf/:hot")

Locating corpus resources ...

app: ~/text-fabric-data/github/etcbc/BHSA/app

data: ~/text-fabric-data/github/etcbc/BHSA/tf/2021

rate limit is 5000 requests per hour, with 4943 left for this hour
	connecting to online GitHub repo tonyjurg/BHSaddons ... connected

data: ~/text-fabric-data/github/tonyjurg/BHSaddons/tf/2021

data: ~/text-fabric-data/github/etcbc/phono/tf/2021

The requested data is not available offline
	~/text-fabric-data/github/etcbc/parallels/tf/2021 not found

Status: latest release online v2.1 versus None locally

downloading app, main data and requested additions ...

File is not a zip file
	could not save corpus data to ~/text-fabric-data/github

rate limit is 5000 requests per hour, with 4940 left for this hour
	connecting to online GitHub repo etcbc/parallels ... connected
	downloading from https:/github.com/ETCBC/parallels/releases/download/v2.1/tf-2021.zip ... 
	saving data

data: ~/text-fabric-data/github/etcbc/parallels/tf/2021

   |     0.11s T crossref             from ~/text-fabric-data/github/etcbc/parallels/tf/2021

TF: TF API 12.6.2, etcbc/BHSA/app v3, Search Reference
Data: etcbc - BHSA 2021, Character table, Feature docs

Node types

Name	# of nodes	# slots / node	% coverage
book	39	10938.21	100
chapter	929	459.19	100
lex	9230	46.22	100
verse	23213	18.38	100
half_verse	45179	9.44	100
sentence	63717	6.70	100
sentence_atom	64514	6.61	100
clause	88131	4.84	100
clause_atom	90704	4.70	100
phrase	253203	1.68	100
phrase_atom	267532	1.59	100
subphrase	113850	1.42	38
word	426590	1.00	100

Sets: no custom sets
Features:

Parallel Passages

crossref

int

🆗 links between similar passages

BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis

book

str

✅ book name in Latin (Genesis; Numeri; Reges1; ...)

book@ll

str

✅ book name in amharic (ኣማርኛ)

chapter

int

✅ chapter number (1; 2; 3; ...)

code

int

✅ identifier of a clause atom relationship (0; 74; 367; ...)

det

str

✅ determinedness of phrase(atom) (det; und; NA.)

domain

str

✅ text type of clause (? (Unknown); N (narrative); D (discursive); Q (Quotation).)

freq_lex

int

✅ frequency of lexemes

function

str

✅ syntactic function of phrase (Cmpl; Objc; Pred; ...)

g_cons

str

✅ word consonantal-transliterated (B R>CJT BR> >LHJM ...)

g_cons_utf8

str

✅ word consonantal-Hebrew (ב ראשׁית ברא אלהים)

g_lex

str

✅ lexeme pointed-transliterated (B.:- R;>CIJT B.@R@> >:ELOH ...)

g_lex_utf8

str

✅ lexeme pointed-Hebrew (בְּ רֵאשִׁית בָּרָא אֱלֹה)

g_word

str

✅ word pointed-transliterated (B.:- R;>CI73JT B.@R@74> >:ELOHI92JM)

g_word_utf8

str

✅ word pointed-Hebrew (בְּ רֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים)

gloss

str

🆗 english translation of lexeme (beginning create god(s))

gn

str

✅ grammatical gender (m; f; NA; unknown.)

label

str

✅ (half-)verse label (half verses: A; B; C; verses: GEN 01,02)

language

str

✅ of word or lexeme (Hebrew; Aramaic.)

lex

str

✅ lexeme consonantal-transliterated (B R>CJT/ BR>[ >LHJM/)

lex_utf8

str

✅ lexeme consonantal-Hebrew (ב ראשׁית֜ ברא אלהים֜)

ls

str

✅ lexical set, subclassification of part-of-speech (card; ques; mult)

nametype

str

⚠️ named entity type (pers; mens; gens; topo; ppde.)

nme

str

✅ nominal ending consonantal-transliterated (absent; n/a; JM, ...)

nu

str

✅ grammatical number (sg; du; pl; NA; unknown.)

number

int

✅ sequence number of an object within its context

otype

str

pargr

str

🆗 hierarchical paragraph number (1; 1.2; 1.2.3.4; ...)

pdp

str

✅ phrase dependent part-of-speech (art; verb; subs; nmpr, ...)

pfm

str

✅ preformative consonantal-transliterated (absent; n/a; J, ...)

prs

str

✅ pronominal suffix consonantal-transliterated (absent; n/a; W; ...)

prs_gn

str

✅ pronominal suffix gender (m; f; NA; unknown.)

prs_nu

str

✅ pronominal suffix number (sg; du; pl; NA; unknown.)

prs_ps

str

✅ pronominal suffix person (p1; p2; p3; NA; unknown.)

ps

str

✅ grammatical person (p1; p2; p3; NA; unknown.)

qere

str

✅ word pointed-transliterated masoretic reading correction

qere_trailer

str

✅ interword material -pointed-transliterated (Masoretic correction)

qere_trailer_utf8

str

✅ interword material -pointed-transliterated (Masoretic correction)

qere_utf8

str

✅ word pointed-Hebrew masoretic reading correction

rank_lex

int

✅ ranking of lexemes based on freqnuecy

rela

str

✅ linguistic relation between clause/(sub)phrase(atom) (ADJ; MOD; ATR; ...)

sp

str

✅ part-of-speech (art; verb; subs; nmpr, ...)

st

str

✅ state of a noun (a (absolute); c (construct); e (emphatic).)

tab

int

✅ clause atom: its level in the linguistic embedding

trailer

str

✅ interword material pointed-transliterated (& 00 05 00_P ...)

trailer_utf8

str

✅ interword material pointed-Hebrew (־ ׃)

txt

str

✅ text type of clause and surrounding (repetion of ? N D Q as in feature domain)

typ

str

✅ clause/phrase(atom) type (VP; NP; Ellp; Ptcp; WayX)

uvf

str

✅ univalent final consonant consonantal-transliterated (absent; N; J; ...)

vbe

str

✅ verbal ending consonantal-transliterated (n/a; W; ...)

vbs

str

✅ root formation consonantal-transliterated (absent; n/a; H; ...)

verse

int

✅ verse number

voc_lex

str

✅ vocalized lexeme pointed-transliterated (B.: R;>CIJT BR> >:ELOHIJM)

voc_lex_utf8

str

✅ vocalized lexeme pointed-Hebrew (בְּ רֵאשִׁית ברא אֱלֹהִים)

vs

str

✅ verbal stem (qal; piel; hif; apel; pael)

vt

str

✅ verbal tense (perf; impv; wayq; infc)

mother

none

✅ linguistic dependency between textual objects

oslots

none

Phonetic Transcriptions

phono

str

🆗 phonological transcription (bᵊ rēšˌîṯ bārˈā ʔᵉlōhˈîm)

phono_trailer

str

🆗 interword material in phonological transcription

tonyjurg/BHSaddons/tf

aliyotnum

str

The sequence number of the aliyot within the parasha

maftir

str

Set to 1 if this verse is part of a maftir

parashahebr

str

The name of the parasha in Hebrew

parashanum

int

The sequence number of the parasha

parashatrans

str

Transliteration of the Hebrew parasha name

parashaverse

str

The sequence number of the verse within the parasha

wordboundary

str

indicates wordboudaries (spaces OR maqaf)

Settings:

specified

apiVersion: 3
appName: etcbc/BHSA
appPath: C:/Users/tonyj/text-fabric-data/github/etcbc/BHSA/app
commit: gd905e3fb6e80d0fa537600337614adc2af157309
css: ''
dataDisplay:
- exampleSectionHtml:
  <code>Genesis 1:1</code> (use <a href="https://github.com/{org}/{repo}/blob/master/tf/{version}/book%40en.tf" target="_blank">English book names</a>)
- excludedFeatures:
  - g_uvf_utf8
  - g_vbs
  - kq_hybrid
  - languageISO
  - g_nme
  - lex0
  - is_root
  - g_vbs_utf8
  - g_uvf
  - dist
  - root
  - suffix_person
  - g_vbe
  - dist_unit
  - suffix_number
  - distributional_parent
  - kq_hybrid_utf8
  - crossrefSET
  - instruction
  - g_prs
  - lexeme_count
  - rank_occ
  - g_pfm_utf8
  - freq_occ
  - crossrefLCS
  - functional_parent
  - g_pfm
  - g_nme_utf8
  - g_vbe_utf8
  - kind
  - g_prs_utf8
  - suffix_gender
  - mother_object_type
- noneValues:
  - none
  - unknown
  - no value
  - NA
docs:
- docBase: {docRoot}/{repo}
- docExt: ''
- docPage: ''
- docRoot: https://{org}.github.io
- featurePage: 0_home
interfaceDefaults: {}
isCompatible: True
local: local
localDir: C:/Users/tonyj/text-fabric-data/github/etcbc/BHSA/_temp
provenanceSpec:
- corpus: BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis
- doi: 10.5281/zenodo.1007624
- moduleSpecs:
  - :
    backend: no value
    corpus: Phonetic Transcriptions
    docUrl:
    https://nbviewer.jupyter.org/github/etcbc/phono/blob/master/programs/phono.ipynb
    doi: 10.5281/zenodo.1007636
    org: etcbc
    relative: /tf
    repo: phono
  - :
    backend: no value
    corpus: Parallel Passages
    docUrl:
    https://nbviewer.jupyter.org/github/etcbc/parallels/blob/master/programs/parallels.ipynb
    doi: 10.5281/zenodo.1007642
    org: etcbc
    relative: /tf
    repo: parallels
- org: etcbc
- relative: /tf
- repo: BHSA
- version: 2021
- webBase: https://shebanq.ancient-data.org/hebrew
- webHint: Show this on SHEBANQ
- webLang: la
- webLexId: True
- webUrl:
  {webBase}/text?book=<1>&chapter=<2>&verse=<3>&version={version}&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt
- webUrlLex: {webBase}/word?version={version}&id=<lid>
release: v1.8
typeDisplay:
- clause:
  - label: {typ} {rela}
  - style: ''
- clause_atom:
  - hidden: True
  - label: {code}
  - level: 1
  - style: ''
- half_verse:
  - hidden: True
  - label: {label}
  - style: ''
  - verselike: True
- lex:
  - featuresBare: gloss
  - label: {voc_lex_utf8}
  - lexOcc: word
  - style: orig
  - template: {voc_lex_utf8}
- phrase:
  - label: {typ} {function}
  - style: ''
- phrase_atom:
  - hidden: True
  - label: {typ} {rela}
  - level: 1
  - style: ''
- sentence:
  - label: {number}
  - style: ''
- sentence_atom:
  - hidden: True
  - label: {number}
  - level: 1
  - style: ''
- subphrase:
  - hidden: True
  - label: {number}
  - style: ''
- word:
  - features: pdp vs vt
  - featuresBare: lex:gloss
writing: hbo

In [62]:

# find all 'end-of-word' word nodes within any parasha
wordboundaryQuery = '''
verse parashanum
   word wordboundary=1
'''
wordboundaryResult = BHSAadd.search(wordboundaryQuery)

  0.41s 79886 results

In [63]:

# find all word nodes within any parasha
wordboundaryQuery = '''
verse parashanum
   word 
'''
wordboundaryResult = BHSAadd.search(wordboundaryQuery)

  0.44s 112927 results

As these queries show, the counts are (as expected) the same as before: 79886 word boundaries, as in section 3.8, and 112927 word nodes, as in section 3.7.
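How a feature like `wordboundary` relates to `trailer` can be illustrated with a small sketch. It assumes, based only on the feature description "spaces OR maqaf", that a word ends a graphical word whenever its trailer is non-empty; the actual BHSaddons implementation may differ. The sample data is hand-typed in the style of the ETCBC transliteration for Genesis 1:1 and was not extracted from the database.

```python
from typing import List, Sequence, Tuple

# Hand-typed illustration: (transliterated word, trailer) pairs for Genesis 1:1.
GEN_1_1: Sequence[Tuple[str, str]] = (
    ("B", ""), ("R>CJT", " "), ("BR>", " "), (">LHJM", " "),
    (">T", " "), ("H", ""), ("CMJM", " "), ("W", ""),
    (">T", " "), ("H", ""), (">RY", "00 "),
)


def word_boundary(trailer: str) -> int:
    """Assumed rule: 1 if anything (space, maqaf, ...) follows the word, else 0."""
    return 1 if trailer else 0


def graphical_words(pairs: Sequence[Tuple[str, str]]) -> List[str]:
    """Join ETCBC words until a word with a non-empty trailer closes the unit."""
    units: List[str] = []
    current: List[str] = []
    for text, trailer in pairs:
        current.append(text)
        if word_boundary(trailer):
            units.append("".join(current))
            current = []
    if current:  # trailing words without a closing trailer
        units.append("".join(current))
    return units


units = graphical_words(GEN_1_1)
print(len(GEN_1_1), len(units))  # 11 7
print(units[0])                  # BR>CJT
```

The eleven ETCBC words collapse into seven space-delimited words, which is the same kind of reduction as from 112927 word nodes to 79886 word boundaries in the Torah.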

3.10 - Center word based upon spaces

In the following method, words are defined as items separated by spaces.

In [38]:

# the following regex selects values of feature trailer that end in a space; trailers consisting of a maqaf ('&') are excluded

wordQuery3 = r'''
book book=Genesis|Exodus|Leviticus|Numeri|Deuteronomium
  word trailer~\ $
'''

wordResults3 = BHS.search(wordQuery3)

  0.60s 68434 results

In [39]:

# Just to check: query for maqafs

maqafQuery = '''
book book=Genesis|Exodus|Leviticus|Numeri|Deuteronomium

  word trailer=&
'''

maqafResults = BHS.search(maqafQuery)

  0.34s 11452 results

Check that the numbers add up: 68434 (spaces) + 11452 (maqafs) = 79886 (total), as expected.
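The sum works out because the two queries split the non-empty trailers into disjoint groups. A minimal sketch of that split, using the same pattern `\ $` as the query above and the trailer values seen in the frequency list of section 3.8 (the `classify` helper is hypothetical):

```python
import re

SPACE_FINAL = re.compile(r"\ $")  # trailer ends in a space


def classify(trailer: str) -> str:
    """Assign a trailer value to 'maqaf', 'space' or 'none'."""
    if trailer == "&":
        return "maqaf"
    if SPACE_FINAL.search(trailer):
        return "space"
    return "none"


# Distinct trailer values from the frequency list in section 3.8.
values = [' ', '', '&', '00 ', '05 ', '00_S ', '00_P ', '_S ',
          ' 05 ', '_P ', '00_N ', '00_N_P ', '00_N_S ']

groups = {v: classify(v) for v in values}
print(groups['&'], groups['00_S '], groups[''])  # maqaf space none

# Counts reported by the Torah queries above.
spaces, maqafs, total = 68434, 11452, 79886
print(spaces + maqafs == total)  # True
```

Only the empty trailer falls in the 'none' group, so every word boundary is counted exactly once.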

Find the midpoint in wordResults3: 68434/2 = 34217. Python lists are zero-based, so result 34217 sits at index 34216. Print its tuple:

In [40]:

wordResults3[34216]

Out[40]:

(426593, 56509)
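The index 34216 used here and the `start=34217` passed to `BHS.show` refer to the same result, since list indices are zero-based while result numbers start at 1. A small sketch of that conversion follows; `result_at_rank` is a hypothetical helper and the word nodes in the sample list are made up for illustration.

```python
from typing import Sequence, Tuple


def result_at_rank(results: Sequence[Tuple[int, ...]], rank: int) -> Tuple[int, ...]:
    """Return the result with the given 1-based rank."""
    if not 1 <= rank <= len(results):
        raise IndexError(f"rank {rank} outside 1..{len(results)}")
    return results[rank - 1]


# Made-up (book node, word node) tuples standing in for query results.
sample = [(426593, 10), (426593, 20), (426593, 30), (426593, 40)]

middle_rank = len(sample) // 2               # 2
print(result_at_rank(sample, middle_rank))   # (426593, 20)
print(result_at_rank(sample, middle_rank)[1])  # 20, the word node
```

Using one helper for both the tuple lookup and the display call keeps the two views on the same result.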

Print the associated text (the word node is the second element of the tuple):

In [41]:

T.text(wordResults3[34216][1])

Out[41]:

'רֹ֥אשׁ '

Displaying the syntax tree of the relevant verse:

In [43]:

BHS.show(wordResults3,start=34217,end=34217,multiFeatures=False)

result 34217

book Leviticus

book=Leviticus

Leviticus 8:22

verse

book=Leviticus

sentence 52

clause Way0 NA

phrase CP Conj

trailer=

phrase VP Pred

יַּקְרֵב֙

trailer=

phrase PP Objc

trailer=&

trailer=

trailer=

trailer=

trailer=

trailer=

trailer=

trailer=

sentence 53

clause WayX NA

phrase CP Conj

וַֽ

trailer=

phrase VP Pred

יִּסְמְכ֞וּ

trailer=

phrase PrNP Subj

אַהֲרֹ֧ן

trailer=

וּ

trailer=

בָנָ֛יו

trailer=

phrase PP Objc