In the Old Testament, the Tetragrammaton יהוה is written with different vowels, for example with the vowels of אֲדֹנַי (Adonai, ETCBC transliteration: >:ADON@J).
%load_ext autoreload
%autoreload 2
# Loading the Text-Fabric code
# Note: it is assumed Text-Fabric is installed in your environment.
from tf.fabric import Fabric
from tf.app import use
# load the BHS app and data
BHS = use("etcbc/BHSA", hoist=globals())
Locating corpus resources ...
Name | # of nodes | # slots / node | % coverage |
---|---|---|---|
book | 39 | 10938.21 | 100 |
chapter | 929 | 459.19 | 100 |
lex | 9230 | 46.22 | 100 |
verse | 23213 | 18.38 | 100 |
half_verse | 45179 | 9.44 | 100 |
sentence | 63717 | 6.70 | 100 |
sentence_atom | 64514 | 6.61 | 100 |
clause | 88131 | 4.84 | 100 |
clause_atom | 90704 | 4.70 | 100 |
phrase | 253203 | 1.68 | 100 |
phrase_atom | 267532 | 1.59 | 100 |
subphrase | 113850 | 1.42 | 38 |
word | 426590 | 1.00 | 100 |
Note: The feature documentation can be found at ETCBC GitHub.
# The following will push the Text-Fabric stylesheet to this notebook (to facilitate proper display with notebook viewer)
BHS.dh(BHS.getCss())
First, get all occurrences of the Tetragrammaton יהוה by matching on its consonants only (that is, without vowel pointing and other diacritical marks). See also the notes on feature g_word.
JHWHQuery = '''
book
  chapter
    verse
      word g_cons=JHWH
'''
JHWHResults = BHS.search(JHWHQuery)
0.51s 6828 results
Now post-process the results into a table that lists each distinct vocalization with its frequency and first occurrence.
# Libraries for table formatting and regular expressions
import re
import pandas as pd
from IPython.display import display
# Initialize dictionary for storing results
resultDict = {}
# Process each result tuple (book, chapter, verse, word) in JHWHResults
for item in JHWHResults:
    node = item[3]
    # Get the transliterated and the Hebrew (UTF-8) pointed form of the word occurrence
    pointedWord = F.g_word.v(node)
    hebrewWord = F.g_word_utf8.v(node)
    # Remove cantillation marks, which the BHSA transliteration represents by digits
    vocalizedWord = re.sub(r'\d', '', pointedWord)
    if vocalizedWord in resultDict:
        # If it exists, increment the frequency count
        resultDict[vocalizedWord][0] += 1
    else:
        # Initialize the count and store the first occurrence
        firstOccurrence = T.sectionFromNode(node)
        resultDict[vocalizedWord] = [1, firstOccurrence, hebrewWord]
# Convert the dictionary into a DataFrame and sort by frequency
tableData = pd.DataFrame(
    [[key, value[0], value[1], value[2]] for key, value in resultDict.items()],
    columns=["Pointed Word", "Frequency", "First Occurrence", "Hebrew Word"]
)
tableData = tableData.sort_values(by="Frequency", ascending=False)
# Display the table
display(tableData)
(index) | Pointed Word | Frequency | First Occurrence | Hebrew Word |
---|---|---|---|---|
0 | J:HW@H | 5682 | (Genesis, 2, 4) | יְהוָ֥ה |
2 | JHW@H | 788 | (Genesis, 4, 3) | יהוָֽה |
5 | J:HWIH | 270 | (Deuteronomy, 3, 24) | יְהוִ֗ה |
1 | J:HOW@H | 45 | (Genesis, 3, 14) | יְהֹוָ֨ה |
7 | J:HOWIH | 32 | (1_Kings, 2, 26) | יְהֹוִה֙ |
4 | JHOW@H | 6 | (Genesis, 18, 17) | יהֹוָ֖ה |
3 | J:EHWIH | 2 | (Genesis, 15, 2) | יֱהוִה֙ |
6 | J:EHOWIH | 1 | (Judges, 16, 28) | יֱהֹוִ֡ה |
8 | JHWIH | 1 | (Psalms, 68, 21) | יהוִ֥ה |
9 | J:AHW@H | 1 | (Psalms, 144, 15) | יֲהוָ֥ה |
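The digit-stripping step used to build this table can be checked in isolation. The sketch below is only an illustration and does not need Text-Fabric: `strip_cantillation` is a hypothetical helper name, "J:HOW@63H" is the g_word value that appears in this notebook's feature listing, and the other two sample strings are made up for the example.

```python
import re
from collections import Counter


def strip_cantillation(g_word: str) -> str:
    """Remove the digits that encode cantillation marks in the transliteration."""
    return re.sub(r"\d", "", g_word)


# "J:HOW@63H" is taken from this notebook; the other two strings are
# hypothetical samples, invented only to show how forms collapse.
samples = ["J:HOW@63H", "J:HOW@70H", "J:HW@73H"]

# Count how often each vocalization occurs once the digits are gone
counts = Counter(strip_cantillation(word) for word in samples)
print(counts)
```

Using a `Counter` here is a design choice for brevity; the notebook's own loop uses a plain dictionary because it also records the first occurrence and the Hebrew form.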
import pandas as pd
from bokeh.plotting import figure, show, output_notebook
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.layouts import column
# Enable Bokeh output in the notebook
output_notebook()
# Ensure tableData has the exact column names you need
tableData.columns = ["Pointed Word", "Frequency", "First Occurrence", "Hebrew Word"]
# Create a ColumnDataSource for the Bokeh plot
source = ColumnDataSource(tableData)
# Create a Bokeh figure for the bar chart
p = figure(
    x_range=tableData['Hebrew Word'].tolist(),  # convert x_range to list explicitly
    height=800,
    width=1000,
    title="Frequency of Tetragrammaton vocalisation in biblical text",
    toolbar_location="right"
)
# Create bar chart
p.vbar(x='Hebrew Word', top='Frequency', width=0.5, source=source)
# Add labels and customizations
p.xaxis.axis_label = "Hebrew Word"
p.yaxis.axis_label = "Frequency"
p.xaxis.major_label_orientation = "horizontal"
p.xaxis.major_label_text_font_size = "26pt" # Increase font size of x-axis labels
# Add hover tool
hover = HoverTool()
hover.tooltips = [
    ("Pointed Word", "@{Pointed Word}"),
    ("Frequency", "@Frequency"),
    ("First Occurrence", "@{First Occurrence}"),
    ("Hebrew Word", "@{Hebrew Word}")
]
p.add_tools(hover)
# Show the interactive plot
show(p)
Add another condition to the query to select the vowels of Adonai (adOnAi), transliterated as O and @, which should appear on either side of the Wav. The regular expression includes '.*' to allow for cantillation marks in between.
adonaiQuery = '''
word g_cons=JHWH g_word~O.*W.*@
'''
adonaiResults = BHS.search(adonaiQuery)
0.29s 51 results
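As a cross-check outside Text-Fabric, the same pattern can be applied to the vocalizations from the frequency table earlier in this notebook. This is a minimal sketch that assumes the `~` operator matches anywhere in the value (the behaviour of `re.search`); `has_adonai_vowels` is a hypothetical helper name, and the frequencies are copied from that table.

```python
import re

# Same pattern as in the query: O, then W, then @, with anything in between
ADONAI_PATTERN = re.compile(r"O.*W.*@")


def has_adonai_vowels(g_word: str) -> bool:
    """Return True if the transliterated word has O before and @ after the W."""
    return ADONAI_PATTERN.search(g_word) is not None


# Vocalizations and frequencies copied from the table in this notebook
frequencies = {
    "J:HW@H": 5682, "JHW@H": 788, "J:HWIH": 270, "J:HOW@H": 45,
    "J:HOWIH": 32, "JHOW@H": 6, "J:EHWIH": 2, "J:EHOWIH": 1,
    "JHWIH": 1, "J:AHW@H": 1,
}

matching = {form: n for form, n in frequencies.items() if has_adonai_vowels(form)}
print(matching, sum(matching.values()))
```

Only J:HOW@H (45) and JHOW@H (6) match, which adds up to the 51 results reported by the query. The pattern also matches forms that still carry cantillation digits, such as J:HOW@63H.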
BHS.table(adonaiResults, condensed=False, extraFeatures={'voc_lex'})
adonaiQuery2 = '''
word lex=JHWH/ g_word~O.*W.*@
'''
adonaiResults2 = BHS.search(adonaiQuery2)
0.32s 51 results
Print the features that contain data for the first matching word node.
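# Get the names of all loaded node features
featureList = Fall()
for item in adonaiResults2:
    node = item[0]
    for feature in featureList:
        featureValue = Fs(feature).v(node)
        # Only print features that have a value for this node
        if featureValue is not None:
            print(feature, '=', featureValue)
    # Inspect the first result only
    break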
freq_lex = 6828
g_cons = JHWH
g_cons_utf8 = יהוה
g_lex = J:HOW@H
g_lex_utf8 = יְהֹוָה
g_word = J:HOW@63H
g_word_utf8 = יְהֹוָ֨ה
gloss = YHWH
gn = m
language = Hebrew
lex = JHWH/
lex_utf8 = יהוה
ls = none
nametype = pers
nme =
nu = sg
number = 1427
otype = word
pdp = nmpr
pfm = n/a
phono = [yᵊhôˌāh]
phono_trailer =
prs = n/a
prs_gn = NA
prs_nu = NA
prs_ps = NA
ps = NA
rank_lex = 6
sp = nmpr
st = a
trailer =
trailer_utf8 =
uvf = absent
vbe = n/a
vbs = n/a
voc_lex = J:HW@H
voc_lex_utf8 = יְהוָה
vs = NA
vt = NA
qereQuery = '''
word qere_utf8 g_cons=JHWH
'''
qereResults = BHS.search(qereQuery)
0.28s 0 results
for item in qereResults:
    node = item[0]
    pointedWord = F.g_word.v(node)
    qereWord = F.qere.v(node)
    # Remove cantillation marks (digits) from the qere form
    uncantQereWord = re.sub(r'\d', '', qereWord)
    print(pointedWord, qereWord, uncantQereWord)
    # Show the first result only
    break
The scripts in this notebook require (besides text-fabric) the following Python libraries to be installed in the environment:
bokeh
IPython
pandas
re
You can install any missing library from within Jupyter Notebook using either pip or pip3. Note that re is part of the Python standard library and needs no separate installation.
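To see which of these libraries are missing before running the notebook, a small check like the following may help. It is a sketch only: `find_missing` is a hypothetical helper name, and `tf` is assumed to be the import name of text-fabric, as in the import statements at the top of this notebook.

```python
import importlib.util


def find_missing(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Libraries used in this notebook; "tf" is the import name of text-fabric
required = ["tf", "bokeh", "IPython", "pandas", "re"]

missing = find_missing(required)
print("Missing libraries:", missing)
```

Any name printed in the list can then be installed with pip; an empty list means all libraries were found.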