We want to make a list of all occurrences of אֱלֹהִים with the function of the phrase they occur in.
We define a function that, given a lexeme, produces a tab-separated file of the occurrences of that lexeme throughout the Hebrew Bible.
We apply that function to the lexeme אֱלֹהִים (in ETCBC encoding: >LHJM/) to generate two concrete output files:
import sys, os
import collections
import laf
from laf.fabric import LafFabric
from etcbc.preprocess import prepare
fabric = LafFabric()
0.00s This is LAF-Fabric 4.7.2
API reference: http://laf-fabric.readthedocs.org/en/latest/texts/API-reference.html
Feature doc: https://shebanq.ancient-data.org/static/docs/featuredoc/texts/welcome.html
version = '4b'
API = fabric.load('etcbc{}'.format(version), 'lexicon', 'adjectives', {
    "xmlids": {"node": False, "edge": False},
    "features": ('''
        otype
        function lex
        gloss
    ''',
    '''
    '''),
    "prepare": prepare,
    "primary": False,
}, verbose='DETAIL')
exec(fabric.localnames.format(var='fabric'))
0.00s LOADING API: please wait ...
0.00s DETAIL: COMPILING m: etcbc4b: UP TO DATE
0.00s USING main: etcbc4b DATA COMPILED AT: 2015-11-02T15-08-56
0.01s DETAIL: COMPILING a: lexicon: UP TO DATE
0.01s USING annox: lexicon DATA COMPILED AT: 2016-07-08T14-32-54
0.01s DETAIL: load main: G.node_anchor_min
0.14s DETAIL: load main: G.node_anchor_max
0.24s DETAIL: load main: G.node_sort
0.34s DETAIL: load main: G.node_sort_inv
0.83s DETAIL: load main: G.edges_from
0.95s DETAIL: load main: G.edges_to
1.14s DETAIL: load main: F.etcbc4_db_otype [node]
2.20s DETAIL: load main: F.etcbc4_ft_function [node]
2.33s DETAIL: load main: F.etcbc4_ft_lex [node]
2.54s DETAIL: load annox lexicon: F.etcbc4_lex_gloss [node]
2.80s LOGFILE=/Users/dirk/laf/laf-fabric-output/etcbc4b/adjectives/__log__adjectives.txt
2.83s INFO: LOADING PREPARED data: please wait ...
2.83s prep prep: G.node_sort
2.92s prep prep: G.node_sort_inv
3.43s prep prep: L.node_up
7.27s prep prep: L.node_down
13s prep prep: V.verses
13s prep prep: V.books_la
13s ETCBC reference: http://laf-fabric.readthedocs.org/en/latest/texts/ETCBC-reference.html
17s INFO: LOADED PREPARED data
17s INFO: DATA LOADED FROM SOURCE etcbc4b AND ANNOX lexicon FOR TASK adjectives AT 2016-09-01T11-02-28
We make an index from each lexeme to all its occurrences. The index takes the shape of a dictionary with the lexemes as keys and the sets of their occurrences as values. The lexeme is represented in the ETCBC transcription (we take the value of the lex feature), and the occurrences are represented by just their nodes (which are plain integers).
occurrences = collections.defaultdict(lambda: set())
# a defaultdict is needed for the case where we see a lexeme for the first time.
# In that case occurrences[lexeme] does not yet exist.
# The defaultdict then inserts the key lexeme with value empty set into the dict.
inf('Making occurrence index ...')
for w in F.otype.s('word'):
    occurrences[F.lex.v(w)].add(w)
inf('{} lexemes'.format(len(occurrences)))
6m 30s Making occurrence index ...
6m 31s 8777 lexemes
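To see the shape of this index without loading the corpus, here is a minimal sketch on made-up data. The list toy_words and the name toy_index are hypothetical stand-ins for the word nodes and for occurrences; only the two node numbers for >LHJM/ are taken from the output further down. It also shows a detail that matters later: testing membership with "in" does not insert a key, whereas indexing a missing key does.

```python
import collections

# Hypothetical stand-in for the corpus: (node, lexeme) pairs in text order.
toy_words = [(0, 'B'), (1, 'R>CJT/'), (2, 'BR>['), (3, '>LHJM/'), (25, '>LHJM/')]

# Same structure as the real index: lexeme -> set of word nodes.
toy_index = collections.defaultdict(set)
for node, lexeme in toy_words:
    toy_index[lexeme].add(node)

print(sorted(toy_index['>LHJM/']))

# A membership test leaves the index untouched ...
print('NOSUCH/' in toy_index)
# ... but indexing a missing key silently creates it with an empty set.
empty = toy_index['NOSUCH/']
print('NOSUCH/' in toy_index, empty)
```

This is why the functions below test with "not in" before they look a lexeme up: a plain lookup of an unknown lexeme would add an empty entry to the index.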
Given the node of an occurrence, we gather all required information and assemble it into a tuple. Because we want two output formats, we define a function that takes a format parameter: ec (=ETCBC consonantal) or ha (=fully pointed Hebrew). See the ETCBC-reference (follow the link in the output of the cell above that loaded the data, and look for the T function).
def bits(fmt, w):
    p = L.u('phrase', w)          # the phrase that contains word w
    pw = list(L.d('word', p))     # all words in that phrase
    return (
        T.passage(w),
        p,
        T.words(pw, fmt=fmt).replace('\n', ' '),
        ' '.join(F.gloss.v(x) for x in pw),
        F.function.v(p),
        F.lex.v(w),
        T.words([w], fmt=fmt).replace('\n', ' '),
        w,
    )
Let us now assemble all data into the final output. We also produce a row of column headers and some statistics.
fields = '''
passage
phrase_node
phrase_text
phrase_gloss
phrase_function
lexeme
occ_text
occ_node
'''.strip().split()
nfields = len(fields)
row_template = ('{}\t' * (nfields - 1))+'{}\n'
of_path_template = 'occurrences_{}.{}.csv'
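As an illustration of how the row template works, here is a small sketch with a shortened, hypothetical field list (demo_fields) and three values taken from the first row of the output further down. The template holds one placeholder per field, separated by tabs and closed by a newline, so formatting it with a tuple of values yields one line of the file. Joining the stringified values with tabs gives the same result.

```python
# Hypothetical shortened field list; the real one has eight fields.
demo_fields = ['passage', 'phrase_node', 'phrase_function']

# One '{}' per field, tab-separated, with a newline after the last one.
demo_template = ('{}\t' * (len(demo_fields) - 1)) + '{}\n'

demo_values = ('Genesis 1:1', 605135, 'Subj')
demo_row = demo_template.format(*demo_values)
print(repr(demo_row))

# An equivalent formulation without a template.
joined_row = '\t'.join(str(v) for v in demo_values) + '\n'
print(demo_row == joined_row)
```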
We define a function that writes the file, given a lexeme and a format, and a function that produces statistics, given a lexeme.
def lex_file_name(lexeme):
    # in order to use the lexeme in a file name, we replace < > / [ = by harmless characters
    return lexeme.\
        replace('/', 's').\
        replace('[', 'v').\
        replace('=', 'x').\
        replace('<', 'o').\
        replace('>', 'a')
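The same character mapping can also be written as a single translation table. The sketch below is an alternative formulation, not the code used in this notebook; the names SAFE_CHARS and safe_lexeme are hypothetical. It shows how the lexeme >LHJM/ ends up as aLHJMs in the file names reported further down.

```python
# Hypothetical alternative to the chained replace calls:
# one translation table that maps each unsafe character to a letter.
SAFE_CHARS = str.maketrans({'/': 's', '[': 'v', '=': 'x', '<': 'o', '>': 'a'})

def safe_lexeme(lexeme):
    """Return the lexeme with < > / [ = replaced by harmless letters."""
    return lexeme.translate(SAFE_CHARS)

print(safe_lexeme('>LHJM/'))
print(safe_lexeme('BR>['))
```

Note that the mapping is not reversible in general, because the replacement letters could also occur in a lexeme; it is only meant to yield usable file names.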
def lex_info(lexeme, fmt):
    file_lex = lex_file_name(lexeme)
    file_name = of_path_template.format(file_lex, fmt)
    if lexeme not in occurrences:
        msg('There is no lexeme "{}"'.format(lexeme))
        occs = []
    else:
        # sorted turns a set into a list. The order is given by the key parameter.
        # This is the function NK (see the ETCBC-reference). It orders nodes
        # according to where their associated text occurs in the Bible.
        occs = sorted(occurrences[lexeme], key=NK)
    # The with statement closes the file even if writing a row fails.
    # We ask for UTF-8 explicitly, because the ha format contains Hebrew characters.
    with open(file_name, 'w', encoding='utf-8') as of:
        of.write('{}\n'.format('\t'.join(fields)))
        for w in occs:
            # bits yields a tuple of values. The * unpacks this tuple into separate arguments.
            of.write(row_template.format(*bits(fmt, w)))
    inf('Written {} lines to {}'.format(len(occs) + 1, file_name))
def show_stats(lexeme):
    # we produce an overview of the distribution of the occurrences over the books
    # book names in Swahili
    book_dist = collections.Counter()
    if lexeme not in occurrences:
        msg('There is no lexeme "{}"'.format(lexeme))
        occs = []
    else:
        occs = sorted(occurrences[lexeme], key=NK)
    for w in occs:
        book_node = L.u('book', w)
        book_name_sw = T.book_name(book_node, lang='sw')
        book_name = T.book_name(book_node)
        book_dist['{:<30} = {}'.format(book_name_sw, book_name)] += 1
    # we sort the results by descending frequency, then by book name
    total = 0
    for (b, n) in sorted(book_dist.items(), key=lambda x: (-x[1], x[0])):
        print('{:<10} has {:>5} occurrences in {}'.format(lexeme, n, b))
        total += n
    print('{:<10} has {:>5} occurrences in {}'.format(lexeme, total, 'the whole Bible'))
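The sort key used in the statistics deserves a short illustration. The sketch below uses a made-up list of book names (toy_books is hypothetical, not corpus data). Negating the count sorts by descending frequency, and the name as second component breaks ties alphabetically.

```python
import collections

# Hypothetical data: one book name per occurrence.
toy_books = ['Genesis', 'Psalms', 'Genesis', 'Jonah', 'Psalms', 'Genesis', 'Amos']

toy_dist = collections.Counter(toy_books)

# Sort by count (high to low), then by name (A to Z) among equal counts.
ranked = sorted(toy_dist.items(), key=lambda x: (-x[1], x[0]))
for (book, n) in ranked:
    print('{:<10} {:>3}'.format(book, n))
```

Amos and Jonah both occur once here, so Amos comes first. The same tie-breaking explains the order of books with equal counts in the statistics for a real lexeme.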
Here we produce results for lexeme >LHJM/ and formats ec and ha.
lexeme = '>LHJM/'
show_stats(lexeme)
lex_info(lexeme, 'ec')
lex_info(lexeme, 'ha')
>LHJM/     has   374 occurrences in Kumbukumbu_la_Torati           = Deuteronomy
>LHJM/     has   365 occurrences in Zaburi                         = Psalms
>LHJM/     has   219 occurrences in Mwanzo                         = Genesis
>LHJM/     has   203 occurrences in 2_Mambo_ya_Nyakati             = 2_Chronicles
>LHJM/     has   145 occurrences in Yeremia                        = Jeremiah
>LHJM/     has   139 occurrences in Kutoka                         = Exodus
>LHJM/     has   118 occurrences in 1_Mambo_ya_Nyakati             = 1_Chronicles
>LHJM/     has   107 occurrences in 1_Wafalme                      = 1_Kings
>LHJM/     has   100 occurrences in 1_Samweli                      = 1_Samuel
>LHJM/     has    98 occurrences in 2_Wafalme                      = 2_Kings
>LHJM/     has    94 occurrences in Isaya                          = Isaiah
>LHJM/     has    76 occurrences in Yoshua                         = Joshua
>LHJM/     has    73 occurrences in Waamuzi                        = Judges
>LHJM/     has    70 occurrences in Nehemia                        = Nehemiah
>LHJM/     has    55 occurrences in Ezra                           = Ezra
>LHJM/     has    54 occurrences in 2_Samweli                      = 2_Samuel
>LHJM/     has    53 occurrences in Mambo_ya_Walawi                = Leviticus
>LHJM/     has    40 occurrences in Mhubiri                        = Ecclesiastes
>LHJM/     has    36 occurrences in Ezekieli                       = Ezekiel
>LHJM/     has    27 occurrences in Hesabu                         = Numbers
>LHJM/     has    26 occurrences in Hosea                          = Hosea
>LHJM/     has    22 occurrences in Danieli                        = Daniel
>LHJM/     has    17 occurrences in Ayubu                          = Job
>LHJM/     has    16 occurrences in Yona                           = Jonah
>LHJM/     has    14 occurrences in Amosi                          = Amos
>LHJM/     has    11 occurrences in Mika                           = Micah
>LHJM/     has    11 occurrences in Yoeli                          = Joel
>LHJM/     has    11 occurrences in Zekaria                        = Zechariah
>LHJM/     has     7 occurrences in Malaki                         = Malachi
>LHJM/     has     5 occurrences in Mithali                        = Proverbs
>LHJM/     has     5 occurrences in Sefania                        = Zephaniah
>LHJM/     has     4 occurrences in Ruthi                          = Ruth
>LHJM/     has     3 occurrences in Hagai                          = Haggai
>LHJM/     has     2 occurrences in Habakuki                       = Habakkuk
>LHJM/     has     1 occurrences in Nahumu                         = Nahum
>LHJM/     has  2601 occurrences in the whole Bible
1h 03m 57s Written 2602 lines to occurrences_aLHJMs.ec.csv
1h 03m 58s Written 2602 lines to occurrences_aLHJMs.ha.csv
print(open(of_path_template.format(lex_file_name(lexeme), 'ec')).read()[0:1000])
passage	phrase_node	phrase_text	phrase_gloss	phrase_function	lexeme	occ_text	occ_node
Genesis 1:1	605135	>LHJM	god(s)	Subj	>LHJM/	>LHJM	3
Genesis 1:2	605145	RWX >LHJM	wind god(s)	Subj	>LHJM/	>LHJM	25
Genesis 1:3	605150	>LHJM	god(s)	Subj	>LHJM/	>LHJM	33
Genesis 1:4	605158	>LHJM	god(s)	Subj	>LHJM/	>LHJM	41
Genesis 1:4	605164	>LHJM	god(s)	Subj	>LHJM/	>LHJM	49
Genesis 1:5	605168	>LHJM	god(s)	Subj	>LHJM/	>LHJM	59
Genesis 1:6	605184	>LHJM	god(s)	Subj	>LHJM/	>LHJM	80
Genesis 1:7	605194	>LHJM	god(s)	Subj	>LHJM/	>LHJM	96
Genesis 1:8	605208	>LHJM	god(s)	Subj	>LHJM/	>LHJM	126
Genesis 1:9	605220	>LHJM	god(s)	Subj	>LHJM/	>LHJM	141
Genesis 1:10	605232	>LHJM	god(s)	Subj	>LHJM/	>LHJM	161
Genesis 1:10	605241	>LHJM	god(s)	Subj	>LHJM/	>LHJM	175
Genesis 1:11	605246	>LHJM	god(s)	Subj	>LHJM/	>LHJM	180
Genesis 1:12	605277	>LHJM	god(s)	Subj	>LHJM/	>LHJM	224
Genesis 1:14	605289	>LHJM	god(s)	Subj	>LHJM/	>LHJM	237
Genesis 1:16	605309	>LHJM	god(s)	Subj	>LHJM/	>LHJM	283
Genesis 1:17
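Since our goal was a list of occurrences with the function of their phrase, a natural next step is to read such a file back and tally the phrase functions. The following is a minimal sketch using the standard csv module. To keep it self-contained it reads from an in-memory sample (the header and the first two rows shown above, assuming the fields are tab-separated as written by row_template) rather than from the real file; the names sample and function_counts are hypothetical.

```python
import collections
import csv
import io

# In-memory stand-in for occurrences_aLHJMs.ec.csv: header plus two rows.
sample = (
    'passage\tphrase_node\tphrase_text\tphrase_gloss\t'
    'phrase_function\tlexeme\tocc_text\tocc_node\n'
    'Genesis 1:1\t605135\t>LHJM\tgod(s)\tSubj\t>LHJM/\t>LHJM\t3\n'
    'Genesis 1:2\t605145\tRWX >LHJM\twind god(s)\tSubj\t>LHJM/\t>LHJM\t25\n'
)

# QUOTE_NONE: the file was written without quoting, so a quote
# character inside a field must be read as ordinary text.
reader = csv.DictReader(io.StringIO(sample), delimiter='\t', quoting=csv.QUOTE_NONE)

function_counts = collections.Counter(row['phrase_function'] for row in reader)
for (function, n) in function_counts.most_common():
    print('{:<6} {:>5}'.format(function, n))
```

To run this on the real output, replace io.StringIO(sample) by an opened file, for example open(of_path_template.format(lex_file_name(lexeme), 'ec')).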