This notebook exports a corrected version of the participant reference dataset created by Eep Talstra. The dataset was inspected and a list of mismatches was identified for manual correction (see notebook). It then went through two rounds of corrections in Excel. The corrected version is exported here as three Text-Fabric features: 'actor', 'prs_actor' and 'coref'.
import sys, os
import csv, collections
import pandas as pd
from tf.app import use
B = use('bhsa', hoist=globals(), locations='actor/tf')
Using etcbc/bhsa/tf - c r1.4 in C:\Users\Ejer/text-fabric-data
Using etcbc/phono/tf - c r1.1 in C:\Users\Ejer/text-fabric-data
Using etcbc/parallels/tf - c r1.1 in C:\Users\Ejer/text-fabric-data
Documentation: BHSA Character table Feature docs bhsa API Text-Fabric API 7.0.3 Search Reference
BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis: book book@ll chapter code det dist dist_unit domain freq_lex freq_occ function g_word g_word_utf8 gloss gn instruction is_root kind label language lex lex_utf8 ls nametype nme nu number otype pargr pdp pfm prs prs_gn prs_nu prs_ps ps qere qere_trailer qere_trailer_utf8 qere_utf8 rank_lex rank_occ rela root sp st tab trailer trailer_utf8 txt typ uvf vbe vbs verse voc_lex voc_lex_utf8 vs vt distributional_parent functional_parent mother oslots
Parallel Passages: crossref
Phonetic Transcriptions: phono phono_trailer
file = 'Datasets/Lev17toLev26_mapped_updated_corrected.csv'
new_dict = {}
n = 0
with open(file, newline='') as f:
    next(f)  # Skip the header line
    reader = csv.reader(f, delimiter=';')
    for r in reader:
        surface_text = r[1]
        book = r[2]
        chapter = r[3]
        verse = r[4]
        clause_atom = r[5]
        pred = r[6]
        ref = r[7]
        ptc_set = r[8]
        ptc_actor = r[9]
        slots = r[10]
        func = r[11]
        compound = r[12]
        correction_1 = r[14]
        correction_2 = r[15]
        n += 1  # Rows are numbered from 1
        new_dict[n] = [surface_text, book, chapter, verse, clause_atom, pred, ref,
                       ptc_set, ptc_actor, slots, func, compound, correction_1, correction_2]
data = pd.DataFrame.from_dict(new_dict).T
data.columns = ['surface_text','book','chapter','verse','clause_atom','predicate','reference','participant',
'actor','slots','func','compound','1_correction','2_correction']
data
row | surface_text | book | chapter | verse | clause_atom | predicate | reference | participant | actor | slots | func | compound | 1_correction | 2_correction |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | JDBR | Leviticus | 17 | 1 | 528163 | DBR | DBR | 3sm=JHWH | JHWH | 63009 | VbPred | 0 | ||
2 | JHWH | Leviticus | 17 | 1 | 528163 | DBR | JHWH | 3sm=JHWH | JHWH | 63010 | Subj | 1 | ||
3 | >L MCH | Leviticus | 17 | 1 | 528163 | DBR | >L MCH | 0sm=MCH | MCH | 63011 63012 | Compl1 | 2 | ||
4 | L->MR | Leviticus | 17 | 1 | 528164 | >MR | L >MR | 3sm=JHWH | JHWH | 63013 63014 | VbPred | 3 | ||
5 | DBR | Leviticus | 17 | 2 | 528165 | DBR | DBR | 2sm= | MCH | 63015 | VbPred | 4 | ||
6 | >L >HRN W->L BNJW W->L KL BNJ JFR>L | Leviticus | 17 | 2 | 528165 | DBR | >L >HRN W >L BN+S W >L KL BN JFR>L | 3pm=>HRN BN+>HRN FR>L | >HRN BN >HRN | 63016 63017 63018 63019 63020 63021 63022 6302... | Compl1 | 5 6 7 8 9 10 11 | >HRN BN >HRN BN JFR>L | |
7 | >L >HRN W->L BNJW | Leviticus | 17 | 2 | 528165 | DBR | >L >HRN W >L BN+S | ... | ... | 63016 63017 63018 63019 63020 | -paral | 6 7 8 9 | ||
8 | >L >HRN | Leviticus | 17 | 2 | 528165 | DBR | >L >HRN | 3sm=>HRN | >HRN | 63016 63017 | -paral | 7 | ||
9 | >L BNJW | Leviticus | 17 | 2 | 528165 | DBR | >L BN+312 | ... | ... | 63019 63020 | -paral | 8 9 | ||
10 | sfx:W | Leviticus | 17 | 2 | 528165 | DBR | sfx | 3sm=>HRN | >HRN | 63020 | -gentf | 9 | ||
11 | BNJ JFR>L | Leviticus | 17 | 2 | 528165 | DBR | BN JFR>L | 0pm=BN JFR>L | BN JFR>L | 63024 63025 | -gentf | 10 11 | ||
12 | JFR>L | Leviticus | 17 | 2 | 528165 | DBR | JFR>L | JFR>L | JFR>L | 63025 | -gentf | 11 | ||
13 | >MRT | Leviticus | 17 | 2 | 528166 | >MR | >MR | 2sm= | MCH | 63027 | VbPred | 12 | ||
14 | sfx:HM | Leviticus | 17 | 2 | 528166 | >MR | sfx | 3pm=>HRN BN+>HRN FR>L | >HRN BN >HRN | 63028 | Compl1 | 13 | >HRN BN >HRN BN JFR>L | |
15 | ZH | Leviticus | 17 | 2 | 528167 | .... | ZH | 0sm=DBR | DBR | 63029 | Subj | 14 | ||
16 | H-DBR | Leviticus | 17 | 2 | 528167 | .... | H DBR | 0sm=DBR | DBR | 63030 63031 | PrCompl | 15 | ||
17 | YWH | Leviticus | 17 | 2 | 528168 | YWH | YWH | 3sm=JHWH | JHWH | 63033 | VbPred | 16 | ||
18 | JHWH | Leviticus | 17 | 2 | 528168 | YWH | JHWH | 3sm=JHWH | JHWH | 63034 | Subj | 17 | ||
19 | L->MR | Leviticus | 17 | 2 | 528169 | >MR | L >MR | 3sm=JHWH | JHWH | 63035 63036 | VbPred | 18 | ||
20 | >JC >JC | Leviticus | 17 | 3 | 528170 | .... | >JC >JC | 3sm=>JC >JC | >JC >JC | 63037 63038 | 572 | 19 20 21 | ||
21 | >JC | Leviticus | 17 | 3 | 528170 | .... | >JC | ... | ... | 63037 | -paral | 20 | ||
22 | >JC | Leviticus | 17 | 3 | 528170 | .... | >JC | 0sm=>JC | >JC | 63038 | -paral | 21 | ||
23 | M-BJT JFR>L | Leviticus | 17 | 3 | 528170 | .... | MN BJT JFR>L | 0sm=BJT JFR>L | BJT JFR>L | 63039 63040 63041 | -specf | 22 23 | ||
24 | JFR>L | Leviticus | 17 | 3 | 528170 | .... | JFR>L | JFR>L | JFR>L | 63041 | -gentf | 23 | ||
25 | JCXV | Leviticus | 17 | 3 | 528171 | CXV | CXV | 3sm=>JC >JC | >JC >JC | 63043 | VbPred | 24 | ||
26 | CWR >W KFB >W <Z | Leviticus | 17 | 3 | 528171 | CXV | CWR >W KFB >W <Z | 3sm=CWR KFB <Z | CWR KFB <Z | 63044 63045 63046 63047 63048 | Obj1 | 25 26 27 28 29 | ||
27 | CWR >W KFB | Leviticus | 17 | 3 | 528171 | CXV | CWR >W KFB | ... | ... | 63044 63045 63046 | -paral | 26 27 28 | ||
28 | CWR | Leviticus | 17 | 3 | 528171 | CXV | CWR | ... | ... | 63044 | -paral | 27 | ||
29 | KFB | Leviticus | 17 | 3 | 528171 | CXV | KFB | ... | ... | 63046 | -paral | 28 | ||
30 | <Z | Leviticus | 17 | 3 | 528171 | CXV | <Z | ... | ... | 63048 | -paral | 29 | ||
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4063 | BRJT R>CNJM | Leviticus | 26 | 45 | 529377 | ZKR | BRJT R>CWN | ... | ... | 68924 68925 | Obj1 | 4062 4063 | ||
4064 | R>CNJM | Leviticus | 26 | 45 | 529377 | ZKR | R>CWN | 0pm=R>CWN | R>CWN | 68925 | -gentf | 4063 | ||
4065 | HWY>TJ | Leviticus | 26 | 45 | 529378 | JY> | JY> | 1sc=>NJ | >NJ | 68927 | VbPred | 4064 | JHWH | |
4066 | sfx:M | Leviticus | 26 | 45 | 529378 | JY> | sfx | 3pm=GWJ | GWJ | 68928 | Obj1 | 4065 | C>R | |
4067 | M->RY MYRJM | Leviticus | 26 | 45 | 529378 | JY> | MN >RY MYRJM | >RY MYRJM | >RY MYRJM | 68929 68930 68931 | Compl1 | 4066 4067 | ||
4068 | MYRJM | Leviticus | 26 | 45 | 529378 | JY> | MYRJM | 3pm=MYRJM | MYRJM | 68931 | -gentf | 4067 | ||
4069 | L-<JNJ H-GWJM | Leviticus | 26 | 45 | 529378 | JY> | L <JN H GWJ | ... | ... | 68932 68933 68934 68935 | Adjunc | 4068 4069 | ||
4070 | H-GWJM | Leviticus | 26 | 45 | 529378 | JY> | H GWJ | 3pm=GWJ | GWJ | 68934 68935 | -gentf | 4069 | ||
4071 | L-HJT | Leviticus | 26 | 45 | 529379 | HJH | L HJH | 1sc=>NJ | >NJ | 68936 68937 | VbPred | 4070 | JHWH | |
4072 | sfx:HM | Leviticus | 26 | 45 | 529379 | HJH | sfx | 3pm=GWJ | GWJ | 68938 | Compl1 | 4071 | C>R | |
4073 | L->LHJM | Leviticus | 26 | 45 | 529379 | HJH | L >LHJM | 0pm=>LHJM | >LHJM | 68939 68940 | PrCompl | 4072 | JHWH | |
4074 | >NJ | Leviticus | 26 | 45 | 529380 | .... | >NJ | 1sc=>NJ | >NJ | 68941 | Subj | 4073 | JHWH | |
4075 | JHWH | Leviticus | 26 | 45 | 529380 | .... | JHWH | 0sm=JHWH | JHWH | 68942 | PrCompl | 4074 | ||
4076 | >LH | Leviticus | 26 | 46 | 529381 | .... | >LH | 0pm=XQ MCPV TWRH | XQ MCPV TWRH | 68943 | Subj | 4075 | ||
4077 | H-XQJM W-H-MCPVJM W-H-TWRT | Leviticus | 26 | 46 | 529381 | .... | H XQ W H MCPV W H TWRH | 0pm=XQ MCPV TWRH | XQ MCPV TWRH | 68944 68945 68946 68947 68948 68949 68950 68951 | PrCompl | 4076 4077 4078 4079 4080 | ||
4078 | H-XQJM W-H-MCPVJM | Leviticus | 26 | 46 | 529381 | .... | H XQ W H MCPV | ... | ... | 68944 68945 68946 68947 68948 | -paral | 4077 4078 4079 | ||
4079 | H-XQJM | Leviticus | 26 | 46 | 529381 | .... | H XQ | ... | ... | 68944 68945 | -paral | 4078 | ||
4080 | H-MCPVJM | Leviticus | 26 | 46 | 529381 | .... | H MCPV | ... | ... | 68947 68948 | -paral | 4079 | ||
4081 | H-TWRT | Leviticus | 26 | 46 | 529381 | .... | H TWRH | ... | ... | 68950 68951 | -paral | 4080 | ||
4082 | NTN | Leviticus | 26 | 46 | 529382 | NTN | NTN | 3sm=JHWH | JHWH | 68953 | VbPred | 4081 | ||
4083 | JHWH | Leviticus | 26 | 46 | 529382 | NTN | JHWH | 3sm=JHWH | JHWH | 68954 | Subj | 4082 | ||
4084 | BJNW W-BJN BNJ JFR>L | Leviticus | 26 | 46 | 529382 | NTN | BJN W BJN BN JFR>L | ... | ... | 68955 68956 68957 68958 68959 | Compl1 | 4083 4084 4085 4086 4087 | ||
4085 | BJNW | Leviticus | 26 | 46 | 529382 | NTN | BJN | ... | ... | 68955 | -paral | 4084 4085 | ||
4086 | sfx:W | Leviticus | 26 | 46 | 529382 | NTN | sfx | 3sm=JHWH | JHWH | 68955 | -gentf | 4085 | ||
4087 | BJN BNJ JFR>L | Leviticus | 26 | 46 | 529382 | NTN | BJN BN JFR>L | ... | ... | 68957 68958 68959 | -paral | 4086 4087 | ||
4088 | JFR>L | Leviticus | 26 | 46 | 529382 | NTN | JFR>L | ... | ... | 68959 | -gentf | 4087 | ||
4089 | B-HR SJNJ | Leviticus | 26 | 46 | 529382 | NTN | B HR SJNJ | ... | ... | 68960 68961 68962 | Locat | 4088 4089 | ||
4090 | SJNJ | Leviticus | 26 | 46 | 529382 | NTN | SJNJ | ... | ... | 68962 | -gentf | 4089 | ||
4091 | B-JD MCH | Leviticus | 26 | 46 | 529382 | NTN | B JD MCH | ... | ... | 68963 68964 68965 | Adjunc | 4090 4091 | ||
4092 | MCH | Leviticus | 26 | 46 | 529382 | NTN | MCH | 0sm=MCH | MCH | 68965 | -gentf | 4091 |
4092 rows × 14 columns
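As an alternative to the manual `csv.reader` loop, the same frame could probably be built with `pandas.read_csv`. The sketch below is not the method used in this notebook. It assumes the column positions read above (columns 1 to 12, 14 and 15), and it uses a made-up two-row sample in place of the real file. The name `load_participants` is hypothetical. Reading everything as strings with `keep_default_na=False` keeps empty correction cells as `''`, which is what the later comparisons rely on.

```python
import io
import pandas as pd

COLUMNS = ['surface_text', 'book', 'chapter', 'verse', 'clause_atom', 'predicate',
           'reference', 'participant', 'actor', 'slots', 'func', 'compound',
           '1_correction', '2_correction']
# Same CSV positions as r[1]..r[12], r[14] and r[15] in the loop above
POSITIONS = list(range(1, 13)) + [14, 15]

def load_participants(source):
    """Read the semicolon-separated file into a string-only DataFrame indexed from 1."""
    frame = pd.read_csv(source, sep=';', header=None, skiprows=1,
                        usecols=POSITIONS, dtype=str, keep_default_na=False)
    frame = frame[POSITIONS]                 # Enforce the column order
    frame.columns = COLUMNS
    frame.index = range(1, len(frame) + 1)   # Row numbers start at 1, as above
    return frame

# Made-up sample: one header line plus two 16-column rows
sample = io.StringIO(
    ';'.join(f'h{i}' for i in range(16)) + '\n'
    + '0;JDBR;Leviticus;17;1;528163;DBR;DBR;3sm=JHWH;JHWH;63009;VbPred;0;;;\n'
    + '1;HWY>TJ;Leviticus;26;45;529378;JY>;JY>;1sc=>NJ;>NJ;68927;VbPred;4064;;;JHWH\n'
)
demo = load_participants(sample)
```

Passing the real file path instead of `sample` should give a frame with the same 14 columns, provided the file has the layout assumed here.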
How many corrections have been made?
rows_total = len(data)
print(f"First round: {rows_total-len(data[data['1_correction'] == ''])}")
print(f"Second round: {rows_total-len(data[data['2_correction'] == ''])}")
First round: 235
Second round: 687
The manual corrections are incorporated in the following order: for each row, we first check whether a correction from the second round exists, and only then whether a correction from the first round exists. If either exists (the second round has priority), it overwrites the value in the 'actor' column.
for row in data.index:
    actor = data.loc[row, 'actor']
    correction_1 = data.loc[row, '1_correction']
    correction_2 = data.loc[row, '2_correction']
    if correction_2 != '':  # The second round has priority
        actor = correction_2
    elif correction_1 != '':
        actor = correction_1
    data.loc[row, 'actor'] = actor  # Update dataframe (avoids chained assignment)
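The priority rule above can also be expressed without an explicit loop. The following is a minimal sketch, assuming empty corrections are stored as `''` as in this notebook. The helper name `apply_corrections` and the toy values `X`, `Y` and `Z` are made up for illustration.

```python
import pandas as pd

def apply_corrections(frame):
    """Return a copy whose 'actor' column reflects the corrections (round 2 wins over round 1)."""
    result = frame.copy()
    first = result['1_correction']
    second = result['2_correction']
    # Overlay round 1 on the original actor, then round 2 last so that it takes priority.
    actor = result['actor'].mask(first != '', first)
    actor = actor.mask(second != '', second)
    result['actor'] = actor
    return result

# Toy data: one row per combination of empty and filled corrections
toy = pd.DataFrame({
    'actor':        ['>NJ', 'GWJ', 'MCH', 'DBR'],
    '1_correction': ['', 'X', 'Y', ''],
    '2_correction': ['JHWH', '', 'Z', ''],
})
corrected = apply_corrections(toy)
print(corrected['actor'].tolist())  # ['JHWH', 'X', 'Z', 'DBR']
```

Working on a copy leaves the original frame untouched, which makes it easy to compare the actors before and after correction.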
The columns '1_correction' and '2_correction' can now be dropped:
data = data.drop(columns=['1_correction', '2_correction']) #drop columns
data['otype'] = '...'
data
row | surface_text | book | chapter | verse | clause_atom | predicate | reference | participant | actor | slots | func | compound | otype |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | JDBR | Leviticus | 17 | 1 | 528163 | DBR | DBR | 3sm=JHWH | JHWH | 63009 | VbPred | 0 | ... |
2 | JHWH | Leviticus | 17 | 1 | 528163 | DBR | JHWH | 3sm=JHWH | JHWH | 63010 | Subj | 1 | ... |
3 | >L MCH | Leviticus | 17 | 1 | 528163 | DBR | >L MCH | 0sm=MCH | MCH | 63011 63012 | Compl1 | 2 | ... |
4 | L->MR | Leviticus | 17 | 1 | 528164 | >MR | L >MR | 3sm=JHWH | JHWH | 63013 63014 | VbPred | 3 | ... |
5 | DBR | Leviticus | 17 | 2 | 528165 | DBR | DBR | 2sm= | MCH | 63015 | VbPred | 4 | ... |
6 | >L >HRN W->L BNJW W->L KL BNJ JFR>L | Leviticus | 17 | 2 | 528165 | DBR | >L >HRN W >L BN+S W >L KL BN JFR>L | 3pm=>HRN BN+>HRN FR>L | >HRN BN >HRN BN JFR>L | 63016 63017 63018 63019 63020 63021 63022 6302... | Compl1 | 5 6 7 8 9 10 11 | ... |
7 | >L >HRN W->L BNJW | Leviticus | 17 | 2 | 528165 | DBR | >L >HRN W >L BN+S | ... | ... | 63016 63017 63018 63019 63020 | -paral | 6 7 8 9 | ... |
8 | >L >HRN | Leviticus | 17 | 2 | 528165 | DBR | >L >HRN | 3sm=>HRN | >HRN | 63016 63017 | -paral | 7 | ... |
9 | >L BNJW | Leviticus | 17 | 2 | 528165 | DBR | >L BN+312 | ... | ... | 63019 63020 | -paral | 8 9 | ... |
10 | sfx:W | Leviticus | 17 | 2 | 528165 | DBR | sfx | 3sm=>HRN | >HRN | 63020 | -gentf | 9 | ... |
11 | BNJ JFR>L | Leviticus | 17 | 2 | 528165 | DBR | BN JFR>L | 0pm=BN JFR>L | BN JFR>L | 63024 63025 | -gentf | 10 11 | ... |
12 | JFR>L | Leviticus | 17 | 2 | 528165 | DBR | JFR>L | JFR>L | JFR>L | 63025 | -gentf | 11 | ... |
13 | >MRT | Leviticus | 17 | 2 | 528166 | >MR | >MR | 2sm= | MCH | 63027 | VbPred | 12 | ... |
14 | sfx:HM | Leviticus | 17 | 2 | 528166 | >MR | sfx | 3pm=>HRN BN+>HRN FR>L | >HRN BN >HRN BN JFR>L | 63028 | Compl1 | 13 | ... |
15 | ZH | Leviticus | 17 | 2 | 528167 | .... | ZH | 0sm=DBR | DBR | 63029 | Subj | 14 | ... |
16 | H-DBR | Leviticus | 17 | 2 | 528167 | .... | H DBR | 0sm=DBR | DBR | 63030 63031 | PrCompl | 15 | ... |
17 | YWH | Leviticus | 17 | 2 | 528168 | YWH | YWH | 3sm=JHWH | JHWH | 63033 | VbPred | 16 | ... |
18 | JHWH | Leviticus | 17 | 2 | 528168 | YWH | JHWH | 3sm=JHWH | JHWH | 63034 | Subj | 17 | ... |
19 | L->MR | Leviticus | 17 | 2 | 528169 | >MR | L >MR | 3sm=JHWH | JHWH | 63035 63036 | VbPred | 18 | ... |
20 | >JC >JC | Leviticus | 17 | 3 | 528170 | .... | >JC >JC | 3sm=>JC >JC | >JC >JC | 63037 63038 | 572 | 19 20 21 | ... |
21 | >JC | Leviticus | 17 | 3 | 528170 | .... | >JC | ... | ... | 63037 | -paral | 20 | ... |
22 | >JC | Leviticus | 17 | 3 | 528170 | .... | >JC | 0sm=>JC | >JC | 63038 | -paral | 21 | ... |
23 | M-BJT JFR>L | Leviticus | 17 | 3 | 528170 | .... | MN BJT JFR>L | 0sm=BJT JFR>L | BJT JFR>L | 63039 63040 63041 | -specf | 22 23 | ... |
24 | JFR>L | Leviticus | 17 | 3 | 528170 | .... | JFR>L | JFR>L | JFR>L | 63041 | -gentf | 23 | ... |
25 | JCXV | Leviticus | 17 | 3 | 528171 | CXV | CXV | 3sm=>JC >JC | >JC >JC | 63043 | VbPred | 24 | ... |
26 | CWR >W KFB >W <Z | Leviticus | 17 | 3 | 528171 | CXV | CWR >W KFB >W <Z | 3sm=CWR KFB <Z | CWR KFB <Z | 63044 63045 63046 63047 63048 | Obj1 | 25 26 27 28 29 | ... |
27 | CWR >W KFB | Leviticus | 17 | 3 | 528171 | CXV | CWR >W KFB | ... | ... | 63044 63045 63046 | -paral | 26 27 28 | ... |
28 | CWR | Leviticus | 17 | 3 | 528171 | CXV | CWR | ... | ... | 63044 | -paral | 27 | ... |
29 | KFB | Leviticus | 17 | 3 | 528171 | CXV | KFB | ... | ... | 63046 | -paral | 28 | ... |
30 | <Z | Leviticus | 17 | 3 | 528171 | CXV | <Z | ... | ... | 63048 | -paral | 29 | ... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4063 | BRJT R>CNJM | Leviticus | 26 | 45 | 529377 | ZKR | BRJT R>CWN | ... | ... | 68924 68925 | Obj1 | 4062 4063 | ... |
4064 | R>CNJM | Leviticus | 26 | 45 | 529377 | ZKR | R>CWN | 0pm=R>CWN | R>CWN | 68925 | -gentf | 4063 | ... |
4065 | HWY>TJ | Leviticus | 26 | 45 | 529378 | JY> | JY> | 1sc=>NJ | JHWH | 68927 | VbPred | 4064 | ... |
4066 | sfx:M | Leviticus | 26 | 45 | 529378 | JY> | sfx | 3pm=GWJ | C>R | 68928 | Obj1 | 4065 | ... |
4067 | M->RY MYRJM | Leviticus | 26 | 45 | 529378 | JY> | MN >RY MYRJM | >RY MYRJM | >RY MYRJM | 68929 68930 68931 | Compl1 | 4066 4067 | ... |
4068 | MYRJM | Leviticus | 26 | 45 | 529378 | JY> | MYRJM | 3pm=MYRJM | MYRJM | 68931 | -gentf | 4067 | ... |
4069 | L-<JNJ H-GWJM | Leviticus | 26 | 45 | 529378 | JY> | L <JN H GWJ | ... | ... | 68932 68933 68934 68935 | Adjunc | 4068 4069 | ... |
4070 | H-GWJM | Leviticus | 26 | 45 | 529378 | JY> | H GWJ | 3pm=GWJ | GWJ | 68934 68935 | -gentf | 4069 | ... |
4071 | L-HJT | Leviticus | 26 | 45 | 529379 | HJH | L HJH | 1sc=>NJ | JHWH | 68936 68937 | VbPred | 4070 | ... |
4072 | sfx:HM | Leviticus | 26 | 45 | 529379 | HJH | sfx | 3pm=GWJ | C>R | 68938 | Compl1 | 4071 | ... |
4073 | L->LHJM | Leviticus | 26 | 45 | 529379 | HJH | L >LHJM | 0pm=>LHJM | JHWH | 68939 68940 | PrCompl | 4072 | ... |
4074 | >NJ | Leviticus | 26 | 45 | 529380 | .... | >NJ | 1sc=>NJ | JHWH | 68941 | Subj | 4073 | ... |
4075 | JHWH | Leviticus | 26 | 45 | 529380 | .... | JHWH | 0sm=JHWH | JHWH | 68942 | PrCompl | 4074 | ... |
4076 | >LH | Leviticus | 26 | 46 | 529381 | .... | >LH | 0pm=XQ MCPV TWRH | XQ MCPV TWRH | 68943 | Subj | 4075 | ... |
4077 | H-XQJM W-H-MCPVJM W-H-TWRT | Leviticus | 26 | 46 | 529381 | .... | H XQ W H MCPV W H TWRH | 0pm=XQ MCPV TWRH | XQ MCPV TWRH | 68944 68945 68946 68947 68948 68949 68950 68951 | PrCompl | 4076 4077 4078 4079 4080 | ... |
4078 | H-XQJM W-H-MCPVJM | Leviticus | 26 | 46 | 529381 | .... | H XQ W H MCPV | ... | ... | 68944 68945 68946 68947 68948 | -paral | 4077 4078 4079 | ... |
4079 | H-XQJM | Leviticus | 26 | 46 | 529381 | .... | H XQ | ... | ... | 68944 68945 | -paral | 4078 | ... |
4080 | H-MCPVJM | Leviticus | 26 | 46 | 529381 | .... | H MCPV | ... | ... | 68947 68948 | -paral | 4079 | ... |
4081 | H-TWRT | Leviticus | 26 | 46 | 529381 | .... | H TWRH | ... | ... | 68950 68951 | -paral | 4080 | ... |
4082 | NTN | Leviticus | 26 | 46 | 529382 | NTN | NTN | 3sm=JHWH | JHWH | 68953 | VbPred | 4081 | ... |
4083 | JHWH | Leviticus | 26 | 46 | 529382 | NTN | JHWH | 3sm=JHWH | JHWH | 68954 | Subj | 4082 | ... |
4084 | BJNW W-BJN BNJ JFR>L | Leviticus | 26 | 46 | 529382 | NTN | BJN W BJN BN JFR>L | ... | ... | 68955 68956 68957 68958 68959 | Compl1 | 4083 4084 4085 4086 4087 | ... |
4085 | BJNW | Leviticus | 26 | 46 | 529382 | NTN | BJN | ... | ... | 68955 | -paral | 4084 4085 | ... |
4086 | sfx:W | Leviticus | 26 | 46 | 529382 | NTN | sfx | 3sm=JHWH | JHWH | 68955 | -gentf | 4085 | ... |
4087 | BJN BNJ JFR>L | Leviticus | 26 | 46 | 529382 | NTN | BJN BN JFR>L | ... | ... | 68957 68958 68959 | -paral | 4086 4087 | ... |
4088 | JFR>L | Leviticus | 26 | 46 | 529382 | NTN | JFR>L | ... | ... | 68959 | -gentf | 4087 | ... |
4089 | B-HR SJNJ | Leviticus | 26 | 46 | 529382 | NTN | B HR SJNJ | ... | ... | 68960 68961 68962 | Locat | 4088 4089 | ... |
4090 | SJNJ | Leviticus | 26 | 46 | 529382 | NTN | SJNJ | ... | ... | 68962 | -gentf | 4089 | ... |
4091 | B-JD MCH | Leviticus | 26 | 46 | 529382 | NTN | B JD MCH | ... | ... | 68963 68964 68965 | Adjunc | 4090 4091 | ... |
4092 | MCH | Leviticus | 26 | 46 | 529382 | NTN | MCH | 0sm=MCH | MCH | 68965 | -gentf | 4091 | ... |
4092 rows × 13 columns
When exporting the actor references as TF-features, we need to know which object (word, subphrase or phrase atom) each actor reference applies to. That information is not directly available in the dataset, because the actors are only related to an interval of slots. The following function finds the nearest object in terms of slots occupied. The algorithm goes from the smallest type (a suffix, represented by its word node) to the larger types (subphrases and phrase atoms). For subphrases, it is checked whether the first and last words of the object match the first and last words of the actor reference; for the phrase atom containing the first word, only the last word is compared:
def nearestObject(row):
    '''
    Input: row number
    Output: the node of the TF object (word, subphrase or phrase atom) that comes closest
    to the interval of words that the actor reference occupies, or '' if none matches.
    '''
    slots = data['slots'][row].split()
    first_word = int(slots[0]) #First word of actor reference
    last_word = int(slots[-1]) #Last word
    subphrase = L.u(first_word, 'subphrase') #The subphrases to which the first word belongs
    phrase_atom = L.u(first_word, 'phrase_atom')[0] #The phrase atom to which the first word belongs
    nearest_object = ''

    if data['reference'][row] == 'sfx': #If actor reference is a suffix, it only occupies one word slot.
        nearest_object = first_word
    elif subphrase: #If the first word belongs to a subphrase - that's not always the case.
        for ph in subphrase:
            subphrase_words = L.d(ph, 'word')
            #Checking if first word and last word of the actor match those of the subphrase.
            if subphrase_words[0] == first_word and subphrase_words[-1] == last_word:
                nearest_object = ph

    if nearest_object == '': #If nearest object has not yet been found we check the phrase atom level
        phrase_words = L.d(phrase_atom, 'word')
        if phrase_words[-1] == last_word: #If the last word of the phrase atom matches the last of the actor, there is a match
            nearest_object = phrase_atom
    return nearest_object

#nearestObject(5)
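The matching logic of `nearestObject()` can be tried out without loading Text-Fabric. The sketch below is a simplified stand-in, not the notebook's function: the candidate objects are passed in as plain `(node, first_slot, last_slot)` tuples instead of being looked up with `L.u` and `L.d`, and all node and slot numbers in the examples are made up. The name `nearest_span` is hypothetical.

```python
def nearest_span(first_word, last_word, is_suffix, subphrases, phrase_atom):
    """
    Pick the object that fits the slot interval of an actor reference.

    subphrases:  list of (node, first_slot, last_slot) tuples for the subphrases
                 containing first_word (stand-in for L.u(first_word, 'subphrase'))
    phrase_atom: (node, first_slot, last_slot) tuple for the enclosing phrase atom
    Returns the matching node, or None if nothing fits.
    """
    if is_suffix:
        return first_word  # A suffix is represented by its single word slot

    match = None
    for node, start, end in subphrases:
        if start == first_word and end == last_word:
            match = node  # A later match overwrites an earlier one, as in the loop above
    if match is not None:
        return match

    node, start, end = phrase_atom
    if end == last_word:  # Only the last word is compared at this level
        return node
    return None

# Made-up example: phrase atom 900 covers slots 10-14, subphrase 700 covers slots 10-11
atom = (900, 10, 14)
subs = [(700, 10, 11)]
print(nearest_span(10, 11, False, subs, atom))  # 700
print(nearest_span(10, 14, False, subs, atom))  # 900
print(nearest_span(12, 12, True, [], atom))     # 12
print(nearest_span(10, 12, False, subs, atom))  # None
```

The last call shows the case that would end up in an error list: the interval matches neither a subphrase nor the end of the phrase atom.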
A new column is added to the dataframe in which we add the nodes of the object type matching the slots occupied by the actor reference. If no object type is found, the row is added to an error_list, so possible mismatches can be identified:
error_list = []
for row in data.index:
    if data.loc[row, 'actor'] != '...': #Empty actors are left out because they are not relevant.
        nearest_object = nearestObject(row) #The function nearestObject() is run for every row.
        if nearest_object == '':
            error_list.append(row) #If there is no object matching the actor, the row is added to an error_list
        else:
            data.loc[row, 'otype'] = nearest_object #The object node is added to the 'otype' column
print(f'Number of errors: {len(error_list)}')
Number of errors: 0
data
row | surface_text | book | chapter | verse | clause_atom | predicate | reference | participant | actor | slots | func | compound | otype |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | JDBR | Leviticus | 17 | 1 | 528163 | DBR | DBR | 3sm=JHWH | JHWH | 63009 | VbPred | 0 | 943175 |
2 | JHWH | Leviticus | 17 | 1 | 528163 | DBR | JHWH | 3sm=JHWH | JHWH | 63010 | Subj | 1 | 943176 |
3 | >L MCH | Leviticus | 17 | 1 | 528163 | DBR | >L MCH | 0sm=MCH | MCH | 63011 63012 | Compl1 | 2 | 943177 |
4 | L->MR | Leviticus | 17 | 1 | 528164 | >MR | L >MR | 3sm=JHWH | JHWH | 63013 63014 | VbPred | 3 | 943178 |
5 | DBR | Leviticus | 17 | 2 | 528165 | DBR | DBR | 2sm= | MCH | 63015 | VbPred | 4 | 943179 |
6 | >L >HRN W->L BNJW W->L KL BNJ JFR>L | Leviticus | 17 | 2 | 528165 | DBR | >L >HRN W >L BN+S W >L KL BN JFR>L | 3pm=>HRN BN+>HRN FR>L | >HRN BN >HRN BN JFR>L | 63016 63017 63018 63019 63020 63021 63022 6302... | Compl1 | 5 6 7 8 9 10 11 | 943180 |
7 | >L >HRN W->L BNJW | Leviticus | 17 | 2 | 528165 | DBR | >L >HRN W >L BN+S | ... | ... | 63016 63017 63018 63019 63020 | -paral | 6 7 8 9 | ... |
8 | >L >HRN | Leviticus | 17 | 2 | 528165 | DBR | >L >HRN | 3sm=>HRN | >HRN | 63016 63017 | -paral | 7 | 1317252 |
9 | >L BNJW | Leviticus | 17 | 2 | 528165 | DBR | >L BN+312 | ... | ... | 63019 63020 | -paral | 8 9 | ... |
10 | sfx:W | Leviticus | 17 | 2 | 528165 | DBR | sfx | 3sm=>HRN | >HRN | 63020 | -gentf | 9 | 63020 |
11 | BNJ JFR>L | Leviticus | 17 | 2 | 528165 | DBR | BN JFR>L | 0pm=BN JFR>L | BN JFR>L | 63024 63025 | -gentf | 10 11 | 1317258 |
12 | JFR>L | Leviticus | 17 | 2 | 528165 | DBR | JFR>L | JFR>L | JFR>L | 63025 | -gentf | 11 | 1317257 |
13 | >MRT | Leviticus | 17 | 2 | 528166 | >MR | >MR | 2sm= | MCH | 63027 | VbPred | 12 | 943182 |
14 | sfx:HM | Leviticus | 17 | 2 | 528166 | >MR | sfx | 3pm=>HRN BN+>HRN FR>L | >HRN BN >HRN BN JFR>L | 63028 | Compl1 | 13 | 63028 |
15 | ZH | Leviticus | 17 | 2 | 528167 | .... | ZH | 0sm=DBR | DBR | 63029 | Subj | 14 | 943184 |
16 | H-DBR | Leviticus | 17 | 2 | 528167 | .... | H DBR | 0sm=DBR | DBR | 63030 63031 | PrCompl | 15 | 943185 |
17 | YWH | Leviticus | 17 | 2 | 528168 | YWH | YWH | 3sm=JHWH | JHWH | 63033 | VbPred | 16 | 943187 |
18 | JHWH | Leviticus | 17 | 2 | 528168 | YWH | JHWH | 3sm=JHWH | JHWH | 63034 | Subj | 17 | 943188 |
19 | L->MR | Leviticus | 17 | 2 | 528169 | >MR | L >MR | 3sm=JHWH | JHWH | 63035 63036 | VbPred | 18 | 943189 |
20 | >JC >JC | Leviticus | 17 | 3 | 528170 | .... | >JC >JC | 3sm=>JC >JC | >JC >JC | 63037 63038 | 572 | 19 20 21 | 943190 |
21 | >JC | Leviticus | 17 | 3 | 528170 | .... | >JC | ... | ... | 63037 | -paral | 20 | ... |
22 | >JC | Leviticus | 17 | 3 | 528170 | .... | >JC | 0sm=>JC | >JC | 63038 | -paral | 21 | 1317261 |
23 | M-BJT JFR>L | Leviticus | 17 | 3 | 528170 | .... | MN BJT JFR>L | 0sm=BJT JFR>L | BJT JFR>L | 63039 63040 63041 | -specf | 22 23 | 943191 |
24 | JFR>L | Leviticus | 17 | 3 | 528170 | .... | JFR>L | JFR>L | JFR>L | 63041 | -gentf | 23 | 1317263 |
25 | JCXV | Leviticus | 17 | 3 | 528171 | CXV | CXV | 3sm=>JC >JC | >JC >JC | 63043 | VbPred | 24 | 943193 |
26 | CWR >W KFB >W <Z | Leviticus | 17 | 3 | 528171 | CXV | CWR >W KFB >W <Z | 3sm=CWR KFB <Z | CWR KFB <Z | 63044 63045 63046 63047 63048 | Obj1 | 25 26 27 28 29 | 943194 |
27 | CWR >W KFB | Leviticus | 17 | 3 | 528171 | CXV | CWR >W KFB | ... | ... | 63044 63045 63046 | -paral | 26 27 28 | ... |
28 | CWR | Leviticus | 17 | 3 | 528171 | CXV | CWR | ... | ... | 63044 | -paral | 27 | ... |
29 | KFB | Leviticus | 17 | 3 | 528171 | CXV | KFB | ... | ... | 63046 | -paral | 28 | ... |
30 | <Z | Leviticus | 17 | 3 | 528171 | CXV | <Z | ... | ... | 63048 | -paral | 29 | ... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4063 | BRJT R>CNJM | Leviticus | 26 | 45 | 529377 | ZKR | BRJT R>CWN | ... | ... | 68924 68925 | Obj1 | 4062 4063 | ... |
4064 | R>CNJM | Leviticus | 26 | 45 | 529377 | ZKR | R>CWN | 0pm=R>CWN | R>CWN | 68925 | -gentf | 4063 | 1318757 |
4065 | HWY>TJ | Leviticus | 26 | 45 | 529378 | JY> | JY> | 1sc=>NJ | JHWH | 68927 | VbPred | 4064 | 946968 |
4066 | sfx:M | Leviticus | 26 | 45 | 529378 | JY> | sfx | 3pm=GWJ | C>R | 68928 | Obj1 | 4065 | 68928 |
4067 | M->RY MYRJM | Leviticus | 26 | 45 | 529378 | JY> | MN >RY MYRJM | >RY MYRJM | >RY MYRJM | 68929 68930 68931 | Compl1 | 4066 4067 | 946970 |
4068 | MYRJM | Leviticus | 26 | 45 | 529378 | JY> | MYRJM | 3pm=MYRJM | MYRJM | 68931 | -gentf | 4067 | 1318759 |
4069 | L-<JNJ H-GWJM | Leviticus | 26 | 45 | 529378 | JY> | L <JN H GWJ | ... | ... | 68932 68933 68934 68935 | Adjunc | 4068 4069 | ... |
4070 | H-GWJM | Leviticus | 26 | 45 | 529378 | JY> | H GWJ | 3pm=GWJ | GWJ | 68934 68935 | -gentf | 4069 | 1318761 |
4071 | L-HJT | Leviticus | 26 | 45 | 529379 | HJH | L HJH | 1sc=>NJ | JHWH | 68936 68937 | VbPred | 4070 | 946972 |
4072 | sfx:HM | Leviticus | 26 | 45 | 529379 | HJH | sfx | 3pm=GWJ | C>R | 68938 | Compl1 | 4071 | 68938 |
4073 | L->LHJM | Leviticus | 26 | 45 | 529379 | HJH | L >LHJM | 0pm=>LHJM | JHWH | 68939 68940 | PrCompl | 4072 | 946974 |
4074 | >NJ | Leviticus | 26 | 45 | 529380 | .... | >NJ | 1sc=>NJ | JHWH | 68941 | Subj | 4073 | 946975 |
4075 | JHWH | Leviticus | 26 | 45 | 529380 | .... | JHWH | 0sm=JHWH | JHWH | 68942 | PrCompl | 4074 | 946976 |
4076 | >LH | Leviticus | 26 | 46 | 529381 | .... | >LH | 0pm=XQ MCPV TWRH | XQ MCPV TWRH | 68943 | Subj | 4075 | 946977 |
4077 | H-XQJM W-H-MCPVJM W-H-TWRT | Leviticus | 26 | 46 | 529381 | .... | H XQ W H MCPV W H TWRH | 0pm=XQ MCPV TWRH | XQ MCPV TWRH | 68944 68945 68946 68947 68948 68949 68950 68951 | PrCompl | 4076 4077 4078 4079 4080 | 946978 |
4078 | H-XQJM W-H-MCPVJM | Leviticus | 26 | 46 | 529381 | .... | H XQ W H MCPV | ... | ... | 68944 68945 68946 68947 68948 | -paral | 4077 4078 4079 | ... |
4079 | H-XQJM | Leviticus | 26 | 46 | 529381 | .... | H XQ | ... | ... | 68944 68945 | -paral | 4078 | ... |
4080 | H-MCPVJM | Leviticus | 26 | 46 | 529381 | .... | H MCPV | ... | ... | 68947 68948 | -paral | 4079 | ... |
4081 | H-TWRT | Leviticus | 26 | 46 | 529381 | .... | H TWRH | ... | ... | 68950 68951 | -paral | 4080 | ... |
4082 | NTN | Leviticus | 26 | 46 | 529382 | NTN | NTN | 3sm=JHWH | JHWH | 68953 | VbPred | 4081 | 946980 |
4083 | JHWH | Leviticus | 26 | 46 | 529382 | NTN | JHWH | 3sm=JHWH | JHWH | 68954 | Subj | 4082 | 946981 |
4084 | BJNW W-BJN BNJ JFR>L | Leviticus | 26 | 46 | 529382 | NTN | BJN W BJN BN JFR>L | ... | ... | 68955 68956 68957 68958 68959 | Compl1 | 4083 4084 4085 4086 4087 | ... |
4085 | BJNW | Leviticus | 26 | 46 | 529382 | NTN | BJN | ... | ... | 68955 | -paral | 4084 4085 | ... |
4086 | sfx:W | Leviticus | 26 | 46 | 529382 | NTN | sfx | 3sm=JHWH | JHWH | 68955 | -gentf | 4085 | 68955 |
4087 | BJN BNJ JFR>L | Leviticus | 26 | 46 | 529382 | NTN | BJN BN JFR>L | ... | ... | 68957 68958 68959 | -paral | 4086 4087 | ... |
4088 | JFR>L | Leviticus | 26 | 46 | 529382 | NTN | JFR>L | ... | ... | 68959 | -gentf | 4087 | ... |
4089 | B-HR SJNJ | Leviticus | 26 | 46 | 529382 | NTN | B HR SJNJ | ... | ... | 68960 68961 68962 | Locat | 4088 4089 | ... |
4090 | SJNJ | Leviticus | 26 | 46 | 529382 | NTN | SJNJ | ... | ... | 68962 | -gentf | 4089 | ... |
4091 | B-JD MCH | Leviticus | 26 | 46 | 529382 | NTN | B JD MCH | ... | ... | 68963 68964 68965 | Adjunc | 4090 4091 | ... |
4092 | MCH | Leviticus | 26 | 46 | 529382 | NTN | MCH | 0sm=MCH | MCH | 68965 | -gentf | 4091 | 1318773 |
4092 rows × 13 columns
Next, we need to find the edges between co-referring actor references. The principle is to find a list of nodes from the 'otype' column referring to the same actor.
First, we define a function that takes an actor reference and a chapter as input and produces a list of all nodes referring to the same actor based on the dataframe:
def mapActors(actor, chapter):
    '''
    Input: Actor reference (string) and chapter (int)
    Output: List of nodes with the same actor reference in that particular chapter
    '''
    subset_data = data[(data.chapter == str(chapter)) & (data.actor == actor)]
    otype = subset_data.otype.values.tolist()
    return otype

#mapActors('>X BN JFR>L',25)
The function above is used to create a dictionary of edges for all actor references in the dataset:
edges_dict = {}
for row in data.iterrows():
    row = row[0]
    actor = data['actor'][row]
    if actor != '...': #Excluding rows with empty actors
        edges = mapActors(actor, data['chapter'][row]) #A list of edges is created with the function mapActors() for each row
        edges.remove(data['otype'][row]) #The node of the present row is removed from the edges list to avoid redundancy.
        #If the edges list is not empty, it is added as a set to the dictionary with the row otype node as key:
        if set(edges):
            edges_dict[data['otype'][row]] = set(edges)

#edges_dict
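The same edge dictionary can be approximated with a single `groupby` over chapter and actor, which avoids filtering the whole frame once per row. This is a sketch under assumptions, not the notebook's implementation: `coref_edges` is a hypothetical name, the toy node numbers are made up, and it differs from the loop above in one detail, namely that a node never links to itself, even if it occurs in several rows with the same actor.

```python
import pandas as pd

def coref_edges(frame, empty='...'):
    """Map each node to the set of other nodes sharing its actor within the same chapter."""
    edges = {}
    relevant = frame[frame['actor'] != empty]  # Rows with empty actors are excluded
    for _, group in relevant.groupby(['chapter', 'actor']):
        nodes = set(group['otype'])
        for node in nodes:
            others = nodes - {node}
            if others:  # Nodes without a co-referring partner get no entry
                edges[node] = others
    return edges

# Toy data: two JHWH references in chapter 17, one in chapter 18
toy = pd.DataFrame({
    'chapter': ['17', '17', '17', '18', '17'],
    'actor':   ['JHWH', 'JHWH', 'MCH', 'JHWH', '...'],
    'otype':   [1, 2, 3, 4, '...'],
})
print(coref_edges(toy))  # {1: {2}, 2: {1}}
```

Only nodes 1 and 2 are linked, because co-reference is resolved per chapter and the single 'MCH' reference has no partner.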
Now, we can export the actor references and the edge dictionary as TF-features. First, we assign TF version names and paths to ensure the right storage of the features:
if 'SCRIPT' not in locals():
    SCRIPT = False
    FORCE = True
    CORE_NAME = 'bhsa'
    NAME = 'actor'
    VERSION= 'c'
    CORE_MODULE = 'core'

repoBase = os.path.expanduser('~/text-fabric-data/etcbc')
coreTf = '{}/{}/tf/{}'.format(repoBase, CORE_NAME, VERSION) #Path of the core TF datasets
thisTf = '~Feature_sets/{}/tf/{}'.format(NAME, VERSION) #Path of actor datasets
To export the lexical and suffix references as TF-features, they must first be stored in dictionaries that map nodes to actors. We loop through the dataset, distinguish suffix references from non-suffix references, and store each TF node in the relevant dictionary:
suffix_dict = {}
lexical_dict = {}
for row in data.iterrows():
    row = row[0]
    if data['actor'][row] != '...': #Excluding empty actors
        if data['reference'][row] == 'sfx': #If suffix, the word node (always only one word node) is stored in the dict.
            node = int(data['slots'][row])
            suffix_dict[node] = data['actor'][row]
        else: #If not suffix, the nearest object is found using nearestObject() and the resulting node is stored.
            node = nearestObject(row)
            lexical_dict[node] = data['actor'][row]
nodeFeatures = dict(actor=lexical_dict)
metaData = dict(
    actor=dict(
        valueType='str',
        description="Participant references for words, subphrases and phrases. The references are adapted from Eep Talstra's work on participant tracking. http://doi.org/10.5281/zenodo.1479491",
        coreData='BHSA',
        coreVersion=VERSION
    )
)
TF.save(nodeFeatures=nodeFeatures, metaData=metaData, module='c')
| 0.10s T actor to actor/tf/c
nodeFeatures = dict(prs_actor=suffix_dict)
metaData = dict(
    prs_actor=dict(
        valueType='str',
        description="Participant references for pronominal suffixes. The references are adapted from Eep Talstra's work on participant tracking. http://doi.org/10.5281/zenodo.1479491",
        coreData='BHSA',
        coreVersion=VERSION
    )
)
TF.save(nodeFeatures=nodeFeatures, metaData=metaData, module='c')
| 0.03s T prs_actor to actor/tf/c
edgeFeatures = dict(coref=edges_dict)
metaData = dict(
    coref=dict(
        valueType='str',
        description="Edges to co-referring actors on chapter-level. The references are adapted from Eep Talstra's work on participant tracking. http://doi.org/10.5281/zenodo.1479491",
        coreData='BHSA',
        coreVersion=VERSION
    )
)
TF.save(edgeFeatures=edgeFeatures, metaData=metaData, module='c')
| 0.13s T coref to actor/tf/c
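Because 'coref' is meant to link every node to all other nodes with the same actor in the same chapter, one would expect the edge dictionary to be symmetric: if node A points to node B, then B should point back to A. A quick sanity check on a dictionary such as `edges_dict` could look like the sketch below. `asymmetric_pairs` is a hypothetical helper, not part of Text-Fabric, and the node numbers in the examples are made up.

```python
def asymmetric_pairs(edges):
    """Return the sorted (source, target) pairs whose reverse edge is missing."""
    return sorted(
        (source, target)
        for source, targets in edges.items()
        for target in targets
        if source not in edges.get(target, set())
    )

# A fully connected group of three nodes is symmetric
print(asymmetric_pairs({1: {2, 3}, 2: {1, 3}, 3: {1, 2}}))  # []

# Node 2 does not point back to node 1
print(asymmetric_pairs({1: {2}}))  # [(1, 2)]
```

A non-empty result would point to nodes that occur in several rows with different actors, since a later row overwrites the dictionary entry of an earlier one.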