In this notebook, we produce three Text-Fabric features on the BHSA data using the get_heads
function developed in getting_heads.ipynb. See that notebook for a detailed description of the motivation, method, and shortcomings of this data.
N.B. This data is experimental and a work in progress!
Three features are produced herein: heads, prep_obj, and noun_heads.
Generated the features for version 2021, removed version 2016, adapted some code to modern Text-Fabric.
Added a new feature, noun_heads, to pluck noun heads from both noun phrases and prepositional phrases.
New export for the updated C version of BHSA data.
A new function has been added to double-check phrase heads. Prepositional phrases whose objects are also prepositions have resulted in some false heads being assigned. This is because prepositional objects receive no subphrase relations in BHSA and therefore appeared to the algorithm as independent. An additional check is required to make sure that such a preposition is not assigned as a head of its phrase. The new function, check_preposition, looks one word behind a candidate head noun (within the phrase boundaries) and validates only those candidates that are not immediately preceded by another preposition.
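The look-behind check described above can be sketched in isolation. This is a minimal illustration only, not the check_preposition function from heads.py: the name check_preposition_sketch, its signature, and the toy node numbers are assumptions, and plain Python containers stand in for the Text-Fabric API. It assumes that word nodes are consecutive integers and that prepositions carry the pdp value "prep".

```python
def check_preposition_sketch(candidate, phrase_words, pdp):
    """Return True when the word right before `candidate`, inside the
    same phrase, is not a preposition (hypothetical helper)."""
    previous = candidate - 1
    if previous not in phrase_words:
        # Nothing precedes the candidate inside this phrase.
        return True
    return pdp[previous] != "prep"


# Toy phrase (made-up node numbers): preposition, preposition, noun.
phrase_words = (101, 102, 103)
pdp = {101: "prep", 102: "prep", 103: "subs"}

# Both prepositions are candidate heads; only the first one survives.
candidates = [101, 102]
valid = [w for w in candidates if check_preposition_sketch(w, phrase_words, pdp)]
print(valid)
```

In this toy case the second preposition is rejected because it is immediately preceded by another preposition within the phrase.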
In discussion with Stephen Ku, I've decided to apply the quantifier algorithm to prepositional objects so that we retrieve the head of the prepositional object noun phrase rather than a quantifier. For good measure, I will also apply the attributed function (see getting_heads.ipynb for a description of both functions).
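As a rough sketch of that decision, the object of a preposition can be resolved with a chain of fallbacks: the quantified noun if there is one, otherwise the attributed noun, otherwise the word itself. The function resolve_prep_object and the toy data below are hypothetical; the real find_quantified and find_attributed are imported from heads.py and take a node and the Text-Fabric api, whereas here dictionary lookups stand in for them.

```python
def resolve_prep_object(prep, phrase_words, pdp, find_quantified, find_attributed):
    """Return the head noun of the object governed by `prep`, or None
    when no object lies inside the phrase (hypothetical helper)."""
    # The object is the next word, or the one after it if an article intervenes.
    obj = prep + 1 if pdp.get(prep + 1) != "art" else prep + 2
    if obj not in phrase_words:
        return None
    # Prefer the quantified noun, then the attributed noun, else the word itself.
    return find_quantified(obj) or find_attributed(obj) or obj


# Toy phrase with made-up node numbers, shaped like "preposition, quantifier, noun, noun".
phrase_words = (1, 2, 3, 4)
pdp = {1: "prep", 2: "subs", 3: "subs", 4: "subs"}
quantified = {2: 3}   # stand-in: word 2 is a quantifier of word 3
attributed = {}       # stand-in: no attributive relations in this phrase

obj = resolve_prep_object(1, phrase_words, pdp, quantified.get, attributed.get)
print(obj)
```

The `or` chain relies on the stand-ins returning a falsy value (here None) when nothing is found, which is safe because node numbers are positive integers.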
import collections
import random
from tf.fabric import Fabric
from heads import get_heads, find_quantified, find_attributed
# export heads.tf, prep_obj.tf & noun_heads.tf for all TF versions
for version in ["2021", "c", "2017"]:

    print("processing version ", version, "\n")

    # load Text-Fabric and data
    TF = Fabric(locations="~/github/etcbc/bhsa/tf", modules=version)
    api = TF.load(
        """
        book chapter verse
        typ pdp rela mother
        function lex sp ls
        """
    )

    F, E, T, L = api.F, api.E, api.T, api.L  # TF data methods

    # get heads
    heads_features = collections.defaultdict(dict)

    print("\nprocessing heads...")

    for phrase in list(F.otype.s("phrase")) + list(F.otype.s("phrase_atom")):

        heads = get_heads(phrase, api)

        if heads:
            heads_features["heads"][phrase] = set(heads)

        # make noun heads part 1
        if F.typ.v(phrase) != "PP" and heads:
            heads_features["noun_heads"][phrase] = set(heads)

        # do prep objects and noun heads part 2
        if F.typ.v(phrase) == "PP" and heads:

            for head in heads:
                obj = head + 1 if F.pdp.v(head + 1) != "art" else head + 2
                phrase_bounds = L.d(phrase, "word")
                if obj in phrase_bounds:
                    obj = find_quantified(obj, api) or find_attributed(obj, api) or obj
                    heads_features["prep_obj"][head] = set([obj])
                    heads_features["noun_heads"][phrase] = set(
                        [obj]
                    )  # make noun heads part 2

    # export TF data
    print("\nexporting TF...")
    meta = {
        "": {"created_by": "Cody Kingham", "coreData": "BHSA", "coreVersion": version},
        "heads": {
            "source": "see the notebook at https://github.com/etcbc/lingo/heads",
            "valueType": "int",
            "edgeValues": False,
        },
        "prep_obj": {
            "source": "see the notebook at https://github.com/etcbc/lingo/heads",
            "valueType": "int",
            "edgeValues": False,
        },
        "noun_heads": {
            "source": "see the notebook at https://github.com/etcbc/lingo/heads",
            "valueType": "int",
            "edgeValues": False,
        },
    }

    save_tf = Fabric(
        locations="~/github/etcbc/lingo/heads/tf", modules=version, silent=True
    )
    save_api = save_tf.load("", silent=True)
    save_tf.save(nodeFeatures={}, edgeFeatures=heads_features, metaData=meta)

    print(f"\ndone with {version}")
processing version 2021

This is Text-Fabric 9.1.8
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

116 features found and 0 ignored
| 0.00s Dataset without structure sections in otext:no structure functions in the T-API
| 0.00s ... __levels__ ...
| 0.00s ... __order__ ...
| 0.03s ... __rank__ ...
| 0.03s ... __levUp__ ...
| 2.88s ... __levDown__ ...
| 2.52s ... __boundary__ ...
| 3.04s ... __sections__ ...
11s All features loaded/computed - for details use TF.isLoaded()

processing heads...
0.00s Feature "otype" not available in ~/github/etcbc/lingo/heads/tf/2021

exporting TF...

done with 2021
processing version c

This is Text-Fabric 9.1.8
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

114 features found and 0 ignored
| 0.00s Dataset without structure sections in otext:no structure functions in the T-API
| 0.00s ... __levels__ ...
| 0.00s ... __order__ ...
| 0.03s ... __rank__ ...
| 0.03s ... __levUp__ ...
| 3.54s ... __levDown__ ...
| 3.72s ... __boundary__ ...
| 4.01s ... __sections__ ...
18s All features loaded/computed - for details use TF.isLoaded()

processing heads...
0.00s Feature "otype" not available in ~/github/etcbc/lingo/heads/tf/c

exporting TF...

done with c
processing version 2017

This is Text-Fabric 9.1.8
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

115 features found and 0 ignored
| 0.00s Dataset without structure sections in otext:no structure functions in the T-API
| 0.00s ... __levels__ ...
| 0.00s ... __order__ ...
| 0.03s ... __rank__ ...
| 0.03s ... __levUp__ ...
| 6.52s ... __levDown__ ...
| 4.71s ... __boundary__ ...
| 4.97s ... __sections__ ...
23s All features loaded/computed - for details use TF.isLoaded()

processing heads...
0.00s Feature "otype" not available in ~/github/etcbc/lingo/heads/tf/2017

exporting TF...

done with 2017
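The edge features built in the export loop are nested dictionaries: feature name, then source node, then a set of target nodes. A quick summary of that structure can serve as a sanity check before saving. The helper summarize_edge_features below is a hypothetical sketch that is not part of the notebook or of Text-Fabric, and the node numbers in the toy data are made up.

```python
import collections


def summarize_edge_features(edge_features):
    """Map each feature name to (number of source nodes, number of edges)
    and verify that every target collection is a set (hypothetical helper)."""
    summary = {}
    for name, edges in edge_features.items():
        for source, targets in edges.items():
            if not isinstance(targets, (set, frozenset)):
                raise TypeError(f"{name}[{source}] should be a set of nodes")
        summary[name] = (len(edges), sum(len(t) for t in edges.values()))
    return summary


# Toy data in the same shape as heads_features (made-up node numbers).
toy = collections.defaultdict(dict)
toy["heads"][900001] = {1, 2}
toy["prep_obj"][1] = {3}
summary = summarize_edge_features(toy)
print(summary)
```

Run on the real heads_features dictionary, the same call would report how many phrases received heads and how many edges each feature contains.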
from tf.app import use
A = use("ETCBC/bhsa", mod="etcbc/lingo/heads/tf:clone", version="2021", hoist=globals())
This is Text-Fabric 9.1.8
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

125 features found and 0 ignored
| 5.80s T heads from ~/github/etcbc/lingo/heads/tf/2021
| 1.24s T noun_heads from ~/github/etcbc/lingo/heads/tf/2021
| 0.16s T prep_obj from ~/github/etcbc/lingo/heads/tf/2021
A.show(
    A.search(
        """
phrase typ=PP
    -noun_heads> word
"""
    )[:10]
)
0.92s 45182 results
[Rendered output of A.show: results 1 to 10]
test_prep = []

for ph in F.typ.s("PP"):
    heads = E.heads.f(ph)
    # collect the first prepositional object of every head that has one
    objs = [E.prep_obj.f(prep)[0] for prep in heads if E.prep_obj.f(prep)]
    test_prep.append(tuple(objs))

random.shuffle(test_prep)

A.show(test_prep[:50])  # display a random sample of 50
[Rendered output of A.show: 37 results, numbered between 1 and 50 with gaps]
See what the prepositional object looks like for Genesis 1:21:
gen_121_case = L.d(T.nodeFromSection(("Genesis", 1, 21)), "phrase")[13]
print("example phrase", gen_121_case, "phrase number 14 in verse")
print(T.text(L.d(gen_121_case, "word")))
print("\nGen 1:21 phrase 14's heads, a preposition:")
heads = E.heads.f(gen_121_case)
print(T.text(heads))
print("\nGen 1:21 phrase 14's prepositional object:")
print(T.text(E.prep_obj.f(heads[0])))
example phrase 651799 phrase number 14 in verse
אֵ֨ת כָּל־עֹ֤וף כָּנָף֙

Gen 1:21 phrase 14's heads, a preposition:
אֵ֨ת

Gen 1:21 phrase 14's prepositional object:
עֹ֤וף
heads = [E.heads.f(ph) for ph in F.otype.s("phrase") if F.typ.v(ph) == "NP"]
random.shuffle(heads)
A.show(heads[:50])  # display a random sample of 50
[Rendered output of A.show: results 1 to 50]