We load the Text-Fabric program and the BHSA data.
%load_ext autoreload
%autoreload 2
from tf.app import use
from util import getTfVerses, getShebanqData, compareResults, MQL_RESULTS
VERSION = "2017"
# A = use('ETCBC/bhsa', hoist=globals(), version=VERSION)
A = use("ETCBC/bhsa:clone", checkout="clone", hoist=globals(), version=VERSION)
Oliver Glanz: DHQ article: discourse pattern deviation
[[clause domain = "N"
[phrase function = Pred
[word
[word lex = "DBR["]
OR
[word lex = ">MR["]
OR
[word lex = "QR>["]
]
]
..
[phrase FOCUS function = Subj
[word AS samesubject]
]
..
[phrase FOCUS function = Cmpl
[word AS samecomplement]
]
]
[clause domain = "N"]* {0-1}
[clause domain = "Q"]* {1-50}
[clause domain = "N"
[phrase function = Pred
[word
[word lex = "DBR["]
OR
[word lex = ">MR["]
OR
[word lex = "QR>["]
]
]
..
[phrase FOCUS function = Subj
[word lex = samesubject.lex]
]
..
[phrase FOCUS function = Cmpl
[word lex = samecomplement.lex]
]
]]
OR
[[clause domain = "N"
[phrase function = Pred
[word
[word lex = "DBR["]
OR
[word lex = ">MR["]
OR
[word lex = "QR>["]
]
]
..
[phrase FOCUS function = Cmpl
[word AS samecomplement2]
]
..
[phrase FOCUS function = Subj
[word AS samesubject2]
]
]
[clause domain = "N"]* {0-1}
[clause domain = "Q"]* {1-50}
[clause domain = "N"
[phrase function = Pred
[word
[word lex = "DBR["]
OR
[word lex = ">MR["]
OR
[word lex = "QR>["]
]
]
..
[phrase FOCUS function = Cmpl
[word lex = samecomplement2.lex]
]
..
[phrase FOCUS function = Subj
[word lex = samesubject2.lex]
]
]]
OR
[[clause domain = "N"
[phrase function = Pred
[word
[word lex = "DBR["]
OR
[word lex = ">MR["]
OR
[word lex = "QR>["]
]
]
..
[phrase FOCUS function = Subj
[word AS samesubject3]
]
..
[phrase FOCUS function = Cmpl
[word AS samecomplement3]
]
]
[clause domain = "N"]* {0-1}
[clause domain = "Q"]* {1-50}
[clause domain = "N"
[phrase function = Pred
[word
[word lex = "DBR["]
OR
[word lex = ">MR["]
OR
[word lex = "QR>["]
]
]
..
[phrase FOCUS function = Cmpl
[word lex = samecomplement3.lex]
]
..
[phrase FOCUS function = Subj
[word lex = samesubject3.lex]
]
]]
OR
[[clause domain = "N"
[phrase function = Pred
[word
[word lex = "DBR["]
OR
[word lex = ">MR["]
OR
[word lex = "QR>["]
]
]
..
[phrase FOCUS function = Cmpl
[word AS samecomplement4]
]
..
[phrase FOCUS function = Subj
[word AS samesubject4]
]
]
[clause domain = "N"]* {0-1}
[clause domain = "Q"]* {1-50}
[clause domain = "N"
[phrase function = Pred
[word
[word lex = "DBR["]
OR
[word lex = ">MR["]
OR
[word lex = "QR>["]
]
]
..
[phrase FOCUS function = Subj
[word lex = samesubject4.lex]
]
..
[phrase FOCUS function = Cmpl
[word lex = samecomplement4.lex]
]
]]
(verses, words) = getShebanqData(A, MQL_RESULTS, 9)
344 results in 101 verses with 356 words
This is a complex query. Let's make it simpler first.
We see some recurring objects:
speakPhrase
[phrase function = Pred
[word
[word lex = "DBR["]
OR
[word lex = ">MR["]
OR
[word lex = "QR>["]
]
]
subjPhrase
[phrase FOCUS function = Subj
[word AS samesubject]
]
cmplPhrase
[phrase FOCUS function = Cmpl
[word AS samecomplement]
]
clauses
[clause domain = "N"]* {0-1}
[clause domain = "Q"]* {1-50}
The structure of the whole query is then, in pseudo TF-query terms:
clause domain=N
  speakPhrase
  < s:subjPhrase
  < c:cmplPhrase
clauses
clause domain=N
  speakPhrase
  < subjPhrase (.lex. s)
  < cmplPhrase (.lex. c)
OR
clause domain=N
  speakPhrase
  < c:cmplPhrase
  < s:subjPhrase
clauses
clause domain=N
  speakPhrase
  < cmplPhrase (.lex. c)
  < subjPhrase (.lex. s)
OR
clause domain=N
  speakPhrase
  < s:subjPhrase
  < c:cmplPhrase
clauses
clause domain=N
  speakPhrase
  < cmplPhrase (.lex. c)
  < subjPhrase (.lex. s)
OR
clause domain=N
  speakPhrase
  < c:cmplPhrase
  < s:subjPhrase
clauses
clause domain=N
  speakPhrase
  < subjPhrase (.lex. s)
  < cmplPhrase (.lex. c)
The OR is only used to enumerate the four different orders between subjPhrase and cmplPhrase.
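The four branches can be generated mechanically, which makes it plain that they differ only in phrase order. A minimal sketch, using plain strings as stand-ins for the phrase templates (the names `ORDERS` and `branches` are illustrative and not part of this notebook):

```python
from itertools import permutations, product

# The two possible orders of subject and complement phrase inside one clause.
ORDERS = list(permutations(("subjPhrase", "cmplPhrase")))

# One order for the first speech clause, one for the second: 2 x 2 = 4 branches.
branches = list(product(ORDERS, repeat=2))

for first, second in branches:
    print(" .. ".join(first), "| clauses |", " .. ".join(second))
```

Each printed line corresponds to one OR-branch of the MQL query.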
So we can simplify greatly!
clause domain=N
  sp1:speakPhrase
  < s1:subjPhrase
  c1:cmplPhrase
< clauses
< clause domain=N
  sp2:speakPhrase
  < s2:subjPhrase (.lex. s1)
  c2:cmplPhrase (.lex. c1)
sp1 < c1
sp2 < c2
s1 .lex. s2
c1 .lex. c2
There is a problem with translating the clauses bit:
[clause domain = "N"]* {0-1}
[clause domain = "Q"]* {1-50}
The operator * {n-m} means: repeat the previous block n to m times.
In TF-Query there is no such operator.
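One way to see what the repetition operator demands is to flatten a run of consecutive clauses into a string of domain letters and let a regular expression do the counting. This is only an illustration on made-up input, not how MQL or Text-Fabric work internally; the function `matchSpeechPair` and the sample strings are assumptions introduced here.

```python
import re


def matchSpeechPair(domains, qLimit=50):
    """Return the number of Q-clauses if `domains` fits N, N{0-1}, Q{1-qLimit}, N.

    `domains` holds one letter per consecutive clause, e.g. "NQQN".
    Returns None when the pattern does not fit, starting at the first clause.
    """
    pattern = re.compile(rf"N(N?)(Q{{1,{qLimit}}})N")
    match = pattern.match(domains)
    return len(match.group(2)) if match else None


print(matchSpeechPair("NQQN"))   # two Q-clauses between the N-clauses
print(matchSpeechPair("NNQN"))   # one extra N-clause is allowed
print(matchSpeechPair("NNNQN"))  # two extra N-clauses are not
```

The hand-coded walk further on does the same counting, but on clause nodes instead of letters.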
We will mimic this query by means of a mixture of TF-Query and hand-coding.
We examine the results of searching for
clause domain=N
  sp: speakPhrase
  < subjPhrase
  c:cmplPhrase
sp < c
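By means of hand coding we walk through the results of this query. For each result we:
1. take its clause as the first speech clause;
2. step over the following clauses as described by clauses: at most one clause with domain N, then 1 to 50 clauses with domain Q;
3. check that the clause after them has domain N and is itself a result of the query;
4. check that the subjPhrase and cmplPhrase of both results agree.
We take care to deliver the results in the same way as a TF-query would do.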
N.B. What does agree mean in 4? According to the MQL query:
[phrase FOCUS function = Subj
[word AS samesubject]
]
..
[phrase FOCUS function = Cmpl
[word AS samecomplement]
]
That means that two phrases agree if each contains a word that is an occurrence of the same lexeme. In short: they share a lexeme.
That is a fairly relaxed condition: if both phrases have the article, or the same preposition, the condition is met. But that is what the query says, and we stick to that.
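How relaxed the condition is shows up quickly on a toy example. The sketch below treats a phrase as the list of lexemes of its words; the helper `sharedLexemes` and the lexeme strings are made up for illustration (written in BHSA-style transliteration, but not checked against the data).

```python
def sharedLexemes(lexemesA, lexemesB):
    """Return the lexemes that occur in both phrases (set intersection)."""
    return set(lexemesA) & set(lexemesB)


# Made-up phrases, each given as the list of lexemes of its words.
cmpl1 = [">L", "H", "<M/"]   # preposition + article + noun
cmpl2 = ["L", "H", "KHN/"]   # different preposition and noun, same article
cmpl3 = [">L", "MCH=/"]      # same preposition as cmpl1, no article

print(sharedLexemes(cmpl1, cmpl2))        # only the article is shared
print(bool(sharedLexemes(cmpl1, cmpl3)))  # agree through the preposition
print(bool(sharedLexemes(cmpl2, cmpl3)))  # nothing shared
```

A non-empty intersection counts as agreement, so a shared article alone is enough.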
query = """
clause domain=N
    sp:phrase function=Pred
        word lex=DBR[|>MR[|QR>[
    < phrase function=Subj
    c:phrase function=Cmpl
sp < c
"""
speakResults = A.search(query)
1.31s 1132 results
This is our starting point.
We are going to weave these results together.
The following function does that.
It has to find up to 50 intervening clauses with domain=Q between two clauses with a speech verb.
Let's parametrize this number 50, so that we can play with it later on.
speakResultsIndex = {sr[0]: sr for sr in speakResults}


def weave(qLimit):
    results = []
    for speakResult in speakResults:
        (clause, speakPhrase, speakWord, subjPhrase, cmplPhrase) = speakResult
        nextClause = L.n(clause, otype="clause")
        if not nextClause:
            continue
        nextClause = nextClause[0]
        domain = F.domain.v(nextClause)
        qSeen = domain == "Q"
        if not qSeen and domain != "N":
            continue
        if not qSeen:
            nextClause = L.n(nextClause, otype="clause")
            if not nextClause:
                continue
            nextClause = nextClause[0]
            domain = F.domain.v(nextClause)
            qSeen = domain == "Q"
            if not qSeen:
                continue
        qs = 1
        while qs <= qLimit:
            nextClause = L.n(nextClause, otype="clause")
            if not nextClause:
                break
            nextClause = nextClause[0]
            domain = F.domain.v(nextClause)
            if domain != "Q":
                break
            qs += 1
        if not nextClause:
            continue
        if domain != "N":
            continue
        if nextClause not in speakResultsIndex:
            continue
        nextSpeakResult = speakResultsIndex[nextClause]
        (
            nextClause,
            nextSpeakPhrase,
            nextSpeakWord,
            nextSubjPhrase,
            nextCmplPhrase,
        ) = nextSpeakResult
        # here we implement the "agree" bit. Note that & means: set intersection.
        if (
            {F.lex.v(w) for w in L.d(subjPhrase, otype="word")}
            & {F.lex.v(w) for w in L.d(nextSubjPhrase, otype="word")}
        ) and (
            {F.lex.v(w) for w in L.d(cmplPhrase, otype="word")}
            & {F.lex.v(w) for w in L.d(nextCmplPhrase, otype="word")}
        ):
            # note that we add the number of Q-clauses at the end of each result tuple
            results.append(
                (
                    clause,
                    speakPhrase,
                    subjPhrase,
                    cmplPhrase,
                    nextSubjPhrase,
                    nextCmplPhrase,
                    qs,
                )
            )
    print(f"qLimit={qLimit}: {len(results)} results")
    return results
results = weave(50)
qLimit=50: 62 results
(tfVerses, tfWords) = getTfVerses(A, results, (2, 3, 4, 5))
101 verses 356 words
compareResults(A, verses, words, tfVerses, tfWords)
VERSES EQUAL
WORDS EQUAL
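One detail of weave that is easy to overlook: speakResultsIndex keeps a single result tuple per clause. If the query were to return several tuples for the same clause (for instance when a clause has more than one Cmpl phrase), only the last one would survive in the index. The comparison above shows that the verses and words found are the same as on SHEBANQ, so it does no harm here, but whether such duplicates occur at all has not been checked. The sketch below uses made-up tuples to show the effect and a possible alternative that keeps all tuples; `toyResults`, `lastOnly` and `allPerClause` are illustrative names.

```python
from collections import defaultdict

# Made-up result tuples (clause, subjPhrase, cmplPhrase); clause 10 occurs twice.
toyResults = [(10, 101, 102), (10, 101, 103), (20, 201, 202)]

# Same construction as speakResultsIndex: the last tuple per clause wins.
lastOnly = {r[0]: r for r in toyResults}

# Alternative: collect every tuple per clause in a list.
allPerClause = defaultdict(list)
for r in toyResults:
    allPerClause[r[0]].append(r)

print(lastOnly[10])
print(allPerClause[10])
```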
What if we allowed only strings of at most 49 Q-clauses? Would that matter?
It would be nice if we could see the number of Q-clauses in each result. Well, we have sneaked that in already! It is added at the end of each result tuple.
Here are the minimum and the maximum that we encountered:
print(f"minimum number of Q-clauses: {min(r[-1] for r in results):>2}")
print(f"maximum number of Q-clauses: {max(r[-1] for r in results):>2}")
minimum number of Q-clauses:  1
maximum number of Q-clauses: 50
So we expect that it does matter if we go from 50 to 49. Before we test that, let us show all Q-lengths:
for r in results:
    startPhrase = min(r[2], r[3])
    startVerse = T.sectionFromNode(L.u(startPhrase, otype="verse")[0])
    startString = "{} {}:{}".format(*startVerse)
    endPhrase = min(r[4], r[5])
    endVerse = T.sectionFromNode(L.u(endPhrase, otype="verse")[0])
    endString = "{} {}:{}".format(*endVerse)
    qs = r[-1]
    print(f"{startString:<20} == {qs:>2} Q-clauses ==> {endString}")
Genesis 3:2 == 7 Q-clauses ==> Genesis 3:4
Genesis 16:9 == 2 Q-clauses ==> Genesis 16:10
Genesis 16:10 == 2 Q-clauses ==> Genesis 16:11
Genesis 17:9 == 14 Q-clauses ==> Genesis 17:15
Genesis 20:9 == 4 Q-clauses ==> Genesis 20:10
Genesis 41:38 == 2 Q-clauses ==> Genesis 41:39
Genesis 41:39 == 5 Q-clauses ==> Genesis 41:41
Exodus 7:14 == 25 Q-clauses ==> Exodus 7:19
Exodus 7:26 == 12 Q-clauses ==> Exodus 8:1
Exodus 30:11 == 17 Q-clauses ==> Exodus 30:17
Exodus 30:17 == 15 Q-clauses ==> Exodus 30:22
Exodus 30:22 == 24 Q-clauses ==> Exodus 30:34
Exodus 30:34 == 15 Q-clauses ==> Exodus 31:1
Leviticus 5:14 == 23 Q-clauses ==> Leviticus 5:20
Leviticus 5:20 == 25 Q-clauses ==> Leviticus 6:1
Leviticus 6:1 == 39 Q-clauses ==> Leviticus 6:12
Leviticus 6:12 == 16 Q-clauses ==> Leviticus 6:17
Leviticus 7:22 == 13 Q-clauses ==> Leviticus 7:28
Leviticus 12:1 == 27 Q-clauses ==> Leviticus 13:1
Leviticus 21:1 == 50 Q-clauses ==> Leviticus 21:16
Leviticus 22:17 == 26 Q-clauses ==> Leviticus 22:26
Leviticus 22:26 == 21 Q-clauses ==> Leviticus 23:1
Leviticus 23:1 == 19 Q-clauses ==> Leviticus 23:9
Leviticus 23:9 == 41 Q-clauses ==> Leviticus 23:23
Leviticus 23:23 == 5 Q-clauses ==> Leviticus 23:26
Leviticus 23:26 == 18 Q-clauses ==> Leviticus 23:33
Numbers 3:5 == 13 Q-clauses ==> Numbers 3:11
Numbers 3:11 == 7 Q-clauses ==> Numbers 3:14
Numbers 4:1 == 46 Q-clauses ==> Numbers 4:17
Numbers 4:17 == 12 Q-clauses ==> Numbers 4:21
Numbers 5:5 == 22 Q-clauses ==> Numbers 5:11
Numbers 8:23 == 10 Q-clauses ==> Numbers 9:1
Numbers 15:1 == 34 Q-clauses ==> Numbers 15:17
Numbers 18:1 == 22 Q-clauses ==> Numbers 18:8
Numbers 18:8 == 36 Q-clauses ==> Numbers 18:20
Numbers 18:20 == 16 Q-clauses ==> Numbers 18:25
Numbers 18:25 == 20 Q-clauses ==> Numbers 19:1
Numbers 27:6 == 19 Q-clauses ==> Numbers 27:12
Numbers 33:50 == 28 Q-clauses ==> Numbers 34:1
Numbers 34:16 == 20 Q-clauses ==> Numbers 35:1
Numbers 35:1 == 24 Q-clauses ==> Numbers 35:9
Deuteronomy 9:12 == 7 Q-clauses ==> Deuteronomy 9:13
Joshua 3:5 == 2 Q-clauses ==> Joshua 3:6
Joshua 9:19 == 7 Q-clauses ==> Joshua 9:21
Judges 8:23 == 3 Q-clauses ==> Judges 8:24
Judges 9:9 == 4 Q-clauses ==> Judges 9:10
Judges 9:10 == 2 Q-clauses ==> Judges 9:11
Judges 9:11 == 3 Q-clauses ==> Judges 9:12
Judges 9:12 == 2 Q-clauses ==> Judges 9:13
Judges 9:14 == 2 Q-clauses ==> Judges 9:15
1_Samuel 16:10 == 1 Q-clauses ==> 1_Samuel 16:11
1_Samuel 16:15 == 9 Q-clauses ==> 1_Samuel 16:17
1_Samuel 20:37 == 1 Q-clauses ==> 1_Samuel 20:38
2_Samuel 14:8 == 2 Q-clauses ==> 2_Samuel 14:9
2_Samuel 15:25 == 9 Q-clauses ==> 2_Samuel 15:27
2_Samuel 18:20 == 4 Q-clauses ==> 2_Samuel 18:21
2_Samuel 24:22 == 6 Q-clauses ==> 2_Samuel 24:23
1_Kings 13:7 == 3 Q-clauses ==> 1_Kings 13:8
1_Kings 22:4 == 3 Q-clauses ==> 1_Kings 22:5
Jeremiah 14:14 == 46 Q-clauses ==> Jeremiah 15:1
Ruth 2:20 == 2 Q-clauses ==> Ruth 2:20
1_Chronicles 28:20 == 13 Q-clauses ==> 1_Chronicles 29:1
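With the Q-count stored as the last member of each result tuple, other summaries are only a small step away. A sketch on made-up tuples (the node numbers are placeholders, not real BHSA nodes, and `qLengthBuckets` is a helper introduced here) that counts results per range of Q-lengths:

```python
from collections import Counter

# Placeholder result tuples in the shape produced by weave(); only the last
# member (the number of Q-clauses) matters here.
toyResults = [
    (1, 2, 3, 4, 5, 6, 2),
    (7, 8, 9, 10, 11, 12, 2),
    (13, 14, 15, 16, 17, 18, 14),
    (19, 20, 21, 22, 23, 24, 50),
]


def qLengthBuckets(results, width=10):
    """Count results per bucket of Q-clause counts: 1-10, 11-20, and so on."""
    buckets = Counter()
    for r in results:
        low = (r[-1] - 1) // width * width + 1
        buckets[(low, low + width - 1)] += 1
    return dict(sorted(buckets.items()))


for (low, high), n in qLengthBuckets(toyResults).items():
    print(f"{low:>2}-{high:>2} Q-clauses: {n} results")
```

Applied to the real results list, the same function would show how the 62 results are spread over short and long stretches of quoted speech.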
Let's double-check: if we allow only strings of Q-clauses up to length 49, the result in Leviticus 21:1 should be gone.
results = weave(49)
qLimit=49: 61 results
(tfVerses, tfWords) = getTfVerses(A, results, (2, 3, 4, 5))
99 verses 350 words
compareResults(A, verses, words, tfVerses, tfWords)
DIFFERENCE: ('Leviticus', 21, 1) ('Leviticus', 22, 17)
DIFFERENCE: 64969 = J:HW@H03 65656 = J:HW@73H
-1
And so it is!
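Since each result carries its own Q-count, the effect of a lower limit can also be previewed by filtering existing results instead of repeating the walk. For a given start clause the run of Q-clauses is fixed, so lowering the limit can only remove results, never add or change them. A sketch with placeholder tuples (`restrict` is a helper introduced here, not part of util):

```python
def restrict(results, qLimit):
    """Keep only the results whose Q-clause count (last member) is within qLimit."""
    return [r for r in results if r[-1] <= qLimit]


# Placeholder tuples; the last member plays the role of qs in weave().
toyResults = [(1, 2, 3, 4, 5, 6, 7), (11, 12, 13, 14, 15, 16, 50)]

print(len(restrict(toyResults, 50)))
print(len(restrict(toyResults, 49)))
```

This only works downwards: results for a limit above 50 still require a new call to weave.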
Conclusion
Instead of running a query and obtaining a list of results, we did a bit of programming, and that gives us more than the results alone: for every match we also know how many Q-clauses it spans.
That is the power of programming. But programming is difficult, and mistakes will be made.
TF-Query helps you to find a sweet spot between crafting queries and writing code.