
MQL versus TF-Query

See TF versus MQL for an introduction.

Loading

We load the Text-Fabric program and the BHSA data.

In [1]:

%load_ext autoreload
%autoreload 2

In [2]:

from tf.app import use

from util import getTfVerses, getShebanqData, compareResults, MQL_RESULTS

In [3]:

VERSION = "2017"
# A = use('ETCBC/bhsa', hoist=globals(), version=VERSION)
A = use("ETCBC/bhsa:clone", checkout="clone", hoist=globals(), version=VERSION)

TF-app: ~/github/annotation/app-bhsa/code

data: ~/github/etcbc/bhsa/tf/2017

data: ~/github/etcbc/phono/tf/2017

data: ~/github/etcbc/parallels/tf/2017

Text-Fabric: Text-Fabric API 8.4.13, app-bhsa v3, Search Reference
Data: BHSA, Character table, Feature docs
Features:

Parallel Passages

crossref

BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis

book
book@ll
chapter
code
det
domain
freq_lex
function
g_cons
g_cons_utf8
g_lex
g_lex_utf8
g_word
g_word_utf8
gloss
gn
label
language
lex
lex_utf8
ls
nametype
nme
nu
number
otype
pargr
pdp
pfm
prs
prs_gn
prs_nu
prs_ps
ps
qere
qere_trailer
qere_trailer_utf8
qere_utf8
rank_lex
rela
sp
st
tab
trailer
trailer_utf8
txt
typ
uvf
vbe
vbs
verse
voc_lex
voc_lex_utf8
vs
vt
mother
omap@ll
oslots

Phonetic Transcriptions

phono
phono_trailer

Text-Fabric API: names N F E L T S C TF directly usable

Example 9

Oliver Glanz: DHQ article: discourse pattern deviation

[[clause domain = "N"
  [phrase function = Pred
    [word
      [word lex = "DBR["]
        OR
        [word lex = ">MR["]
        OR
        [word lex = "QR>["]
      ]
  ]
    ..
    [phrase FOCUS function = Subj
      [word AS samesubject]
    ]
    ..
    [phrase FOCUS function = Cmpl
      [word AS samecomplement]
    ]
  ]
  [clause domain = "N"]* {0-1}
  [clause domain = "Q"]* {1-50}
  [clause domain = "N"
    [phrase function = Pred
      [word
        [word lex = "DBR["]
        OR
        [word lex = ">MR["]
        OR
        [word lex = "QR>["]
      ]
    ]
    ..
    [phrase FOCUS function = Subj
      [word lex = samesubject.lex]
    ]
    ..
    [phrase FOCUS function = Cmpl
      [word lex = samecomplement.lex]
    ]
  ]]
OR
  [[clause domain = "N"
    [phrase function = Pred
      [word
        [word lex = "DBR["]
        OR
        [word lex = ">MR["]
        OR
        [word lex = "QR>["]
      ]
    ]
    ..
    [phrase FOCUS function = Cmpl
      [word AS samecomplement2]
    ]
    ..
    [phrase FOCUS function = Subj
      [word AS samesubject2]
    ]
  ]
  [clause domain = "N"]* {0-1}
  [clause domain = "Q"]* {1-50}
  [clause domain = "N"
    [phrase function = Pred
      [word
        [word lex = "DBR["]
        OR
        [word lex = ">MR["]
        OR
        [word lex = "QR>["]
      ]
    ]
    ..
    [phrase FOCUS function = Cmpl
      [word lex = samecomplement2.lex]
    ]
    ..
    [phrase FOCUS function = Subj
      [word lex = samesubject2.lex]
    ]
  ]]
  OR
  [[clause domain = "N"
    [phrase function = Pred
      [word
        [word lex = "DBR["]
        OR
        [word lex = ">MR["]
        OR
        [word lex = "QR>["]
      ]
    ]
    ..
    [phrase FOCUS function = Subj
      [word AS samesubject3]
    ]
    ..
    [phrase FOCUS function = Cmpl
      [word AS samecomplement3]
    ]
  ]
  [clause domain = "N"]* {0-1}
  [clause domain = "Q"]* {1-50}
  [clause domain = "N"
    [phrase function = Pred
      [word
        [word lex = "DBR["]
        OR
        [word lex = ">MR["]
        OR
        [word lex = "QR>["]
      ]
    ]
    ..
    [phrase FOCUS function = Cmpl
      [word lex = samecomplement3.lex]
    ]
    ..
    [phrase FOCUS function = Subj
      [word lex = samesubject3.lex]
    ]
  ]]
  OR
  [[clause domain = "N"
    [phrase function = Pred
      [word
        [word lex = "DBR["]
        OR
        [word lex = ">MR["]
        OR
        [word lex = "QR>["]
      ]
    ]
    ..
    [phrase FOCUS function = Cmpl
      [word AS samecomplement4]
    ]
    ..
    [phrase FOCUS function = Subj
      [word AS samesubject4]
    ]
  ]
  [clause domain = "N"]* {0-1}
  [clause domain = "Q"]* {1-50}
  [clause domain = "N"
    [phrase function = Pred
      [word
        [word lex = "DBR["]
        OR
        [word lex = ">MR["]
        OR
        [word lex = "QR>["]
      ]
    ]
    ..
    [phrase FOCUS function = Subj
      [word lex = samesubject4.lex]
    ]
    ..
    [phrase FOCUS function = Cmpl
      [word lex = samecomplement4.lex]
    ]
  ]]

In [4]:

(verses, words) = getShebanqData(A, MQL_RESULTS, 9)

344 results in 101 verses with 356 words

This is a complex query. Let's make it simpler first.

We see some recurring objects:

speakPhrase

[phrase function = Pred
  [word
    [word lex = "DBR["]
      OR
      [word lex = ">MR["]
      OR
      [word lex = "QR>["]
    ]
]

subjPhrase

[phrase FOCUS function = Subj
  [word AS samesubject]
]

cmplPhrase

[phrase FOCUS function = Cmpl
  [word AS samecomplement]
]

clauses

[clause domain = "N"]* {0-1}
[clause domain = "Q"]* {1-50}

The structure of the whole query is then, in pseudo TF-query terms:

clause domain=N
  speakPhrase
  < s:subjPhrase
  < c:cmplPhrase
clauses
clause domain=N
  speakPhrase
  < subjPhrase (.lex. s)
  < cmplPhrase (.lex. c)

clause domain=N
  speakPhrase
  < c:cmplPhrase
  < s:subjPhrase
clauses
clause domain=N
  speakPhrase
  < cmplPhrase (.lex. c)
  < subjPhrase (.lex. s)

clause domain=N
  speakPhrase
  < s:subjPhrase
  < c:cmplPhrase
clauses
clause domain=N
  speakPhrase
  < cmplPhrase (.lex. c)
  < subjPhrase (.lex. s)

clause domain=N
  speakPhrase
  < c:cmplPhrase
  < s:subjPhrase
clauses
clause domain=N
  speakPhrase
  < subjPhrase (.lex. s)
  < cmplPhrase (.lex. c)

The OR is only used to enumerate the four possible combinations of order between subjPhrase and cmplPhrase in the two clauses. If we leave that order unspecified, we can simplify greatly!

clause domain=N
  sp1:speakPhrase
  < s1:subjPhrase
  c1:cmplPhrase
< clauses
< clause domain=N
  sp2:speakPhrase
  < s2:subjPhrase (.lex. s1)
  c2:cmplPhrase (.lex. c1)

sp1 < c1
sp2 < c2
s1 .lex. s2
c1 .lex. c2

There is a problem with translating the clauses bit:

[clause domain = "N"]* {0-1}
[clause domain = "Q"]* {1-50}

The operator * {n-m} means: repeat the previous block n to m times. In TF-Query there is no such operator.
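To make the meaning of the two repetition operators concrete, here is a minimal sketch that treats a stretch of clauses as a string of domain letters and checks it with a regular expression. It only illustrates the semantics; it is not how MQL or Text-Fabric work internally, and `is_valid_gap` is a hypothetical helper name.

```python
import re

# Assumption: every clause is represented by its one-letter domain value,
# so a stretch of clauses becomes a short string such as "NQQQ".
# N{0,1} mirrors [clause domain = "N"]* {0-1}
# Q{1,50} mirrors [clause domain = "Q"]* {1-50}
BETWEEN = re.compile(r"N{0,1}Q{1,50}")


def is_valid_gap(domains):
    """True if the domains are 0-1 N-clauses followed by 1-50 Q-clauses."""
    return BETWEEN.fullmatch("".join(domains)) is not None


print(is_valid_gap(["Q", "Q"]))       # True
print(is_valid_gap(["N", "Q"]))       # True
print(is_valid_gap(["N", "N", "Q"]))  # False: more than one N-clause
print(is_valid_gap(["N"]))            # False: at least one Q-clause is needed
print(is_valid_gap(["Q"] * 51))       # False: more than 50 Q-clauses
```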

We will mimic this query by means of a mixture of TF-Query and hand-coding.

We examine the results of searching for

clause domain=N
  sp: speakPhrase
  < subjPhrase
  c:cmplPhrase

sp < c

By means of hand coding we walk through the results of this query:

1. suppose we are at a query result
2. walk through the following clauses as long as they match clauses
3. see if the next clause is a result of the query
4. check whether this clause and the one we started at in 1. agree lexically in their subjPhrase and cmplPhrase
5. for each such pair of results we add the combination to the result set
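The walk in steps 1 to 3 can be sketched on toy data before we turn to the real corpus. The sketch below assumes the clauses are given as a plain list of domain values in text order and the query results as a set of list indexes; `find_partner`, `domains` and `hits` are hypothetical names, and the lexical check of step 4 is left out.

```python
def find_partner(domains, start, hits, q_limit=50):
    """Return (index of the partner clause, number of Q-clauses), or None.

    domains: clause domains in text order (toy data, not Text-Fabric nodes)
    start:   index of a clause that is a query result
    hits:    set of indexes of all clauses that are query results
    """
    i = start + 1
    # allow at most one intervening N-clause
    if i < len(domains) and domains[i] == "N":
        i += 1
    # count the run of Q-clauses
    qs = 0
    while i < len(domains) and domains[i] == "Q":
        qs += 1
        i += 1
    if qs < 1 or qs > q_limit:
        return None
    # the run must end in an N-clause that is itself a query result
    if i >= len(domains) or domains[i] != "N" or i not in hits:
        return None
    return (i, qs)


print(find_partner(["N", "Q", "Q", "N"], 0, {0, 3}))       # (3, 2)
print(find_partner(["N", "N", "Q", "N"], 0, {0, 3}))       # (3, 1)
print(find_partner(["N", "Q", "Q", "N"], 0, {0, 3}, 1))    # None: run too long
```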

We take care to deliver the results in the same way as a TF-query would do.

N.B. What does agree mean in step 4? According to the MQL query:

[phrase FOCUS function = Subj
      [word AS samesubject]
]
..
[phrase FOCUS function = Cmpl
  [word AS samecomplement]
]

That means that two phrases agree if each of them contains a word that is an occurrence of the same lexeme. In short: they share a lexeme.

That is a fairly relaxed condition: if both phrases have the article, or the same preposition, the condition is met. But that is what the query says, and we stick to that.
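How relaxed the condition is can be seen in a small sketch that reduces each phrase to the list of its lexemes and tests for a non-empty set intersection. The helper `share_lexeme` is hypothetical, and the lexeme strings are illustrative values written in the style of the BHSA transliteration, not taken from actual query results.

```python
def share_lexeme(lexemes_a, lexemes_b):
    """True if the two phrases have at least one lexeme in common."""
    return bool(set(lexemes_a) & set(lexemes_b))


# Two complement phrases with different nouns but the same preposition:
# the shared preposition alone is enough to make them "agree".
print(share_lexeme([">L", "MCH="], [">L", ">HRN="]))  # True

# Without any common lexeme the phrases do not agree.
print(share_lexeme(["MCH="], [">HRN="]))              # False
```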

In [5]:

query = """
clause domain=N
  sp:phrase function=Pred
    word lex=DBR[|>MR[|QR>[
  < phrase function=Subj
  c:phrase function=Cmpl
  
sp < c
"""

In [6]:

speakResults = A.search(query)

  1.31s 1132 results

This is our starting point.

We are going to weave these results together.

The following function does that. For each result it looks ahead for an optional clause with domain=N, followed by 1 to 50 clauses with domain=Q, followed by another clause with a speech verb.

Let's parametrize this number 50, so that we can play with it later on.

In [12]:

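# index the results by their clause node;
# if a clause occurs in several results, the last one wins
speakResultsIndex = {sr[0]: sr for sr in speakResults}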


def weave(qLimit):
    """Pair speech clauses separated by 0-1 N-clauses and 1 to qLimit Q-clauses."""
    results = []

    for speakResult in speakResults:
        (clause, speakPhrase, speakWord, subjPhrase, cmplPhrase) = speakResult
        # the clause directly following the current result
        nextClause = L.n(clause, otype="clause")
        if not nextClause:
            continue
        nextClause = nextClause[0]
        domain = F.domain.v(nextClause)
        qSeen = domain == "Q"
        # it must be a Q-clause, or a single N-clause followed by a Q-clause
        if not qSeen and domain != "N":
            continue
        if not qSeen:
            nextClause = L.n(nextClause, otype="clause")
            if not nextClause:
                continue
            nextClause = nextClause[0]
            domain = F.domain.v(nextClause)
            qSeen = domain == "Q"
        if not qSeen:
            continue
        # count the remaining Q-clauses; give up once we exceed qLimit
        qs = 1
        while qs <= qLimit:
            nextClause = L.n(nextClause, otype="clause")
            if not nextClause:
                break
            nextClause = nextClause[0]
            domain = F.domain.v(nextClause)
            if domain != "Q":
                break
            qs += 1
        if not nextClause:
            continue
        # the run of Q-clauses must end in an N-clause that is itself a query result
        if domain != "N":
            continue
        if nextClause not in speakResultsIndex:
            continue

        nextSpeakResult = speakResultsIndex[nextClause]
        (
            nextClause,
            nextSpeakPhrase,
            nextSpeakWord,
            nextSubjPhrase,
            nextCmplPhrase,
        ) = nextSpeakResult

        # here we implement the "agree" bit. Note that & means: set intersection.
        if (
            {F.lex.v(w) for w in L.d(subjPhrase, otype="word")}
            & {F.lex.v(w) for w in L.d(nextSubjPhrase, otype="word")}
        ) and (
            {F.lex.v(w) for w in L.d(cmplPhrase, otype="word")}
            & {F.lex.v(w) for w in L.d(nextCmplPhrase, otype="word")}
        ):
            # note that we add the number of Q-clauses at the end of each result tuple
            results.append(
                (
                    clause,
                    speakPhrase,
                    subjPhrase,
                    cmplPhrase,
                    nextSubjPhrase,
                    nextCmplPhrase,
                    qs,
                )
            )

    print(f"qLimit={qLimit}: {len(results)} results")
    return results

In [9]:

results = weave(50)

qLimit=50: 62 results

In [10]:

(tfVerses, tfWords) = getTfVerses(A, results, (2, 3, 4, 5))

101 verses
356 words

In [11]:

compareResults(A, verses, words, tfVerses, tfWords)

VERSES EQUAL
WORDS EQUAL

What if we allowed only runs of at most 49 Q-clauses? Would that matter?

It would be nice if we could see the number of Q-clauses in each result. Well, we have sneaked that in already! It is added at the end of each result tuple.

Here are the minimum and the maximum that we encountered:

In [13]:

print(f"minimum number of Q-clauses: {min(r[-1] for r in results):>2}")
print(f"maximum number of Q-clauses: {max(r[-1] for r in results):>2}")

minimum number of Q-clauses:  1
maximum number of Q-clauses: 50

So we expect that it does matter if we go from 50 to 49. Before we test that, let us show all Q-lengths:

In [14]:

for r in results:
    startPhrase = min(r[2], r[3])
    startVerse = T.sectionFromNode(L.u(startPhrase, otype="verse")[0])
    startString = "{} {}:{}".format(*startVerse)

    endPhrase = min(r[4], r[5])
    endVerse = T.sectionFromNode(L.u(endPhrase, otype="verse")[0])
    endString = "{} {}:{}".format(*endVerse)

    qs = r[-1]

    print(f"{startString:<20} == {qs:>2} Q-clauses ==>     {endString}")

Genesis 3:2          ==  7 Q-clauses ==>     Genesis 3:4
Genesis 16:9         ==  2 Q-clauses ==>     Genesis 16:10
Genesis 16:10        ==  2 Q-clauses ==>     Genesis 16:11
Genesis 17:9         == 14 Q-clauses ==>     Genesis 17:15
Genesis 20:9         ==  4 Q-clauses ==>     Genesis 20:10
Genesis 41:38        ==  2 Q-clauses ==>     Genesis 41:39
Genesis 41:39        ==  5 Q-clauses ==>     Genesis 41:41
Exodus 7:14          == 25 Q-clauses ==>     Exodus 7:19
Exodus 7:26          == 12 Q-clauses ==>     Exodus 8:1
Exodus 30:11         == 17 Q-clauses ==>     Exodus 30:17
Exodus 30:17         == 15 Q-clauses ==>     Exodus 30:22
Exodus 30:22         == 24 Q-clauses ==>     Exodus 30:34
Exodus 30:34         == 15 Q-clauses ==>     Exodus 31:1
Leviticus 5:14       == 23 Q-clauses ==>     Leviticus 5:20
Leviticus 5:20       == 25 Q-clauses ==>     Leviticus 6:1
Leviticus 6:1        == 39 Q-clauses ==>     Leviticus 6:12
Leviticus 6:12       == 16 Q-clauses ==>     Leviticus 6:17
Leviticus 7:22       == 13 Q-clauses ==>     Leviticus 7:28
Leviticus 12:1       == 27 Q-clauses ==>     Leviticus 13:1
Leviticus 21:1       == 50 Q-clauses ==>     Leviticus 21:16
Leviticus 22:17      == 26 Q-clauses ==>     Leviticus 22:26
Leviticus 22:26      == 21 Q-clauses ==>     Leviticus 23:1
Leviticus 23:1       == 19 Q-clauses ==>     Leviticus 23:9
Leviticus 23:9       == 41 Q-clauses ==>     Leviticus 23:23
Leviticus 23:23      ==  5 Q-clauses ==>     Leviticus 23:26
Leviticus 23:26      == 18 Q-clauses ==>     Leviticus 23:33
Numbers 3:5          == 13 Q-clauses ==>     Numbers 3:11
Numbers 3:11         ==  7 Q-clauses ==>     Numbers 3:14
Numbers 4:1          == 46 Q-clauses ==>     Numbers 4:17
Numbers 4:17         == 12 Q-clauses ==>     Numbers 4:21
Numbers 5:5          == 22 Q-clauses ==>     Numbers 5:11
Numbers 8:23         == 10 Q-clauses ==>     Numbers 9:1
Numbers 15:1         == 34 Q-clauses ==>     Numbers 15:17
Numbers 18:1         == 22 Q-clauses ==>     Numbers 18:8
Numbers 18:8         == 36 Q-clauses ==>     Numbers 18:20
Numbers 18:20        == 16 Q-clauses ==>     Numbers 18:25
Numbers 18:25        == 20 Q-clauses ==>     Numbers 19:1
Numbers 27:6         == 19 Q-clauses ==>     Numbers 27:12
Numbers 33:50        == 28 Q-clauses ==>     Numbers 34:1
Numbers 34:16        == 20 Q-clauses ==>     Numbers 35:1
Numbers 35:1         == 24 Q-clauses ==>     Numbers 35:9
Deuteronomy 9:12     ==  7 Q-clauses ==>     Deuteronomy 9:13
Joshua 3:5           ==  2 Q-clauses ==>     Joshua 3:6
Joshua 9:19          ==  7 Q-clauses ==>     Joshua 9:21
Judges 8:23          ==  3 Q-clauses ==>     Judges 8:24
Judges 9:9           ==  4 Q-clauses ==>     Judges 9:10
Judges 9:10          ==  2 Q-clauses ==>     Judges 9:11
Judges 9:11          ==  3 Q-clauses ==>     Judges 9:12
Judges 9:12          ==  2 Q-clauses ==>     Judges 9:13
Judges 9:14          ==  2 Q-clauses ==>     Judges 9:15
1_Samuel 16:10       ==  1 Q-clauses ==>     1_Samuel 16:11
1_Samuel 16:15       ==  9 Q-clauses ==>     1_Samuel 16:17
1_Samuel 20:37       ==  1 Q-clauses ==>     1_Samuel 20:38
2_Samuel 14:8        ==  2 Q-clauses ==>     2_Samuel 14:9
2_Samuel 15:25       ==  9 Q-clauses ==>     2_Samuel 15:27
2_Samuel 18:20       ==  4 Q-clauses ==>     2_Samuel 18:21
2_Samuel 24:22       ==  6 Q-clauses ==>     2_Samuel 24:23
1_Kings 13:7         ==  3 Q-clauses ==>     1_Kings 13:8
1_Kings 22:4         ==  3 Q-clauses ==>     1_Kings 22:5
Jeremiah 14:14       == 46 Q-clauses ==>     Jeremiah 15:1
Ruth 2:20            ==  2 Q-clauses ==>     Ruth 2:20
1_Chronicles 28:20   == 13 Q-clauses ==>     1_Chronicles 29:1

Let's double-check: if we allow only strings of Q-clauses up to length 49, the result in Leviticus 21:1 should be gone.

In [16]:

results = weave(49)

qLimit=49: 61 results

In [17]:

(tfVerses, tfWords) = getTfVerses(A, results, (2, 3, 4, 5))

 99 verses
350 words

In [18]:

compareResults(A, verses, words, tfVerses, tfWords)

DIFFERENCE:
('Leviticus', 21, 1)
('Leviticus', 22, 17)
DIFFERENCE:
64969 = J:HW@H03
65656 = J:HW@73H

Out[18]:

-1

And so it is!
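Because every result carries its number of Q-clauses, the effect of a lower limit can also be predicted without calling weave again. The sketch below assumes, as the loop in weave suggests, that a result is found exactly when its number of Q-clauses does not exceed qLimit. The list is copied from the Q-lengths printed earlier, and `count_within` is a hypothetical helper.

```python
# Q-clause counts of the 62 results, in the order in which they were listed
Q_COUNTS = [
    7, 2, 2, 14, 4, 2, 5, 25, 12, 17, 15, 24, 15, 23, 25, 39, 16, 13, 27, 50,
    26, 21, 19, 41, 5, 18, 13, 7, 46, 12, 22, 10, 34, 22, 36, 16, 20, 19, 28,
    20, 24, 7, 2, 7, 3, 4, 2, 3, 2, 2, 1, 9, 1, 2, 9, 4, 6, 3, 3, 46, 2, 13,
]


def count_within(q_counts, q_limit):
    """Number of results whose run of Q-clauses is at most q_limit long."""
    return sum(1 for q in q_counts if q <= q_limit)


for limit in (50, 49, 25, 10):
    print(f"qLimit={limit}: {count_within(Q_COUNTS, limit)} results")
```

For the limits 50 and 49 this gives 62 and 61, the same numbers that weave reported.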

Conclusion

Instead of just running a query and obtaining a list of results, we did a bit of programming, and that gave us more than the results alone: here, the number of Q-clauses inside each match.

That is the power of programming. But programming is difficult, and mistakes will be made.

TF-Query helps you to find a sweet spot between crafting queries and writing code.