MQL versus TF-Query

See TF versus MQL for an introduction.

Loading

We load the Text-Fabric program and the BHSA data.

In [1]:

%load_ext autoreload
%autoreload 2

In [2]:

from tf.app import use

from util import getTfVerses, getShebanqData, compareResults, MQL_RESULTS

In [3]:

VERSION = "2017"
# A = use('ETCBC/bhsa', hoist=globals(), version=VERSION)
A = use("ETCBC/bhsa:clone", checkout="clone", hoist=globals(), version=VERSION)

TF-app: ~/github/annotation/app-bhsa/code

data: ~/github/etcbc/bhsa/tf/2017

data: ~/github/etcbc/phono/tf/2017

data: ~/github/etcbc/parallels/tf/2017

Text-Fabric: Text-Fabric API 8.4.13, app-bhsa v3, Search Reference
Data: BHSA, Character table, Feature docs
Features:

Parallel Passages

crossref

BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis

book
book@ll
chapter
code
det
domain
freq_lex
function
g_cons
g_cons_utf8
g_lex
g_lex_utf8
g_word
g_word_utf8
gloss
gn
label
language
lex
lex_utf8
ls
nametype
nme
nu
number
otype
pargr
pdp
pfm
prs
prs_gn
prs_nu
prs_ps
ps
qere
qere_trailer
qere_trailer_utf8
qere_utf8
rank_lex
rela
sp
st
tab
trailer
trailer_utf8
txt
typ
uvf
vbe
vbs
verse
voc_lex
voc_lex_utf8
vs
vt
mother
omap@ll
oslots

Phonetic Transcriptions

phono
phono_trailer

Text-Fabric API: names N F E L T S C TF directly usable

Example 2

Bas Meeuse: Example 2: FJM + prep. L

[clause
  [word FOCUS lex = 'FJM[']
  ..
  [word FOCUS lex = "L"]
  [word lex <> '<JN/' AND lex <> 'PNH/']
]

In [4]:

(verses, words) = getShebanqData(A, MQL_RESULTS, 2)

156 results in 136 verses with 294 words

In [5]:

query = """
clause
  word lex=FJM[
  < word lex=L
  <: word lex#<JN/|PNH/
"""

In [6]:

results = A.search(query)

  1.53s 155 results

N.B.: one result fewer than in SHEBANQ (155 versus 156).

In [7]:

(tfVerses, tfWords) = getTfVerses(A, results, (1, 2))

135 verses
292 words

In [8]:

compareResults(A, verses, words, tfVerses, tfWords)

DIFFERENCE:
('Joshua', 8, 2)
('Joshua', 8, 12)
DIFFERENCE:
116852 = FIJM
117091 = J.@63FEM

Out[8]:

-1

The TF results are skipping Joshua 8:2!

We expand this verse in SHEBANQ:

[Image: Joshua 8:2 expanded in SHEBANQ]

We see that there is a gap in the clause, right after the word L, between words 116853 and 116858. In MQL, the adjacency of objects is relative to the container they are in. If the container has a gap, the words on either side of the gap are considered adjacent.

In this example it means that this part of the query:

  [word FOCUS lex = "L"]
  [word lex <> '<JN/' AND lex <> 'PNH/']

is matched by words 116853 and 116858, which the MQL query considers adjacent within the clause.

In TF-Query, adjacency between words is absolute: it is not relative to a container object.

So this part of the query

  < word lex=L
  <: word lex#<JN/|PNH/

is not matched by words 116853 and 116854, because 116854 is not part of the embedding clause. Nor is it matched by words 116853 and 116858, because those two slots are not consecutive.
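The two notions of adjacency can be sketched in plain Python. This is only an illustrative model, not how MQL or Text-Fabric are implemented: the function names are made up for this sketch, and the clause is a toy list of slot numbers modelled on Joshua 8:2 (slot 116859 is an assumption based on the four words shown for this clause).

```python
def adjacent_absolute(a, b):
    """TF-Query style: b occupies the very next slot after a."""
    return b == a + 1


def adjacent_relative(a, b, container_slots):
    """MQL style: b is the next slot after a among the container's own slots."""
    slots = sorted(container_slots)
    if a not in slots or b not in slots:
        return False
    return slots.index(b) == slots.index(a) + 1


# Toy clause with a gap between 116853 and 116858 (116859 is assumed).
clause = [116852, 116853, 116858, 116859]

print(adjacent_relative(116853, 116858, clause))  # True: adjacent within the clause
print(adjacent_absolute(116853, 116858))          # False: slots are not consecutive
print(adjacent_relative(116853, 116854, clause))  # False: 116854 is outside the clause
```

Note that the relative variant needs the container as an explicit argument, whereas the absolute variant needs only the two slots.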

Text-Fabric's notion of adjacency is cruder. The reason is that a Text-Fabric query does not have to be a tree in which each object has a unique immediate embedder. There can be several embedder objects in the query, each with different gaps. If we had a concept of relative adjacency, the query language would need a way to express relative to which object the adjacency must be taken.

It has no such way, and to me it is an open question whether we should complicate search templates like that.

Anyway, as it stands, there is no obvious workaround to get exactly the same behaviour as the MQL query.

That said, we can try something that comes close:

We state that the L is not immediately followed by a word that is <JN/ or PNH/.

In [9]:

query = """
clause
  word lex=FJM[
  < word lex=L
  /without/
  <: w3:word lex=<JN/|PNH/
  /-/
"""

In [10]:

results2 = A.search(query)

  0.78s 161 results

It turns out that we get more results than in SHEBANQ. We first count the verses and words involved in the results.

In [11]:

(tfVerses2, tfWords2) = getTfVerses(A, results2, (1, 2))

139 verses
302 words

In [12]:

compareResults(A, verses, words, tfVerses2, tfWords2)

DIFFERENCE:
('Genesis', 43, 32)
('Genesis', 27, 37)
DIFFERENCE:
24760 = J.@FI71JMW.
14288 = FAM:T.I71JW

Out[12]:

-1

Now the situation is reversed: Genesis 27:37 is in the TF results, but skipped by SHEBANQ.

In [13]:

A.show(results2, condenseType="clause", start=3, end=3)

result 3

Genesis 27:37

clause

phrase

הֵ֣ן

lex=HN

phrase

גְּבִ֞יר

lex=GBJR/

phrase

שַׂמְתִּ֥יו

lex=FJM[

phrase

לָךְ֙

lex=L

The Genesis 27:37 result has something in common with the Joshua 8:2 result in SHEBANQ that we saw above:

the L has a pronominal suffix;
the L is at the end of its clause atom: in Joshua 8:2 it is preceding a gap; here it is also at the end of a clause.

So it seems to be an intended result of the MQL query in both cases.

Let's make a mental shift: what is the intention of the MQL query? Here is a bit of query-exegesis, in that the query itself is the object of the exegesis.

The MQL query mentions three [word] objects, but it puts only the first two of them in FOCUS.

it is not interested in the actual value of the third one;
the third [word] is constrained by a very loose restriction: it can be anything, except two specific values.

These two things point to the intended meaning of the query, namely:

find a clause with the word FJM[, and somewhere after that the word L,

which is not followed by either the word <JN/ or the word PNH/.

This differs subtly from what the query actually says:

find a clause with the word FJM[, and somewhere after that the word L,

which is followed by another word that is not <JN/ and not PNH/.

The difference is one of quantification.

More schematically, the MQL states literally:

there is a word after a that is not b and not c

But the intention is:

for each word after a it is not b and not c
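The difference between the two readings can be sketched in plain Python. This is a toy model, not MQL or TF code: it takes "after" to mean "immediately after", treats a clause as a simple list of lexemes, and the function names are invented for the illustration. The lexemes are those shown for Genesis 27:37 above.

```python
FORBIDDEN = {"<JN/", "PNH/"}


def literal_reading(lexemes, i):
    """What the MQL query says: a next word exists and it is not forbidden."""
    return i + 1 < len(lexemes) and lexemes[i + 1] not in FORBIDDEN


def intended_reading(lexemes, i):
    """What is meant: there is no next word that is forbidden."""
    return not (i + 1 < len(lexemes) and lexemes[i + 1] in FORBIDDEN)


# Genesis 27:37: the L is the last word of the clause.
gen_27_37 = ["HN", "GBJR/", "FJM[", "L"]
l_pos = gen_27_37.index("L")

print(literal_reading(gen_27_37, l_pos))   # False: there is no next word at all
print(intended_reading(gen_27_37, l_pos))  # True: nothing forbidden follows
```

The two readings only diverge when no word follows the L, which is exactly the Genesis 27:37 situation.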

MQL also has a concept of quantifier, a bit more limited than in TF: NOTEXIST. Let's try it:

[clause
  [word FOCUS lex = 'FJM[']
  ..
  [word FOCUS lex = "L"]
  NOTEXIST [word lex = '<JN/' OR lex = 'PNH/']
]

See Dirk Roorda: Example 2: not exist

In [14]:

(verses2, words2) = getShebanqData(A, MQL_RESULTS, "2a")

160 results in 138 verses with 300 words

Now we have one result more in Text-Fabric than in SHEBANQ (161 versus 160).

In [15]:

compareResults(A, verses2, words2, tfVerses2, tfWords2)

DIFFERENCE:
('2_Samuel', 23, 5)
('2_Samuel', 14, 7)
DIFFERENCE:
174875 = F@74M
168181 = *FWM

Out[15]:

-1

2 Samuel 14:7 is skipped by SHEBANQ.

In [17]:

A.show(results2, condenseType="clause", start=58, end=58)

result 58

2_Samuel 14:7

clause

phrase

לְ

lex=L

בִלְתִּ֧י

lex=BLT/

שִׂים־

lex=FJM[

phrase

לְ

lex=L

אִישִׁ֛י

lex=>JC/

phrase

שֵׁ֥ם

lex=CM/

וּ

lex=W

שְׁאֵרִ֖ית

lex=C>RJT/

phrase

lex=<L

lex=PNH/

lex=H

lex=>DMH/

When we look it up in SHEBANQ we find this:

[Image: 2 Samuel 14:7 in SHEBANQ]

The point is that the forbidden word PNH/ turns up later in the clause, at 168188.

It turns out that the NOTEXIST operator in MQL quantifies over all words that follow from that position.

If NOTEXIST [word properties] meant that there is no word at that position with those properties, all would be well for our purposes. But it means that there is no word anywhere from that position onwards with those properties.
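A small Python model makes the two possible meanings concrete. It is a sketch of the behaviour observed in this notebook, not of the MQL engine itself. The function names are invented, and the clause is the list of lexemes shown for 2 Samuel 14:7 above.

```python
FORBIDDEN = {"<JN/", "PNH/"}


def not_at_position(lexemes, i):
    """What we wanted: the word right after position i is not forbidden."""
    return not (i + 1 < len(lexemes) and lexemes[i + 1] in FORBIDDEN)


def notexist_from_position(lexemes, i):
    """What NOTEXIST appears to do: no forbidden word anywhere after position i."""
    return not any(lex in FORBIDDEN for lex in lexemes[i + 1:])


sam_14_7 = ["L", "BLT/", "FJM[", "L", ">JC/", "CM/",
            "W", "C>RJT/", "<L", "PNH/", "H", ">DMH/"]
# Take the L that follows FJM[, not the clause-initial one.
l_pos = sam_14_7.index("L", sam_14_7.index("FJM[") + 1)

print(not_at_position(sam_14_7, l_pos))         # True: >JC/ follows the L
print(notexist_from_position(sam_14_7, l_pos))  # False: PNH/ occurs later on
```

Under the first meaning the clause is a hit. Under the second it is rejected, which is what SHEBANQ does.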

So it turns out: nice idea, but it does not work out in MQL.

Now the tide has turned: we have trouble finding an MQL query that exactly matches our intention, while in TF we can.

Still, there might be problems.

If there is a clause, with L, then a gap, and then either <JN/ or PNH/, the SHEBANQ query would skip it, but the Text-Fabric query would include it.

Let's check in Text-Fabric whether this occurs.

In [19]:

query = """
clause
  clause_atom
    word lex=L
    :=
  < clause_atom
    =: word lex=<JN/|PNH/
"""

In [20]:

results = A.search(query)

  0.88s 0 results

Nope.

But is this query itself right? Let's look for a known case, namely Joshua 8:2 above.

In [21]:

query = """
clause
  clause_atom
    word lex=L
    :=
  < clause_atom
    =: word lex=MN
"""

In [22]:

results = A.search(query)

  0.88s 2 results

In [24]:

A.show(results, condenseType="clause")

result 1

Joshua 8:2

clause

phrase

שִׂים־

lex=FJM[

phrase

לְךָ֥

lex=L

clause

phrase

מֵ

lex=MN

אַחֲרֶֽיהָ׃

lex=>XR/

result 2

Hosea 10:15

clause

phrase

כָּ֗כָה

lex=KKH

phrase

עָשָׂ֤ה

lex=<FH[

phrase

לָכֶם֙

lex=L

clause

phrase

lex=MN

lex=PNH/

lex=R<H/

lex=R<H/

Yes, this kind of query finds exactly what we are looking for.

Conclusion

In Text-Fabric we have found a query with slightly different results. But these results match the intention of the query just a bit better than the original query.

We tried to improve the MQL query by using NOTEXIST, but that did not work out.

However, the TF query might include (contrived) cases that the MQL query would rightfully skip. We can verify whether those cases actually exist by running a separate TF query, and it turns out they do not exist.

Lesson

Whenever an exegesis hinges on the results of a query, check and double check. You will probably have to run multiple queries in SHEBANQ and combine the results, which quickly gets cumbersome. When that happens, it starts to pay off to use Text-Fabric, where you have fuller control over the computations and their results.