Gaps and spans

conj and

phrase Pred VP

הֲקִמֹתִ֨י

verb arise hif perf

phrase Objc PP

prep <object marker>

בְּרִיתִ֜י

subs covenant

phrase Cmpl PP

prep interval

conj and

prep interval

conj and

prep interval

subs seed

phrase Cmpl PP

אַחֲרֶ֛יךָ

prep after

phrase Cmpl PP

prep to

דֹרֹתָ֖ם

subs generation

phrase Cmpl PP

לִ

prep to

בְרִ֣ית

subs covenant

עֹולָ֑ם

subs eternity

clause Adju InfC

phrase Pred VP

לִ

prep to

הְיֹ֤ות

verb be qal infc

phrase Cmpl PP

לְךָ֙

prep to

phrase PreC PP

לֵֽ

prep to

אלֹהִ֔ים

subs god(s)

phrase Cmpl PP

וּֽ

conj and

phrase Cmpl PP

prep to

זַרְעֲךָ֖

subs seed

phrase Cmpl PP

אַחֲרֶֽיךָ׃

prep after

result 2

Genesis 28:4

sentence 13

clause WYq0

phrase Conj CP

conj and

phrase Pred VP

יִֽתֶּן־

verb give qal impf

phrase Cmpl PP

לְךָ֙

prep to

phrase Objc PP

prep <object marker>

בִּרְכַּ֣ת

subs blessing

אַבְרָהָ֔ם

nmpr Abraham

phrase Cmpl PP

prep to

conj and

prep to

subs seed

phrase Cmpl PP

אִתָּ֑ךְ

prep together with

clause Adju InfC

phrase PreS VP

prep to

רִשְׁתְּךָ֙

verb trample down qal infc

phrase Objc PP

prep <object marker>

אֶ֣רֶץ

subs earth

מְגֻרֶ֔יךָ

subs neighbourhood

clause Attr xQtX

phrase Rela CP

אֲשֶׁר־

conj <relative>

phrase Pred VP

נָתַ֥ן

verb give qal perf

phrase Subj NP

אֱלֹהִ֖ים

subs god(s)

phrase Cmpl PP

prep to

אַבְרָהָֽם׃

nmpr Abraham

result 3

Genesis 31:16

sentence 49

clause CPen

phrase Conj CP

כִּ֣י

conj that

phrase Frnt NP

כָל־

subs whole

art the

עֹ֗שֶׁר

subs riches

clause Attr xQtX

phrase Rela CP

אֲשֶׁ֨ר

conj <relative>

phrase Pred VP

הִצִּ֤יל

verb deliver hif perf

phrase Subj NP

אֱלֹהִים֙

subs god(s)

phrase Cmpl PP

מֵֽ

prep from

אָבִ֔ינוּ

subs father

clause Resu NmCl

phrase PreC PP

לָ֥נוּ

prep to

phrase Subj PPrP

ה֖וּא

prps he

phrase PreC PP

conj and

phrase PreC PP

prep to

בָנֵ֑ינוּ

subs son

sentence 50

clause MSyn

phrase Conj CP

conj and

phrase Time AdvP

עַתָּ֗ה

advb now

sentence 51

clause xIm0

phrase Objc NP

כֹּל֩

subs whole

clause Attr xQtX

phrase Rela CP

אֲשֶׁ֨ר

conj <relative>

phrase Pred VP

אָמַ֧ר

verb say qal perf

phrase Subj NP

אֱלֹהִ֛ים

subs god(s)

phrase Cmpl PP

אֵלֶ֖יךָ

prep to

clause xIm0

phrase Pred VP

עֲשֵֽׂה׃

verb make qal impv

All gapped phrases

These were particular gaps. Now we want to get all gapped phrases.

We simply drop the requirement that the word before the gap (wPreGap) satisfies a special lexical condition.
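To see what the template below matches, here is a minimal sketch in plain Python that mimics it on toy data. The phrase names and slot numbers are hypothetical; in Text-Fabric the slot lists would come from `L.d(p, otype='word')`. It also shows why a phrase with two gaps yields two results.

```python
# Hypothetical toy data: each phrase is the sorted list of its word slots.
phrases = {
    'p1': [1, 2, 3],        # contiguous, no gap
    'p2': [4, 5, 8, 9],     # one gap: slots 6 and 7
    'p3': [10, 12, 14],     # two gaps: slot 11 and slot 13
}


def gap_triples(slots):
    """Mimic the template: wPreGap lies in the phrase, wGap follows it
    immediately but lies outside the phrase, and wGap precedes wLast."""
    inside = set(slots)
    w_last = slots[-1]                       # wLast := (last word of the phrase)
    triples = []
    for w_pre_gap in slots:                  # wPreGap embedded in the phrase
        w_gap = w_pre_gap + 1                # wPreGap <: wGap
        if w_gap not in inside and w_gap < w_last:   # p || wGap and wGap < wLast
            triples.append((w_pre_gap, w_last, w_gap))
    return triples


results = [(name, *t) for name, slots in phrases.items() for t in gap_triples(slots)]
print(results)  # [('p2', 5, 9, 6), ('p3', 10, 14, 11), ('p3', 12, 14, 13)]
```

Phrase `p3` occurs twice, once per gap, just like a gapped phrase with several gaps occurs several times in the real results.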

In [9]:

query = '''
p:phrase
  wPreGap:word
  wLast:word
  :=

wGap:word
wPreGap <: wGap
wGap < wLast

p || wGap
'''

In [10]:

results = B.search(query)

  3.55s 715 results

Not too bad! We could wait for it. Here are some results.

In [11]:

B.table(results, end=10)

n	phrase	word	word	word
1	בֵּ֤ין הַמַּ֨יִם֙ וּבֵ֣ין הַמַּ֔יִם	מַּ֨יִם֙	מַּ֔יִם	אֲשֶׁר֙
2	דֶּ֔שֶׁא עֵ֚שֶׂב עֵ֣ץ פְּרִ֞י	עֵ֚שֶׂב	פְּרִ֞י	מַזְרִ֣יעַ
3	דֶּ֠שֶׁא עֵ֣שֶׂב וְעֵ֧ץ	עֵ֣שֶׂב	עֵ֧ץ	מַזְרִ֤יעַ
4	אֶת־כָּל־עֵ֣שֶׂב׀ וְאֶת־כָּל־הָעֵ֛ץ	עֵ֣שֶׂב׀	עֵ֛ץ	זֹרֵ֣עַ
5	שְׁנֵיהֶם֙ הָֽאָדָ֖ם וְאִשְׁתֹּ֑ו	שְׁנֵיהֶם֙	אִשְׁתֹּ֑ו	עֲרוּמִּ֔ים
6	הֶ֨בֶל גַם־ה֛וּא	הֶ֨בֶל	ה֛וּא	הֵבִ֥יא
7	מִן־הַבְּהֵמָה֙ הַטְּהֹורָ֔ה וּמִן־הַ֨בְּהֵמָ֔ה וּמִ֨ן־הָעֹ֔וף וְכֹ֥ל	בְּהֵמָ֔ה	כֹ֥ל	אֲשֶׁ֥ר
8	הֵ֜מָּה וְכָל־הַֽחַיָּ֣ה לְמִינָ֗הּ וְכָל־הַבְּהֵמָה֙ לְמִינָ֔הּ וְכָל־הָרֶ֛מֶשׂ לְמִינֵ֑הוּ וְכָל־הָעֹ֣וף לְמִינֵ֔הוּ כֹּ֖ל צִפֹּ֥ור כָּל־כָּנָֽף׃	רֶ֛מֶשׂ	כָּנָֽף׃	הָ
9	כָּל־בָּשָׂ֣ר׀ בָּעֹ֤וף וּבַבְּהֵמָה֙ וּבַ֣חַיָּ֔ה וּבְכָל־הַשֶּׁ֖רֶץ וְכֹ֖ל הָאָדָֽם׃	בָּשָׂ֣ר׀	אָדָֽם׃	הָ
10	כָּל־בָּשָׂ֣ר׀ בָּעֹ֤וף וּבַבְּהֵמָה֙ וּבַ֣חַיָּ֔ה וּבְכָל־הַשֶּׁ֖רֶץ וְכֹ֖ל הָאָדָֽם׃	שֶּׁ֖רֶץ	אָדָֽם׃	הַ

If a phrase has multiple gaps, we encounter it multiple times in our results.

We show the two condensed results in Genesis 7:21.

In [12]:

B.show(results, condensed=True, start=9, end=10, colorMap={1: 'lightyellow', 2: 'yellow', 4: 'magenta'})

verse 9

Genesis 7:21

sentence 32

clause WayX

phrase Conj CP

conj and

phrase Pred VP

יִּגְוַ֞ע

verb expire qal wayq

phrase Subj NP

כָּל־

subs whole

בָּשָׂ֣ר׀

subs flesh

clause Attr Ptcp

phrase Rela CP

conj the

phrase PreC VP

רֹמֵ֣שׂ

verb creep qal ptca

phrase Cmpl PP

עַל־

prep upon

art the

אָ֗רֶץ

subs earth

clause WayX

phrase Subj NP

בָּ

prep in

art the

עֹ֤וף

subs birds

conj and

בַ

prep in

art the

בְּהֵמָה֙

subs cattle

conj and

בַ֣

prep in

art the

subs wild animal

conj and

prep in

subs whole

art the

subs swarming creatures

clause Attr Ptcp

phrase Rela CP

conj the

phrase PreC VP

שֹּׁרֵ֣ץ

verb swarm qal ptca

phrase Cmpl PP

עַל־

prep upon

art the

אָ֑רֶץ

subs earth

clause WayX

phrase Subj NP

conj and

phrase Subj NP

כֹ֖ל

subs whole

art the

אָדָֽם׃

subs human, mankind

verse 10

Genesis 9:2

sentence 6

clause WXYq

phrase Conj CP

conj and

phrase Subj NP

מֹורַאֲכֶ֤ם

subs fear

conj and

חִתְּכֶם֙

subs terror

phrase Pred VP

יִֽהְיֶ֔ה

verb be qal impf

phrase PreC PP

prep upon

subs whole

subs wild animal

art the

subs earth

conj and

prep upon

subs whole

subs birds

art the

subs heavens

clause Coor NmCl

phrase PreC PP

prep in

כֹל֩

subs whole

clause Attr xYq0

phrase Rela CP

אֲשֶׁ֨ר

conj <relative>

phrase Pred VP

תִּרְמֹ֧שׂ

verb creep qal impf

phrase Cmpl NP

הָֽ

art the

אֲדָמָ֛ה

subs soil

clause Coor NmCl

phrase PreC PP

וּֽ

conj and

phrase PreC PP

prep in

subs whole

subs fish

art the

subs sea

sentence 7

clause xQt0

phrase Cmpl PP

prep in

יֶדְכֶ֥ם

subs hand

phrase Pred VP

נִתָּֽנוּ׃

verb give nif perf

If we want just the phrases, each one only once, we can run the query in shallow mode (see advanced):

In [13]:

gapQueryResults = B.search(query, shallow=True)

  3.49s 671 results
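Conceptually, shallow mode keeps only the nodes matched by the first atom of the template and collects them in a set, which is why 715 results collapse to 671 phrases. A sketch on hypothetical result tuples of the form (phrase, wPreGap, wLast, wGap):

```python
# Hypothetical full results: (phrase, wPreGap, wLast, wGap)
results = [
    ('p2', 5, 9, 6),
    ('p3', 10, 14, 11),
    ('p3', 12, 14, 13),
]

# Keep only the first node of every result; the set removes duplicates.
shallow = {r[0] for r in results}
print(len(results), sorted(shallow))  # 3 ['p2', 'p3']
```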

A different query

We can make an equivalent query that finds the gapped phrases via their first and last words.

In [14]:

query = '''
p:phrase
    =: wFirst:word
    wLast:word
    :=

wGap:word
wFirst < wGap
wLast > wGap

p || wGap
'''

Experience has shown that this is a slow query, so we handle it with care.

In [15]:

S.study(query)
S.count(progress=1, limit=8)

   |     0.00s Feature overview: 109 for nodes; 8 for edges; 1 configs; 7 computed
  0.00s Checking search template ...
  0.00s Setting up search space for 4 objects ...
  0.38s Constraining search space with 7 relations ...
  0.40s Setting up retrieval plan ...
  0.45s Ready to deliver results from 1532939 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
  0.00s Counting results per 1 up to 8 ...
   |       43s 1
   |       43s 2
   |       43s 3
   |       43s 4
   |       43s 5
   |       43s 6
   |    1m 16s 7
   |    1m 16s 8
 1m 16s Done: 8 results

This is a good example of a query that is slow to deliver even its first result. That is bad, because it is such a straightforward query.

Why is this one so slow, while the previous one went so smoothly?

The crucial element is the word wGap. In the latter template, wGap is not embedded in anything; it is constrained only by wFirst < wGap and wLast > wGap. The search strategy examines all possibilities for wFirst < wGap and only then checks whether wLast > wGap. The algorithm cannot apply both conditions at the same time.

With embedding relations, things are better. Text-Fabric is heavily optimized to deal with embedding relationships.

In the former template, wGap is required to be adjacent to wPreGap, which is embedded in the phrase. Hence there are few cases to consider for wPreGap, and for each of them there is only one candidate wGap.
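A toy cost model makes the difference concrete. It is not Text-Fabric's actual retrieval planner, only a count of the candidates a naive strategy would inspect for a single phrase; the corpus size and the phrase are hypothetical.

```python
def candidates_free_floating(corpus_size, w_first):
    """wFirst < wGap: every later word in the corpus is a candidate for wGap;
    wLast > wGap is only checked afterwards."""
    return corpus_size - w_first


def candidates_adjacent(phrase_slots):
    """wPreGap ranges over the words of the phrase, and wPreGap <: wGap
    leaves exactly one candidate wGap per wPreGap."""
    return len(phrase_slots)


corpus_size = 400_000     # hypothetical number of words (slots 1..corpus_size)
phrase = [4, 5, 8, 9]     # hypothetical gapped phrase

print(candidates_free_floating(corpus_size, phrase[0]))  # 399996
print(candidates_adjacent(phrase))                        # 4
```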

Lesson

Avoid free-floating nodes in your template that are constrained only by spatial relationships other than embedding.

To the rescue

The former template had it right. Can we rescue the latter template?

We can assume that the phrase and the gap each contain a word, and that these two words belong to the same verse. Note that the phrase and the gap may belong to different clauses and sentences. We assume that a phrase cannot span more than two verses, so either the first or the last word of the phrase is in the same verse as a word in the gap.

In [16]:

query = '''
p:phrase
    =: wFirst:word
    wLast:word
    :=

wGap:word
wFirst < wGap
wLast > wGap

p || wGap

v:verse

v [[ wFirst
v [[ wGap
'''

In [17]:

S.study(query)
S.count(progress=100, limit=3000)

   |     0.00s Feature overview: 109 for nodes; 8 for edges; 1 configs; 7 computed
  0.00s Checking search template ...
  0.00s Setting up search space for 5 objects ...
  0.36s Constraining search space with 9 relations ...
  0.38s Setting up retrieval plan ...
  0.45s Ready to deliver results from 1556152 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
  0.00s Counting results per 100 up to 3000 ...
   |     0.50s 100
   |     1.08s 200
   |     1.48s 300
   |     1.67s 400
   |     1.81s 500
   |     2.17s 600
   |     2.25s 700
   |     2.67s 800
   |     2.81s 900
   |     3.03s 1000
   |     3.29s 1100
   |     3.48s 1200
   |     3.78s 1300
   |     4.70s 1400
   |     5.44s 1500
   |     5.89s 1600
   |     6.43s 1700
   |     6.68s 1800
   |     7.15s 1900
   |     7.68s 2000
   |     8.13s 2100
   |     8.44s 2200
   |     9.20s 2300
   |       11s 2400
   |       12s 2500
   |       12s 2600
   |       13s 2700
    13s Done: 2707 results

We are going to run this query in shallow mode.

In [18]:

results = B.search(query, shallow=True)

    14s 671 results

Shallow mode tends to be quicker, but that does not always materialize. The number of results agrees with the first query. Yet we have been lucky, because we required the word in the gap to be in the same verse as the first word of the phrase. What if we require it to be in the same verse as the last word of the phrase instead?

In [19]:

query = '''
p:phrase
    =: wFirst:word
    wLast:word
    :=

wGap:word
wFirst < wGap
wLast > wGap

p || wGap

v:verse

v [[ wLast
v [[ wGap
'''

In [20]:

results = B.search(query, shallow=True)

    42s 660 results

In that case we would not have found all results: 660 instead of 671.

So, this road, although doable, is much less comfortable, performance-wise and logic-wise.
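The verse argument can be checked on toy data. The sketch below uses a hypothetical slot-to-verse mapping and a phrase that straddles a verse boundary, with its gap after the boundary. Anchoring the gap to the verse of the first word misses it, anchoring it to the verse of the last word finds it, and only checking both covers every phrase that spans at most two verses.

```python
# Hypothetical mapping from word slot to verse.
verse_of = {1: 'v1', 2: 'v1', 3: 'v1', 4: 'v2', 5: 'v2', 6: 'v2'}
phrase = [2, 3, 5, 6]     # gap at slot 4, which lies in verse v2


def gap_words(slots):
    """Slots between the first and last word that are not in the phrase."""
    inside = set(slots)
    return [w for w in range(slots[0], slots[-1] + 1) if w not in inside]


def shares_verse(slots, anchor):
    """True if some gap word is in the same verse as the anchor word."""
    return any(verse_of[w] == verse_of[anchor] for w in gap_words(slots))


via_first = shares_verse(phrase, phrase[0])
via_last = shares_verse(phrase, phrase[-1])
print(via_first, via_last, via_first or via_last)  # False True True
```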

Check the gaps

In this misty landscape of gaps we need some corroboration that we found the right results.

Is every node in gapQueryResults a phrase?
Does every phrase in gapQueryResults have a gap?
Is every gapped phrase contained in gapQueryResults?

We check all this by hand coding.

Here is a function that checks whether a phrase has a gap. If the number of slots from its first word to its last word (inclusive) is greater than the number of words it contains, it must have a gap.

In [21]:

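def hasGap(p):
    # word slots of the phrase, in corpus order
    words = L.d(p, otype='word')
    # more slots spanned than words contained means some slot is skipped
    return words[-1] - words[0] + 1 > len(words)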
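The same test works on any sorted list of distinct slot numbers, which makes it easy to try out without loading the corpus. The slot lists below are made up:

```python
def has_gap(slots):
    """slots: sorted, distinct word slot numbers of one phrase."""
    return slots[-1] - slots[0] + 1 > len(slots)


print(has_gap([4, 5, 6]))  # False: 3 slots spanned, 3 words
print(has_gap([4, 5, 8]))  # True: 5 slots spanned, only 3 words
```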

Now we can perform the checks.

In [22]:

otypesGood = True
haveGaps = True

for p in gapQueryResults:
    otype = F.otype.v(p)
    if otype != 'phrase':
        print(f'Non phrase detected: {p} is a {otype}')
        otypesGood = False
        break

    if not hasGap(p):
        print(f'Phrase without a gap: {p}')
        B.pretty(p)
        haveGaps = False
        break

print(f'{len(gapQueryResults)} nodes in query result')
if otypesGood:
    print('1. all nodes are phrases')
if haveGaps:
    print('2. all nodes have gaps')

inResults = True
for p in F.otype.s('phrase'):
    if hasGap(p):
        if p not in gapQueryResults:
            print(f'Gapped phrase outside query results: {p}')
            B.pretty(p)
            inResults = False
            break
            
if inResults:
    print('3. all gapped phrases are contained in the results')

671 nodes in query result
1. all nodes are phrases
2. all nodes have gaps
3. all gapped phrases are contained in the results

Note that by hand coding we can get the gapped phrases much more quickly and securely!
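The three checks amount to set comparisons. A minimal sketch on hypothetical nodes, where `has_gap` is the plain-list version of `hasGap` and `query_results` stands in for `gapQueryResults`:

```python
def has_gap(slots):
    return slots[-1] - slots[0] + 1 > len(slots)


# Hypothetical nodes, their types and their word slots.
otype = {100: 'phrase', 101: 'phrase', 102: 'phrase'}
phrase_slots = {100: [1, 2, 3], 101: [4, 5, 8, 9], 102: [10, 12, 14]}
query_results = {101, 102}   # hypothetical output of the gap query

gapped_by_hand = {p for p, slots in phrase_slots.items() if has_gap(slots)}

check1 = all(otype[p] == 'phrase' for p in query_results)  # only phrases
check2 = query_results <= gapped_by_hand                   # all have gaps
check3 = gapped_by_hand <= query_results                   # none missing
print(check1, check2, check3)  # True True True
```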

Custom sets for (non-)gapped phrases

We have obtained a set with all gapped phrases, and we have paid a price:

either an expensive query,
or an inconvenient bit of hand coding.

It would be nice if we could kick-start our queries using this set as a given. And that is exactly what we are going to do now.

We make two custom sets and give them names: one for gapped phrases and one for non-gapped phrases.

In [23]:

customSets = dict(
    gapphrase=gapQueryResults,
    conphrase=set(F.otype.s('phrase')) - gapQueryResults,
)
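Since gapQueryResults is a set, conphrase is its complement within all phrases, so the two custom sets partition the phrases. A sketch with hypothetical node numbers:

```python
all_phrases = {100, 101, 102, 103}   # hypothetical phrase nodes
gapphrase = {101, 102}               # hypothetical gapped phrases
conphrase = all_phrases - gapphrase  # the non-gapped rest

print(sorted(conphrase))                     # [100, 103]
print(gapphrase.isdisjoint(conphrase))       # True
print(gapphrase | conphrase == all_phrases)  # True
```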

Suppose we want all verbs that occur in a gapped phrase.

In [24]:

query = '''
gapphrase
  word sp=verb
'''

Note that we have used the foreign name gapphrase in our search template, instead of phrase.

But we can still run search(), provided we tell it what we mean by gapphrase. We do that by passing the sets parameter to search(), which should be a dictionary of sets. Search will look up gapphrase in this dictionary, and will use its value, which should be a node set. That way, it understands that the expression gapphrase stands for the nodes in the given node set.

Here we go:

In [25]:

results = B.search(query, sets=customSets)

  1.03s 93 results
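How search uses the sets parameter can be pictured with a hypothetical resolver: a name in the template is looked up in the dictionary and, if found, its node set is used. This is an illustration only, not Text-Fabric's internals; in particular the fallback to node types and the order of lookup are assumptions.

```python
# Hypothetical nodes and their types.
otype = {1: 'word', 2: 'word', 100: 'phrase', 101: 'phrase'}
custom_sets = {'gapphrase': {101}}


def resolve(name, sets):
    """Assumed order: custom set names first, then node types."""
    if name in sets:
        return set(sets[name])
    return {n for n, t in otype.items() if t == name}


print(sorted(resolve('phrase', custom_sets)))     # [100, 101]
print(sorted(resolve('gapphrase', custom_sets)))  # [101]
```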

In [26]:

B.show(results, start=1, end=3)

verse 1

Genesis 30:20

sentence 64

clause WayX

phrase Conj CP

conj and

phrase Pred VP

תֹּ֣אמֶר

verb say qal wayq

phrase Subj PrNP

לֵאָ֗ה

nmpr Leah

sentence 65

clause ZQtX

phrase PreO VP

זְבָדַ֨נִי

verb bestow qal perf

phrase Subj NP

אֱלֹהִ֥ים׀

subs god(s)

phrase PreO VP

אֹתִי֮

prep <object marker>

phrase Objc NP

זֵ֣בֶד

subs endowment

טֹוב֒

adjv good

sentence 66

clause xYqX

phrase Modi NP

art the

פַּ֨עַם֙

subs foot

phrase PreO VP

יִזְבְּלֵ֣נִי

verb tolerate qal impf

phrase Subj NP

אִישִׁ֔י

subs man

sentence 67

clause xQt0

phrase Conj CP

כִּֽי־

conj that

phrase Pred VP

יָלַ֥דְתִּי

verb bear qal perf

phrase Cmpl PP

לֹ֖ו

prep to

phrase Objc NP

שִׁשָּׁ֣ה

subs six

בָנִ֑ים

subs son

sentence 68

clause Way0

phrase Conj CP

conj and

phrase Pred VP

תִּקְרָ֥א

verb call qal wayq

phrase Objc PP

prep <object marker>

שְׁמֹ֖ו

subs name

phrase Objc PrNP

זְבֻלֽוּן׃

nmpr Zebulun

verse 2

Genesis 30:35

sentence 118

clause Way0

phrase Conj CP

conj and

phrase Pred VP

יָּ֣סַר

verb turn aside hif wayq

phrase Time PP

בַּ

prep in

art the

יֹּום֩

subs day

art the

ה֨וּא

prde he

phrase Objc PP

prep <object marker>

art the

subs he-goat

art the

adjv twisted

conj and

art the

adjv patch qal ptcp

phrase Objc PP

conj and

phrase Objc PP

prep <object marker>

subs whole

art the

subs goat

art the

adjv speckled

conj and

art the

adjv patch qal ptcp

phrase Objc PP

כֹּ֤ל

subs whole

clause Attr NmCl

phrase Rela CP

אֲשֶׁר־

conj <relative>

phrase Subj NP

לָבָן֙

subs white

phrase PreC PP

בֹּ֔ו

prep in

clause Way0

phrase Objc PP

וְ

conj and

phrase Objc PP

כָל־

subs whole

ח֖וּם

adjv ruttish

phrase Objc PP

בַּ

prep in

art the

כְּשָׂבִ֑ים

subs young ram

sentence 119

clause Way0

phrase Conj CP

conj and

phrase Pred VP

יִּתֵּ֖ן

verb give qal wayq

phrase Cmpl PP

prep in

יַד־

subs hand

בָּנָֽיו׃

subs son

verse 3

Genesis 40:5

sentence 8

clause WayX

phrase Conj CP

conj and

phrase Pred VP

יַּֽחַלְמוּ֩

verb dream qal wayq

phrase Objc NP

חֲלֹ֨ום

subs dream

phrase Subj NP

שְׁנֵיהֶ֜ם

subs two

clause Ellp

phrase Subj NP

אִ֤ישׁ

subs man

phrase Objc NP

חֲלֹמֹו֙

subs dream

phrase Time PP

prep in

לַ֣יְלָה

subs night

אֶחָ֔ד

subs one

clause Ellp

phrase Subj NP

אִ֖ישׁ

subs man

phrase Adju PP

כְּ

prep as

פִתְרֹ֣ון

subs interpretation

חֲלֹמֹ֑ו

subs dream

clause WayX

phrase Subj NP

art the

מַּשְׁקֶ֣ה

subs give drink hif ptca

conj and

art the

אֹפֶ֗ה

subs baker

clause Attr NmCl

phrase Rela CP

אֲשֶׁר֙

conj <relative>

phrase PreC PP

prep to

מֶ֣לֶךְ

subs king

מִצְרַ֔יִם

nmpr Egypt

clause Attr Ptcp

phrase Rela CP

אֲשֶׁ֥ר

conj <relative>

phrase PreC VP

אֲסוּרִ֖ים

verb bind qal ptcp

phrase Cmpl PP

prep in

subs house

art the

subs prison

That looks good.

We can also apply feature conditions to gapphrase:

In [27]:

query = '''
gapphrase function=Subj
'''
results = B.search(query, sets=customSets)
B.show(results, start=1, end=3)

  0.00s 176 results

verse 1

Genesis 2:25

sentence 59

clause WayX

phrase Conj CP

conj and

phrase Pred VP

יִּֽהְי֤וּ

verb be qal wayq

phrase Subj NP

שְׁנֵיהֶם֙

subs two

phrase PreC AdjP

עֲרוּמִּ֔ים

adjv naked

phrase Subj NP

art the

subs human, mankind

conj and

subs woman

sentence 60

clause WxY0

phrase Conj CP

conj and

phrase Nega NegP

לֹ֖א

nega not

phrase Pred VP

יִתְבֹּשָֽׁשׁוּ׃

verb be ashamed hit impf

verse 2

Genesis 4:4

sentence 11

clause WXQt

phrase Conj CP

conj and

phrase Subj PrNP

הֶ֨בֶל

nmpr Abel

phrase Pred VP

הֵבִ֥יא

verb come hif perf

phrase Subj PrNP

גַם־

advb even

ה֛וּא

prps he

phrase Cmpl PP

prep from

subs first-born

subs cattle

conj and

prep from

subs fat

sentence 12

clause WayX

phrase Conj CP

conj and

phrase Pred VP

יִּ֣שַׁע

verb look qal wayq

phrase Subj PrNP

יְהוָ֔ה

nmpr YHWH

phrase Cmpl PP

prep to

nmpr Abel

conj and

prep to

subs present

verse 3

Genesis 7:14

sentence 17

clause Ellp

phrase Subj PPrP

הֵ֜מָּה

prps they

וְ

conj and

כָל־

subs whole

הַֽ

art the

חַיָּ֣ה

subs wild animal

phrase Subj PPrP

לְ

prep to

מִינָ֗הּ

subs kind

phrase Subj PPrP

וְ

conj and

phrase Subj PPrP

כָל־

subs whole

הַ

art the

בְּהֵמָה֙

subs cattle

phrase Subj PPrP

לְ

prep to

מִינָ֔הּ

subs kind

phrase Subj PPrP

וְ

conj and

phrase Subj PPrP

כָל־

subs whole

הָ

art the

רֶ֛מֶשׂ

subs creeping animals

clause Attr Ptcp

phrase Rela CP

conj the

phrase PreC VP

רֹמֵ֥שׂ

verb creep qal ptca

phrase Cmpl PP

עַל־

prep upon

art the

אָ֖רֶץ

subs earth

clause Ellp

phrase Subj PPrP

לְ

prep to

מִינֵ֑הוּ

subs kind

phrase Subj PPrP

וְ

conj and

phrase Subj PPrP

כָל־

subs whole

הָ

art the

עֹ֣וף

subs birds

phrase Subj PPrP

לְ

prep to

מִינֵ֔הוּ

subs kind

phrase Subj PPrP

כֹּ֖ל

subs whole

צִפֹּ֥ור

subs bird

כָּל־

subs whole

כָּנָֽף׃

subs wing

Two-phrase clauses

We can find the gaps, but do our minds always reckon with gaps? Gaps cause unexpected semantics. Here is a little puzzle.

Suppose we want to count the clauses consisting of exactly two phrases.

Here follows a little journey. We use a query to find the clauses, check the result with hand-coding, scratch our heads, and refine the query, the hand-coding and our question until we are satisfied.

Attempt 1

By query

The following template should do it: a clause, starting with a phrase, followed by an adjacent phrase, which terminates the clause.

In [28]:

query = '''
clause
    =: phrase
    <: phrase
    :=
'''
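Spelled out on slot lists, the template requires four things. A sketch, assuming that `=:` means "starts at the same slot", `:=` means "ends at the same slot" and `<:` means "the last slot of the left node is immediately followed by the first slot of the right node". The slot lists in the usage lines are made up.

```python
def matches_template(clause, first, second):
    """Check clause =: first <: second := clause on sorted slot lists."""
    embedded = set(first) <= set(clause) and set(second) <= set(clause)
    starts_together = clause[0] == first[0]      # clause =: first
    adjacent = first[-1] + 1 == second[0]        # first <: second
    ends_together = clause[-1] == second[-1]     # second := clause
    return embedded and starts_together and adjacent and ends_together


print(matches_template([1, 2, 3, 4], [1, 2], [3, 4]))     # True
print(matches_template([1, 2, 3, 4, 5], [1, 2], [3, 4]))  # False: clause does not end with the second phrase
```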

In [29]:

results = B.search(query)
B.table(results, end=10)

  1.11s 23483 results

n	clause	phrase	phrase
1	יְהִ֣י אֹ֑ור	יְהִ֣י	אֹ֑ור
2	כִּי־טֹ֑וב	כִּי־	טֹ֑וב
3	אֲשֶׁר֙ מִתַּ֣חַת לָרָקִ֔יעַ	אֲשֶׁר֙	מִתַּ֣חַת לָרָקִ֔יעַ
4	אֲשֶׁ֖ר מֵעַ֣ל לָרָקִ֑יעַ	אֲשֶׁ֖ר	מֵעַ֣ל לָרָקִ֑יעַ
5	כִּי־טֹֽוב׃	כִּי־	טֹֽוב׃
6	מַזְרִ֣יעַ זֶ֔רַע	מַזְרִ֣יעַ	זֶ֔רַע
7	כִּי־טֹֽוב׃	כִּי־	טֹֽוב׃
8	לְהַבְדִּ֕יל בֵּ֥ין הַיֹּ֖ום וּבֵ֣ין הַלָּ֑יְלָה	לְהַבְדִּ֕יל	בֵּ֥ין הַיֹּ֖ום וּבֵ֣ין הַלָּ֑יְלָה
9	לְהָאִ֖יר עַל־הָאָ֑רֶץ	לְהָאִ֖יר	עַל־הָאָ֑רֶץ
10	לְהָאִ֖יר עַל־הָאָֽרֶץ׃	לְהָאִ֖יר	עַל־הָאָֽרֶץ׃

If we want to have the clauses only, we run it in shallow mode:

In [30]:

clausesByQuery = sorted(B.search(query, shallow=True))

  1.06s 23483 results

By hand

Let us check this with a piece of hand-written code. We want clauses that consist of exactly two phrases.

In [31]:

indent(reset=True)
info('counting ...')

clausesByHand = []
for clause in F.otype.s('clause'):
    phrases = L.d(clause, otype='phrase')
    if len(phrases) == 2:
        clausesByHand.append(clause)
clausesByHand = sorted(clausesByHand)
info(f'Done: found {len(clausesByHand)}')

  0.00s counting ...
  0.82s Done: found 23862

The difference

Strange, we end up with too many cases. What is happening? Let us compare the results. We look at the first result where both methods diverge.

We put the difference finding in a little function.

In [32]:

def showDiff(queryResults, handResults):
    diff = [x for x in zip(queryResults, handResults) if x[0] != x[1]]
    if not diff:
        print(f'''
{len(queryResults):>6} queryResults
         are identical with
{len(handResults):>6} handResults
''')
        return
    (rQuery, rHand) = diff[0]
    if rQuery < rHand:
        print(f'clause {rQuery} is a query result but not found by hand')
        toShow = rQuery
    else:
        print(f'clause {rHand} is not a query result but has been found by hand')
        toShow = rHand
    colors = ['aqua', 'aquamarine', 'khaki', 'lavender', 'yellow']
    highlights = {}
    for (i, phrase) in enumerate(L.d(toShow, otype='phrase')):
        highlights[phrase] = colors[i % len(colors)]
        # for atom in L.d(phrase, otype='phrase_atom'):
        #     highlights[atom] = colors[i % len(colors)]
    B.pretty(toShow, withNodes=True, suppress={'lex', 'sp', 'vt', 'vs'}, highlights=highlights)
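showDiff only reports the first divergence. Moreover, because `zip` stops at the shorter list, a result list that is a prefix of the other yields no divergence at all, and only the printed counts reveal the mismatch. As an alternative, a set-based sketch lists all differences at once (the input lists are made up):

```python
def all_diffs(query_results, hand_results):
    """Return (only in query results, only in hand results), both sorted."""
    q = set(query_results)
    h = set(hand_results)
    return sorted(q - h), sorted(h - q)


print(all_diffs([1, 2, 4, 5], [1, 3, 4, 5]))  # ([2], [3])
print(all_diffs([1, 2], [1, 2, 3]))           # ([], [3])
```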

In [33]:

showDiff(clausesByQuery, clausesByHand)

clause 427931 is not a query result but has been found by hand

427931

clause 427931 XYqt

phrase 652631 Subj NP

1904

כָל־

subs whole

clause 427931 XYqt

phrase 652633 PreO VP

1906

יַֽהַרְגֵֽנִי׃

verb kill

Lo and behold:

the hand-written code is right in a sense: this is a clause that consists exactly of two phrases.
the query is also right in a sense: the two phrases are not adjacent: there is a gap in the clause between them!
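The discrepancy can be reproduced with the word slots of clause 427931 shown above: counting phrases accepts the clause, while the adjacency required by `<:` rejects it.

```python
clause = [1904, 1906]          # the clause skips word 1905
phrases = [[1904], [1906]]     # the Subj phrase and the PreO phrase

by_count = len(phrases) == 2                        # attempt 1, by hand
by_adjacency = phrases[0][-1] + 1 == phrases[1][0]  # what <: requires
print(by_count, by_adjacency)  # True False
```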

Attempt 2

By hand

We modify the hand-written code so that a clause only qualifies if its two phrases are adjacent.

In [34]:

indent(reset=True)
info('counting ...')

clausesByHand2 = []
for clause in F.otype.s('clause'):
    phrases = L.d(clause, otype='phrase')
    if len(phrases) == 2:
        if L.d(phrases[0], otype='word')[-1] + 1 == L.d(phrases[1], otype='word')[0]:
            clausesByHand2.append(clause)
clausesByHand2 = sorted(clausesByHand2)
info(f'Done: found {len(clausesByHand2)}')

  0.00s counting ...
  1.00s Done: found 23399

The difference

Now we have too few cases. What is going on?

In [35]:

showDiff(clausesByQuery, clausesByHand2)

clause 428692 is a query result but not found by hand

428692

clause 428692 WxQ0

phrase 655060 Conj CP

6514

conj and

phrase 655061 Objc PP

6515

גַם֩

advb even

6516

prep <object marker>

6517

לֹ֨וט

nmpr Lot

phrase 655061 Objc PP

6518

אָחִ֤יו

subs brother

phrase 655061 Objc PP

6519

conj and

phrase 655061 Objc PP

6520

רְכֻשֹׁו֙

subs property

phrase 655062 Pred VP

6521

הֵשִׁ֔יב

verb return

phrase 655061 Objc PP

6522

conj and

phrase 655061 Objc PP

6523

גַ֥ם

advb even

6524

prep <object marker>

6525

art the

6526

נָּשִׁ֖ים

subs woman

6527

conj and

6528

prep <object marker>

6529

art the

6530

עָֽם׃

subs people

Observe:

This clause has three phrases, but the third one (the predicate) lies inside the gap of the second one.

the hand-written code is right in a sense: this clause has three phrases.
the query is right in a sense: it contains two adjacent phrases that together span the whole clause.
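Using the word slots of clause 428692 shown above, a sketch that checks the template for the Conj and Objc phrases and counts all three phrases. The `matches_template` helper restates the template under the same assumptions about `=:`, `<:` and `:=` as before.

```python
def matches_template(clause, first, second):
    """Check clause =: first <: second := clause on sorted slot lists."""
    embedded = set(first) <= set(clause) and set(second) <= set(clause)
    return (embedded and clause[0] == first[0]
            and first[-1] + 1 == second[0] and clause[-1] == second[-1])


conj = [6514]                                              # Conj phrase
objc = list(range(6515, 6521)) + list(range(6522, 6531))   # Objc phrase, gap at 6521
pred = [6521]                                              # Pred phrase, inside that gap
clause = sorted(conj + objc + pred)

print(matches_template(clause, conj, objc))  # True: the query accepts the clause
print(len([conj, objc, pred]))               # 3: the hand count rejects it
```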

Attempt 3

By query

Can we adjust the pattern to exclude cases like this? Yes, with custom sets (see advanced).

Instead of looking at all phrases, we consider only non-gapped phrases.

Earlier in this notebook we have constructed the set of non-gapped phrases and put it under the name conphrase in the custom sets.

In [36]:

query = '''
clause
    =: conphrase
    <: conphrase
    :=
'''

clausesByQuery2 = sorted(B.search(query, sets=customSets, shallow=True))

  1.32s 23327 results
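Restricting the template to non-gapped phrases removes the match for clause 428692. Continuing with its word slots, a sketch that builds the non-gapped phrases and tries every ordered pair against the template (the simplified `matches_template` omits the embedding check, since all phrases lie inside the clause by construction):

```python
def has_gap(slots):
    return slots[-1] - slots[0] + 1 > len(slots)


def matches_template(clause, first, second):
    """clause =: first <: second := clause, on sorted slot lists."""
    return (clause[0] == first[0] and first[-1] + 1 == second[0]
            and clause[-1] == second[-1])


phrases = {
    'Conj': [6514],
    'Objc': list(range(6515, 6521)) + list(range(6522, 6531)),
    'Pred': [6521],
}
clause = sorted(w for slots in phrases.values() for w in slots)

conphrase = {name for name, slots in phrases.items() if not has_gap(slots)}
matches = [
    (a, b)
    for a in sorted(conphrase)
    for b in sorted(conphrase)
    if a != b and matches_template(clause, phrases[a], phrases[b])
]
print(sorted(conphrase))  # ['Conj', 'Pred']
print(matches)            # []
```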

The difference

There is still a difference.

In [37]:

showDiff(clausesByQuery2, clausesByHand2)

clause 428374 is not a query result but has been found by hand

428374

clause 428374 Ellp

phrase 654063 Conj CP

4718

וְֽ

conj and

phrase 654064 Objc PP

4719

prep <object marker>

4720

פַּתְרֻסִ֞ים

nmpr Pathrusites

4721

conj and

4722

prep <object marker>

4723

כַּסְלֻחִ֗ים

nmpr Casluhites

clause 428374 Ellp

phrase 654064 Objc PP

4729

conj and

phrase 654064 Objc PP

4730