You might want to consider the start of this tutorial.

In [1]:

%load_ext autoreload
%autoreload 2

In [2]:

from tf.fabric import Fabric
from tf.extra.bhsa import Bhsa

In [3]:

VERSION = '2017'
DATABASE = '~/github/etcbc'
BHSA = f'bhsa/tf/{VERSION}'
PARA = f'parallels/tf/{VERSION}'
TF = Fabric(locations=[DATABASE], modules=[BHSA, PARA], silent=True )

In [4]:

api = TF.load('', silent=True)
api.makeAvailableIn(globals())
B = Bhsa(api, 'search', version=VERSION)

Documentation: BHSA, Feature docs, BHSA API, Text-Fabric API 5.5.12, Search Reference

This notebook online: NBViewer, GitHub

Quantifiers¶

Disclaimer¶

This part of search templates is still experimental: bugs may be discovered, and the syntax of quantifiers may change.

Quantifiers add considerable power to search templates.

Quantifiers consist of full-fledged search templates themselves, and give rise to auxiliary searches being performed.

In many cases, quantifiers spare you from resorting to hand-coding. That said, they can be exceedingly tricky, so it is advisable to check the results by hand-coding anyway until you are perfectly comfortable with them.

Examples¶

Lexemes¶

It is easy to find the lexemes that occur in a specific book only, because the lex node of such a lexeme is contained in the node of that specific book.

Let's get the lexemes specific to Ezra and then those specific to Nehemiah.

In [5]:

query = '''
book book@en=Ezra
    lex
'''
ezLexemes = B.search(query)
ezSet = {r[1] for r in ezLexemes}

query = '''
book book@en=Nehemiah
    lex
'''
nhLexemes = B.search(query)
nhSet = {r[1] for r in nhLexemes}

print(f'Total {len(ezSet | nhSet)} lexemes')

  0.02s 199 results
  0.01s 110 results
Total 309 lexemes

What if we want the lexemes that occur only in Ezra and Nehemiah?

If such a lexeme occurs in both books, its lex node is not contained in either book node, so the two queries above have missed it.

We have to find a different way. Something like: search for lexemes of which all words occur either in Ezra or in Nehemiah.

With the template constructions you have seen so far, this is impossible to say.

This is where quantifiers come in.
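To make the gap concrete, here is a minimal sketch in plain Python on invented toy data. It does not use Text-Fabric; the lexeme names and the `occurrences` mapping are hypothetical and only illustrate why the union of the two per-book results misses lexemes shared by both books.

```python
# Toy data (hypothetical): for each lexeme, the set of books in which its words occur.
occurrences = {
    'lexA': {'Ezra'},
    'lexB': {'Nehemiah'},
    'lexC': {'Ezra', 'Nehemiah'},
    'lexD': {'Ezra', 'Genesis'},
}


def only_in(books):
    """Return the lexemes all of whose occurrences fall inside the given books."""
    return {lex for lex, where in occurrences.items() if where <= books}


ez_only = only_in({'Ezra'})
nh_only = only_in({'Nehemiah'})
ez_nh_only = only_in({'Ezra', 'Nehemiah'})

# The union of the per-book results misses lexC, which occurs in both books.
print(sorted(ez_only | nh_only))
print(sorted(ez_nh_only))
```

In this toy setting `lexC` shows up only in the last set, which is exactly the kind of lexeme the two separate queries cannot find.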

/without/¶

First we query for these lexemes by means of a /without/ quantifier: we ask for lexemes without any word in a book other than Ezra or Nehemiah.

In [6]:

query = '''
lex
/without/
book book@en#Ezra|Nehemiah
  w:word
  w ]] ..
/-/
'''
query1results = B.search(query, shallow=True)

 0 parent:lex
 1 books book@en#Ezra|Nehemiah
 2   w:word
 3   w ]] parent
Unknown object type in line 1: "books"
Valid object types are: book,chapter,lex,verse,half_verse,sentence,sentence_atom,clause,clause_atom,phrase,phrase_atom,subphrase,word

  0.01s 9233 results

Note: the recorded output above stems from a run in which the template read books instead of book, so the quantifier could not be applied and the count of 9233 does not reflect the template as shown. With the template as shown, the results coincide with the hand-coded check further down (382 lexemes).

/where/¶

Now the /without/ quantifier is a bit of a roundabout way to say what you really mean. We can also employ the more positive /where/ quantifier.

In [7]:

query = '''
lex
/where/
  w:word
/have/
b:book book@en=Ezra|Nehemiah
w ]] b
/-/
'''
query2results = B.search(query, shallow=True)

  0.86s 382 results
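
The /without/ and /where/ templates state the same condition from opposite sides: "no word lies outside Ezra or Nehemiah" versus "every word lies in Ezra or Nehemiah". The following is a minimal sketch of that equivalence in plain Python, assuming invented toy data (the `word_books` mapping is hypothetical and no Text-Fabric call is involved).

```python
# Toy data (hypothetical): lexeme -> the book of each of its word occurrences.
word_books = {
    'lexA': ['Ezra', 'Ezra'],
    'lexB': ['Ezra', 'Nehemiah'],
    'lexC': ['Nehemiah', 'Genesis'],
}
target = {'Ezra', 'Nehemiah'}

# /without/ style: keep a lexeme if NO word lies in a book outside the target.
without_style = {
    lex for lex, books in word_books.items()
    if not any(b not in target for b in books)
}

# /where/ ... /have/ style: keep a lexeme if ALL its words lie in a target book.
where_style = {
    lex for lex, books in word_books.items()
    if all(b in target for b in books)
}

print(sorted(without_style))
print(without_style == where_style)
```

Both formulations keep `lexA` and `lexB` and drop `lexC`; which one to write is a matter of readability.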

Check by hand coding:

In [8]:

indent(reset=True)
universe = F.otype.s('lex')
wordsEzNh = set(
    L.d(T.bookNode('Ezra', lang='en'), otype='word') + 
    L.d(T.bookNode('Nehemiah', lang='en'), otype='word')
)
handResults = set()
for lex in universe:
    occs = set(L.d(lex, otype='word'))
    if occs <= wordsEzNh:
        handResults.add(lex)
info(len(handResults))

  0.19s 382

Looks good, but we are thorough:

In [9]:

print(query1results == handResults)
print(query2results == handResults)

True
True

Verb phrases¶

Let's look for clauses in which all Pred phrases contain only verbs, and look for Subj phrases in those clauses.

In [10]:

query = '''
clause
/where/
  phrase function=Pred
/have/
  /without/
    word sp#verb
  /-/
/-/
  phrase function=Subj
'''
queryResults = B.search(query)

B.show(queryResults, end=5)

  2.65s 31399 results

(Rich display of verses 1 to 5 of the results: Genesis 1:1 through Genesis 1:5. Each verse is shown with its sentences, clauses, phrases with function and type, and words with part of speech, gloss and, for verbs, stem and tense.)

Note that the pieces of template that belong to a quantifier do not correspond to nodes in the result tuples!
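
A minimal sketch of this behaviour in plain Python, assuming invented toy data (the `clauses` mapping and the helper `passes_quantifier` are hypothetical, not part of Text-Fabric): the quantifier acts as a filter on clauses, while the result tuples hold only the clause and the Subj phrase.

```python
# Toy data (hypothetical): clause -> list of (phrase function, parts of speech of its words).
clauses = {
    'c1': [('Pred', ['verb']), ('Subj', ['subs'])],
    'c2': [('Pred', ['nega', 'verb']), ('Subj', ['subs'])],
    'c3': [('Subj', ['subs']), ('PreC', ['prep', 'subs'])],
}


def passes_quantifier(phrases):
    """True if every Pred phrase consists of verbs only (vacuously true without Pred)."""
    return all(
        all(sp == 'verb' for sp in words)
        for function, words in phrases
        if function == 'Pred'
    )


# Result tuples: (clause, index of a Subj phrase). No Pred phrase or word appears in them.
results = [
    (clause, i)
    for clause, phrases in clauses.items()
    if passes_quantifier(phrases)
    for i, (function, _) in enumerate(phrases)
    if function == 'Subj'
]
print(results)
```

Here `c2` is filtered out because its Pred phrase contains a non-verb, and `c3` passes because it has no Pred phrase at all. The tuples have two members, just as the template has two non-quantifier lines.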

Check by hand:

In [11]:

indent(reset=True)
handResults = []
for clause in F.otype.s('clause'):
    phrases = L.d(clause, otype='phrase')
    preds = [p for p in phrases if F.function.v(p) == 'Pred']
    good = True
    for pred in preds:
        if any(F.sp.v(w) != 'verb' for w in L.d(pred, otype='word')):
            good = False
    if good:
        subjs = [p for p in phrases if F.function.v(p) == 'Subj']
        for subj in subjs:
            handResults.append((clause, subj))
info(len(handResults))

  1.34s 31399

In [12]:

queryResults == handResults

Out[12]:

True

Inspection¶

We can see which templates are being composed in the course of interpreting the quantifier. We use the good old S.study():

In [13]:

query = '''
clause
/where/
  phrase function=Pred
/have/
  /without/
    word sp#verb
  /-/
/-/
  phrase function=Subj
'''
S.study(query)

   |     0.00s Feature overview: 109 for nodes; 8 for edges; 1 configs; 7 computed
  0.00s Checking search template ...
  0.00s Setting up search space for 2 objects ...
   |     0.00s "Quantifier on "parent:clause"
   |      |   /where/
   |      |   parent:clause
   |      |     phrase function=Pred
   |      |     0.70s 57070 matching nodes
   |      |   /have/
   |      |   parent:clause
   |      |     phrase function=Pred
   |      |     /without/
   |      |       word sp#verb
   |      |     /-/
   |      |   /-/
   |      |     0.00s "Quantifier on "parent:phrase function=Pred"
   |      |      |   /without/
   |      |      |   parent:phrase function=Pred
   |      |      |     word sp#verb
   |      |      |   /-/
   |      |      |     1.73s 4893 nodes to exclude
   |      |     1.76s reduction from 57070 to 52177 nodes
   |      |     2.02s 52177 matching nodes
   |      |     2.05s 4893 match antecedent but not consequent
   |     2.03s reduction from 88101 to 83208 nodes
  2.43s Constraining search space with 1 relations ...
  2.43s Setting up retrieval plan ...
  2.43s Ready to deliver results from 115154 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results

Observe the stepwise unraveling of the quantifiers, and the auxiliary templates that are distilled from your original template.

If you ever get syntax errors, run S.study() to find clues.

Subject at start or at end¶

We want the clauses that contain at least two adjacent phrases and have a Subj phrase that is either at the beginning or at the end of the clause.

In [14]:

query = '''
c:clause
/with/
  =: phrase function=Subj
/or/
  := phrase function=Subj
/-/
  phrase
  <: phrase
'''

queryResults = sorted(B.search(query, shallow=True))

  1.00s 15332 results

Check by hand:

In [15]:

indent(reset=True)
handResults = []
for clause in F.otype.s('clause'):
    clauseWords = L.d(clause, otype='word')
    phrases = set(L.d(clause, otype='phrase'))
    # the clause must contain two adjacent phrases
    if any(L.n(p, otype='phrase') and (L.n(p, otype='phrase')[0] in phrases) for p in phrases):
        subjPhrases = [p for p in phrases if F.function.v(p) == 'Subj']
        if (
            any(L.d(p, otype='word')[0] == clauseWords[0] for p in subjPhrases)
            or
            any(L.d(p, otype='word')[-1] == clauseWords[-1] for p in subjPhrases)
        ):
            handResults.append(clause)
info(len(handResults))

  2.93s 15332

A nice case where the search template performs better than this particular piece of hand-coding.

In [16]:

queryResults == handResults

Out[16]:

True

Let's also study this query:

In [17]:

S.study(query)

   |     0.00s Feature overview: 109 for nodes; 8 for edges; 1 configs; 7 computed
  0.00s Checking search template ...
  0.00s Setting up search space for 3 objects ...
   |     0.00s "Quantifier on "c:clause"
   |      |   /with/
   |      |   c:clause
   |      |     =: phrase function=Subj
   |      |     0.58s adding 5297 to 0 yields 5297 nodes
   |      |   /or/
   |      |   c:clause
   |      |     := phrase function=Subj
   |      |     0.77s adding 11118 to 5297 yields 15924 nodes
   |      |   /-/
   |     0.77s reduction from 88101 to 15924 nodes
  0.97s Constraining search space with 3 relations ...
  0.99s Setting up retrieval plan ...
  1.02s Ready to deliver results from 522298 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
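
The alternatives of /with/ ... /or/ are combined as a set union, so a clause matched by both alternatives counts once. A small calculation on the numbers in the study output, under the assumption that "adding 11118 to 5297" means the second alternative matched 11118 clauses on its own:

```python
first = 5297    # clauses with a Subj phrase at the start (from the study output)
second = 11118  # clauses with a Subj phrase at the end (assumed reading of the output)
union = 15924   # clauses that remain after the quantifier

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
overlap = first + second - union
print(overlap)
```

Under that reading, 491 clauses satisfy both alternatives. The final result count of 15332 is lower than 15924 because the rest of the template also demands two adjacent phrases.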

Verb-containing phrases¶

Suppose we want to collect all phrases with the condition that if they contain a verb, their function is Pred.

This is a bit theoretical, but it shows two powerful constructs to increase readability of quantifiers.

Unreadable¶

First we express it without special constructs.

In [18]:

query = '''
p:phrase
/where/
  w:word pdp=verb
/have/
q:phrase function=Pred
q = p
/-/
'''
results = B.search(query, shallow=True)

  1.71s 241232 results

We check the query by means of hand-coding, with two checks:

Check 1: is every result a phrase that is either without verbs or has function Pred?
Check 2: is every phrase that is without verbs or has function Pred contained in the results?

In [19]:

allPhrases = set(F.otype.s('phrase'))

ok1 = all(
    F.function.v(p) == 'Pred'
    or 
    all(F.pdp.v(w) != 'verb' for w in L.d(p, otype='word'))
    for p in results
)
ok2 = all(
    p in results
    for p in allPhrases
    if (
        F.function.v(p) == 'Pred'
        or
        all(F.pdp.v(w) != 'verb' for w in L.d(p, otype='word'))
    )
)

print(f'Check 1: {ok1}')
print(f'Check 2: {ok2}')

Check 1: True
Check 2: True

Ok, we are sure that the query does what we think it does.
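
The condition is a logical implication: "contains a verb" implies "function is Pred", which is the same as "contains no verb, or function is Pred". A minimal sketch in plain Python on invented toy data (the `phrases` mapping and the helper `implication` are hypothetical, not Text-Fabric calls):

```python
# Toy data (hypothetical): phrase -> (function, parts of speech of its words).
phrases = {
    'p1': ('Pred', ['verb']),
    'p2': ('Subj', ['subs']),
    'p3': ('PreC', ['verb']),
    'p4': ('Pred', ['subs']),
}


def implication(function, pdps):
    """'contains a verb' implies 'function is Pred'."""
    has_verb = 'verb' in pdps
    return (not has_verb) or function == 'Pred'


kept = sorted(
    p for p, (function, pdps) in phrases.items()
    if implication(function, pdps)
)
print(kept)
```

Only `p3` is dropped: it contains a verb but is not a Pred phrase. Phrases without verbs, such as `p2` and `p4`, pass vacuously, which explains why the query returns most phrases.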

Readable¶

Now let's make it more readable.

In [20]:

query = '''
phrase
/where/
  w:word pdp=verb
/have/
.. function=Pred
/-/
'''

In [21]:

results2 = B.search(query, shallow=True)

print(f'Same results as before? {results == results2}')

  1.61s 241232 results
Same results as before? True

See how search gives the name parent to the phrase atom (the phrase line of the template) and how it resolves the name .. to that name:

In [22]:

S.study(query)

   |     0.00s Feature overview: 109 for nodes; 8 for edges; 1 configs; 7 computed
  0.00s Checking search template ...
  0.00s Setting up search space for 1 objects ...
   |     0.00s "Quantifier on "parent:phrase"
   |      |   /where/
   |      |   parent:phrase
   |      |     w:word pdp=verb
   |      |     1.09s 69026 matching nodes
   |      |   /have/
   |      |   parent:phrase
   |      |     w:word pdp=verb
   |      |   parent function=Pred
   |      |   /-/
   |      |     1.49s 57070 matching nodes
   |      |     1.57s 11955 match antecedent but not consequent
   |     1.58s reduction from 253187 to 241232 nodes
  1.60s Constraining search space with 0 relations ...
  1.60s Setting up retrieval plan ...
  1.60s Ready to deliver results from 241232 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results

Next¶

You have now mastered the theory.

In practice, there are pitfalls: rough edges
