You might want to consider the start of this tutorial.
%load_ext autoreload
%autoreload 2
from tf.fabric import Fabric
from tf.extra.bhsa import Bhsa
VERSION = '2017'
DATABASE = '~/github/etcbc'
BHSA = f'bhsa/tf/{VERSION}'
PARA = f'parallels/tf/{VERSION}'
TF = Fabric(locations=[DATABASE], modules=[BHSA, PARA], silent=True )
api = TF.load('', silent=True)
api.makeAvailableIn(globals())
B = Bhsa(api, 'search', version=VERSION)
Documentation: BHSA Feature docs BHSA API Text-Fabric API 5.5.12 Search Reference
This part of search templates is still experimental.
Quantifiers add considerable power to search templates.
Quantifiers consist of full-fledged search templates themselves, and give rise to auxiliary searches being performed.
Quantifiers can remove the need to resort to hand-coding in many cases. That said, they can also be exceedingly tricky, so it is advisable to check the results by hand-coding anyway, until you are perfectly comfortable with them.
It is easy to find the lexemes that occur in a specific book only, because the lex node of such a lexeme is contained in the node of that specific book.
Let's get the lexemes specific to Ezra and then those specific to Nehemiah.
query = '''
book book@en=Ezra
lex
'''
ezLexemes = B.search(query)
ezSet = {r[1] for r in ezLexemes}
query = '''
book book@en=Nehemiah
lex
'''
nhLexemes = B.search(query)
nhSet = {r[1] for r in nhLexemes}
print(f'Total {len(ezSet | nhSet)} lexemes')
0.02s 199 results
0.01s 110 results
Total 309 lexemes
What if we want the lexemes that occur only in Ezra and Nehemiah?
If such a lexeme occurs in both books, its lex node is not contained in either book node, so the two queries above have missed it.
We have to find a different way. Something like: search for lexemes of which all words occur either in Ezra or in Nehemiah.
With the template constructions you have seen so far, this is impossible to say.
This is where quantifiers come in.
First we are going to query for these lexemes by means of a /without/ quantifier.
query = '''
lex
/without/
book book@en#Ezra|Nehemiah
  w:word
  w ]] ..
/-/
'''
query1results = B.search(query, shallow=True)
0 parent:lex
1 books book@en#Ezra|Nehemiah
2 w:word
3 w ]] parent
Unknown object type in line 1: "books"
Valid object types are: book,chapter,lex,verse,half_verse,sentence,sentence_atom,clause,clause_atom,phrase,phrase_atom,subphrase,word
0.01s 9233 results
Now the /without/ quantifier is a bit of a roundabout way to say what you really mean.
We can also employ the more positive /where/ quantifier.
query = '''
lex
/where/
  w:word
/have/
b:book book@en=Ezra|Nehemiah
  w ]] b
/-/
'''
query2results = B.search(query, shallow=True)
0.86s 382 results
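The two templates state the same condition from opposite directions: "no occurrence lies outside these books" versus "every occurrence lies inside these books". Here is a minimal sketch of that equivalence in plain Python, on invented toy data. The lexeme names and the `occurrences` dictionary are hypothetical and do not come from the BHSA; no Text-Fabric call is involved.

```python
# Toy data (hypothetical): each lexeme maps to the books of its occurrences.
occurrences = {
    "lex_a": ["Ezra", "Ezra"],      # occurs only in Ezra
    "lex_b": ["Nehemiah"],          # occurs only in Nehemiah
    "lex_c": ["Ezra", "Nehemiah"],  # occurs in both, and nowhere else
    "lex_d": ["Ezra", "Genesis"],   # also occurs outside the two books
}
target = {"Ezra", "Nehemiah"}


def specific_to(book):
    """Lexemes whose occurrences all lie within one single book."""
    return {lex for lex, books in occurrences.items() if set(books) == {book}}


# The two single-book queries combined: lex_c is missed.
per_book = specific_to("Ezra") | specific_to("Nehemiah")

# Reading of /without/: there is no occurrence in a book outside the target.
without = {
    lex for lex, books in occurrences.items()
    if not any(b not in target for b in books)
}

# Reading of /where/ ... /have/: every occurrence is in a target book.
where_have = {
    lex for lex, books in occurrences.items()
    if all(b in target for b in books)
}

print(sorted(per_book))
print(sorted(without))
print(without == where_have)
```

On this toy data the per-book approach misses `lex_c`, while both quantifier readings include it.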
Check by hand coding:
indent(reset=True)
universe = F.otype.s('lex')
wordsEzNh = set(
    L.d(T.bookNode('Ezra', lang='en'), otype='word') +
    L.d(T.bookNode('Nehemiah', lang='en'), otype='word')
)
handResults = set()
for lex in universe:
    occs = set(L.d(lex, otype='word'))
    if occs <= wordsEzNh:
        handResults.add(lex)
info(len(handResults))
0.19s 382
Looks good, but we are thorough:
print(query1results == handResults)
print(query2results == handResults)
True
True
Let's look for clauses in which all Pred phrases contain only verbs, and look for Subj phrases in those clauses.
query = '''
clause
/where/
  phrase function=Pred
/have/
  /without/
    word sp#verb
  /-/
/-/
  phrase function=Subj
'''
queryResults = B.search(query)
B.show(queryResults, end=5)
Note that the pieces of template that belong to a quantifier do not correspond to nodes in the result tuples!
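To make that point concrete, here is a small sketch on invented toy data: the quantifier part only decides whether a clause qualifies, and the result tuples contain just the clause and the Subj phrase. The clause names, the phrase layout and the helper `pred_ok` are hypothetical; this is plain Python, not the Text-Fabric API.

```python
# Toy clauses (hypothetical): each clause is a list of
# (phrase function, parts of speech of its words).
clauses = {
    "c1": [("Pred", ["verb"]), ("Subj", ["subs"])],
    "c2": [("Pred", ["verb", "subs"]), ("Subj", ["subs"])],  # Pred with a non-verb
    "c3": [("Subj", ["subs"]), ("Objc", ["subs"])],          # no Pred at all
}


def pred_ok(words):
    """Inner /without/: the phrase has no word that is not a verb."""
    return not any(sp != "verb" for sp in words)


results = []
for clause, phrases in clauses.items():
    # Outer /where/ ... /have/: every Pred phrase must pass the inner test.
    if all(pred_ok(words) for fn, words in phrases if fn == "Pred"):
        # Only the clause and its Subj phrases end up in the result tuples.
        for index, (fn, words) in enumerate(phrases):
            if fn == "Subj":
                results.append((clause, index))

print(results)
```

Clause `c3` qualifies although it has no Pred phrase at all, because a condition on all Pred phrases holds trivially when there are none.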
Check by hand:
indent(reset=True)
handResults = []
for clause in F.otype.s('clause'):
    phrases = L.d(clause, otype='phrase')
    preds = [p for p in phrases if F.function.v(p) == 'Pred']
    good = True
    for pred in preds:
        if any(F.sp.v(w) != 'verb' for w in L.d(pred, otype='word')):
            good = False
    if good:
        subjs = [p for p in phrases if F.function.v(p) == 'Subj']
        for subj in subjs:
            handResults.append((clause, subj))
info(len(handResults))
1.34s 31399
queryResults == handResults
True
We can see which templates are being composed in the course of interpreting the quantifier.
We use the good old S.study():
query = '''
clause
/where/
  phrase function=Pred
/have/
  /without/
    word sp#verb
  /-/
/-/
  phrase function=Subj
'''
S.study(query)
| 0.00s Feature overview: 109 for nodes; 8 for edges; 1 configs; 7 computed
0.00s Checking search template ...
0.00s Setting up search space for 2 objects ...
| 0.00s "Quantifier on "parent:clause"
| | /where/
| | parent:clause
| | phrase function=Pred
| | 0.70s 57070 matching nodes
| | /have/
| | parent:clause
| | phrase function=Pred
| | /without/
| | word sp#verb
| | /-/
| | /-/
| | 0.00s "Quantifier on "parent:phrase function=Pred"
| | | /without/
| | | parent:phrase function=Pred
| | | word sp#verb
| | | /-/
| | | 1.73s 4893 nodes to exclude
| | 1.76s reduction from 57070 to 52177 nodes
| | 2.02s 52177 matching nodes
| | 2.05s 4893 match antecedent but not consequent
| 2.03s reduction from 88101 to 83208 nodes
2.43s Constraining search space with 1 relations ...
2.43s Setting up retrieval plan ...
2.43s Ready to deliver results from 115154 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
Observe the stepwise unraveling of the quantifiers, and the auxiliary templates that are distilled from your original template.
If you ever get syntax errors, run S.study() to find clues.
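The reductions reported in the study log are consistent with a simple reading: a node is dropped when some match of the antecedent (the /where/ part) cannot be extended to a match of the consequent (the /have/ part). The sketch below models that reading with plain sets of tuples. It is an assumption about the semantics, not Text-Fabric's actual implementation, and all names in it are invented for illustration.

```python
# Hypothetical matches, written as (parent, witness) tuples.
universe = {"c1", "c2", "c3"}                            # candidate parent nodes
antecedent = {("c1", "p1"), ("c1", "p2"), ("c2", "p3")}  # matches of /where/
consequent = {("c1", "p1"), ("c2", "p3")}                # those that also satisfy /have/

# A parent fails as soon as one of its antecedent matches has no counterpart.
failing = {parent for parent, witness in antecedent - consequent}
kept = universe - failing
print(sorted(kept))

# The counts in the log above fit the same subtraction pattern.
inner_after = 57070 - 4893   # Pred phrases left after the inner /without/
outer_after = 88101 - 4893   # clauses left after the outer /where/ ... /have/
print(inner_after, outer_after)
```

Note that `c1` is dropped even though one of its two matches does satisfy the consequent: the condition must hold for every match, not just for one.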
We want the clauses that contain at least two adjacent phrases and that have a Subj phrase either at the beginning or at the end.
query = '''
c:clause
/with/
  =: phrase function=Subj
/or/
  := phrase function=Subj
/-/
  phrase
  <: phrase
'''
queryResults = sorted(B.search(query, shallow=True))
1.00s 15332 results
Check by hand:
indent(reset=True)
handResults = []
for clause in F.otype.s('clause'):
    clauseWords = L.d(clause, otype='word')
    phrases = set(L.d(clause, otype='phrase'))
    # is there a phrase whose next phrase lies in the same clause?
    if any(L.n(p, otype='phrase') and (L.n(p, otype='phrase')[0] in phrases) for p in phrases):
        subjPhrases = [p for p in phrases if F.function.v(p) == 'Subj']
        # a Subj phrase starts at the first word or ends at the last word of the clause
        if (
            any(L.d(p, otype='word')[0] == clauseWords[0] for p in subjPhrases)
            or
            any(L.d(p, otype='word')[-1] == clauseWords[-1] for p in subjPhrases)
        ):
            handResults.append(clause)
info(len(handResults))
2.93s 15332
A nice case where the search template performs better than this particular piece of hand-coding.
queryResults == handResults
True
Let's also study this query:
S.study(query)
| 0.00s Feature overview: 109 for nodes; 8 for edges; 1 configs; 7 computed
0.00s Checking search template ...
0.00s Setting up search space for 3 objects ...
| 0.00s "Quantifier on "c:clause"
| | /with/
| | c:clause
| | =: phrase function=Subj
| | 0.58s adding 5297 to 0 yields 5297 nodes
| | /or/
| | c:clause
| | := phrase function=Subj
| | 0.77s adding 11118 to 5297 yields 15924 nodes
| | /-/
| 0.77s reduction from 88101 to 15924 nodes
0.97s Constraining search space with 3 relations ...
0.99s Setting up retrieval plan ...
1.02s Ready to deliver results from 522298 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
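The "adding ... yields ..." lines in the log suggest that the alternatives of /with/ ... /or/ are accumulated as a union of node sets. Assuming that reading, the overlap between the two alternatives follows from the reported counts by inclusion-exclusion. The helper `accumulate` and the toy sets are hypothetical, written only to illustrate the bookkeeping.

```python
def accumulate(alternatives):
    """Union the alternatives one by one; report (size added, running total)."""
    total = set()
    steps = []
    for alt in alternatives:
        total |= alt
        steps.append((len(alt), len(total)))
    return steps


# Toy node sets (hypothetical): clauses with a Subj phrase at the start or at the end.
starts = {1, 2, 3}
ends = {3, 4}
steps = accumulate([starts, ends])
print(steps)

# Counts taken from the log: 5297 and 11118 clauses, 15924 after the union.
overlap = 5297 + 11118 - 15924
print(overlap)
```

Under this assumption, the overlap counts the clauses that satisfy both alternatives at once.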
Suppose we want to collect all phrases with the condition that if they contain a verb, their function is Pred.
This is a bit theoretical, but it shows two powerful constructs to increase readability of quantifiers.
First we express it without special constructs.
query = '''
p:phrase
/where/
  w:word pdp=verb
/have/
q:phrase function=Pred
  q = p
/-/
'''
results = B.search(query, shallow=True)
1.71s 241232 results
We check the query by means of hand-coding:
allPhrases = set(F.otype.s('phrase'))
ok1 = all(
    F.function.v(p) == 'Pred'
    or
    all(F.pdp.v(w) != 'verb' for w in L.d(p, otype='word'))
    for p in results
)
ok2 = all(
    p in results
    for p in allPhrases
    if (
        F.function.v(p) == 'Pred'
        or
        all(F.pdp.v(w) != 'verb' for w in L.d(p, otype='word'))
    )
)
print(f'Check 1: {ok1}')
print(f'Check 2: {ok2}')
Check 1: True
Check 2: True
Ok, we are sure that the query does what we think it does.
Now let's make it more readable.
query = '''
phrase
/where/
  w:word pdp=verb
/have/
  .. function=Pred
/-/
'''
results2 = B.search(query, shallow=True)
print(f'Same results as before? {results == results2}')
1.61s 241232 results
Same results as before? True
See how search gives the name parent to the phrase atom and how it resolves the name .. to that parent:
S.study(query)
| 0.00s Feature overview: 109 for nodes; 8 for edges; 1 configs; 7 computed
0.00s Checking search template ...
0.00s Setting up search space for 1 objects ...
| 0.00s "Quantifier on "parent:phrase"
| | /where/
| | parent:phrase
| | w:word pdp=verb
| | 1.09s 69026 matching nodes
| | /have/
| | parent:phrase
| | w:word pdp=verb
| | parent function=Pred
| | /-/
| | 1.49s 57070 matching nodes
| | 1.57s 11955 match antecedent but not consequent
| 1.58s reduction from 253187 to 241232 nodes
1.60s Constraining search space with 0 relations ...
1.60s Setting up retrieval plan ...
1.60s Ready to deliver results from 241232 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
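The condition "if a phrase contains a verb, its function is Pred" is an implication, so phrases without any verb satisfy it trivially. The sketch below shows this on invented toy phrases and checks that the two formulations used in the hand-coded check agree. The phrase names and the `phrases` dictionary are hypothetical; the final subtraction uses the counts reported in the log.

```python
# Toy phrases (hypothetical): name -> (function, pdp values of its words).
phrases = {
    "p1": ("Pred", ["verb"]),
    "p2": ("Subj", ["subs"]),          # no verb: the condition holds trivially
    "p3": ("Objc", ["verb", "subs"]),  # has a verb but is not Pred: excluded
}

# /where/ word pdp=verb /have/ function=Pred: for every verb, the phrase is Pred.
kept = {
    name for name, (fn, pdps) in phrases.items()
    if all(fn == "Pred" for pdp in pdps if pdp == "verb")
}

# Equivalent formulation: the phrase is Pred, or it contains no verb.
alternative = {
    name for name, (fn, pdps) in phrases.items()
    if fn == "Pred" or all(pdp != "verb" for pdp in pdps)
}

print(sorted(kept), kept == alternative)

# Counts from the log: all phrases minus those that fail the implication.
remaining = 253187 - 11955
print(remaining)
```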