
Search¶

Search in Text-Fabric is a template-based way of looking for structural patterns in your dataset.

It is inspired by the idea of topographic query, as worked out in MQL, which has been implemented in Emdros. See also pitfalls of MQL.

Within Text-Fabric we have the unique possibility to combine the ease of formulating search templates for complicated syntactical patterns with the power of programmatically processing the results.

This notebook will show you how to get up and running.

See the notebook searchFromMQL for examples of how MQL queries can be expressed in Text-Fabric search.

Before we continue¶

Search is a big feature in Text-Fabric. It is also a very recent addition.

Caution:¶

There might be bugs.

Search is also costly. Quite a bit of the implementation work has been dedicated to optimizing performance. But it is worth the price: search templates are powerful for a wide range of purposes. I do not pretend, however, to have found optimal strategies for all possible search templates.

That being said, I think search might turn out helpful in many cases, and I welcome your feedback.

Dirk Roorda, 2016-12-23, updates 2017-10-10

Search command¶

Search is as simple as saying (just an example):

for r in S.search(template): print(S.glean(r))

See all ins and outs in the search template reference.

All search-related things use the S API.

In [1]:

from tf.fabric import Fabric

In [2]:

DATABASE = '~/github/etcbc'
BHSA = 'bhsa/tf/2017'
TF = Fabric(locations=[DATABASE], modules=[BHSA], silent=False )

This is Text-Fabric 3.0.9
Api reference : https://github.com/Dans-labs/text-fabric/wiki/Api
Tutorial      : https://github.com/Dans-labs/text-fabric/blob/master/docs/tutorial.ipynb
Example data  : https://github.com/Dans-labs/text-fabric-data

114 features found and 0 ignored

Let us just not load any specific features.

In [3]:

api = TF.load('', silent=True)
api.makeAvailableIn(globals())

Basic search command¶

We start with the simplest form of issuing a query. Let's look for the word Elohim in undetermined phrases, only in Genesis 1-2.

All work involved in searching takes place under the hood.

In [4]:

query = '''
book book=Genesis
  chapter chapter=1|2
    verse
      phrase det=und
        word lex=>LHJM/
'''
for r in S.search(query): print(S.glean(r))

  Genesis 1:1 phrase[אֱלֹהִ֑ים ] אֱלֹהִ֑ים 
  Genesis 1:2 phrase[ר֣וּחַ אֱלֹהִ֔ים ] אֱלֹהִ֔ים 
  Genesis 1:3 phrase[אֱלֹהִ֖ים ] אֱלֹהִ֖ים 
  Genesis 1:4 phrase[אֱלֹהִ֛ים ] אֱלֹהִ֛ים 
  Genesis 1:4 phrase[אֱלֹהִ֔ים ] אֱלֹהִ֔ים 
  Genesis 1:5 phrase[אֱלֹהִ֤ים׀ ] אֱלֹהִ֤ים׀ 
  Genesis 1:6 phrase[אֱלֹהִ֔ים ] אֱלֹהִ֔ים 
  Genesis 1:7 phrase[אֱלֹהִים֮ ] אֱלֹהִים֮ 
  Genesis 1:8 phrase[אֱלֹהִ֛ים ] אֱלֹהִ֛ים 
  Genesis 1:9 phrase[אֱלֹהִ֗ים ] אֱלֹהִ֗ים 
  Genesis 1:10 phrase[אֱלֹהִ֤ים׀ ] אֱלֹהִ֤ים׀ 
  Genesis 1:10 phrase[אֱלֹהִ֖ים ] אֱלֹהִ֖ים 
  Genesis 1:11 phrase[אֱלֹהִ֗ים ] אֱלֹהִ֗ים 
  Genesis 1:12 phrase[אֱלֹהִ֖ים ] אֱלֹהִ֖ים 
  Genesis 1:14 phrase[אֱלֹהִ֗ים ] אֱלֹהִ֗ים 
  Genesis 1:16 phrase[אֱלֹהִ֔ים ] אֱלֹהִ֔ים 
  Genesis 1:17 phrase[אֱלֹהִ֖ים ] אֱלֹהִ֖ים 
  Genesis 1:18 phrase[אֱלֹהִ֖ים ] אֱלֹהִ֖ים 
  Genesis 1:20 phrase[אֱלֹהִ֔ים ] אֱלֹהִ֔ים 
  Genesis 1:21 phrase[אֱלֹהִ֖ים ] אֱלֹהִ֖ים 
  Genesis 1:21 phrase[אֱלֹהִ֔ים ] אֱלֹהִ֔ים 
  Genesis 1:22 phrase[אֱלֹהִ֖ים ] אֱלֹהִ֖ים 
  Genesis 1:24 phrase[אֱלֹהִ֗ים ] אֱלֹהִ֗ים 
  Genesis 1:25 phrase[אֱלֹהִים֩ ] אֱלֹהִים֩ 
  Genesis 1:25 phrase[אֱלֹהִ֖ים ] אֱלֹהִ֖ים 
  Genesis 1:26 phrase[אֱלֹהִ֔ים ] אֱלֹהִ֔ים 
  Genesis 1:27 phrase[אֱלֹהִ֤ים׀ ] אֱלֹהִ֤ים׀ 
  Genesis 1:27 phrase[בְּצֶ֥לֶם אֱלֹהִ֖ים ] אֱלֹהִ֖ים 
  Genesis 1:28 phrase[אֱלֹהִ֗ים ] אֱלֹהִ֗ים 
  Genesis 1:28 phrase[אֱלֹהִים֒ ] אֱלֹהִים֒ 
  Genesis 1:29 phrase[אֱלֹהִ֗ים ] אֱלֹהִ֗ים 
  Genesis 1:31 phrase[אֱלֹהִים֙ ] אֱלֹהִים֙ 
  Genesis 2:2 phrase[אֱלֹהִים֙ ] אֱלֹהִים֙ 
  Genesis 2:3 phrase[אֱלֹהִים֙ ] אֱלֹהִים֙ 
  Genesis 2:3 phrase[אֱלֹהִ֖ים ] אֱלֹהִ֖ים

Under the hood¶

It might be helpful to peek under the hood, especially when exploring new searches. We feed the query to the search API, which will study it. The syntax will be checked, features loaded, the search space will be set up, narrowed down, and the fetching of results will be prepared, but not yet executed.

In order to make the query a bit more interesting, we lift the constraint that the results must be in Genesis 1-2.

In [5]:

query = '''
book
  chapter
    verse
      phrase det=und
        word lex=>LHJM/
'''

In [6]:

S.study(query)

  0.00s Checking search template ...
  0.00s Setting up search space for 5 objects ...
  0.99s Constraining search space with 4 relations ...
  1.00s Setting up retrieval plan ...
  1.02s Ready to deliver results from 2735 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results

Before we rush to the results, let's have a look at the plan.

In [7]:

S.showPlan()

    11s The results are connected to the original search template as follows:
 0     
 1 R0  book
 2 R1    chapter
 3 R2      verse
 4 R3        phrase det=und
 5 R4          word lex=>LHJM/
 6

Here you see already what your results will look like. Each result r is a tuple of nodes:

(R0, R1, R2, R3, R4)

that instantiate the objects in your template.

Excursion¶

In case you are curious, you can get details about the search space as well:

In [8]:

S.showPlan(details=True)

Search with 5 objects and 4 relations
Results are instantiations of the following objects:
node  0-book                              (    29   choices)
node  1-chapter                           (   329   choices)
node  2-verse                             (   754   choices)
node  3-phrase                            (   805   choices)
node  4-word                              (   818   choices)
Instantiations are computed along the following relations:
node                      0-book          (    29   choices)
edge  0-book          [[  1-chapter       (    10.6 choices)
edge  1-chapter       [[  2-verse         (     1.8 choices)
edge  2-verse         [[  3-phrase        (     1.1 choices)
edge  3-phrase        [[  4-word          (     1.0 choices)
    19s The results are connected to the original search template as follows:
 0     
 1 R0  book
 2 R1    chapter
 3 R2      verse
 4 R3        phrase det=und
 5 R4          word lex=>LHJM/
 6

The part about the nodes shows you how many possible instantiations have been found for each object in your template. These are not results yet, because only combinations of instantiations that satisfy all constraints are results.

The constraints come from the relations between the objects that you specified. In this case, there is only an implicit relation: embedding [[. Later on we'll examine all basic relations.

The part about the edges shows you the constraints, and in what order they will be computed when stitching results together. In this case the order is exactly the order by which the relations appear in the template, but that will not always be the case. Text-Fabric spends some time and ingenuity to find out an optimal stitch plan.
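The difference between candidate instantiations and actual results can be sketched with a toy example. Everything below is hypothetical illustration data (two "phrases" given as sets of slots, and four candidate "words"), not Text-Fabric's own data structures or algorithm. It only shows that the number of results is determined by the constraints, not by the product of the number of choices per object.

```python
from itertools import product

# Hypothetical toy data: each phrase is the set of slots (word positions) it occupies.
phrases = {"p1": {1, 2}, "p2": {3}}
# Hypothetical candidate words, identified by their slot number.
words = [1, 2, 3, 4]

# Every combination of one phrase and one word is a candidate instantiation.
candidates = list(product(phrases, words))

# Only combinations that satisfy the embedding constraint (phrase [[ word) are results.
results = [(p, w) for (p, w) in candidates if w in phrases[p]]

print(len(candidates), "candidate combinations")
print(len(results), "results:", results)
```

Word 4 is a valid candidate for the word object, yet it never shows up in a result, because no phrase embeds it.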

Nevertheless, fetching results may take time.

For some queries, it can take a large amount of time to walk through all results. Even worse, it may happen that it takes a large amount of time before getting the first result.

This has to do with search strategies on the one hand, and on the other hand with the very real possibility of running into pathological search patterns, which have billions of results, mostly unintended. For example, a simple query that asks for 5 words in the Hebrew Bible without further constraints will have 425,000 to the power of 5 results. That is about 10^28 (a one with 28 zeros), roughly the number of molecules in a few hundred cubic meters of air. That may not sound like much, but it is 10,000 times the number of bytes that can currently be stored on the whole Internet.
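The size estimate above is easy to verify with plain integer arithmetic. This sketch uses the rounded figure of 425,000 words from the text; the variable names are just for illustration.

```python
# Unconstrained template with 5 word objects: every word can combine with every other.
words = 425_000   # rounded number of words in the Hebrew Bible, as used in the text
objects = 5       # number of word objects in the template

combinations = words ** objects

print(combinations)            # the exact integer
print(f"{combinations:.2e}")   # the same number in scientific notation
print(len(str(combinations)))  # number of decimal digits
```

The result has 29 digits, so it is slightly more than 10^28.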

Text-Fabric search is not yet done with finding optimal search strategies, and I hope to refine its arsenal of methods in the future, depending on what you report.

Back to business¶

It is always a good idea to get a feel for the number of results before you dive into them head-on.

In [9]:

S.count(progress=1, limit=5)

  0.00s Counting results per 1 up to 5 ...
   |     0.01s 1
   |     0.01s 2
   |     0.01s 3
   |     0.01s 4
   |     0.02s 5
  0.02s Done: 5 results

We asked for 5 results in total, with a progress message for every one. That was a bit conservative.

In [10]:

S.count(progress=100, limit=500)

  0.00s Counting results per 100 up to 500 ...
   |     0.01s 100
   |     0.03s 200
   |     0.07s 300
   |     0.09s 400
   |     0.13s 500
  0.14s Done: 500 results

Still pretty quick. Now we want to count all results.

In [11]:

S.count(progress=200, limit=-1)

  0.00s Counting results per 200 up to  the end of the results ...
   |     0.03s 200
   |     0.09s 400
   |     0.13s 600
   |     0.15s 800
  0.15s Done: 818 results

It is time to see something of those results.

In [12]:

S.fetch(limit=10)

Out[12]:

((426585, 426624, 1414190, 651505, 4),
 (426585, 426624, 1414191, 651515, 26),
 (426585, 426624, 1414192, 651520, 34),
 (426585, 426624, 1414193, 651528, 42),
 (426585, 426624, 1414193, 651534, 50),
 (426585, 426624, 1414194, 651538, 60),
 (426585, 426624, 1414195, 651554, 81),
 (426585, 426624, 1414196, 651564, 97),
 (426585, 426624, 1414197, 651578, 127),
 (426585, 426624, 1414198, 651590, 142))

Not very informative. Just a quick observation: look at the last column. These are the result nodes for the word part in the query, indicated as R4 by showPlan() before. And indeed, they are all below 425,000, roughly the number of words in the Hebrew Bible.
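The observation about the last column can be checked mechanically. The sketch below copies the ten tuples from Out[12] and compares them with 426584, the number of choices that the detailed plans in this notebook report for an unconstrained word object. It assumes that word nodes are numbered from 1 up to that number, with all other node types following.

```python
# The ten result tuples from Out[12]: (book, chapter, verse, phrase, word).
results = (
    (426585, 426624, 1414190, 651505, 4),
    (426585, 426624, 1414191, 651515, 26),
    (426585, 426624, 1414192, 651520, 34),
    (426585, 426624, 1414193, 651528, 42),
    (426585, 426624, 1414193, 651534, 50),
    (426585, 426624, 1414194, 651538, 60),
    (426585, 426624, 1414195, 651554, 81),
    (426585, 426624, 1414196, 651564, 97),
    (426585, 426624, 1414197, 651578, 127),
    (426585, 426624, 1414198, 651590, 142),
)

MAX_SLOT = 426584  # assumption: highest word node, taken from the plan details

# The last member of each tuple instantiates the word object.
word_nodes = [r[-1] for r in results]
all_words_are_slots = all(1 <= w <= MAX_SLOT for w in word_nodes)

# All other members (book, chapter, verse, phrase) should lie above the word range.
others_above_slots = all(n > MAX_SLOT for r in results for n in r[:-1])

print(word_nodes)
print(all_words_are_slots, others_above_slots)
```

Note that the book node 426585 is exactly one higher than the last word node.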

Nevertheless, we want to glean a bit more information off them.

In [13]:

for r in S.fetch(limit=10):
    print(S.glean(r))

  Genesis 1:1 phrase[אֱלֹהִ֑ים ] אֱלֹהִ֑ים 
  Genesis 1:2 phrase[ר֣וּחַ אֱלֹהִ֔ים ] אֱלֹהִ֔ים 
  Genesis 1:3 phrase[אֱלֹהִ֖ים ] אֱלֹהִ֖ים 
  Genesis 1:4 phrase[אֱלֹהִ֛ים ] אֱלֹהִ֛ים 
  Genesis 1:4 phrase[אֱלֹהִ֔ים ] אֱלֹהִ֔ים 
  Genesis 1:5 phrase[אֱלֹהִ֤ים׀ ] אֱלֹהִ֤ים׀ 
  Genesis 1:6 phrase[אֱלֹהִ֔ים ] אֱלֹהִ֔ים 
  Genesis 1:7 phrase[אֱלֹהִים֮ ] אֱלֹהִים֮ 
  Genesis 1:8 phrase[אֱלֹהִ֛ים ] אֱלֹהִ֛ים 
  Genesis 1:9 phrase[אֱלֹהִ֗ים ] אֱלֹהִ֗ים

Caution¶

It is not possible to do len(S.fetch()), because fetch(), called without a limit, is a generator, not a list. It will deliver a result every time it is asked, for as long as there are results, but it does not know in advance how many there will be.

Fetching a result can be costly because, due to the constraints, a lot of possibilities may have to be tried and rejected before the next result is found.

That is why you often see results coming in at varying speeds when counting them.
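The behaviour of generators described here is plain Python and can be tried without Text-Fabric. In this sketch, fetch_like is a hypothetical stand-in for a lazy stream of results; it is not part of the S API.

```python
from itertools import islice


def fetch_like(n):
    """Hypothetical stand-in for a lazy result stream: yields n dummy result tuples."""
    for i in range(n):
        yield (i,)


# A generator has no length: len() raises a TypeError.
try:
    len(fetch_like(5))
    has_len = True
except TypeError:
    has_len = False

# Take only the first few results without computing the rest.
first_three = list(islice(fetch_like(1000), 3))

# Counting means walking through all results, one by one.
total = sum(1 for _ in fetch_like(1000))

print(has_len, first_three, total)
```

Counting by iteration is what makes a count over a large result set expensive: every single result has to be produced before the total is known.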

This search template has some pretty tight constraints on one of its objects, so the amount of data to deal with is pretty limited.

Let us turn to a template where this is not so.

In [14]:

query = '''
# test
# verse book=Genesis chapter=2 verse=25
verse
  clause
                                 
    p1:phrase
        w1:word
        w3:word
        w1 < w3

    p2:phrase
        w2:word
        w1 < w2 
        w3 > w2
    
    p1 < p2   
'''

A couple of remarks.

some objects have got a name
there are additional relations specified between named objects
< means: comes before, and >: comes after in the canonical order for nodes, which for words means: comes textually before/after, but for other nodes the meaning is explained here
later on we describe those relations in more detail

Note on order¶

Look at the words w1 and w3 below phrase p1.

Although in the template w1 comes before w3, this is not translated into a search constraint of the same nature.

Order between objects in a template is never significant, only embedding is.

Because order is not significant, you have to specify order yourself, using relations.

It turns out that this is better than the other way around. In MQL order is significant, and it is very difficult to search for w1 and w2 in any order. Especially if you are looking for more than 2 complex objects with lots of feature conditions, your search template would explode if you had to spell out all possible permutations. See the example of Reinoud Oosting below.
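How fast the number of permutations grows can be shown with a few lines of Python. The sketch enumerates every ordering of three named objects as a chain of < relations, and then counts the orderings for larger numbers of objects. The names are taken from the template above; the output is an illustration only, not MQL or Text-Fabric syntax.

```python
from itertools import permutations
from math import factorial

names = ["w1", "w2", "w3"]

# Every possible textual order of the three objects.
orders = list(permutations(names))
for order in orders:
    print(" < ".join(order))

# Number of orderings an order-sensitive template would have to spell out.
counts = {n: factorial(n) for n in range(2, 7)}
print(counts)
```

With six objects there are already 720 orderings, each of which would need its own copy of all feature conditions.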

Note on gaps¶

Look at the phrases p1 and p2.

We do not specify an order here, only that they are different. In order to prevent duplicated searches with p1 and p2 interchanged, we even stipulate that p1 < p2. There are many spatial relationships possible between different objects. In many cases, neither the one comes before the other, nor vice versa. They can overlap, one can occur in a gap of the other, they can be completely disjoint and interleaved, etc.

In [15]:

S.study(query)

  0.00s Checking search template ...
  0.00s Setting up search space for 7 objects ...
  0.48s Constraining search space with 10 relations ...
  0.51s Setting up retrieval plan ...
  0.56s Ready to deliver results from 1897440 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results

That was quick! Well, Text-Fabric knows that narrowing down the search space in this case would take ages, without resulting in a significantly shrunken space. So it skips doing so for most constraints.

Let us see the plan, with details.

In [16]:

S.showPlan(details=True)

Search with 7 objects and 10 relations
Results are instantiations of the following objects:
node  0-verse                             ( 23213   choices)
node  1-clause                            ( 88101   choices)
node  2-phrase                            (253187   choices)
node  3-word                              (426584   choices)
node  4-word                              (426584   choices)
node  5-phrase                            (253187   choices)
node  6-word                              (426584   choices)
Instantiations are computed along the following relations:
node                      0-verse         ( 23213   choices)
edge  0-verse         [[  1-clause        (     3.6 choices)
edge  1-clause        [[  5-phrase        (     2.5 choices)
edge  5-phrase        [[  6-word          (     2.3 choices)
edge  1-clause        [[  2-phrase        (     3.4 choices)
edge  2-phrase        <   5-phrase        (126593.5 choices)
edge  2-phrase        [[  3-word          (     1.6 choices)
edge  3-word          <   6-word          (213292.0 choices)
edge  2-phrase        [[  4-word          (     1.6 choices)
edge  4-word          >   6-word          (213292.0 choices)
edge  3-word          <   4-word          (213292.0 choices)
  4.13s The results are connected to the original search template as follows:
 0     
 1     # test
 2     # verse book=Genesis chapter=2 verse=25
 3 R0  verse
 4 R1    clause
 5                                      
 6 R2      p1:phrase
 7 R3          w1:word
 8 R4          w3:word
 9             w1 < w3
10     
11 R5      p2:phrase
12 R6          w2:word
13             w1 < w2 
14             w3 > w2
15         
16         p1 < p2   
17

As you see, we have a hefty search space here. Let us play with the count() function.

In [17]:

S.count(progress=10, limit=100)

  0.00s Counting results per 10 up to 100 ...
   |     0.14s 10
   |     0.14s 20
   |     0.14s 30
   |     0.17s 40
   |     0.17s 50
   |     0.18s 60
   |     0.20s 70
   |     0.20s 80
   |     0.20s 90
   |     0.21s 100
  0.21s Done: 100 results

We can be bolder than this!

In [18]:

S.count(progress=100, limit=1000)

  0.00s Counting results per 100 up to 1000 ...
   |     0.18s 100
   |     0.23s 200
   |     0.23s 300
   |     0.46s 400
   |     0.53s 500
   |     0.54s 600
   |     0.64s 700
   |     0.81s 800
   |     0.84s 900
   |     1.07s 1000
  1.07s Done: 1000 results

OK, not too bad, but note that it takes a big fraction of a second to get just 100 results.

Now let us go for all of them by the thousand.

In [19]:

S.count(progress=1000, limit=-1)

  0.00s Counting results per 1000 up to  the end of the results ...
   |     1.01s 1000
   |     1.60s 2000
   |     2.26s 3000
   |     2.87s 4000
   |     3.58s 5000
   |     4.82s 6000
   |     7.56s 7000
  8.98s Done: 7618 results

See? This is substantial work.

In [20]:

for r in S.fetch(limit=10):
    print(S.glean(r))

Genesis 2:25 clause[וַיִּֽהְי֤וּ שְׁנֵיהֶם֙ עֲרוּמִּ֔ים הָֽ...] phrase[שְׁנֵיהֶם֙ הָֽאָדָ֖ם וְאִשְׁתֹּ֑ו ] שְׁנֵיהֶם֙  הָֽ phrase[עֲרוּמִּ֔ים ] עֲרוּמִּ֔ים 
Genesis 2:25 clause[וַיִּֽהְי֤וּ שְׁנֵיהֶם֙ עֲרוּמִּ֔ים הָֽ...] phrase[שְׁנֵיהֶם֙ הָֽאָדָ֖ם וְאִשְׁתֹּ֑ו ] שְׁנֵיהֶם֙  אָדָ֖ם  phrase[עֲרוּמִּ֔ים ] עֲרוּמִּ֔ים 
Genesis 2:25 clause[וַיִּֽהְי֤וּ שְׁנֵיהֶם֙ עֲרוּמִּ֔ים הָֽ...] phrase[שְׁנֵיהֶם֙ הָֽאָדָ֖ם וְאִשְׁתֹּ֑ו ] שְׁנֵיהֶם֙  וְ phrase[עֲרוּמִּ֔ים ] עֲרוּמִּ֔ים 
Genesis 2:25 clause[וַיִּֽהְי֤וּ שְׁנֵיהֶם֙ עֲרוּמִּ֔ים הָֽ...] phrase[שְׁנֵיהֶם֙ הָֽאָדָ֖ם וְאִשְׁתֹּ֑ו ] שְׁנֵיהֶם֙  אִשְׁתֹּ֑ו  phrase[עֲרוּמִּ֔ים ] עֲרוּמִּ֔ים 
Genesis 4:4 clause[וְהֶ֨בֶל הֵבִ֥יא גַם־ה֛וּא ...] phrase[הֶ֨בֶל גַם־ה֛וּא ] הֶ֨בֶל  גַם־ phrase[הֵבִ֥יא ] הֵבִ֥יא 
Genesis 4:4 clause[וְהֶ֨בֶל הֵבִ֥יא גַם־ה֛וּא ...] phrase[הֶ֨בֶל גַם־ה֛וּא ] הֶ֨בֶל  ה֛וּא  phrase[הֵבִ֥יא ] הֵבִ֥יא 
Genesis 10:21 clause[גַּם־ה֑וּא אֲבִי֙ כָּל־בְּנֵי־...] phrase[גַּם־ה֑וּא אֲחִ֖י יֶ֥פֶת הַ...] גַּם־ אֲחִ֖י  phrase[אֲבִי֙ כָּל־בְּנֵי־עֵ֔בֶר ] עֵ֔בֶר 
Genesis 10:21 clause[גַּם־ה֑וּא אֲבִי֙ כָּל־בְּנֵי־...] phrase[גַּם־ה֑וּא אֲחִ֖י יֶ֥פֶת הַ...] גַּם־ יֶ֥פֶת  phrase[אֲבִי֙ כָּל־בְּנֵי־עֵ֔בֶר ] עֵ֔בֶר 
Genesis 10:21 clause[גַּם־ה֑וּא אֲבִי֙ כָּל־בְּנֵי־...] phrase[גַּם־ה֑וּא אֲחִ֖י יֶ֥פֶת הַ...] גַּם־ הַ phrase[אֲבִי֙ כָּל־בְּנֵי־עֵ֔בֶר ] עֵ֔בֶר 
Genesis 10:21 clause[גַּם־ה֑וּא אֲבִי֙ כָּל־בְּנֵי־...] phrase[גַּם־ה֑וּא אֲחִ֖י יֶ֥פֶת הַ...] גַּם־ גָּדֹֽול׃  phrase[אֲבִי֙ כָּל־בְּנֵי־עֵ֔בֶר ] עֵ֔בֶר

As a check, here is some code that looks for basically the same phenomenon: a phrase within the gap of another phrase. It does not use search, and it gets a bit more focused results, in half the time compared to the search with the template.

Hint¶

If you are comfortable with programming, and what you look for is fairly generic, you may be better off without search, provided you can translate your knowledge of the data into an effective procedure within Text-Fabric. But wait till we are completely done with this example!

In [21]:

indent(reset=True)
info('Getting gapped phrases')
results = []
for v in F.otype.s('verse'):
    for c in L.d(v, otype='clause'):
        ps = L.d(c, otype='phrase')
        first = {}
        last = {}
        slots = {}
        # make index of phrase boundaries
        for p in ps:
            words = L.d(p, otype='word')
            first[p] = words[0]
            last[p] = words[-1]
            slots[p] = set(words)
        for p1 in ps:
            for p2 in ps:
                if p2 < p1: continue
                if len(slots[p1] & slots[p2]) != 0: continue
                if first[p1] < first[p2] and last[p2] < last[p1]:
                    results.append((v, c, p1, p2, first[p1], first[p2], last[p2], last[p1]))
info('{} results'.format(len(results)))
for r in results[0:10]:
    print(r)

  0.00s Getting gapped phrases
  3.21s 368 results
(1414245, 427767, 652147, 652148, 1159, 1160, 1160, 1164)
(1414273, 427889, 652504, 652505, 1720, 1721, 1721, 1723)
(1414445, 428386, 654102, 654103, 4819, 4821, 4824, 4828)
(1414505, 428569, 654678, 654679, 5803, 5805, 5806, 5809)
(1414509, 428585, 654725, 654726, 5868, 5869, 5870, 5875)
(1414542, 428692, 655061, 655062, 6515, 6521, 6521, 6530)
(1414594, 428886, 655642, 655643, 7431, 7432, 7433, 7437)
(1414651, 429128, 656353, 656354, 8502, 8507, 8507, 8520)
(1414651, 429128, 656353, 656355, 8502, 8508, 8510, 8520)
(1414740, 429497, 657505, 657506, 10284, 10287, 10287, 10291)

But we can use the pretty printing of glean() here as well, even though we have not used search!

In [22]:

for r in results[0:10]: print(S.glean(r))

Genesis 2:25 clause[וַיִּֽהְי֤וּ שְׁנֵיהֶם֙ עֲרוּמִּ֔ים הָֽ...] phrase[שְׁנֵיהֶם֙ הָֽאָדָ֖ם וְאִשְׁתֹּ֑ו ] phrase[עֲרוּמִּ֔ים ] שְׁנֵיהֶם֙  עֲרוּמִּ֔ים  עֲרוּמִּ֔ים  אִשְׁתֹּ֑ו 
Genesis 4:4 clause[וְהֶ֨בֶל הֵבִ֥יא גַם־ה֛וּא ...] phrase[הֶ֨בֶל גַם־ה֛וּא ] phrase[הֵבִ֥יא ] הֶ֨בֶל  הֵבִ֥יא  הֵבִ֥יא  ה֛וּא 
Genesis 10:21 clause[גַּם־ה֑וּא אֲבִי֙ כָּל־בְּנֵי־...] phrase[גַּם־ה֑וּא אֲחִ֖י יֶ֥פֶת הַ...] phrase[אֲבִי֙ כָּל־בְּנֵי־עֵ֔בֶר ] גַּם־ אֲבִי֙  עֵ֔בֶר  גָּדֹֽול׃ 
Genesis 12:17 clause[וַיְנַגַּ֨ע יְהוָ֧ה׀ אֶת־פַּרְעֹ֛ה ...] phrase[אֶת־פַּרְעֹ֛ה וְאֶת־בֵּיתֹ֑ו ] phrase[נְגָעִ֥ים גְּדֹלִ֖ים ] אֶת־ נְגָעִ֥ים  גְּדֹלִ֖ים  בֵּיתֹ֑ו 
Genesis 13:1 clause[וַיַּעַל֩ אַבְרָ֨ם מִמִּצְרַ֜יִם ...] phrase[אַבְרָ֨ם ה֠וּא וְאִשְׁתֹּ֧ו וְ...] phrase[מִמִּצְרַ֜יִם ] אַבְרָ֨ם  מִ מִּצְרַ֜יִם  כָל־
Genesis 14:16 clause[וְגַם֩ אֶת־לֹ֨וט אָחִ֤יו ...] phrase[גַם֩ אֶת־לֹ֨וט אָחִ֤יו וּ...] phrase[הֵשִׁ֔יב ] גַם֩  הֵשִׁ֔יב  הֵשִׁ֔יב  עָֽם׃ 
Genesis 17:7 clause[לִהְיֹ֤ות לְךָ֙ לֵֽאלֹהִ֔ים ...] phrase[לְךָ֙ וּֽלְזַרְעֲךָ֖ אַחֲרֶֽיךָ׃ ] phrase[לֵֽאלֹהִ֔ים ] לְךָ֙  לֵֽ אלֹהִ֔ים  אַחֲרֶֽיךָ׃ 
Genesis 19:4 clause[וְאַנְשֵׁ֨י הָעִ֜יר אַנְשֵׁ֤י ...] phrase[אַנְשֵׁ֨י הָעִ֜יר אַנְשֵׁ֤י סְדֹם֙ ...] phrase[נָסַ֣בּוּ ] אַנְשֵׁ֨י  נָסַ֣בּוּ  נָסַ֣בּוּ  קָּצֶֽה׃ 
Genesis 19:4 clause[וְאַנְשֵׁ֨י הָעִ֜יר אַנְשֵׁ֤י ...] phrase[אַנְשֵׁ֨י הָעִ֜יר אַנְשֵׁ֤י סְדֹם֙ ...] phrase[עַל־הַבַּ֔יִת ] אַנְשֵׁ֨י  עַל־ בַּ֔יִת  קָּצֶֽה׃ 
Genesis 22:3 clause[וַיִּקַּ֞ח אֶת־שְׁנֵ֤י נְעָרָיו֙ ...] phrase[אֶת־שְׁנֵ֤י נְעָרָיו֙ וְאֵ֖ת ...] phrase[אִתֹּ֔ו ] אֶת־ אִתֹּ֔ו  אִתֹּ֔ו  בְּנֹ֑ו

Further on we have another example with gaps, and we get the results by means of a search template in a slightly different way.

Refine the search template¶

A second look at the results of our search template reveals that there are multiple results per pair of phrases, because there are in general multiple words in both phrases that satisfy the condition. We can make the search template stricter, by requiring alignment of the words with the starts and ends of the phrases they are in.

For this, we employ a convenient device in search templates that we have not explained yet.

Before each atom we may put a relational operator.

The meaning is that this relation holds between the preceding atom and the current one. If there is a lonely operator all by itself on a line, it means that this relation holds between the preceding sibling and the parent.

These operators are very handy to indicate that there is an order between siblings, and also that a child should start or end where the parent starts or ends.

In [23]:

query = '''
verse
  clause
                                 
    p1:phrase
        =: w1:word
        <  w3:word
        :=

    p2:phrase
        =: w2:word
    
    p1 < p2
    w1 < p2
    w2 < w3
    
'''

The line

=: w1:word

constrains word w1 to start exactly at the start of its parent, phrase p1.

The line

<  w3:word

constrains the preceding sibling w1 to come before w3 in the canonical node ordering. Because w1 and w3 are words, this means that w1 comes textually before w3.

The line

:=

constrains the preceding sibling, word w3, to end exactly at the end of its parent, phrase p1.

The line

=: w2:word

constrains word w2 to start exactly at the start of its parent, phrase p2.

Given two phrases p1 and p2, the positions of all three words w1, w2, w3 are fixed, so for every pair p1, p2 that satisfies the conditions, there is exactly one result.

In [24]:

S.study(query)
S.showPlan(details=True)
S.count(progress=100, limit=-1)
for r in S.fetch(limit=10): print(S.glean(r))

  0.00s Checking search template ...
  0.05s Setting up search space for 7 objects ...
  0.53s Constraining search space with 13 relations ...
  0.57s Setting up retrieval plan ...
  0.66s Ready to deliver results from 1897440 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
Search with 7 objects and 13 relations
Results are instantiations of the following objects:
node  0-verse                             ( 23213   choices)
node  1-clause                            ( 88101   choices)
node  2-phrase                            (253187   choices)
node  3-word                              (426584   choices)
node  4-word                              (426584   choices)
node  5-phrase                            (253187   choices)
node  6-word                              (426584   choices)
Instantiations are computed along the following relations:
node                      0-verse         ( 23213   choices)
edge  0-verse         [[  1-clause        (     4.1 choices)
edge  1-clause        [[  2-phrase        (     3.0 choices)
edge  2-phrase        =:  3-word          (     1.0 choices)
edge  3-word          ]]  2-phrase        (     1.0 choices)
edge  2-phrase        :=  4-word          (     1.0 choices)
edge  4-word          ]]  2-phrase        (     1.0 choices)
edge  3-word          <   4-word          (213292.0 choices)
edge  1-clause        [[  5-phrase        (     3.3 choices)
edge  5-phrase        >   2-phrase        (126593.5 choices)
edge  3-word          <   5-phrase        (126593.5 choices)
edge  5-phrase        =:  6-word          (     1.0 choices)
edge  6-word          ]]  5-phrase        (     1.0 choices)
edge  6-word          <   4-word          (213292.0 choices)
  0.69s The results are connected to the original search template as follows:
 0     
 1 R0  verse
 2 R1    clause
 3                                      
 4 R2      p1:phrase
 5 R3          =: w1:word
 6 R4          <  w3:word
 7             :=
 8     
 9 R5      p2:phrase
10 R6          =: w2:word
11         
12         p1 < p2
13         w1 < p2
14         w2 < w3
15         
16     
  0.00s Counting results per 100 up to  the end of the results ...
   |     0.45s 100
   |     1.01s 200
   |     1.95s 300
  3.64s Done: 368 results
Genesis 2:25 clause[וַיִּֽהְי֤וּ שְׁנֵיהֶם֙ עֲרוּמִּ֔ים הָֽ...] phrase[שְׁנֵיהֶם֙ הָֽאָדָ֖ם וְאִשְׁתֹּ֑ו ] שְׁנֵיהֶם֙  אִשְׁתֹּ֑ו  phrase[עֲרוּמִּ֔ים ] עֲרוּמִּ֔ים 
Genesis 4:4 clause[וְהֶ֨בֶל הֵבִ֥יא גַם־ה֛וּא ...] phrase[הֶ֨בֶל גַם־ה֛וּא ] הֶ֨בֶל  ה֛וּא  phrase[הֵבִ֥יא ] הֵבִ֥יא 
Genesis 10:21 clause[גַּם־ה֑וּא אֲבִי֙ כָּל־בְּנֵי־...] phrase[גַּם־ה֑וּא אֲחִ֖י יֶ֥פֶת הַ...] גַּם־ גָּדֹֽול׃  phrase[אֲבִי֙ כָּל־בְּנֵי־עֵ֔בֶר ] אֲבִי֙ 
Genesis 12:17 clause[וַיְנַגַּ֨ע יְהוָ֧ה׀ אֶת־פַּרְעֹ֛ה ...] phrase[אֶת־פַּרְעֹ֛ה וְאֶת־בֵּיתֹ֑ו ] אֶת־ בֵּיתֹ֑ו  phrase[נְגָעִ֥ים גְּדֹלִ֖ים ] נְגָעִ֥ים 
Genesis 13:1 clause[וַיַּעַל֩ אַבְרָ֨ם מִמִּצְרַ֜יִם ...] phrase[אַבְרָ֨ם ה֠וּא וְאִשְׁתֹּ֧ו וְ...] אַבְרָ֨ם  כָל־ phrase[מִמִּצְרַ֜יִם ] מִ
Genesis 14:16 clause[וְגַם֩ אֶת־לֹ֨וט אָחִ֤יו ...] phrase[גַם֩ אֶת־לֹ֨וט אָחִ֤יו וּ...] גַם֩  עָֽם׃  phrase[הֵשִׁ֔יב ] הֵשִׁ֔יב 
Genesis 17:7 clause[לִהְיֹ֤ות לְךָ֙ לֵֽאלֹהִ֔ים ...] phrase[לְךָ֙ וּֽלְזַרְעֲךָ֖ אַחֲרֶֽיךָ׃ ] לְךָ֙  אַחֲרֶֽיךָ׃  phrase[לֵֽאלֹהִ֔ים ] לֵֽ
Genesis 19:4 clause[וְאַנְשֵׁ֨י הָעִ֜יר אַנְשֵׁ֤י ...] phrase[אַנְשֵׁ֨י הָעִ֜יר אַנְשֵׁ֤י סְדֹם֙ ...] אַנְשֵׁ֨י  קָּצֶֽה׃  phrase[נָסַ֣בּוּ ] נָסַ֣בּוּ 
Genesis 19:4 clause[וְאַנְשֵׁ֨י הָעִ֜יר אַנְשֵׁ֤י ...] phrase[אַנְשֵׁ֨י הָעִ֜יר אַנְשֵׁ֤י סְדֹם֙ ...] אַנְשֵׁ֨י  קָּצֶֽה׃  phrase[עַל־הַבַּ֔יִת ] עַל־
Genesis 22:3 clause[וַיִּקַּ֞ח אֶת־שְׁנֵ֤י נְעָרָיו֙ ...] phrase[אֶת־שְׁנֵ֤י נְעָרָיו֙ וְאֵ֖ת ...] אֶת־ בְּנֹ֑ו  phrase[אִתֹּ֔ו ] אִתֹּ֔ו

And here we have exactly the same results as our hand-written piece of code.

Note

Now, with the "duplicate" results prevented, the search with the template has only a slight performance overhead compared to the manual piece of code!

But beware of complications. Search templates are powerful, but sometimes they lead to a different result set from what you might think. Here is an example.

A tricky example¶

Suppose we want to count the clauses consisting of exactly two phrases. The following template should do it: a clause, starting with a phrase, followed by an adjacent phrase, which terminates the clause.

In [25]:

query = '''
clause
    =: phrase
    <: phrase
    :=
'''

In [26]:

S.study(query)
S.showPlan(details=True)
qresults = sorted(r[0] for r in S.fetch())
info(f'Done: found {len(qresults)}')

  0.00s Checking search template ...
  0.00s Setting up search space for 3 objects ...
  0.15s Constraining search space with 5 relations ...
  0.17s Setting up retrieval plan ...
  0.22s Ready to deliver results from 594475 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
Search with 3 objects and 5 relations
Results are instantiations of the following objects:
node  0-clause                            ( 88101   choices)
node  1-phrase                            (253187   choices)
node  2-phrase                            (253187   choices)
Instantiations are computed along the following relations:
node                      0-clause        ( 88101   choices)
edge  0-clause        :=  2-phrase        (     1.0 choices)
edge  2-phrase        ]]  0-clause        (     1.0 choices)
edge  2-phrase        :>  1-phrase        (     1.0 choices)
edge  1-phrase        =:  0-clause        (     0.2 choices)
edge  1-phrase        ]]  0-clause        (     1.0 choices)
  0.25s The results are connected to the original search template as follows:
 0     
 1 R0  clause
 2 R1      =: phrase
 3 R2      <: phrase
 4         :=
 5     
  1.20s Done: found 23483

Let us check this with a piece of hand-written code.

In [27]:

indent(reset=True)
info('counting ...')

cresults = []
for c in F.otype.s('clause'):
    wc = L.d(c, otype='word')
    ps = L.d(c, otype='phrase')
    if len(ps) == 2:
        (fp, lp) = ps
        wf = L.d(fp, otype='word')
        wl = L.d(lp, otype='word')
        if wf[0] == wc[0] and wf[-1] + 1 == wl[0] and wl[-1] == wc[-1]:
            cresults.append(c)
cresults = sorted(cresults)
info(f'Done: found {len(cresults)}')

  0.00s counting ...
  1.55s Done: found 23399

Strange, we end up with fewer cases. What is happening? Let us compare the results. We look at the first result where both methods diverge.

In [28]:

diff = [x for x in zip(qresults, cresults) if x[0] != x[1]]
print(f'{len(diff)} differences')
print(diff[0])

23119 differences
(428692, 428697)
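A word on the large number of differences reported here. Comparing two sorted lists position by position finds the first divergence correctly, but after that point the lists are shifted with respect to each other, so nearly every later pair differs as well. A set difference gives the items that are really extra. The sketch below uses small hypothetical lists, not the real node numbers.

```python
# Hypothetical sorted result lists: the first one contains one extra item (25).
qresults_toy = [10, 20, 25, 30, 40]
cresults_toy = [10, 20, 30, 40]

# Positional comparison, as in the cell above: everything after the extra item is shifted.
positional = [x for x in zip(qresults_toy, cresults_toy) if x[0] != x[1]]

# Set difference: the items that occur in one list but not in the other.
extra = sorted(set(qresults_toy) - set(cresults_toy))
missing = sorted(set(cresults_toy) - set(qresults_toy))

print(positional)
print(extra, missing)
```

The first positional difference points at the extra item, which is all we need here. The count of positional differences says little about how many items are actually different.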

Let's look at the phrases of the first difference:

In [29]:

for p in L.d(diff[0][0], otype='phrase'):
    print(f'Phrase {p} has words {L.d(p, otype="word")}')

Phrase 655060 has words [6514]
Phrase 655061 has words [6515, 6516, 6517, 6518, 6519, 6520, 6522, 6523, 6524, 6525, 6526, 6527, 6528, 6529, 6530]
Phrase 655062 has words [6521]

This clause has three phrases, but the third one lies inside the second one, so that the clause indeed satisfies the pattern of two adjacent phrases.
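We can verify this reading with the word numbers printed above. The sketch rebuilds the three phrases as plain lists, computes the gap in the second phrase, and checks the conditions of the template by hand. The helper gap_slots is a hypothetical name introduced for this illustration.

```python
# Word (slot) numbers of the three phrases, copied from the output above.
phrases = {
    655060: [6514],
    655061: list(range(6515, 6521)) + list(range(6522, 6531)),
    655062: [6521],
}


def gap_slots(slots):
    """Slots between the first and last slot of an object that it does not occupy."""
    return set(range(min(slots), max(slots) + 1)) - set(slots)


clause_first = min(min(s) for s in phrases.values())
clause_last = max(max(s) for s in phrases.values())

p1, p2, p3 = phrases[655060], phrases[655061], phrases[655062]

starts_clause = min(p1) == clause_first   # =: phrase
adjacent = max(p1) + 1 == min(p2)         # <: phrase
ends_clause = max(p2) == clause_last      # :=
in_gap = set(p3) <= gap_slots(p2)         # the third phrase sits in the gap of the second

print(gap_slots(p2))
print(starts_clause, adjacent, ends_clause, in_gap)
```

All three template conditions hold for the first two phrases, so the template has no way of noticing the third one.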

Can we adjust the pattern to exclude cases like this? At the moment, our search template mechanism is not powerful enough for that.

We can count how often it happens, however. We require a third phrase to be present, not equal to either of the first two.

In [30]:

query = '''
clause
    =: p1:phrase
    <: p2:phrase
    :=
    p3:phrase
    p1 # p3
    p2 # p3
'''

In [31]:

S.study(query)
S.showPlan()
rresults = sorted(r[0] for r in S.fetch())
info(f'Done: found {len(rresults)}')

  0.00s Checking search template ...
  0.02s Setting up search space for 4 objects ...
  0.24s Constraining search space with 8 relations ...
  0.27s Setting up retrieval plan ...
  0.32s Ready to deliver results from 847662 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
  0.32s The results are connected to the original search template as follows:
 0     
 1 R0  clause
 2 R1      =: p1:phrase
 3 R2      <: p2:phrase
 4         :=
 5 R3      p3:phrase
 6         p1 # p3
 7         p2 # p3
 8     
  1.32s Done: found 118

But we have to filter this, because per p1, p2 there might be multiple p3 that satisfy the constraints. Since rresults holds the clause of each result, let's reduce it to the set of distinct clauses.

In [32]:

len(set(rresults))

Out[32]:

And this is exactly the difference between the number of results of the search template (23483) and that of the hand-written piece of code (23399): 84 clauses.

Testing basic relations¶

Basic relations are about the identity and spatial ordering of objects. Are they the same, do they occupy the same slots, do they overlap, is one embedded in the other, does one come before the other?

We also have edge features, that specify relationships between nodes.

Although the basic relationships are easy to define, and even easy to implement, they may be very costly to use. When searching, most of them have to be computed very many times.

Some of them have been precomputed and stored in an index, e.g. the embedding relationships. They can be used without penalty.

Other relations are not suitable for pre-computing: most inequality relations are of that kind. It would require an enormous amount of storage to pre-compute for each node the set of nodes that occupy different slots. This type of relation will not be used in narrowing down the search space, which means that it may take more time to get the results.
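
To get a feeling for the storage involved, here is a back-of-the-envelope calculation. The word count is the number of choices for a word node that S.showPlan reports elsewhere in this notebook; the 4 bytes per stored node number is purely an assumption.

```python
# Number of word nodes, as reported by S.showPlan(details=True) in this notebook.
n_words = 426584

# Two distinct words never occupy the same slots, so an index for a relation
# such as "does not have the same slot set" would hold about n * (n - 1) entries
# for the words alone.
entries = n_words * (n_words - 1)

# Assumption: 4 bytes per stored node number.
bytes_needed = entries * 4

print(f'{entries:,} entries, about {bytes_needed / 1e9:,.0f} GB')
```

Even under this generous assumption the index would be far larger than the dataset itself, which is why such relations are evaluated on the fly.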

We are going to test all of our basic relationships here.

Let us first see what relations we have:

In [33]:

print(S.relationLegend)

                      = left equal to right (as node)
                      # left unequal to right (as node)
                      < left before right (in canonical node ordering)
                      > left after right (in canonical node ordering)
                     == left occupies same slots as right
                     && left has overlapping slots with right
                     ## left and right do not have the same slot set
                     || left and right do not have common slots
                     [[ left embeds right
                     ]] left embedded in right
                     << left completely before right
                     >> left completely after right
                     =: left and right start at the same slot
                     := left and right end at the same slot
                     :: left and right start and end at the same slot
                     <: left immediately before right
                     :> left immediately after right
                    =k: left and right start at k-nearly the same slot
                    :k= left and right end at k-nearly the same slot
                    :k: left and right start and end at k-near slots
                    <k: left k-nearly before right
                    :k> left k-nearly after right
-distributional_parent> edge feature "distributional_parent"
<distributional_parent- edge feature "distributional_parent" (opposite direction)
    -functional_parent> edge feature "functional_parent"
    <functional_parent- edge feature "functional_parent" (opposite direction)
               -mother> edge feature "mother"
               <mother- edge feature "mother" (opposite direction)
       -omap@2016-2017> edge feature "omap@2016-2017"
       <omap@2016-2017- edge feature "omap@2016-2017" (opposite direction)
The grid feature "oslots" cannot be used in searches.
Surely, one of the above relations on nodes and/or slots will suit you better!
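
Most of the slot-based relations in this legend can be read as ordinary set operations on the slots of the two nodes. The sketch below illustrates that reading in plain Python. It is not Text-Fabric's implementation, and the function names and slot numbers are made up for the illustration.

```python
# Each object is modelled as a non-empty set of slot numbers (illustrative only).

def same_slots(a, b):         # ==
    return a == b

def overlap(a, b):            # &&
    return bool(a & b)

def different_slots(a, b):    # ##
    return a != b

def disjoint(a, b):           # ||
    return not (a & b)

def embeds(a, b):             # [[  (every slot of b is also a slot of a)
    return b <= a

def completely_before(a, b):  # <<
    return max(a) < min(b)

phrase = {1, 2, 3, 6, 7}      # a phrase with a gap at slots 4 and 5
inner = {4, 5}                # material that fills the gap
word = {2}

print(embeds(phrase, word))              # the word lies inside the phrase
print(disjoint(phrase, inner))           # no shared slots
print(completely_before(phrase, inner))  # False: the phrase continues after the gap
```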

= (equal as node)¶

The = means that both parts are the same node. Left and right are not two things with similar properties, no, they are one and the same thing.

Useful if the thing you search for is part of two wildly different patterns.

In [34]:

query = '''
v1:verse
  sentence
    clause rela=Objc
      phrase
        word sp=verb gn=f nu=pl
v2:verse
  sentence
    c1:clause
    c2:clause
    c3:clause
    c1 < c2
    c2 < c3
v1 = v2
'''
for r in S.search(query, limit=10): print(S.glean(r))

Ezekiel 36:36 sentence[וְיָדְע֣וּ הַגֹּויִ֗ם אֲשֶׁ֣ר ...] clause[הַנֶּ֣הֱרָסֹ֔ות ] phrase[נֶּ֣הֱרָסֹ֔ות ] נֶּ֣הֱרָסֹ֔ות  Ezekiel 36:36 sentence[וְיָדְע֣וּ הַגֹּויִ֗ם אֲשֶׁ֣ר ...] clause[וְיָדְע֣וּ הַגֹּויִ֗ם ] clause[אֲשֶׁ֣ר יִֽשָּׁאֲרוּ֮ סְבִיבֹותֵיכֶם֒ ] clause[כִּ֣י׀ אֲנִ֣י יְהוָ֗ה בָּנִ֨יתִי֙ ]
Ezekiel 36:36 sentence[וְיָדְע֣וּ הַגֹּויִ֗ם אֲשֶׁ֣ר ...] clause[הַנֶּ֣הֱרָסֹ֔ות ] phrase[נֶּ֣הֱרָסֹ֔ות ] נֶּ֣הֱרָסֹ֔ות  Ezekiel 36:36 sentence[וְיָדְע֣וּ הַגֹּויִ֗ם אֲשֶׁ֣ר ...] clause[וְיָדְע֣וּ הַגֹּויִ֗ם ] clause[אֲשֶׁ֣ר יִֽשָּׁאֲרוּ֮ סְבִיבֹותֵיכֶם֒ ] clause[הַנֶּ֣הֱרָסֹ֔ות ]
Ezekiel 36:36 sentence[וְיָדְע֣וּ הַגֹּויִ֗ם אֲשֶׁ֣ר ...] clause[הַנֶּ֣הֱרָסֹ֔ות ] phrase[נֶּ֣הֱרָסֹ֔ות ] נֶּ֣הֱרָסֹ֔ות  Ezekiel 36:36 sentence[וְיָדְע֣וּ הַגֹּויִ֗ם אֲשֶׁ֣ר ...] clause[וְיָדְע֣וּ הַגֹּויִ֗ם ] clause[כִּ֣י׀ אֲנִ֣י יְהוָ֗ה בָּנִ֨יתִי֙ ] clause[הַנֶּ֣הֱרָסֹ֔ות ]
Ezekiel 36:36 sentence[וְיָדְע֣וּ הַגֹּויִ֗ם אֲשֶׁ֣ר ...] clause[הַנֶּ֣הֱרָסֹ֔ות ] phrase[נֶּ֣הֱרָסֹ֔ות ] נֶּ֣הֱרָסֹ֔ות  Ezekiel 36:36 sentence[וְיָדְע֣וּ הַגֹּויִ֗ם אֲשֶׁ֣ר ...] clause[וְיָדְע֣וּ הַגֹּויִ֗ם ] clause[אֲשֶׁ֣ר יִֽשָּׁאֲרוּ֮ סְבִיבֹותֵיכֶם֒ ] clause[נָטַ֖עְתִּי ]
Ezekiel 36:36 sentence[וְיָדְע֣וּ הַגֹּויִ֗ם אֲשֶׁ֣ר ...] clause[הַנֶּ֣הֱרָסֹ֔ות ] phrase[נֶּ֣הֱרָסֹ֔ות ] נֶּ֣הֱרָסֹ֔ות  Ezekiel 36:36 sentence[וְיָדְע֣וּ הַגֹּויִ֗ם אֲשֶׁ֣ר ...] clause[וְיָדְע֣וּ הַגֹּויִ֗ם ] clause[כִּ֣י׀ אֲנִ֣י יְהוָ֗ה בָּנִ֨יתִי֙ ] clause[נָטַ֖עְתִּי ]
Ezekiel 36:36 sentence[וְיָדְע֣וּ הַגֹּויִ֗ם אֲשֶׁ֣ר ...] clause[הַנֶּ֣הֱרָסֹ֔ות ] phrase[נֶּ֣הֱרָסֹ֔ות ] נֶּ֣הֱרָסֹ֔ות  Ezekiel 36:36 sentence[וְיָדְע֣וּ הַגֹּויִ֗ם אֲשֶׁ֣ר ...] clause[וְיָדְע֣וּ הַגֹּויִ֗ם ] clause[הַנֶּ֣הֱרָסֹ֔ות ] clause[נָטַ֖עְתִּי ]
Ezekiel 36:36 sentence[וְיָדְע֣וּ הַגֹּויִ֗ם אֲשֶׁ֣ר ...] clause[הַנֶּ֣הֱרָסֹ֔ות ] phrase[נֶּ֣הֱרָסֹ֔ות ] נֶּ֣הֱרָסֹ֔ות  Ezekiel 36:36 sentence[וְיָדְע֣וּ הַגֹּויִ֗ם אֲשֶׁ֣ר ...] clause[וְיָדְע֣וּ הַגֹּויִ֗ם ] clause[אֲשֶׁ֣ר יִֽשָּׁאֲרוּ֮ סְבִיבֹותֵיכֶם֒ ] clause[הַנְּשַׁמָּ֑ה ]
Ezekiel 36:36 sentence[וְיָדְע֣וּ הַגֹּויִ֗ם אֲשֶׁ֣ר ...] clause[הַנֶּ֣הֱרָסֹ֔ות ] phrase[נֶּ֣הֱרָסֹ֔ות ] נֶּ֣הֱרָסֹ֔ות  Ezekiel 36:36 sentence[וְיָדְע֣וּ הַגֹּויִ֗ם אֲשֶׁ֣ר ...] clause[וְיָדְע֣וּ הַגֹּויִ֗ם ] clause[כִּ֣י׀ אֲנִ֣י יְהוָ֗ה בָּנִ֨יתִי֙ ] clause[הַנְּשַׁמָּ֑ה ]
Ezekiel 36:36 sentence[וְיָדְע֣וּ הַגֹּויִ֗ם אֲשֶׁ֣ר ...] clause[הַנֶּ֣הֱרָסֹ֔ות ] phrase[נֶּ֣הֱרָסֹ֔ות ] נֶּ֣הֱרָסֹ֔ות  Ezekiel 36:36 sentence[וְיָדְע֣וּ הַגֹּויִ֗ם אֲשֶׁ֣ר ...] clause[וְיָדְע֣וּ הַגֹּויִ֗ם ] clause[הַנֶּ֣הֱרָסֹ֔ות ] clause[הַנְּשַׁמָּ֑ה ]
Ezekiel 36:36 sentence[וְיָדְע֣וּ הַגֹּויִ֗ם אֲשֶׁ֣ר ...] clause[הַנֶּ֣הֱרָסֹ֔ות ] phrase[נֶּ֣הֱרָסֹ֔ות ] נֶּ֣הֱרָסֹ֔ות  Ezekiel 36:36 sentence[וְיָדְע֣וּ הַגֹּויִ֗ם אֲשֶׁ֣ר ...] clause[וְיָדְע֣וּ הַגֹּויִ֗ם ] clause[נָטַ֖עְתִּי ] clause[הַנְּשַׁמָּ֑ה ]
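
All ten results above come from the same verse because the template asks for every way to pick three clauses c1 < c2 < c3 from one sentence. A rough sketch of that counting, using itertools.combinations on invented clause numbers:

```python
from itertools import combinations

# Invented node numbers for six clauses in one sentence, in canonical order.
clauses = [501, 502, 503, 504, 505, 506]

# Every increasing triple is a separate way to fill c1 < c2 < c3.
triples = list(combinations(clauses, 3))

print(len(triples))  # → 20
```

A single sentence with many clauses can therefore fill the first page of results on its own, which is worth keeping in mind when using limit.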

# (unequal as node)¶

n # m if n and m are not the same node.

If you write a template, and you know that one node should come before another one, consider using < or >, which will constrain the results better.

We have seen this in action in the search for gapped phrases.

< and > (canonical)¶

n < m if n comes before m in the canonical ordering of nodes.

We have seen them in action before.

== (same slots)¶

Two objects are extensionally equal if they occupy exactly the same slots.

Quite an expensive relation, as you will see: about 23 seconds for 3601 results.

In [35]:

query = '''
v:verse
    s:sentence
v == s
'''
for r in S.search(query, limit=10): print(S.glean(r))
S.count(progress=1000, limit=10000)

Isaiah 52:14 sentence[כַּאֲשֶׁ֨ר שָׁמְמ֤וּ עָלֶ֨יךָ֙ רַבִּ֔ים ...]
Proverbs 23:34 sentence[וְ֭הָיִיתָ כְּשֹׁכֵ֣ב בְּ...]
Proverbs 24:4 sentence[וּ֭בְדַעַת חֲדָרִ֣ים יִמָּלְא֑וּ ...]
Leviticus 12:1 sentence[וַיְדַבֵּ֥ר יְהוָ֖ה אֶל־מֹשֶׁ֥ה ...]
Leviticus 12:3 sentence[וּבַיֹּ֖ום הַ...]
Proverbs 24:8 sentence[מְחַשֵּׁ֥ב לְהָרֵ֑עַ לֹ֝֗ו בַּֽעַל־...]
Leviticus 12:6 sentence[וּבִמְלֹ֣את׀ יְמֵ֣י טָהֳרָ֗הּ ...]
Leviticus 13:1 sentence[וַיְדַבֵּ֣ר יְהוָ֔ה אֶל־מֹשֶׁ֥ה ...]
Isaiah 54:12 sentence[וְשַׂמְתִּ֤י כַּֽדְכֹד֙ שִׁמְשֹׁתַ֔יִךְ וּ...]
Leviticus 13:9 sentence[נֶ֣גַע צָרַ֔עַת כִּ֥י תִהְיֶ֖ה בְּ...]
  0.00s Counting results per 1000 up to 10000 ...
   |     6.46s 1000
   |       13s 2000
   |       19s 3000
    23s Done: 3601 results

&& (overlap)¶

Two objects overlap if and only if they share at least one slot. This is quite costly to use in some cases.

In [36]:

query = '''
verse
    phrase
      s1:subphrase
      s2:subphrase
      s1 # s2
      s1 && s2
'''
for r in S.search(query, limit=10): print(S.glean(r))

Genesis 1:14 phrase[לְאֹתֹת֙ וּלְמֹ֣ועֲדִ֔ים ...] subphrase[לְאֹתֹת֙ וּלְמֹ֣ועֲדִ֔ים ] subphrase[לְאֹתֹת֙ ]
Genesis 1:14 phrase[לְאֹתֹת֙ וּלְמֹ֣ועֲדִ֔ים ...] subphrase[לְאֹתֹת֙ וּלְמֹ֣ועֲדִ֔ים ] subphrase[לְמֹ֣ועֲדִ֔ים ]
Genesis 1:14 phrase[לְאֹתֹת֙ וּלְמֹ֣ועֲדִ֔ים ...] subphrase[לְאֹתֹת֙ ] subphrase[לְאֹתֹת֙ וּלְמֹ֣ועֲדִ֔ים ]
Genesis 1:14 phrase[לְאֹתֹת֙ וּלְמֹ֣ועֲדִ֔ים ...] subphrase[לְמֹ֣ועֲדִ֔ים ] subphrase[לְאֹתֹת֙ וּלְמֹ֣ועֲדִ֔ים ]
Genesis 1:14 phrase[לְאֹתֹת֙ וּלְמֹ֣ועֲדִ֔ים ...] subphrase[לְיָמִ֖ים וְשָׁנִֽים׃ ] subphrase[יָמִ֖ים ]
Genesis 1:14 phrase[לְאֹתֹת֙ וּלְמֹ֣ועֲדִ֔ים ...] subphrase[לְיָמִ֖ים וְשָׁנִֽים׃ ] subphrase[שָׁנִֽים׃ ]
Genesis 1:14 phrase[לְאֹתֹת֙ וּלְמֹ֣ועֲדִ֔ים ...] subphrase[יָמִ֖ים ] subphrase[לְיָמִ֖ים וְשָׁנִֽים׃ ]
Genesis 1:14 phrase[לְאֹתֹת֙ וּלְמֹ֣ועֲדִ֔ים ...] subphrase[שָׁנִֽים׃ ] subphrase[לְיָמִ֖ים וְשָׁנִֽים׃ ]
Genesis 1:16 phrase[אֶת־שְׁנֵ֥י הַמְּאֹרֹ֖ת הַ...] subphrase[הַמְּאֹרֹ֖ת הַגְּדֹלִ֑ים ] subphrase[הַגְּדֹלִ֑ים ]
Genesis 1:16 phrase[אֶת־שְׁנֵ֥י הַמְּאֹרֹ֖ת הַ...] subphrase[הַמְּאֹרֹ֖ת ] subphrase[הַמְּאֹרֹ֖ת הַגְּדֹלִ֑ים ]

## (not the same slots)¶

True when the two objects in question do not occupy exactly the same set of slots. This is a very loose relationship.

It is implemented, but not yet tested, and at the moment I do not have a clear use case for it.

|| (disjoint slots)¶

True when the two objects in question do not share any slots. This is a rather loose relationship.

This can be used for locating gaps: a textual object that lies inside a gap of another object.
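
As a sketch of how disjointness helps with gaps: an object lies in a gap of another object when the two share no slots, yet the first falls between the first and last slot of the second. The helper name and the slot numbers below are hypothetical.

```python
def lies_in_gap(inner, outer):
    """True if `inner` shares no slots with `outer` but sits between its ends."""
    disjoint = not (inner & outer)  # the || relation
    between = min(outer) < min(inner) and max(inner) < max(outer)
    return disjoint and between

phrase = {10, 11, 14, 15}         # slots 12 and 13 are missing: a gap
print(lies_in_gap({12}, phrase))  # inside the gap
print(lies_in_gap({16}, phrase))  # after the phrase, so not in a gap
```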

[[ and ]] (embedding)¶

n [[ m if object n embeds m.

n ]] m if object n lies embedded in m.

These relations are used implicitly in templates when there is indentation:

s:sentence
  p:phrase
    w1:word gn=f
    w2:word gn=m

The template above implicitly states the following embeddings:

s [[ p
p [[ w1
p [[ w2

We have seen these relations in action.
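
How indentation turns into embedding relations can be sketched in a few lines of Python. This is only an illustration of the idea, not the parser that Text-Fabric uses; the function name `implied_embeddings` is hypothetical, and it assumes that every template line starts with a name followed by a colon.

```python
def implied_embeddings(template):
    """Derive (parent, child) pairs from the indentation of a template."""
    stack = []  # (indent, name) of the currently open ancestors
    pairs = []
    for line in template.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        name = line.split(':')[0].strip()
        # Close every object that is not an ancestor of this line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if stack:
            pairs.append((stack[-1][1], name))
        stack.append((indent, name))
    return pairs

template = '''
s:sentence
  p:phrase
    w1:word gn=f
    w2:word gn=m
'''
# Per the legend, [[ reads "left embeds right".
for parent, child in implied_embeddings(template):
    print(f'{parent} [[ {child}')
```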

<< and >> (before and after with slots)¶

These relations test whether one object comes before or after another, in the sense that the slots occupied by the one object lie completely before or after the slots occupied by the other object.

In [37]:

query = '''
verse
  sentence
    c1:clause
    p:phrase
    c2:clause
    c1 << p
    c2 >> p
'''
for r in S.search(query, limit=10): print(S.glean(r))

Genesis 1:11 sentence[תַּֽדְשֵׁ֤א הָאָ֨רֶץ֙ דֶּ֔שֶׁא עֵ֚שֶׂב ...] clause[תַּֽדְשֵׁ֤א הָאָ֨רֶץ֙ דֶּ֔שֶׁא עֵ֚שֶׂב ...] phrase[עֹ֤שֶׂה ] clause[אֲשֶׁ֥ר זַרְעֹו־בֹ֖ו ]
Genesis 1:11 sentence[תַּֽדְשֵׁ֤א הָאָ֨רֶץ֙ דֶּ֔שֶׁא עֵ֚שֶׂב ...] clause[תַּֽדְשֵׁ֤א הָאָ֨רֶץ֙ דֶּ֔שֶׁא עֵ֚שֶׂב ...] phrase[פְּרִי֙ ] clause[אֲשֶׁ֥ר זַרְעֹו־בֹ֖ו ]
Genesis 1:11 sentence[תַּֽדְשֵׁ֤א הָאָ֨רֶץ֙ דֶּ֔שֶׁא עֵ֚שֶׂב ...] clause[תַּֽדְשֵׁ֤א הָאָ֨רֶץ֙ דֶּ֔שֶׁא עֵ֚שֶׂב ...] phrase[לְמִינֹ֔ו ] clause[אֲשֶׁ֥ר זַרְעֹו־בֹ֖ו ]
Genesis 1:11 sentence[תַּֽדְשֵׁ֤א הָאָ֨רֶץ֙ דֶּ֔שֶׁא עֵ֚שֶׂב ...] clause[מַזְרִ֣יעַ זֶ֔רַע ] phrase[עֹ֤שֶׂה ] clause[אֲשֶׁ֥ר זַרְעֹו־בֹ֖ו ]
Genesis 1:11 sentence[תַּֽדְשֵׁ֤א הָאָ֨רֶץ֙ דֶּ֔שֶׁא עֵ֚שֶׂב ...] clause[מַזְרִ֣יעַ זֶ֔רַע ] phrase[פְּרִי֙ ] clause[אֲשֶׁ֥ר זַרְעֹו־בֹ֖ו ]
Genesis 1:11 sentence[תַּֽדְשֵׁ֤א הָאָ֨רֶץ֙ דֶּ֔שֶׁא עֵ֚שֶׂב ...] clause[מַזְרִ֣יעַ זֶ֔רַע ] phrase[לְמִינֹ֔ו ] clause[אֲשֶׁ֥ר זַרְעֹו־בֹ֖ו ]
Genesis 1:12 sentence[וַתֹּוצֵ֨א הָאָ֜רֶץ דֶּ֠שֶׁא ...] clause[וַתֹּוצֵ֨א הָאָ֜רֶץ דֶּ֠שֶׁא ...] phrase[עֹ֥שֶׂה ] clause[אֲשֶׁ֥ר זַרְעֹו־בֹ֖ו ]
Genesis 1:12 sentence[וַתֹּוצֵ֨א הָאָ֜רֶץ דֶּ֠שֶׁא ...] clause[וַתֹּוצֵ֨א הָאָ֜רֶץ דֶּ֠שֶׁא ...] phrase[פְּרִ֛י ] clause[אֲשֶׁ֥ר זַרְעֹו־בֹ֖ו ]
Genesis 1:12 sentence[וַתֹּוצֵ֨א הָאָ֜רֶץ דֶּ֠שֶׁא ...] clause[מַזְרִ֤יעַ זֶ֨רַע֙ לְמִינֵ֔הוּ ] phrase[עֹ֥שֶׂה ] clause[אֲשֶׁ֥ר זַרְעֹו־בֹ֖ו ]
Genesis 1:12 sentence[וַתֹּוצֵ֨א הָאָ֜רֶץ דֶּ֠שֶׁא ...] clause[מַזְרִ֤יעַ זֶ֨רַע֙ לְמִינֵ֔הוּ ] phrase[פְּרִ֛י ] clause[אֲשֶׁ֥ר זַרְעֹו־בֹ֖ו ]

=: (start at same slots)¶

This relation holds when the left and right hand sides are nodes that have the same first slot. It serves to enforce that the children of a parent are textually the first things inside that parent. We have seen it in action before.

:= (end at same slots)¶

This relation holds when the left and right hand sides are nodes that have the same last slot. It serves to enforce that the children of a parent are textually the last things inside that parent. We have seen it in action before.

:: (same start and end slots)¶

This relation holds when =: and := both hold between the left and right hand sides. It serves to look for parents with single children, or at least, where the parent is textually spanned by a single child.

In [38]:

query = '''
verse
    clause
        :: phrase
'''
for r in S.search(query, limit=10): print(S.glean(r))
S.count(progress=1000, limit=-1)

Genesis 1:5 clause[יֹ֥ום אֶחָֽד׃ פ ] phrase[יֹ֥ום אֶחָֽד׃ פ ]
Genesis 1:8 clause[יֹ֥ום שֵׁנִֽי׃ פ ] phrase[יֹ֥ום שֵׁנִֽי׃ פ ]
Genesis 1:13 clause[יֹ֥ום שְׁלִישִֽׁי׃ פ ] phrase[יֹ֥ום שְׁלִישִֽׁי׃ פ ]
Genesis 1:19 clause[יֹ֥ום רְבִיעִֽי׃ פ ] phrase[יֹ֥ום רְבִיעִֽי׃ פ ]
Genesis 1:22 clause[לֵאמֹ֑ר ] phrase[לֵאמֹ֑ר ]
Genesis 1:22 clause[פְּר֣וּ ] phrase[פְּר֣וּ ]
Genesis 1:23 clause[יֹ֥ום חֲמִישִֽׁי׃ פ ] phrase[יֹ֥ום חֲמִישִֽׁי׃ פ ]
Genesis 1:28 clause[פְּר֥וּ ] phrase[פְּר֥וּ ]
Genesis 1:31 clause[יֹ֥ום הַשִּׁשִּֽׁי׃ פ ] phrase[יֹ֥ום הַשִּׁשִּֽׁי׃ פ ]
Genesis 2:3 clause[לַעֲשֹֽׂות׃ פ ] phrase[לַעֲשֹֽׂות׃ פ ]
  0.00s Counting results per 1000 up to  the end of the results ...
   |     0.10s 1000
   |     0.19s 2000
   |     0.27s 3000
   |     0.34s 4000
   |     0.38s 5000
   |     0.44s 6000
   |     0.48s 7000
   |     0.54s 8000
   |     0.60s 9000
  0.65s Done: 9451 results

Like before, there might be extra phrases in such clauses, lying embedded in the clause-spanning phrase.

In [39]:

query = '''
verse
    clause
        :: p1:phrase
        p2:phrase
        p1 # p2
'''
for r in S.search(query, limit=10): print(S.glean(r))
S.count(progress=1000, limit=-1)

Genesis 10:21 clause[גַּם־ה֑וּא אֲבִי֙ כָּל־בְּנֵי־...] phrase[גַּם־ה֑וּא אֲחִ֖י יֶ֥פֶת הַ...] phrase[אֲבִי֙ כָּל־בְּנֵי־עֵ֔בֶר ]
Genesis 24:24 clause[בַּת־בְּתוּאֵ֖ל אָנֹ֑כִי בֶּן־מִלְכָּ֕ה ] phrase[בַּת־בְּתוּאֵ֖ל בֶּן־מִלְכָּ֕ה ] phrase[אָנֹ֑כִי ]
Genesis 31:16 clause[לָ֥נוּ ה֖וּא וּלְבָנֵ֑ינוּ ] phrase[לָ֥נוּ וּלְבָנֵ֑ינוּ ] phrase[ה֖וּא ]
Genesis 31:53 clause[אֱלֹהֵ֨י אַבְרָהָ֜ם וֵֽאלֹהֵ֤י נָחֹור֙ ...] phrase[אֱלֹהֵ֨י אַבְרָהָ֜ם וֵֽאלֹהֵ֤י נָחֹור֙ ...] phrase[יִשְׁפְּט֣וּ ]
Genesis 31:53 clause[אֱלֹהֵ֨י אַבְרָהָ֜ם וֵֽאלֹהֵ֤י נָחֹור֙ ...] phrase[אֱלֹהֵ֨י אַבְרָהָ֜ם וֵֽאלֹהֵ֤י נָחֹור֙ ...] phrase[בֵינֵ֔ינוּ ]
Exodus 28:1 clause[לְכַהֲנֹו־לִ֑י אַהֲרֹ֕ן נָדָ֧ב ...] phrase[לְכַהֲנֹו־אַהֲרֹ֕ן נָדָ֧ב וַ...] phrase[לִ֑י ]
Exodus 28:14 clause[מִגְבָּלֹ֛ת תַּעֲשֶׂ֥ה אֹתָ֖ם מַעֲשֵׂ֣ה עֲבֹ֑ת ] phrase[מִגְבָּלֹ֛ת מַעֲשֵׂ֣ה עֲבֹ֑ת ] phrase[תַּעֲשֶׂ֥ה ]
Exodus 28:14 clause[מִגְבָּלֹ֛ת תַּעֲשֶׂ֥ה אֹתָ֖ם מַעֲשֵׂ֣ה עֲבֹ֑ת ] phrase[מִגְבָּלֹ֛ת מַעֲשֵׂ֣ה עֲבֹ֑ת ] phrase[אֹתָ֖ם ]
Exodus 29:18 clause[עֹלָ֥ה ה֖וּא לַֽיהוָ֑ה רֵ֣יחַ ...] phrase[עֹלָ֥ה רֵ֣יחַ נִיחֹ֔וחַ ] phrase[ה֖וּא ]
Exodus 29:18 clause[עֹלָ֥ה ה֖וּא לַֽיהוָ֑ה רֵ֣יחַ ...] phrase[עֹלָ֥ה רֵ֣יחַ נִיחֹ֔וחַ ] phrase[לַֽיהוָ֑ה ]
  0.00s Counting results per 1000 up to  the end of the results ...
  0.57s Done: 80 results

<: (adjacent before)¶

This relation holds when the left hand side ends in the slot that lies immediately before the first slot of the right hand side. It serves to enforce an ordering between siblings of a parent, without anything in between.

:> (adjacent after)¶

This relation holds when the left hand side starts in the slot that lies immediately after the last slot of the right hand side.

As an example: are there clauses with multiple clause atoms without a gap between the two?

In [40]:

query = '''
verse
    clause
        clause_atom
        <: clause_atom
'''
for r in S.search(query, limit=10): print(S.glean(r))
S.count(progress=1000, limit=-1)

  0.00s Counting results per 1000 up to  the end of the results ...
  0.78s Done: 0 results

Conclusion: there is always textual material between clause_atoms of the same clause. If we lift the adjacency to sequentially before (<<) we do get results:

In [41]:

query = '''
verse
    clause
        clause_atom
        << clause_atom
'''
for r in S.search(query, limit=10): print(S.glean(r))
S.count(progress=1000, limit=-1)

Genesis 1:7 clause[וַיַּבְדֵּ֗ל בֵּ֤ין הַמַּ֨יִם֙ ...] clause_atom[וַיַּבְדֵּ֗ל בֵּ֤ין הַמַּ֨יִם֙ ] clause_atom[וּבֵ֣ין הַמַּ֔יִם ]
Genesis 1:11 clause[תַּֽדְשֵׁ֤א הָאָ֨רֶץ֙ דֶּ֔שֶׁא עֵ֚שֶׂב ...] clause_atom[תַּֽדְשֵׁ֤א הָאָ֨רֶץ֙ דֶּ֔שֶׁא עֵ֚שֶׂב ] clause_atom[עֵ֣ץ פְּרִ֞י ]
Genesis 1:11 clause[עֹ֤שֶׂה פְּרִי֙ לְמִינֹ֔ו עַל־...] clause_atom[עֹ֤שֶׂה פְּרִי֙ לְמִינֹ֔ו ] clause_atom[עַל־הָאָ֑רֶץ ]
Genesis 1:12 clause[וַתֹּוצֵ֨א הָאָ֜רֶץ דֶּ֠שֶׁא ...] clause_atom[וַתֹּוצֵ֨א הָאָ֜רֶץ דֶּ֠שֶׁא ...] clause_atom[וְעֵ֧ץ ]
Genesis 1:12 clause[עֹ֥שֶׂה פְּרִ֛י לְמִינֵ֑הוּ ] clause_atom[עֹ֥שֶׂה פְּרִ֛י ] clause_atom[לְמִינֵ֑הוּ ]
Genesis 1:21 clause[וַיִּבְרָ֣א אֱלֹהִ֔ים אֶת־הַ...] clause_atom[וַיִּבְרָ֣א אֱלֹהִ֔ים אֶת־הַ...] clause_atom[לְמִֽינֵהֶ֗ם ]
Genesis 1:29 clause[הִנֵּה֩ נָתַ֨תִּי לָכֶ֜ם אֶת־כָּל־...] clause_atom[הִנֵּה֩ נָתַ֨תִּי לָכֶ֜ם אֶת־כָּל־...] clause_atom[וְאֶת־כָּל־הָעֵ֛ץ ]
Genesis 1:30 clause[וּֽלְכָל־חַיַּ֣ת הָ֠...] clause_atom[וּֽלְכָל־חַיַּ֣ת הָ֠...] clause_atom[אֶת־כָּל־יֶ֥רֶק עֵ֖שֶׂב לְ...]
Genesis 2:17 clause[כִּ֗י בְּיֹ֛ום מֹ֥ות תָּמֽוּת׃ ] clause_atom[כִּ֗י בְּיֹ֛ום ] clause_atom[מֹ֥ות תָּמֽוּת׃ ]
Genesis 2:22 clause[וַיִּבֶן֩ יְהוָ֨ה אֱלֹהִ֧ים׀ אֶֽת־...] clause_atom[וַיִּבֶן֩ יְהוָ֨ה אֱלֹהִ֧ים׀ אֶֽת־...] clause_atom[לְאִשָּׁ֑ה ]
  0.00s Counting results per 1000 up to  the end of the results ...
   |     0.28s 1000
   |     0.59s 2000
  0.73s Done: 2589 results

Nearness for := =: :: :> <:¶

The relations with : in their name always have a requirement somewhere that a slot of the left hand node equals a slot of the right hand node, or that the two are adjacent.

All these relationships can be relaxed by a nearness number. If you put a number k inside the relationship symbol, the requirement that two slots are equal or adjacent is relaxed: the two slots only need to be at most k slots away from the position that the strict relation demands.
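
The effect of the nearness number can be sketched as follows. The functions are a plain-Python reading of the legend entries, with hypothetical names, and nodes are modelled as sets of slot numbers. In particular, the treatment of k in `adjacent_before` is an assumption about how "k-nearly before" is meant.

```python
def same_start(a, b, k=0):       # =: and =k:
    return abs(min(a) - min(b)) <= k

def same_end(a, b, k=0):         # := and :k=
    return abs(max(a) - max(b)) <= k

def adjacent_before(a, b, k=0):  # <: and <k:
    return abs(max(a) + 1 - min(b)) <= k

clause = {1, 2, 3, 4, 5}
first_phrase = {1}
second_phrase = {2, 3}

print(same_start(clause, first_phrase))        # exact match
print(same_start(clause, second_phrase))       # off by one slot
print(same_start(clause, second_phrase, k=1))  # allowed with k=1
```

With k=0 each function reduces to the strict relation, which matches the observation that =0: gives the same results as =:.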

Here is an example.

First we look for clauses, with a phrase in it that starts at the same slot as the clause.

In [42]:

S.study('''
chapter book=Genesis chapter=1
    clause
      =: phrase
''', silent=True)
S.count(progress=100, limit=-1)

  0.00s Counting results per 100 up to  the end of the results ...
   |     0.00s 100
  0.00s Done: 126 results

Now we add a bit of freedom, but not much: 0. Indeed, this is no extra freedom, and it should give the same number of results.

In [43]:

S.study('''
chapter book=Genesis chapter=1
    clause
      =0: phrase
''', silent=True)
S.count(progress=100, limit=-1)

  0.00s Counting results per 100 up to  the end of the results ...
   |     0.00s 100
  0.00s Done: 126 results

Now we add real freedom: 1 and 2

In [44]:

S.study('''
chapter book=Genesis chapter=1
    clause
      =1: phrase
''', silent=True)
S.count(progress=100, limit=-1)

  0.00s Counting results per 100 up to  the end of the results ...
   |     0.00s 100
   |     0.00s 200
  0.01s Done: 236 results

In [45]:

S.study('''
chapter book=Genesis chapter=1
    clause
      =2: phrase
''', silent=True)
S.count(progress=100, limit=-1)

  0.00s Counting results per 100 up to  the end of the results ...
   |     0.00s 100
   |     0.01s 200
   |     0.02s 300
  0.02s Done: 315 results

Let us see some cases:

In [46]:

for r in S.fetch(limit=10): print(S.glean(r))

 clause[בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת ...] phrase[בָּרָ֣א ]
 clause[בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת ...] phrase[בְּרֵאשִׁ֖ית ]
 clause[וְהָאָ֗רֶץ הָיְתָ֥ה תֹ֨הוּ֙ ...] phrase[וְ]
 clause[וְהָאָ֗רֶץ הָיְתָ֥ה תֹ֨הוּ֙ ...] phrase[הָאָ֗רֶץ ]
 clause[וְחֹ֖שֶׁךְ עַל־פְּנֵ֣י תְהֹ֑ום ] phrase[חֹ֖שֶׁךְ ]
 clause[וְחֹ֖שֶׁךְ עַל־פְּנֵ֣י תְהֹ֑ום ] phrase[עַל־פְּנֵ֣י תְהֹ֑ום ]
 clause[וְחֹ֖שֶׁךְ עַל־פְּנֵ֣י תְהֹ֑ום ] phrase[וְ]
 clause[וְר֣וּחַ אֱלֹהִ֔ים מְרַחֶ֖פֶת עַל־...] phrase[וְ]
 clause[וְר֣וּחַ אֱלֹהִ֔ים מְרַחֶ֖פֶת עַל־...] phrase[ר֣וּחַ אֱלֹהִ֔ים ]
 clause[וַיֹּ֥אמֶר אֱלֹהִ֖ים ] phrase[אֱלֹהִ֖ים ]

The first and second result show the same clause, once with its second phrase and once with its first phrase.

Note that we look for phrases that lie embedded in their clause. So we do not get phrases of a preceding clause.

But if we want, we can get those as well.

In [47]:

S.study('''
chapter book=Genesis chapter=1
    c:clause
    p:phrase
    
    c =2: p
''', silent=True)
S.count(progress=100, limit=-1)

  0.00s Counting results per 100 up to  the end of the results ...
   |     0.00s 100
   |     0.00s 200
   |     0.00s 300
   |     0.01s 400
  0.01s Done: 485 results

We have more results now. Here is a closer look:

In [48]:

for r in S.search('''
verse book=Genesis chapter=1 verse=3
    c:clause
    p:phrase
    
    c =2: p
''', limit=100): print(S.glean(r))

S.count(progress=100, limit=-1)    

Genesis 1:3 clause[וַיֹּ֥אמֶר אֱלֹהִ֖ים ] phrase[אֱלֹהִ֖ים ]
Genesis 1:3 clause[וַיֹּ֥אמֶר אֱלֹהִ֖ים ] phrase[וַ]
Genesis 1:3 clause[וַיֹּ֥אמֶר אֱלֹהִ֖ים ] phrase[יֹּ֥אמֶר ]
Genesis 1:3 clause[יְהִ֣י אֹ֑ור ] phrase[אֱלֹהִ֖ים ]
Genesis 1:3 clause[יְהִ֣י אֹ֑ור ] phrase[יְהִ֣י ]
Genesis 1:3 clause[יְהִ֣י אֹ֑ור ] phrase[אֹ֑ור ]
Genesis 1:3 clause[יְהִ֣י אֹ֑ור ] phrase[וַֽ]
Genesis 1:3 clause[יְהִ֣י אֹ֑ור ] phrase[יֹּ֥אמֶר ]
Genesis 1:3 clause[וַֽיְהִי־אֹֽור׃ ] phrase[יְהִ֣י ]
Genesis 1:3 clause[וַֽיְהִי־אֹֽור׃ ] phrase[אֹ֑ור ]
Genesis 1:3 clause[וַֽיְהִי־אֹֽור׃ ] phrase[וַֽ]
Genesis 1:3 clause[וַֽיְהִי־אֹֽור׃ ] phrase[יְהִי־]
Genesis 1:3 clause[וַֽיְהִי־אֹֽור׃ ] phrase[אֹֽור׃ ]
  0.00s Counting results per 100 up to  the end of the results ...
  0.00s Done: 13 results

In result 4 you see a phrase that belongs to the previous clause.

Gaps¶

A question raised by Cody Kingham: gaps!

Search has no direct primitives to deal with gaps. For example, the MQL query

SELECT ALL OBJECTS WHERE

[phrase FOCUS
    [word lex='L']
    [gap]
]

looks for a phrase with a gap in it (i.e. one or more consecutive words between the start and the end of the phrase that do not belong to the phrase). The query then asks additionally for those gap-containing phrases that have a certain word in front of the gap.
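
The definition of a gap given here can be made concrete with a small helper that lists the gaps of an object, given its slots. The name `find_gaps` and the slot numbers are hypothetical.

```python
def find_gaps(slots):
    """Return the gaps of an object as (first, last) runs of missing slots."""
    gaps = []
    ordered = sorted(slots)
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt - prev > 1:
            gaps.append((prev + 1, nxt - 1))
    return gaps

print(find_gaps({7, 8, 11, 12, 15}))  # → [(9, 10), (13, 14)]
print(find_gaps({3, 4, 5}))           # → []
```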

Yet we can mimic this query in Search.

Find the gap¶

In [49]:

query = '''
p:phrase
  =: wFirst:word
  wLast:word
  :=

wGap:word
wFirst < wGap
wGap < wLast
wGap || p
'''

In [50]:

S.study(query)

  0.00s Checking search template ...
  0.00s Setting up search space for 4 objects ...
  0.38s Constraining search space with 7 relations ...
  0.41s Setting up retrieval plan ...
  0.47s Ready to deliver results from 1532939 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results

In [51]:

S.showPlan(details=True)

Search with 4 objects and 7 relations
Results are instantiations of the following objects:
node  0-phrase                            (253187   choices)
node  1-word                              (426584   choices)
node  2-word                              (426584   choices)
node  3-word                              (426584   choices)
Instantiations are computed along the following relations:
node                      0-phrase        (253187   choices)
edge  0-phrase        :=  2-word          (     1.0 choices)
edge  2-word          ]]  0-phrase        (     1.0 choices)
edge  0-phrase        =:  1-word          (     1.0 choices)
edge  1-word          ]]  0-phrase        (     1.0 choices)
edge  2-word          >   3-word          (213292.0 choices)
edge  1-word          <   3-word          (213292.0 choices)
edge  3-word          ||  0-phrase        (227868.3 choices)
  2.65s The results are connected to the original search template as follows:
 0     
 1 R0  p:phrase
 2 R1    =: wFirst:word
 3 R2    wLast:word
 4       :=
 5     
 6 R3  wGap:word
 7     wFirst < wGap
 8     wGap < wLast
 9     wGap || p
10

In [52]:

S.count(progress=2, limit=20)

  0.00s Counting results per 2 up to 20 ...
   |     8.26s 2
   |     8.26s 4
   |     8.26s 6
   |       15s 8
   |       17s 10
   |       17s 12
   |       46s 14
   |       46s 16
   |       46s 18
   |       46s 20
    46s Done: 20 results

It is not a fast query, to say the least. Let's add an additional constraint, and see whether it goes faster.

In [53]:

query = '''
verse
    p:phrase
      =: wFirst:word
      wBefore:word lex=L
      wLast:word
      :=

wGap:word
wFirst < wGap
wGap < wLast
p || wGap
wBefore <: wGap
'''

In [54]:

S.study(query)

  0.00s Checking search template ...
  0.00s Setting up search space for 6 objects ...
  0.97s Constraining search space with 10 relations ...
  1.01s Setting up retrieval plan ...
  1.08s Ready to deliver results from 1576599 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results

In [55]:

S.showPlan(details=True)

Search with 6 objects and 10 relations
Results are instantiations of the following objects:
node  0-verse                             ( 23213   choices)
node  1-phrase                            (253187   choices)
node  2-word                              (426584   choices)
node  3-word                              ( 20447   choices)
node  4-word                              (426584   choices)
node  5-word                              (426584   choices)
Instantiations are computed along the following relations:
node                      3-word          ( 20447   choices)
edge  3-word          <:  5-word          (     1.0 choices)
edge  3-word          ]]  1-phrase        (     1.0 choices)
edge  5-word          ||  1-phrase        (227868.3 choices)
edge  1-phrase        ]]  0-verse         (     1.0 choices)
edge  1-phrase        :=  4-word          (     1.0 choices)
edge  4-word          ]]  1-phrase        (     1.0 choices)
edge  5-word          <   4-word          (213292.0 choices)
edge  1-phrase        =:  2-word          (     1.0 choices)
edge  2-word          ]]  1-phrase        (     1.0 choices)
edge  2-word          <   5-word          (213292.0 choices)
  3.92s The results are connected to the original search template as follows:
 0     
 1 R0  verse
 2 R1      p:phrase
 3 R2        =: wFirst:word
 4 R3        wBefore:word lex=L
 5 R4        wLast:word
 6           :=
 7     
 8 R5  wGap:word
 9     wFirst < wGap
10     wGap < wLast
11     p || wGap
12     wBefore <: wGap
13
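
A plausible reading of the plan printed above is that the search starts from the object with the fewest candidates and works outwards from there. The snippet below only illustrates that idea with the numbers from this plan; it is not Text-Fabric's planner.

```python
# Candidate counts per template object, copied from the plan above.
choices = {
    'verse': 23213,
    'p': 253187,
    'wFirst': 426584,
    'wBefore': 20447,  # restricted by lex=L
    'wLast': 426584,
    'wGap': 426584,
}

# Start where the fewest candidates are.
start = min(choices, key=choices.get)
print(start, choices[start])
```

This agrees with the first line of the plan, which begins at node 3-word with 20447 choices, whereas the plan of the previous query had to begin at the phrase with 253187 choices.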

In [56]:

S.count(progress=10)

  0.00s Counting results per 10 up to 1000 ...
   |     0.17s 10
  0.23s Done: 13 results

That is much quicker. Let's see the results.

In [57]:

for r in S.fetch(): print(S.glean(r))

Leviticus 25:6 phrase[לָכֶם֙ לְךָ֖ וּלְעַבְדְּךָ֣ ...] לָכֶם֙  לָכֶם֙  תֹושָׁ֣בְךָ֔  לְ
Genesis 17:7 phrase[לְךָ֙ וּֽלְזַרְעֲךָ֖ אַחֲרֶֽיךָ׃ ] לְךָ֙  לְךָ֙  אַחֲרֶֽיךָ׃  לֵֽ
Deuteronomy 26:11 phrase[לְךָ֛ וּלְבֵיתֶ֑ךָ ] לְךָ֛  לְךָ֛  בֵיתֶ֑ךָ  יְהוָ֥ה 
Exodus 30:21 phrase[לָהֶ֧ם לֹ֥ו וּלְזַרְעֹ֖ו ] לָהֶ֧ם  לָהֶ֧ם  זַרְעֹ֖ו  חָק־
Genesis 28:4 phrase[לְךָ֙ לְךָ֖ וּלְזַרְעֲךָ֣ ...] לְךָ֙  לְךָ֙  אִתָּ֑ךְ  אֶת־
2_Kings 25:24 phrase[לָהֶ֤ם וּלְאַנְשֵׁיהֶ֔ם ] לָהֶ֤ם  לָהֶ֤ם  אַנְשֵׁיהֶ֔ם  גְּדַלְיָ֨הוּ֙ 
Daniel 9:8 phrase[לָ֚נוּ לִמְלָכֵ֥ינוּ לְשָׂרֵ֖ינוּ ...] לָ֚נוּ  לָ֚נוּ  אֲבֹתֵ֑ינוּ  בֹּ֣שֶׁת 
Genesis 31:16 phrase[לָ֥נוּ וּלְבָנֵ֑ינוּ ] לָ֥נוּ  לָ֥נוּ  בָנֵ֑ינוּ  ה֖וּא 
Numbers 20:15 phrase[לָ֛נוּ וְלַאֲבֹתֵֽינוּ׃ ] לָ֛נוּ  לָ֛נוּ  אֲבֹתֵֽינוּ׃  מִצְרַ֖יִם 
Numbers 32:33 phrase[לָהֶ֣ם׀ לִבְנֵי־גָד֩ וְ...] לָהֶ֣ם׀  לָהֶ֣ם׀  יֹוסֵ֗ף  מֹשֶׁ֡ה 
1_Samuel 25:31 phrase[לְךָ֡ לַאדֹנִ֗י ] לְךָ֡  לְךָ֡  אדֹנִ֗י  לְ
Jeremiah 40:9 phrase[לָהֶ֜ם וּלְאַנְשֵׁיהֶ֣ם ] לָהֶ֜ם  לָהֶ֜ם  אַנְשֵׁיהֶ֣ם  גְּדַלְיָ֨הוּ 
Deuteronomy 1:36 phrase[לֹֽו־וּלְבָנָ֑יו ] לֹֽו־ לֹֽו־ בָנָ֑יו  אֶתֵּ֧ן

Go to SHEBANQ to inspect Genesis 17:7 and click the verse number to view the verse in data view. The last phrase looks like this

The numbers 7431 etc. are slot (word) numbers, the numbers 2 and 3 are phrase numbers relative to the surrounding clause, and numbers such as 4354 are phrase atom numbers relative to the surrounding book.

The red bars highlight the spots where phrases get interrupted by other material. Here we see that phrase 2 gets interrupted after word 7432 by phrase 3.

Note that in SHEBANQ you are looking at versions 4 and 4b, while we ran this search against version 2017. But here the versions agree.
