Notebook

Advanced search options (Nestle1904LFT)¶

NOTE: This notebook requires significant cleaning and rework.

Table of contents ¶

1 - Introduction
2 - Load Text-Fabric app and data
3 - Performing the queries

1 - Introduction ¶

2 - Load Text-Fabric app and data ¶

Back to TOC ¶

In [1]:

%load_ext autoreload
%autoreload 2

In [2]:

# Loading the Text-Fabric code
# Note: it is assumed Text-Fabric is installed in your environment
from tf.fabric import Fabric
from tf.app import use

In [7]:

# load the N1904 app and data
N1904 = use("tonyjurg/Nestle1904LFT", version="0.6", hoist=globals())

Locating corpus resources ...

The requested app is not available offline
	~/text-fabric-data/github/tonyjurg/Nestle1904LFT/app not found

Status: latest release online v0.6 versus None locally

downloading app, main data and requested additions ...

app: ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/app

The requested data is not available offline
	~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 not found

Status: latest release online v0.6 versus None locally

downloading app, main data and requested additions ...

data: ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6

   |     0.19s T otype                from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     2.39s T oslots               from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.58s T normalized           from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.49s T after                from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.58s T wordtranslit         from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.61s T wordunacc            from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.46s T verse                from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.60s T word                 from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.61s T unicode              from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.47s T chapter              from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.55s T book                 from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |      |     0.06s C __levels__           from otype, oslots, otext
   |      |     1.90s C __order__            from otype, oslots, __levels__
   |      |     0.07s C __rank__             from otype, __order__
   |      |     3.27s C __levUp__            from otype, oslots, __rank__
   |      |     1.88s C __levDown__          from otype, __levUp__, __rank__
   |      |     0.21s C __characters__       from otext
   |      |     0.96s C __boundary__         from otype, oslots, __rank__
   |      |     0.04s C __sections__         from otype, oslots, otext, __levUp__, __levels__, book, chapter, verse
   |      |     0.22s C __structure__        from otype, oslots, otext, __rank__, __levUp__, book, chapter, verse
   |     0.44s T booknumber           from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.51s T bookshort            from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.47s T case                 from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.32s T clausetype           from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.54s T containedclause      from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.41s T degree               from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.57s T gloss                from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.47s T gn                   from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.03s T headverse            from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.32s T junction             from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.57s T lemma                from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.55s T lex_dom              from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.55s T ln                   from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.42s T markafter            from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.43s T markbefore           from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.41s T markorder            from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.44s T monad                from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.43s T mood                 from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.52s T morph                from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.52s T nodeID               from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.50s T nu                   from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.51s T number               from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.44s T person               from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.43s T punctuation          from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.65s T ref                  from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.64s T reference            from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.48s T roleclausedistance   from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.46s T sentence             from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.50s T sp                   from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.50s T sp_full              from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.53s T strongs              from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.44s T subj_ref             from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.43s T tense                from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.45s T type                 from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.44s T voice                from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.40s T wgclass              from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.33s T wglevel              from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.35s T wgnum                from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.35s T wgrole               from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.35s T wgrolelong           from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.39s T wgrule               from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.33s T wgtype               from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.49s T wordlevel            from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.52s T wordrole             from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
   |     0.51s T wordrolelong         from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6

TF: TF API 12.1.5, tonyjurg/Nestle1904LFT/app v3, Search Reference
Data: tonyjurg - Nestle1904LFT 0.6, Character table, Feature docs

Node types

Name	# of nodes	# slots / node	% coverage
book	27	5102.93	100
chapter	260	529.92	100
verse	7943	17.35	100
sentence	8011	17.20	100
wg	105430	6.85	524
word	137779	1.00	100

Sets: no custom sets
Features:

Nestle 1904 (Low Fat Tree)

after

str

✅ Characters (eg. punctuations) following the word

book

str

✅ Book name (in English language)

booknumber

int

✅ NT book number (Matthew=1, Mark=2, ..., Revelation=27)

bookshort

str

✅ Book name (abbreviated)

case

str

✅ Gramatical case (Nominative, Genitive, Dative, Accusative, Vocative)

chapter

int

✅ Chapter number inside book

clausetype

str

✅ Clause type details (e.g. Verbless, Minor)

containedclause

str

🆗 Contained clause (WG number)

degree

str

✅ Degree (e.g. Comparitative, Superlative)

gloss

str

✅ English gloss

gn

str

✅ Gramatical gender (Masculine, Feminine, Neuter)

headverse

str

✅ Start verse number of a sentence

junction

str

✅ Junction data related to a wordgroup

lemma

str

✅ Lexeme (lemma)

lex_dom

str

✅ Lexical domain according to Semantic Dictionary of Biblical Greek, SDBG (not present everywhere?)

ln

str

✅ Lauw-Nida lexical classification (not present everywhere?)

markafter

str

🆗 Text critical marker after word

markbefore

str

🆗 Text critical marker before word

markorder

str

Order of punctuation and text critical marker

monad

int

✅ Monad (smallest token matching word order in the corpus)

mood

str

✅ Gramatical mood of the verb (passive, etc)

morph

str

✅ Morphological tag (Sandborg-Petersen morphology)

nodeID

str

✅ Node ID (as in the XML source data)

normalized

str

✅ Surface word with accents normalized and trailing punctuations removed

nu

str

✅ Gramatical number (Singular, Plural)

number

str

✅ Gramatical number of the verb (e.g. singular, plural)

otype

str

person

str

✅ Gramatical person of the verb (first, second, third)

punctuation

str

✅ Punctuation after word

ref

str

✅ Value of the ref ID (taken from XML sourcedata)

reference

str

✅ Reference (to nodeID in XML source data, not yet post-processes)

roleclausedistance

str

⚠️ Distance to the wordgroup defining the syntactical role of this word

sentence

int

✅ Sentence number (counted per chapter)

sp

str

✅ Part of Speech (abbreviated)

sp_full

str

✅ Part of Speech (long description)

strongs

str

✅ Strongs number

subj_ref

str

🆗 Subject reference (to nodeID in XML source data, not yet post-processes)

tense

str

✅ Gramatical tense of the verb (e.g. Present, Aorist)

type

str

✅ Gramatical type of noun or pronoun (e.g. Common, Personal)

unicode

str

✅ Word as it apears in the text in Unicode (incl. punctuations)

verse

int

✅ Verse number inside chapter

voice

str

✅ Gramatical voice of the verb (e.g. active,passive)

wgclass

str

✅ Class of the wordgroup (e.g. cl, np, vp)

wglevel

int

🆗 Number of the parent wordgroups for a wordgroup

wgnum

int

✅ Wordgroup number (counted per book)

wgrole

str

✅ Syntactical role of the wordgroup (abbreviated)

wgrolelong

str

✅ Syntactical role of the wordgroup (full)

wgrule

str

✅ Wordgroup rule information (e.g. Np-Appos, ClCl2, PrepNp)

wgtype

str

✅ Wordgroup type details (e.g. group, apposition)

word

str

✅ Word as it appears in the text (excl. punctuations)

wordlevel

str

🆗 Number of the parent wordgroups for a word

wordrole

str

✅ Syntactical role of the word (abbreviated)

wordrolelong

str

✅ Syntactical role of the word (full)

wordtranslit

str

🆗 Transliteration of the text (in latin letters, excl. punctuations)

wordunacc

str

✅ Word without accents (excl. punctuations)

oslots

none

Settings:

specified

apiVersion: 3
appName: tonyjurg/Nestle1904LFT
appPath:
C:/Users/tonyj/text-fabric-data/github/tonyjurg/Nestle1904LFT/app
commit: no value
css: ''
dataDisplay:
- excludedFeatures:
  - orig_order
  - verse
  - book
  - chapter
- noneValues:
  - none
  - unknown
  - no value
  - NA
  - ''
- showVerseInTuple: 0
- textFormat: text-orig-full
docs:
- docBase: https://github.com/tonyjurg/Nestle1904LFT/blob/main/docs/
- docPage: about
- docRoot: https://github.com/tonyjurg/Nestle1904LFT
- featureBase:
  https://github.com/tonyjurg/Nestle1904LFT/blob/main/docs/features/<feature>.md
interfaceDefaults: {fmt: layout-orig-full}
isCompatible: True
local: no value
localDir:
C:/Users/tonyj/text-fabric-data/github/tonyjurg/Nestle1904LFT/_temp
provenanceSpec:
- corpus: Nestle 1904 (Low Fat Tree)
- doi: notyet
- org: tonyjurg
- relative: /tf
- repo: Nestle1904LFT
- repro: Nestle1904LFT
- version: 0.6
- webBase: https://learner.bible/text/show_text/nestle1904/
- webHint: Show this on the Bible Online Learner website
- webLang: en
- webUrl:
  https://learner.bible/text/show_text/nestle1904/<1>/<2>/<3>
- webUrlLex: {webBase}/word?version={version}&id=<lid>
release: no value
typeDisplay:
- book:
  - condense: True
  - hidden: True
  - label: {book}
  - style: ''
- chapter:
  - condense: True
  - hidden: True
  - label: {chapter}
  - style: ''
- sentence:
  - hidden: 0
  - label: #{sentence} (start: {book} {chapter}:{headverse})
  - style: ''
- verse:
  - condense: True
  - excludedFeatures: chapter verse
  - label: {book} {chapter}:{verse}
  - style: ''
- wg:
  - hidden: 0
  - label:
    #{wgnum}: {wgtype} {wgclass} {clausetype} {wgrole} {wgrule} {junction}
  - style: ''
- word:
  - base: True
  - features: lemma
  - featuresBare: gloss
  - surpress: chapter verse
writing: grc

TF API: names N F E L T S C TF Fs Fall Es Eall Cs Call directly usable

In [4]:

# The following will push the Text-Fabric stylesheet to this notebook (to facilitate proper display with notebook viewer)
N1904.dh(N1904.getCss())

In [5]:

# Set the default view so as to limit noise as much as possible.
N1904.displaySetup(condensed=True, multiFeatures=False, queryFeatures=False)

3 - Performing the queries ¶

3.1 - TBD ¶

3.2 - Inspecting your query ¶

Back to TOC ¶

Each query template can be inspected with S.study(). This is particularly helpful when the query is complicated.

In [5]:

ComplicatedQuery = '''
wg
/where/
   wg phrasefunction=S
/have/
  /without/
    word sp#conj
  /-/
/-/
  wg phrasefunction=O
'''
S.study(ComplicatedQuery)

  0.00s Checking search template ...

 0 
 1 clause
 2 /where
 3   phrase phrasefunction=S
 4 /have
 5   /without
 6     word sp#conj
 7   /-
 8 /-
 9   phrase phrasefunction=O
10 
line 1: Unknown object type: "clause"
line 9: Unknown object type: "phrase"
Valid object types are: book, chapter, verse, sentence, wg, word
Missing feature "phrasefunction" in line(s) 9

The messages show that the template was rejected because it refers to names this dataset does not provide: the valid object types are book, chapter, verse, sentence, wg and word, and the feature phrasefunction is not part of the dataset. Since study() did not succeed, the calls to S.showPlan() and S.fetch() that follow fail as well.

In [5]:

S.showPlan()

    15s Cannot show plan if there is no previous "study()"

In [6]:

for result in S.fetch(limit=10):
    TF.info(S.glean(result))

    22s Cannot fetch if there is no previous "study()"

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[6], line 1
----> 1 for result in S.fetch(limit=10):
      2     TF.info(S.glean(result))

TypeError: 'NoneType' object is not iterable
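
The TypeError above is raised because S.fetch() returned None instead of an iterable after the failed study(). The following is a minimal sketch of a defensive pattern. Note the assumptions: fetch_stub is a hypothetical stand-in for S.fetch() so that the example runs without Text-Fabric, and iter_results is an illustrative helper, not part of the Text-Fabric API.

```python
def fetch_stub(study_ok):
    """Hypothetical stand-in for S.fetch(): an iterator of result tuples
    after a successful study(), otherwise None."""
    return iter([(1, 2), (3, 4)]) if study_ok else None


def iter_results(fetched):
    """Illustrative helper: return an empty iterator when the fetch result is None."""
    return fetched if fetched is not None else iter(())


# The loop body runs for every result, and is simply skipped when nothing was fetched.
collected_ok = list(iter_results(fetch_stub(True)))
collected_failed = list(iter_results(fetch_stub(False)))
print(collected_ok, collected_failed)
```

With the real API the same guard could wrap the value returned by S.fetch(limit=10), so that a rejected template leads to an empty loop rather than a traceback.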

3.3 - Comparing two lists with query results ¶

Using Python's standard functions, it appears easy to check whether the results of two queries are the same. However, it is worth taking a closer look, since a few pitfalls can lead to false conclusions.

In [7]:

# define query template 1
SomeQuery ='''
phrase phrasefunction=V
    word lemma=λέγω
'''
# now create two lists with identical query results and compare result lists
SomeResult1=N1904.search(SomeQuery)
SomeResult2=N1904.search(SomeQuery)
print(f'Same result? {SomeResult1 == SomeResult2}')

 0 
 1 phrase phrasefunction=V
 2     word lemma=λέγω
 3 
line 1: Unknown object type: "phrase"
Valid object types are: book, chapter, verse, sentence, wg, word
Or choose a custom set from: 
Missing feature "phrasefunction" in line(s) 1

  0.00s Cannot load feature "phrasefunction": not in dataset
  0.00s 0 results

 0 
 1 phrase phrasefunction=V
 2     word lemma=λέγω
 3 
line 1: Unknown object type: "phrase"
Valid object types are: book, chapter, verse, sentence, wg, word
Or choose a custom set from: 
Missing feature "phrasefunction" in line(s) 1

  0.00s Cannot load feature "phrasefunction": not in dataset
  0.00s 0 results
Same result? True

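This is what we would expect: running the same template twice gives the same result list (here both runs report 0 results, because the template uses an object type and a feature that are not in this dataset). But comparing lists can be tricky. Consider the following two queries.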

In [25]:

# define query template 1
Query1 ='''
phrase
  a:word sp=prep
  b:word sp=adj
  c:word sp=noun 
'''

# define query template 2
Query2 ='''
phrase
  a:word sp=prep
  b:word sp=noun
  c:word sp=adj
'''

# create and compare result lists
ResultQuery1=N1904.search(Query1)
ResultQuery2=N1904.search(Query2)

print(f'Same result? {ResultQuery1 == ResultQuery2}')

  0.34s 3851 results
  0.35s 3851 results
Same result? False

This comparison reports a difference between the two lists. Whether that is a real difference depends on what is understood by 'difference'. Both ResultQuery1 and ResultQuery2 are lists of ordered tuples, and the position of a node within a tuple follows the order of the lines in the template. Swapping the feature conditions 'sp=adj' and 'sp=noun' does not change which words are found; it only changes the position in which they appear in each result tuple.

In [33]:

# create 2 result lists
ResultQuery3=N1904.search(Query1,sort=True)
ResultQuery4=N1904.search(Query1,sort=False)

# compare unsorted lists
print(f'Unsorted lists: Same result ? {ResultQuery3 == ResultQuery4}')

# sort both lists on the first element of each tuple
SortedResultQuery3 = sorted(ResultQuery3, key=lambda x: x[0])
SortedResultQuery4 = sorted(ResultQuery4, key=lambda x: x[0])

# compare sorted lists
print(f'Sorted lists: Same result ? {SortedResultQuery3 == SortedResultQuery4}')

  0.34s 3851 results
  0.33s 3851 results
Unsorted lists: Same result ? False
Sorted lists: Same result ? False

Unexpectedly, Python still reports the two lists as different. But why? The == operator compares two lists element by element and in order, so two lists that hold the same tuples in a different order are not equal. The sort above uses only the first element of each tuple as its key. Python's sort is stable, so tuples that share the same first node keep the relative order they had in the original list, and that order can differ between the two search results.

The search() function in Text-Fabric returns a list of tuples of nodes. With sort=True the results are delivered in canonical order; with sort=False the order is not guaranteed. The two lists, ResultQuery3 and ResultQuery4, therefore presumably hold the same tuples in a different order, which is why comparing them directly with the == operator results in False.

To compare the content of the lists regardless of order, it is advised to first convert them to sets and compare those sets. See the following example:

In [35]:

# Convert the lists of tuples to sets
set1 = set(tuple(item) for item in ResultQuery3)
set2 = set(tuple(item) for item in ResultQuery4)

# Compare the sets
if set1 == set2:
    print("Lists ResultQuery3 and ResultQuery4 are equal.")
else:
    print("Lists ResultQuery3 and ResultQuery4 are not equal.")

Lists ResultQuery3 and ResultQuery4 are equal.
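
The effect of ordering on these comparisons can be reproduced without Text-Fabric. The sketch below uses made-up node numbers (they are hypothetical and not taken from the corpus) to show how a plain list comparison, a sort on the first tuple element only, a sort on the whole tuple, and a set comparison behave.

```python
# Hypothetical result tuples; the node numbers are made up for illustration.
canonical = [(200, 1, 2), (200, 1, 3), (201, 4, 5)]
arbitrary = [(201, 4, 5), (200, 1, 3), (200, 1, 2)]

# == compares list contents element by element, so the order matters.
plain_equal = canonical == arbitrary

# Sorting on the first element only is stable: tuples with the same
# first element keep the relative order they had before sorting.
by_first_equal = (
    sorted(canonical, key=lambda t: t[0]) == sorted(arbitrary, key=lambda t: t[0])
)

# Sorting on the whole tuple, or comparing as sets, removes the order dependence.
full_sort_equal = sorted(canonical) == sorted(arbitrary)
set_equal = set(canonical) == set(arbitrary)

print(plain_equal, by_first_equal, full_sort_equal, set_equal)
# prints: False False True True
```

A set comparison also ignores duplicate tuples, so sorting on the whole tuple is the stricter choice when duplicates could matter.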

Now, let's compare ResultQuery1 and ResultQuery2 again by first converting them to sets.

In [38]:

# Convert the lists of tuples to sets
set1 = set(tuple(item) for item in ResultQuery1)
set2 = set(tuple(item) for item in ResultQuery2)

# Compare the sets
if set1 == set2:
    print("Lists ResultQuery1 and ResultQuery2 are equal.")
else:
    print("Lists ResultQuery1 and ResultQuery2 are not equal.")

Lists ResultQuery1 and ResultQuery2 are not equal.

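This is the result we expected: the two queries find the same words, but place the adjective and the noun at different positions in each tuple, so the tuples themselves differ.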
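
If the position of a node inside a result tuple should not count as a difference, one option is to normalise each tuple before comparing. The sketch below is an assumption-laden illustration: the node numbers are made up, and normalise is a hypothetical helper, not a Text-Fabric function.

```python
# Hypothetical tuples (container, a, b, c); the second list holds the same
# nodes as the first, but with the last two positions swapped.
result_q1 = [(900, 10, 11, 12), (901, 20, 21, 22)]
result_q2 = [(900, 10, 12, 11), (901, 20, 22, 21)]


def normalise(results):
    """Hypothetical helper: turn each tuple into a frozenset so that the
    position of a node inside the tuple is ignored."""
    return {frozenset(t) for t in results}


# As ordered tuples the results differ ...
as_tuples_equal = set(result_q1) == set(result_q2)
# ... but each result contains the same combination of nodes.
as_nodesets_equal = normalise(result_q1) == normalise(result_q2)

print(as_tuples_equal, as_nodesets_equal)
# prints: False True
```

Note that a frozenset also drops repeated nodes within one tuple, which is acceptable here only because every position in the template refers to a different node.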

3.4 - Using search qualifiers ¶

Back to TOC ¶

A search template can also contain quantifiers, such as /with/ ... /or/ ... /-/ in the following example:

In [ ]:

ErgetaiQuery = '''
word word=ἔρχεται
/with/
book=Mark chapter=6 verse=1 
/or/
book=Revelation chapter=1 verse=7 
/-/
'''

# N1904 is the app object loaded in section 2
ErgetaiResult = N1904.search(ErgetaiQuery)
# returns a list of ordered tuples

Maybe discuss: query = ''' node feature1=A|B node feature2=X|Y|Z '''
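
The note above refers to value alternatives: in a search template a feature condition can list several accepted values separated by |. The sketch below only builds such a template string, so it runs without Text-Fabric. The helper alternatives is hypothetical, and the lemma values are illustrative and should be checked against the dataset.

```python
def alternatives(feature, values):
    """Hypothetical helper: build a feature condition such as 'lemma=A|B'."""
    return f"{feature}={'|'.join(values)}"


# Illustrative lemma values; verify their spelling in the dataset before use.
condition = alternatives("lemma", ["λέγω", "ἔρχομαι"])
AlternativesQuery = f"\nword {condition}\n"
print(AlternativesQuery)

# With the app loaded as in section 2, the template could then be run as:
# AlternativesResult = N1904.search(AlternativesQuery)
```

Building the condition from a Python list keeps long lists of alternatives readable and avoids typing the separator by hand.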
