You might want to consider the start of this tutorial.
You can pass custom sets to the search function, as we have seen in the advanced tutorial. Now we want to give a real-world example of that, and also show how you can prepare sets for use in the TF browser.
The following task comes from the department of education:
Find the chapters with fewer than 20 rare words, where a rare word has a frequency (as lexeme) of less than 70.
A question posed by Oliver Glanz.
%load_ext autoreload
%autoreload 2
import os
from tf.app import use
from tf.lib import writeSets, readSets
A = use("ETCBC/bhsa", hoist=globals())
This is Text-Fabric 9.2.3 Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html 122 features found and 0 ignored
FREQ = 70
AMOUNT = 20
A straightforward query is:
query = f"""
chapter
/without/
  word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
  < word freq_lex<{FREQ}
/-/
"""
Several problems with this query: it relies on a /without/ quantifier, so the template inside it has to express what should be left out, and that template denotes all possible combinations of 20 infrequent words, an astronomical number. So it is better not to search with this one.
# A.indent(reset=True)
# A.info('start query')
# results = S.search(query, limit=1)
# A.info('end query')
# len(results)
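To get a feel for "astronomical": because the template orders its 20 words with <, each candidate match corresponds to one way of choosing 20 rare words from a chapter. The sketch below counts those choices. The rare-word counts per chapter are made-up figures, not measured from the BHSA, and the helper name candidate_tuples is hypothetical.

```python
from math import comb

AMOUNT = 20  # number of word atoms in the /without/ template


def candidate_tuples(n_rare, amount=AMOUNT):
    """Number of increasing tuples of `amount` words out of `n_rare` rare words."""
    return comb(n_rare, amount)


# Hypothetical rare-word counts per chapter, for illustration only
for n_rare in (19, 20, 30, 100):
    print(f"{n_rare:>3} rare words: {candidate_tuples(n_rare):,} candidate tuples")
```

A chapter with 19 rare words yields no candidates at all, while the count for 100 rare words already exceeds 10^20.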
On the other hand, with a bit of hand coding it is very easy, and almost instantaneous:
results = []
allChapters = F.otype.s("chapter")
for chapter in allChapters:
    if (
        len([word for word in L.d(chapter, otype="word") if F.freq_lex.v(word) < FREQ])
        < AMOUNT
    ):
        results.append(chapter)
print(f"{len(results)} chapters out of {len(allChapters)}")
60 chapters out of 929
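The same filter can be tried without loading a corpus. The sketch below uses plain dictionaries as stand-ins for L.d and F.freq_lex.v; the chapter names, word ids, and frequencies are invented for illustration, and AMOUNT is lowered to 3 so that the toy data shows both outcomes.

```python
FREQ = 70    # a word is rare if its lexeme frequency is below this
AMOUNT = 3   # small threshold so the toy data shows both outcomes

# Invented toy data: chapter -> word ids, word id -> lexeme frequency
words_in_chapter = {"A 1": [1, 2, 3, 4], "A 2": [5, 6, 7, 8], "B 1": [9, 10]}
freq_lex = {1: 500, 2: 12, 3: 90, 4: 69, 5: 3, 6: 8, 7: 15, 8: 700, 9: 70, 10: 71}


def rare_count(chapter):
    """Count the rare word occurrences in a chapter."""
    return sum(1 for w in words_in_chapter[chapter] if freq_lex[w] < FREQ)


# A chapter is ordinary if it has fewer than AMOUNT rare words
ordinary = [ch for ch in words_in_chapter if rare_count(ch) < AMOUNT]
print(ordinary)
```

Counting with sum over a generator avoids building a throwaway list per chapter; the result is the same as with len of a list comprehension.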
resultsByBook = dict()
for chapter in results:
    (bk, ch) = T.sectionFromNode(chapter)
    resultsByBook.setdefault(bk, []).append(ch)
for (bk, chps) in resultsByBook.items():
    print("{} {}".format(bk, ", ".join(str(c) for c in chps)))
Exodus 11, 24
Leviticus 17
Deuteronomy 30
Joshua 23
Isaiah 12, 39
Jeremiah 45
Ezekiel 15
Hosea 3
Joel 3
Psalms 1, 3, 4, 13, 14, 15, 20, 23, 24, 26, 43, 47, 53, 54, 61, 67, 70, 82, 86, 87, 93, 97, 99, 100, 101, 110, 113, 114, 115, 117, 120, 121, 122, 123, 124, 125, 126, 127, 128, 130, 131, 133, 134, 136, 138, 150
Job 25
Esther 10
2_Chronicles 27
Once you have these chapters, you can put them in a set and use them in queries.
We show how to query results as far as they occur in an "ordinary" chapter.
First we search for a phenomenon in all chapters. The phenomenon is a clause whose subject consists of a single noun in the plural and whose predicate contains a verb in the singular.
sets = dict(ochapter=set(results))
query1 = """
verse
  clause
    phrase function=Pred
      word pdp=verb nu=sg
    phrase function=Subj
      =: word pdp=subs nu=pl
      :=
"""
results1 = A.search(query1)
1.58s 262 results
A.table(results1, start=1, end=5, skipCols="1")
n | p | clause | phrase | word | phrase | word |
---|---|---|---|---|---|---|
1 | Genesis 1:1 | בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃ | בָּרָ֣א | בָּרָ֣א | אֱלֹהִ֑ים | אֱלֹהִ֑ים |
2 | Genesis 1:3 | וַיֹּ֥אמֶר אֱלֹהִ֖ים | יֹּ֥אמֶר | יֹּ֥אמֶר | אֱלֹהִ֖ים | אֱלֹהִ֖ים |
3 | Genesis 1:4 | וַיַּ֧רְא אֱלֹהִ֛ים אֶת־הָאֹ֖ור | יַּ֧רְא | יַּ֧רְא | אֱלֹהִ֛ים | אֱלֹהִ֛ים |
4 | Genesis 1:4 | וַיַּבְדֵּ֣ל אֱלֹהִ֔ים בֵּ֥ין הָאֹ֖ור וּבֵ֥ין הַחֹֽשֶׁךְ׃ | יַּבְדֵּ֣ל | יַּבְדֵּ֣ל | אֱלֹהִ֔ים | אֱלֹהִ֔ים |
5 | Genesis 1:5 | וַיִּקְרָ֨א אֱלֹהִ֤ים׀ לָאֹור֙ יֹ֔ום | יִּקְרָ֨א | יִּקְרָ֨א | אֱלֹהִ֤ים׀ | אֱלֹהִ֤ים׀ |
Now we want to restrict results to ordinary chapters:
query2 = """
ochapter
  verse
    clause
      phrase function=Pred
        word pdp=verb nu=sg
      phrase function=Subj
        =: word pdp=subs nu=pl
        :=
"""
Note that we use the name of a set here: ochapter.
It is not a known node type in the BHSA, so we have to tell the search what it means.
We do that by passing a dictionary of custom sets.
The keys are the names of the sets; the values are the sets themselves, each a set of nodes.
Then we may use those keys in queries, wherever a node type is expected.
results2 = A.search(query2, sets=sets)
1.55s 6 results
A.table(results2)
n | p | chapter | verse | clause | phrase | word | phrase | word |
---|---|---|---|---|---|---|---|---|
1 | Psalms 47:6 | Psalms 47 | עָלָ֣ה אֱ֭לֹהִים בִּתְרוּעָ֑ה | עָלָ֣ה | עָלָ֣ה | אֱ֭לֹהִים | אֱ֭לֹהִים | |
2 | Psalms 47:9 | Psalms 47 | מָלַ֣ךְ אֱ֭לֹהִים עַל־גֹּויִ֑ם | מָלַ֣ךְ | מָלַ֣ךְ | אֱ֭לֹהִים | אֱ֭לֹהִים | |
3 | Psalms 47:9 | Psalms 47 | אֱ֝לֹהִ֗ים יָשַׁ֤ב׀ עַל־כִּסֵּ֬א קָדְשֹֽׁו׃ | יָשַׁ֤ב׀ | יָשַׁ֤ב׀ | אֱ֝לֹהִ֗ים | אֱ֝לֹהִ֗ים | |
4 | Psalms 53:3 | Psalms 53 | אֱֽלֹהִ֗ים מִשָּׁמַיִם֮ הִשְׁקִ֢יף עַֽל־בְּנֵ֫י אָדָ֥ם | הִשְׁקִ֢יף | הִשְׁקִ֢יף | אֱֽלֹהִ֗ים | אֱֽלֹהִ֗ים | |
5 | Psalms 53:6 | Psalms 53 | כִּֽי־אֱלֹהִ֗ים פִּ֭זַּר עַצְמֹ֣ות חֹנָ֑ךְ | פִּ֭זַּר | פִּ֭זַּר | אֱלֹהִ֗ים | אֱלֹהִ֗ים | |
6 | Psalms 70:5 | Psalms 70 | יִגְדַּ֣ל אֱלֹהִ֑ים | יִגְדַּ֣ל | יִגְדַּ֣ל | אֱלֹהִ֑ים | אֱלֹהִ֑ים |
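Conceptually, naming a custom set where a node type is expected restricts that position to the members of the set. The sketch below imitates this with plain Python on invented node numbers. It illustrates the idea only and says nothing about how the search engine works internally; the helper name restrict is hypothetical.

```python
# Invented node numbers, for illustration only
chapter_of_verse = {101: 1, 102: 1, 103: 2, 104: 3}  # verse -> chapter
all_results = [(101,), (103,), (104,), (102,)]       # result tuples starting with a verse
sets = {"ochapter": {1, 3}}                          # custom set of chapter nodes


def restrict(results, set_name):
    """Keep results whose verse lies in a chapter from the named set, and
    prepend that chapter, as a query starting with the set name would."""
    members = sets[set_name]
    return [
        (chapter_of_verse[r[0]],) + r
        for r in results
        if chapter_of_verse[r[0]] in members
    ]


print(restrict(all_results, "ochapter"))
```

The result for verse 103 disappears because its chapter is not in the set, which mirrors the drop in the number of results once the query is restricted to ordinary chapters.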
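We save the sets in a file. But before we do so, we also want to save all ordinary verses in a set, and all ordinary words. Here an ordinary verse is a verse without any rare word, and an ordinary word is a word whose lexeme frequency is at least FREQ.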
queryV = f"""
verse
/without/
  word freq_lex<{FREQ}
/-/
"""
resultsV = A.search(queryV, shallow=True)
sets["overse"] = resultsV
0.52s 2751 results
sets["oword"] = {w for w in F.otype.s("word") if F.freq_lex.v(w) >= FREQ}
SETS_FILE = os.path.expanduser("~/Downloads/ordinary.set")
writeSets(sets, SETS_FILE)
True
As a test, we read back the sets from disk and compare the number of elements with those in the original sets, which we still have in memory.
testSets = readSets(SETS_FILE)
for s in sorted(testSets):
    elems = len(testSets[s])
    oelems = len(sets[s])
    print(f"{s} with {elems} nb {elems - oelems}")
ochapter with 60 nb 0
overse with 2751 nb 0
oword with 361411 nb 0
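Comparing sizes catches a truncated file, but not two sets of equal size with different members. A slightly stricter check compares the members themselves. The helper name diffSets and the toy data are hypothetical; with the real data you would pass sets and testSets instead.

```python
def diffSets(original, restored):
    """Report, per set name, members missing from or added to the restored copy."""
    report = {}
    for name in sorted(set(original) | set(restored)):
        before = set(original.get(name, ()))
        after = set(restored.get(name, ()))
        if before != after:
            report[name] = dict(missing=before - after, extra=after - before)
    return report


# Toy data standing in for the real sets
original = dict(ochapter={1, 2, 3}, overse={10, 11})
restored = dict(ochapter={1, 2, 4}, overse={10, 11})

print(diffSets(original, restored))
```

An empty report means that both dictionaries hold the same set names with the same members.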
Now you can start your TF browser as follows:
text-fabric bhsa --sets=~/Downloads/ordinary.set
and then you can run the same queries over there!
Let's investigate the number of ordinary chapters with shifting definitions of ordinary:
allChapters = F.otype.s("chapter")
longestChapter = max(len(L.d(chapter, otype="word")) for chapter in allChapters)
print(f"There are {len(allChapters)} chapters, the longest is {longestChapter} words")
There are 929 chapters, the longest is 1603 words
def getOrdinary(freq, amount):
    results = []
    for chapter in allChapters:
        if (
            len(
                [
                    word
                    for word in L.d(chapter, otype="word")
                    if F.freq_lex.v(word) < freq
                ]
            )
            < amount
        ):
            results.append(chapter)
    return results


def overview(freq):
    for amount in range(20, 1700, 50):
        results = getOrdinary(freq, amount)
        print(
            f"for freq={freq:>3} and amount={amount:>4}: {len(results):>4} ordinary chapters"
        )
        if len(results) >= len(allChapters):
            break


for freq in (40, 70, 100):
    overview(freq)
for freq= 40 and amount=  20:  139 ordinary chapters
for freq= 40 and amount=  70:  757 ordinary chapters
for freq= 40 and amount= 120:  885 ordinary chapters
for freq= 40 and amount= 170:  908 ordinary chapters
for freq= 40 and amount= 220:  919 ordinary chapters
for freq= 40 and amount= 270:  923 ordinary chapters
for freq= 40 and amount= 320:  924 ordinary chapters
for freq= 40 and amount= 370:  925 ordinary chapters
for freq= 40 and amount= 420:  926 ordinary chapters
for freq= 40 and amount= 470:  928 ordinary chapters
for freq= 40 and amount= 520:  929 ordinary chapters
for freq= 70 and amount=  20:   60 ordinary chapters
for freq= 70 and amount=  70:  550 ordinary chapters
for freq= 70 and amount= 120:  842 ordinary chapters
for freq= 70 and amount= 170:  889 ordinary chapters
for freq= 70 and amount= 220:  915 ordinary chapters
for freq= 70 and amount= 270:  922 ordinary chapters
for freq= 70 and amount= 320:  923 ordinary chapters
for freq= 70 and amount= 370:  923 ordinary chapters
for freq= 70 and amount= 420:  926 ordinary chapters
for freq= 70 and amount= 470:  927 ordinary chapters
for freq= 70 and amount= 520:  928 ordinary chapters
for freq= 70 and amount= 570:  928 ordinary chapters
for freq= 70 and amount= 620:  929 ordinary chapters
for freq=100 and amount=  20:   38 ordinary chapters
for freq=100 and amount=  70:  432 ordinary chapters
for freq=100 and amount= 120:  782 ordinary chapters
for freq=100 and amount= 170:  874 ordinary chapters
for freq=100 and amount= 220:  905 ordinary chapters
for freq=100 and amount= 270:  918 ordinary chapters
for freq=100 and amount= 320:  921 ordinary chapters
for freq=100 and amount= 370:  923 ordinary chapters
for freq=100 and amount= 420:  923 ordinary chapters
for freq=100 and amount= 470:  926 ordinary chapters
for freq=100 and amount= 520:  927 ordinary chapters
for freq=100 and amount= 570:  928 ordinary chapters
for freq=100 and amount= 620:  928 ordinary chapters
for freq=100 and amount= 670:  929 ordinary chapters
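The sweep recounts the rare words of every chapter for every value of amount. Since the count per chapter depends only on freq, it could be computed once and sorted, after which each amount is a single binary search. This is a sketch on invented counts: rareCounts stands in for the per-chapter counts you would compute with L.d and F.freq_lex.v, and the helper name ordinaryPerAmount is hypothetical.

```python
from bisect import bisect_left


def ordinaryPerAmount(rareCounts, amounts):
    """For each amount, count the chapters with fewer than `amount` rare words."""
    ordered = sorted(rareCounts)
    # bisect_left gives the number of entries strictly below `amount`
    return {amount: bisect_left(ordered, amount) for amount in amounts}


# Invented rare-word counts for six chapters
rareCounts = [5, 19, 20, 64, 70, 130]

for amount, n in ordinaryPerAmount(rareCounts, range(20, 170, 50)).items():
    print(f"amount={amount:>4}: {n:>4} ordinary chapters")
```

The strict comparison is preserved: a chapter with exactly 20 rare words is not counted for amount 20.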
You have seen how to mingle sets with queries.
Time to enter the race for space:
relations, quantifiers, from MQL, rough, gaps
CC-BY Dirk Roorda