You might want to consider the start of this tutorial.
Text-Fabric offers pretty and plain displays of textual objects.
A plain display of an object is a simple reference to that object if it is big, or the text of that object if it is small.
A pretty display of an object is a representation of the structure of that object. It contains the text and features of its sub-objects, provided the object is not too big.
%load_ext autoreload
%autoreload 2
The ins and outs of installing Text-Fabric, getting the corpus, and initializing a notebook are explained in the start tutorial.
from tf.app import use
A = use("ETCBC/bhsa", hoist=globals())
Locating corpus resources ...
Name | # of nodes | # slots/node | % coverage |
---|---|---|---|
book | 39 | 10938.21 | 100 |
chapter | 929 | 459.19 | 100 |
lex | 9230 | 46.22 | 100 |
verse | 23213 | 18.38 | 100 |
half_verse | 45179 | 9.44 | 100 |
sentence | 63717 | 6.70 | 100 |
sentence_atom | 64514 | 6.61 | 100 |
clause | 88131 | 4.84 | 100 |
clause_atom | 90704 | 4.70 | 100 |
phrase | 253203 | 1.68 | 100 |
phrase_atom | 267532 | 1.59 | 100 |
subphrase | 113850 | 1.42 | 38 |
word | 426590 | 1.00 | 100 |
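The # slots/node column is an average over all nodes of a type. For a type whose nodes together contain every word exactly once, multiplying the node count by that average should give back the number of words (426590, from the word row). The only error allowed is the one caused by rounding the averages. Below is a small check using rows copied from the table. That book, chapter, verse and lex nodes do not overlap is an assumption, and it is stated in the code as well:

```python
# Rows copied from the table above: node type -> (# of nodes, # slots/node).
# Assumption: each of these types covers 100% of the words and every word
# lies in exactly one node of the type, so the nodes partition the corpus.
ROWS = {
    "book": (39, 10938.21),
    "chapter": (929, 459.19),
    "verse": (23213, 18.38),
    "lex": (9230, 46.22),
}
TOTAL_SLOTS = 426590  # number of word nodes; words are the slots


def implied_total(n_nodes: int, slots_per_node: float) -> float:
    """Total number of slots implied by one row: node count times average size."""
    return n_nodes * slots_per_node


def within_rounding(n_nodes: int, slots_per_node: float) -> bool:
    """Averages are rounded to 2 decimals, so each is off by at most 0.005;
    the implied total may therefore be off by at most 0.005 * n_nodes."""
    return abs(implied_total(n_nodes, slots_per_node) - TOTAL_SLOTS) <= 0.005 * n_nodes


for node_type, (n, avg) in ROWS.items():
    print(f"{node_type:8} {implied_total(n, avg):12.2f} {within_rounding(n, avg)}")
```

The bound 0.005 × n is exact for two-decimal rounding. It is therefore a stricter test than an arbitrary percentage tolerance.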
v1 = A.nodeFromSectionStr("Genesis 1:1")
v1
1414389
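nodeFromSectionStr turns a reference of the form book chapter:verse into a node number, here 1414389. The sketch below makes that string format concrete with a hypothetical parser, parse_section, which is not part of Text-Fabric. It splits a reference into its three section levels. It assumes the book name is whatever precedes the trailing numbers. How Text-Fabric itself handles partial or malformed references is not covered here.

```python
import re

# Hypothetical parser for references like "Genesis 1:1"; Text-Fabric does its
# own parsing inside nodeFromSectionStr, which this does not reproduce.
SECTION_RE = re.compile(
    r"^(?P<book>.+?)(?:\s+(?P<chapter>\d+)(?::(?P<verse>\d+))?)?\s*$"
)


def parse_section(ref: str):
    """Split "book chapter:verse" into (book, chapter, verse).

    Chapter and verse are optional and come back as None when absent.
    The book is everything before the trailing chapter/verse numbers.
    """
    match = SECTION_RE.match(ref.strip())
    if match is None:
        raise ValueError(f"not a section reference: {ref!r}")
    chapter = match["chapter"]
    verse = match["verse"]
    return (
        match["book"],
        int(chapter) if chapter is not None else None,
        int(verse) if verse is not None else None,
    )


print(parse_section("Genesis 1:1"))
print(parse_section("Ezra 3:4"))
```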
A.pretty(v1)
With standard features displayed:
A.pretty(v1, standardFeatures=True)
Now a phrase. We display it first with little and then with much information.
phrase = 651605
A.pretty(phrase, withNodes=False, prettyTypes=False)
A.pretty(phrase, withNodes=True, standardFeatures=True, hideTypes=False)
If we want to see the subphrases but not the phrase atoms:
A.pretty(phrase, withNodes=True, standardFeatures=True, hiddenTypes="phrase_atom")
Use the following to find out which display options are available and what their current values are.
A.displayShow()
Where is this phrase on SHEBANQ? You can click on the passage reference.
You can generate a link that points to where a node is on SHEBANQ as follows:
A.webLink(phrase)
If you want just the URL:
A.webLink(phrase, urlOnly=True)
'https://shebanq.ancient-data.org/hebrew/text?book=Genesis&chapter=1&verse=4&version=2021&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt'
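The URL has a simple structure. The passage goes into the book, chapter and verse query parameters, followed by the data version and a fixed set of display parameters. The sketch below rebuilds it with the standard library using a hypothetical helper, shebanq_passage_url. It is not part of Text-Fabric; in practice A.webLink does this for you. The display parameters are copied verbatim from the output above, and their meaning on the SHEBANQ side is not explained here.

```python
from urllib.parse import urlencode

SHEBANQ_BASE = "https://shebanq.ancient-data.org/hebrew"  # from the URL above

# Display parameters copied verbatim from the URL that A.webLink returned.
VIEW_PARAMS = {
    "mr": "m", "qw": "q", "tp": "txt_p", "tr": "hb",
    "wget": "v", "qget": "v", "nget": "vt",
}


def shebanq_passage_url(book, chapter, verse, version="2021"):
    """Hypothetical helper: build a SHEBANQ text URL for one verse."""
    params = {"book": book, "chapter": chapter, "verse": verse, "version": version}
    params.update(VIEW_PARAMS)  # dicts keep insertion order, so does the URL
    return f"{SHEBANQ_BASE}/text?{urlencode(params)}"


print(shebanq_passage_url("Ezra", 3, 4))
```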
A link to another passage:
z = A.nodeFromSectionStr("Ezra 3:4")
A.webLink(z)
We can show a node in plain representation and highlight specific parts of it.
firstVerse = F.otype.s("verse")[0]
allPhrases = F.otype.s("phrase")
phrases = {allPhrases[1], allPhrases[3]}
words = (2, 4, 6, 9)
firstSentence = F.otype.s("sentence")[0]
A.plain(firstSentence)
First we highlight some words:
highlights = set(words)
A.plain(firstVerse, highlights=highlights)
Now some phrases:
highlights = set(phrases)
print(highlights)
A.plain(firstVerse, highlights=highlights)
{651576, 651574}
As you can see, when we highlight objects bigger than words, the words inside those objects get a highlighted border.
We can do both:
highlights = set(phrases) | set(words)
A.plain(firstVerse, highlights=highlights)
We can also highlight the verse itself.
highlights = {firstVerse}
A.plain(firstVerse, highlights=highlights)
We can use different colours for highlighting:
highlights = {i: "lightsalmon" for i in [1, 5, 9]}
highlights.update({i: "mediumaquamarine" for i in [3, 7]})
highlights.update({i: "blue" for i in phrases})
highlights.update({firstVerse: "#eeeeee"})
A.plain(firstVerse, highlights=highlights)
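The highlights argument has been given in two shapes. One is a set of nodes, highlighted in one default colour. The other is a dict mapping each node to its own colour, either a colour name or a hex code such as #eeeeee. The sketch below shows how both shapes could be reduced to a single node-to-colour mapping. The function normalize_highlights and its default colour are assumptions for illustration, not Text-Fabric internals.

```python
from collections.abc import Iterable, Mapping

# Hypothetical default; Text-Fabric chooses its own default highlight colour.
DEFAULT_COLOUR = "yellow"


def normalize_highlights(highlights):
    """Return a dict node -> colour from either a set of nodes or a dict.

    A set means: highlight these nodes in the default colour.
    A dict is copied as-is, so each node keeps its own colour.
    """
    if isinstance(highlights, Mapping):
        return dict(highlights)
    if isinstance(highlights, Iterable):
        return {node: DEFAULT_COLOUR for node in highlights}
    raise TypeError("highlights must be a set of nodes or a dict node -> colour")


# Mirrors the colour map built above; for a repeated node the later update wins.
colours = {i: "lightsalmon" for i in [1, 5, 9]}
colours.update({i: "mediumaquamarine" for i in [3, 7]})
print(normalize_highlights({2, 4, 6, 9}))
print(normalize_highlights(colours))
```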
We define two verse nodes:
verse1 = A.nodeFromSectionStr("Genesis 1:7")
verse2 = A.nodeFromSectionStr("Genesis 1:17")
and display the first one:
A.pretty(verse1)
For the second verse we display a bit more: we include the standard features:
A.pretty(verse2, standardFeatures=True)
The labels of the nodes come from features in the data: hover over a label to see which feature is responsible. The same holds for the unnamed features below the words, in particular the gloss.
Note that the sentence in this verse continues after the verse ends; that is why it has no left border.
In the BHSA, sentences, clauses and phrases may be discontinuous.
The designers of the BHSA data (Eep Talstra and Constantijn Sikkel et al.) have added the node types sentence_atom, clause_atom and phrase_atom.
They are the continuous chunks within the objects of their corresponding non-atom types. The atom types form a nice nest of building blocks.
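The idea behind the atom types can be illustrated with slot numbers. A discontinuous object occupies a set of slots with gaps, and each maximal run of consecutive slots is one continuous chunk. The function below, contiguous_runs, is a hypothetical illustration of that idea only. It is not how the BHSA atoms were produced, and the example slot numbers are made up.

```python
def contiguous_runs(slots):
    """Split a collection of slot numbers into maximal runs of consecutive slots.

    Returns a list of (first, last) pairs in textual order.
    """
    runs = []
    for slot in sorted(set(slots)):
        if runs and slot == runs[-1][1] + 1:
            runs[-1][1] = slot  # extend the current run
        else:
            runs.append([slot, slot])  # gap found: start a new run
    return [tuple(run) for run in runs]


# A made-up discontinuous clause: three chunks separated by gaps.
print(contiguous_runs({1, 2, 3, 7, 8, 10}))
```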
Usually we hide the atom types from view. But we can make them visible:
A.pretty(verse2, hideTypes=False)
Back to the view without the atoms.
We can even leave out the node types:
A.pretty(verse2, prettyTypes=False)
We put in the features (again) and also add node numbers:
A.pretty(verse2, withNodes=True, standardFeatures=True)
Now we selectively remove a few features from the display:
A.pretty(verse2, standardFeatures=True, suppress={"gloss", "typ"})
Now we add the features lex and g_word to the display:
A.displaySetup(extraFeatures=["lex", "g_word"], standardFeatures=True)
We have also made standardFeatures=True the temporary default.
A.pretty(verse2)
and then we reset the extra features to their default value:
A.displayReset("extraFeatures")
A.pretty(verse2)
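The pattern here has three parts:

- displaySetup changes the defaults for subsequent calls.
- Options passed to a single call such as A.pretty apply only to that call.
- displayReset restores the original defaults, either for the named options or, without arguments, for all of them.

Below is a minimal, hypothetical model of that behaviour. The class DisplaySettings is not Text-Fabric's implementation, and it only knows the two option names used above.

```python
import copy


class DisplaySettings:
    """Hypothetical model of display defaults that can be overridden and reset."""

    def __init__(self, **defaults):
        self._defaults = copy.deepcopy(defaults)
        self._current = copy.deepcopy(defaults)

    def setup(self, **options):
        """Change the defaults for all later calls (like displaySetup)."""
        for key, value in options.items():
            if key not in self._defaults:
                raise KeyError(f"unknown display option: {key}")
            self._current[key] = value

    def reset(self, *keys):
        """Restore the named options, or all of them if none are named."""
        for key in keys or self._defaults:
            self._current[key] = copy.deepcopy(self._defaults[key])

    def effective(self, **overrides):
        """Options for a single call: current settings plus per-call overrides."""
        options = dict(self._current)
        options.update(overrides)
        return options


settings = DisplaySettings(extraFeatures=(), standardFeatures=False)
settings.setup(extraFeatures=("lex", "g_word"), standardFeatures=True)
settings.reset("extraFeatures")
print(settings.effective())  # standardFeatures stays True, extraFeatures is empty
```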
We can also opt for less detail: suppose we do not want to dig deeper than the phrases:
A.pretty(verse2, baseTypes={"phrase"})
or if clauses are enough:
A.pretty(verse2, baseTypes={"clause"})
even sentences are possible:
A.pretty(verse2, baseTypes={"sentence"})
Before we go on, we reset the display completely.
A.displayReset()
We run a TF query and show some of its results with a lot of pomp and circumstance. The query was written by Stephen Ku, who prompted me to write rich display functions for query results.
It asks for a sentence in which there are three clauses, each entirely before the next one.
The first clause has a predicate phrase containing a verb.
The second clause has a predicate phrase; a verb is neither required nor forbidden.
The third clause is elliptic (typ=Ellp) and has an object phrase containing a (proper) noun or a personal, demonstrative or interrogative pronoun.
ellipQuery = """
sentence
  c1:clause
    phrase function=Pred
      word pdp=verb
  c2:clause
    phrase function=Pred
  c3:clause typ=Ellp
    phrase function=Objc
      word pdp=subs|nmpr|prps|prde|prin
c1 << c2
c2 << c3
"""
Above is the query template. Now we run the query.
results = A.search(ellipQuery)
1.62s 1473 results
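The lines c1 << c2 and c2 << c3 in the template express "each entirely before the next". In terms of slots, the last slot of the left object must come before the first slot of the right one. For discontinuous objects this is stricter than merely starting earlier. The sketch below performs that test on plain sets of slot numbers. The function entirely_before and the slot numbers are illustrative, not taken from the corpus.

```python
def entirely_before(slots_a, slots_b) -> bool:
    """a << b: the last slot of a comes before the first slot of b."""
    return max(slots_a) < min(slots_b)


# Made-up slot sets for three clauses; c1 is discontinuous (slots 1-2 and 5).
c1 = {1, 2, 5}
c2 = {6, 7, 8}
c3 = {9, 10}
print(entirely_before(c1, c2) and entirely_before(c2, c3))
# Interleaved objects fail the test even though the first one starts earlier.
print(entirely_before({1, 2, 9}, {6, 7, 8}))
```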
There are several ways to present the results. Here are results 10-12 in a table:
A.table(results, start=10, end=12)
n | p | sentence | clause | phrase | word | clause | phrase | clause | phrase | word |
---|---|---|---|---|---|---|---|---|---|---|
10 | Exodus 18:8 | וַיְסַפֵּ֤ר מֹשֶׁה֙ לְחֹ֣תְנֹ֔ו אֵת֩ כָּל־אֲשֶׁ֨ר עָשָׂ֤ה יְהוָה֙ לְפַרְעֹ֣ה וּלְמִצְרַ֔יִם עַ֖ל אֹודֹ֣ת יִשְׂרָאֵ֑ל אֵ֤ת כָּל־הַתְּלָאָה֙ אֲשֶׁ֣ר מְצָאָ֣תַם בַּדֶּ֔רֶךְ וַיַּצִּלֵ֖ם יְהוָֽה׃ | וַיְסַפֵּ֤ר מֹשֶׁה֙ לְחֹ֣תְנֹ֔ו אֵת֩ כָּל־ | יְסַפֵּ֤ר | יְסַפֵּ֤ר | אֲשֶׁ֨ר עָשָׂ֤ה יְהוָה֙ לְפַרְעֹ֣ה וּלְמִצְרַ֔יִם עַ֖ל אֹודֹ֣ת יִשְׂרָאֵ֑ל | עָשָׂ֤ה | אֵ֤ת כָּל־הַתְּלָאָה֙ | אֵ֤ת כָּל־הַתְּלָאָה֙ | כָּל־ |
11 | Exodus 18:8 | וַיְסַפֵּ֤ר מֹשֶׁה֙ לְחֹ֣תְנֹ֔ו אֵת֩ כָּל־אֲשֶׁ֨ר עָשָׂ֤ה יְהוָה֙ לְפַרְעֹ֣ה וּלְמִצְרַ֔יִם עַ֖ל אֹודֹ֣ת יִשְׂרָאֵ֑ל אֵ֤ת כָּל־הַתְּלָאָה֙ אֲשֶׁ֣ר מְצָאָ֣תַם בַּדֶּ֔רֶךְ וַיַּצִּלֵ֖ם יְהוָֽה׃ | וַיְסַפֵּ֤ר מֹשֶׁה֙ לְחֹ֣תְנֹ֔ו אֵת֩ כָּל־ | יְסַפֵּ֤ר | יְסַפֵּ֤ר | אֲשֶׁ֨ר עָשָׂ֤ה יְהוָה֙ לְפַרְעֹ֣ה וּלְמִצְרַ֔יִם עַ֖ל אֹודֹ֣ת יִשְׂרָאֵ֑ל | עָשָׂ֤ה | אֵ֤ת כָּל־הַתְּלָאָה֙ | אֵ֤ת כָּל־הַתְּלָאָה֙ | תְּלָאָה֙ |
12 | Exodus 23:15 | אֶת־חַ֣ג הַמַּצֹּות֮ תִּשְׁמֹר֒ וְחַ֤ג הַקָּצִיר֙ בִּכּוּרֵ֣י מַעֲשֶׂ֔יךָ אֲשֶׁ֥ר תִּזְרַ֖ע בַּשָּׂדֶ֑ה וְחַ֤ג הָֽאָסִף֙ בְּצֵ֣את הַשָּׁנָ֔ה בְּאָסְפְּךָ֥ אֶֽת־מַעֲשֶׂ֖יךָ מִן־הַשָּׂדֶֽה׃ | אֶת־חַ֣ג הַמַּצֹּות֮ תִּשְׁמֹר֒ | תִּשְׁמֹר֒ | תִּשְׁמֹר֒ | אֲשֶׁ֥ר תִּזְרַ֖ע בַּשָּׂדֶ֑ה | תִּזְרַ֖ע | וְחַ֤ג הָֽאָסִף֙ | חַ֤ג הָֽאָסִף֙ | חַ֤ג |
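The n column shows how start and end are counted: from 1, with both ends included, so start=10 and end=12 give three rows. Each result is a tuple of nodes, one for each atom in the query, which is why the table has one column per query atom. The sketch below reproduces this numbering on an ordinary Python list using a hypothetical helper, result_slice, which is not a Text-Fabric function. Nothing is assumed about how A.table treats an end beyond the last result; the helper simply stops at the last one.

```python
def result_slice(results, start=1, end=None):
    """Return (number, result) pairs for results start..end.

    Numbering starts at 1 and both ends are included, matching the n column
    in the table above. end=None means: up to the last result.
    """
    if start < 1:
        raise ValueError("start counts from 1")
    last = len(results) if end is None else min(end, len(results))
    return [(n, results[n - 1]) for n in range(start, last + 1)]


# Stand-in results: tuples of made-up node numbers, one per query atom.
fake_results = [(1000 + i, 2000 + i) for i in range(1, 21)]
for n, tup in result_slice(fake_results, start=10, end=12):
    print(n, tup)
```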
You can also show the results in pretty displays.
The A.show() function asks you for some limits (it will not show more than 100 results at a time) and then displays them.
It lists the results as follows:
We show result 10 only.
A.show(results, start=10, end=10, withNodes=True)
result 10
Note that although not all standard features are shown, the features mentioned in the query are. We can suppress those as well:
A.show(results, start=10, end=10, withNodes=True, queryFeatures=False)
result 10
We can also package the result tuples in containers other than verses, e.g. sentences, and at the same time cut off the displays at phrases:
A.displaySetup(queryFeatures=False)
A.show(
results,
start=10,
end=12,
withNodes=True,
condenseType="sentence",
baseTypes={"phrase"},
)
result 10
result 11
result 12
Note that now the phrases are heavily highlighted, whereas the highlighted words just have a box around them.
Let's leave out some information:
A.show(
results,
start=10,
end=12,
withNodes=False,
prettyTypes=False,
condenseType="sentence",
baseTypes={"clause"},
withPassage=False,
)
result 10
result 11
result 12
CC-BY Dirk Roorda