Notebook

This notebook explains the basics of working with Text-Fabric and Python.

First, we import the Text-Fabric module. Make sure Text-Fabric is installed before you run the cell. If you have not installed Text-Fabric on your computer, you can uncomment the first line of the cell (remove the #) to install it temporarily here.

In [2]:

#pip install text-fabric
from tf.app import use
A = use('bhsa:hot', hoist=globals())

rate limit is 5000 requests per hour, with 5000 left for this hour
	connecting to online GitHub repo annotation/app-bhsa ... connected

TF-app: C:\Users\Mark/text-fabric-data/annotation/app-bhsa/code

data: C:\Users\Mark/text-fabric-data/etcbc/bhsa/tf/c

data: C:\Users\Mark/text-fabric-data/etcbc/phono/tf/c

data: C:\Users\Mark/text-fabric-data/etcbc/parallels/tf/c

Text-Fabric: Text-Fabric API 8.4.0, app-bhsa, Search Reference
Data: BHSA, Character table, Feature docs
Features:

Parallel Passages

crossref

BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis

book
book@ll
chapter
code
det
domain
freq_lex
function
g_cons
g_cons_utf8
g_lex
g_lex_utf8
g_word
g_word_utf8
gloss
gn
label
language
lex
lex_utf8
ls
nametype
nme
nu
number
otype
pargr
pdp
pfm
prs
prs_gn
prs_nu
prs_ps
ps
qere
qere_trailer
qere_trailer_utf8
qere_utf8
rank_lex
rela
sp
st
tab
trailer
trailer_utf8
txt
typ
uvf
vbe
vbs
verse
voc_lex
voc_lex_utf8
vs
vt
mother
oslots

Phonetic Transcriptions

phono
phono_trailer

Text-Fabric API: names N F E L T S C TF directly usable

The complete API (Application Programming Interface) of Text-Fabric can be found here: Cheatsheet. Since the full list can be overwhelming at first, the most useful and important functions are explained below. Once you are comfortable with these, the others are much easier to pick up.

The most used function is probably F.otype.s(...). It collects all nodes of a specific node type. To find out what node types there are, we can check the feature documentation. For example, if we want to collect all books or words, we can simply do the following:

In [4]:

F.otype.s('book')

Out[4]:

range(426585, 426624)

In [5]:

F.otype.s('word')

Out[5]:

range(1, 426585)

As these results show, every node is represented by a number, and the nodes of each type shown here form a consecutive range. The words are represented by the numbers 1 through 426584 (the end of a Python range is exclusive) and the books by the numbers 426585 through 426623.
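The arithmetic behind these ranges can be checked in plain Python, without Text-Fabric. The sketch below only re-creates the two ranges from the outputs above; the variable names words and books are chosen here for illustration.

```python
# Re-create the ranges returned above (numbers copied from the outputs).
words = range(1, 426585)
books = range(426585, 426624)

# The end of a range is exclusive, so the last node is one less than the end.
print(words[0], words[-1])   # first and last word node
print(books[0], books[-1])   # first and last book node

# len() gives the number of nodes of each type.
print(len(words), len(books))

# A membership test shows under which type a node number falls.
print(100000 in words, 100000 in books)
```

In a live session the same checks can be run directly on F.otype.s('word') and F.otype.s('book'), and F.otype.v(node) gives the type of a single node.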

So, for example, node 100000 represents a word. We can use the node of this word to find out more about its features. To do this, we look in the feature documentation for all features that are applicable to words. If we want to know this word's part of speech, we look it up in the documentation and find that the feature is called sp and that it is applicable to 'objects of type word'.

To find the value of this feature for node 100000, we combine the following elements:

F., indicating 'Feature'
The name of the feature, here sp
.v, because we know the node and we are looking for the value
The number of the node, (100000)

Together, this forms F.sp.v(100000). Similarly, if we want to know the word's lexeme, we construct F.lex.v(100000).

In [43]:

F.sp.v(100000)

Out[43]:

'prep'

In [44]:

F.lex.v(100000)

Out[44]:

'B'
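When several features of the same node are needed, the F.<feature>.v(node) pattern can be wrapped in a small helper. The sketch below is only an illustration: feature_values is a hypothetical name, not part of Text-Fabric, and it assumes that F is the feature API object made available by use(...).

```python
def feature_values(F, node, names):
    """Return {feature name: value} for one node.

    F is assumed to be the Text-Fabric feature API object;
    getattr(F, 'sp') is the same object as F.sp.
    """
    return {name: getattr(F, name).v(node) for name in names}


# Usage in a session where F is available:
# feature_values(F, 100000, ['sp', 'lex'])
```

The helper returns a dictionary, so the values stay labelled by feature name and can be printed or compared later.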

To find out more about the position and the surroundings of node 100000, we need functions other than F., because F. only gives access to the features of the node itself.

Consulting the cheatsheet again, we find that there are also functions that start with the letters A, T, or L.

The L and T functions will probably be used the most, but there is at least one handy function that starts with the letter A: A.sectionStrFromNode(). It returns the heading of a section to which the node belongs. Let's apply it to node 100000 again:

In [50]:

A.sectionStrFromNode(100000)

Out[50]:

'Deuteronomy 11:19'

The function shows to which section a node belongs by returning a string (a data type that is a sequence of characters). There is also a similar function which starts with T, T.sectionFromNode(), but which returns a tuple (an ordered and immutable collection of objects). Let's check out the difference between both functions:

In [59]:

A_section = A.sectionStrFromNode(100000)
T_section = T.sectionFromNode(100000)

print("Function A:", A_section)
print("Function T:", T_section)
print("Function T:", T_section[0])
print("Function T:", T_section[0], T_section[1])

Function A: Deuteronomy 11:19
Function T: ('Deuteronomy', 11, 19)
Function T: Deuteronomy
Function T: Deuteronomy 11

While A.sectionStrFromNode() is useful if you only want to display a node's section, T.sectionFromNode() gives you the book, chapter, and verse as separate values, so you can easily select and reformat them.
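As a small illustration of working with the tuple, the sketch below unpacks it and builds two custom reference styles. It runs in plain Python: the tuple is copied from the output above, and the names short_ref and long_ref as well as the chosen styles are only examples.

```python
# Tuple copied from the output of T.sectionFromNode(100000) above.
section = ('Deuteronomy', 11, 19)

# Unpack the tuple into named parts.
book, chapter, verse = section

# Build custom reference styles from the parts.
short_ref = f"{book[:4]}. {chapter}:{verse}"
long_ref = f"{book}, chapter {chapter}, verse {verse}"

print(short_ref)
print(long_ref)
```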

Another useful T. function is T.text(node, fmt=...). It returns the text that is represented by the node. It requires a node as input and optionally takes a format, fmt. Below you can see some examples of different formats. When the format is not specified, the default format is text-orig-full, which is the Hebrew text including all diacritical marks.

In [61]:

text_trans_plain = T.text(100000, fmt='text-trans-plain')
text_trans_full = T.text(100000, fmt='text-trans-full')
text_orig_plain = T.text(100000, fmt='text-orig-plain')
text_orig_full = T.text(100000, fmt='text-orig-full')
print(text_trans_plain)
print(text_trans_full)
print(text_orig_plain)
print(text_orig_full)
print(T.text(100000))

B
B.:-
ב
בְּ
בְּ

Lastly, let's introduce two functions starting with L.. We have been focusing on node 100000, which is a word. What if we want to analyse its direct surroundings? For example, to which phrase does it belong, and what is that phrase's function in the enclosing clause?

Moving up or down from one node type to another is done with the Locality functions. L.u(node, otype=...) moves up from the node to the nodes of the specified type that contain it and returns them as a tuple. For example, if we want to print the text of the clause to which our node 100000 belongs, we could do the following:

In [62]:

# move up from node 100000 to its clause
# (L.u returns a tuple of nodes, so we take the first element)
clause_node = L.u(100000, 'clause')[0]

# get the text for this clause
clause_text = T.text(clause_node)

# print the clause text
print(clause_text)

בְּשִׁבְתְּךָ֤ בְּבֵיתֶ֨ךָ֙
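Because L.u(...) returns a tuple, it can help to wrap the lookup in a helper that also copes with an empty result. The following is a minimal sketch under that assumption; parent_of_type is a hypothetical name, not a Text-Fabric function, and L is assumed to be the Locality API object made available by use(...).

```python
def parent_of_type(L, node, otype):
    """Return the first node of type `otype` that contains `node`, or None.

    L.u(...) is assumed to return a tuple, which is empty when
    no node of the requested type contains `node`.
    """
    parents = L.u(node, otype=otype)
    return parents[0] if parents else None


# Usage in a session where L and T are available:
# clause = parent_of_type(L, 100000, 'clause')
# if clause is not None:
#     print(T.text(clause))
```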

Or, if we want to find the first clause in the BHSA, print the part of speech of each word within that clause, and determine the subject of the clause, we can take the following steps:

Determine the node of the first clause by getting the first element of F.otype.s('clause'), the collection of all clauses.
Get its section using A.sectionStrFromNode(node).
Use L.d(node, 'word') to move down from clause level to word level and collect the word nodes.
Use F.sp.v(node) to get the part of speech for each word.
Use L.d(node, 'phrase') to get the phrase nodes of the clause.
Use F.function.v(node) to get the function of each phrase and check whether it is the subject of the clause (function is a feature at the phrase level, as described in the feature documentation).

After each step, the program prints the intermediate result so that you can follow what happens.

In [81]:

# collecting all clauses
all_clauses = F.otype.s('clause')
print("all_clauses:", all_clauses)

# getting the first clause
first_clause_node = all_clauses[0]
print("first_clause_node:", first_clause_node)

# getting the section of the first clause
section_first_clause = A.sectionStrFromNode(first_clause_node)
print("Section:", section_first_clause)

# collecting all words from the first clause
word_nodes_first_clause = L.d(first_clause_node, 'word')
print("word_nodes_first_clause:", word_nodes_first_clause)

# iterating through all word nodes in the first clause and 
# printing for each word: node, part of speech, unvocalised text
for word in word_nodes_first_clause:
    print(word, F.sp.v(word), T.text(word, fmt='text-orig-plain'))
    
# collecting all phrases from the first clause
phrase_nodes_first_clause = L.d(first_clause_node, 'phrase')
print("phrase_nodes_first_clause:", phrase_nodes_first_clause)

# iterating through all phrase nodes and checking whether
# their function matches 'Subj'. If so, print:
# phrase node, function, unvocalised text
for phrase in phrase_nodes_first_clause:
    if F.function.v(phrase) == 'Subj':
        print(phrase, F.function.v(phrase), T.text(phrase, fmt='text-orig-plain'))

all_clauses: range(427553, 515674)
first_clause_node: 427553
Section: Genesis 1:1
word_nodes_first_clause: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
1 prep ב
2 subs ראשׁית 
3 verb ברא 
4 subs אלהים 
5 prep את 
6 art ה
7 subs שׁמים 
8 conj ו
9 prep את 
10 art ה
11 subs ארץ׃ 
phrase_nodes_first_clause: (651542, 651543, 651544, 651545)
651544 Subj אלהים
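A natural next step is to summarise such results, for example by counting how often each part of speech occurs in the clause. The sketch below does this in plain Python with collections.Counter; the list of values is copied from the output above, and the variable names are chosen here for illustration.

```python
from collections import Counter

# Part-of-speech values copied from the output above (words 1 through 11).
parts_of_speech = ['prep', 'subs', 'verb', 'subs', 'prep', 'art',
                   'subs', 'conj', 'prep', 'art', 'subs']

# Count how often each part of speech occurs in the clause.
pos_counts = Counter(parts_of_speech)

# most_common() sorts from most to least frequent.
for pos, count in pos_counts.most_common():
    print(pos, count)
```

In a live session the list can be replaced by the values themselves, for example Counter(F.sp.v(word) for word in word_nodes_first_clause).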