This notebook explains the basics of working with Text-Fabric and Python.
First, we import the Text-Fabric module. Make sure Text-Fabric is installed first! If it is not installed on your computer, you can uncomment the first line below (remove the #) to install it temporarily here.
#pip install text-fabric
from tf.app import use
A = use('bhsa:hot', hoist=globals())
rate limit is 5000 requests per hour, with 5000 left for this hour
connecting to online GitHub repo annotation/app-bhsa ... connected
The complete API (Application Programming Interface) of Text-Fabric is listed in the Cheatsheet. Because the full API can be overwhelming at first, the most useful and important functions are explained below. Once you get the hang of these, the others are largely self-explanatory.
The most used function is probably F.otype.s(...). It collects all nodes of a specific node type. To find out what node types there are, we can check the feature documentation. For example, if we want to collect all books or words, we can simply do the following:
F.otype.s('book')
range(426585, 426624)
F.otype.s('word')
range(1, 426585)
Every node is represented by a number, and each node type occupies its own range of numbers. The words are nodes 1 through 426584 (the end of a Python range is exclusive) and the books are nodes 426585 through 426623.
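The exclusive upper bound is easy to check in plain Python. The short sketch below copies the two ranges shown above into ordinary variables, so it runs without Text-Fabric loaded; the names word_nodes and book_nodes are chosen here for illustration only.

```python
# The two ranges returned above, copied by hand so the snippet runs on its own.
word_nodes = range(1, 426585)
book_nodes = range(426585, 426624)

# The upper bound of a range is not part of the range.
print(word_nodes[0], word_nodes[-1])   # first and last word node
print(book_nodes[0], book_nodes[-1])   # first and last book node

# len() gives the number of nodes of each type.
print(len(word_nodes), "words;", len(book_nodes), "books")

# Membership tests show which node type a given number belongs to.
print(100000 in word_nodes)
print(100000 in book_nodes)
```

Because the book range starts exactly where the word range stops, a node number can belong to only one of the two.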
So, for example, node 100000 represents a word. We can use this node to find out more about the word's features. To do this, we look in the feature documentation for all features that apply to words. If we want to know this word's part of speech, we look up that feature in the documentation and find that it is called sp and that it applies to 'objects of type word'.
To find out the feature of node 100000, we add the following elements together:
- F., indicating 'Feature'
- sp
- .v, because we know the node and we are looking for the value
- (100000)
Together, this forms F.sp.v(100000). Similarly, if we want to know the word's lexeme, we construct F.lex.v(100000).
F.sp.v(100000)
'prep'
F.lex.v(100000)
'B'
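When several features of the same node are needed, the F.<feature>.v(node) pattern can be wrapped in a small helper. The sketch below is only an illustration: feature_values is a hypothetical name, not part of Text-Fabric, and it assumes that F is the feature API loaded by the use(...) call above, so that getattr(F, "sp") gives the same object as F.sp.

```python
def feature_values(F, node, names=("sp", "lex")):
    """Return a dict {feature name: value} for a single node.

    F is assumed to be the Text-Fabric feature API from this notebook.
    getattr(F, name) looks up a feature by its name as a string.
    """
    return {name: getattr(F, name).v(node) for name in names}


# In this notebook, after the use(...) call:
# feature_values(F, 100000)
```

Going by the two outputs above, the call in the comment should give 'prep' for sp and 'B' for lex.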
To find out more about the position and the surroundings of node 100000, we have to use different functions than F., because F. only applies to the features of the word itself.
Consulting the cheatsheet again, we find out that there also exist functions that start with the letter A, T, or L.
The L and T functions will probably be used the most, but there is at least one handy function that starts with the letter A: A.sectionStrFromNode(). It returns the heading of the section to which the node belongs. Let's apply it to node 100000 again:
A.sectionStrFromNode(100000)
'Deuteronomy 11:19'
The function shows to which section a node belongs by returning a string (a data type that is a sequence of characters). There is also a similar function which starts with T, T.sectionFromNode(), which returns a tuple instead (an ordered and immutable collection of objects). Let's check out the difference between both functions:
A_section = A.sectionStrFromNode(100000)
T_section = T.sectionFromNode(100000)
print("Function A:", A_section)
print("Function T:", T_section)
print("Function T:", T_section[0])
print("Function T:", T_section[0], T_section[1])
Function A: Deuteronomy 11:19
Function T: ('Deuteronomy', 11, 19)
Function T: Deuteronomy
Function T: Deuteronomy 11
While A.sectionStrFromNode() is useful if you only want to know a node's section, T.sectionFromNode() allows you to easily adapt the data to your wishes.
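To illustrate what adapting the tuple can look like, the sketch below unpacks the value shown above into separate variables and builds a custom label. The tuple is copied by hand so the snippet runs without Text-Fabric, and the label layout is an arbitrary choice.

```python
# The tuple shown above for node 100000, copied by hand.
section = ("Deuteronomy", 11, 19)

# Unpack the tuple into separate variables.
book, chapter, verse = section

# Build a custom label from the parts.
label = f"{book}, chapter {chapter}, verse {verse}"
print(label)

# The parts keep their types: chapter and verse are still integers,
# so they can be used in calculations or comparisons.
next_chapter = chapter + 1
print(next_chapter)
```

With the string from A.sectionStrFromNode() the same result would require splitting the text on spaces and colons first.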
Another useful T. function is T.text(node, fmt=...). It returns the text that is represented by the node. It requires a node as input, with the option to specify the format, fmt. Below you can see some examples of different formats. When the format is not specified, the default format is text-orig-full, which gives the text in Hebrew including all diacritical marks.
text_trans_plain = T.text(100000, fmt='text-trans-plain')
text_trans_full = T.text(100000, fmt='text-trans-full')
text_orig_plain = T.text(100000, fmt='text-orig-plain')
text_orig_full = T.text(100000, fmt='text-orig-full')
print(text_trans_plain)
print(text_trans_full)
print(text_orig_plain)
print(text_orig_full)
print(T.text(100000))
B
B.:-
ב
בְּ
בְּ
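The four calls above differ only in the value of fmt, so they can also be written as a loop. The sketch below is a hypothetical helper (text_in_formats and FORMATS are names invented here) and assumes that T is the text API from this notebook and that the four format names used above are available in the loaded dataset.

```python
# The four format names used in the cell above.
FORMATS = (
    "text-trans-plain",
    "text-trans-full",
    "text-orig-plain",
    "text-orig-full",
)


def text_in_formats(T, node, formats=FORMATS):
    """Return a dict {format name: text of the node in that format}."""
    return {fmt: T.text(node, fmt=fmt) for fmt in formats}


# In this notebook, after the use(...) call:
# for fmt, text in text_in_formats(T, 100000).items():
#     print(fmt, text)
```

Printing the format name next to each result makes it easier to see which line of output belongs to which format.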
Lastly, let's introduce two functions starting with L. We have been focusing on node 100000, which is a word. What if we want to analyse its direct surroundings? For example, to which phrase does it belong, and what is its function in the overarching clause?
Moving up or down from one node type to another can be done with the Locality functions. L.u(node, otype=node type) moves up from the node to the specified node type. For example, if we want to print the text of the clause to which our node 100000 belongs, we could do the following:
# move up from node 100000 to its clause
clause_node = L.u(100000, 'clause')
# get the text for this clause
clause_text = T.text(clause_node)
# print the clause text
print(clause_text)
בְּשִׁבְתְּךָ֤ בְּבֵיתֶ֨ךָ֙
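According to the Text-Fabric documentation, L.u() returns a tuple of nodes rather than a single node, which T.text() accepts as well. When a single node number is needed, for instance to pass it to an F. function, the first element can be picked out. The sketch below shows one way to do that; first_up is a hypothetical helper name, and L is assumed to be the locality API from this notebook.

```python
def first_up(L, node, otype):
    """Return the first node of type `otype` that contains `node`.

    Returns None when there is no such node, so the caller can
    test the result before using it.
    """
    parents = L.u(node, otype=otype)
    return parents[0] if parents else None


# In this notebook, after the use(...) call:
# clause = first_up(L, 100000, 'clause')
# print(clause, T.text(clause))
```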
Or, if we want to find the first clause in the BHSA, print the part of speech of each word within that clause, and determine the subject of the clause, we can combine the following:
1. F.otype.s('clause'), the collection of all clauses
2. A.sectionStrFromNode(node) to get the section of the first clause
3. L.d(node, 'word') to move down from clause level to word level and collect the word nodes
4. F.sp.v(node) to get the part of speech for each word
5. L.d(node, 'phrase') to get the phrase nodes of the clause
6. F.function.v(node) to get the phrase function and check whether it is the subject of the clause (function is a feature on phrase level, see the feature documentation)
Between the steps, the program prints the results to give insight into the intermediate values.
# collecting all clauses
all_clauses = F.otype.s('clause')
print("all_clauses:", all_clauses)
# getting the first clause
first_clause_node = all_clauses[0]
print("first_clause_node:", first_clause_node)
# getting the section of the first clause
section_first_clause = A.sectionStrFromNode(first_clause_node)
print("Section:", section_first_clause)
# collecting all words from the first clause
word_nodes_first_clause = L.d(first_clause_node, 'word')
print("word_nodes_first_clause:", word_nodes_first_clause)
# iterating through all word nodes in the first clause and
# printing for each word: node, part of speech, unvocalised text
for word in word_nodes_first_clause:
    print(word, F.sp.v(word), T.text(word, fmt='text-orig-plain'))
# collecting all phrases from the first clause
phrase_nodes_first_clause = L.d(first_clause_node, 'phrase')
print("phrase_nodes_first_clause:", phrase_nodes_first_clause)
# iterating through all phrase nodes and checking whether
# their function matches 'Subj'. If so, it prints:
# phrase node, function, unvocalised text
for phrase in phrase_nodes_first_clause:
    if F.function.v(phrase) == 'Subj':
        print(phrase, F.function.v(phrase), T.text(phrase, fmt='text-orig-plain'))
all_clauses: range(427553, 515674)
first_clause_node: 427553
Section: Genesis 1:1
word_nodes_first_clause: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
1 prep ב
2 subs ראשׁית
3 verb ברא
4 subs אלהים
5 prep את
6 art ה
7 subs שׁמים
8 conj ו
9 prep את
10 art ה
11 subs ארץ׃
phrase_nodes_first_clause: (651542, 651543, 651544, 651545)
651544 Subj אלהים
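The same building blocks can be combined in other ways. As a further sketch, the hypothetical helper below (pos_counts is a name invented here) counts how often each part of speech occurs in a clause, using L.d() to collect the words and F.sp.v() to read the feature. It assumes F and L are the APIs loaded in this notebook.

```python
from collections import Counter


def pos_counts(F, L, clause_node):
    """Count the parts of speech of all words in one clause."""
    words = L.d(clause_node, otype="word")
    return Counter(F.sp.v(word) for word in words)


# In this notebook, after the use(...) call:
# pos_counts(F, L, first_clause_node).most_common()
```

Going by the word list printed above, the first clause should yield four subs, three prep, two art, one verb, and one conj.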