This notebook gets you started with using Text-Fabric for coding in the Hebrew Bible.
If you are totally new to Text-Fabric, it might be helpful to read about the underlying data model first.
In a notebook, you can perform searches and view them in a tabular display and zoom in on items with pretty displays.
But there are times that you want to take your results outside Text-Fabric, outside a notebook, outside Python, and just work with them in other programs, such as Excel.
You want to do that not only with query results, but with all kinds of lists of tuples of nodes.
There is a function for that, A.export(), and here we show what it can do.
%load_ext autoreload
%autoreload 2
The ins and outs of installing Text-Fabric, getting the corpus, and initializing a notebook are explained in the start tutorial.
import os
from tf.app import use
#A = use('bhsa', hoist=globals())
A = use('bhsa:clone', checkout="clone", hoist=globals())
Using TF-app in /Users/dirk/github/annotation/app-bhsa/code: repo clone offline under ~/github (local github) Using data in /Users/dirk/github/etcbc/bhsa/tf/c: repo clone offline under ~/github (local github) Using data in /Users/dirk/github/etcbc/phono/tf/c: repo clone offline under ~/github (local github) Using data in /Users/dirk/github/etcbc/parallels/tf/c: repo clone offline under ~/github (local github) | 0.00s Dataset without structure sections in otext:no structure functions in the T-API
Parallel Passages: crossref
BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis: book book@ll chapter code det domain freq_lex function g_cons g_cons_utf8 g_lex g_lex_utf8 g_word g_word_utf8 gloss gn label language lex lex_utf8 ls nametype nme nu number otype pargr pdp pfm prs prs_gn prs_nu prs_ps ps qere qere_trailer qere_trailer_utf8 qere_utf8 rank_lex rela sp st tab trailer trailer_utf8 txt typ uvf vbe vbs verse voc_lex voc_lex_utf8 vs vt mother oslots
Phonetic Transcriptions: phono phono_trailer
We write a function that can peek into a file on your system and show its first few lines. We'll use it to inspect the exported files that we are going to produce.
EXPORT_FILE = os.path.expanduser('~/Downloads/results.tsv')
UPTO = 10


def checkout():
    # print the first UPTO lines of the exported file
    with open(EXPORT_FILE, encoding='utf_16') as fh:
        for (i, line) in enumerate(fh):
            if i >= UPTO:
                break
            print(line)
Our exported .tsv files open in Excel without hassle, even if they contain non-Latin characters.
That is because TF writes such files in an encoding that works well with Excel: utf_16_le.
You can just open them in Excel; there is no need for conversion before or after opening these files.
Should you want to process these files by means of a (Python) program, take care to read them with encoding utf_16.
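For processing outside a notebook, the reading step could look like the minimal sketch below. It uses only the standard library. The helper name read_export is hypothetical, the sample file is a small stand-in written on the spot, and two things are assumptions rather than facts stated here: that the file starts with a byte order mark and that fields are not quoted.

```python
import csv
import os
import tempfile


def read_export(path):
    """Read a tab-separated export into a list of dicts keyed by column name."""
    # utf_16 uses the byte order mark, if present, to detect the byte order
    with open(path, encoding="utf_16", newline="") as fh:
        # assumption: fields are plain tab-separated text without quoting
        return list(csv.DictReader(fh, delimiter="\t", quoting=csv.QUOTE_NONE))


# Stand-in for an exported file: utf_16_le with a byte order mark (assumed).
sample_path = os.path.join(tempfile.mkdtemp(), "results.tsv")
with open(sample_path, "w", encoding="utf_16_le", newline="") as fh:
    fh.write("\ufeff")
    fh.write("R\tS1\tS2\tS3\tNODE1\tTYPE1\tTEXT1\n")
    fh.write("1\t1_Samuel\t1\t1\t141547\tword\tאֶפְרָ֑יִם\n")

rows = read_export(sample_path)
print(rows[0]["NODE1"], rows[0]["TYPE1"], rows[0]["TEXT1"])
```

Each row comes back as a dictionary, so columns can be addressed by the names in the header line.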
We first run a query in order to export the results.
query = '''
book book=Samuel_I
  clause
    word sp=nmpr
'''
results = A.search(query)
0.55s 1868 results
You can export the table of results to Excel.
The following command writes a tab-separated file results.tsv to your downloads directory.
You can specify the arguments toDir=directory and toFile=file name to write to a different file.
If the directory does not exist, it will be created.
We stick to the default, however.
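To make the effect of a custom directory and file name concrete, here is a minimal stand-in written with the standard library only. It is not TF's implementation: the function export_rows and its parameters to_dir and to_file are hypothetical and merely mimic the behaviour described above (create the directory if needed, write a tab-separated file in utf_16_le). Writing a byte order mark is an assumption.

```python
import os
import tempfile


def export_rows(header, rows, to_dir, to_file="results.tsv"):
    """Hypothetical stand-in, not TF code: write rows as a UTF-16 TSV file."""
    os.makedirs(to_dir, exist_ok=True)  # create the directory if it is missing
    path = os.path.join(to_dir, to_file)
    with open(path, "w", encoding="utf_16_le", newline="") as fh:
        fh.write("\ufeff")  # byte order mark (assumed), lets readers detect the byte order
        for fields in [header, *rows]:
            fh.write("\t".join(str(field) for field in fields) + "\n")
    return path


# The target directory does not exist yet; it is created on the fly.
target_dir = os.path.join(tempfile.mkdtemp(), "exports", "samuel")
path = export_rows(
    ("R", "NODE1", "TYPE1"),
    [(1, 453942, "clause"), (2, 453943, "clause")],
    to_dir=target_dir,
    to_file="clauses.tsv",
)
print(os.path.isfile(path))
```

With TF itself, the corresponding step would be to pass toDir and toFile to A.export(), as described above.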
A.export(results)
Check out the contents:
checkout()
R S1 S2 S3 NODE1 TYPE1 book1 NODE2 TYPE2 TEXT2 NODE3 TYPE3 TEXT3 sp3
1 1_Samuel 1 1 426592 book Samuel_I 453942 clause וַיְהִי֩ אִ֨ישׁ אֶחָ֜ד מִן־הָרָמָתַ֛יִם צֹופִ֖ים מֵהַ֣ר אֶפְרָ֑יִם 141547 word אֶפְרָ֑יִם nmpr
2 1_Samuel 1 1 426592 book Samuel_I 453943 clause וּשְׁמֹ֡ו אֶ֠לְקָנָה בֶּן־יְרֹחָ֧ם בֶּן־אֱלִיה֛וּא בֶּן־תֹּ֥חוּ בֶן־צ֖וּף אֶפְרָתִֽי׃ 141550 word אֶ֠לְקָנָה nmpr
3 1_Samuel 1 1 426592 book Samuel_I 453943 clause וּשְׁמֹ֡ו אֶ֠לְקָנָה בֶּן־יְרֹחָ֧ם בֶּן־אֱלִיה֛וּא בֶּן־תֹּ֥חוּ בֶן־צ֖וּף אֶפְרָתִֽי׃ 141552 word יְרֹחָ֧ם nmpr
4 1_Samuel 1 1 426592 book Samuel_I 453943 clause וּשְׁמֹ֡ו אֶ֠לְקָנָה בֶּן־יְרֹחָ֧ם בֶּן־אֱלִיה֛וּא בֶּן־תֹּ֥חוּ בֶן־צ֖וּף אֶפְרָתִֽי׃ 141554 word אֱלִיה֛וּא nmpr
5 1_Samuel 1 1 426592 book Samuel_I 453943 clause וּשְׁמֹ֡ו אֶ֠לְקָנָה בֶּן־יְרֹחָ֧ם בֶּן־אֱלִיה֛וּא בֶּן־תֹּ֥חוּ בֶן־צ֖וּף אֶפְרָתִֽי׃ 141556 word תֹּ֥חוּ nmpr
6 1_Samuel 1 1 426592 book Samuel_I 453943 clause וּשְׁמֹ֡ו אֶ֠לְקָנָה בֶּן־יְרֹחָ֧ם בֶּן־אֱלִיה֛וּא בֶּן־תֹּ֥חוּ בֶן־צ֖וּף אֶפְרָתִֽי׃ 141558 word צ֖וּף nmpr
7 1_Samuel 1 2 426592 book Samuel_I 453945 clause שֵׁ֤ם אַחַת֙ חַנָּ֔ה 141566 word חַנָּ֔ה nmpr
8 1_Samuel 1 2 426592 book Samuel_I 453946 clause וְשֵׁ֥ם הַשֵּׁנִ֖ית פְּנִנָּ֑ה 141571 word פְּנִנָּ֑ה nmpr
9 1_Samuel 1 2 426592 book Samuel_I 453948 clause לִפְנִנָּה֙ יְלָדִ֔ים 141575 word פְנִנָּה֙ nmpr
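You see the following columns:
R: the sequence number of the result tuple in the result list;
S1 S2 S3: the section as book, chapter, verse, in separate columns;
NODEi TYPEi: the node and its type, for each node i in the result tuple;
TEXTi: the full text of node i (present here for the clause and the word, not for the book);
book1 sp3: the values of the features mentioned in the query, book on node 1 and sp on node 3.
If we want to see the clause type (feature typ) and the word gender (feature gn) as well, we must mention them in the query.
We can do so as follows: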
query = '''
book book=Samuel_I
  clause typ*
    word sp=nmpr gn*
'''
results = A.search(query)
1.03s 1868 results
The same number of results as before.
The * is a trivial condition: it is always true.
We do the export again and peek at the results.
A.export(results)
checkout()
R S1 S2 S3 NODE1 TYPE1 book1 NODE2 TYPE2 TEXT2 typ2 NODE3 TYPE3 TEXT3 gn3 sp3
1 1_Samuel 1 1 426592 book Samuel_I 453942 clause וַיְהִי֩ אִ֨ישׁ אֶחָ֜ד מִן־הָרָמָתַ֛יִם צֹופִ֖ים מֵהַ֣ר אֶפְרָ֑יִם WayX 141547 word אֶפְרָ֑יִם unknown nmpr
2 1_Samuel 1 1 426592 book Samuel_I 453943 clause וּשְׁמֹ֡ו אֶ֠לְקָנָה בֶּן־יְרֹחָ֧ם בֶּן־אֱלִיה֛וּא בֶּן־תֹּ֥חוּ בֶן־צ֖וּף אֶפְרָתִֽי׃ NmCl 141550 word אֶ֠לְקָנָה m nmpr
3 1_Samuel 1 1 426592 book Samuel_I 453943 clause וּשְׁמֹ֡ו אֶ֠לְקָנָה בֶּן־יְרֹחָ֧ם בֶּן־אֱלִיה֛וּא בֶּן־תֹּ֥חוּ בֶן־צ֖וּף אֶפְרָתִֽי׃ NmCl 141552 word יְרֹחָ֧ם m nmpr
4 1_Samuel 1 1 426592 book Samuel_I 453943 clause וּשְׁמֹ֡ו אֶ֠לְקָנָה בֶּן־יְרֹחָ֧ם בֶּן־אֱלִיה֛וּא בֶּן־תֹּ֥חוּ בֶן־צ֖וּף אֶפְרָתִֽי׃ NmCl 141554 word אֱלִיה֛וּא m nmpr
5 1_Samuel 1 1 426592 book Samuel_I 453943 clause וּשְׁמֹ֡ו אֶ֠לְקָנָה בֶּן־יְרֹחָ֧ם בֶּן־אֱלִיה֛וּא בֶּן־תֹּ֥חוּ בֶן־צ֖וּף אֶפְרָתִֽי׃ NmCl 141556 word תֹּ֥חוּ m nmpr
6 1_Samuel 1 1 426592 book Samuel_I 453943 clause וּשְׁמֹ֡ו אֶ֠לְקָנָה בֶּן־יְרֹחָ֧ם בֶּן־אֱלִיה֛וּא בֶּן־תֹּ֥חוּ בֶן־צ֖וּף אֶפְרָתִֽי׃ NmCl 141558 word צ֖וּף m nmpr
7 1_Samuel 1 2 426592 book Samuel_I 453945 clause שֵׁ֤ם אַחַת֙ חַנָּ֔ה NmCl 141566 word חַנָּ֔ה f nmpr
8 1_Samuel 1 2 426592 book Samuel_I 453946 clause וְשֵׁ֥ם הַשֵּׁנִ֖ית פְּנִנָּ֑ה NmCl 141571 word פְּנִנָּ֑ה f nmpr
9 1_Samuel 1 2 426592 book Samuel_I 453948 clause לִפְנִנָּה֙ יְלָדִ֔ים NmCl 141575 word פְנִנָּה֙ f nmpr
As you see, you now have the extra columns typ2 and gn3.
This gives you a lot of control over the generation of spreadsheets.
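Once such columns are in the file, simple tabulations no longer need TF. The sketch below counts the gender values of the nine rows shown above. The data is an abridged stand-in held in memory (only the columns R, NODE3 and gn3, with values copied from that output), and treating fields as unquoted is an assumption.

```python
import csv
import io
from collections import Counter

# Abridged stand-in for the exported rows: three columns only,
# values copied from the first nine results of the export.
SAMPLE = (
    "R\tNODE3\tgn3\n"
    "1\t141547\tunknown\n"
    "2\t141550\tm\n"
    "3\t141552\tm\n"
    "4\t141554\tm\n"
    "5\t141556\tm\n"
    "6\t141558\tm\n"
    "7\t141566\tf\n"
    "8\t141571\tf\n"
    "9\t141575\tf\n"
)

rows = csv.DictReader(io.StringIO(SAMPLE), delimiter="\t", quoting=csv.QUOTE_NONE)
gender_counts = Counter(row["gn3"] for row in rows)
print(gender_counts.most_common())
# [('m', 5), ('f', 3), ('unknown', 1)]
```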
You can also export lists of node tuples that are not obtained by a query:
tuples = (
    tuple(results[0][1:3]),
    tuple(results[1][1:3]),
)
tuples
((453942, 141547), (453943, 141550))
Two rows, each row has a clause node and a word node.
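The slicing step does not depend on TF and can be tried on its own. In this sketch the list sample_results is a hand-made stand-in for the first two query results, with node numbers copied from the export above, and the same slice is written as a generator expression so that it scales to more rows.

```python
# Stand-in for the first two results of the query: (book, clause, word) nodes.
sample_results = [
    (426592, 453942, 141547),
    (426592, 453943, 141550),
]

# Keep members 1 and 2 of every result, dropping the book node at index 0.
tuples = tuple(tuple(result[1:3]) for result in sample_results)
print(tuples)
# ((453942, 141547), (453943, 141550))
```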
Let's do a bare export:
A.export(tuples)
checkout()
R S1 S2 S3 NODE1 TYPE1 TEXT1 book1 NODE2 TYPE2 TEXT2 typ2
1 1_Samuel 1 1 453942 clause וַיְהִי֩ אִ֨ישׁ אֶחָ֜ד מִן־הָרָמָתַ֛יִם צֹופִ֖ים מֵהַ֣ר אֶפְרָ֑יִם 141547 word אֶפְרָ֑יִם
2 1_Samuel 1 1 453943 clause וּשְׁמֹ֡ו אֶ֠לְקָנָה בֶּן־יְרֹחָ֧ם בֶּן־אֱלִיה֛וּא בֶּן־תֹּ֥חוּ בֶן־צ֖וּף אֶפְרָתִֽי׃ 141550 word אֶ֠לְקָנָה
Wait a minute: why is the typ2 column there?
It is because we have run a query before where we asked for typ.
If we do not want to be influenced by previous things we've run, we need to reset the display:
A.displayReset('tupleFeatures')
Again:
A.export(tuples)
checkout()
R S1 S2 S3 NODE1 TYPE1 TEXT1 NODE2 TYPE2 TEXT2
1 1_Samuel 1 1 453942 clause וַיְהִי֩ אִ֨ישׁ אֶחָ֜ד מִן־הָרָמָתַ֛יִם צֹופִ֖ים מֵהַ֣ר אֶפְרָ֑יִם 141547 word אֶפְרָ֑יִם
2 1_Samuel 1 1 453943 clause וּשְׁמֹ֡ו אֶ֠לְקָנָה בֶּן־יְרֹחָ֧ם בֶּן־אֱלִיה֛וּא בֶּן־תֹּ֥חוּ בֶן־צ֖וּף אֶפְרָתִֽי׃ 141550 word אֶ֠לְקָנָה
We can get richer exports by means of A.displaySetup(), using the parameter tupleFeatures:
A.displaySetup(tupleFeatures=(
    (0, 'typ rela'),
    (1, 'sp gn nu pdp'),
))
We assign extra features per member of the tuple.
In the above case:
the first member (index 0, the clause node) gets the features typ and rela;
the second member (index 1, the word node) gets the features sp, gn, nu and pdp.
A.export(tuples)
checkout()
R S1 S2 S3 NODE1 TYPE1 TEXT1 typ1 rela1 NODE2 TYPE2 TEXT2 sp2 gn2 nu2 pdp2
1 1_Samuel 1 1 453942 clause וַיְהִי֩ אִ֨ישׁ אֶחָ֜ד מִן־הָרָמָתַ֛יִם צֹופִ֖ים מֵהַ֣ר אֶפְרָ֑יִם WayX NA 141547 word אֶפְרָ֑יִם nmpr unknown sg nmpr
2 1_Samuel 1 1 453943 clause וּשְׁמֹ֡ו אֶ֠לְקָנָה בֶּן־יְרֹחָ֧ם בֶּן־אֱלִיה֛וּא בֶּן־תֹּ֥חוּ בֶן־צ֖וּף אֶפְרָתִֽי׃ NmCl NA 141550 word אֶ֠לְקָנָה nmpr m sg nmpr
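The header line suggests how the feature columns get their names: tuple members are counted from 0 in tupleFeatures, while the column suffixes count from 1. The sketch below reproduces that naming. It is inferred from the output above, not taken from TF's code, and the helper feature_columns is hypothetical.

```python
def feature_columns(tuple_features):
    """Return export column names for a tupleFeatures-style specification."""
    columns = []
    for position, features in tuple_features:
        # features are given as one space-separated string per tuple member
        for feature in features.split():
            # member index is 0-based, the column suffix is 1-based
            columns.append(f"{feature}{position + 1}")
    return columns


spec = (
    (0, "typ rela"),
    (1, "sp gn nu pdp"),
)
print(feature_columns(spec))
# ['typ1', 'rela1', 'sp2', 'gn2', 'nu2', 'pdp2']
```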
Talking about display setup: other parameters also have an effect, e.g. the text format.
Let's change it to the phonetic representation.
A.export(tuples, fmt='text-phono-full')
checkout()
R S1 S2 S3 NODE1 TYPE1 TEXT1 typ1 rela1 NODE2 TYPE2 TEXT2 sp2 gn2 nu2 pdp2
1 1_Samuel 1 1 453942 clause wayᵊhˌî ʔˌîš ʔeḥˈāḏ min-hārāmāṯˈayim ṣôfˌîm mēhˈar ʔefrˈāyim WayX NA 141547 word ʔefrˈāyim nmpr unknown sg nmpr
2 1_Samuel 1 1 453943 clause ûšᵊmˈô ʔelqānˌā ben-yᵊrōḥˈām ben-ʔᵉlîhˈû ben-tˌōḥû ven-ṣˌûf ʔefrāṯˈî . NmCl NA 141550 word ʔelqānˌā nmpr m sg nmpr
You can chain queries like this:
results = (
    A.search('''
book book=Samuel_I
  chapter chapter=1
    verse verse=1
      clause
        word sp=nmpr
''')
    +
    A.search('''
book book=Samuel_I
  chapter chapter=1
    verse verse=1
      clause
        word sp=verb nu=pl
''')
)
0.56s 6 results 0.59s 1 result
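Concatenating with + simply appends the second result list to the first. The sketch below replays that with plain Python lists. The node tuples are stand-ins copied from the export output in this section, and the sorting step is an optional extra that assumes lower word node numbers come earlier in the text.

```python
# Stand-ins for the two result lists: (book, chapter, verse, clause, word).
proper_nouns = [
    (426592, 426856, 1421483, 453942, 141547),
    (426592, 426856, 1421483, 453943, 141550),
    (426592, 426856, 1421483, 453943, 141552),
    (426592, 426856, 1421483, 453943, 141554),
    (426592, 426856, 1421483, 453943, 141556),
    (426592, 426856, 1421483, 453943, 141558),
]
plural_verbs = [
    (426592, 426856, 1421483, 453942, 141544),
]

# Plain concatenation: the row from the second query ends up last.
combined = proper_nouns + plural_verbs
print(len(combined))
# 7

# Optional: sort the tuples by node number, member by member.
in_node_order = sorted(combined)
print(in_node_order[0])
# (426592, 426856, 1421483, 453942, 141544)
```

Either way, the combined list is an ordinary list of node tuples, no longer tied to a single query.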
In such cases, it is better to set up the features yourself:
A.displaySetup(
    tupleFeatures=(
        (3, 'typ rela'),
        (4, 'sp gn vt vs'),
    ),
    fmt='text-phono-full',
)
Now we can do a fine export:
A.export(results)
checkout()
R S1 S2 S3 NODE1 TYPE1 NODE2 TYPE2 NODE3 TYPE3 TEXT3 NODE4 TYPE4 TEXT4 typ4 rela4 NODE5 TYPE5 TEXT5 sp5 gn5 vt5 vs5
1 1_Samuel 1 1 426592 book 426856 chapter 1421483 verse wayᵊhˌî ʔˌîš ʔeḥˈāḏ min-hārāmāṯˈayim ṣôfˌîm mēhˈar ʔefrˈāyim ûšᵊmˈô ʔelqānˌā ben-yᵊrōḥˈām ben-ʔᵉlîhˈû ben-tˌōḥû ven-ṣˌûf ʔefrāṯˈî . 453942 clause wayᵊhˌî ʔˌîš ʔeḥˈāḏ min-hārāmāṯˈayim ṣôfˌîm mēhˈar ʔefrˈāyim WayX NA 141547 word ʔefrˈāyim nmpr unknown NA NA
2 1_Samuel 1 1 426592 book 426856 chapter 1421483 verse wayᵊhˌî ʔˌîš ʔeḥˈāḏ min-hārāmāṯˈayim ṣôfˌîm mēhˈar ʔefrˈāyim ûšᵊmˈô ʔelqānˌā ben-yᵊrōḥˈām ben-ʔᵉlîhˈû ben-tˌōḥû ven-ṣˌûf ʔefrāṯˈî . 453943 clause ûšᵊmˈô ʔelqānˌā ben-yᵊrōḥˈām ben-ʔᵉlîhˈû ben-tˌōḥû ven-ṣˌûf ʔefrāṯˈî . NmCl NA 141550 word ʔelqānˌā nmpr m NA NA
3 1_Samuel 1 1 426592 book 426856 chapter 1421483 verse wayᵊhˌî ʔˌîš ʔeḥˈāḏ min-hārāmāṯˈayim ṣôfˌîm mēhˈar ʔefrˈāyim ûšᵊmˈô ʔelqānˌā ben-yᵊrōḥˈām ben-ʔᵉlîhˈû ben-tˌōḥû ven-ṣˌûf ʔefrāṯˈî . 453943 clause ûšᵊmˈô ʔelqānˌā ben-yᵊrōḥˈām ben-ʔᵉlîhˈû ben-tˌōḥû ven-ṣˌûf ʔefrāṯˈî . NmCl NA 141552 word yᵊrōḥˈām nmpr m NA NA
4 1_Samuel 1 1 426592 book 426856 chapter 1421483 verse wayᵊhˌî ʔˌîš ʔeḥˈāḏ min-hārāmāṯˈayim ṣôfˌîm mēhˈar ʔefrˈāyim ûšᵊmˈô ʔelqānˌā ben-yᵊrōḥˈām ben-ʔᵉlîhˈû ben-tˌōḥû ven-ṣˌûf ʔefrāṯˈî . 453943 clause ûšᵊmˈô ʔelqānˌā ben-yᵊrōḥˈām ben-ʔᵉlîhˈû ben-tˌōḥû ven-ṣˌûf ʔefrāṯˈî . NmCl NA 141554 word ʔᵉlîhˈû nmpr m NA NA
5 1_Samuel 1 1 426592 book 426856 chapter 1421483 verse wayᵊhˌî ʔˌîš ʔeḥˈāḏ min-hārāmāṯˈayim ṣôfˌîm mēhˈar ʔefrˈāyim ûšᵊmˈô ʔelqānˌā ben-yᵊrōḥˈām ben-ʔᵉlîhˈû ben-tˌōḥû ven-ṣˌûf ʔefrāṯˈî . 453943 clause ûšᵊmˈô ʔelqānˌā ben-yᵊrōḥˈām ben-ʔᵉlîhˈû ben-tˌōḥû ven-ṣˌûf ʔefrāṯˈî . NmCl NA 141556 word tˌōḥû nmpr m NA NA
6 1_Samuel 1 1 426592 book 426856 chapter 1421483 verse wayᵊhˌî ʔˌîš ʔeḥˈāḏ min-hārāmāṯˈayim ṣôfˌîm mēhˈar ʔefrˈāyim ûšᵊmˈô ʔelqānˌā ben-yᵊrōḥˈām ben-ʔᵉlîhˈû ben-tˌōḥû ven-ṣˌûf ʔefrāṯˈî . 453943 clause ûšᵊmˈô ʔelqānˌā ben-yᵊrōḥˈām ben-ʔᵉlîhˈû ben-tˌōḥû ven-ṣˌûf ʔefrāṯˈî . NmCl NA 141558 word ṣˌûf nmpr m NA NA
7 1_Samuel 1 1 426592 book 426856 chapter 1421483 verse wayᵊhˌî ʔˌîš ʔeḥˈāḏ min-hārāmāṯˈayim ṣôfˌîm mēhˈar ʔefrˈāyim ûšᵊmˈô ʔelqānˌā ben-yᵊrōḥˈām ben-ʔᵉlîhˈû ben-tˌōḥû ven-ṣˌûf ʔefrāṯˈî . 453942 clause wayᵊhˌî ʔˌîš ʔeḥˈāḏ min-hārāmāṯˈayim ṣôfˌîm mēhˈar ʔefrˈāyim WayX NA 141544 word ṣôfˌîm verb m ptca qal
Now you know how to escape from Text-Fabric.
We hope that this makes your stay in TF more comfortable. It's not a Hotel California.
CC-BY Dirk Roorda