This Jupyter Notebook showcases several examples of statistical analysis performed on a Text-Fabric corpus.
%load_ext autoreload
%autoreload 2
# Loading the Text-Fabric code
# Note: it is assumed Text-Fabric is installed in your environment
from tf.fabric import Fabric
from tf.app import use
# load the N1904 app and data
N1904 = use ("tonyjurg/Nestle1904GBI", version="0.4", hoist=globals())
Locating corpus resources ...
Name | # of nodes | # slots/node | % coverage |
---|---|---|---|
book | 27 | 5102.93 | 100 |
chapter | 260 | 529.92 | 100 |
sentence | 5720 | 24.09 | 100 |
verse | 7943 | 17.35 | 100 |
clause | 16124 | 8.54 | 100 |
phrase | 72674 | 1.90 | 100 |
word | 137779 | 1.00 | 100 |
# The following will push the Text-Fabric stylesheet to this notebook (to facilitate proper display with notebook viewer)
N1904.dh(N1904.getCss())
Although this is somewhat trivial, this example does serve a purpose. We will print the version by reading the Text-Fabric parameter `VERSION`, which is fixed for the whole program. To access any of these parameters in a notebook, it must first be imported from `tf.parameters`.
from tf.parameters import VERSION
print ('TextFabric version: {}'.format(VERSION))
TextFabric version: 11.4.10
Note that any other parameter can be dumped in a similar manner.
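As a hedged illustration of that note, the sketch below collects every upper-case name defined at module level. The helper `dump_constants` is hypothetical (it is not part of Text-Fabric), and it assumes that parameters follow the upper-case naming convention used by `VERSION`; which other names `tf.parameters` defines depends on the installed version.

```python
import importlib


def dump_constants(module_name):
    """Return a dict of the upper-case, module-level names of a module."""
    module = importlib.import_module(module_name)
    return {
        name: value
        for name, value in vars(module).items()
        if name.isupper()
    }


# Text-Fabric is assumed to be installed; fall back to an empty result if not.
try:
    tf_parameters = dump_constants("tf.parameters")
except ImportError:
    tf_parameters = {}

# Print the collected parameters in alphabetical order
for name, value in sorted(tf_parameters.items()):
    print(f"{name} = {value!r}")
```

Filtering on upper-case names is only a convention-based guess, so the result may include constants that are not parameters in the strict sense.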
N1904.showContext(...)
In this example the header of the loaded Text-Fabric dataset is dumped by means of an API call to `A.header()`.
Please note that in the example below `A` is replaced by `N1904`. This follows from the incantation used to load the corpus:
N1904 = use (... etc ... )
The `use` function returns an object whose attributes and methods constitute the advanced API.
N1904.header(allMeta=False)
Name | # of nodes | # slots/node | % coverage |
---|---|---|---|
book | 27 | 5102.93 | 100 |
chapter | 260 | 529.92 | 100 |
sentence | 5720 | 24.09 | 100 |
verse | 7943 | 17.35 | 100 |
clause | 16124 | 8.54 | 100 |
phrase | 72674 | 1.90 | 100 |
word | 137779 | 1.00 | 100 |
The API call `TF.footprint()` provides a nicely formatted overview of the memory footprint of each feature in the Text-Fabric corpus.
TF.footprint()
feature | members | size in bytes |
---|---|---|
levUp | 240,527 | 27,849,840 |
phrase | 210,453 | 21,708,200 |
nodeID | 137,779 | 17,505,299 |
boundary | 2 | 15,456,576 |
monad | 137,779 | 12,951,424 |
clause | 153,930 | 12,487,268 |
oslots | 3 | 12,055,800 |
sentence | 143,553 | 10,903,980 |
word | 137,779 | 10,862,812 |
normalized | 137,779 | 10,773,392 |
gloss | 137,779 | 10,312,734 |
book | 162,187 | 9,785,747 |
bookshort | 162,187 | 9,785,636 |
booknum | 162,187 | 9,784,204 |
chapter | 162,160 | 9,783,448 |
verse | 161,900 | 9,776,168 |
levDown | 102,748 | 9,727,776 |
lemma | 137,779 | 9,581,098 |
ln | 137,779 | 9,532,549 |
subj_ref | 137,779 | 9,454,588 |
strongs | 137,779 | 9,382,667 |
lex_dom | 137,779 | 9,198,787 |
functionaltag | 137,779 | 9,160,464 |
formaltag | 137,779 | 9,159,903 |
sp | 137,779 | 9,101,359 |
type | 137,779 | 9,101,355 |
splong | 137,779 | 9,101,353 |
mood | 137,779 | 9,101,184 |
tense | 137,779 | 9,101,170 |
after | 137,779 | 9,101,136 |
case | 137,779 | 9,101,118 |
voice | 137,779 | 9,101,059 |
gn | 137,779 | 9,101,001 |
person | 137,779 | 9,100,994 |
degree | 137,779 | 9,100,951 |
nu | 137,779 | 9,100,943 |
number | 137,779 | 9,100,943 |
order | 240,527 | 8,659,012 |
phrasetype | 72,674 | 4,658,567 |
phrasefunction | 72,674 | 4,657,128 |
phrasefunctionlong | 72,674 | 4,657,002 |
structure | 6 | 4,023,786 |
clauserule | 16,124 | 1,075,551 |
rank | 240,527 | 1,022,312 |
otype | 4 | 822,535 |
sections | 2 | 573,560 |
clausetype | 3,846 | 255,410 |
characters | 1 | 30,405 |
levels | 7 | 1,519 |
TOTAL | 5,825,378 | 435,731,713 |
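The byte counts are easier to compare once converted. The sketch below copies three values from the table above (the two largest features and the total) and expresses them in MiB and as a share of the total. The row selection and the helper name `to_mib` are choices made for this illustration only.

```python
# Values copied from the footprint table above: feature -> size in bytes
footprint = {
    "levUp": 27_849_840,
    "phrase": 21_708_200,
}
TOTAL_BYTES = 435_731_713


def to_mib(size_in_bytes):
    """Convert a size in bytes to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return size_in_bytes / (1024 * 1024)


for feature, size in footprint.items():
    share = 100 * size / TOTAL_BYTES
    print(f"{feature:<8} {to_mib(size):7.2f} MiB  {share:5.2f} %")
print(f"{'TOTAL':<8} {to_mib(TOTAL_BYTES):7.2f} MiB")
```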
The API call `A.isLoaded()` shows information about the loaded features.
N1904.isLoaded()
__boundary__ computed
__characters__ computed
__levDown__ computed
__levUp__ computed
__levels__ computed
__order__ computed
__rank__ computed
__sections__ computed
__structure__ computed
after node (str)
book node (str)
booknum node (int)
bookshort node (str)
case node (str)
chapter node (int)
clause node (int)
clauserule node (str)
clausetype node (str)
degree node (str)
formaltag node (str)
functionaltag node (str)
gloss node (str)
gn node (str)
lemma node (str)
lex_dom node (str)
ln node (str)
monad node (int)
mood node (str)
nodeID node (str)
normalized node (str)
nu node (str)
number node (str)
oslots edge
otext config
otype node (str)
person node (str)
phrase node (int)
phrasefunction node (str)
phrasefunctionlong node (str)
phrasetype node (str)
reference NOT LOADED
sentence node (int)
sp node (str)
splong node (str)
strongs node (str)
subj_ref node (str)
tense node (str)
type node (str)
verse node (int)
voice node (str)
word node (str)
This example shows various statistics on node types. The call to `C.levels.data` returns an ordered sequence of tuples (node type, average number of slots, first node, last node), which is displayed as a table using the `tabulate` function.
# Library to format table
from tabulate import tabulate
headers = ["Node", "Average # of slots", "first", "last"]
ResultList = C.levels.data
print(tabulate(ResultList, headers=headers, tablefmt='fancy_grid'))
╒══════════╤══════════════════════╤═════════╤════════╕
│ Node     │   Average # of slots │   first │   last │
╞══════════╪══════════════════════╪═════════╪════════╡
│ book     │           5102.93    │  137780 │ 137806 │
├──────────┼──────────────────────┼─────────┼────────┤
│ chapter  │            529.919   │  137807 │ 138066 │
├──────────┼──────────────────────┼─────────┼────────┤
│ sentence │             24.0872  │  226865 │ 232584 │
├──────────┼──────────────────────┼─────────┼────────┤
│ verse    │             17.346   │  232585 │ 240527 │
├──────────┼──────────────────────┼─────────┼────────┤
│ clause   │              8.54496 │  138067 │ 154190 │
├──────────┼──────────────────────┼─────────┼────────┤
│ phrase   │              1.89585 │  154191 │ 226864 │
├──────────┼──────────────────────┼─────────┼────────┤
│ word     │              1       │       1 │ 137779 │
╘══════════╧══════════════════════╧═════════╧════════╛
for NodeType in F.otype.all:
    print(NodeType, F.otype.sInterval(NodeType))
book (137780, 137806)
chapter (137807, 138066)
sentence (226865, 232584)
verse (232585, 240527)
clause (138067, 154190)
phrase (154191, 226864)
word (1, 137779)
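These intervals are enough to cross-check the node counts and averages reported earlier. The following sketch hard-codes the printed intervals (so it runs without Text-Fabric) and assumes that every word belongs to exactly one node of each type, in which case the average number of slots per node is simply the word count divided by the node count.

```python
# (first, last) node numbers per node type, copied from the output above
intervals = {
    "book": (137780, 137806),
    "chapter": (137807, 138066),
    "sentence": (226865, 232584),
    "verse": (232585, 240527),
    "clause": (138067, 154190),
    "phrase": (154191, 226864),
    "word": (1, 137779),
}

# Intervals are inclusive, so the number of nodes is last - first + 1
node_counts = {
    node_type: last - first + 1
    for node_type, (first, last) in intervals.items()
}

# Words are the slots of this corpus
total_slots = node_counts["word"]

for node_type, count in node_counts.items():
    print(f"{node_type:<9}{count:>7}{total_slots / count:>10.2f}")
```

The counts and averages computed this way agree with the header table, which supports the single-membership assumption for this dataset.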
Note that the ranges shown in the output of this command are (except possibly with respect to order) the same as those found in the file `otype.tf`:
@node
@TextFabric version=11.4.10
...
@valueType=str
@writtenBy=Text-Fabric
@dateWritten=2023-06-19T16:21:20Z
1-137779 word
137780-137806 book
137807-138066 chapter
138067-154190 clause
154191-226864 phrase
226865-232584 sentence
232585-240527 verse
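For readers who want to compare the two listings programmatically, here is a minimal sketch that parses the data lines of the excerpt above. It handles only what is visible here (metadata lines starting with `@`, and data lines of the form `first-last value`); the full `.tf` format has more cases, which the hypothetical helper `parse_otype_lines` ignores.

```python
# Metadata and data lines copied from the otype.tf excerpt above
OTYPE_EXCERPT = """\
@node
@valueType=str
1-137779 word
137780-137806 book
137807-138066 chapter
138067-154190 clause
154191-226864 phrase
226865-232584 sentence
232585-240527 verse
"""


def parse_otype_lines(text):
    """Map each node type to its (first, last) node range."""
    ranges = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("@"):
            continue  # skip blank lines and metadata lines
        node_range, node_type = line.split(maxsplit=1)
        first, last = (int(number) for number in node_range.split("-"))
        ranges[node_type] = (first, last)
    return ranges


ranges = parse_otype_lines(OTYPE_EXCERPT)
for node_type, (first, last) in ranges.items():
    print(node_type, first, last)
```

The resulting dictionary can be compared directly with the tuples returned by `F.otype.sInterval`.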
The scripts in this notebook require (besides `text-fabric`) the following Python libraries to be installed in the environment:
tabulate
You can install any missing library from within Jupyter Notebook using either `pip` or `pip3`.
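One way to see what still needs installing is to probe for each module before importing it. The sketch below uses only the standard library; the helper name `missing_modules` is made up for this illustration. Note that the `text-fabric` package is imported under the name `tf`.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Import names used in this notebook: text-fabric is imported as "tf"
missing = missing_modules(["tf", "tabulate"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required libraries are available.")
```

In a notebook cell, the IPython magic `%pip install tabulate` installs a package into the environment of the running kernel.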