Some system statistics (Nestle1904GBI)

Table of contents

1 - Introduction
2 - Load Text-Fabric app and data
3 - Performing the queries
4 - Required libraries

1 - Introduction

This Jupyter Notebook showcases several examples of statistical analysis performed on a Text-Fabric corpus.

2 - Load Text-Fabric app and data

In [1]:

%load_ext autoreload
%autoreload 2

In [2]:

# Loading the Text-Fabric code
# Note: it is assumed Text-Fabric is installed in your environment
from tf.fabric import Fabric
from tf.app import use

In [3]:

# Load the N1904 app and data
N1904 = use("tonyjurg/Nestle1904GBI", version="0.4", hoist=globals())

Locating corpus resources ...

app: ~/text-fabric-data/github/tonyjurg/Nestle1904GBI/app

data: ~/text-fabric-data/github/tonyjurg/Nestle1904GBI/tf/0.4

Text-Fabric: Text-Fabric API 11.4.10, tonyjurg/Nestle1904GBI/app v3, Search Reference
Data: tonyjurg - Nestle1904GBI 0.4, Character table, Feature docs

Node types

Name	# of nodes	# slots/node	% coverage
book	27	5102.93	100
chapter	260	529.92	100
sentence	5720	24.09	100
verse	7943	17.35	100
clause	16124	8.54	100
phrase	72674	1.90	100
word	137779	1.00	100

Sets: no custom sets
Features:

Nestle 1904 (GBI nodes)

Feature	Type
after	str
book	str
booknum	int
bookshort	str
case	str
chapter	int
clause	int
clauserule	str
clausetype	str
degree	str
formaltag	str
functionaltag	str
gloss	str
gn	str
lemma	str
lex_dom	str
ln	str
monad	int
mood	str
nodeID	str
normalized	str
nu	str
number	str
otype	str
person	str
phrase	int
phrasefunction	str
phrasefunctionlong	str
phrasetype	str
sentence	int
sp	str
splong	str
strongs	str
subj_ref	str
tense	str
type	str
verse	int
voice	str
word	str
oslots	none

Text-Fabric API: names N F E L T S C TF directly usable

In [4]:

# The following will push the Text-Fabric stylesheet to this notebook (to facilitate proper display with notebook viewer)
N1904.dh(N1904.getCss())

3 - Performing the queries

3.1 - Print the Text-Fabric version

Although this is somewhat trivial, this example does serve a purpose: it shows how to access Text-Fabric's global parameters. We print the version by reading the Text-Fabric parameter VERSION, which is fixed for the whole program. To access any of these parameters in a notebook, it first needs to be imported from tf.parameters.

In [5]:

from tf.parameters import VERSION
print(f'TextFabric version: {VERSION}')

TextFabric version: 11.4.10

Any other parameter defined in tf.parameters can be printed in a similar manner.
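As an illustration, the constants could be listed in one go. This is a minimal sketch, assuming that the parameters of interest are the upper-case, module-level names of tf.parameters (as VERSION is). The helper dump_constants is hypothetical, not part of Text-Fabric, and to keep the sketch runnable without Text-Fabric it is demonstrated on a stand-in module.

```python
import types


def dump_constants(module):
    """Return the upper-case module-level names of `module` with their values.

    Hypothetical helper: it assumes parameters are stored as upper-case
    module attributes, as VERSION is in tf.parameters.
    """
    return {
        name: getattr(module, name)
        for name in sorted(dir(module))
        if name.isupper() and not name.startswith("_")
    }


# Stand-in module so that the sketch runs without Text-Fabric installed.
demo = types.ModuleType("demo_parameters")
demo.VERSION = "11.4.10"
demo.helper = lambda: None  # lower-case name: skipped

for name, value in dump_constants(demo).items():
    print(f"{name}: {value}")
```

With Text-Fabric installed, the same helper could be applied to the real module after `import tf.parameters`, by calling `dump_constants(tf.parameters)`.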

In [6]:

N1904.showContext(...)

tonyjurg/Nestle1904GBI app context

3.2 - Dump selection of header

In this example the header of the loaded Text-Fabric dataset is displayed by calling A.header().

Please note that in the example below A is replaced by N1904. This is a result of the incantation used to load the app:

N1904 = use (... etc ... )

The use function returns an object whose attributes and methods constitute the advanced API.

In [7]:

N1904.header(allMeta=False)

Text-Fabric: Text-Fabric API 11.4.10, tonyjurg/Nestle1904GBI/app v3, Search Reference
Data: tonyjurg - Nestle1904GBI 0.4, Character table, Feature docs

Node types

Name	# of nodes	# slots/node	% coverage
book	27	5102.93	100
chapter	260	529.92	100
sentence	5720	24.09	100
verse	7943	17.35	100
clause	16124	8.54	100
phrase	72674	1.90	100
word	137779	1.00	100

Sets: no custom sets
Features:

Nestle 1904 (GBI nodes)

Feature	Type
after	str
book	str
booknum	int
bookshort	str
case	str
chapter	int
clause	int
clauserule	str
clausetype	str
degree	str
formaltag	str
functionaltag	str
gloss	str
gn	str
lemma	str
lex_dom	str
ln	str
monad	int
mood	str
nodeID	str
normalized	str
nu	str
number	str
otype	str
person	str
phrase	int
phrasefunction	str
phrasefunctionlong	str
phrasetype	str
sentence	int
sp	str
splong	str
strongs	str
subj_ref	str
tense	str
type	str
verse	int
voice	str
word	str
oslots	none

3.3 - Memory footprint

The API call TF.footprint() provides a formatted overview of the memory footprint of each feature in the Text-Fabric corpus, including computed features such as levUp and boundary.

In [8]:

TF.footprint()

49 features

feature	members	size in bytes
levUp	240,527	27,849,840
phrase	210,453	21,708,200
nodeID	137,779	17,505,299
boundary	2	15,456,576
monad	137,779	12,951,424
clause	153,930	12,487,268
oslots	3	12,055,800
sentence	143,553	10,903,980
word	137,779	10,862,812
normalized	137,779	10,773,392
gloss	137,779	10,312,734
book	162,187	9,785,747
bookshort	162,187	9,785,636
booknum	162,187	9,784,204
chapter	162,160	9,783,448
verse	161,900	9,776,168
levDown	102,748	9,727,776
lemma	137,779	9,581,098
ln	137,779	9,532,549
subj_ref	137,779	9,454,588
strongs	137,779	9,382,667
lex_dom	137,779	9,198,787
functionaltag	137,779	9,160,464
formaltag	137,779	9,159,903
sp	137,779	9,101,359
type	137,779	9,101,355
splong	137,779	9,101,353
mood	137,779	9,101,184
tense	137,779	9,101,170
after	137,779	9,101,136
case	137,779	9,101,118
voice	137,779	9,101,059
gn	137,779	9,101,001
person	137,779	9,100,994
degree	137,779	9,100,951
nu	137,779	9,100,943
number	137,779	9,100,943
order	240,527	8,659,012
phrasetype	72,674	4,658,567
phrasefunction	72,674	4,657,128
phrasefunctionlong	72,674	4,657,002
structure	6	4,023,786
clauserule	16,124	1,075,551
rank	240,527	1,022,312
otype	4	822,535
sections	2	573,560
clausetype	3,846	255,410
characters	1	30,405
levels	7	1,519
TOTAL	5,825,378	435,731,713
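To see which features dominate memory use, the rows of this table can be post-processed. This is a minimal sketch, assuming the rows have been copied by hand into (feature, members, size in bytes) tuples; only the five largest rows and the TOTAL from the table above are used, and the helper share_of_total is hypothetical.

```python
def share_of_total(rows, total_bytes):
    """Return (feature, size in MiB, percentage of total) for each footprint row."""
    return [
        (feature, size / 2**20, 100 * size / total_bytes)
        for feature, _members, size in rows
    ]


# Five largest rows of the footprint table above: (feature, members, size in bytes).
rows = [
    ("levUp", 240_527, 27_849_840),
    ("phrase", 210_453, 21_708_200),
    ("nodeID", 137_779, 17_505_299),
    ("boundary", 2, 15_456_576),
    ("monad", 137_779, 12_951_424),
]
TOTAL_BYTES = 435_731_713

shares = share_of_total(rows, TOTAL_BYTES)
for feature, mib, pct in shares:
    print(f"{feature:<10} {mib:7.1f} MiB  {pct:5.1f}%")
print(f"{'TOTAL':<10} {TOTAL_BYTES / 2**20:7.1f} MiB")
```

Together these five features account for roughly 22% of the total footprint of about 415.5 MiB.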

3.4 - List loaded features

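The API call A.isLoaded() shows the load status of every feature: computed features (names surrounded by double underscores), node features with their value type, edge features, configuration features, and features that are NOT LOADED.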

In [9]:

N1904.isLoaded()

__boundary__         computed  
__characters__       computed  
__levDown__          computed  
__levUp__            computed  
__levels__           computed  
__order__            computed  
__rank__             computed  
__sections__         computed  
__structure__        computed  
after                node (str)
book                 node (str)
booknum              node (int)
bookshort            node (str)
case                 node (str)
chapter              node (int)
clause               node (int)
clauserule           node (str)
clausetype           node (str)
degree               node (str)
formaltag            node (str)
functionaltag        node (str)
gloss                node (str)
gn                   node (str)
lemma                node (str)
lex_dom              node (str)
ln                   node (str)
monad                node (int)
mood                 node (str)
nodeID               node (str)
normalized           node (str)
nu                   node (str)
number               node (str)
oslots               edge      
otext                config    
otype                node (str)
person               node (str)
phrase               node (int)
phrasefunction       node (str)
phrasefunctionlong   node (str)
phrasetype           node (str)
reference            NOT LOADED
sentence             node (int)
sp                   node (str)
splong               node (str)
strongs              node (str)
subj_ref             node (str)
tense                node (str)
type                 node (str)
verse                node (int)
voice                node (str)
word                 node (str)
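The printout can also be summarised per status. This is a minimal sketch, assuming the output has been captured as text in the format shown above (feature name, whitespace, status); the helper group_by_status is hypothetical and only a few lines of the output are used.

```python
from collections import defaultdict


def group_by_status(text):
    """Group feature names by status, given text in the isLoaded() print format."""
    groups = defaultdict(list)
    for line in text.strip().splitlines():
        name, status = line.split(maxsplit=1)
        # "node (str)" and "node (int)" both count as "node";
        # "computed", "edge", "config" and "NOT LOADED" are kept as they are.
        groups[status.strip().split(" (")[0]].append(name)
    return dict(groups)


# A few lines copied from the isLoaded() output above.
sample = """
__levUp__            computed
after                node (str)
booknum              node (int)
oslots               edge
otext                config
reference            NOT LOADED
"""

groups = group_by_status(sample)
for status, names in groups.items():
    print(f"{status:<10} {', '.join(names)}")
```

Applied to the full listing, this gives 9 computed features, 39 node features, one edge feature (oslots), one configuration feature (otext) and one feature that is not loaded (reference). The 9 + 39 + 1 = 49 loaded data features match the 49 features reported by TF.footprint().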

3.5 - Statistics on node types

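This example shows statistics on the node types. C.levels.data holds one tuple per node type, of the form (node type, average number of slots, first node, last node), ordered by decreasing average number of slots. The tabulate function is used to display it as a formatted table.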

In [10]:

# Library to format tables
from tabulate import tabulate

# Each entry of C.levels.data: (node type, average # of slots, first node, last node)
headers = ["Node", "Average # of slots", "first", "last"]
result_list = C.levels.data
print(tabulate(result_list, headers=headers, tablefmt='fancy_grid'))

╒══════════╤══════════════════════╤═════════╤════════╕
│ Node     │   Average # of slots │   first │   last │
╞══════════╪══════════════════════╪═════════╪════════╡
│ book     │           5102.93    │  137780 │ 137806 │
├──────────┼──────────────────────┼─────────┼────────┤
│ chapter  │            529.919   │  137807 │ 138066 │
├──────────┼──────────────────────┼─────────┼────────┤
│ sentence │             24.0872  │  226865 │ 232584 │
├──────────┼──────────────────────┼─────────┼────────┤
│ verse    │             17.346   │  232585 │ 240527 │
├──────────┼──────────────────────┼─────────┼────────┤
│ clause   │              8.54496 │  138067 │ 154190 │
├──────────┼──────────────────────┼─────────┼────────┤
│ phrase   │              1.89585 │  154191 │ 226864 │
├──────────┼──────────────────────┼─────────┼────────┤
│ word     │              1       │       1 │ 137779 │
╘══════════╧══════════════════════╧═════════╧════════╛
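Because the nodes of one type occupy a contiguous range of node numbers, the number of nodes of each type follows from the first and last columns as last - first + 1, and multiplying that count by the average number of slots gives back the total number of slots (words). This is a minimal sketch with the values copied from the table above instead of read from C.levels.data, so that it runs without Text-Fabric.

```python
# (node type, average # of slots, first node, last node), copied from the table above.
levels = [
    ("book", 5102.93, 137780, 137806),
    ("chapter", 529.919, 137807, 138066),
    ("sentence", 24.0872, 226865, 232584),
    ("verse", 17.346, 232585, 240527),
    ("clause", 8.54496, 138067, 154190),
    ("phrase", 1.89585, 154191, 226864),
    ("word", 1, 1, 137779),
]

counts = {}
for node_type, avg_slots, first, last in levels:
    counts[node_type] = last - first + 1  # node numbers of one type are contiguous
    total_slots = counts[node_type] * avg_slots
    print(f"{node_type:<9} {counts[node_type]:>7} nodes  ~{total_slots:,.0f} slots")
```

The counts agree with the "# of nodes" column of the node types overview printed when the corpus was loaded, and every product comes out at about 137,779, the number of word slots.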

3.6 - Node number ranges

In [11]:

# F.otype.all lists the node types; sInterval gives the (first, last) node of a type
for node_type in F.otype.all:
    print(node_type, F.otype.sInterval(node_type))

book (137780, 137806)
chapter (137807, 138066)
sentence (226865, 232584)
verse (232585, 240527)
clause (138067, 154190)
phrase (154191, 226864)
word (1, 137779)

Note that the ranges shown in this output are the same as those found in the file otype.tf (except, possibly, for their order):

@node
@TextFabric version=11.4.10
...
@valueType=str
@writtenBy=Text-Fabric
@dateWritten=2023-06-19T16:21:20Z

1-137779	word
137780-137806	book
137807-138066	chapter
138067-154190	clause
154191-226864	phrase
226865-232584	sentence
232585-240527	verse
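The two listings can also be compared programmatically. This is a minimal sketch, assuming only range lines of the form shown above (first-last, a tab, then the node type); the helper parse_otype_ranges is hypothetical and is not how Text-Fabric itself reads the file. It also checks that the ranges, taken together, leave no gaps in the node numbering.

```python
def parse_otype_ranges(lines):
    """Map each node type to its (first, last) node from 'first-last<TAB>type' lines."""
    ranges = {}
    for line in lines:
        span, node_type = line.split("\t")
        first, last = (int(n) for n in span.split("-"))
        ranges[node_type] = (first, last)
    return ranges


# Range lines from the body of otype.tf shown above.
otype_body = [
    "1-137779\tword",
    "137780-137806\tbook",
    "137807-138066\tchapter",
    "138067-154190\tclause",
    "154191-226864\tphrase",
    "226865-232584\tsentence",
    "232585-240527\tverse",
]

# Output of the F.otype.sInterval() loop above.
interval_output = {
    "book": (137780, 137806),
    "chapter": (137807, 138066),
    "sentence": (226865, 232584),
    "verse": (232585, 240527),
    "clause": (138067, 154190),
    "phrase": (154191, 226864),
    "word": (1, 137779),
}

ranges = parse_otype_ranges(otype_body)
print("same ranges:", ranges == interval_output)  # dict comparison ignores order

spans = sorted(ranges.values())
contiguous = all(a[1] + 1 == b[0] for a, b in zip(spans, spans[1:]))
print("no gaps:", contiguous)
```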

4 - Required libraries

Besides Text-Fabric, the scripts in this notebook require the following Python libraries to be installed in the environment:

tabulate

Any missing library can be installed from within a Jupyter Notebook using either pip or pip3.
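Before running the notebook, it may help to check which of the required libraries are missing. This is a minimal sketch using importlib.util.find_spec from the standard library; the helper find_missing is hypothetical. Note that it works with import names, not pip package names: Text-Fabric is installed as text-fabric but imported as tf.

```python
import importlib.util


def find_missing(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Import names of the libraries used in this notebook.
missing = find_missing(["tf", "tabulate"])
print("Missing:", ", ".join(missing) if missing else "none")
```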
