Work in progress!
This Jupyter Notebook showcases several examples of statistical analysis performed on a Text-Fabric corpus. For demonstration purposes, various methods of collecting and presenting the data are employed.
%load_ext autoreload
%autoreload 2
# Loading the Text-Fabric code
# Note: it is assumed Text-Fabric is installed in your environment
from tf.fabric import Fabric
from tf.app import use
# load the N1904 app and data
N1904 = use ("tonyjurg/Nestle1904LFT", version="0.6", hoist=globals())
Locating corpus resources ...
The requested app is not available offline ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/app not found
The requested data is not available offline ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 not found
| 0.21s T otype from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 2.31s T oslots from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.56s T wordtranslit from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.48s T after from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.59s T normalized from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.49s T chapter from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.61s T unicode from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.56s T book from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.46s T verse from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.59s T wordunacc from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.59s T word from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | | 0.06s C __levels__ from otype, oslots, otext | | 1.79s C __order__ from otype, oslots, __levels__ | | 0.07s C __rank__ from otype, __order__ | | 3.35s C __levUp__ from otype, oslots, __rank__ | | 1.94s C __levDown__ from otype, __levUp__, __rank__ | | 0.21s C __characters__ from otext | | 0.92s C __boundary__ from otype, oslots, __rank__ | | 0.04s C __sections__ from otype, oslots, otext, __levUp__, __levels__, book, chapter, verse | | 0.23s C __structure__ from otype, oslots, otext, __rank__, __levUp__, book, chapter, verse | 0.43s T booknumber from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.51s T bookshort from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.48s T case from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.33s T clausetype from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.57s T containedclause from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.42s T degree from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.57s T gloss from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.46s T gn from 
~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.03s T headverse from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.32s T junction from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.57s T lemma from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.52s T lex_dom from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.53s T ln from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.41s T markafter from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.41s T markbefore from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.41s T markorder from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.45s T monad from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.44s T mood from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.52s T morph from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.53s T nodeID from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.48s T nu from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.49s T number from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.43s T person from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.44s T punctuation from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.65s T ref from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.67s T reference from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.49s T roleclausedistance from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.47s T sentence from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.50s T sp from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.51s T sp_full from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.53s T strongs from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.45s T subj_ref from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.44s T tense from 
~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.46s T type from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.45s T voice from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.40s T wgclass from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.35s T wglevel from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.43s T wgnum from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.36s T wgrole from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.35s T wgrolelong from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.41s T wgrule from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.33s T wgtype from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.52s T wordlevel from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.49s T wordrole from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6 | 0.51s T wordrolelong from ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.6
Name | # of nodes | # slots / node | % coverage |
---|---|---|---|
book | 27 | 5102.93 | 100 |
chapter | 260 | 529.92 | 100 |
verse | 7943 | 17.35 | 100 |
sentence | 8011 | 17.20 | 100 |
wg | 105430 | 6.85 | 524 |
word | 137779 | 1.00 | 100 |
# The following will push the Text-Fabric stylesheet to this notebook (to facilitate proper display with notebook viewer)
N1904.dh(N1904.getCss())
# Set default view in a way to limit noise as much as possible.
N1904.displaySetup(condensed=True, multiFeatures=False, queryFeatures=False)
The method freqList returns a tuple of (value, frequency) items, ordered by frequency, highest frequencies first.
print("Amount\tword")
for (w, amount) in F.word.freqList("word")[0:25]:
    print(f"{amount}\t{w}")
Amount	word
8545	καὶ
2769	ὁ
2684	ἐν
2620	δὲ
2497	τοῦ
1755	εἰς
1658	τὸ
1556	τὸν
1518	τὴν
1411	αὐτοῦ
1300	τῆς
1281	ὅτι
1221	τῷ
1201	τῶν
1069	οἱ
941	ἡ
921	γὰρ
902	μὴ
859	τῇ
849	αὐτῷ
817	τὰ
767	οὐκ
722	τοὺς
689	Θεοῦ
670	πρὸς
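To make the shape of such a frequency list concrete, the following sketch reproduces the idea in plain Python on a handful of made-up word forms. It does not call Text-Fabric: collections.Counter stands in for freqList, and both the sample list and the alphabetical tie-break are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical sample of word forms (not taken from the corpus)
sample_words = ["καὶ", "ὁ", "καὶ", "ἐν", "καὶ", "ὁ"]

# Build (value, frequency) pairs, highest frequency first;
# ties are broken alphabetically to keep the order deterministic
freq_list = tuple(
    sorted(Counter(sample_words).items(), key=lambda item: (-item[1], item[0]))
)

print("Amount\tword")
for word, amount in freq_list:
    print(f"{amount}\t{word}")
```

The slicing used in the notebook cell ([0:25]) works on such a tuple in the same way, since it is an ordinary Python sequence.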
This code generates a table that displays the frequency of characters within the Text-Fabric corpus. The API call 'C.characters.data' produces a Python dictionary that contains the data. The remaining code unpacks and sorts this structure to present the results in a formatted table.
Note the first line of the output is 'Format: text-orig-full'. This
# Library to format table
from tabulate import tabulate
# The following API call will result in a Python dictionary structure
FrequencyDictionary = C.characters.data
# Present the results
KeyList = list(FrequencyDictionary.keys())
for Key in KeyList:
    print('Format: ', Key)
    # 'Key' is the name of one of the pre-defined formats in which the text can be displayed
    FrequencyList = FrequencyDictionary[Key]
    # Sort the (character, frequency) pairs by frequency, highest first
    SortedFrequencyList = sorted(FrequencyList, key=lambda x: x[1], reverse=True)
    # In this example the table will be truncated to the first 15 entries
    max_rows = 15  # Set your desired number of rows here
    TruncatedTable = SortedFrequencyList[:max_rows]
    headers = ["character", "frequency"]
    print(tabulate(TruncatedTable, headers=headers, tablefmt='fancy_grid'))
    # Add a warning using markdown (API call A.dm) allowing it to be printed in bold type
    N1904.dm("**Warning: table truncated!**")
Format: text-critical
character | frequency |
---|---|
ν | 56230 |
α | 51892 |
τ | 50599 |
ο | 45151 |
ε | 38597 |
ς | 27090 |
ι | 26131 |
σ | 24095 |
ρ | 22871 |
κ | 22630 |
π | 20308 |
μ | 19218 |
λ | 18228 |
δ | 12476 |
ἐ | 12116 |
Warning: table truncated!
Format: text-normalized
character | frequency |
---|---|
  | 137779 |
ν | 56230 |
α | 52127 |
τ | 50599 |
ο | 45516 |
ε | 38807 |
ς | 27090 |
ι | 26404 |
σ | 24095 |
ρ | 22871 |
κ | 22630 |
ί | 21518 |
π | 20308 |
μ | 19218 |
λ | 18228 |
Warning: table truncated!
Format: text-orig-full
character | frequency |
---|---|
  | 137779 |
ν | 56230 |
α | 51892 |
τ | 50599 |
ο | 45151 |
ε | 38597 |
ς | 27090 |
ι | 26131 |
σ | 24095 |
ρ | 22871 |
κ | 22630 |
π | 20308 |
μ | 19218 |
λ | 18228 |
δ | 12476 |
Warning: table truncated!
Format: text-transliterated
character | frequency |
---|---|
  | 137779 |
e | 93371 |
o | 87008 |
a | 75119 |
i | 62778 |
t | 60011 |
n | 56230 |
s | 52132 |
u | 39287 |
k | 27300 |
p | 25081 |
r | 22871 |
h | 20033 |
m | 19218 |
l | 18228 |
Warning: table truncated!
Format: text-unaccented
character | frequency |
---|---|
  | 137779 |
α | 75119 |
ε | 66656 |
ο | 65731 |
ι | 62834 |
ν | 56230 |
τ | 50599 |
υ | 39287 |
ς | 27090 |
η | 26715 |
σ | 24095 |
ρ | 23046 |
κ | 22630 |
ω | 21277 |
π | 20308 |
Warning: table truncated!
C.levels.data
(('book', 5102.925925925926, 137780, 137806), ('chapter', 529.9192307692308, 137807, 138066), ('verse', 17.345965000629484, 146078, 154020), ('sentence', 17.198726750717764, 138067, 146077), ('wg', 7.583849727185382, 154021, 267467), ('word', 1, 1, 137779))
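The nested tuple above is easier to read once each entry is unpacked into named fields. The sketch below works on a literal copy of that output, so it runs without the corpus. The field names (node_type, avg_slots, first_node, last_node) are labels chosen for this illustration, based on how the values line up with the node type table and node ranges elsewhere in this notebook; they are not names defined by Text-Fabric.

```python
from collections import namedtuple

# Field names are descriptive labels chosen for this sketch (an assumption)
Level = namedtuple("Level", "node_type avg_slots first_node last_node")

# Literal copy of the C.levels.data output shown above
levels_data = (
    ('book', 5102.925925925926, 137780, 137806),
    ('chapter', 529.9192307692308, 137807, 138066),
    ('verse', 17.345965000629484, 146078, 154020),
    ('sentence', 17.198726750717764, 138067, 146077),
    ('wg', 7.583849727185382, 154021, 267467),
    ('word', 1, 1, 137779),
)

levels_info = [Level(*entry) for entry in levels_data]

# Print an aligned overview, one line per node type
print(f"{'type':<10}{'avg slots':>12}{'first':>10}{'last':>10}")
for level in levels_info:
    print(
        f"{level.node_type:<10}{level.avg_slots:>12.2f}"
        f"{level.first_node:>10}{level.last_node:>10}"
    )
```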
Not strictly a statistical function, but still relevant to the corpus. The output of this command lists the formats available for presenting the text of the corpus. See also the Display Settings in module tf.advanced.options.
N1904.showFormats()
format | level | template |
---|---|---|
text-critical | word | {unicode} |
text-normalized | word | {normalized}{after} |
text-orig-full | word | {word}{after} |
text-transliterated | word | {wordtranslit}{after} |
text-unaccented | word | {wordunacc}{after} |
The same result (although formatted differently) can be obtained by the following call:
T.formats
{'text-critical': 'word', 'text-normalized': 'word', 'text-orig-full': 'word', 'text-transliterated': 'word', 'text-unaccented': 'word'}
Note that this data originates from the file otext.tf:
@config
...
@fmt:text-orig-full={word}{after}
...
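A template such as {word}{after} names the features whose values are concatenated for each word. The sketch below imitates that substitution with Python's str.format on hypothetical feature values. It illustrates the idea only and is not Text-Fabric's own implementation; the render helper and the two sample words are assumptions made for this example.

```python
# Format templates as listed in the formats table above
templates = {
    "text-orig-full": "{word}{after}",
    "text-normalized": "{normalized}{after}",
}

# Hypothetical feature values for two consecutive word nodes
words = [
    {"word": "ἐν", "normalized": "ἐν", "after": " "},
    {"word": "ἀρχῇ", "normalized": "ἀρχῇ", "after": ", "},
]


def render(nodes, fmt):
    """Fill the template of the given format for every node and join the results."""
    template = templates[fmt]
    # str.format ignores keyword arguments the template does not use
    return "".join(template.format(**features) for features in nodes)


text = render(words, "text-orig-full")
print(text)
```

In the notebook itself, the equivalent result for a real node comes from T.text with the fmt parameter.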
This code generates a lot of output! For that reason we will cut it off after 5 lines per feature.
FeatureList=Fall()
LinesToPrint=5
for Feature in FeatureList:
    if Feature!='otype':
        print ('Feature:',Feature,'\n\n\t value\t frequency')
        FeatureFrequenceLists=Fs(Feature).freqList()
        PrintedLine=0
        for item, freq in FeatureFrequenceLists:
            PrintedLine+=1
            print ('\t',item,'\t',freq)
            if PrintedLine==LinesToPrint: break
        print ('\n')
Feature: after value frequency 119270 , 9462 . 5717 · 2359 ; 971 Feature: appos value frequency 100949 group 9699 apposition 2799 Feature: book value frequency Luke 19457 Acts 18394 Matthew 18300 John 15644 Mark 11278 Feature: booknumber value frequency 3 19457 5 18394 1 18300 4 15644 2 11278 Feature: bookshort value frequency Luke 19457 Acts 18394 Matt 18300 John 15644 Mark 11278 Feature: case value frequency 58261 nominative 24197 accusative 23031 genitive 19515 dative 12126 Feature: chapter value frequency 1 12868 2 10923 3 9652 4 9631 5 8788 Feature: clausetype value frequency 110679 VerbElided 1009 Verbless 929 Minor 830 Feature: containedclause value frequency 2 338 2036 167 97 82 172 81 1083 79 Feature: degree value frequency 137266 comparative 313 superlative 200 Feature: gloss value frequency the 9857 and 6212 - 5496 in 2320 And 2218 Feature: gn value frequency 63804 masculine 41486 feminine 18736 neuter 13753 Feature: junction value frequency 93392 coordinate 9178 subordinate 8491 apposition 2386 Feature: lemma value frequency ὁ 19783 καί 8978 αὐτός 5561 σύ 2892 δέ 2787 Feature: lex_dom value frequency 092004 26322 10487 089017 4370 093001 3672 033006 3225 Feature: ln value frequency 92.24 19781 10488 92.11 4718 89.92 2903 89.87 2756 Feature: markafter value frequency 137728 — 31 ) 11 ]] 7 ( 1 Feature: markbefore value frequency 137745 — 16 ( 10 [[ 7 [ 1 Feature: markorder value frequency 137694 0 34 3 32 2 10 1 9 Feature: monad value frequency 1 1 2 1 3 1 4 1 5 1 Feature: mood value frequency 109422 indicative 15617 participle 6653 infinitive 2285 imperative 1877 Feature: morph value frequency CONJ 16316 PREP 10568 ADV 3808 N-NSM 3475 N-GSM 2935 Feature: nodeID value frequency 52046 common 14186 personal 6040 proper 2192 relative 885 Feature: normalized value frequency καί 8576 ὁ 2769 δέ 2764 ἐν 2684 τοῦ 2497 Feature: nu value frequency singular 69846 38842 plural 29091 Feature: number value frequency singular 69846 38842 plural 29091 Feature: orig_order 
value frequency 1 1 2 1 3 1 4 1 5 1 Feature: person value frequency 118360 third 12747 second 3729 first 2943 Feature: punctuation value frequency 119270 , 9462 . 5717 · 2359 ; 971 Feature: ref value frequency 1CO 10:1!1 1 1CO 10:1!10 1 1CO 10:1!11 1 1CO 10:1!12 1 1CO 10:1!13 1 Feature: roleclausedistance value frequency 0 56129 1 37597 2 22297 3 12084 4 5277 Feature: sentence value frequency 3 1103 4 960 1 810 5 747 6 680 Feature: sp value frequency noun 28455 verb 28357 det 19786 conj 18227 pron 16177 Feature: sp_full value frequency Noun 28455 Verb 28357 Determiner 19786 Conjunction 18227 Pronoun 16177 Feature: strongs value frequency 3588 19783 2532 8978 846 5561 4771 2892 1161 2787 Feature: subj_ref value frequency 121204 n46003022002 172 n66001009002 131 n45001001001 104 n47010001004 104 Feature: tense value frequency 109422 aorist 11803 present 11579 imperfect 1689 future 1626 Feature: type value frequency 93321 common 23644 personal 11521 proper 4639 demonstrative 1722 Feature: unicode value frequency καὶ 8541 ὁ 2768 ἐν 2683 δὲ 2619 τοῦ 2497 Feature: verse value frequency 10 5180 12 5177 1 5064 9 5064 4 5024 Feature: voice value frequency 109422 active 20742 passive 3493 middle 2408 middlepassive 1714 Feature: wgclass value frequency np 33710 cl 30857 cl* 16378 12760 pp 11169 Feature: wglevel value frequency 5 16862 4 16527 6 15520 7 12163 3 10447 Feature: wgnum value frequency 1 27 2 27 3 27 4 27 5 27 Feature: wgrole value frequency 77251 adv 16710 o 9329 s 6710 p 1770 Feature: wgrolelong value frequency 77280 Adverbial 16710 Object 9329 Subject 6710 Predicate 1770 Feature: wgrule value frequency 22718 DetNP 15696 PrepNp 11044 NPofNP 6819 Conj-CL 5571 Feature: wgtype value frequency 100949 group 9699 apposition 2799 Feature: word value frequency καὶ 8545 ὁ 2769 ἐν 2684 δὲ 2620 τοῦ 2497 Feature: wordlevel value frequency 6 21857 7 20984 5 20538 8 16755 9 12772 Feature: wordrole value frequency adv 41598 v 25817 s 22908 o 21929 9347 Feature: wordrolelong 
value frequency Adverbial 41598 Verbal 25817 Subject 22908 Object 21929 9347 Feature: wordtranslit value frequency kai 8576 en 3152 o 3149 to 2885 de 2769 Feature: wordunacc value frequency και 8576 ο 3019 δε 2764 εν 2752 του 2497
Make a list of punctuation marks with their Unicode values. The rows are printed with N1904.dm, the function for displaying markdown-formatted strings, although the desired result (one properly rendered table) has not yet been achieved.
result = F.after.freqList()
N1904.dm(" String | Unicode | Frequency\n--- | --- | ---")
for (string, freq) in result:
    # important: for punctuation the string contains two characters, so only the first one is used
    frequency=str(freq) # convert it to a string
    unicode_value = str(ord(string[0])) # convert it to a string
    N1904.dm(" `{}` | {} | {} ".format(string[0],unicode_value,frequency))
String | Unicode | Frequency |
---|---|---|
  | 32 | 119272 |
, | 44 | 9441 |
. | 46 | 5712 |
· | 183 | 2355 |
; | 59 | 969 |
— | 8212 | 30 |
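One possible way to get a single rendered table is to collect all rows in one markdown string first and display that string once. The sketch below builds such a string from a literal copy of the values shown above, so it runs without the corpus. The variable names are illustrative, and passing the result to N1904.dm in one call is a suggestion that has not been verified against this notebook.

```python
# (string, frequency) pairs copied from the output above;
# escapes are used for the middle dot (U+00B7) and the long dash (U+2014)
result = [
    (" ", 119272),
    (",", 9441),
    (".", 5712),
    ("\u00b7", 2355),
    (";", 969),
    ("\u2014", 30),
]

# Header and separator rows of the markdown table
lines = ["String | Unicode | Frequency", "--- | --- | ---"]
for string, freq in result:
    char = string[0]  # only the first character is reported, as in the cell above
    lines.append(f"`{char}` | {ord(char)} | {freq}")

markdown_table = "\n".join(lines)
print(markdown_table)
# In the notebook, N1904.dm(markdown_table) could then display the table in one call
```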
The node number ranges are readily available by combining F.otype.all, which returns a list of all node types, with F.otype.sInterval, which returns the first and last node number of a given type.
for NodeType in F.otype.all:
    print (NodeType, F.otype.sInterval(NodeType))
book (137780, 137806)
chapter (137807, 138066)
verse (146078, 154020)
sentence (138067, 146077)
wg (154021, 268899)
word (1, 137779)
Using F.otype.all again, we can also produce a list with the number of nodes of each type.
for otype in F.otype.all:
    i = 0
    for n in F.otype.s(otype):
        i += 1
    print ("{:>7} {}s".format(i, otype))
     27 books
    260 chapters
   7943 verses
   8011 sentences
 114879 wgs
 137779 words
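The same counts can also be derived without iterating over every node, on the assumption that the nodes of one type occupy a contiguous range: the count is then last minus first plus one. The sketch below applies this to a literal copy of the node ranges printed earlier, and the results agree with the counts listed above.

```python
# (first node, last node) per type, copied from the sInterval output above
intervals = {
    "book": (137780, 137806),
    "chapter": (137807, 138066),
    "verse": (146078, 154020),
    "sentence": (138067, 146077),
    "wg": (154021, 268899),
    "word": (1, 137779),
}

# Assumption: each type occupies one contiguous block of node numbers
counts = {otype: last - first + 1 for otype, (first, last) in intervals.items()}

for otype, count in counts.items():
    print("{:>7} {}s".format(count, otype))
```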
N1904.showProvenance(...)
This can be useful if you want to process all features in a script.
# Just print the metadata dictionary returned by the function call
FeatureName='word'
MetaData=Fs(FeatureName).meta
print (MetaData)
{'Availability': 'Creative Commons Attribution 4.0 International (CC BY 4.0)', 'Converter_author': 'Tony Jurg, ReMa Student Vrije Universiteit Amsterdam, Netherlands', 'Converter_execution': 'Tony Jurg, ReMa Student Vrije Universiteit Amsterdam, Netherlands', 'Converter_version': '0.3', 'Convertor_source': 'https://github.com/tonyjurg/Nestle1904LFT/tree/main/tools', 'Data source': 'MACULA Greek Linguistic Datasets, available at https://github.com/Clear-Bible/macula-greek/tree/main/Nestle1904/lowfat', 'Editors': 'Eberhard Nestle', 'Name': 'Greek New Testament (Nestle 1904 based on Low Fat Tree)', 'TextFabric version': '11.4.10', 'description': 'Word as it appears in the text (excl. punctuations)', 'valueType': 'str', 'writtenBy': 'Text-Fabric', 'dateWritten': '2023-06-19T15:13:46Z'}
Now do some very basic calculation with the data:
print ('feature ',FeatureName, end='')
if MetaData['valueType']=='str':
    print (' is of type str.')
else:
    print (' is not of type str.')
feature word is of type str.
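When the same check is applied to many features, it can help to group the feature names by their valueType. The sketch below does this for a small, hypothetical set of metadata dictionaries shaped like the one printed above. The feature names and value types in sample_meta are assumptions for illustration and were not read from the corpus.

```python
from collections import defaultdict

# Hypothetical metadata, shaped like the dictionaries returned by Fs(name).meta
sample_meta = {
    "word": {"valueType": "str"},
    "lemma": {"valueType": "str"},
    "chapter": {"valueType": "int"},
}

# Group feature names by value type; features without the key end up under 'unknown'
by_type = defaultdict(list)
for name, meta in sample_meta.items():
    by_type[meta.get("valueType", "unknown")].append(name)

for value_type, names in sorted(by_type.items()):
    print(f"{value_type}: {', '.join(sorted(names))}")
```

In the notebook, sample_meta could be replaced by a dictionary built from Fall() and Fs(name).meta.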
origText=T.text(node,fmt='text-orig-full')
critText=T.text(node,fmt='text-critical-signs')
'fmt:text-orig-full': '{word}{after}',
'fmt:text-normalized': '{normalized}{after}',
'fmt:text-unaccented': '{wordunacc}{after}',
'fmt:text-transliterated':'{wordtranslit}{after}',
'fmt:text-critical':