The Louw-Nida classification, created by Greek linguists Johannes P. Louw and Eugene A. Nida, organizes New Testament Greek words into 93 semantic domains based on meaning rather than traditional lexical forms.1 Unlike standard lexicons, it groups words by related meanings, such as emotions or social relationships, providing a context-focused framework. This approach helps New Testament scholars discern subtle distinctions in meaning. In our N1904-TF dataset, the Louw-Nida codes are found in the ln feature.
%load_ext autoreload
%autoreload 2
# Loading the Text-Fabric code
# Note: it is assumed Text-Fabric is installed in your environment
from tf.fabric import Fabric
from tf.app import use
# load the N1904 app and data
N1904 = use ("centerblc/N1904", version="1.0.0", hoist=globals())
Locating corpus resources ...
Name | # of nodes | # slots / node | % coverage |
---|---|---|---|
book | 27 | 5102.93 | 100 |
chapter | 260 | 529.92 | 100 |
verse | 7944 | 17.34 | 100 |
sentence | 8011 | 17.20 | 100 |
group | 8945 | 7.01 | 46 |
clause | 42506 | 8.36 | 258 |
wg | 106868 | 6.88 | 533 |
phrase | 69007 | 1.90 | 95 |
subphrase | 116178 | 1.60 | 135 |
word | 137779 | 1.00 | 100 |
Display is setup for viewtype syntax-view
See here for more information on viewtypes
# The following will push the Text-Fabric stylesheet to this notebook (to facilitate proper display with notebook viewer)
N1904.dh(N1904.getCss())
This section explores, step by step, some important aspects of how the Louw-Nida semantic classification is implemented in the N1904-TF dataset.
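To get a first feel for the values stored in the ln feature, one option (a small optional exploration, not part of the original walk-through) is to list its most frequent values with Text-Fabric's freqList method:
# Show the ten most frequent ln values together with how often they occur
for value, frequency in F.ln.freqList()[:10]:
    print(f'{value}: {frequency}')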
When creating a Text-Fabric query template, it’s essential to recognize that each word node in the N1904-TF dataset may have zero, one, or multiple Louw-Nida classifications, as represented by the ln feature.
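For instance, word node 39180 (a node that also appears in the query results further below) carries three classifications, stored in the ln feature as a single space-separated string:
# Inspect the ln value of a word node that has several Louw-Nida codes attached
print(F.ln.v(39180))   # prints: 23.149 57.108 88.110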
To illustrate how this affects query construction, let’s begin with a straightforward query pattern. Although this query retrieves instances of a specific Louw-Nida classification, as we’ll observe, it does not capture all occurrences.
# define query template
findLnValueEqual ='''
phrase
word ln=49.5
'''
equalSearch=N1904.search(findLnValueEqual)
# print the results
for phrase, node in equalSearch:
    print(f'word node:{node}, ln feature:{F.ln.v(node)}')
0.13s 8 results
word node:129141, ln feature:49.5
word node:129643, ln feature:49.5
word node:130801, ln feature:49.5
word node:130897, ln feature:49.5
word node:130897, ln feature:49.5
word node:134806, ln feature:49.5
word node:135561, ln feature:49.5
word node:135957, ln feature:49.5
The next step is to account for the possibility that each word node may have multiple Louw-Nida classifications attached. Using a regular expression can help here, but it requires careful handling. Replacing = with ~ does increase the number of matches; however, it also introduces unintended results, as we’ll see when we examine the output.
# define query template
findLnValueRegexp ='''
phrase
word ln~49.5
'''
regexpSearch=N1904.search(findLnValueRegexp)
# print the results
for phrase, node in regexpSearch:
    print(f'word node:{node}, ln feature:{F.ln.v(node)}')
0.12s 18 results
word node:2890, ln feature:23.149 57.108 88.110
word node:3037, ln feature:49.3 49.5
word node:3057, ln feature:49.3 49.5
word node:39180, ln feature:23.149 57.108 88.110
word node:40001, ln feature:49.3 49.5
word node:40021, ln feature:49.3 49.5
word node:47551, ln feature:49.3 49.5
word node:62727, ln feature:49.3 49.5
word node:91884, ln feature:37.49 56.30
word node:91889, ln feature:37.49 56.30
word node:129141, ln feature:49.5
word node:129643, ln feature:49.5
word node:130801, ln feature:49.5
word node:130897, ln feature:49.5
word node:130897, ln feature:49.5
word node:134806, ln feature:49.5
word node:135561, ln feature:49.5
word node:135957, ln feature:49.5
In the above results it is clear that the following are 'false positives':
word node:2890, ln feature:23.149 57.108 88.110
word node:39180, ln feature:23.149 57.108 88.110
word node:91884, ln feature:37.49 56.30
word node:91889, ln feature:37.49 56.30
To retrieve only the desired results, the regular expression (regex) needs adjustment. The expression \b49[.]5\b is designed to match the exact pattern 49.5 as a standalone sequence. Key components of this regex are:
\b acts as a word boundary anchor, ensuring 49.5 appears as an independent sequence.
[.] denotes the period (.) as a literal character; without the brackets, a plain dot would match any character.
Another key aspect of the query template is the r prefix before the Text-Fabric query template containing the regex, which signals Python to treat the string as a raw string. This prevents Python from modifying the regex, which would otherwise happen in some cases. Omitting this r in environments like Jupyter Notebook can lead to no matches (as with this query) or may trigger warnings such as SyntaxWarning: invalid escape sequence in other cases (e.g., when using \s to designate a space in the regex).
See also Text-Fabric's manual on the use of regular expressions.
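As a small sanity check outside Text-Fabric, the same regex can be tried against a few ln strings with Python's re module (the sample values below are copied from the query output above):
import re

# Raw string so the backslashes reach the regex engine unchanged
pattern = re.compile(r'\b49[.]5\b')

# Sample ln values taken from the earlier query results
for value in ['49.5', '49.3 49.5', '23.149 57.108 88.110', '37.49 56.30']:
    result = 'match' if pattern.search(value) else 'no match'
    print(f'{value}: {result}')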
# define query template
# The preceding 'r' before the template marks it as a raw string, preventing Python from altering the regex.
findUpdatedLnValueRegex =r'''
phrase
word ln~\b49[.]5\b
'''
updatedRegexSearch=N1904.search(findUpdatedLnValueRegex)
# print the results
for phrase, node in updatedRegexSearch:
    print(f'word node:{node}, ln feature:{F.ln.v(node)}')
0.14s 14 results
word node:3037, ln feature:49.3 49.5
word node:3057, ln feature:49.3 49.5
word node:40001, ln feature:49.3 49.5
word node:40021, ln feature:49.3 49.5
word node:47551, ln feature:49.3 49.5
word node:62727, ln feature:49.3 49.5
word node:129141, ln feature:49.5
word node:129643, ln feature:49.5
word node:130801, ln feature:49.5
word node:130897, ln feature:49.5
word node:130897, ln feature:49.5
word node:134806, ln feature:49.5
word node:135561, ln feature:49.5
word node:135957, ln feature:49.5
As demonstrated above, the number of Louw-Nida classifications per word may differ. The following script counts how many words in the Greek New Testament have a given number of Louw-Nida (LN) classifications attached, to give a sense of the magnitude. For each word node it determines the number of LN categories (or zero if none) and tallies how often each count occurs. The results are displayed in a table showing the number of words associated with each unique LN classification count.
from collections import defaultdict
import pandas as pd
from IPython.display import display
# Initialize a dictionary to count the number of LN categories per word node
lnCategoryCounts = defaultdict(int)
# Iterate over each word in the Greek New Testament
for word in F.otype.s("word"):
    ln = F.ln.v(word)  # Retrieve the Louw-Nida classification for the word
    if ln:
        # Count the number of Louw-Nida categories for this word
        numLNs = len(ln.split())
    else:
        numLNs = 0  # For words with no Louw-Nida classification
    # Increment the count for this LN category count
    lnCategoryCounts[numLNs] += 1
# Convert to a DataFrame for easier display as a table
lnCategoryCountsDf = pd.DataFrame.from_dict(lnCategoryCounts, orient='index', columns=['Count'])
lnCategoryCountsDf.index.name = 'Number of LN categories'
lnCategoryCountsDf = lnCategoryCountsDf.sort_index() # Sort by the number of LNs
# Display the table
display(lnCategoryCountsDf)
Number of LN categories | Count |
---|---|
0 | 11023 |
1 | 124380 |
2 | 2252 |
3 | 114 |
4 | 8 |
5 | 1 |
8 | 1 |
In this section, we will delve deeper into cases with multiple Louw-Nida classifications, focusing on their distribution across top-level categories (i.e., the part before the dot).
The following script counts occurrences of top-level Louw-Nida classifications for each word in the Greek New Testament, tracking both countAllItems (total occurrences of each classification) and countFirstItem (occurrences based only on the first classification listed for each word). The delta column highlights the difference between these counts, helping to identify top-level values with the most significant discrepancies for lemmas with multiple classifications.
from collections import defaultdict
import pandas as pd
from IPython.display import display
# Initialize a dictionary to hold counts and delta for each Louw-Nida top-level classification
lnCounts = defaultdict(lambda: {'countAllItems': 0, 'countFirstItem': 0, 'delta': 0})
# Iterate over each word in the Greek New Testament
for word in F.otype.s("word"):
    ln = F.ln.v(word)  # Retrieve the Louw-Nida classification for the word
    if ln:  # Check if there is a valid Louw-Nida classification
        # Split the ln value into its individual codes
        codes = ln.split()
        # Count all items: increment for each top-level value in codes
        for code in codes:
            topLevelValue = int(code.split('.')[0])
            lnCounts[topLevelValue]['countAllItems'] += 1
        # Count only the first item: increment for the first top-level value
        firstTopLevelValue = int(codes[0].split('.')[0])
        lnCounts[firstTopLevelValue]['countFirstItem'] += 1
# Calculate delta for each top-level classification
for topLevelValue, counts in lnCounts.items():
    counts['delta'] = counts['countAllItems'] - counts['countFirstItem']
# Convert to a DataFrame
lnCountsDf = pd.DataFrame.from_dict(lnCounts, orient='index')
lnCountsDf.index.name = 'Louw-Nida top-level'
lnCountsDf.columns = ['count all items', 'count only first item', 'delta']
# Sort by 'delta' in descending order
lnCountsDf = lnCountsDf.sort_values(by='delta', ascending=False)
# Display the sorted table
display(lnCountsDf)
Louw-Nida top-level | count all items | count only first item | delta |
---|---|---|---|
89 | 15532 | 15242 | 290 |
90 | 3675 | 3429 | 246 |
88 | 2183 | 2027 | 156 |
33 | 8000 | 7848 | 152 |
93 | 4705 | 4605 | 100 |
... | ... | ... | ... |
19 | 218 | 218 | 0 |
70 | 79 | 79 | 0 |
40 | 99 | 99 | 0 |
45 | 54 | 54 | 0 |
22 | 277 | 277 | 0 |
93 rows × 3 columns
Note also that the feature domain has multiple values. This is expected, since this feature is to some extent a numerical representation of the feature 'ln' and can be decoded using the following method. Take for example a 'domain' value of '089007'. The 6-digit value '089007' first needs to be split into two 3-digit parts: '089' and '007'. The second part should be interpreted alphabetically (A=1, B=2, C=3, D=4, E=5, ..., Z=26). Taking the two parts together results in '89G', which points to an entry in Louw-Nida. For this example (i.e. 89G) this maps to main section 'Relations' and subsection 'Cause and/or Reason'.
It is important to realize that the granularity of feature 'domain' is less than that of feature 'ln'. Consider for example the Greek word ἀρχή in John 1:1. According to the Louw-Nida Lexicon this can map to either a: beginning (aspect) => 68.1 or b: beginning (time) => 67.65. In Text-Fabric one value is attached to feature 'domain', namely '067003'. Using the method explained above, this breaks down into '067' and '003', where the last part refers to section 'C', which is actually a range (67.65-67.72) within Louw-Nida's classification.
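A minimal decoding sketch, assuming the 6-digit format described above (the helper name decodeDomain is ours, not part of the dataset):
def decodeDomain(domainValue):
    """Convert a 6-digit domain value (e.g. '089007') into Louw-Nida notation (e.g. '89G')."""
    topLevel = int(domainValue[:3])          # first three digits: top-level domain
    subSection = int(domainValue[3:])        # last three digits: subsection number
    letter = chr(ord('A') + subSection - 1)  # 1 -> A, 2 -> B, ...
    return f'{topLevel}{letter}'

print(decodeDomain('089007'))   # 89G: Relations -> Cause and/or Reason
print(decodeDomain('067003'))   # 67C: the range 67.65-67.72 (ἀρχή in John 1:1)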
It can easily be checked that the feature domain also has multiple values associated with certain word nodes. For example, the feature values for node 39180 (taken from one of the previous examples) are displayed as follows:
print (f'word node:39180, ln feature:{F.ln.v(39180)}, domain feature:{F.domain.v(39180)}')
word node:39180, ln feature:23.149 57.108 88.110, domain feature:023009 057008 088015
Therefore, Text-Fabric queries on the 'domain' feature should use a similar regular expression pattern as previously discussed:
# define query template
# The preceding 'r' before the template marks it as a raw string, preventing Python from altering the regex.
findDomainRegex =r'''
phrase
word domain~\b088015\b
'''
domainResults=N1904.search(findDomainRegex)
# print the first 10 results
printCounter=0
for phrase, node in domainResults:
    print(f'word node:{node}, domain feature:{F.domain.v(node)}')
    printCounter += 1
    if printCounter == 10:
        break
0.13s 243 results
word node:1756, domain feature:088015
word node:2307, domain feature:088015
word node:2326, domain feature:088015
word node:2419, domain feature:088015
word node:2890, domain feature:023009 057008 088015
word node:2905, domain feature:088015
word node:2908, domain feature:088015
word node:3282, domain feature:088015
word node:3386, domain feature:088015
word node:3386, domain feature:088015
In the remainder of this notebook, two application ideas are worked out.
Now that we have established the foundation for using the LN feature, we will explore an example where we gather all New Testament references to the concept 'light' (physical light, in the sense of the opposite of darkness).
Selecting the proper range of LN classifications depends on the actual research question. For the sake of demonstration we use the following range:
Top-level domain: 14 (Physical Events and States)
Sub-level: F (Light) ranging from 14.36-52
In order to create a list of word nodes, we need to translate this range (14.36-52) into the following regex: \b14[.](3[6-9]|4[0-9]|5[0-2])\b. Here \b matches a word boundary. The parentheses ( ... ) group the alternatives separated by |. Each of the three sub-patterns, like 3[6-9], matches a range of numbers (in this case 36-39). This grouping allows the regular expression to evaluate the enclosed pattern as a single unit, so the alternatives can be checked in sequence. A quick sanity check of this regex is shown below, before the actual query.
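As an optional check (outside Text-Fabric), the regex can be exercised against values just inside and just outside the intended range:
import re

# Raw string so the backslashes survive intact
lightPattern = re.compile(r'\b14[.](3[6-9]|4[0-9]|5[0-2])\b')

# 14.36 and 14.52 lie inside the range; 14.35, 14.53 and 140.36 lie outside it
for code in ['14.35', '14.36', '14.44', '14.52', '14.53', '140.36']:
    result = 'match' if lightPattern.search(code) else 'no match'
    print(f'{code}: {result}')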
# define query template
# The preceding 'r' before the template marks it as a raw string, preventing Python from altering the regex.
lightQuery =r'''
phrase
word ln~\b14[.](3[6-9]|4[0-9]|5[0-2])\b
'''
lightSearch=N1904.search(lightQuery)
# print the first 5 results
printCounter=0
for phrase, node in lightSearch:
    print(f'word node:{node}, ln feature:{F.ln.v(node)}')
    printCounter += 1
    if printCounter == 5:
        break
0.12s 195 results
word node:1458, ln feature:14.36
word node:1469, ln feature:14.36
word node:1470, ln feature:14.41
word node:1810, ln feature:14.36
word node:1834, ln feature:14.37
The following script produces a table showing which lemmas are associated with this LN range. Note this script depends on the output from the previous query in 4.2.
import pandas as pd
from collections import defaultdict
from IPython.display import display
# Initialize a dictionary to store counts for each lemma and ln classification
lemmaLnCounts = defaultdict(int)
# Populate the dictionary with counts from the search results
for phrase, node in lightSearch:
    ln = F.ln.v(node)        # Retrieve LN classification for the word node
    lemma = F.lemma.v(node)  # Retrieve lemma for the word node
    gloss = F.gloss.v(node)  # Retrieve English gloss for the word node
    if ln and lemma:
        lemmaLnCounts[(lemma, ln, gloss)] += 1
# Create the DataFrame without a default index
df = pd.DataFrame(
    [{'lemma': lemma, 'ln': ln, 'gloss': gloss, 'count': count} for (lemma, ln, gloss), count in lemmaLnCounts.items()]
).sort_values(by='count', ascending=False)
# Display the DataFrame without the index
display(df.style.hide(axis="index"))
lemma | ln | gloss | count |
---|---|---|---|
φῶς | 14.36 | light, source of light | 65 |
δόξα | 14.49 79.18 | glory, splendor, brightness | 20 |
δόξα | 14.49 | glory, splendor, brightness | 17 |
ἡμέρα | 14.40 | day | 14 |
φαίνω | 14.37 | shine, (mid.) appear, become visible | 12 |
φωτίζω | 14.39 | illuminate, bring to light | 7 |
λάμπω | 14.37 | shine, shine out | 7 |
φωτεινός | 14.51 | bright, luminous, full of light, shining | 4 |
λαμπρός | 14.50 | shining, magnificent, bright | 4 |
ἀνατέλλω | 14.41 | make to rise, rise, shine, rise up | 3 |
ἀστράπτω | 14.47 | flash, am lustrous | 3 |
λευκός | 14.50 | white, bright | 3 |
λαμπρός | 14.50 79.20 | shining, magnificent, bright | 3 |
στίλβω | 14.47 | shine, glisten, flash | 2 |
ἐπιφαίνω | 14.39 | appear, shine upon | 2 |
ὕψος | 14.42 | height, heaven | 2 |
ἐπιφώσκω | 14.41 | dawn, am near commencing | 2 |
φέγγος | 14.36 | brightness, light | 2 |
διαυγάζω | 14.43 | shine through, dawn | 2 |
ἐπιφαύσκω | 14.39 | shine upon, give light to | 2 |
φωτίζω | 14.39 28.36 | illuminate, bring to light | 2 |
περιλάμπω | 14.44 | shine around | 2 |
ἀνατολή | 14.42 | rising of sun, East, rising | 2 |
περιαστράπτω | 14.45 | flash around like lightning | 2 |
ἀπαύγασμα | 14.48 | light flashing forth, radiation, gleam | 2 |
ἐκλάμπω | 14.38 | shine forth | 1 |
ἐξαστράπτω | 14.47 | flash forth like lightning | 1 |
φωτεινός | 14.50 | bright, luminous, full of light, shining | 1 |
ἀστραπή | 14.46 | flash of lightning, brightness, luster, lightning | 1 |
λαμπρότης | 14.49 | splendor, brightness | 1 |
ἐπιφαίνω | 14.39 24.21 | appear, shine upon | 1 |
κατοπτρίζω | 14.52 24.44 | mirror, reflect | 1 |
φῶς | 11.14 14.36 | light, source of light | 1 |
φωστήρ | 14.49 | light, brilliancy | 1 |
This script counts, per book, the occurrences of 'light-related' lemmas in the search results and displays the totals in a sorted table. Note this script depends on the output from the previous query in 4.2.
import pandas as pd
from collections import defaultdict
# Initialize a dictionary to store counts for each book
bookCounts = defaultdict(int)
# Populate the dictionary with counts from the search results
for phrase, node in lightSearch:
    book = F.book.v(node)  # Retrieve book name
    if book:
        bookCounts[book] += 1
# Convert dictionary to a DataFrame
bookTotals = pd.DataFrame(list(bookCounts.items()), columns=['Book', 'Total Count'])
# Sort by total count in descending order
bookTotals = bookTotals.sort_values(by='Total Count', ascending=False).reset_index(drop=True)
# Display the totals per book as a nice-looking table
display(bookTotals)
Book | Total Count | |
---|---|---|
0 | Luke | 30 |
1 | John | 26 |
2 | Acts | 24 |
3 | Matthew | 20 |
4 | Revelation | 17 |
5 | II_Corinthians | 12 |
6 | Hebrews | 9 |
7 | I_Corinthians | 8 |
8 | I_John | 7 |
9 | II_Peter | 7 |
10 | Ephesians | 7 |
11 | Romans | 7 |
12 | Mark | 6 |
13 | James | 4 |
14 | I_Thessalonians | 3 |
15 | Philippians | 2 |
16 | I_Peter | 2 |
17 | I_Timothy | 2 |
18 | Colossians | 1 |
19 | II_Thessalonians | 1 |
It is also possible to dig a little deeper and create an interactive plot that also shows the number of occurrences of individual lemmas within each book. Note this script depends on the output from the previous query in 4.2.
import pandas as pd
from collections import defaultdict
from math import pi, cos, sin
from bokeh.io import show, output_notebook, save
from bokeh.plotting import figure
from bokeh.models import HoverTool, Label
from bokeh.palettes import Category20
from bokeh.resources import INLINE
# Enable Bokeh output for Jupyter Notebook and JupyterLab
output_notebook(resources=INLINE)
# Initialize a dictionary to store counts for each lemma, LN classification, gloss, and book
lemmaLnCounts = defaultdict(int)
# Populate the dictionary with counts from the search results
for phrase, node in lightSearch:
    ln = F.ln.v(node)        # Retrieve LN classification for the word node
    lemma = F.lemma.v(node)  # Retrieve lemma for the word node
    gloss = F.gloss.v(node)  # Retrieve English gloss for the word node
    book = F.book.v(node)    # Retrieve book name
    if ln and lemma:
        lemmaLnCounts[(lemma, ln, gloss, book)] += 1
# Convert dictionary to a DataFrame
df = pd.DataFrame(
[(lemma, ln, gloss, book, count) for (lemma, ln, gloss, book), count in lemmaLnCounts.items()],
columns=['lemma', 'ln', 'gloss', 'book', 'count']
)
# Group by book and aggregate lemma information per book
bookLemmaCounts = df.groupby(['book', 'lemma']).agg({'count': 'sum'}).reset_index()
# Sort lemmas by count in descending order within each book
bookLemmaCounts = bookLemmaCounts.sort_values(by=['book', 'count'], ascending=[True, False])
# Aggregate total counts per book for the pie chart and sort by total count in descending order
bookCounts = bookLemmaCounts.groupby('book')['count'].sum().reset_index()
bookCounts = bookCounts.sort_values(by='count', ascending=False).reset_index(drop=True)
# Create a custom column to hold sorted lemma breakdown in descending order for each book
tooltipData = bookLemmaCounts.groupby('book')[['lemma', 'count']].apply(
lambda group: '\n'.join(f"{row['lemma']}: {row['count']}" for _, row in group.iterrows())
).reset_index(name='lemmaInfo')
# Merge sorted lemma information with book counts
bookCounts = bookCounts.merge(tooltipData, on='book', how='left')
# Total count and cumulative angle calculation
totalCount = bookCounts['count'].sum()
bookCounts['angle'] = bookCounts['count'] / totalCount * 2 * pi
# Sequential start and end angle calculation
angles = [0] + list(bookCounts['angle'].cumsum())
bookCounts['startAngle'] = angles[:-1]
bookCounts['endAngle'] = angles[1:]
# Assign colors from Category20 palette
colors = Category20[len(bookCounts)] if len(bookCounts) <= 20 else Category20[20]
bookCounts['color'] = colors[:len(bookCounts)]
# Initialize Bokeh figure with increased size and adjusted y-axis range for offset
pieChart = figure(height=800, width=1000, title="Distribution of 'light-related' lemmas per book",
                  toolbar_location=None, tools="")
# Draw each wedge separately with a smaller radius and set individual hover data
for i, row in bookCounts.iterrows():
    wedge = pieChart.wedge(x=0, y=0.5, radius=0.5,  # Reduced radius to make the pie chart smaller
                           start_angle=row['startAngle'], end_angle=row['endAngle'],
                           line_color="white", fill_color=row['color'])
    # Add a custom hover tool for each wedge with book-specific info in a multi-line format
    hover = HoverTool(renderers=[wedge],
                      tooltips=[
                          ("Book", row['book']),
                          ("Total Count", f"{row['count']}"),
                          ("Lemmas", row['lemmaInfo'])
                      ])
    pieChart.add_tools(hover)
    # Calculate label position closer to the pie chart at a fixed distance
    angle = (row['startAngle'] + row['endAngle']) / 2
    labelX = 0.7 * cos(angle)          # Position labels at 0.7 distance from the center
    labelY = 0.5 + (0.7 * sin(angle))  # Adjust y-position by 0.5 to account for shifted chart center
    # Determine alignment for the left half
    alignment = "left" if -pi/2 < angle < pi/2 else "right"
    # Fine-tune label alignment on the left side
    if pi/2 < angle < 3 * pi / 2:
        labelX += 0.05  # Move labels on the left side slightly to the right
    # Draw connector line from pie segment edge to the center of the label
    lineEndX = labelX - 0.02 if alignment == "right" else labelX + 0.02  # Adjust line end to center of label
    lineX = [0.5 * cos(angle), lineEndX]
    lineY = [0.5 + (0.5 * sin(angle)), labelY]
    pieChart.line(x=lineX, y=lineY, line_width=1, color=row['color'])
    # Set label alignment for book name
    label = Label(x=labelX, y=labelY, text=row['book'], text_align=alignment, text_baseline="middle", text_font_size="10pt")
    pieChart.add_layout(label)
    # Position total count within the wedge
    countX = 0.35 * cos(angle)          # Position count text within the wedge, closer to the center
    countY = 0.5 + (0.35 * sin(angle))  # Adjust y-position by 0.5 to account for shifted chart center
    countLabel = Label(x=countX, y=countY, text=str(row['count']), text_align="center", text_baseline="middle", text_font_size="9pt", text_color="black")
    pieChart.add_layout(countLabel)
# Adjust grid/axis settings
pieChart.axis.axis_label = None
pieChart.axis.visible = False
pieChart.grid.grid_line_color = None
# Show plot in notebook
show(pieChart)
After executing the previous cell, the object pieChart has been created, representing the pie chart. The following cell creates a download button; it calls a function that base64-encodes the data so the chart can be downloaded as an interactive HTML file for offline use.
from IPython.display import HTML
import base64 # used to encode the data to be downloaded
from bokeh.embed import file_html
from bokeh.resources import CDN
def createDownloadLink(htmlContent, fileName, documentTitle, buttonText):
    # Convert plot to HTML string
    htmlContent = file_html(htmlContent, CDN, documentTitle)
    # Encode the HTML content to base64
    b64Html = base64.b64encode(htmlContent.encode()).decode()
    # Create the HTML download link
    downloadLink = f'''
    <a download="{fileName}" href="data:text/html;base64,{b64Html}">
        <button>{buttonText}</button>
    </a>
    '''
    return HTML(downloadLink)
# Display the download link in the notebook
createDownloadLink(pieChart, 'light_related_lemmas_per_book.html', 'Distribution of \'light-related\' lemmas per book', 'Download pie-chart')
This section delves into cases where multiple Louw-Nida semantic classification codes are associated with a single word node. We will approach this exploration step by step.
The first step is performed by the following script, which is designed to analyze and count occurrences of lemmas according to their associated Louw-Nida semantic classification codes. By collecting frequency data for each lemma, particularly where multiple LN codes are assigned, the script supports more detailed linguistic analysis of word usage patterns across semantic categories.
At a high level, the script iterates through each word in the Text-Fabric dataset, extracting its lemma and any associated LN codes. It then counts these codes for each lemma, storing the results in a structured format that captures both total counts and per-code counts. The final output is a dictionary where each lemma is mapped to its cumulative count and individual LN code frequencies, providing a foundation for further exploration of lexical patterns. Detailed operations are commented directly within the script for clarity.
from collections import defaultdict, Counter
# Initialize a dictionary to store lemma information with counts per ln code
lemmaInfoDict = defaultdict(lambda: {"totalCount": 0, "lnCounts": Counter()})
# Iterate over each word in the Greek New Testament
for word in F.otype.s("word"):
    lemma = F.lemma.v(word)  # get the lemma associated with the current word
    ln = F.ln.v(word)
    if ln is not None:
        # Split multiple `ln` codes and only process if there are two or more codes
        lnCodes = ln.split()
        if len(lnCodes) >= 2:
            # Count each `ln` code for the lemma
            lemmaInfoDict[lemma]["lnCounts"].update(lnCodes)
            # Increment totalCount by the number of `ln` codes found
            lemmaInfoDict[lemma]["totalCount"] += len(lnCodes)
# Format the result as needed
result = {
    lemma: (
        info["totalCount"],
        dict(info["lnCounts"])  # convert Counter to regular dictionary for readability
    )
    for lemma, info in lemmaInfoDict.items()
}
To examine the created dictionary, you can run the following cell by removing the hash:
# print(result)
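Since the full dictionary is rather large, an alternative (a small convenience, not in the original notebook) is to print only a handful of entries:
from itertools import islice

# Show the first five lemmas that carry multiple LN codes, with their per-code counts
for lemma, (totalCount, lnCounts) in islice(result.items(), 5):
    print(f'{lemma}: total={totalCount}, codes={lnCounts}')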
In this section we generate an interactive heatmap that visualizes the frequency of lemmas in the Greek New Testament, categorized by their top-level Louw-Nida (LN) semantic classification codes. Classifications are grouped based on their top-level category (i.e., the part of the LN code found before the dot) to create an insightful plot. Additional information is available for each data point, shown when hovering over it.
# The following script will produce a dictionary of Louw-Nida Top-level domains
# The structure of the dictionary is:
# louwNidaMapping = {
# numeric (key) : "description"
# ...
# }
import requests
from bs4 import BeautifulSoup
# Debugging mode (False, True)
debug = False
# URL of the Louw-Nida classifications page
url = "https://www.laparola.net/greco/louwnida.php"
# Retrieve the webpage content
response = requests.get(url)
if debug:
    print(f"Retrieving URL {url} returned {response}")
response.raise_for_status()  # Check for request errors
# Parse the HTML content with BeautifulSoup
soup = BeautifulSoup(response.text, "html.parser")
# Initialize an empty dictionary
louwNidaMapping = {}
# Find all <h3> elements that contain the Louw-Nida classification data
for entry in soup.find_all("h3"):
    # Extract the number from the <a> tag within the <h3> tag
    numberTag = entry.find("a")
    descriptionText = entry.get_text(strip=True)
    # Ensure there's content to process
    if numberTag and descriptionText:
        # Attempt to parse the number and description
        keyText = numberTag.get_text(strip=True)
        try:
            # Convert the number to an integer
            key = int(keyText)
        except ValueError:
            # If conversion fails, skip this entry
            if debug:
                print(f"Skipping entry due to non-numeric key: {keyText}")
            continue
        # Get description by removing the number portion from the full text
        description = descriptionText.replace(keyText, "", 1).strip(' "')
        # Add to dictionary
        louwNidaMapping[key] = description
        if debug:
            print(f"Added classification: {key}: {description}")
if debug:
    print(f"Resulting dictionary: {louwNidaMapping}")
The following script first prepares the data by extracting each lemma’s occurrence count and its top-level LN code, then organizes it into a structured format. The counts are normalized within each lemma, allowing for a relative comparison across semantic categories. Using Bokeh, the script plots lemmas along the y-axis and LN codes along the x-axis, with color intensity representing normalized frequency, accompanied by a color bar for reference.
import pandas as pd
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource, ColorBar
from bokeh.transform import linear_cmap
from bokeh.resources import INLINE
from bokeh.palettes import Viridis256
# Enable Bokeh output for Jupyter Notebook and JupyterLab
output_notebook(resources=INLINE)
# Prepare data dictionary for DataFrame creation
data = {
'lemma': [],
'topLevelLnCode': [],
'count': []
}
# Populate data dictionary with lemma, top-level LN code, and count
for lemma, (totalCount, lnDict) in result.items():
    for lnCode, count in lnDict.items():
        topLevelCode = int(lnCode.split('.')[0])  # Extract top-level LN code as integer
        data['lemma'].append(lemma.lower())       # Convert lemma to lowercase
        data['topLevelLnCode'].append(topLevelCode)
        data['count'].append(count)
# Create DataFrame and normalize counts per lemma
df = pd.DataFrame(data)
dfGrouped = df.groupby(['lemma', 'topLevelLnCode'], as_index=False)['count'].sum()
dfGrouped['normalizedCount'] = dfGrouped.groupby('lemma')['count'].transform(lambda x: x / x.sum())
# Map top-level LN codes to descriptions
dfGrouped['louwNidaDescription'] = dfGrouped['topLevelLnCode'].map(louwNidaMapping)
# Sort lemmas alphabetically and LN codes numerically for plot arrangement
dfGrouped['lemma'] = pd.Categorical(dfGrouped['lemma'], categories=sorted(dfGrouped['lemma'].unique()))
dfGrouped['topLevelLnCode'] = pd.Categorical(dfGrouped['topLevelLnCode'], ordered=True, categories=sorted(set(dfGrouped['topLevelLnCode']), key=int))
# Set plot height dynamically based on the number of unique lemmas
numLemmas = len(dfGrouped['lemma'].unique())
plotHeight = 20 * numLemmas # Adjust multiplier for spacing
# Prepare data source for plotting
sortedLnCodes = [str(code) for code in sorted(dfGrouped['topLevelLnCode'].cat.categories)]
source = ColumnDataSource(dfGrouped)
# Initialize Bokeh figure with dynamic height and increased width
lnHeatmap = figure(
width=1250, height=plotHeight,
title="Per-Lemma Normalized Heatmap by Top-Level LN Code",
x_range=sortedLnCodes, # Sorted x-axis labels
y_range=sorted(dfGrouped['lemma'].unique(), reverse=True),
tools="hover", tooltips=[("Lemma", "@lemma"), ("LN Code", "@topLevelLnCode"), ("LN Description", "@louwNidaDescription"), ("Fraction", "@normalizedCount{0.00}")],
)
# Rotate x-axis labels for readability
lnHeatmap.xaxis.major_label_orientation = 1.57 # 90 degrees in radians
# Define color mapper for heatmap
mapper = linear_cmap(field_name='normalizedCount', palette=Viridis256, low=0, high=1)
# Add heatmap rectangles to the plot
lnHeatmap.rect(x="topLevelLnCode", y="lemma", width=1, height=1, source=source, fill_color=mapper, line_color=None)
# Add color bar for normalized count reference
colorBar = ColorBar(color_mapper=mapper['transform'], width=8, location=(0, 0))
lnHeatmap.add_layout(colorBar, 'right')
# Display the plot
show(lnHeatmap)
After executing the previous cell, which displays the heatmap, the following cell creates a download button allowing the heatmap to be downloaded as an interactive HTML file for offline use.
from IPython.display import HTML
import base64 # used to encode the data to be downloaded
from bokeh.embed import file_html
from bokeh.resources import CDN
def createDownloadLink(htmlContent, fileName, documentTitle, buttonText):
    # Convert plot to HTML string
    htmlContent = file_html(htmlContent, CDN, documentTitle)
    # Encode the HTML content to base64
    b64Html = base64.b64encode(htmlContent.encode()).decode()
    # Create the HTML download link
    downloadLink = f'''
    <a download="{fileName}" href="data:text/html;base64,{b64Html}">
        <button>{buttonText}</button>
    </a>
    '''
    return HTML(downloadLink)
# Display the download link in the notebook
createDownloadLink(lnHeatmap, 'louw_nida_heatmap.html', 'Interactive Louw-Nida Heatmap', 'Download Louw-Nida heatmap')
1 Johannes P. Louw and Eugene Albert Nida, Greek-English Lexicon of the New Testament: Based on Semantic Domains (New York: United Bible Societies, 1996).
2 The dictionary is created from data available on Louw-Nida Lexicon @ laparola.net.
The scripts in this notebook require (besides text-fabric) the following Python libraries to be installed in the environment:
base64
bokeh
bs4
collections
IPython
pandas
requests
You can install any missing library from within Jupyter Notebook using either pip or pip3.
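For example, a missing library can be installed directly from a notebook cell (the package name below is just an illustration):
# Install a missing package from within the notebook (replace 'bokeh' with the package you need)
%pip install bokeh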
Author | Tony Jurg |
Version | 1.1 |
Date | 14 November 2024 |
The following cell displays the active Anaconda environment along with a list of all installed packages and their versions within that environment.
import subprocess
from IPython.display import display, HTML
# Display the active conda environment
!conda env list | findstr "*"
# Run conda list and capture the output
condaListOutput = subprocess.check_output("conda list", shell=True).decode("utf-8")
# Wrap the output with <details> and <summary> HTML tags
htmlOutput = "<details><summary>Click to view installed packages</summary><pre>"
htmlOutput += condaListOutput
htmlOutput += "</pre></details>"
# Display the HTML in the notebook
display(HTML(htmlOutput))
nltk * C:\Users\tonyj\anaconda3\envs\nltk
# packages in environment at C:\Users\tonyj\anaconda3\envs\nltk
(collapsed output of conda list; among the installed packages are python 3.12.7, text-fabric 12.6.2, pandas 2.2.3, bokeh 3.6.0, requests 2.32.3, beautifulsoup4 4.12.3 and ipython 8.28.0)