#!/usr/bin/env python
# coding: utf-8

# # Using the Louw-Nida feature in Text-Fabric (N1904-TF)

# ## Table of contents (ToC)
# * 1 - Introduction
# * 2 - Load Text-Fabric app and data
# * 3 - Using semantic domain features in N1904-TF
# * 3.1 - How to query on feature ln
# * 3.2 - Determining the number of LN classifications per word
# * 3.3 - Multiple LN classifications for single words
# * 3.4 - What about related feature domain?
# * 4 - Example use case: references to light
# * 4.1 - Selecting the LN classification(s)
# * 4.2 - Query the occurrences
# * 4.3 - Showing all lemmas within this range
# * 4.4 - Show number of 'light-related' lemmas per book
# * 4.5 - Creating a pie chart showing individual lemmas per book
# * 4.6 - Provide a download link for the pie-chart
# * 5 - Exploring nodes with multiple Louw-Nida classifications
# * 5.1 - Create a dictionary for lemmas with multiple Louw-Nida classifications
# * 5.2 - Display the dictionary (optional)
# * 5.3 - Explore with an interactive heatmap
# * 5.3.1 - Defining the mapping dictionary
# * 5.3.2 - Plot the heatmap
# * 5.3.3 - Provide a download link for the heatmap
# * 6 - Attribution and footnotes
# * 7 - Required libraries
# * 8 - Notebook and environment details

# # 1 - Introduction
# ##### [Back to ToC](#TOC)
#
# The Louw-Nida classification, created by the linguists Johannes P. Louw and Eugene A. Nida, organizes New Testament Greek words into 93 semantic domains based on meaning rather than traditional lexical forms.<sup>1</sup> Unlike standard lexicons, it groups words by related meanings, such as emotions or social relationships, providing a context-focused framework. This approach helps New Testament scholars discern subtle distinctions in meaning. In our N1904-TF dataset, the Louw-Nida codes are found in the `ln` feature.

# # 2 - Load Text-Fabric app and data
# ##### [Back to ToC](#TOC)

# In[1]:

get_ipython().run_line_magic('load_ext', 'autoreload')
get_ipython().run_line_magic('autoreload', '2')


# In[2]:

# Loading the Text-Fabric code
# Note: it is assumed Text-Fabric is installed in your environment
from tf.fabric import Fabric
from tf.app import use


# In[3]:

# Load the N1904 app and data
N1904 = use ("centerblc/N1904", version="1.0.0", hoist=globals())


# In[4]:

# The following will push the Text-Fabric stylesheet to this notebook (to facilitate proper display with notebook viewer)
N1904.dh(N1904.getCss())


# # 3 - Using semantic domain features in N1904-TF
# ##### [Back to ToC](#TOC)

# This section explores, step by step, some important aspects of how the Louw-Nida semantic classification is implemented in the N1904-TF dataset.
#
# ## 3.1 - How to query on feature ln
#
# When creating a Text-Fabric query template, it's essential to recognize that each word node in the N1904-TF dataset may have zero, one, or multiple Louw-Nida classifications, as represented by the [`ln`](https://centerblc.github.io/N1904/features/ln.html#start) feature.
# To illustrate how this affects query construction, let's begin with a straightforward query pattern. Although this query retrieves instances of a specific Louw-Nida classification, as we'll observe, it does not capture all occurrences.

# In[5]:

# define query template
findLnValueEqual ='''
phrase
   word ln=49.5
'''
equalSearch=N1904.search(findLnValueEqual)

# print the results
for phrase, node in equalSearch:
    print (f'word node:{node}, ln feature:{F.ln.v(node)}')


# The next step is to account for the possibility that each word node may have multiple Louw-Nida classifications attached.
# Using a regular expression can help here, but it requires careful handling. Replacing `=` with `~` does increase the number of matches; however, it also introduces unintended results, as we'll see when we examine the output.

# In[6]:

# define query template
findLnValueRegexp ='''
phrase
   word ln~49.5
'''
regexpSearch=N1904.search(findLnValueRegexp)

# print the results
for phrase, node in regexpSearch:
    print (f'word node:{node}, ln feature:{F.ln.v(node)}')


# In the above results it is clear that the following are 'false positives':
# ```
# word node:2890, ln feature:23.149 57.108 88.110
# word node:39180, ln feature:23.149 57.108 88.110
# word node:91884, ln feature:37.49 56.30
# word node:91889, ln feature:37.49 56.30
# ```
#
# To retrieve only the desired results, the regular expression (regex) needs adjustment. The expression `\b49[.]5\b` is designed to match the exact pattern `49.5` as a standalone sequence. Key components of this regex are:
#
# - `\b` as a word boundary anchor, ensuring `49.5` appears as an independent sequence.
# - `[.]` to denote the period (.) as a literal character, since without brackets, a plain dot would match any character.
#
# Another key aspect of the query template is the `r` prefix before the Text-Fabric query template containing the regex, which signals Python to treat the string as a raw string. This prevents Python from modifying the regex, which would otherwise happen in some cases. Omitting this `r` in environments like Jupyter Notebook can lead to no matches (as for this query) or may trigger warnings, such as `SyntaxWarning: invalid escape sequence`, in other cases (e.g., when using `\s` to designate a space in the regex).
#
# See also Text-Fabric's [manual](https://annotation.github.io/text-fabric/tf/about/manual.html) on the use of regular expressions.

# In[7]:

# define query template
# The preceding 'r' before the template makes it a raw string, preventing Python from altering the regex.
findUpdatedLnValueRegex =r'''
phrase
   word ln~\b49[.]5\b
'''
updatedRegexSearch=N1904.search(findUpdatedLnValueRegex)

# print the results
for phrase, node in updatedRegexSearch:
    print (f'word node:{node}, ln feature:{F.ln.v(node)}')


# ## 3.2 - Determining the number of LN classifications per word
#
# As demonstrated above, the number of Louw-Nida classifications per word may differ. This script counts how many words in the Greek New Testament have a specific number of Louw-Nida (LN) classifications attached, providing some sense of its magnitude.
#
# The following script determines for each word node the count of LN categories (or zero if none) and tallies how often each count occurs. The results are displayed in a table, showing the number of words associated with each unique LN classification count.
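# As a quick illustration of what the script below tallies, the raw `ln` value of a single word node can be
# inspected and split on whitespace. Node 2890, seen in the regex output above, is used here as an example;
# this is only an illustrative sketch and assumes the N1904 data is loaded with `F` hoisted as above.

exampleNode = 2890                 # word node with multiple LN codes (taken from the output above)
exampleLn = F.ln.v(exampleNode)    # e.g. '23.149 57.108 88.110'
print(f'ln value: {exampleLn}')
print(f'number of LN classifications: {len(exampleLn.split())}')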
# In[8]:

from collections import defaultdict
import pandas as pd
from IPython.display import display

# Initialize a dictionary to count the number of categories per word node
lnCategoryCounts = defaultdict(int)

# Iterate over each word in the Greek New Testament
for word in F.otype.s("word"):
    ln = F.ln.v(word)  # Retrieve the Louw-Nida classification for the word
    if ln:
        # Count the number of Louw-Nida categories for this word
        numLNs = len(ln.split())
    else:
        numLNs = 0  # For words with no Louw-Nida classification
    # Increment the count for this LN category count
    lnCategoryCounts[numLNs] += 1

# Convert to a DataFrame for easier display as a table
lnCategoryCountsDf = pd.DataFrame.from_dict(lnCategoryCounts, orient='index', columns=['Count'])
lnCategoryCountsDf.index.name = 'Number of LN categories'
lnCategoryCountsDf = lnCategoryCountsDf.sort_index()  # Sort by the number of LNs

# Display the table
display(lnCategoryCountsDf)


# ## 3.3 - Multiple LN classifications for single words

# In this section, we will delve deeper into cases with multiple Louw-Nida classifications, focusing on their distribution across top-level categories (i.e., the part before the dot).
#
# The following script counts occurrences of top-level Louw-Nida classifications for each word in the Greek New Testament, tracking both `countAllItems` (total occurrences of each classification) and `countFirstItem` (occurrences based only on the first classification listed for each word). The `delta` column highlights the difference between these counts, helping to identify top-level values with the most significant discrepancies for lemmas with multiple classifications.

# In[9]:

from collections import defaultdict
import pandas as pd
from IPython.display import display

# Initialize a dictionary to hold counts and delta for each Louw-Nida top-level classification
lnCounts = defaultdict(lambda: {'countAllItems': 0, 'countFirstItem': 0, 'delta': 0})

# Iterate over each word in the Greek New Testament
for word in F.otype.s("word"):
    ln = F.ln.v(word)  # Retrieve the Louw-Nida classification for the word
    if ln:  # Check if there is a valid Louw-Nida classification
        # Split the ln value into its individual codes
        codes = ln.split()

        # Count all items: increment for each top-level value in codes
        for code in codes:
            topLevelValue = int(code.split('.')[0])
            lnCounts[topLevelValue]['countAllItems'] += 1

        # Count only the first item: increment for the first top-level value
        firstTopLevelValue = int(codes[0].split('.')[0])
        lnCounts[firstTopLevelValue]['countFirstItem'] += 1

# Calculate delta for each top-level classification
for topLevelValue, counts in lnCounts.items():
    counts['delta'] = counts['countAllItems'] - counts['countFirstItem']

# Convert to a DataFrame
lnCountsDf = pd.DataFrame.from_dict(lnCounts, orient='index')
lnCountsDf.index.name = 'Louw-Nida top-level'
lnCountsDf.columns = ['count all items', 'count only first item', 'delta']

# Sort by 'delta' in descending order
lnCountsDf = lnCountsDf.sort_values(by='delta', ascending=False)

# Display the sorted table
display(lnCountsDf)


# ## 3.4 - What about related feature domain?

# Note also that the feature [domain](https://centerblc.github.io/N1904/features/domain.html#start) can have multiple values. This is expected since this feature is *to some extent* equivalent to a numerical representation of feature ['ln'](https://centerblc.github.io/N1904/features/ln.md#start) and can be decoded using the following method. Take, for example, a case where feature 'domain' has a value of '089007'.
# The 6-digit value '089007' first needs to be split into two 3-digit parts: '089' and '007'. The second part should be interpreted as an alphabetic character (A=1, B=2, C=3, D=4, E=5, ..., Z=26). Taking the two parts together, this results in '89G', which points to an entry in Louw-Nida. For this example (i.e. 89G) this maps to main section ['Relations'](https://www.laparola.net/greco/louwnida.php?sezmag=89) and subsection ['Cause and/or Reason'](https://www.laparola.net/greco/louwnida.php?sezmag=89&sez1=15&sez2=38).
#
# It is important to realize that the granularity of feature 'domain' is less than that of feature ['ln'](https://centerblc.github.io/N1904/features/ln.md#readme). Consider, for example, the Greek word ἀρχή in John 1:1. According to the Louw-Nida lexicon this can map to either a:beginning (aspect)=>68.1 or b:beginning (time)=>67.65. In Text-Fabric one value is attached to feature 'domain', which is '067003'. Using the method explained above, this breaks down into '067' and '003', where the last part refers to section 'C', which is actually a range [(67.65-67.72)](https://www.laparola.net/greco/louwnida.php#67) within Louw-Nida's classification.
#
# It can easily be checked that feature `domain` also has multiple values associated with certain word nodes. For example, the feature values for node 39180 (taken from one of the previous examples) are displayed as follows:

# In[10]:

print (f'word node:39180, ln feature:{F.ln.v(39180)}, domain feature:{F.domain.v(39180)}')


# Therefore, Text-Fabric queries on the 'domain' feature should use a similar regular expression pattern as previously discussed:

# In[11]:

# define query template
# The preceding 'r' before the template makes it a raw string, preventing Python from altering the regex.
findDomainRegex =r'''
phrase
   word domain~\b088015\b
'''
domainResults=N1904.search(findDomainRegex)

# print the first 10 results
printCounter=0
for phrase, node in domainResults:
    print (f'word node:{node}, domain feature:{F.domain.v(node)}')
    printCounter+=1
    if printCounter==10:break


# # 4 - Example use case: references to light
# ##### [Back to ToC](#TOC)
#
# Now that we have established the foundation for using the LN feature, we will explore one example where we would like to gather all New Testament references to the concept 'light' (physical light, in the sense of the opposite of darkness).

# ## 4.1 - Selecting the LN classification(s)
#
# Selecting the proper range of LN classifications depends on the actual research question. For the sake of demonstration we use the following range:
# ```
# Top-level domain: 14 (Physical Events and States)
# Sub-level: F (Light) ranging from 14.36-52
# ```

# ## 4.2 - Query the occurrences
#
# In order to create a list of word nodes, we need to translate this range (14.36-52) into the following regex: `\b14[.](3[6-9]|4[0-9]|5[0-2])\b`. Here `\b` matches a word boundary. The parentheses `( ... )` group the alternatives within them, separated by `|`. Each of the three sub-patterns, like `3[6-9]`, matches a range of numbers (in this case 36-39). This grouping allows the regular expression to evaluate the enclosed pattern as a single unit, enabling multiple possible matches to be checked in sequence.
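# Before using this pattern in a Text-Fabric query, it can be sanity-checked with Python's `re` module
# against a few hand-picked sample strings. This is only an illustrative sketch; the sample values below
# are made up for the test and are not taken from the dataset.

import re

lightPattern = re.compile(r'\b14[.](3[6-9]|4[0-9]|5[0-2])\b')
for sample in ('14.36', '14.52', '14.35', '14.53', '114.40', '14.40 2.15'):
    # Expect True only for codes inside the 14.36-14.52 range
    print(f'{sample!r:>14} -> {bool(lightPattern.search(sample))}')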
# In[12]:

# define query template
# The preceding 'r' before the template makes it a raw string, preventing Python from altering the regex.
lightQuery =r'''
phrase
   word ln~\b14[.](3[6-9]|4[0-9]|5[0-2])\b
'''
lightSearch=N1904.search(lightQuery)

# print the first 5 results
printCounter=0
for phrase, node in lightSearch:
    print (f'word node:{node}, ln feature:{F.ln.v(node)}')
    printCounter+=1
    if printCounter==5:break


# ## 4.3 - Showing all lemmas within this range
#
# The following script produces a table showing which lemmas are associated with this LN range. Note this script depends on the output from the previous query in 4.2.

# In[13]:

import pandas as pd
from collections import defaultdict
from IPython.display import display

# Initialize a dictionary to store counts for each lemma and ln classification
lemmaLnCounts = defaultdict(int)

# Populate the dictionary with counts from the search results
for phrase, node in lightSearch:
    ln = F.ln.v(node)        # Retrieve LN classification for the word node
    lemma = F.lemma.v(node)  # Retrieve lemma for the word node
    gloss = F.gloss.v(node)  # Retrieve English gloss for the word node
    if ln and lemma:
        lemmaLnCounts[(lemma, ln, gloss)] += 1

# Create the DataFrame without a default index
df = pd.DataFrame(
    [{'lemma': lemma, 'ln': ln, 'gloss': gloss, 'count': count}
     for (lemma, ln, gloss), count in lemmaLnCounts.items()]
).sort_values(by='count', ascending=False)

# Display the DataFrame without the index
display(df.style.hide(axis="index"))


# ## 4.4 - Show number of 'light-related' lemmas per book
#
# This script counts the occurrences of 'light-related' lemmas in the search results for each book and displays the totals in a sorted table. Note this script depends on the output from the previous query in 4.2.

# In[14]:

import pandas as pd
from collections import defaultdict

# Initialize a dictionary to store counts for each book
bookCounts = defaultdict(int)

# Populate the dictionary with counts from the search results
for phrase, node in lightSearch:
    book = F.book.v(node)  # Retrieve book name
    if book:
        bookCounts[book] += 1

# Convert dictionary to a DataFrame
bookTotals = pd.DataFrame(list(bookCounts.items()), columns=['Book', 'Total Count'])

# Sort by total count in descending order
bookTotals = bookTotals.sort_values(by='Total Count', ascending=False).reset_index(drop=True)

# Display the totals per book as a nice-looking table
display(bookTotals)


# ## 4.5 - Creating a pie chart showing individual lemmas per book
#
# It is also possible to dig a little deeper and create an interactive plot which also shows the number of occurrences of individual lemmas within each book. Note this script depends on the output from the previous query in 4.2.
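# For a quick, non-interactive overview of the same breakdown, a simple lemma-by-book cross-tabulation can
# be built with pandas alone. This is a minimal sketch that assumes `lightSearch` (from the query in 4.2)
# is still available in the session.

import pandas as pd
from IPython.display import display

# Collect one row per match, then pivot into a lemma x book table of counts
rows = [{'book': F.book.v(node), 'lemma': F.lemma.v(node)} for phrase, node in lightSearch]
lemmaBookPivot = pd.DataFrame(rows).pivot_table(index='lemma', columns='book', aggfunc='size', fill_value=0)
display(lemmaBookPivot)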
# In[15]: import pandas as pd from collections import defaultdict from math import pi, cos, sin from bokeh.io import show, output_notebook, save from bokeh.plotting import figure from bokeh.models import HoverTool, Label from bokeh.palettes import Category20 from bokeh.resources import INLINE # Enable Bokeh output for Jupyter Notebook and JupyterLab output_notebook(resources=INLINE) # Initialize a dictionary to store counts for each lemma, LN classification, gloss, and book lemmaLnCounts = defaultdict(int) # Populate the dictionary with counts from the search results for phrase, node in lightSearch: ln = F.ln.v(node) # Retrieve LN classification for the word node lemma = F.lemma.v(node) # Retrieve lemma for the word node gloss = F.gloss.v(node) # Retrieve English gloss for the word node book = F.book.v(node) # Retrieve book name if ln and lemma: lemmaLnCounts[(lemma, ln, gloss, book)] += 1 # Convert dictionary to a DataFrame df = pd.DataFrame( [(lemma, ln, gloss, book, count) for (lemma, ln, gloss, book), count in lemmaLnCounts.items()], columns=['lemma', 'ln', 'gloss', 'book', 'count'] ) # Group by book and aggregate lemma information per book bookLemmaCounts = df.groupby(['book', 'lemma']).agg({'count': 'sum'}).reset_index() # Sort lemmas by count in descending order within each book bookLemmaCounts = bookLemmaCounts.sort_values(by=['book', 'count'], ascending=[True, False]) # Aggregate total counts per book for the pie chart and sort by total count in descending order bookCounts = bookLemmaCounts.groupby('book')['count'].sum().reset_index() bookCounts = bookCounts.sort_values(by='count', ascending=False).reset_index(drop=True) # Create a custom column to hold sorted lemma breakdown in descending order for each book tooltipData = bookLemmaCounts.groupby('book')[['lemma', 'count']].apply( lambda group: '\n'.join(f"{row['lemma']}: {row['count']}" for _, row in group.iterrows()) ).reset_index(name='lemmaInfo') # Merge sorted lemma information with book counts bookCounts = bookCounts.merge(tooltipData, on='book', how='left') # Total count and cumulative angle calculation totalCount = bookCounts['count'].sum() bookCounts['angle'] = bookCounts['count'] / totalCount * 2 * pi # Sequential start and end angle calculation angles = [0] + list(bookCounts['angle'].cumsum()) bookCounts['startAngle'] = angles[:-1] bookCounts['endAngle'] = angles[1:] # Assign colors from Category20 palette colors = Category20[len(bookCounts)] if len(bookCounts) <= 20 else Category20[20] bookCounts['color'] = colors[:len(bookCounts)] # Initialize Bokeh figure with increased size and adjusted y-axis range for offset pieChart = figure(height=800, width=1000, title="Distribution of 'light-related' lemmas per book", toolbar_location=None, tools="") # Draw each wedge separately with a smaller radius and set individual hover data for i, row in bookCounts.iterrows(): wedge = pieChart.wedge(x=0, y=0.5, radius=0.5, # Reduced radius to make the pie chart smaller start_angle=row['startAngle'], end_angle=row['endAngle'], line_color="white", fill_color=row['color']) # Add a custom hover tool for each wedge with book-specific info in a multi-line format hover = HoverTool(renderers=[wedge], tooltips=[ ("Book", row['book']), ("Total Count", f"{row['count']}"), ("Lemmas", row['lemmaInfo']) ]) pieChart.add_tools(hover) # Calculate label position closer to the pie chart at a fixed distance angle = (row['startAngle'] + row['endAngle']) / 2 labelX = 0.7 * cos(angle) # Position labels at 0.7 distance from the center labelY = 0.5 + (0.7 * 
                        sin(angle))  # Adjust y-position by 0.5 to account for shifted chart center

    # Determine alignment for the left half
    alignment = "left" if -pi/2 < angle < pi/2 else "right"

    # Fine-tune label alignment on the left side
    if pi/2 < angle < 3 * pi / 2:
        labelX += 0.05  # Move labels on the left side slightly to the right

    # Draw connector line from pie segment edge to the center of the label
    lineEndX = labelX - 0.02 if alignment == "right" else labelX + 0.02  # Adjust line end to center of label
    lineX = [0.5 * cos(angle), lineEndX]
    lineY = [0.5 + (0.5 * sin(angle)), labelY]
    pieChart.line(x=lineX, y=lineY, line_width=1, color=row['color'])

    # Set label alignment for book name
    label = Label(x=labelX, y=labelY, text=row['book'],
                  text_align=alignment, text_baseline="middle", text_font_size="10pt")
    pieChart.add_layout(label)

    # Position total count within the wedge
    countX = 0.35 * cos(angle)           # Position count text within the wedge, closer to the center
    countY = 0.5 + (0.35 * sin(angle))   # Adjust y-position by 0.5 to account for shifted chart center
    countLabel = Label(x=countX, y=countY, text=str(row['count']),
                       text_align="center", text_baseline="middle", text_font_size="9pt", text_color="black")
    pieChart.add_layout(countLabel)

# Adjust grid/axis settings
pieChart.axis.axis_label = None
pieChart.axis.visible = False
pieChart.grid.grid_line_color = None

# Show plot in notebook
show(pieChart)


# ## 4.6 - Provide a download link for the pie-chart
#
# After executing the previous cell, the object `pieChart` representing the pie chart has been created. The following cell creates a download button: it calls a function that base64-encodes the chart data so that it can be downloaded as an interactive HTML file for offline usage.

# In[16]:

from IPython.display import HTML
import base64   # used to encode the data to be downloaded
from bokeh.embed import file_html
from bokeh.resources import CDN

def createDownloadLink(htmlContent, fileName, documentTitle, buttonText):
    # Convert plot to HTML string
    htmlContent = file_html(htmlContent, CDN, documentTitle)

    # Encode the HTML content to base64
    b64Html = base64.b64encode(htmlContent.encode()).decode()

    # Create the HTML download link (a data-URI anchor wrapping a button)
    downloadLink = f'''
    <a download="{fileName}" href="data:text/html;base64,{b64Html}">
        <button>{buttonText}</button>
    </a>
    '''
    return HTML(downloadLink)

# Display the download link in the notebook
createDownloadLink(pieChart, 'light_related_lemmas_per_book.html', 'Distribution of \'light-related\' lemmas per book', 'Download pie-chart')


# # 5 - Exploring nodes with multiple Louw-Nida classifications
# ##### [Back to ToC](#TOC)
#
# This section delves into cases where multiple Louw-Nida semantic classification codes are associated with a single word node. We will approach this exploration step by step.

# ## 5.1 - Create a dictionary for lemmas with multiple Louw-Nida classifications
#
# The first step is performed by the following script, which is designed to analyze and count occurrences of lemmas according to their associated Louw-Nida semantic classification codes. By collecting frequency data for each lemma, particularly where multiple LN codes are assigned, the script supports more detailed linguistic analysis of word usage patterns across semantic categories.
#
# At a high level, the script iterates through each word in the Text-Fabric dataset, extracting its lemma and any associated LN codes. It then counts these codes for each lemma, storing the results in a structured format that captures both total counts and per-code counts.
# The final output is a dictionary where each lemma is mapped to its cumulative count and individual LN code frequencies, providing a foundation for further exploration of lexical patterns. Detailed operations are commented directly within the script for clarity.

# In[17]:

from collections import defaultdict, Counter

# Initialize a dictionary to store lemma information with counts per ln code
lemmaInfoDict = defaultdict(lambda: {"totalCount": 0, "lnCounts": Counter()})

# Iterate over each word in the Greek New Testament
for word in F.otype.s("word"):
    lemma = F.lemma.v(word)  # get the lemma associated with the current word
    ln = F.ln.v(word)
    if ln is not None:
        # Split multiple `ln` codes and only process if there are two or more codes
        lnCodes = ln.split()
        if len(lnCodes) >= 2:
            # Count each `ln` code for the lemma
            lemmaInfoDict[lemma]["lnCounts"].update(lnCodes)
            # Increment totalCount by the number of `ln` codes found
            lemmaInfoDict[lemma]["totalCount"] += len(lnCodes)

# Format the result as needed
result = {
    lemma: (
        info["totalCount"],
        dict(info["lnCounts"])  # convert Counter to regular dictionary for readability
    )
    for lemma, info in lemmaInfoDict.items()
}


# ## 5.2 - Display the dictionary (optional)
#
# To examine the created dictionary, you can run the following cell by removing the hash:

# In[23]:

# print(result)


# ## 5.3 - Explore with an interactive heatmap
#
# In this section we generate an interactive heatmap that visualizes the frequency of lemmas in the Greek New Testament, categorized by their top-level Louw-Nida (LN) semantic classification codes. Classifications are grouped based on their top-level category (i.e., the part of the LN code found before the dot) to create an insightful plot. Additional information is available for each data point, shown when hovering over it.
#
# ### 5.3.1 - Defining the mapping dictionary
#
# Before running the script to create the plot, a dictionary is defined to map each Louw-Nida top-level domain number to a brief description.<sup>2</sup>

# In[19]:

# The following script will produce a dictionary of Louw-Nida top-level domains
# The structure of the dictionary is:
#    louwNidaMapping = {
#        numeric (key) : "description"
#        ...
#    }

import requests
from bs4 import BeautifulSoup

# Debugging mode (False, True)
debug = False

# URL of the Louw-Nida classifications page
url = "https://www.laparola.net/greco/louwnida.php"

# Retrieve the webpage content
response = requests.get(url)
if debug: print(f"Retrieving URL {url} returned {response}")
response.raise_for_status()  # Check for request errors

# Parse the HTML content with BeautifulSoup
soup = BeautifulSoup(response.text, "html.parser")

# Initialize an empty dictionary
louwNidaMapping = {}

# Find all <h3> elements that contain the Louw-Nida classification data

for entry in soup.find_all("h3"):
    # Extract the number from the <a> tag within the <h3> tag
    numberTag = entry.find("a")
    descriptionText = entry.get_text(strip=True)

    # Ensure there's content to process
    if numberTag and descriptionText:
        # Attempt to parse the number and description
        keyText = numberTag.get_text(strip=True)
        try:
            # Convert the number to an integer
            key = int(keyText)
        except ValueError:
            # If conversion fails, skip this entry
            if debug: print(f"Skipping entry due to non-numeric key: {keyText}")
            continue

        # Get description by removing the number portion from the full text
        description = descriptionText.replace(keyText, "", 1).strip(' "')

        # Add to dictionary
        louwNidaMapping[key] = description
        if debug: print(f"Added classification: {key}: {description}")

if debug: print(f"Resulting dictionary: {louwNidaMapping}")


# ### 5.3.2 - Plot the heatmap
#
# The following script first prepares the data by extracting each lemma's occurrence count and its top-level LN code, then organizes it into a structured format. The counts are normalized within each lemma, allowing for a relative comparison across semantic categories. Using Bokeh, the script plots lemmas along the y-axis and LN codes along the x-axis, with color intensity representing normalized frequency, accompanied by a color bar for reference.

# In[20]:

import pandas as pd
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource, ColorBar
from bokeh.transform import linear_cmap
from bokeh.resources import INLINE
from bokeh.palettes import Viridis256

# Enable Bokeh output for Jupyter Notebook and JupyterLab
output_notebook(resources=INLINE)

# Prepare data dictionary for DataFrame creation
data = {
    'lemma': [],
    'topLevelLnCode': [],
    'count': []
}

# Populate data dictionary with lemma, top-level LN code, and count
for lemma, (totalCount, lnDict) in result.items():
    for lnCode, count in lnDict.items():
        topLevelCode = int(lnCode.split('.')[0])  # Extract top-level LN code as integer
        data['lemma'].append(lemma.lower())       # Convert lemma to lowercase
        data['topLevelLnCode'].append(topLevelCode)
        data['count'].append(count)

# Create DataFrame and normalize counts per lemma
df = pd.DataFrame(data)
dfGrouped = df.groupby(['lemma', 'topLevelLnCode'], as_index=False)['count'].sum()
dfGrouped['normalizedCount'] = dfGrouped.groupby('lemma')['count'].transform(lambda x: x / x.sum())

# Map top-level LN codes to descriptions
dfGrouped['louwNidaDescription'] = dfGrouped['topLevelLnCode'].map(louwNidaMapping)

# Sort lemmas alphabetically and LN codes numerically for plot arrangement
dfGrouped['lemma'] = pd.Categorical(dfGrouped['lemma'], categories=sorted(dfGrouped['lemma'].unique()))
dfGrouped['topLevelLnCode'] = pd.Categorical(dfGrouped['topLevelLnCode'], ordered=True,
                                             categories=sorted(set(dfGrouped['topLevelLnCode']), key=int))

# Set plot height dynamically based on the number of unique lemmas
numLemmas = len(dfGrouped['lemma'].unique())
plotHeight = 20 * numLemmas  # Adjust multiplier for spacing

# Prepare data source for plotting
sortedLnCodes = [str(code) for code in sorted(dfGrouped['topLevelLnCode'].cat.categories)]
source = ColumnDataSource(dfGrouped)

# Initialize Bokeh figure with dynamic height and increased width
lnHeatmap = figure(
    width=1250,
    height=plotHeight,
    title="Per-Lemma Normalized Heatmap by Top-Level LN Code",
    x_range=sortedLnCodes,  # Sorted x-axis labels
    y_range=sorted(dfGrouped['lemma'].unique(), reverse=True),
    tools="hover",
    tooltips=[("Lemma", "@lemma"), ("LN Code", "@topLevelLnCode"),
              ("LN Description", "@louwNidaDescription"), ("Fraction", "@normalizedCount{0.00}")],
)

# Rotate x-axis labels for readability
lnHeatmap.xaxis.major_label_orientation = 1.57  # 90 degrees in radians

# Define color mapper for heatmap
mapper = linear_cmap(field_name='normalizedCount', palette=Viridis256, low=0, high=1)

# Add heatmap rectangles to the plot
lnHeatmap.rect(x="topLevelLnCode", y="lemma", width=1, height=1, source=source,
               fill_color=mapper, line_color=None)

# Add color bar for normalized count reference
colorBar = ColorBar(color_mapper=mapper['transform'], width=8, location=(0, 0))
lnHeatmap.add_layout(colorBar, 'right')

# Display the plot
show(lnHeatmap)


# ### 5.3.3 - Provide a download link for the heatmap
#
# After executing the previous cell, which displays the heatmap, running the following cell will create a download button allowing the heatmap to be downloaded as an interactive HTML file for offline usage.

# In[21]:

from IPython.display import HTML
import base64   # used to encode the data to be downloaded
from bokeh.embed import file_html
from bokeh.resources import CDN

def createDownloadLink(htmlContent, fileName, documentTitle, buttonText):
    # Convert plot to HTML string
    htmlContent = file_html(htmlContent, CDN, documentTitle)

    # Encode the HTML content to base64
    b64Html = base64.b64encode(htmlContent.encode()).decode()

    # Create the HTML download link (a data-URI anchor wrapping a button)
    downloadLink = f'''
    <a download="{fileName}" href="data:text/html;base64,{b64Html}">
        <button>{buttonText}</button>
    </a>
    '''
    return HTML(downloadLink)

# Display the download link in the notebook
createDownloadLink(lnHeatmap, 'louw_nida_heatmap.html', 'Interactive Louw-Nida Heatmap', 'Download Louw-Nida heatmap')


# # 6 - Attribution and footnotes
# ##### [Back to ToC](#TOC)
#
# #### Footnotes:
#
# <sup>1</sup> Johannes P. Louw and Eugene Albert Nida, *Greek-English Lexicon of the New Testament: Based on Semantic Domains* (New York: United Bible Societies, 1996).
#
# <sup>2</sup> The dictionary is created from data available on [Louw-Nida Lexicon @ laparola.net](https://www.laparola.net/greco/louwnida.php).

# # 7 - Required libraries
# ##### [Back to ToC](#TOC)
#
# The scripts in this notebook require (besides `text-fabric`) the following Python libraries to be installed in the environment:
#
#     base64
#     bokeh
#     bs4
#     collections
#     IPython
#     pandas
#     requests
#
# You can install any missing library from within Jupyter Notebook using either `pip` or `pip3`.

# # 8 - Notebook and environment details
# ##### [Back to ToC](#TOC)
#
# <table>
#   <tr><td>Author</td><td>Tony Jurg</td></tr>
#   <tr><td>Version</td><td>1.1</td></tr>
#   <tr><td>Date</td><td>14 November 2024</td></tr>
# </table>
#
# The following cell displays the active Anaconda environment along with a list of all installed packages and their versions within that environment. # In[22]: import subprocess from IPython.display import display, HTML # Display the active conda environment get_ipython().system('conda env list | findstr "*"') # Run conda list and capture the output condaListOutput = subprocess.check_output("conda list", shell=True).decode("utf-8") # Wrap the output with
<details> and <pre> HTML tags
htmlOutput = "<details><summary>Click to view installed packages</summary><pre>"
htmlOutput += condaListOutput
htmlOutput += "</pre></details>"

# Display the HTML in the notebook
display(HTML(htmlOutput))