#!/usr/bin/env python
# coding: utf-8
# # Hapaxes in parasha #44: Devarim (Deut. 1:1-3:22)
# ## Table of Contents (ToC)
#
# * 1 - Introduction
# * 2 - Load Text-Fabric app and data
# * 3 - Performing the queries
# * 4 - Required libraries
# * 5 - Further reading
# # 1 - Introduction
# ##### [Back to ToC](#TOC)
#
# A *hapax legomenon* (ἅπαξ λεγόμενον) is a term used in linguistics and philology for a word or expression that occurs only once within a specific context. Usually this context is the complete works of an author or a well-defined corpus of literature. The term comes from Greek, where "hapax" means "once" and "legomenon" means "something said." In this Notebook, the context used to determine the *hapax legomena* is the full text of the Tenach, or more precisely, the full Biblia Hebraica Stuttgartensia.
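#
# To make the definition concrete, the following minimal sketch counts how often each item occurs in a small, made-up list of words and keeps those that occur exactly once. The function name `find_hapax_legomena` and the sample data are hypothetical and not part of the BHSA analysis; in this Notebook the counting over the whole corpus is already provided by the BHSA feature `freq_lex`.

```python
from collections import Counter

def find_hapax_legomena(tokens):
    """Return the tokens that occur exactly once, in order of first appearance."""
    counts = Counter(tokens)  # Counter keeps first-seen order (Python 3.7+)
    return [token for token in counts if counts[token] == 1]

# hypothetical toy corpus, for illustration only
corpus = ["word", "king", "word", "land", "day", "king"]
print(find_hapax_legomena(corpus))  # ['land', 'day']
```

# Applied to a real corpus, the tokens would normally be lexemes rather than surface forms, so that inflected forms of the same word are counted together.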
# # 2 - Load Text-Fabric app and data
# ##### [Back to ToC](#TOC)
#
# The following code will load the Text-Fabric version of the [Biblia Hebraica Stuttgartensia (Amstelodamensis)](https://etcbc.github.io/bhsa/).
# In[1]:
get_ipython().run_line_magic('load_ext', 'autoreload')
get_ipython().run_line_magic('autoreload', '2')
# In[2]:
# Loading the Text-Fabric code
# Note: it is assumed Text-Fabric is installed in your environment.
from tf.fabric import Fabric
from tf.app import use
# In[3]:
# load the app and data
BHSA = use("etcbc/BHSA", hoist=globals())
# # 3 - Performing the queries
# ##### [Back to ToC](#TOC)
#
# The Text-Fabric code in this Notebook queries all words in the first and the last verse of this parasha. From these results (two lists of tuples), the boundaries of the parasha (its first and last word node) are determined. The feature `freq_lex`, which gives the number of occurrences of a word's lexeme in the whole BHSA, is then examined for every word node within this range, including the last one. Whenever `freq_lex` equals 1, the word is reported as a *hapax legomenon*, together with its gloss and the verse it occurs in. The verse reference makes it easy to look up and review the verse, for example in the STEP Bible.
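#
# The boundary and range logic described above can be sketched without Text-Fabric. In this minimal sketch the search results and the `freq_lex` values are replaced by small hypothetical Python structures (lists of `(verse, word)` tuples and a dict keyed by node number), and the helper names `word_boundaries` and `hapax_nodes` are made up for illustration.

```python
def word_boundaries(start_results, end_results):
    """Return the first and last word node from two lists of (verse, word) tuples."""
    first_word = start_results[0][1]  # word node of the first result in the first verse
    last_word = end_results[-1][1]    # word node of the last result in the last verse
    return first_word, last_word

def hapax_nodes(first_word, last_word, freq_lex):
    """Return word nodes in the inclusive range whose lexeme frequency equals 1."""
    return [node for node in range(first_word, last_word + 1) if freq_lex.get(node) == 1]

# hypothetical stand-ins for the search results and the freq_lex feature
start_results = [(900, 1), (900, 2)]
end_results = [(901, 5), (901, 6)]
freq_lex = {1: 3, 2: 1, 3: 10, 4: 1, 5: 2, 6: 1}

first, last = word_boundaries(start_results, end_results)
print(first, last)                         # 1 6
print(hapax_nodes(first, last, freq_lex))  # [2, 4, 6]
```

# The upper bound is `last + 1` because Python's `range` excludes its end value; with `range(first, last)` node 6 would be skipped.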
# In[10]:
# find first word node for this parasha
startQuery = '''
verse book=Deuteronomium chapter=1 verse=1
word
'''
startResults = BHSA.search(startQuery)
# each result is a (verse, word) tuple; take the word node of the first result
startNode=startResults[0][1]
# In[11]:
# find last word node for this parasha
endQuery = '''
verse book=Deuteronomium chapter=3 verse=22
word
'''
endResults = BHSA.search(endQuery)
# each result is a (verse, word) tuple; take the word node of the last result
endNode=endResults[-1][1]
# In[12]:
# some gloss values contain angle brackets; escape them so they are not
# interpreted as HTML tags when the Markdown table is rendered
def escape_markdown(text):
    return text.replace("<", "&lt;").replace(">", "&gt;")
# now iterate over this range of nodes (endNode included)
numberOfHapax=0
# format the table using Markdown
tableContent="Verse|Word|Gloss\n---|---|---\n"
for node in range(startNode,endNode+1):
    freq=F.freq_lex.v(node)
    if freq==1:
        numberOfHapax+=1
        # sectionTuple is (book, chapter, verse)
        sectionTuple=T.sectionFromNode(node)
        verseReference=f"{sectionTuple[0]} {sectionTuple[1]}:{sectionTuple[2]}"
        tableContent+=f"{verseReference} | {F.g_word_utf8.v(node)}|{escape_markdown(F.gloss.v(node))}\n"
BHSA.dm(tableContent)
print(f"{numberOfHapax} hapaxes found")
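#
# The same Markdown table can also be assembled from plain Python data. This minimal sketch uses the standard library's `html.escape` (with `quote=False`, so only `&`, `<` and `>` are replaced) instead of the hand-written `escape_markdown`; the function name `build_hapax_table` and the sample row are hypothetical.

```python
import html

def build_hapax_table(rows):
    """Build a Markdown table from (verse reference, word, gloss) tuples."""
    lines = ["Verse|Word|Gloss", "---|---|---"]
    for verse, word, gloss in rows:
        # escape &, < and > so a gloss written in angle brackets is shown literally
        lines.append(f"{verse}|{word}|{html.escape(gloss, quote=False)}")
    return "\n".join(lines) + "\n"

# hypothetical sample row, for illustration only
table = build_hapax_table([("Deuteronomium 1:1", "word", "<gloss>")])
print(table)
```

# In the Notebook, a string built this way can be passed to `BHSA.dm` for rendering, just like `tableContent` above.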
# # 4 - Required libraries
# ##### [Back to ToC](#TOC)
#
# Besides `text-fabric`, the scripts in this Notebook do not require any additional Python libraries.
#
# You can install any missing library from within Jupyter Notebook using either `pip` or `pip3`.
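#
# A quick way to check from within the Notebook whether a module can be imported is `importlib.util.find_spec`, which returns `None` when a top-level module is not installed. This is a minimal sketch; the helper name `missing_modules` is hypothetical, and `tf` is the import name used for Text-Fabric in the code above.

```python
import importlib.util

def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# 'tf' is the package imported above (from tf.app import use)
print(missing_modules(["tf"]))
```

# An empty list means every listed module is available; any module that is reported can then be installed with `pip`.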
# # 5 - Further reading
# ##### [Back to ToC](#TOC)
#
# A discussion of hapax legomena, including details about ten hapaxes in the Hebrew Bible, can be found at [TheTorah.com](https://www.thetorah.com/article/hapax-legomena-ten-biblical-examples).