Notebook

displayDeltaBetweenVersions ¶

Tool to determine what features and featurevalues were changed between two Text-Fabric datasets¶

Version: 0.2 (26 September 2024); adding comparing description and datatype Version: 0.1 (25 September 2024); implementation of enhancement feature 17.

Table of content ¶

1 - Introduction
2 - Setting up the environment
3 - Load the two Text-Fabric datasets
- 3.1 - Load the first dataset
- 3.2 - Load the second dataset
4 - Create dictionaries for the two datasets
- 4.1 - Setting up some global variables
- 4.2 - Store all relevant data into a dictionary
5 - Report the delta between the datasets
- 5.1 - Generate and view the report
- 5.2 - Download the report
6 - Change log
7 - Licence

1 - Introduction ¶

Back to TOC ¶

The main steps in producing the comparison are:

Load the two Text-Fabric database.
Construct two python dictionaries stroring all the relevant data from both versions.
Compare the two dictionaries.
Print the results in an infromative format.

2 - Preparing the environment ¶

Back to TOC ¶

Your environment should (for obvious reasons) include the Python package Text-Fabric. If not installed yet, it can be installed using pip.

Further it is required to be able to invoke the Text-Fabric data set (either from an online resource, or from a localy stored copy). There are no further requirements as the scripts basicly operate 'stand alone'.

2.1 - Setting script version ¶

Set the version number and creation date of this script,

Required user action:¶

Run the following cell to store details on the script version into memory.

In [7]:

scriptVersion="0.2"
scriptDate="26 September 2024"

2.2 - Setting script parameters ¶

Set some parameters used by the script.

Required user action:¶

Review the options in the following cell and execute the cell.

In [9]:

# This switch can be set to 'True' if you want additional information, such as dictionary entries to be printed. For basic output, set this switch to 'False'.
verbose=False

# Limit the number of entries in the frequency tables per node type (set to 0 for 'no limit')
tableLimit=10

2.3 - Load Text-Fabric code ¶

Required user action:¶

Load the Text-Fabric code in this notebook by running the following two cells.

In [11]:

%load_ext autoreload
%autoreload 2

In [13]:

# Loading the Text-Fabric code
# Note: it is assumed Text-Fabric is installed in your environment
from tf.fabric import Fabric
from tf.app import use

3 - Load the two Text-Fabric datasets ¶

Back to TOC ¶

In this phase, the two Text-Fabric datasets are loaded. Which datasets are loaded is specified in the parameters, as detailed below.:

Ax = use ("{GitHub user name}/{repository name}", version="{version}")

In this notebook, we will load the two different versions into two object, respectively named A1 and A2. One of the consequences of working with two Text-Fabric datasets in the same Python environment is that we need to address them individually when using advanced API functions. That also means the invocation needs to exclude the hoist=globals() option.

For various options regarding other possible storage locations, and other load options, see the documentation for function use.

3.1 - Load the first dataset ¶

Required user action:¶

Update the next cell to match the first version of the Text-Fabric dataset you wish to compare and then execute the cell.

In [20]:

# Load the app and data from the first dataset
A1 = use ("tonyjurg/Nestle1904LFT", version="0.7")

Locating corpus resources ...

app: ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/app

data: ~/text-fabric-data/github/tonyjurg/Nestle1904LFT/tf/0.7

TF: TF API 12.5.3, tonyjurg/Nestle1904LFT/app v3, Search Reference
Data: tonyjurg - Nestle1904LFT 0.7, Character table, Feature docs

Node types

Name	# of nodes	# slots / node	% coverage
book	27	5102.93	100
chapter	260	529.92	100
verse	7943	17.35	100
sentence	8011	17.20	100
wg	105430	6.85	524
word	137779	1.00	100

Sets: no custom sets
Features:

Nestle 1904 (Low Fat Tree)

after

str

✅ Characters (eg. punctuations) following the word

book

str

✅ Book name (in English language)

booknumber

int

✅ NT book number (Matthew=1, Mark=2, ..., Revelation=27)

bookshort

str

✅ Book name (abbreviated)

case

str

✅ Gramatical case (Nominative, Genitive, Dative, Accusative, Vocative)

chapter

int

✅ Chapter number inside book

clausetype

str

✅ Clause type details (e.g. Verbless, Minor)

containedclause

str

🆗 Contained clause (WG number)

degree

str

✅ Degree (e.g. Comparitative, Superlative)

gloss

str

✅ English gloss

str

✅ Gramatical gender (Masculine, Feminine, Neuter)

headverse

str

✅ Start verse number of a sentence

junction

str

✅ Junction data related to a wordgroup

lemma

str

✅ Lexeme (lemma)

lex_dom

str

✅ Lexical domain according to Semantic Dictionary of Biblical Greek, SDBG (not present everywhere?)

str

✅ Lauw-Nida lexical classification (not present everywhere?)

markafter

str

🆗 Text critical marker after word

markbefore

str

🆗 Text critical marker before word

markorder

str

Order of punctuation and text critical marker

monad

int

✅ Monad (smallest token matching word order in the corpus)

mood

str

✅ Gramatical mood of the verb (passive, etc)

morph

str

✅ Morphological tag (Sandborg-Petersen morphology)

nodeID

str

✅ Node ID (as in the XML source data)

normalized

str

✅ Surface word with accents normalized and trailing punctuations removed

str

✅ Gramatical number (Singular, Plural)

number

str

✅ Gramatical number of the verb (e.g. singular, plural)

otype

str

person

str

✅ Gramatical person of the verb (first, second, third)

punctuation

str

✅ Punctuation after word

ref

str

✅ Value of the ref ID (taken from XML sourcedata)

reference

str

✅ Reference (to nodeID in XML source data, not yet post-processes)

roleclausedistance

str

⚠️ Distance to the wordgroup defining the syntactical role of this word

sentence

int

✅ Sentence number (counted per chapter)

str

✅ Part of Speech (abbreviated)

sp_full

str

✅ Part of Speech (long description)

strongs

str

✅ Strongs number

subj_ref

str

🆗 Subject reference (to nodeID in XML source data, not yet post-processes)

tense

str

✅ Gramatical tense of the verb (e.g. Present, Aorist)

type

str

✅ Gramatical type of noun or pronoun (e.g. Common, Personal)

unicode

str

✅ Word as it apears in the text in Unicode (incl. punctuations)

verse

int

✅ Verse number inside chapter

voice

str

✅ Gramatical voice of the verb (e.g. active,passive)

wgclass

str

✅ Class of the wordgroup (e.g. cl, np, vp)

wglevel

int

🆗 Number of the parent wordgroups for a wordgroup

wgnum

int

✅ Wordgroup number (counted per book)

wgrole

str

✅ Syntactical role of the wordgroup (abbreviated)

wgrolelong

str

✅ Syntactical role of the wordgroup (full)

wgrule

str

✅ Wordgroup rule information (e.g. Np-Appos, ClCl2, PrepNp)

wgtype

str

✅ Wordgroup type details (e.g. group, apposition)

word

str

✅ Word as it appears in the text (excl. punctuations)

wordlevel

str

🆗 Number of the parent wordgroups for a word

wordrole

str

✅ Syntactical role of the word (abbreviated)

wordrolelong

str

✅ Syntactical role of the word (full)

wordtranslit

str

🆗 Transliteration of the text (in latin letters, excl. punctuations)

wordunacc

str

✅ Word without accents (excl. punctuations)

oslots

none

Settings:

specified

apiVersion: 3
appName: tonyjurg/Nestle1904LFT
appPath:
C:/Users/tonyj/text-fabric-data/github/tonyjurg/Nestle1904LFT/app
commit: g95357e8bf298b090341cf277596be01f7f1f5ce9
css: ''
dataDisplay:
- excludedFeatures:
  - orig_order
  - verse
  - book
  - chapter
- noneValues:
  - none
  - unknown
  - no value
  - NA
  - ''
- showVerseInTuple: 0
- textFormat: text-orig-full
docs:
- docBase: https://github.com/tonyjurg/Nestle1904LFT/blob/main/docs/
- docPage: about
- docRoot: https://github.com/tonyjurg/Nestle1904LFT
- featureBase:
  https://github.com/tonyjurg/Nestle1904LFT/blob/main/docs/features/<feature>.md
interfaceDefaults: {fmt: layout-orig-full}
isCompatible: True
local: local
localDir:
C:/Users/tonyj/text-fabric-data/github/tonyjurg/Nestle1904LFT/_temp
provenanceSpec:
- branch: main
- corpus: Nestle 1904 (Low Fat Tree)
- doi: 10.5281/zenodo.10182594
- org: tonyjurg
- relative: /tf
- repo: Nestle1904LFT
- repro: Nestle1904LFT
- version: 0.7
- webBase: https://learner.bible/text/show_text/nestle1904/
- webHint: Show this on the Bible Online Learner website
- webLang: en
- webUrl:
  https://learner.bible/text/show_text/nestle1904/<1>/<2>/<3>
- webUrlLex: {webBase}/word?version={version}&id=<lid>
release: v0.8
typeDisplay:
- book:
  - condense: True
  - hidden: True
  - label: {book}
  - style: ''
- chapter:
  - condense: True
  - hidden: True
  - label: {chapter}
  - style: ''
- sentence:
  - hidden: 0
  - label: #{sentence} (start: {book} {chapter}:{headverse})
  - style: ''
- verse:
  - condense: True
  - excludedFeatures: chapter verse
  - label: {book} {chapter}:{verse}
  - style: ''
- wg:
  - hidden: 0
  - label:
    #{wgnum}: {wgtype} {wgclass} {clausetype} {wgrole} {wgrule} {junction}
  - style: ''
- word:
  - base: True
  - features: lemma
  - featuresBare: gloss
  - surpress: chapter verse
writing: grc

3.2 - Load the second dataset ¶

Required user action:¶

Update the next cell to match the second version of the Text-Fabric datasets you want to compare and then execute the cell.

In [16]:

# Load the app and data from the second version in the set for comparison
A2 = use ("saulocantanhede/tfgreek2", version="0.5.7")

Locating corpus resources ...

app: ~/text-fabric-data/github/saulocantanhede/tfgreek2/app

data: ~/text-fabric-data/github/saulocantanhede/tfgreek2/tf/0.5.7

TF: TF API 12.5.3, saulocantanhede/tfgreek2/app v3, Search Reference
Data: saulocantanhede - tfgreek2 0.5.7, Character table, Feature docs

Node types

Name	# of nodes	# slots / node	% coverage
book	27	5102.93	100
chapter	260	529.92	100
verse	7944	17.34	100
sentence	19703	13.82	198
group	8945	7.01	46
clause	30814	7.17	160
wg	106868	6.88	533
phrase	69007	1.90	95
subphrase	116178	1.60	135
word	137779	1.00	100

Sets: no custom sets
Features:

Nestle 1904 Greek New Testament

after

str

material after the end of the word

appositioncontainer

int

1 if it is an apposition container

articular

int

1 if the sentence, group, clause, phrase or wg has an article

before

str

this is XML attribute before

book

str

book name (full name)

bookshort

str

book name (abbreviated) from ref attribute in xml

case

str

grammatical case

chapter

int

chapter number, from ref attribute in xml

clausetype

str

clause type

cls

str

this is XML attribute cls

cltype

str

clause type

criticalsign

str

this is XML attribute criticalsign

crule

str

clause rule (from xml attribute Rule)

degree

str

grammatical degree

discontinuous

int

1 if the word is out of sequence in the xml

domain

str

domain

framespec

str

this is XML attribute framespec

function

str

this is XML attribute function

gender

str

grammatical gender

gloss

str

English gloss (BGVB)

str

xml id

junction

str

type of junction

lang

str

language the text is in

lemma

str

lexical lemma

lemmatranslit

str

transliteration of the word lemma

str

mood

str

verbal mood

morph

str

morphological code

nodeid

int

node id (as in the XML source data

normalized

str

lemma normalized

note

str

annotation of linguistic nature

num

int

generated number (not in xml): book: (Matthew=1, Mark=2, ..., Revelation=27); sentence: numbered per chapter; word: numbered per verse.

number

str

grammatical number

otype

str

person

str

grammatical person

punctuation

str

this is XML attribute punctuation

ref

str

biblical reference with word counting

referent

str

number of referent

rela

str

this is XML attribute rela

role

str

role

rule

str

syntactical rule

str

part-of-speach

strong

int

strong number

subjrefspec

str

this is XML attribute subjrefspec

tense

str

verbal tense

text

str

the text of a word

trans

str

translation of the word surface text according to the Berean Interlinear Bible

translit

str

transliteration of the word surface text

typ

str

this is XML attribute typ

type

str

morphological type (on word), syntactical type (on sentence, group, clause, phrase or wg)

unaccent

str

word in unicode characters without accents and diacritical markers

unicode

str

word in unicode characters plus material after it

variant

str

this is XML attribute variant

verse

int

verse number, from ref attribute in xml

voice

str

verbal voice

frame

str

frame

oslots

none

parent

none

parent relationship between words

subjref

none

number of subject referent

Settings:

specified

apiVersion: 3
appName: saulocantanhede/tfgreek2
appPath:
C:/Users/tonyj/text-fabric-data/github/saulocantanhede/tfgreek2/app
commit: 352af50c8ce86edd8a0e2d58519453a8f53ee084
css: ''
dataDisplay:
- excludedFeatures: []
- noneValues:
  - none
  - unknown
  - no value
  - NA
- sectionSep1:
- sectionSep2: :
- textFormat: text-orig-full
docs:
- docBase: https://github.com/saulocantanhede/tfgreek2/tree/main/docs
- docPage: about
- docRoot: https://github.com/saulocantanhede/tfgreek2
- featureBase:
  https://github.com/saulocantanhede/tfgreek2/tree/main/docs/features/<feature>.md
- featurePage: README
interfaceDefaults: {fmt: text-orig-full}
isCompatible: True
local: local
localDir:
C:/Users/tonyj/text-fabric-data/github/saulocantanhede/tfgreek2/_temp
provenanceSpec:
- branch: main
- corpus: Nestle 1904 Greek New Testament
- doi: 10.5281/zenodo.notyet
- moduleSpecs: []
- org: saulocantanhede
- relative: /tf
- repo: tfgreek2
- repro: tfgreek2
- version: 0.5.7
- webBase: https://learner.bible/text/show_text/nestle1904/
- webHint: Show this on the website
- webLang: en
- webUrl:
  https://learner.bible/text/show_text/nestle1904/<1>/<2>/<3>
- webUrlLex: {webBase}/word?version={version}&id=<lid>
release: 0.5.7
typeDisplay:
- clause:
  - condense: True
  - label: {typ} {function} {rela} \\ {cls} {role} {junction}
  - style: ''
- group:
  - label: {typ} {function} {rela} \\ {type} {role} {rule}
  - style: ''
- phrase:
  - condense: True
  - label: {typ} {function} {rela} \\ {type} {role} {rule}
  - style: ''
- sentence:
  - label: {typ} {function} {rela} \\ {role} {rule}
  - style: ''
- subphrase:
  - label: {typ} {function} {rela} \\ {type} {role} {rule}
  - style: ''
- verse:
  - condense: True
  - label: {book} {chapter}:{verse}
  - style: ''
- wg:
  - condense: True
  - label: {type} {role} {rule} {junction}
  - style: ''
- word:
  - features:
    lemma
    sp
  - featuresBare: [gloss]
writing: grc

Display is setup for viewtype syntax-view

See here for more information on viewtypes

4 - Create dictionaries for the two datasets ¶

Back to TOC ¶

Required action:¶

Execute the following cell to create dictionaries containing all relevant information for the loaded node and edge features of the two datasets.

In [22]:

import time

# Initialize both APIs
api1 = A1.api
api2 = A2.api

# Initialize empty dictionaries to store feature data for both APIs
featureDict1 = {}
featureDict2 = {}

# Define some critical variables if not already defined by execution of step 2.2
if 'tableLimit' not in globals(): tableLimit = 10 
if 'verbose' not in globals(): verbose = False
if 'scriptVersion' not in globals(): scriptVersion="not set"
if 'scriptDate' not in globals(): scriptDate="not set"

overallTime = time.time()

def getFeatureDescription(metaData):
    """
    Retrieves the description of a feature from its metadata.
    """
    return metaData.get('description', "No feature description")

def setDataType(metaData):
    """
    Determines the data type of a feature based on its metadata.
    """
    if 'valueType' in metaData:
        return "String" if metaData["valueType"] == 'str' else "Integer"
    return "Unknown"

def processFeature(feature, featureType, featureMethod, api, featureDict):
    """
    Processes a single feature and updates the feature dictionary.
    
    Parameters:
        feature (str): The name of the feature to process.
        featureType (str): Type of the feature ('Node' or 'Edge').
        featureMethod (function): Method to retrieve feature data.
        api: The API instance being processed.
        featureDict (dict): The dictionary to store feature data.
    """
    # Obtain the meta data
    featureMetaData = featureMethod(feature).meta
    featureDescription = getFeatureDescription(featureMetaData)
    dataType = setDataType(featureMetaData)

    # Initialize dictionary to store feature frequency data
    featureFrequencyDict = {}

    # Skip specific features based on type
    if not (featureType == 'Node' and feature == 'otype') and not (featureType == 'Edge' and feature == 'oslots'):
        for nodeType in api.F.otype.all:
            frequencyLists = featureMethod(feature).freqList(nodeType)
            if not isinstance(frequencyLists, int):
                if len(frequencyLists) != 0:
                    featureFrequencyDict[nodeType] = {
                        'nodetype': nodeType, 
                        'freq': frequencyLists[:tableLimit] if tableLimit > 0 else frequencyLists
                    }
            elif isinstance(frequencyLists, int):
                if frequencyLists != 0:
                    featureFrequencyDict[nodeType] = {
                        'nodetype': nodeType, 
                        'freq': [("Link", frequencyLists)]
                    }

    # Add processed feature data to the main dictionary
    featureDict[feature] = {
        'name': feature, 
        'descr': featureDescription, 
        'type': featureType, 
        'datatype': dataType, 
        'freqlist': featureFrequencyDict
    }

def process_api(api, featureDict, api_label):
    """
    Processes all node and edge features for a given API and populates the feature dictionary.
    
    Parameters:
        api: The API instance to process.
        featureDict (dict): The dictionary to store feature data.
        api_label (str): Label for the API (used in print statements).
    """
    print(f'Analyzing Node Features for {api_label}: ', end='')
    for nodeFeature in api.Fall():
        if not verbose:
            print('.', end='')  # Progress indicator
        processFeature(nodeFeature, 'Node', api.Fs, api, featureDict)
        if verbose:
            print(f'\nFeature {nodeFeature} = {featureDict[nodeFeature]}\n')
    print('\n')  # Newline after node features

    print(f'Analyzing Edge Features for {api_label}: ', end='')
    for edgeFeature in api.Eall():
        if not verbose:
            print('.', end='')  # Progress indicator
        processFeature(edgeFeature, 'Edge', api.Es, api, featureDict)
        if verbose:
            print(f'\nFeature {edgeFeature} = {featureDict[edgeFeature]}\n')
    print('\n')  # Newline after edge features

########################################################
#                     MAIN FUNCTION                    #
########################################################

# Gather generic information for first dataset (stored in API1)
print('Gathering generic details for first dataset')

# Initialize default values
corpusName1 = A1.appName
liveName1 = 'not set'
versionName1 = A1.version

# Locate corpus information for first dataset (stored in API1)
if A1.provenance:
    for parts in A1.provenance[0]: 
        if isinstance(parts, tuple):
            key, value = parts[0], parts[1]
            if verbose: print(f'API1 General info: {key}={value}')
            if key == 'corpus': corpusName1 = value
            if key == 'version': versionName1 = value
            # Value for live is a tuple
            if key == 'live': liveName1 = value[1]


# Repeat the generic information gathering for API2 if needed
print('Gathering generic details for second dataset')

# Initialize default values for API2
corpusName2 = A2.appName
liveName2 = 'not set'
versionName2 = A2.version

# Locate corpus information for API2
if A2.provenance:
    for parts in A2.provenance[0]: 
        if isinstance(parts, tuple):
            key, value = parts[0], parts[1]
            if verbose: print(f'API2 General info: {key}={value}')
            if key == 'corpus': corpusName2 = value
            if key == 'version': versionName2 = value
            # Value for live is a tuple
            if key == 'live': liveName2 = value[1]

# Process both APIs
process_api(api1, featureDict1, api_label="first dataset (stored in API1)")
process_api(api2, featureDict2, api_label="second dataset (stored in API2)")

print(f'Finished in {time.time() - overallTime:.2f} seconds.')

Gathering generic details for first dataset
Gathering generic details for second dataset
Analyzing Node Features for first dataset (stored in API1): .......................................................

Analyzing Edge Features for first dataset (stored in API1): .

Analyzing Node Features for second dataset (stored in API2): .......................................................

Analyzing Edge Features for second dataset (stored in API2): ....

Finished in 21.06 seconds.

5 - Report the delta between the datasets ¶

Back to TOC ¶

5.1 - Generate and view the report ¶

Required action:¶

Execute the following cell to create a detailed report indicating the delta between the two datasets.

In [190]:

from IPython.display import display, HTML
from datetime import datetime

# Get current date and time
current_time = datetime.now()
formatted_time = current_time.strftime("%Y-%m-%d %H:%M:%S")

# Function to compare two feature dictionaries and report datatype, freqlist, descr, and type differences
def compare_feature_dicts(dict1, dict2):
    """
    Compares two feature dictionaries and returns a report of differences,
    filtering out identical entries in both datasets, and comparing 'datatype', 'descr', 'type', and 'freqlist'.
    """
    report = {
        'only_in_dict1': [],
        'only_in_dict2': [],
        'differences_in_common': {}
    }
    
    keys1 = set(dict1.keys())
    keys2 = set(dict2.keys())
    
    report['only_in_dict1'] = sorted(keys1 - keys2)
    report['only_in_dict2'] = sorted(keys2 - keys1)
    
    common_features = keys1 & keys2
    
    for feature in common_features:
        differences = {}
        feature1 = dict1[feature]
        feature2 = dict2[feature]
        
        for key in ['descr', 'type', 'datatype']:
            value1 = feature1.get(key, None)
            value2 = feature2.get(key, None)
            if value1 != value2:
                differences[key] = {'Dataset1': value1, 'Dataset2': value2}
        
        freqlist1 = feature1.get('freqlist', {})
        freqlist2 = feature2.get('freqlist', {})
        
        freqlist_diff = {}

        for nodetype in freqlist1.keys() | freqlist2.keys():
            freq1 = dict(freqlist1.get(nodetype, {}).get('freq', []))
            freq2 = dict(freqlist2.get(nodetype, {}).get('freq', []))
            
            diff1 = [t for t in freq1.items() if t not in freq2.items()]
            diff2 = [t for t in freq2.items() if t not in freq1.items()]
            
            if diff1 or diff2:
                freqlist_diff[nodetype] = {'Dataset1': diff1, 'Dataset2': diff2}
        
        if freqlist_diff:
            differences['freqlist'] = freqlist_diff
        
        if differences:
            report['differences_in_common'][feature] = differences
    
    return report

# Function to generate HTML delta report using <details>, <summary>, and nested <ul><li> for structure with collapsible nodetypes
def generate_html_delta_report(report):
    """
    Generates an HTML delta report from the comparison with collapsible sections using <details>, <summary>, and nested <ul><li> elements,
    making both features and nodetypes collapsible.
    """
    html = []
    html.append("""
    <!DOCTYPE html>
    <html lang='en'>
    <head>
    <meta charset='UTF-8'>
    <meta name='viewport' content='width=device-width, initial-scale=1.0'>
    <title>Delta Report</title>
    <style>
        body { font-family: Arial, sans-serif; margin: 20px; }
        .only-in-1 { color: #E74C3C; }
        .only-in-2 { color: #E67E22; }
        .diff-key { color: #8E44AD; }
        .freq-type { color: #16A085; }
        .freq-value { color: #D35400; }
        details { margin-bottom: 15px; }
        summary { cursor: pointer; font-weight: bold; color: #2980B9; }
        ul { list-style: none; padding-left: 0; } /* Enforce bullet removal */
        li { margin-left: 20px; } /* Add margin for list indentation */
        .api1 { color: #3498DB; }
        .api2 { color: #1ABC9C; }
        button { margin-bottom: 15px; padding: 10px 15px; cursor: pointer; }
    </style>
    <script>
        function toggleAllDetails(expand) {
            var detailsElements = document.querySelectorAll("details");
            detailsElements.forEach(details => {
                details.open = expand;
            });
        }

        function toggleSpecificLevel(levelClass, expand) {
            var detailsElements = document.querySelectorAll(levelClass);
            detailsElements.forEach(details => {
                details.open = expand;
            });
        }
    </script>
    </head>
    <body>
    """)
    
    # get details on the two Text-Fabric dataset
    liveName1=f'{A1.appName} - {A1.version}'
    if A1.provenance: 
        for parts in A1.provenance[0]:
            if isinstance(parts, tuple):
                key, value = parts[0], parts[1]
                if key == 'live': 
                    liveName1=value[0]
                    break
                
    liveName2=f'{A2.appName} - {A2.version}'
    if A2.provenance: 
        for parts in A2.provenance[0]:
            if isinstance(parts, tuple):
                key, value = parts[0], parts[1]
                if key == 'live': 
                    liveName2=value[0]
                    break

    html.append("<h1>Delta Report</h1>")
    html.append(f"<p>Dataset 1: <span class='only-in-1'>{liveName1}</span></p>")
    html.append(f"<p>Dataset 2: <span class='only-in-2'>{liveName2}</span></p>")
   
    # Add buttons to expand or collapse all details
    html.append("<button onclick='toggleAllDetails(false)'>Collapse All</button>")
    html.append("<button onclick='toggleSpecificLevel(\".level2\", true);toggleSpecificLevel(\".level3\", false);toggleSpecificLevel(\".level4\", false)'>Expand up to second level</button>") 
    html.append("<button onclick='toggleSpecificLevel(\".level2\", true);toggleSpecificLevel(\".level3\", true);toggleSpecificLevel(\".level4\", false)'>Expand up to third level</button>")
    html.append("<button onclick='toggleAllDetails(true)'>Expand All</button>")
    
    # check for node name and  number-range differences
    
    # Initialize empty dictionaries
    nodeIntervals1 = {}
    nodeIntervals2 = {}
    # Fill the dictionaries
    for nodeType in api1.F.otype.all: 
        nodeIntervals1[nodeType] = api1.F.otype.sInterval(nodeType)
    for nodeType in api2.F.otype.all: 
        nodeIntervals2[nodeType] = api2.F.otype.sInterval(nodeType)
    # Calculate key (node name) differences
    nodes_in_1_not_in_2 = set(nodeIntervals1.keys()) - set(nodeIntervals2.keys())
    nodes_in_2_not_in_1 = set(nodeIntervals2.keys()) - set(nodeIntervals1.keys())
    # Check if either set is not empty and print if true
    if nodes_in_1_not_in_2 or nodes_in_2_not_in_1:  
        if nodes_in_1_not_in_2:
            html.append("<details class='level2' open><summary> Nodenames only in Dataset 1</summary><ul>")
            for node in nodes_in_1_not_in_2:
                html.append(f"<li class='only-in-1'>{node}</li>")
            html.append("</ul></details>")
        if nodes_in_2_not_in_1:
            html.append("<details class='level2' open><summary> Nodenames only in Dataset 2</summary><ul>")
            for node in nodes_in_2_not_in_1:
                html.append(f"<li class='only-in-2'>{node}</li>")
            html.append("</ul></details>")

    # Compare tuple content for node number differences
    common_keys = set(nodeIntervals1.keys()) & set(nodeIntervals2.keys())
    different_values = {key: {'nodeIntervals1': nodeIntervals1[key], 'nodeIntervals2': nodeIntervals2[key]} 
    for key in common_keys if nodeIntervals1[key] != nodeIntervals2[key]}
    if different_values:
        html.append("<details class='level2' open><summary>Differences in nodenumber range for common nodenames</summary><ul>")
        for key, diff in different_values.items():
            html.append(f"<li><details class='level3'><summary>Nodename {key}</summary><ul>")
            html.append(f"<li>Dataset 1: <span class='only-in-1'>{diff['nodeIntervals1']}</span></li>")
            html.append(f"<li>Dataset 2: <span class='only-in-2'>{diff['nodeIntervals2']}</span></li>")
            html.append(f"</ul></details></li>")
        html.append("</ul></details>")
    
    # check for feature differences
    # Features only in dict1
    if report['only_in_dict1']:
        html.append("<details class='level2' open><summary>Features only in Dataset 1</summary><ul>")
        for feature in report['only_in_dict1']: html.append(f"<li class='only-in-1'>{feature}</li>")
        html.append("</ul></details>")

    # Features only in dict2
    if report['only_in_dict2']:
        html.append("<details class='level2' open><summary>Features only in Dataset 2</summary><ul>")
        for feature in report['only_in_dict2']: html.append(f"<li class='only-in-2'>{feature}</li>")
        html.append("</ul></details>")

    # Differences in common features
    if report['differences_in_common']:
        html.append("<details class='level2' open>")
        html.append("<summary>Differences in Common Features</summary>")
        html.append("<ul>")
        for feature, diffs in report['differences_in_common'].items():
            html.append(f"<li><details class='level3'><summary>Feature: {feature}</summary>")
            html.append("<ul>")
            for key, change in diffs.items():
                if key in ['descr', 'type', 'datatype']:
                    html.append(f"<li><strong style='color: #2980B9; '>{key.capitalize()} Difference:</strong>")
                    html.append("<ul>")
                    html.append(f"<li>Dataset 1: <span class='only-in-1'>{change['Dataset1']}</span></li>")
                    html.append(f"<li>Dataset 2: <span class='only-in-2'>{change['Dataset2']}</span></li>")
                    html.append("</ul></li>")
                elif key == 'freqlist':
                    freqlist = change
                    html.append("<li><details class='level4'><summary>Frequency List Differences</summary><ul>")
                    for nodetype, freq_diff in freqlist.items():
                        html.append(f"<li><details class='level4'><summary>Nodetype: {nodetype}</summary>")
                        html.append("<ul>")
                        dataset1_val = ', '.join([f"{t[0]}: {t[1]}" for t in freq_diff['Dataset1']]) if freq_diff['Dataset1'] else 'None'
                        dataset2_val = ', '.join([f"{t[0]}: {t[1]}" for t in freq_diff['Dataset2']]) if freq_diff['Dataset2'] else 'None'
                        html.append(f"<li>Dataset 1: <span class='only-in-1'>{dataset1_val}</span></li>")
                        html.append(f"<li>Dataset 2: <span class='only-in-2'>{dataset2_val}</span></li>")
                        html.append("</ul></details></li>")
                    html.append("</ul></details></li>")
            html.append("</ul></details></li>")
        html.append("</ul></details>")
    html.append(f"<p><small>Created on {formatted_time} with <a href='https://github.com/tonyjurg/Doc4TF/blob/main/tools/determineDeltaBetweenVersions.ipynb'>Doc4TF tool displayDeltaBetweenVersions</a> version {scriptVersion}.</small></p>")
    html.append("</body></html>")

    return "\n".join(html)

# Function to display HTML report in Jupyter Notebook
def display_html_report(report_html):
    display(HTML(report_html))

# Compare the dictionaries
delta_report = compare_feature_dicts(featureDict1, featureDict2)

# Generate the HTML delta report
report_html = generate_html_delta_report(delta_report)

# Display the report in the Jupyter Notebook
display_html_report(report_html)

Delta Report

Dataset 1: tonyjurg/Nestle1904LFT/tf v:0.7(rv0.8=#g95357e8bf298b090341cf277596be01f7f1f5ce9 offline under C:/Users/tonyj/text-fabric-data/github)

Dataset 2: saulocantanhede/tfgreek2/tf v:0.5.7(r0.5.8=#1a251a4a8daacae4cd5e02294a95d806b3964000 offline under C:/Users/tonyj/text-fabric-data/github)

Nodenames only in Dataset 2

subphrase
clause
phrase
group

Differences in nodenumber range for common nodenames

Nodename verse
- Dataset 1: (146078, 154020)
- Dataset 2: (382714, 390657)
Nodename wg
- Dataset 1: (154021, 259450)
- Dataset 2: (390658, 497525)
Nodename sentence
- Dataset 1: (138067, 146077)
- Dataset 2: (246833, 266535)

Features only in Dataset 1

booknumber
containedclause
gn
headverse
lex_dom
markafter
markbefore
markorder
monad
nodeID
nu
reference
roleclausedistance
sentence
sp_full
strongs
subj_ref
wgclass
wglevel
wgnum
wgrole
wgrolelong
wgrule
wgtype
word
wordlevel
wordrole
wordrolelong
wordtranslit
wordunacc

Features only in Dataset 2

appositioncontainer
articular
before
cls
cltype
criticalsign
crule
discontinuous
domain
frame
framespec
function
gender
id
lang
lemmatranslit
nodeid
note
num
parent
referent
rela
role
rule
strong
subjref
subjrefspec
text
trans
translit
typ
unaccent
variant

Differences in Common Features

Feature: verse
- Descr Difference:
  - Dataset 1: ✅ Verse number inside chapter
  - Dataset 2: verse number, from ref attribute in xml
- Frequency List Differences
  - Nodetype: verse
    
    Dataset 1: 1: 260
    
    Dataset 2: 1: 261
Feature: morph
- Descr Difference:
  - Dataset 1: ✅ Morphological tag (Sandborg-Petersen morphology)
  - Dataset 2: morphological code
- Frequency List Differences
  - Nodetype: phrase
    
    Dataset 1: None
    
    Dataset 2: V-PAI-3S: 2226, ADV: 2081, PRT-N: 1977, V-2AAI-3S: 1244, V-AAI-3S: 1225, V-PAP-NSM: 880, V-PAI-1S: 765, P-ASM: 746, V-PAN: 730, P-DSM: 700
  - Nodetype: subphrase
    
    Dataset 1: None
    
    Dataset 2: CONJ: 16316, PREP: 10568, ADV: 3808, N-NSM: 3475, N-GSM: 2935, T-NSM: 2905, N-ASF: 2870, PRT-N: 2701, N-ASM: 2456, V-PAI-3S: 2271
Feature: punctuation
- Descr Difference:
  - Dataset 1: ✅ Punctuation after word
  - Dataset 2: this is XML attribute punctuation
- Frequency List Differences
  - Nodetype: phrase
    
    Dataset 1: None
    
    Dataset 2: : 37660, ,: 3903, .: 2731, ·: 1189, ;: 589
  - Nodetype: subphrase
    
    Dataset 1: None
    
    Dataset 2: : 119264, ,: 9462, .: 5717, ·: 2359, ;: 971
  - Nodetype: word
    
    Dataset 1: : 119270
    
    Dataset 2: : 119264
Feature: number
- Descr Difference:
  - Dataset 1: ✅ Gramatical number of the verb (e.g. singular, plural)
  - Dataset 2: grammatical number
- Frequency List Differences
  - Nodetype: phrase
    
    Dataset 1: None
    
    Dataset 2: singular: 26293, plural: 12967
  - Nodetype: subphrase
    
    Dataset 1: None
    
    Dataset 2: singular: 69846, plural: 29091
  - Nodetype: word
    
    Dataset 1: : 38842
    
    Dataset 2: None
Feature: book
- Descr Difference:
  - Dataset 1: ✅ Book name (in English language)
  - Dataset 2: book name (full name)
- Frequency List Differences
  - Nodetype: clause
    
    Dataset 1: None
    
    Dataset 2: Luke: 4880, Matthew: 4364, Acts: 4237, John: 3699, Mark: 2860, Revelation: 1803, I_Corinthians: 1487, Romans: 1401, Hebrews: 1040, II_Corinthians: 909
  - Nodetype: group
    
    Dataset 1: None
    
    Dataset 2: Acts: 1288, Luke: 1232, Matthew: 1165, Revelation: 909, John: 882, Mark: 753, I_Corinthians: 431, Romans: 362, Hebrews: 325, II_Corinthians: 222
  - Nodetype: wg
    
    Dataset 1: None
    
    Dataset 2: Luke: 8945, Matthew: 8165, Acts: 7770, John: 7207, Mark: 5363, Revelation: 3895, I_Corinthians: 3160, Romans: 2799, Hebrews: 1977, II_Corinthians: 1852
  - Nodetype: sentence
    
    Dataset 1: Luke: 1155, Matthew: 1133, John: 1038, Acts: 883, Mark: 727, I_Corinthians: 524, Revelation: 466, Romans: 465, II_Corinthians: 253, Hebrews: 241
    
    Dataset 2: Luke: 2833, Matthew: 2636, John: 2626, Acts: 2245, Mark: 1750, I_Corinthians: 1242, Revelation: 1183, Romans: 1036, II_Corinthians: 721, Hebrews: 612
Feature: tense
- Descr Difference:
  - Dataset 1: ✅ Gramatical tense of the verb (e.g. Present, Aorist)
  - Dataset 2: verbal tense
- Frequency List Differences
  - Nodetype: phrase
    
    Dataset 1: None
    
    Dataset 2: aorist: 11503, present: 11175, future: 1592, imperfect: 1547, perfect: 1450, pluperfect: 88
  - Nodetype: subphrase
    
    Dataset 1: None
    
    Dataset 2: aorist: 11803, present: 11579, imperfect: 1689, future: 1626, perfect: 1572, pluperfect: 88
  - Nodetype: word
    
    Dataset 1: : 109422
    
    Dataset 2: None
Feature: gloss
- Descr Difference:
  - Dataset 1: ✅ English gloss
  - Dataset 2: English gloss (BGVB)
- Frequency List Differences
  - Nodetype: word
    
    Dataset 1: the: 9857, and: 6212, -: 5496, in: 2320, And: 2218, not: 2042, of the: 1551, for: 1501, that: 1498, you: 1226
    
    Dataset 2: the: 19783, and, also, likewise: 8978, he, she, it, himself, herself, itself; even, very; same: 5550, you: 2892, but, and: 2787, (with dat.) in: 2743, I: 2567, am, exist: 2457, say, tell: 2255, no, not: 1622
Feature: ref
- Descr Difference:
  - Dataset 1: ✅ Value of the ref ID (taken from XML sourcedata)
  - Dataset 2: biblical reference with word counting
- Frequency List Differences
  - Nodetype: phrase
    
    Dataset 1: None
    
    Dataset 2: 1CO 10:1!1: 1, 1CO 10:1!15: 1, 1CO 10:1!17: 1, 1CO 10:1!2: 1, 1CO 10:1!21: 1, 1CO 10:1!4: 1, 1CO 10:1!5: 1, 1CO 10:10!2: 1, 1CO 10:10!6: 1, 1CO 10:10!8: 1
  - Nodetype: subphrase
    
    Dataset 1: None
    
    Dataset 2: 1CO 10:1!1: 1, 1CO 10:1!10: 1, 1CO 10:1!11: 1, 1CO 10:1!12: 1, 1CO 10:1!13: 1, 1CO 10:1!14: 1, 1CO 10:1!15: 1, 1CO 10:1!16: 1, 1CO 10:1!17: 1, 1CO 10:1!18: 1
Feature: ln
- Descr Difference:
  - Dataset 1: ✅ Lauw-Nida lexical classification (not present everywhere?)
  - Dataset 2: ln
- Frequency List Differences
  - Nodetype: phrase
    
    Dataset 1: None
    
    Dataset 2: 92.11: 2617, 33.69: 2334, 69.3: 1399, 92.1: 920, 92.27: 812, 92.7: 812, 13.1: 699, 13.4: 535, 92.29: 522, 15.81: 471
  - Nodetype: subphrase
    
    Dataset 1: None
    
    Dataset 2: 92.24: 19738, 92.11: 4718, 89.92: 2903, 89.87: 2756, 33.69: 2336, 69.3: 1736, 92.1: 1732, 92.7: 1494, 12.1: 1247, 92.29: 1090
  - Nodetype: word
    
    Dataset 1: 92.24: 19781, : 10488
    
    Dataset 2: 92.24: 19738, 92.29: 1090
Feature: degree
- Descr Difference:
  - Dataset 1: ✅ Degree (e.g. Comparitative, Superlative)
  - Dataset 2: grammatical degree
- Frequency List Differences
  - Nodetype: phrase
    
    Dataset 1: None
    
    Dataset 2: comparative: 119, superlative: 32
  - Nodetype: subphrase
    
    Dataset 1: None
    
    Dataset 2: comparative: 313, superlative: 200
  - Nodetype: word
    
    Dataset 1: : 137266
    
    Dataset 2: None
Feature: normalized
- Descr Difference:
  - Dataset 1: ✅ Surface word with accents normalized and trailing punctuations removed
  - Dataset 2: lemma normalized
- Frequency List Differences
  - Nodetype: phrase
    
    Dataset 1: None
    
    Dataset 2: αὐτόν: 746, μή: 717, αὐτῷ: 710, οὐκ: 660, εἶπεν: 586, ἐστιν: 556, αὐτοῖς: 491, ὑμῖν: 475, οὐ: 378, λέγει: 331
  - Nodetype: subphrase
    
    Dataset 1: None
    
    Dataset 2: καί: 8576, ὁ: 2769, δέ: 2764, ἐν: 2684, τοῦ: 2497, εἰς: 1755, τό: 1664, τόν: 1562, τήν: 1523, αὐτοῦ: 1411
  - Nodetype: word
    
    Dataset 1: καί: 8576, δέ: 2764, τό: 1664, τόν: 1562, τήν: 1523
    
    Dataset 2: καί: 8576, δέ: 2764, τό: 1664, τόν: 1562, τήν: 1523
Feature: lemma
- Descr Difference:
  - Dataset 1: ✅ Lexeme (lemma)
  - Dataset 2: lexical lemma
- Frequency List Differences
  - Nodetype: phrase
    
    Dataset 1: None
    
    Dataset 2: αὐτός: 2839, λέγω: 2252, εἰμί: 2251, σύ: 1468, ἐγώ: 1247, οὐ: 1182, ὅς: 1111, μή: 779, ἔχω: 707, γίνομαι: 663
  - Nodetype: subphrase
    
    Dataset 1: None
    
    Dataset 2: ὁ: 19783, καί: 8978, αὐτός: 5561, σύ: 2892, δέ: 2787, ἐν: 2743, ἐγώ: 2567, εἰμί: 2457, λέγω: 2255, εἰς: 1766
  - Nodetype: word
    
    Dataset 1: καί: 8978, αὐτός: 5561, σύ: 2892, δέ: 2787, ἐγώ: 2567, εἰμί: 2457, λέγω: 2255
    
    Dataset 2: καί: 8978, αὐτός: 5561, σύ: 2892, δέ: 2787, ἐγώ: 2567, εἰμί: 2457, λέγω: 2255
Feature: type
- Descr Difference:
  - Dataset 1: ✅ Gramatical type of noun or pronoun (e.g. Common, Personal)
  - Dataset 2: morphological type (on word), syntactical type (on sentence, group, clause, phrase or wg)
- Frequency List Differences
  - Nodetype: clause
    
    Dataset 1: None
    
    Dataset 2: wrapper-clause-scope: 191, group: 107, apposition-group: 20
  - Nodetype: word
    
    Dataset 1: : 93321
    
    Dataset 2: None
  - Nodetype: subphrase
    
    Dataset 1: None
    
    Dataset 2: modifier-scope: 29645, common: 23644, personal: 11521, wrapper-scope: 11264, proper: 4639, group: 2325, demonstrative: 1722, modifier-clause-scope: 1712, relative: 1674, interrogative: 633
  - Nodetype: wg
    
    Dataset 1: None
    
    Dataset 2: modifier-scope: 29645, wrapper-clause-scope: 12166, wrapper-scope: 11264, conjuncted-wg: 8075, group: 4957, modifier-clause-scope: 1712, apposition-group: 891
  - Nodetype: group
    
    Dataset 1: None
    
    Dataset 2: conjuncted-wg: 8075, apposition-group: 870
  - Nodetype: phrase
    
    Dataset 1: None
    
    Dataset 2: modifier-scope: 10484, wrapper-scope: 9535, personal: 5885, common: 2120, relative: 1364, group: 952, modifier-clause-scope: 755, demonstrative: 744, proper: 683, interrogative: 480
  - Nodetype: sentence
    
    Dataset 1: None
    
    Dataset 2: wrapper-clause-scope: 11975, group: 2525, apposition-group: 1
Feature: unicode
- Descr Difference:
  - Dataset 1: ✅ Word as it apears in the text in Unicode (incl. punctuations)
  - Dataset 2: word in unicode characters plus material after it
- Frequency List Differences
  - Nodetype: phrase
    
    Dataset 1: None
    
    Dataset 2: μὴ: 669, οὐκ: 660, αὐτῷ: 602, εἶπεν: 560, αὐτὸν: 519, αὐτοῖς: 420, ἐστιν: 383, οὐ: 378, λέγει: 318, ὑμῖν: 283
  - Nodetype: subphrase
    
    Dataset 1: None
    
    Dataset 2: καὶ: 8541, ὁ: 2768, ἐν: 2683, δὲ: 2619, τοῦ: 2497, εἰς: 1755, τὸ: 1657, τὸν: 1556, τὴν: 1518, τῆς: 1300
Feature: case
- Descr Difference:
  - Dataset 1: ✅ Gramatical case (Nominative, Genitive, Dative, Accusative, Vocative)
  - Dataset 2: grammatical case
- Frequency List Differences
  - Nodetype: phrase
    
    Dataset 1: None
    
    Dataset 2: nominative: 9609, accusative: 6170, dative: 3265, genitive: 1408, vocative: 1
  - Nodetype: subphrase
    
    Dataset 1: None
    
    Dataset 2: nominative: 24197, accusative: 23031, genitive: 19515, dative: 12126, vocative: 649
  - Nodetype: word
    
    Dataset 1: : 58261
    
    Dataset 2: None
Feature: after
- Descr Difference:
  - Dataset 1: ✅ Characters (eg. punctuations) following the word
  - Dataset 2: material after the end of the word
- Frequency List Differences
  - Nodetype: phrase
    
    Dataset 1: None
    
    Dataset 2: : 37661, ,: 3892, .: 2724, ·: 1187, ;: 588, ,—: 8, ).: 4, —: 3, ,): 2, ·—: 2
  - Nodetype: subphrase
    
    Dataset 1: None
    
    Dataset 2: : 119261, ,: 9439, .: 5704, ·: 2355, ;: 969, ,—: 18, —: 7, ).: 6, .]]: 4, ·—: 4
  - Nodetype: word
    
    Dataset 1: : 119270, , : 9462, . : 5717, · : 2359, ; : 971
    
    Dataset 2: : 119261, ,: 9439, .: 5704, ·: 2355, ;: 969, ,—: 18, —: 7, ).: 6, .]]: 4, ·—: 4
Feature: chapter
- Descr Difference:
  - Dataset 1: ✅ Chapter number inside book
  - Dataset 2: chapter number, from ref attribute in xml
- Frequency List Differences
  - Nodetype: verse
    
    Dataset 1: 4: 509
    
    Dataset 2: 4: 510
  - Nodetype: sentence
    
    Dataset 1: 1: 519, 3: 497, 4: 496, 2: 489, 5: 481, 6: 404, 12: 399, 9: 398, 11: 390, 8: 386
    
    Dataset 2: None
Feature: voice
- Descr Difference:
  - Dataset 1: ✅ Gramatical voice of the verb (e.g. active,passive)
  - Dataset 2: verbal voice
- Frequency List Differences
  - Nodetype: phrase
    
    Dataset 1: None
    
    Dataset 2: active: 20154, passive: 3345, middle: 2187, middlepassive: 1669
  - Nodetype: subphrase
    
    Dataset 1: None
    
    Dataset 2: active: 20742, passive: 3493, middle: 2408, middlepassive: 1714
  - Nodetype: word
    
    Dataset 1: : 109422
    
    Dataset 2: None
Feature: person
- Descr Difference:
  - Dataset 1: ✅ Gramatical person of the verb (first, second, third)
  - Dataset 2: grammatical person
- Frequency List Differences
  - Nodetype: phrase
    
    Dataset 1: None
    
    Dataset 2: third: 12474, second: 3447, first: 2886
  - Nodetype: subphrase
    
    Dataset 1: None
    
    Dataset 2: third: 12747, second: 3729, first: 2943
  - Nodetype: word
    
    Dataset 1: : 118360
    
    Dataset 2: None
Feature: clausetype
- Descr Difference:
  - Dataset 1: ✅ Clause type details (e.g. Verbless, Minor)
  - Dataset 2: clause type
- Frequency List Differences
  - Nodetype: sentence
    
    Dataset 1: None
    
    Dataset 2: nominalized: 59
  - Nodetype: clause
    
    Dataset 1: None
    
    Dataset 2: nominalized: 5237
  - Nodetype: wg
    
    Dataset 1: : 102662, VerbElided: 1009, Verbless: 929, Minor: 830
    
    Dataset 2: nominalized: 5296
Feature: sp
- Descr Difference:
  - Dataset 1: ✅ Part of Speech (abbreviated)
  - Dataset 2: part-of-speach
- Frequency List Differences
  - Nodetype: phrase
    
    Dataset 1: None
    
    Dataset 2: verb: 27355, pron: 8751, advb: 4384, subs: 2822, adjv: 2304, art: 257, intj: 90, conj: 85, num: 25, prep: 4
  - Nodetype: subphrase
    
    Dataset 1: None
    
    Dataset 2: subs: 28455, verb: 28357, art: 19786, conj: 18227, pron: 16177, prep: 10914, adjv: 8452, advb: 6147, intj: 788, num: 476
  - Nodetype: word
    
    Dataset 1: noun: 28455, det: 19786, adj: 8452, adv: 6147, ptcl: 773
    
    Dataset 2: subs: 28455, art: 19786, adjv: 8452, advb: 6147, intj: 788
Feature: junction
- Descr Difference:
  - Dataset 1: ✅ Junction data related to a wordgroup
  - Dataset 2: type of junction
- Frequency List Differences
  - Nodetype: clause
    
    Dataset 1: None
    
    Dataset 2: coordinate: 8186, subordinate: 7449
  - Nodetype: subphrase
    
    Dataset 1: None
    
    Dataset 2: subordinate: 116, coordinate: 64
  - Nodetype: wg
    
    Dataset 1: : 103128, apposition: 2302
    
    Dataset 2: coordinate: 9367, subordinate: 8554
  - Nodetype: phrase
    
    Dataset 1: None
    
    Dataset 2: subordinate: 57
  - Nodetype: sentence
    
    Dataset 1: None
    
    Dataset 2: coordinate: 1117, subordinate: 989
Feature: mood
- Descr Difference:
  - Dataset 1: ✅ Gramatical mood of the verb (passive, etc)
  - Dataset 2: verbal mood
- Frequency List Differences
  - Nodetype: phrase
    
    Dataset 1: None
    
    Dataset 2: indicative: 15245, participle: 6320, infinitive: 2228, subjunctive: 1832, imperative: 1663, optative: 67
  - Nodetype: subphrase
    
    Dataset 1: None
    
    Dataset 2: indicative: 15617, participle: 6653, infinitive: 2285, imperative: 1877, subjunctive: 1856, optative: 69
  - Nodetype: word
    
    Dataset 1: : 109422
    
    Dataset 2: None
Feature: bookshort
- Descr Difference:
  - Dataset 1: ✅ Book name (abbreviated)
  - Dataset 2: book name (abbreviated) from ref attribute in xml
- Frequency List Differences
  - Nodetype: clause
    
    Dataset 1: None
    
    Dataset 2: LUK: 4880, MAT: 4364, ACT: 4237, JHN: 3699, MRK: 2860, REV: 1803, 1CO: 1487, ROM: 1401, HEB: 1040, 2CO: 909
  - Nodetype: word
    
    Dataset 1: Luke: 19456, Acts: 18393, Matt: 18299, John: 15643, Mark: 11277, Rev: 9832, Rom: 7100, 1Cor: 6820, Heb: 4955, 2Cor: 4469
    
    Dataset 2: LUK: 19456, ACT: 18393, MAT: 18299, JHN: 15643, MRK: 11277, REV: 9832, ROM: 7100, 1CO: 6820, HEB: 4955, 2CO: 4469
  - Nodetype: book
    
    Dataset 1: 1Cor: 1, 1John: 1, 1Pet: 1, 1Thess: 1, 1Tim: 1, 2Cor: 1, 2John: 1, 2Pet: 1, 2Thess: 1, 2Tim: 1
    
    Dataset 2: 1CO: 1, 1JN: 1, 1PE: 1, 1TH: 1, 1TI: 1, 2CO: 1, 2JN: 1, 2PE: 1, 2TH: 1, 2TI: 1
  - Nodetype: wg
    
    Dataset 1: None
    
    Dataset 2: LUK: 8945, MAT: 8165, ACT: 7770, JHN: 7207, MRK: 5363, REV: 3895, 1CO: 3160, ROM: 2799, HEB: 1977, 2CO: 1852
  - Nodetype: group
    
    Dataset 1: None
    
    Dataset 2: ACT: 1288, LUK: 1232, MAT: 1165, REV: 909, JHN: 882, MRK: 753, 1CO: 431, ROM: 362, HEB: 325, 2CO: 222
  - Nodetype: sentence
    
    Dataset 1: None
    
    Dataset 2: LUK: 2833, MAT: 2636, JHN: 2626, ACT: 2245, MRK: 1750, 1CO: 1242, REV: 1183, ROM: 1036, 2CO: 721, HEB: 612

Created on 2024-09-26 21:07:30 with Doc4TF tool displayDeltaBetweenVersions version 0.2.

5.2 - Download the report ¶

You can also download the report as an HTML file by running the following cell.

Optional action:¶

Execute the following cell to create the download link.

In [192]:

from IPython.display import HTML
import base64

def create_download_link(html_content, file_name):
    # Encode the HTML content to base64
    b64_html = base64.b64encode(html_content.encode()).decode()
    
    # Create the HTML download link
    download_link = f'''
    <a download="{file_name}" href="data:text/html;base64,{b64_html}">
        <button>Download HTML File</button>
    </a>
    '''
    return HTML(download_link)

# Display the download link in the notebook
create_download_link(report_html, 'report.html')

Out[192]:

6 - Change log ¶

Back to TOC ¶

Version 0.2 (26 September 2024):

Added functionality:

comparing description and datatype

dynamicaly show/hide parts of the output

create a download link

Version 0.1 (25 September 2024):

initial implementation of enhancement feature 17.

7 - License ¶

Back to TOC ¶

Licenced under Creative Commons Attribution 4.0 International (CC BY 4.0)

Tool to determine what features and featurevalues were changed between two Text-Fabric datasets¶

Table of content ¶

1 - Introduction ¶

2 - Preparing the environment¶

2.1 - Setting script version¶

Required user action:¶

2.2 - Setting script parameters¶

Required user action:¶

2.3 - Load Text-Fabric code¶

Required user action:¶

3 - Load the two Text-Fabric datasets¶

3.1 - Load the first dataset¶

Required user action:¶

3.2 - Load the second dataset¶

Required user action:¶

4 - Create dictionaries for the two datasets¶

Required action:¶

5 - Report the delta between the datasets¶

5.1 - Generate and view the report¶

Required action:¶

Delta Report

5.2 - Download the report¶

Optional action:¶

6 - Change log¶

7 - License¶

2 - Preparing the environment ¶

2.1 - Setting script version ¶

2.2 - Setting script parameters ¶

2.3 - Load Text-Fabric code ¶

3 - Load the two Text-Fabric datasets ¶

3.1 - Load the first dataset ¶

3.2 - Load the second dataset ¶

4 - Create dictionaries for the two datasets ¶

5 - Report the delta between the datasets ¶

5.1 - Generate and view the report ¶

5.2 - Download the report ¶

6 - Change log ¶

7 - License ¶