Doc4TF ¶

Automatic creation of feature documentation for existing Text-Fabric datasets¶

Version: 0.4 (Feb. 20, 2024); fixing bug 10 & adding info for functions.

Table of contents ¶

1 - Introduction
2 - Setting up the environment
3 - Load Text-Fabric data
4 - Creation of the dataset
- 4.1 - Setting up some global variables
- 4.2 - Store all relevant data into a dictionary
5 - Create the documentation pages
- 5.1 - Create the set of feature pages
- 5.2 - Create the index pages
6 - License

1 - Introduction ¶

Back to TOC ¶

Ideally, a comprehensive documentation set is created as part of developing a Text-Fabric dataset. In practice, however, this is not always done during the initial phase, nor is it always kept up to date after changes to features. This Jupyter Notebook contains Python code that automatically generates a documentation set for any Text-Fabric dataset, which also ensures consistency across its pages. It serves as a starting point for developing a brand-new documentation set or as validation for an existing one. One major advantage is that the resulting documentation set is fully hyperlinked, a task that is laborious when done manually.

The main steps in producing the documentation set are:

Load a Text-Fabric database
Execute the code present in the subsequent cells. The code will:
- Construct a Python dictionary storing the relevant data from the TF dataset
- Create a separate file for each feature
- Create a set of overview pages listing the features by node type, data type and feature type

The output format can be either Markdown, the standard for feature documentation stored on GitHub using its on-site processor, or HTML, which facilitates local storage and browsing with any web browser.

2 - Setting up the environment ¶

Back to TOC ¶

Your environment must include the Python package Text-Fabric. If it is not installed yet, it can be installed using pip. You must also be able to load the Text-Fabric dataset, either from an online resource or from a locally stored copy. There are no further requirements, as the scripts basically operate stand-alone.
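
As a quick check before running the remaining cells, the sketch below tests whether Text-Fabric can be found in the current environment. It assumes the package is distributed on PyPI as text-fabric and imported as tf (consistent with the import statements in the next section); the helper name is_installed is hypothetical and not part of Doc4TF.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("tf"):
    print("Text-Fabric is available")
else:
    # Assumption: the PyPI distribution is named 'text-fabric'
    print("Text-Fabric not found; try: pip install text-fabric")
```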

3 - Load Text-Fabric data ¶

Back to TOC ¶

In this step the Text-Fabric dataset is loaded; its embedded data will be used to create the documentation set. For options regarding other possible storage locations, see the documentation for the Text-Fabric function use.

In [1]:

%load_ext autoreload
%autoreload 2

In [2]:

# Loading the Text-Fabric code
# Note: it is assumed Text-Fabric is installed in your environment
from tf.fabric import Fabric
from tf.app import use

In [3]:

# load the app and data
A = use ("saulocantanhede/tfgreek2", version="0.5.5", hoist=globals())

Locating corpus resources ...

app: ~/text-fabric-data/github/saulocantanhede/tfgreek2/app

data: ~/text-fabric-data/github/saulocantanhede/tfgreek2/tf/0.5.5

TF: TF API 12.3.2, saulocantanhede/tfgreek2/app v3, Search Reference
Data: saulocantanhede - tfgreek2 0.5.5, Character table, Feature docs

Node types

Name	# of nodes	# slots / node	% coverage
book	27	5102.93	100
chapter	260	529.92	100
verse	7944	17.34	100
sentence	19767	13.79	198
group	8964	7.02	46
clause	30479	7.19	159
wg	106868	6.88	533
phrase	66424	1.93	93
subphrase	119013	1.59	138
word	137779	1.00	100

Sets: no custom sets
Features:

Nestle 1904 Greek New Testament

after

str

material after the end of the word

appositioncontainer

int

1 if it is an apposition container

articular

int

1 if the sentence, group, clause, phrase or wg has an article

before

str

this is XML attribute before

book

str

book name (full name)

bookshort

str

book name (abbreviated) from ref attribute in xml

case

str

grammatical case

chapter

int

chapter number, from ref attribute in xml

clausetype

str

clause type

cls

str

this is XML attribute cls

cltype

str

clause type

criticalsign

str

this is XML attribute criticalsign

crule

str

clause rule (from xml attribute Rule)

degree

str

grammatical degree

discontinuous

int

1 if the word is out of sequence in the xml

domain

str

domain

framespec

str

this is XML attribute framespec

function

str

this is XML attribute function

gender

str

grammatical gender

gloss

str

short translation

str

xml id

junction

str

type of junction

lang

str

language the text is in

lemma

str

lexical lemma

str

mood

str

verbal mood

morph

str

morphological code

nodeid

int

node id (as in the XML source data

normalized

str

lemma normalized

note

str

annotation of linguistic nature

num

int

generated number (not in xml): book: (Matthew=1, Mark=2, ..., Revelation=27); sentence: numbered per chapter; word: numbered per verse.

number

str

grammatical number

otype

str

person

str

grammatical person

punctuation

str

this is XML attribute punctuation

ref

str

biblical reference with word counting

referent

str

number of referent

rela

str

this is XML attribute rela

role

str

role

rule

str

syntactical rule

strong

int

strong number

subjrefspec

str

this is XML attribute subjrefspec

tense

str

verbal tense

text

str

the text of a word

typ

str

this is XML attribute typ

type

str

morphological type (on word), syntactical type (on sentence, group, clause, phrase or wg)

unicode

str

word in unicode characters plus material after it

variant

str

this is XML attribute variant

verse

int

verse number, from ref attribute in xml

voice

str

verbal voice

frame

str

frame

oslots

none

parent

none

parent relationship between words

subjref

none

number of subject referent

Settings:

specified

apiVersion: 3
appName: saulocantanhede/tfgreek2
appPath:
C:/Users/tonyj/text-fabric-data/github/saulocantanhede/tfgreek2/app
commit: 2077c78df80d795478638c57d705833a8e40c7f4
css: ''
dataDisplay:
- excludedFeatures: []
- noneValues:
  - none
  - unknown
  - no value
  - NA
- sectionSep1: @
- textFormat: text-orig-full
docs:
- docPage: about
- featureBase: {docBase}/transcription.md
- featurePage: transcription
interfaceDefaults: {fmt: text-orig-full}
isCompatible: True
local: local
localDir:
C:/Users/tonyj/text-fabric-data/github/saulocantanhede/tfgreek2/_temp
provenanceSpec:
- corpus: Nestle 1904 Greek New Testament
- doi: 10.5281/zenodo.notyet
- moduleSpecs: []
- org: saulocantanhede
- relative: /tf
- repo: tfgreek2
- version: 0.5.5
- webBase: https://learner.bible/text/show_text/nestle1904/
- webHint: Show this on the website
- webLang: en
- webUrl:
  https://learner.bible/text/show_text/nestle1904/<1>/<2>/<3>
- webUrlLex: {webBase}/word?version={version}&id=<lid>
release: 0.5.5
typeDisplay:
- clause:
  - condense: True
  - label: #{num}: {cls} {rule} {junction}
  - style: ''
- group:
  - label: #{num}:
  - style: ''
- phrase:
  - condense: True
  - label: #{num}: {function} {role} {rule} {type}
  - style: ''
- sentence:
  - label: #{num}: {rule}
  - style: ''
- verse:
  - condense: True
  - label: {book} {chapter}:{verse}
  - style: ''
- wg:
  - condense: True
  - label: #{num}: {type} {role} {rule} {junction}
  - style: ''
- word: {base: True}
writing: grc

TF API: names N F E L T S C TF Fs Fall Es Eall Cs Call directly usable

4 - Creation of the dataset ¶

4.1 - Setting up some global variables ¶

Back to TOC ¶

In [4]:

# If the following variable is set, it will be used as the title for all pages. It is intended to describe the dataset in one line
# customPageTitleMD="N1904 Greek New Testament [saulocantanhede/tfgreek2 - 0.5.4](https://github.com/saulocantanhede/tfgreek2)"
# customPageTitleHTML="N1904 Greek New Testament <a href=\"https://github.com/saulocantanhede/tfgreek2\">saulocantanhede/tfgreek2 - 0.5.4</a>"

# Specify the location to store the resulting files, relative to the location of this notebook (without a trailing slash).
resultLocation = "results"

# Type of output format ('html' for HTML, 'md' for Markdown, or 'both' for both HTML and Markdown)
typeOutput='both'

# HTML table style definition (only relevant for HTML output format)
htmlStyle='<style>\ntable {\nborder-collapse: collapse;\n}\n th, td {\nborder: 1px solid black;\n padding: 8px;\n}\nth {\nfont-weight: bold;\n}\n</style>'

# Limit the number of entries in the frequency tables per node type on each feature description page to this number
tableLimit=10

# This switch can be set to 'True' if you want additional information, such as dictionary entries and file details, to be printed. For basic output, set this switch to 'False'.
verbose=False

# The version number of the script
scriptVersion="0.3"
scriptDate="Jan. 24, 2024"


# Create the footers for MD and HTML, include today's date
from datetime import datetime
today = datetime.today()
formatted_date = today.strftime("%b. %d, %Y")
footerMD=f'\n\nCreated on {formatted_date} using [Doc4TF - version {scriptVersion} ({scriptDate})](https://github.com/tonyjurg/Doc4TF)'
footerHTML=f'\n<p>Created on {formatted_date} using <a href=\"https://github.com/tonyjurg/Doc4TF\">Doc4TF - version {scriptVersion} ({scriptDate})</a></p></body></html>'
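
In the cells further down, a value of typeOutput other than 'md', 'html' or 'both' simply results in no files being written. The following sketch shows one possible way to validate the setting up front; the function name selected_formats is hypothetical and not part of Doc4TF.

```python
def selected_formats(type_output):
    """Map the typeOutput setting to the list of output formats to generate."""
    mapping = {"md": ["md"], "html": ["html"], "both": ["md", "html"]}
    if type_output not in mapping:
        raise ValueError(
            f"typeOutput must be 'md', 'html' or 'both', got {type_output!r}"
        )
    return mapping[type_output]


print(selected_formats("both"))
```

Calling such a check directly after the settings cell would surface a mistyped value immediately instead of at the end of the run.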

4.2 - Store all relevant data into a dictionary ¶

Back to TOC ¶

The following will create a dictionary containing all relevant information for the loaded node and edge features.

In [5]:

# Initialize an empty dictionary to store feature data
featureDict = {}
import time
overallTime = time.time()

def getFeatureDescription(metaData):
    """
    This function looks for the 'description' key in the metadata dictionary. If the key is found,
    it returns the corresponding description. If the key is not present, it returns a default 
    message indicating that no description is available.

    Parameters:
       metaData (dict): A dictionary containing metadata about a feature.

    Returns:
       str: The description of the feature if available, otherwise a default message.
    """
    return metaData.get('description', "No feature description")

def setDataType(metaData):
    """
    This function checks for the 'valueType' key in the metadata. If the key is present, it
    returns 'String' if the value is 'str', and 'Integer' for other types. If the 'valueType' key
    is not present, it returns 'Unknown'.

    Parameters:
       metaData (dict): A dictionary containing metadata, including the 'valueType' of a feature.

    Returns:
       str: A string indicating the determined data type ('String', 'Integer', or 'Unknown').
    """
    if 'valueType' in metaData:
        return "String" if metaData["valueType"] == 'str' else "Integer"
    return "Unknown"


def processFeature(feature, featureType, featureMethod):
    """
    Processes a given feature by extracting metadata, description, and data type, and then
    compiles frequency data for different node types in a feature dictionary. Certain features
    are skipped based on their type. The processed data is added to a global feature dictionary.

    Parameters:
       feature (str): The name of the feature to be processed.
       featureType (str): The type of the feature ('Node' or 'Edge').
       featureMethod (function): A function to obtain feature data.

    Returns:
       None: The function updates a global dictionary with processed feature data and does not return anything.
    """
    
    # Obtain the meta data
    featureMetaData = featureMethod(feature).meta
    featureDescription = getFeatureDescription(featureMetaData)
    dataType = setDataType(featureMetaData)

    # Initialize dictionary to store feature frequency data
    featureFrequencyDict = {}

    # Skip for specific features based on type
    if not (featureType == 'Node' and feature == 'otype') and not (featureType == 'Edge' and feature == 'oslots'):
        for nodeType in F.otype.all:
            frequencyLists = featureMethod(feature).freqList(nodeType)
            if not isinstance(frequencyLists, int):
                if len(frequencyLists)!=0:
                    featureFrequencyDict[nodeType] = {'nodetype': nodeType, 'freq': frequencyLists[:tableLimit]}
            elif isinstance(frequencyLists, int):
                if frequencyLists != 0:
                    featureFrequencyDict[nodeType] = {'nodetype': nodeType, 'freq': [("Link", frequencyLists)]}

    # Add processed feature data to the main dictionary
    featureDict[feature] = {'name': feature, 'descr': featureDescription, 'type': featureType, 'datatype': dataType, 'freqlist': featureFrequencyDict}
    
########################################################
#                     MAIN FUNCTION                    #
########################################################

########################################################
#             Gather general information               #
########################################################

print('Gathering generic details')

# Initialize default values
corpusName = A.appName
liveName = ''
versionName = A.version

# Trying to locate corpus information
if A.provenance:
    for parts in A.provenance[0]: 
        if isinstance(parts, tuple):
            key, value = parts[0], parts[1]
            if verbose: print (f'General info: {key}={value}')
            if key == 'corpus': corpusName = value
            if key == 'version': versionName = value
            # value for live is a tuple
            if key == 'live': liveName=value[1]
if liveName is not None and len(liveName)>1:
    # an URL was found
    pageTitleMD = f'Doc4TF pages for [{corpusName}]({liveName}) (version {versionName})'
    pageTitleHTML = f'<h1>Doc4TF pages for <a href="{liveName}">{corpusName}</a> (version {versionName})</h1>'
else:
    # No URL found
    pageTitleMD = f'Doc4TF pages for {corpusName} (version {versionName})'
    pageTitleHTML = f'<h1>Doc4TF pages for {corpusName} (version {versionName})</h1>'

# Overwrite in case user provided a title
if 'customPageTitleMD' in globals():
    pageTitleMD = customPageTitleMD
if 'customPageTitleHTML' in globals():
    pageTitleHTML = customPageTitleHTML

    
########################################################
#             Processing node features                 #
########################################################

print('Analyzing Node Features: ', end='')
for nodeFeature in Fall():
    if not verbose: print('.', end='')  # Progress indicator
    processFeature(nodeFeature, 'Node', Fs)
    if verbose: print(f'\nFeature {nodeFeature} = {featureDict[nodeFeature]}\n')  # Print feature data if verbose

########################################################
#             Processing edge features                 #
########################################################

print('\nAnalyzing Edge Features: ', end='')
for edgeFeature in Eall():
    if not verbose: print('.', end='')  # Progress indicator
    processFeature(edgeFeature, 'Edge', Es)
    if verbose: print(f'\nFeature {edgeFeature} = {featureDict[edgeFeature]}\n')  # Print feature data if verbose

print(f'\nFinished in {time.time() - overallTime:.2f} seconds.')

Gathering generic details
Analyzing Node Features: ..................................................
Analyzing Edge Features: ....
Finished in 12.62 seconds.
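
To make the shape of the resulting dictionary concrete, the sketch below assembles a single entry from hand-made frequency data, without loading Text-Fabric. The values in mock_freq and the helper build_entry are invented for illustration; only the key names ('name', 'descr', 'type', 'datatype', 'freqlist', 'nodetype', 'freq') are taken from the cell above.

```python
table_limit = 2  # stands in for the notebook's tableLimit setting

# Hypothetical frequency data: node type -> list of (value, count), most frequent first
mock_freq = {
    "word": [("nominative", 300), ("accusative", 250), ("genitive", 200)],
    "phrase": [],
}


def build_entry(name, description, feature_type, data_type, freq_by_type, limit):
    """Assemble one entry in the same shape as featureDict[feature]."""
    freqlist = {
        node_type: {"nodetype": node_type, "freq": freqs[:limit]}
        for node_type, freqs in freq_by_type.items()
        if freqs  # node types without any values are left out
    }
    return {
        "name": name,
        "descr": description,
        "type": feature_type,
        "datatype": data_type,
        "freqlist": freqlist,
    }


entry = build_entry("case", "grammatical case", "Node", "String", mock_freq, table_limit)
print(entry["freqlist"])
```

Only node types that actually carry the feature end up in 'freqlist', and each frequency list is truncated to the configured limit.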

5 - Create the documentation pages ¶

Two types of pages will be created:

Feature description pages (one per feature)
Set of index pages (linking to the feature pages)
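
As a rough sketch of what ends up in the output directory, the following computes the expected file names for a list of feature names. The index page names and the .htm extension are taken from the code cells below; the function expected_files and the generated feature names are hypothetical.

```python
# Base names of the three index pages written by the notebook
INDEX_PAGES = ("featurebynodetype", "featurebydatatype", "featurebytype")

# File extensions per typeOutput setting (the notebook writes HTML as .htm)
EXTENSIONS = {"md": ["md"], "html": ["htm"], "both": ["md", "htm"]}


def expected_files(feature_names, type_output):
    """Return (feature_pages, index_pages) as lists of file names."""
    extensions = EXTENSIONS[type_output]
    feature_pages = [f"{name}.{ext}" for name in feature_names for ext in extensions]
    index_pages = [f"{page}.{ext}" for page in INDEX_PAGES for ext in extensions]
    return feature_pages, index_pages


# Hypothetical feature names, for illustration only
features = [f"feature{i}" for i in range(54)]
feature_pages, index_pages = expected_files(features, "both")
print(len(feature_pages), len(index_pages))
```

With 54 features and both formats this gives 108 feature pages and 6 index pages, which corresponds to the counts reported in the output of the two cells below.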

5.1 - Create the set of feature pages ¶

Back to TOC ¶

In [6]:

import os
import time
overallTime = time.time()

# Initialize a counter for the number of files created
filesCreated = 0
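# Get the current working directory and append the platform's path separator for path building
pathFull = os.getcwd() + os.sep
# Make sure the output directory exists before any file is written
os.makedirs(resultLocation, exist_ok=True)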

# Iterating over each feature in the feature dictionary
for featureName, featureData in featureDict.items():
    # Extracting various properties of each feature
    featureDescription = featureData.get('descr')
    featureType = featureData.get('type')
    featureDataType = featureData.get('datatype')
    
    # Initializing strings to accumulate HTML and Markdown content
    nodeListHTML = nodeListMD = ''
    tableListHTML = tableListMD = ''
    frequencyData = featureData.get('freqlist')

    # Processing frequency data for each node
    for node in frequencyData:
        # Building HTML and Markdown links for each node
        nodeListHTML += f' <a href=\"featurebynodetype.htm#{node}\">{node}</a>'
        nodeListMD += f' [`{node}`](featurebynodetype.md#{node}) '

        # Starting HTML and Markdown tables for frequency data
        tableListHTML += f'<h3>Frequency for nodetype <a href=\"featurebynodetype.htm#{node}\">{node}</a></h3><table><tr><th>Value</th><th>Occurrences</th></tr>'
        tableListMD += f'### Frequency for nodetype [{node}](featurebynodetype.md#{node})\nValue|Occurrences\n---|---\n'

        # Populating tables with frequency data
        itemData = frequencyData.get(node).get('freq')
        for item in itemData:
            handleSpace = item[0] if item[0] != ' ' else 'space' # prevent garbling of tables where the value itself is a space
            tableListHTML += f'<tr><td>{handleSpace}</td><td>{item[1]}</td></tr>'
            tableListMD += f'{handleSpace}|{item[1]}\n'
        tableListHTML += f'</table>\n'

    # Creating info blocks for HTML and Markdown
    infoBlockHTML = f'<table><tr><th>Data type</th><th>Feature type</th><th>Available for nodes</th></tr><tr><td><a href=\"featurebydatatype.htm#{featureDataType}\">{featureDataType}</a></td><td><a href="featurebytype.htm#{featureType}">{featureType}</a></td><td>{nodeListHTML}</td></tr></table>'
    infoBlockMD = f'Data type|Feature type|Available for nodes\n---|---|---\n[`{featureDataType}`](featurebydatatype.md#{featureDataType.lower()})|[`{featureType}`](featurebytype.md#{featureType.lower()})|{nodeListMD}'

    # Outputting in Markdown format
    if typeOutput in ('md','both'):
        pageMD = f'{pageTitleMD}\n# Feature: {featureName}\n{infoBlockMD}\n## Description\n{featureDescription}\n## Feature Values\n{tableListMD} {footerMD} '
        fileNameMD = os.path.join(resultLocation, f"{featureName}.md")
        try:
            with open(fileNameMD, "w", encoding="utf-8") as file:
                file.write(pageMD)
                filesCreated += 1
                # Log if verbose mode is on
                if verbose: print(f"Markdown content written to {pathFull + fileNameMD}")
        except Exception as e:
            print(f"Exception: {e}")
            break  # Stops execution on encountering an exception

    # Outputting in HTML format
    if typeOutput in ('html','both'):
        pageHTML = f'<html><head>{htmlStyle}</head><body><p>{pageTitleHTML}</p>\n<h1 id=\"start\">Feature: {featureName}</h1>\n{infoBlockHTML}\n<h2>Description</h2>\n<p>{featureDescription}</p>\n<h2>Feature Values</h2>\n{tableListHTML} {footerHTML}'
        fileNameHTML = os.path.join(resultLocation, f"{featureName}.htm")
        try:
            with open(fileNameHTML, "w", encoding="utf-8") as file:
                file.write(pageHTML)
                filesCreated += 1
                # Log if verbose mode is on
                if verbose: print(f"HTML content written to {pathFull + fileNameHTML}")
        except Exception as e:
            print(f"Exception: {e}")
            break  # Stops execution on encountering an exception

# Reporting the number of files created
if filesCreated != 0:
    print(f'Finished in {time.time() - overallTime:.2f} seconds (written {filesCreated} {"html and md" if typeOutput == "both" else typeOutput} files to directory {pathFull + resultLocation})')
else:
    print('No files written')

Finished in 0.14 seconds (written 108 html and md files to directory C:\Users\tonyj\OneDrive\Documents\GitHub\Doc4TF\results)
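
The cell above replaces a value consisting of a single space by the word 'space' so that the tables are not garbled. Other characters can break a Markdown table as well, most notably the pipe character, which separates columns. Whether any feature value in a given dataset contains such characters is not established here; the helper md_cell below is a hypothetical sketch of how they could be handled, relying on the backslash escape for pipes in GitHub-flavored Markdown tables.

```python
def md_cell(value):
    """Render a feature value so that it is safe inside a Markdown table cell."""
    text = str(value)
    if text == " ":
        # Same convention as the notebook: make a lone space visible
        return "space"
    # Escape column separators and keep the cell on a single line
    return text.replace("|", "\\|").replace("\n", " ")


print(md_cell(" "))
print(md_cell("a|b"))
```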

5.2 - Create the index pages ¶

Back to TOC ¶

In [7]:

import os
import time
overallTime = time.time()

# Initialize a counter for the number of files created
filesCreated = 0

def exampleData(feature):
    """
    This function checks if the specified feature exists in the global `featureDict` and if it 
    has a non-empty frequency list. If so, it extracts the first few values from this frequency 
    list to create a list of examples.

    Parameters:
      feature (str): The name of the feature for which examples are to be created.

    Returns:
      str: A string containing the examples concatenated together. Returns "No values" if the 
           feature does not exist in `featureDict` or if it has an empty frequency list.
    """
    # Check if the feature exists in featureDict and has non-empty freqlist.
    if feature in featureDict and featureDict[feature]['freqlist']:
        # Take the frequency list of the first node type that carries this feature
        freq_list = next(iter(featureDict[feature]['freqlist'].values()))['freq']
        # Join up to four of its values, each wrapped in backticks
        example_list = ' '.join(f'`{item[0]}`' for item in freq_list[:4])
        return example_list
    else:
        return "No values"

    
def writeToFile(fileName, content, fileType, verbose):
    """
    Writes provided content to a specified file. If verbose is True, prints a confirmation message.
    This function attempts to write the given content to a file with the specified name. It handles 
    any exceptions during writing and can optionally print a message upon successful writing. The function 
    also increments a global counter `filesCreated` for each successful write operation.

    Parameters:
       fileName (str): The name of the file to write to.
       content (str):  The content to be written to the file.
       fileType (str): The type of file (used for informational messages; e.g., 'md' for Markdown, 'html' for HTML).
       verbose (bool): If True, prints a message upon successful writing.

    Returns:
       None: The function does not return a value but writes content to a file and may print messages.
    """
    global filesCreated
    try:
        with open(fileName, "w", encoding="utf-8") as file:
            file.write(content)
            filesCreated+=1
            if verbose: 
                print(f"{fileType.upper()} content written to {fileName}")
    except Exception as e:
        print(f"Exception while writing {fileType.upper()} file: {e}")

# Set up some lists
nodeFeatureList = []
typeFeatureList = []
dataTypeFeatureList = []

for featureName, featureData in featureDict.items():
    typeFeatureList.append((featureName,featureData.get('type')))
    dataTypeFeatureList.append((featureName,featureData.get('datatype')))
    for node in featureData.get('freqlist'):
        nodeFeatureList.append((node, featureName))
        
########################################################### 
# Create the page with overview per node type (e.g. word) #
###########################################################
        
pageMD=f'{pageTitleMD}\n# Overview features per nodetype\n'
pageHTML=f'<html><head>{htmlStyle}</head><body><p>{pageTitleHTML}</p>\n<h1>Overview features per nodetype</h1>'

# Sort the list alphabetically based on the second item of each tuple (featureName)
nodeFeatureList = sorted(nodeFeatureList, key=lambda x: x[1])
# Iterate over node types
for NodeType in F.otype.all:
    NodeItemTextMD=f'## {NodeType}\n\nFeature|Featuretype|Datatype|Description|Examples\n---|---|---|---|---\n' 
    NodeItemTextHTML=f'<h2 id=\"{NodeType}\">{NodeType}</h2>\n<table><tr><th>Feature</th><th>Featuretype</th><th>Datatype</th><th>Description</th><th>Examples</th></tr>\n' 
    for node, feature in nodeFeatureList:
        if node == NodeType: 
            featureData=featureDict[feature]
            featureDescription=featureData.get('descr')    
            featureType=featureData.get('type')  
            featureDataType=featureData.get('datatype')
            NodeItemTextMD+=f"[`{feature}`]({feature}.md#readme)|[`{featureType}`](featurebytype.md#{featureType.lower()})|[`{featureDataType}`](featurebydatatype.md#{featureDataType.lower()})|{featureDescription}|{exampleData(feature)}\n"
            NodeItemTextHTML+=f"<tr><td><a href=\"{feature}.htm#start\">{feature}</a></td><td><a href=\"featurebytype.htm#{featureType}\">{featureType}</a></td><td><a href=\"featurebydatatype.htm#{featureDataType}\">{featureDataType}</a></td><td>{featureDescription}</td><td>{exampleData(feature)}</td></tr>\n"
    NodeItemTextHTML+=f"</table>\n"
    pageHTML+=NodeItemTextHTML
    pageMD+=NodeItemTextMD
    
pageHTML+=f'{footerHTML}'
pageMD+=f'{footerMD}'
    
# Write to file by calling common function
if typeOutput in ('md','both'):
    fileNameMD = os.path.join(resultLocation, "featurebynodetype.md")
    writeToFile(fileNameMD, pageMD, 'md', verbose)

if typeOutput in ('html','both'):
    fileNameHTML = os.path.join(resultLocation, "featurebynodetype.htm")
    writeToFile(fileNameHTML, pageHTML, 'html', verbose)

####################################################################
# Create the page with overview per data type  (string or integer) #
####################################################################

pageMD=f'{pageTitleMD}\n# Overview features per datatype\n'
pageHTML=f'<html><head>{htmlStyle}</head><body><p>{pageTitleHTML}</p>\n<h1>Overview features per datatype</h1>'

# Sort the list alphabetically based on the second item of each tuple (data type)
dataTypeFeatureList = sorted(dataTypeFeatureList, key=lambda x: x[1])

DataItemTextMD=DataItemTextHTML=''
for DataType in ('Integer','String'):
    DataItemTextMD=f'## {DataType}\n\nFeature|Featuretype|Available on nodes|Description|Examples\n---|---|---|---|---\n' 
    DataItemTextHTML=f'<h2 id=\"{DataType}\">{DataType}</h2>\n<table><tr><th>Feature</th><th>Featuretype</th><th>Available on nodes</th><th>Description</th><th>Examples</th></tr>\n' 
    for feature, featureDataType in dataTypeFeatureList:  
        if featureDataType == DataType: 
            featureDescription=featureDict[feature].get('descr')    
            featureType=featureDict[feature].get('type')  
            nodeListMD=nodeListHTML=''
            for thisNode in featureDict[feature]['freqlist']:
                nodeListMD+=f'[`{thisNode}`](featurebynodetype.md#{thisNode}) '
                nodeListHTML+=f'<a href=\"featurebynodetype.htm#{thisNode}\">{thisNode}</a> '
            DataItemTextMD+=f"[`{feature}`]({feature}.md#readme)|[`{featureType}`](featurebytype.md#{featureType.lower()})|{nodeListMD}|{featureDescription}|{exampleData(feature)}\n"
            DataItemTextHTML+=f"<tr><td><a href=\"{feature}.htm#start\">{feature}</a></td><td><a href=\"featurebytype.htm#{featureType}\">{featureType}</a></td><td>{nodeListHTML}</td><td>{featureDescription}</td><td>{exampleData(feature)}</td></tr>\n"
    DataItemTextHTML+=f"</table>\n"
    pageMD+=DataItemTextMD
    pageHTML+=DataItemTextHTML

pageHTML+=f'{footerHTML}'
pageMD+=f'{footerMD}'
    
   
# Write to file by calling common function
if typeOutput in ('md','both'):
    fileNameMD = os.path.join(resultLocation, "featurebydatatype.md")
    writeToFile(fileNameMD, pageMD, 'md', verbose)

if typeOutput in ('html','both'):
    fileNameHTML = os.path.join(resultLocation, "featurebydatatype.htm")
    writeToFile(fileNameHTML, pageHTML, 'html', verbose)
    
##################################################################
# Create the page with overview per feature type  (edge or node) #
##################################################################

pageMD=f'{pageTitleMD}\n# Overview features per type\n'
pageHTML=f'<html><head>{htmlStyle}</head><body><p>{pageTitleHTML}</p>\n<h1 id=\"start\">Overview features per type</h1>'

# Sort the list alphabetically based on the second item of each tuple (feature type)
typeFeatureList = sorted(typeFeatureList, key=lambda x: x[1])
for featureType in ('Node','Edge'):
    ItemTextMD=f'## {featureType}\n\nFeature|Datatype|Available on nodes|Description|Examples\n---|---|---|---|---\n' 
    ItemTextHTML=f'<h2 id=\"{featureType}\">{featureType}</h2>\n<table><tr><th>Feature</th><th>Datatype</th><th>Available on nodes</th><th>Description</th><th>Examples</th></tr>\n' 
    for thisFeature, thisFeatureType in typeFeatureList: 
        if featureType == thisFeatureType:
            featureDescription=featureDict[thisFeature].get('descr')
            featureDataType=featureDict[thisFeature].get('datatype')
            nodeListMD=nodeListHTML=''
            for thisNode in featureDict[thisFeature]['freqlist']:
                nodeListMD+=f'[`{thisNode}`](featurebynodetype.md#{thisNode}) '
                nodeListHTML+=f'<a href=\"featurebynodetype.htm#{thisNode}\">{thisNode}</a> '
            ItemTextMD+=f"[`{thisFeature}`]({thisFeature}.md#readme)|[`{featureDataType}`](featurebydatatype.md#{featureDataType.lower()})|{nodeListMD}|{featureDescription}|{exampleData(thisFeature)}\n"
            ItemTextHTML+=f"<tr><td><a href=\"{thisFeature}.htm\">{thisFeature}</a></td><td><a href=\"featurebydatatype.htm#{featureDataType}\">{featureDataType}</a></td><td>{nodeListHTML}</td><td>{featureDescription}</td><td>{exampleData(thisFeature)}</td></tr>\n"
    ItemTextHTML+=f"</table>\n"
    pageMD+=ItemTextMD
    pageHTML+=ItemTextHTML

pageHTML+=f'{footerHTML}'
pageMD+=f'{footerMD}'

# Write to file by calling common function
if typeOutput in ('md','both'):
    fileNameMD = os.path.join(resultLocation, "featurebytype.md")
    writeToFile(fileNameMD, pageMD, 'md', verbose)

if typeOutput in ('html','both'):
    fileNameHTML = os.path.join(resultLocation, "featurebytype.htm")
    writeToFile(fileNameHTML, pageHTML, 'html', verbose)
    

# Reporting the number of files created
if filesCreated != 0:
    print(f'Finished in {time.time() - overallTime:.2f} seconds  (written {filesCreated} {"html and md" if typeOutput == "both" else typeOutput} files to directory {pathFull + resultLocation})')
else:
    print('No files written')

Finished in 0.01 seconds  (written 6 html and md files to directory C:\Users\tonyj\OneDrive\Documents\GitHub\Doc4TF\results)
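
The 'Examples' column of the index pages is filled by exampleData. The sketch below reproduces its logic on a hand-made dictionary so the behavior can be checked without a loaded dataset. The mock data and the name example_values are hypothetical; unlike the notebook function, this variant receives the dictionary as a parameter instead of reading a global.

```python
# Hypothetical stand-in for featureDict, reduced to the key that matters here
mock_feature_dict = {
    "gender": {
        "freqlist": {
            "word": {
                "nodetype": "word",
                "freq": [("masculine", 5), ("feminine", 3), ("neuter", 2)],
            }
        }
    },
    "otype": {"freqlist": {}},
}


def example_values(feature, feature_dict, count=4):
    """Return up to `count` example values, taken from the first node type only."""
    entry = feature_dict.get(feature)
    if not entry or not entry["freqlist"]:
        return "No values"
    first = next(iter(entry["freqlist"].values()))["freq"]
    return " ".join(f"`{value}`" for value, _ in first[:count])


print(example_values("gender", mock_feature_dict))
print(example_values("otype", mock_feature_dict))
```

Note that the examples come from the first node type in the frequency dictionary only, so for a feature present on several node types the other node types are not represented in this column.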

6 - License ¶

Back to TOC ¶

Licensed under Creative Commons Attribution 4.0 International (CC BY 4.0)
