This notebook adds multilingual book names to a BHSA dataset in Text-Fabric format.
We add the features book@iso, where iso is the ISO 639 code of a language (two letters for most languages).
We use a source file, blang.py, that contains the names of the books of the Bible in more than 20 languages; most major languages are covered.
This data has been gleaned mostly from Wikipedia.
We assume that the dataset already has the feature book, holding the Latin book names.
This program works for all datasets and versions that have this feature with the intended meaning.
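The cells below rely on two names imported from blang.py: bookLangs and bookNames. Judging from how they are used later in this notebook, bookLangs maps a language code to a pair (English name of the language, name of the language in that language), and bookNames maps a language code to a sequence of book names in the order of the book nodes. A minimal sketch of that shape follows; the names exampleBookLangs and exampleBookNames are hypothetical, and the two-book excerpt is an illustration, not a copy of the real file.

```python
# Hypothetical two-language, two-book excerpt in the shape this notebook expects.
# The names exampleBookLangs and exampleBookNames are made up for illustration.

# language code -> (English name of the language, name in the language itself)
exampleBookLangs = {
    "en": ("english", "English"),
    "fr": ("french", "Français"),
}

# language code -> book names, in the same order as the book nodes
exampleBookNames = {
    "en": ("Genesis", "Exodus"),
    "fr": ("Genèse", "Exode"),
}

# each language code yields one feature name of the form book@iso
featureNames = sorted("book@{}".format(code) for code in exampleBookLangs)
print(featureNames)
```

Keeping the language descriptions and the name lists in two separate dictionaries, keyed by the same code, lets the metadata and the feature values be built independently.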
import os
import sys
import utils
from tf.fabric import Fabric
from blang import bookLangs, bookNames
if "SCRIPT" not in locals():
    SCRIPT = False
    FORCE = True
    CORE_NAME = "bhsa"
    VERSION = "2021"


def stop(good=False):
    if SCRIPT:
        sys.exit(0 if good else 1)
The conversion is executed in an environment of directories, so that sources, temp files and results are in convenient places and do not have to be shifted around.
repoBase = os.path.expanduser("~/github/etcbc")
thisRepo = "{}/{}".format(repoBase, CORE_NAME)
thisTemp = "{}/_temp/{}".format(thisRepo, VERSION)
thisTempTf = "{}/tf".format(thisTemp)
thisTf = "{}/tf/{}".format(thisRepo, VERSION)
We collect the book names.
utils.caption(4, "Book names")
metaData = {
    "": dict(
        dataset="BHSA",
        version=VERSION,
        datasetName="Biblia Hebraica Stuttgartensia Amstelodamensis",
        author="Eep Talstra Centre for Bible and Computer",
        provenance="book names from wikipedia and other sources",
        encoders="Dirk Roorda (TF)",
        website="https://shebanq.ancient-data.org",
        email="shebanq@ancient-data.org",
    ),
}

for (langCode, (langEnglish, langName)) in bookLangs.items():
    metaData["book@{}".format(langCode)] = {
        "valueType": "str",
        "language": langName,
        "languageCode": langCode,
        "languageEnglish": langEnglish,
    }
newFeatures = sorted(m for m in metaData if m != "")
newFeaturesStr = " ".join(newFeatures)
utils.caption(0, "{} languages ...".format(len(newFeatures)))
..............................................................................................
. 0.00s Book names .
..............................................................................................
| 0.00s 26 languages ...
Check whether this conversion is needed in the first place. This check is performed only when the notebook runs as a script.
if SCRIPT:
    (good, work) = utils.mustRun(
        None, "{}/.tf/{}.tfx".format(thisTf, newFeatures[0]), force=FORCE
    )
    if not good:
        stop(good=False)
    if not work:
        stop(good=True)
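utils.mustRun is a helper from this repository whose code is not shown here. As a rough illustration of what such an up-to-date check could look like, the sketch below decides from file existence and modification times whether a target needs rebuilding. The function name needsRebuild and its logic are assumptions, not the actual implementation of utils.mustRun.

```python
import os


def needsRebuild(source, target, force=False):
    """Return True when target should be (re)generated.

    Hypothetical helper, not the implementation of utils.mustRun.
    """
    if force:
        # the caller explicitly asks for a rebuild
        return True
    if not os.path.exists(target):
        # nothing has been produced yet
        return True
    if source is None:
        # no source to compare with: an existing target counts as up to date
        return False
    # rebuild when the source is newer than the target
    return os.path.getmtime(source) > os.path.getmtime(target)
```

With force=True, as in this notebook, such a check always asks for a rebuild.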
utils.caption(4, "Loading relevant features")
TF = Fabric(locations=thisTf, modules=[""])
api = TF.load("book")
api.makeAvailableIn(globals())
nodeFeatures = {}
nodeFeatures["book@la"] = {}
bookNodes = []
for b in F.otype.s("book"):
    bookNodes.append(b)
    nodeFeatures["book@la"][b] = F.book.v(b)

for (langCode, langBookNames) in bookNames.items():
    nodeFeatures["book@{}".format(langCode)] = dict(zip(bookNodes, langBookNames))
utils.caption(0, "{} book name features created".format(len(nodeFeatures)))
..............................................................................................
. 13s Loading relevant features .
..............................................................................................
This is Text-Fabric 9.1.6
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html
75 features found and 0 ignored
0.00s loading features ...
| 0.00s Dataset without structure sections in otext:no structure functions in the T-API
11s All features loaded/computed - for details use TF.isLoaded()
| 24s 26 book name features created
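The pairing of book nodes with book names above relies on zip, which stops at the shorter of its two arguments. A language with too few or too many book names would therefore be paired silently with the wrong number of book nodes. A minimal sketch of a stricter pairing follows; pairBookNames is a hypothetical helper, not part of this notebook or of Text-Fabric, and the node numbers in the usage line are made up.

```python
def pairBookNames(bookNodes, langBookNames):
    """Map each book node to its name, refusing lists of unequal length."""
    if len(bookNodes) != len(langBookNames):
        raise ValueError(
            "expected {} book names, got {}".format(
                len(bookNodes), len(langBookNames)
            )
        )
    return dict(zip(bookNodes, langBookNames))


# made-up node numbers, for illustration only
paired = pairBookNames([101, 102], ["Genesis", "Exodus"])
```

Raising an error early is a design choice: a missing name in one language is easier to repair in the source file than to detect later in the delivered features.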
utils.caption(4, "Write book name features as TF")
TF = Fabric(locations=thisTempTf, silent=True)
TF.save(nodeFeatures=nodeFeatures, edgeFeatures={}, metaData=metaData)
..............................................................................................
. 31s Write book name features as TF .
..............................................................................................
True
Check differences with previous versions.
utils.checkDiffs(thisTempTf, thisTf, only=set(newFeatures))
..............................................................................................
. 32s Check differences with previous version .
..............................................................................................
| 32s 26 features to add
| 32s book@am
| 32s book@ar
| 32s book@bn
| 32s book@da
| 32s book@de
| 32s book@el
| 32s book@en
| 32s book@es
| 32s book@fa
| 32s book@fr
| 32s book@he
| 32s book@hi
| 32s book@id
| 32s book@ja
| 32s book@ko
| 32s book@la
| 32s book@nl
| 32s book@pa
| 32s book@pt
| 32s book@ru
| 32s book@sw
| 32s book@syc
| 32s book@tr
| 32s book@ur
| 32s book@yo
| 32s book@zh
| 32s no features to delete
| 32s 0 features in common
| 32s Done
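The report above distinguishes features to add, features to delete, and features in common. How utils.checkDiffs computes these is not shown in this notebook; the sketch below illustrates the idea with plain set operations. The function compareFeatureSets, its only parameter, and the example feature lists are assumptions made for illustration.

```python
def compareFeatureSets(newFeatures, existingFeatures, only=None):
    """Classify feature names as to add, to delete, or in common.

    Hypothetical helper, not the implementation of utils.checkDiffs.
    """
    new = set(newFeatures)
    old = set(existingFeatures)
    if only is not None:
        # restrict the comparison to the features of interest
        new &= set(only)
        old &= set(only)
    return {
        "add": sorted(new - old),
        "delete": sorted(old - new),
        "common": sorted(new & old),
    }


# example: two new features, none of which exists in the target yet
report = compareFeatureSets(
    ["book@en", "book@nl"],
    ["book", "otype"],
    only={"book@en", "book@nl"},
)
```

In this example every new feature ends up under "add", and nothing is listed for deletion or as common, which is the same pattern as in the report above.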
Copy the new Text-Fabric features from the temporary location where they have been created to their final destination.
utils.deliverFeatures(thisTempTf, thisTf, newFeatures)
..............................................................................................
. 36s Deliver features to /Users/dirk/github/etcbc/bhsa/tf/2021 .
..............................................................................................
| 36s book@am
| 36s book@ar
| 36s book@bn
| 36s book@da
| 36s book@de
| 36s book@el
| 36s book@en
| 36s book@es
| 36s book@fa
| 36s book@fr
| 36s book@he
| 36s book@hi
| 36s book@id
| 36s book@ja
| 36s book@ko
| 36s book@la
| 36s book@nl
| 36s book@pa
| 36s book@pt
| 36s book@ru
| 36s book@sw
| 36s book@syc
| 36s book@tr
| 36s book@ur
| 36s book@yo
| 36s book@zh
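The delivery step copies feature files from the temporary directory to the final one. The code of utils.deliverFeatures is not shown here; the sketch below illustrates such a copy with the standard library, assuming each feature lives in a file named after the feature with the extension .tf. The function copyFeatures and the dummy file in the usage part are hypothetical.

```python
import os
import shutil
import tempfile


def copyFeatures(srcDir, dstDir, features):
    """Copy one .tf file per feature from srcDir to dstDir.

    Hypothetical helper, not the implementation of utils.deliverFeatures.
    """
    os.makedirs(dstDir, exist_ok=True)
    copied = []
    for feature in features:
        fileName = "{}.tf".format(feature)
        shutil.copy(os.path.join(srcDir, fileName), os.path.join(dstDir, fileName))
        copied.append(fileName)
    return copied


# usage with throw-away directories and a dummy feature file
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src")
    dst = os.path.join(tmp, "dst")
    os.makedirs(src)
    with open(os.path.join(src, "book@en.tf"), "w", encoding="utf8") as fh:
        fh.write("dummy content\n")
    delivered = copyFeatures(src, dst, ["book@en"])
    deliveredExists = os.path.exists(os.path.join(dst, "book@en.tf"))
```

Writing to a temporary location first and copying afterwards means the final directory is touched only after the new features have been written and compared.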
utils.caption(4, "Load and compile the new TF features")
TF = Fabric(locations=thisTf, modules=[""])
api = TF.load("")
api.makeAvailableIn(globals())
..............................................................................................
. 43s Load and compile the new TF features .
..............................................................................................
This is Text-Fabric 9.1.6
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html
101 features found and 0 ignored
0.00s loading features ...
| 0.00s Dataset without structure sections in otext:no structure functions in the T-API
| 0.00s T book@id from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@he from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@la from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@yo from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@en from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@ru from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@pt from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@tr from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@da from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@pa from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@hi from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@zh from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@fr from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@ko from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@ur from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@ar from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@de from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@sw from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@syc from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@ja from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@es from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@el from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@nl from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@am from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@fa from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@bn from ~/github/etcbc/bhsa/tf/2021
14s All features loaded/computed - for details use TF.isLoaded()
[('Computed', 'computed-data', ('C Computed', 'Call AllComputeds', 'Cs ComputedString')),
 ('Features', 'edge-features', ('E Edge', 'Eall AllEdges', 'Es EdgeString')),
 ('Fabric', 'loading', ('TF',)),
 ('Locality', 'locality', ('L Locality',)),
 ('Nodes', 'navigating-nodes', ('N Nodes',)),
 ('Features', 'node-features', ('F Feature', 'Fall AllFeatures', 'Fs FeatureString')),
 ('Search', 'search', ('S Search',)),
 ('Text', 'text', ('T Text',))]
utils.caption(4, "Genesis in all languages")
genesisNode = F.otype.s("book")[0]
for (lang, langInfo) in sorted(T.languages.items()):
    language = langInfo["language"]
    langEng = langInfo["languageEnglish"]
    book = T.sectionFromNode(genesisNode, lang=lang)[0]
    utils.caption(
        0,
        "{:<2} = {:<20} Genesis is {:<20} in {:<20}".format(
            lang, langEng, book, language
        ),
    )
utils.caption(0, "Done")
..............................................................................................
. 1m 05s Genesis in all languages .
..............................................................................................
| 1m 05s = default Genesis is Genesis in default
| 1m 05s am = amharic Genesis is ኦሪት_ዘፍጥረት in ኣማርኛ
| 1m 05s ar = arabic Genesis is تكوين in العَرَبِية
| 1m 05s bn = bengali Genesis is আদিপুস্তক in বাংলা
| 1m 05s da = danish Genesis is 1.Mosebog in Dansk
| 1m 05s de = german Genesis is Genesis in Deutsch
| 1m 05s el = greek Genesis is Γένεση in Ελληνικά
| 1m 05s en = english Genesis is Genesis in English
| 1m 05s es = spanish Genesis is Génesis in Español
| 1m 05s fa = farsi Genesis is پيدايش in فارسی
| 1m 05s fr = french Genesis is Genèse in Français
| 1m 05s he = hebrew Genesis is בראשית in עברית
| 1m 05s hi = hindi Genesis is उत्पाति in हिन्दी
| 1m 05s id = indonesian Genesis is Kejadian in Bahasa Indonesia
| 1m 05s ja = japanese Genesis is 創世記 in 日本語
| 1m 05s ko = korean Genesis is 창세기 in 한국어
| 1m 05s la = latin Genesis is Genesis in Latina
| 1m 05s nl = dutch Genesis is Genesis in Nederlands
| 1m 05s pa = punjabi Genesis is ਉਤਪਤ in ਪੰਜਾਬੀ
| 1m 05s pt = portuguese Genesis is Gênesis in Português
| 1m 05s ru = russian Genesis is Бытия in Русский
| 1m 05s sw = swahili Genesis is Mwanzo in Kiswahili
| 1m 05s syc = syriac Genesis is ܒܪܝܬܐ in ܠܫܢܐ ܣܘܪܝܝܐ
| 1m 05s tr = turkish Genesis is Yaratılış in Türkçe
| 1m 05s ur = urdu Genesis is پیدائش in اُردُو
| 1m 05s yo = yoruba Genesis is Genesisi in èdè Yorùbá
| 1m 05s zh = chinese Genesis is 创世记 in 中文
| 1m 05s Done