This notebook adds multilingual book names to a BHSA dataset in Text-Fabric format.
We add the features book@iso, where iso is an ISO 639 language code: two letters where such a code exists, three letters otherwise (for example syc for Syriac).
We use a source file, blang.py, that contains the names of the books of the Bible in a range of languages; most widely spoken languages are covered.
The run recorded below produces 26 book name features, Latin included.
This data has been gleaned mostly from Wikipedia.
We assume that the dataset already has the book feature, holding Latin book names.
This program works for all datasets and versions that have this feature with the intended meaning.
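As a small illustration of the naming scheme described above, the sketch below builds feature names from language codes and recovers the code from a feature name. The helper names `book_feature_name` and `lang_of_feature` are hypothetical and do not occur in the notebook.

```python
def book_feature_name(lang_code):
    """Return the feature name that holds book names in the given language."""
    return "book@{}".format(lang_code)


def lang_of_feature(feature_name):
    """Return the language code part of a book@iso feature name."""
    # Split on the first "@" only; the part after it is the language code.
    return feature_name.split("@", 1)[1]


lang_codes = ["en", "nl", "syc"]  # sample codes taken from the run in this notebook
feature_names = [book_feature_name(code) for code in lang_codes]
round_trip = [lang_of_feature(name) for name in feature_names]
print(feature_names)
```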
import os
import sys
import utils
from tf.fabric import Fabric
from blang import bookLangs, bookNames
if "SCRIPT" not in locals():
    SCRIPT = False
    FORCE = True
    CORE_NAME = "bhsa"
    VERSION = "2021"


def stop(good=False):
    if SCRIPT:
        sys.exit(0 if good else 1)
The conversion is executed in an environment of directories, so that sources, temp files and results are in convenient places and do not have to be shifted around.
repoBase = os.path.expanduser("~/github/etcbc")
thisRepo = "{}/{}".format(repoBase, CORE_NAME)
thisTemp = "{}/_temp/{}".format(thisRepo, VERSION)
thisTempTf = "{}/tf".format(thisTemp)
thisTf = "{}/tf/{}".format(thisRepo, VERSION)
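The directory layout above can be wrapped in one function, which makes it easier to see which paths depend on the dataset name and which on the version. This is only a sketch: the function name `conversion_paths` and the base directory `/data/etcbc` are made up for the example.

```python
def conversion_paths(repo_base, core_name, version):
    """Derive the working directories for one dataset and version."""
    this_repo = "{}/{}".format(repo_base, core_name)
    this_temp = "{}/_temp/{}".format(this_repo, version)
    return {
        "repo": this_repo,                             # the dataset repository
        "temp": this_temp,                             # scratch space for this version
        "temp_tf": "{}/tf".format(this_temp),          # freshly generated features
        "tf": "{}/tf/{}".format(this_repo, version),   # final destination
    }


# Hypothetical base directory, used only for illustration.
paths = conversion_paths("/data/etcbc", "bhsa", "2021")
for (kind, path) in sorted(paths.items()):
    print("{:<8} {}".format(kind, path))
```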
We collect the book names.
utils.caption(4, "Book names")
metaData = {
    "": dict(
        dataset="BHSA",
        version=VERSION,
        datasetName="Biblia Hebraica Stuttgartensia Amstelodamensis",
        author="Eep Talstra Centre for Bible and Computer",
        provenance="book names from wikipedia and other sources",
        encoders="Dirk Roorda (TF)",
        website="https://shebanq.ancient-data.org",
        email="shebanq@ancient-data.org",
    ),
}
for (langCode, (langEnglish, langName)) in bookLangs.items():
    metaData["book@{}".format(langCode)] = {
        "valueType": "str",
        "language": langName,
        "languageCode": langCode,
        "languageEnglish": langEnglish,
    }
newFeatures = sorted(m for m in metaData if m != "")
newFeaturesStr = " ".join(newFeatures)
utils.caption(0, "{} languages ...".format(len(newFeatures)))
..............................................................................................
. 0.00s Book names .
..............................................................................................
| 0.00s 26 languages ...
When run as a script, first check whether this conversion is needed at all.
if SCRIPT:
    (good, work) = utils.mustRun(
        None, "{}/.tf/{}.tfx".format(thisTf, newFeatures[0]), force=FORCE
    )
    if not good:
        stop(good=False)
    if not work:
        stop(good=True)
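The helper utils.mustRun is not shown in this notebook. The sketch below is a guess at the kind of check it performs, based only on how it is called here: it returns a pair (good, work), where good is False when something is wrong and work is False when the target is already up to date. The function `must_run_sketch` and its rules are assumptions, not the actual implementation.

```python
import os
import tempfile


def must_run_sketch(source, target, force=False):
    """Hypothetical stand-in for utils.mustRun; returns (good, work)."""
    # A named source that does not exist is an error: nothing can be done.
    if source is not None and not os.path.exists(source):
        return (False, False)
    # Forced runs and missing targets always require work.
    if force or not os.path.exists(target):
        return (True, True)
    # Otherwise work is needed only if the source is newer than the target.
    if source is not None and os.path.getmtime(source) > os.path.getmtime(target):
        return (True, True)
    return (True, False)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "book@am.tfx")
    missing_target = must_run_sketch(None, target)
    open(target, "w").close()  # simulate an earlier run that produced the target
    existing_target = must_run_sketch(None, target)
    forced = must_run_sketch(None, target, force=True)
```

With FORCE = True, as in this notebook, such a check would always report that work is needed.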
utils.caption(4, "Loading relevant features")
TF = Fabric(locations=thisTf, modules=[""])
api = TF.load("book")
api.makeAvailableIn(globals())
nodeFeatures = {}
nodeFeatures["book@la"] = {}
bookNodes = []
for b in F.otype.s("book"):
    bookNodes.append(b)
    nodeFeatures["book@la"][b] = F.book.v(b)
for (langCode, langBookNames) in bookNames.items():
    nodeFeatures["book@{}".format(langCode)] = dict(zip(bookNodes, langBookNames))
utils.caption(0, "{} book name features created".format(len(nodeFeatures)))
..............................................................................................
. 4.78s Loading relevant features .
..............................................................................................
This is Text-Fabric 8.5.13
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html
88 features found and 0 ignored
0.00s loading features ...
| 0.00s Dataset without structure sections in otext:no structure functions in the T-API
3.58s All features loaded/computed - for details use loadLog()
| 8.37s 26 book name features created
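The pairing of book nodes with names relies on both sequences being in the same order and of the same length. Because zip stops silently at the shorter sequence, a length check can catch an incomplete name list early. The sketch below uses made-up node numbers and a three-book list; the function `names_by_node` is hypothetical and not part of the notebook.

```python
def names_by_node(book_nodes, lang_book_names):
    """Map each book node to its name, insisting on equal lengths."""
    if len(book_nodes) != len(lang_book_names):
        raise ValueError(
            "{} book nodes but {} names".format(
                len(book_nodes), len(lang_book_names)
            )
        )
    # Both sequences are assumed to be in canonical book order.
    return dict(zip(book_nodes, lang_book_names))


book_nodes = [101, 102, 103]  # made-up node numbers, for illustration only
english = ["Genesis", "Exodus", "Leviticus"]
mapping = names_by_node(book_nodes, english)
print(mapping[101])
```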
utils.caption(4, "Write book name features as TF")
TF = Fabric(locations=thisTempTf, silent=True)
TF.save(nodeFeatures=nodeFeatures, edgeFeatures={}, metaData=metaData)
..............................................................................................
. 18s Write book name features as TF .
..............................................................................................
True
Check differences with previous versions.
utils.checkDiffs(thisTempTf, thisTf, only=set(newFeatures))
..............................................................................................
. 21s Check differences with previous version .
..............................................................................................
| 21s 26 features to add
| 21s book@am
| 21s book@ar
| 21s book@bn
| 21s book@da
| 21s book@de
| 21s book@el
| 21s book@en
| 21s book@es
| 21s book@fa
| 21s book@fr
| 21s book@he
| 21s book@hi
| 21s book@id
| 21s book@ja
| 21s book@ko
| 21s book@la
| 21s book@nl
| 21s book@pa
| 21s book@pt
| 21s book@ru
| 21s book@sw
| 21s book@syc
| 21s book@tr
| 21s book@ur
| 21s book@yo
| 21s book@zh
| 21s no features to delete
| 21s 0 features in common
| 21s Done
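The report above sorts features into those to add, those to delete, and those in common. The helper utils.checkDiffs is not shown here, so the following is only a sketch of that classification under the assumption that the only argument restricts the comparison on both sides; the function `classify_features` is hypothetical.

```python
def classify_features(new_features, existing_features, only=None):
    """Split feature names into those to add, to delete, and in common."""
    new = set(new_features)
    old = set(existing_features)
    if only is not None:
        # Assumption: features outside `only` are ignored on both sides.
        new &= set(only)
        old &= set(only)
    return {
        "add": sorted(new - old),
        "delete": sorted(old - new),
        "common": sorted(new & old),
    }


fresh = ["book@en", "book@nl"]
present = ["book", "otype"]  # features already in the destination
report = classify_features(fresh, present, only=set(fresh))
print(report)
```

Restricting the comparison this way would explain why the pre-existing features of the dataset are not reported as features to delete.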
Copy the new Text-Fabric features from the temporary location where they have been created to their final destination.
utils.deliverFeatures(thisTempTf, thisTf, newFeatures)
..............................................................................................
. 23s Deliver features to /Users/dirk/github/etcbc/bhsa/tf/2021 .
..............................................................................................
| 23s book@am
| 23s book@ar
| 23s book@bn
| 23s book@da
| 23s book@de
| 23s book@el
| 23s book@en
| 23s book@es
| 23s book@fa
| 23s book@fr
| 23s book@he
| 23s book@hi
| 23s book@id
| 23s book@ja
| 23s book@ko
| 23s book@la
| 23s book@nl
| 23s book@pa
| 23s book@pt
| 23s book@ru
| 23s book@sw
| 23s book@syc
| 23s book@tr
| 23s book@ur
| 23s book@yo
| 23s book@zh
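Delivering features amounts to copying one file per feature from the temporary directory to the final one. The sketch below shows that idea with the standard library, assuming each feature lives in a file named after the feature with extension .tf; the function `deliver_sketch` is hypothetical and utils.deliverFeatures may do more than this.

```python
import os
import shutil
import tempfile


def deliver_sketch(src_dir, dst_dir, features):
    """Copy the .tf file of each feature from src_dir to dst_dir."""
    os.makedirs(dst_dir, exist_ok=True)
    delivered = []
    for feature in features:
        file_name = "{}.tf".format(feature)
        shutil.copy(
            os.path.join(src_dir, file_name), os.path.join(dst_dir, file_name)
        )
        delivered.append(file_name)
    return delivered


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "temp_tf")
    dst = os.path.join(tmp, "tf")
    os.makedirs(src)
    # Create two dummy feature files to stand in for real TF output.
    for feature in ("book@en", "book@nl"):
        with open(os.path.join(src, feature + ".tf"), "w") as fh:
            fh.write("@node\n")
    delivered = deliver_sketch(src, dst, ["book@en", "book@nl"])
    delivered_files = sorted(os.listdir(dst))
```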
utils.caption(4, "Load and compile the new TF features")
TF = Fabric(locations=thisTf, modules=[""])
api = TF.load("")
api.makeAvailableIn(globals())
..............................................................................................
. 27s Load and compile the new TF features .
..............................................................................................
This is Text-Fabric 8.5.13
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html
114 features found and 0 ignored
0.00s loading features ...
| 0.00s Dataset without structure sections in otext:no structure functions in the T-API
| 0.00s T book@ko from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@fr from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@zh from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@en from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@ja from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@syc from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@he from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@es from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@id from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@bn from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@yo from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@la from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@da from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@ru from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@pt from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@tr from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@ur from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@hi from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@de from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@ar from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@el from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@pa from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@sw from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@fa from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@nl from ~/github/etcbc/bhsa/tf/2021
| 0.00s T book@am from ~/github/etcbc/bhsa/tf/2021
3.62s All features loaded/computed - for details use loadLog()
[('Computed', 'computed-data', ('C Computed', 'Call AllComputeds', 'Cs ComputedString')), ('Features', 'edge-features', ('E Edge', 'Eall AllEdges', 'Es EdgeString')), ('Fabric', 'loading', ('TF',)), ('Locality', 'locality', ('L Locality',)), ('Nodes', 'navigating-nodes', ('N Nodes',)), ('Features', 'node-features', ('F Feature', 'Fall AllFeatures', 'Fs FeatureString')), ('Search', 'search', ('S Search',)), ('Text', 'text', ('T Text',))]
utils.caption(4, "Genesis in all languages")
genesisNode = F.otype.s("book")[0]
for (lang, langInfo) in sorted(T.languages.items()):
    language = langInfo["language"]
    langEng = langInfo["languageEnglish"]
    book = T.sectionFromNode(genesisNode, lang=lang)[0]
    utils.caption(
        0,
        "{:<2} = {:<20} Genesis is {:<20} in {:<20}".format(
            lang, langEng, book, language
        ),
    )
utils.caption(0, "Done")
..............................................................................................
. 33s Genesis in all languages .
..............................................................................................
| 33s = default Genesis is Genesis in default
| 33s am = amharic Genesis is ኦሪት_ዘፍጥረት in ኣማርኛ
| 33s ar = arabic Genesis is تكوين in العَرَبِية
| 33s bn = bengali Genesis is আদিপুস্তক in বাংলা
| 33s da = danish Genesis is 1.Mosebog in Dansk
| 33s de = german Genesis is Genesis in Deutsch
| 33s el = greek Genesis is Γένεση in Ελληνικά
| 33s en = english Genesis is Genesis in English
| 33s es = spanish Genesis is Génesis in Español
| 33s fa = farsi Genesis is پيدايش in فارسی
| 33s fr = french Genesis is Genèse in Français
| 33s he = hebrew Genesis is בראשית in עברית
| 33s hi = hindi Genesis is उत्पाति in हिन्दी
| 33s id = indonesian Genesis is Kejadian in Bahasa Indonesia
| 33s ja = japanese Genesis is 創世記 in 日本語
| 33s ko = korean Genesis is 창세기 in 한국어
| 33s la = latin Genesis is Genesis in Latina
| 33s nl = dutch Genesis is Genesis in Nederlands
| 33s pa = punjabi Genesis is ਉਤਪਤ in ਪੰਜਾਬੀ
| 33s pt = portuguese Genesis is Gênesis in Português
| 33s ru = russian Genesis is Бытия in Русский
| 33s sw = swahili Genesis is Mwanzo in Kiswahili
| 33s syc = syriac Genesis is ܒܪܝܬܐ in ܠܫܢܐ ܣܘܪܝܝܐ
| 33s tr = turkish Genesis is Yaratılış in Türkçe
| 33s ur = urdu Genesis is پیدائش in اُردُو
| 33s yo = yoruba Genesis is Genesisi in èdè Yorùbá
| 33s zh = chinese Genesis is 创世记 in 中文
| 33s Done