Various checks on the correctness of the transformation from ASCII transcriptions to a Text-Fabric data set.
The diagnostics of the transformation surface issues that may be used to correct mistakes in the sources. Or, equally likely, they correspond to misunderstandings on my (Dirk's) part of the model that underlies the transcriptions.
We will perform grep commands on the source files, and we will traverse nodes in Text-Fabric to collect information.
Then we compare these sets of information.
There is some documentation about the checking software itself.
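The checking helpers come from utils (the Compare class, used as COMP below); its source is not shown here. As a rough sketch, its checkSanity method presumably boils down to comparing two lists of tuples, one gathered by GREPping the sources and one by walking TF nodes (the names below are illustrative, not the real implementation):

def checkSanitySketch(grepItems, tfItems):
    # both arguments are lists of tuples, in source order
    if grepItems == tfItems:
        print(f"IDENTICAL: all {len(tfItems)} items")
        return True
    for (g, t) in zip(grepItems, tfItems):
        if g != t:
            print(f"GREP {g} versus TF {t}")
    print(f"Number of results: TF {len(tfItems)}; GREP {len(grepItems)}")
    return False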
%load_ext autoreload
%autoreload 2
import os
import collections
import re
from tf.app import use
from utils import Compare
A = use("Nino-cunei/oldbabylonian", hoist=globals(), lgc=True)
BASE = os.path.expanduser("~/github")
SOURCE_VERSION = "0.3"
SOURCE_DIR = f"{BASE}/{A.org}/{A.repo}/sources/cdli/transcriptions/{SOURCE_VERSION}"
SOURCE_FILES = """
AbB-primary
AbB-secondary
""".strip().split()
TEMP_DIR = f"{BASE}/_temp"
Using TF app oldbabylonian in /Users/dirk/github/annotation/app-oldbabylonian/code Using Nino-cunei/oldbabylonian/tf - 1.0.4 in /Users/dirk/github
Old Babylonian Letters 1900-1600: Cuneiform tablets : ARK after afterr afteru atf atfpost atfpre author col collated collection comment damage det docnote docnumber excavation excised face flags fraction genre grapheme graphemer graphemeu lang langalt ln lnc lnno material missing museumcode museumname object operator operatorr operatoru otype period pnumber primecol primeln pubdate question reading readingr readingu remarkable remarks repeat srcLn srcLnNum srcfile subgenre supplied sym symr symu trans transcriber translation@ll type uncertain volume oslots
COMP = Compare(TF.api, SOURCE_DIR, SOURCE_FILES, TEMP_DIR)
FACES: bottom case case - lower edge case - obverse case - reverse case - seal envelope envelope - obverse envelope - reverse envelope - seal 1 eyestone - surface a left left edge left side lower edge obverse reverse seal seal 1 seal 2 upper edge EMPTY TABLETS (0):
We make an inventory of all characters that occur on an ATF line in transcribed material.
transRe = re.compile(r"""^([0-9a-zA-Z']*)\.\s+(.+)$""")
trimRe = re.compile(r"""\s\s+""")
prime = "'"
times = "×"
div = "÷"
quad = "|"
clusterChars = (
("┌", "┐", "_", "_", "langalt"),
("◀", "▶", "{", "}", "det"),
("∈", "∋", "(", ")", "uncertain"),
("〖", "〗", "[", "]", "missing"),
("«", "»", "<<", ">>", "excised"),
("⊂", "⊃", "<", ">", "supplied"),
)
clusterType = {x[0]: x[4] for x in clusterChars}
clusterTypeInfo = {x[4]: x[0:-1] for x in clusterChars}
clusterB = {c[0] for c in clusterChars}
clusterE = {c[1] for c in clusterChars}
clusterA = clusterB | clusterE
clusterOB = {c[2] for c in clusterChars}
clusterOE = {c[3] for c in clusterChars}
clusterOA = clusterOB | clusterOE
clusterBstr = "".join(sorted(clusterB))
clusterEstr = "".join(sorted(clusterE))
clusterAstr = "".join(sorted(clusterA))
flaggingStr = "!?*#"
flagging = set(flaggingStr)
separatorStr = "-"
separator = set(separatorStr)
ellips = "…"
unknownStr = "xXnN"
unknown = set(unknownStr) | {ellips}
lowerLetterStr = "abcdefghijklmnopqrstuvwyz"
upperLetterStr = lowerLetterStr.upper()
lowerLetter = set(lowerLetterStr)
upperLetter = set(upperLetterStr)
digitStr = "0123456789"
digit = set(digitStr) | {div}
emph_s = "ş"
emph_S = "Ş"
emph_t = "ţ"
emph_T = "Ţ"
emphatic = {emph_s, emph_S, emph_t, emph_T}
def emphRepl(x):
return (
x.replace("s,", emph_s)
.replace("S,", emph_S)
.replace("t,", emph_t)
.replace("T,", emph_T)
)
inlineCommentRe = re.compile(r"""\(\$.*?\$\)""")
operatorStr = f".+/:{times}"
operator = set(operatorStr)
divRe = re.compile(r"""([0-9])/([0-9])""")
def divRepl(match):
return f"{match.group(1)}{div}{match.group(2)}"
seen = collections.Counter()
for (srcfile, document, face, column, ln, line) in COMP.readCorpora():
match = transRe.match(line)
if not match:
continue
trans = match.group(2)
trans = inlineCommentRe.sub("", trans)
trans = trans.replace("...", ellips)
trans = trans.replace("x(", times)
trans = emphRepl(trans)
trans = divRe.sub(divRepl, trans)
words = trans.split()
for word in words:
        for c in word:
            seen[c] += 1
allChars = collections.defaultdict(dict)
for (c, amount) in seen.items():
if c in lowerLetter:
allChars["lower"][c] = amount
elif c in unknown:
allChars["unknown"][c] = amount
elif c in upperLetter:
allChars["upper"][c] = amount
elif c in digit:
allChars["digit"][c] = amount
elif c in emphatic:
allChars["emphatic"][c] = amount
elif c == prime:
allChars["prime"][c] = amount
elif c == quad:
allChars["quad"][c] = amount
elif c in flagging:
allChars["flagging"][c] = amount
elif c in separator:
allChars["separator"][c] = amount
elif c in operator:
allChars["operator"][c] = amount
elif c in clusterOA:
allChars["cluster"][c] = amount
else:
allChars["rest"][c] = amount
for (kind, data) in sorted(allChars.items()):
print(f"{kind}:")
for (c, amount) in sorted(
data.items(),
key=lambda x: (-x[1], x[0]),
):
print(f"\t{c:<1} {amount:>6}")
cluster: _ 15200 [ 7572 ] 7572 { 6794 } 6794 ) 3489 ( 3484 < 369 > 369 digit: 2 15362 3 5858 4 1412 1 1190 5 424 8 264 6 263 7 146 ÷ 121 0 43 9 36 emphatic: ţ 2212 ş 1748 Ş 5 flagging: # 9974 ? 560 ! 216 * 13 lower: a 83892 i 56380 u 45188 m 34283 s 26373 z 24237 n 21059 l 16466 d 14416 r 14193 t 14124 k 13164 b 12681 e 11430 p 5266 g 4486 h 4243 q 3666 w 1176 y 1 operator: / 15 . 11 × 5 + 2 : 1 prime: ' 38 quad: | 8 separator: - 118903 unknown: x 8729 … 1617 N 192 upper: A 808 I 448 D 337 U 270 R 222 Z 186 G 184 B 153 K 143 S 102 L 61 H 60 M 58 E 54 T 48 P 42 W 9
for (c, amount) in F.lang.freqList():
print(f"{c} {amount:>6} x")
akk 1283 x sux 2 x
In the ATF source, after the line with the P-number (&P...) there is additional identification, usually in the form collection volume, number note.
We give an overview of the collections in which the documents of this corpus are found, and we list the notes, which are really the irregular parts of the identification.
We will not check the TF values with the GREP values for this part of the document identification.
for (c, amount) in F.collection.freqList():
print(f"{c:<8} {amount:>6} x")
AbB 492 x CT 241 x VS 218 x YOS 108 x TCL 105 x LIH 77 x YNER 16 x TLB 10 x BIN 7 x OECT 3 x AJSL 1 x CT43, 1 x JCS 1 x LFBD 1 x RA 1 x RIME 1 x abb 1 x
for (c, amount) in F.docnote.freqList():
print(f"{c:<8} {amount:>6} x")
37 BM 097815 1 x AO 21105 1 x BM 012819 1 x BM 023357 1 x BM 023823 1 x BM 025693 1 x BM 027780 1 x BM 028435 1 x BM 028436 1 x BM 028444 1 x BM 028447 1 x BM 028457 1 x BM 028473 1 x BM 028474 1 x BM 028475 1 x BM 028476 1 x BM 028508 1 x BM 028510 1 x BM 028531 1 x BM 028558 1 x BM 028559 1 x BM 028588 1 x BM 028840 1 x BM 029655 1 x BM 040037 1 x BM 078214 1 x BM 080186 1 x BM 080329 1 x BM 080340 1 x BM 080354 1 x BM 080410 1 x BM 080484 1 x BM 080558 1 x BM 080594 1 x BM 080612 1 x BM 080616 1 x BM 080685 1 x BM 080723 1 x BM 080797 1 x BM 080802 1 x BM 080816 1 x BM 080840 1 x BM 080878 1 x BM 080885 1 x BM 080897 1 x BM 080947 1 x BM 081095 1 x BM 087395 1 x BM 096604 1 x BM 096608 1 x BM 096629 1 x BM 097031 1 x BM 097040 1 x BM 097050 1 x BM 097098 1 x BM 097115 1 x BM 097130 1 x BM 097274 1 x BM 097325 1 x BM 097347 1 x BM 097405 1 x BM 097675 1 x BM 097686 1 x BM 097693 1 x BM 097816 1 x BM 100117 1 x BM 103848 1 x Bu 1888-05-12, 0184 1 x Bu 1888-05-12, 0278 1 x Bu 1888-05-12, 0323 1 x Bu 1888-05-12, 0329 1 x Bu 1888-05-12, 0333 1 x Bu 1888-05-12, 0342 1 x Bu 1888-05-12, 0505 1 x Bu 1888-05-12, 0568 1 x Bu 1888-05-12, 0581 1 x Bu 1888-05-12, 0598 1 x Bu 1888-05-12, 0602 1 x Bu 1888-05-12, 0607 1 x Bu 1888-05-12, 0621 1 x Bu 1888-05-12, 0638 1 x Bu 1888-05-12, 200 1 x Bu 1888-05-12, 207 1 x Bu 1888-05-12, 212 1 x Bu 1891-05-09, 0279 1 x Bu 1891-05-09, 0315 1 x Bu 1891-05-09, 0370 1 x Bu 1891-05-09, 0383 1 x Bu 1891-05-09, 0413 1 x Bu 1891-05-09, 0418 1 x Bu 1891-05-09, 0468 1 x Bu 1891-05-09, 0534 1 x Bu 1891-05-09, 0579a 1 x Bu 1891-05-09, 0585 1 x Bu 1891-05-09, 0587 1 x Bu 1891-05-09, 0790 1 x Bu 1891-05-09, 1154 1 x Bu 1891-05-09, 2185 1 x Bu 1891-05-09, 2187 1 x Bu 1891-05-09, 2194 1 x Bu 1891-05-09, 290 1 x Bu 1891-05-09, 294 1 x Bu 1891-05-09, 325 1 x Bu 1891-05-09, 354 1 x Fs Landsberger 235 1 x ex. 01 1 x no. 2 1 x pp. 980191 no. 1 1 x
We check whether we have the same sequence of document numbers.
In TF, the document number is stored in the feature pnumber.
Note that we also check the order of the documents.
def tfDocuments():
documents = []
for t in F.otype.s("document"):
(document,) = T.sectionFromNode(t)
documents.append((F.srcfile.v(t), document, F.srcLnNum.v(t), F.pnumber.v(t)))
return documents
def grepDocuments(gen):
documents = []
prevTablet = None
for (srcFile, document, face, column, srcLnNum, srcLn) in gen:
if document != prevTablet:
documents.append((srcFile, document, srcLnNum, document))
prevTablet = document
return documents
COMP.checkSanity(
("tablet",),
grepDocuments,
tfDocuments,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ tablet IDENTICAL: all 1285 items = : AbB-primary ◆ P509373 ◆ 27 ◆ P509373 = : AbB-primary ◆ P509374 ◆ 96 ◆ P509374 = : AbB-primary ◆ P509375 ◆ 147 ◆ P509375 = : AbB-primary ◆ P509376 ◆ 196 ◆ P509376 = : AbB-primary ◆ P509377 ◆ 250 ◆ P509377 = : AbB-primary ◆ P507628 ◆ 309 ◆ P507628 = : AbB-primary ◆ P481190 ◆ 349 ◆ P481190 = : AbB-primary ◆ P481191 ◆ 392 ◆ P481191 = : AbB-primary ◆ P481192 ◆ 443 ◆ P481192 = : AbB-primary ◆ P389958 ◆ 508 ◆ P389958 = : AbB-primary ◆ P389256 ◆ 552 ◆ P389256 = : AbB-primary ◆ P510526 ◆ 7593 ◆ P510526 = : AbB-primary ◆ P510527 ◆ 7643 ◆ P510527 = : AbB-primary ◆ P510528 ◆ 7708 ◆ P510528 = : AbB-primary ◆ P510529 ◆ 7753 ◆ P510529 = : AbB-primary ◆ P510530 ◆ 7805 ◆ P510530 = : AbB-primary ◆ P510531 ◆ 7879 ◆ P510531 = : AbB-primary ◆ P510532 ◆ 7931 ◆ P510532 = : AbB-primary ◆ P510533 ◆ 7984 ◆ P510533 = : AbB-primary ◆ P510534 ◆ 8032 ◆ P510534 = and 1265 more Number of results: TF 1285; GREP 1285
True
for (obj, amount) in F.object.freqList():
print(f"{obj:<10} {amount:>5} x")
tablet 2778 x envelope 43 x case 12 x eyestone 1 x
We check whether we see the same faces with GREP and TF.
def tfFaces():
faces = []
for document in F.otype.s("document"):
documentName = F.pnumber.v(document)
srcfile = F.srcfile.v(document)
for face in L.d(document, otype="face"):
typ = F.face.v(face)
firstLine = L.d(face, otype="line")[0]
ln = F.srcLnNum.v(firstLine)
faces.append((srcfile, documentName, ln, typ))
return faces
def grepFaces(gen):
faces = []
prevDocument = None
prevFace = None
for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
if face is None or (prevDocument == document and prevFace == face):
continue
faces.append((srcfile, document, srcLnNum, face))
prevDocument = document
prevFace = face
return faces
COMP.checkSanity(
("face",),
grepFaces,
tfFaces,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ face IDENTICAL: all 2834 items = : AbB-primary ◆ P509373 ◆ 31 ◆ obverse = : AbB-primary ◆ P509373 ◆ 48 ◆ reverse = : AbB-primary ◆ P509374 ◆ 100 ◆ obverse = : AbB-primary ◆ P509374 ◆ 117 ◆ reverse = : AbB-primary ◆ P509375 ◆ 151 ◆ obverse = : AbB-primary ◆ P509375 ◆ 156 ◆ reverse = : AbB-primary ◆ P509376 ◆ 200 ◆ obverse = : AbB-primary ◆ P509376 ◆ 212 ◆ reverse = : AbB-primary ◆ P509377 ◆ 254 ◆ obverse = : AbB-primary ◆ P509377 ◆ 268 ◆ reverse = : AbB-primary ◆ P507628 ◆ 313 ◆ obverse = : AbB-primary ◆ P507628 ◆ 321 ◆ reverse = : AbB-primary ◆ P481190 ◆ 353 ◆ obverse = : AbB-primary ◆ P481190 ◆ 361 ◆ reverse = : AbB-primary ◆ P481191 ◆ 396 ◆ obverse = : AbB-primary ◆ P481191 ◆ 406 ◆ reverse = : AbB-primary ◆ P481191 ◆ 413 ◆ seal 1 = : AbB-primary ◆ P481192 ◆ 447 ◆ obverse = : AbB-primary ◆ P481192 ◆ 464 ◆ reverse = : AbB-primary ◆ P389958 ◆ 512 ◆ obverse = and 2814 more Number of results: TF 2834; GREP 2834
True
We check whether we see the same column and line numbers with GREP and TF.
def tfLines():
lines = []
for document in F.otype.s("document"):
documentName = F.pnumber.v(document)
srcfile = F.srcfile.v(document)
for face in L.d(document, otype="face"):
typ = F.face.v(face)
for line in L.d(face, otype="line"):
srcLn = F.srcLnNum.v(line)
ln = str(F.ln.v(line) or F.lnc.v(line))
if F.primeln.v(line):
ln += "'"
col = str(F.col.v(line) or "")
if F.primecol.v(line):
col += "'"
lines.append((srcfile, documentName, srcLn, typ, col, ln))
return lines
def grepLines(gen):
lines = []
for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
if face is None or column is None:
continue
isComment = srcLn.startswith("$")
if isComment:
ln = srcLn[0]
else:
match = transRe.match(srcLn)
if not match:
continue
ln = match.group(1)
lines.append((srcfile, document, srcLnNum, face, column, ln))
return lines
COMP.checkSanity(
("face", "column", "atf lineno"),
grepLines,
tfLines,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ face ◆ column ◆ atf lineno IDENTICAL: all 27375 items = : AbB-primary ◆ P509373 ◆ 31 ◆ obverse ◆ ◆ 1 = : AbB-primary ◆ P509373 ◆ 32 ◆ obverse ◆ ◆ 2 = : AbB-primary ◆ P509373 ◆ 33 ◆ obverse ◆ ◆ 3 = : AbB-primary ◆ P509373 ◆ 34 ◆ obverse ◆ ◆ 4 = : AbB-primary ◆ P509373 ◆ 35 ◆ obverse ◆ ◆ 5 = : AbB-primary ◆ P509373 ◆ 36 ◆ obverse ◆ ◆ 6 = : AbB-primary ◆ P509373 ◆ 37 ◆ obverse ◆ ◆ 7 = : AbB-primary ◆ P509373 ◆ 38 ◆ obverse ◆ ◆ 8 = : AbB-primary ◆ P509373 ◆ 39 ◆ obverse ◆ ◆ 9 = : AbB-primary ◆ P509373 ◆ 40 ◆ obverse ◆ ◆ 10 = : AbB-primary ◆ P509373 ◆ 41 ◆ obverse ◆ ◆ 11 = : AbB-primary ◆ P509373 ◆ 42 ◆ obverse ◆ ◆ 12 = : AbB-primary ◆ P509373 ◆ 43 ◆ obverse ◆ ◆ 13 = : AbB-primary ◆ P509373 ◆ 44 ◆ obverse ◆ ◆ 14 = : AbB-primary ◆ P509373 ◆ 45 ◆ obverse ◆ ◆ 15 = : AbB-primary ◆ P509373 ◆ 46 ◆ obverse ◆ ◆ $ = : AbB-primary ◆ P509373 ◆ 48 ◆ reverse ◆ ◆ $ = : AbB-primary ◆ P509373 ◆ 49 ◆ reverse ◆ ◆ 1' = : AbB-primary ◆ P509373 ◆ 50 ◆ reverse ◆ ◆ 2' = : AbB-primary ◆ P509373 ◆ 51 ◆ reverse ◆ ◆ 3' = and 27355 more Number of results: TF 27375; GREP 27375
True
Remarks are marked by the # character in lines that are not metadata following the document header. The criterion for a line starting with # to be a remark is that it has a space after the #.
There are also translation lines, starting with #tr.en, but we do not deal with those here.
def tfRemarks():
remarks = []
for ln in F.otype.s("line"):
rmks = F.remarks.v(ln)
if rmks:
for (i, rmk) in enumerate(rmks.split("\n")):
remarks.append((F.srcfile.v(ln), F.srcLnNum.v(ln) + i + 1, rmk))
return remarks
def grepRemarks(gen):
remarks = []
for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
isRemark = srcLn.startswith("#") and len(srcLn) > 1 and srcLn[1] == " "
if isRemark:
remarks.append((srcfile, srcLnNum, srcLn[1:].strip()))
return remarks
COMP.checkSanity(
("remark",),
grepRemarks,
tfRemarks,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ remark IDENTICAL: all 12 items = : AbB-secondary ◆ 11849 ◆ word (li-ba-al-li-t,u2-ka) divided over two lines = : AbB-secondary ◆ 12535 ◆ word (i-li-ka-am) divided over two lines = : AbB-secondary ◆ 15552 ◆ reading i-ba-al-lu-ut, proposed by Von Soden BiOr 23 55 = : AbB-secondary ◆ 15555 ◆ reading szi-'i-it-sa3 proposed by Von Soden BiOr 23 55 = : AbB-secondary ◆ 15559 ◆ reading tu-ut-t,i-bi-ma following Von Soden BiOr 23 55 = : AbB-secondary ◆ 15573 ◆ reading ma-s,a-ra-am proposed by Von Soden BiOr 23 55 = : AbB-secondary ◆ 15575 ◆ reading a-hu-ki propsed by Von Soden BiOr 23 55 = : AbB-secondary ◆ 15577 ◆ reading ki-i ne-em-szi-ma propsed by Von Soden BiOr 23 55 = : AbB-secondary ◆ 15582 ◆ reconstruction of this line propsed by Von Soden BiOr 23 55 = : AbB-secondary ◆ 16226 ◆ reading szu-ku-si propsed by Von Soden BiOr 23 55 = : AbB-secondary ◆ 68946 ◆ reading la-mi! proposed by Von Soden BiOr 39 590 = : AbB-secondary ◆ 69030 ◆ reading la us2-su2-ka-tim proposed by Von Soden BiOr 39 590 = no more items Number of results: TF 12; GREP 12
True
Translations are marked by the # character in lines that are not metadata following the document header.
The # must be immediately followed by tr., then a language code and a colon; the translation comes after that (with white space in between).
languages = [t[12:] for t in Fall() if t.startswith("translation@")]
print(f'languages: {", ".join(languages)}')
def tfTrans():
trans = []
for ln in F.otype.s("line"):
for lc in languages:
trs = Fs(f"translation@{lc}").v(ln)
if trs:
trans.append((F.srcfile.v(ln), F.srcLnNum.v(ln) + 1, lc, trs))
return trans
languages: en
def grepTrans(gen):
trans = []
for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
isTrans = srcLn.startswith("#tr.")
if isTrans:
parts = srcLn[4:].split(":", 1)
if len(parts) > 1:
lc = parts[0].strip()
trs = parts[1].strip()
trans.append((srcfile, srcLnNum, lc, trs))
return trans
COMP.checkSanity(
(
"language",
"translation",
),
grepTrans,
tfTrans,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ language ◆ translation IDENTICAL: all 134 items = : AbB-secondary ◆ 27139 ◆ en ◆ To Šamaš-ḫazir = : AbB-secondary ◆ 27141 ◆ en ◆ speak, = : AbB-secondary ◆ 27143 ◆ en ◆ thus Hammurapi: = : AbB-secondary ◆ 27145 ◆ en ◆ Ilī-ippalsam, the shepherd, = : AbB-secondary ◆ 27147 ◆ en ◆ thus informed me, as follows that one: = : AbB-secondary ◆ 27149 ◆ en ◆ A field of 3 bur3, which through a sealed document of my lord = : AbB-secondary ◆ 27151 ◆ en ◆ was given (lit. sealed) to me— = : AbB-secondary ◆ 27153 ◆ en ◆ 4 years ago Etel-pî-Marduk took it away from me, and = : AbB-secondary ◆ 27155 ◆ en ◆ its barley regularly takes. = : AbB-secondary ◆ 27157 ◆ en ◆ Further, Sîn-iddinam I informed, = : AbB-secondary ◆ 27159 ◆ en ◆ but it was not returned to me; = : AbB-secondary ◆ 27161 ◆ en ◆ Thus he (Ilī-ippalsam) informed me. = : AbB-secondary ◆ 27163 ◆ en ◆ To Sîn-iddinam I (now) have written. = : AbB-secondary ◆ 27165 ◆ en ◆ If, as that Ilī-ippalsam = : AbB-secondary ◆ 27167 ◆ en ◆ said, = : AbB-secondary ◆ 27169 ◆ en ◆ a field of 3 bur3, which in the palace = : AbB-secondary ◆ 27171 ◆ en ◆ was given (lit. sealed) to him, = : AbB-secondary ◆ 27174 ◆ en ◆ Etel-pî-Marduk 4 years ago took away, and = : AbB-secondary ◆ 27176 ◆ en ◆ is ‘eating,′ = : AbB-secondary ◆ 27178 ◆ en ◆ then a more sickening case = and 114 more Number of results: TF 134; GREP 134
True
Comments are marked by the $ character at the start of a line.
There are also inline comments, shaped as ($ ... $), but we do not deal with them here.
Inline comments are treated under signs.
def tfComments():
comments = []
for ln in F.otype.s("line"):
if not F.lnc.v(ln):
continue
comment = F.comment.v(L.d(ln, otype="sign")[0])
if comment:
            # use a fresh name: ln is still needed as a node below
            lnc = F.lnc.v(ln)
            comments.append((F.srcfile.v(ln), F.srcLnNum.v(ln), lnc, comment))
return comments
def grepComments(gen):
comments = []
for (srcfile, document, face, column, ln, line) in gen:
isComment = line.startswith("$")
if isComment:
comments.append((srcfile, ln, line[0], line[1:].strip()))
return comments
COMP.checkSanity(
("comment",),
grepComments,
tfComments,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ comment IDENTICAL: all 969 items = : AbB-primary ◆ 46 ◆ $ ◆ rest broken = : AbB-primary ◆ 48 ◆ $ ◆ beginning broken = : AbB-primary ◆ 154 ◆ $ ◆ rest missing = : AbB-primary ◆ 319 ◆ $ ◆ blank space = : AbB-primary ◆ 321 ◆ $ ◆ blank space = : AbB-primary ◆ 447 ◆ $ ◆ beginning broken = : AbB-primary ◆ 462 ◆ $ ◆ rest broken = : AbB-primary ◆ 464 ◆ $ ◆ beginning broken = : AbB-primary ◆ 480 ◆ $ ◆ rest broken = : AbB-primary ◆ 512 ◆ $ ◆ beginning broken = : AbB-primary ◆ 556 ◆ $ ◆ beginning broken = : AbB-primary ◆ 562 ◆ $ ◆ rest broken = : AbB-primary ◆ 564 ◆ $ ◆ beginning broken = : AbB-primary ◆ 568 ◆ $ ◆ rest broken = : AbB-primary ◆ 8049 ◆ $ ◆ rest broken = : AbB-primary ◆ 8051 ◆ $ ◆ beginning broken = : AbB-primary ◆ 8180 ◆ $ ◆ single ruling = : AbB-primary ◆ 8222 ◆ $ ◆ blank space = : AbB-primary ◆ 8441 ◆ $ ◆ rest broken = : AbB-primary ◆ 8554 ◆ $ ◆ single ruling = and 949 more Number of results: TF 969; GREP 969
True
Metadata comes from lines starting with a # without a space following the #.
We have found metadata for language, for translation (English), and for comments on the contents of lines.
The language is specified for documents, the translation for lines.
def tfMetas():
metas = []
for d in F.otype.s("document"):
lang = F.lang.v(d)
documentName = F.pnumber.v(d)
srcfile = F.srcfile.v(d)
if lang:
srcLn = F.srcLnNum.v(d)
metas.append((srcfile, documentName, srcLn + 1, f"atf: lang = {lang}"))
for ln in L.d(d, otype="line"):
trans = Fs("translation@en").v(ln)
if trans:
srcLn = F.srcLnNum.v(ln)
metas.append((srcfile, documentName, srcLn + 1, f"tr.en: = {trans}"))
return metas
def grepMetas(gen):
metas = []
for (srcfile, document, face, column, ln, line) in gen:
if line.startswith("#") and len(line) > 1 and line[1] != " ":
if line.startswith("#atf:l"):
line = "#atf: l" + line[6:]
fields = line[1:].split(maxsplit=1)
nFields = len(fields)
if nFields == 1:
key = fields[0]
feat = ""
val = ""
else:
(key, val) = fields
feat = ""
if key == "atf:":
fields = val.split(maxsplit=1)
nFields = len(fields)
if nFields == 2:
(feat, val) = fields
if val.startswith("="):
val = val[1:].strip()
metas.append((srcfile, document, ln, f"{key} {feat} = {val}"))
return metas
COMP.checkSanity(
("comment",),
grepMetas,
tfMetas,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ comment IDENTICAL: all 1419 items = : AbB-primary ◆ P509373 ◆ 28 ◆ atf: lang = akk = : AbB-primary ◆ P509374 ◆ 97 ◆ atf: lang = akk = : AbB-primary ◆ P509375 ◆ 148 ◆ atf: lang = akk = : AbB-primary ◆ P509376 ◆ 197 ◆ atf: lang = akk = : AbB-primary ◆ P509377 ◆ 251 ◆ atf: lang = akk = : AbB-primary ◆ P507628 ◆ 310 ◆ atf: lang = akk = : AbB-primary ◆ P481190 ◆ 350 ◆ atf: lang = akk = : AbB-primary ◆ P481191 ◆ 393 ◆ atf: lang = akk = : AbB-primary ◆ P481192 ◆ 444 ◆ atf: lang = akk = : AbB-primary ◆ P389958 ◆ 509 ◆ atf: lang = akk = : AbB-primary ◆ P389256 ◆ 553 ◆ atf: lang = akk = : AbB-primary ◆ P510526 ◆ 7594 ◆ atf: lang = akk = : AbB-primary ◆ P510527 ◆ 7644 ◆ atf: lang = akk = : AbB-primary ◆ P510528 ◆ 7709 ◆ atf: lang = akk = : AbB-primary ◆ P510529 ◆ 7754 ◆ atf: lang = akk = : AbB-primary ◆ P510530 ◆ 7806 ◆ atf: lang = akk = : AbB-primary ◆ P510531 ◆ 7880 ◆ atf: lang = akk = : AbB-primary ◆ P510532 ◆ 7932 ◆ atf: lang = akk = : AbB-primary ◆ P510533 ◆ 7985 ◆ atf: lang = akk = : AbB-primary ◆ P510534 ◆ 8033 ◆ atf: lang = akk = and 1399 more Number of results: TF 1419; GREP 1419
True
We check whether the contents of lines after the line number can be reproduced by means of TF features.
There are two ways to do that: via the feature srcLn and via T.text().

srcLn

This way is rather trivial, but it is applicable to all lines, including comment lines. We also pick up remarks, but not translations.
def tfLineContents():
lines = []
for document in F.otype.s("document"):
documentName = F.pnumber.v(document)
srcfile = F.srcfile.v(document)
for line in L.d(document, otype="line"):
srcLnNum = F.srcLnNum.v(line)
srcLn = F.srcLn.v(line)
lines.append((srcfile, documentName, srcLnNum, srcLn))
remarks = F.remarks.v(line)
if remarks:
for (i, remark) in enumerate(remarks.split("\n")):
lines.append(
(srcfile, documentName, srcLnNum + i + 1, f"# {remark}")
)
return lines
structureChars = set("&@")
def grepLineContents(gen):
lines = []
for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
if (
not srcLn
or srcLn[0] in structureChars
or (srcLn[0] == "#" and len(srcLn) > 1 and srcLn[1] != " ")
):
continue
lines.append((srcfile, document, srcLnNum, srcLn))
return lines
COMP.checkSanity(
("contents",),
grepLineContents,
tfLineContents,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ contents IDENTICAL: all 27387 items = : AbB-primary ◆ P509373 ◆ 31 ◆ 1. [a-na] _{d}suen_-i-[din-nam] = : AbB-primary ◆ P509373 ◆ 32 ◆ 2. qi2-bi2-[ma] = : AbB-primary ◆ P509373 ◆ 33 ◆ 3. um-ma _{d}en-lil2_-sza-du-u2-ni-ma = : AbB-primary ◆ P509373 ◆ 34 ◆ 4. _{d}utu_ u3 _{d}[marduk]_ a-na da-ri-a-[tim] = : AbB-primary ◆ P509373 ◆ 35 ◆ 5. li-ba-al-li-t,u2-u2-ka = : AbB-primary ◆ P509373 ◆ 36 ◆ 6. {disz}sze-ep-_{d}suen a2-gal2 [dumu] um-mi-a-mesz_ = : AbB-primary ◆ P509373 ◆ 37 ◆ 7. ki-a-am u2-lam-mi-da-an-ni um-[ma] szu-u2-[ma] = : AbB-primary ◆ P509373 ◆ 38 ◆ 8. {disz}sa-am-su-ba-ah-li sza-pi2-ir ma-[tim] = : AbB-primary ◆ P509373 ◆ 39 ◆ 9. 2(esze3) _a-sza3_ s,i-[bi]-it {disz}[ku]-un-zu-lum _sza3-gud_ = : AbB-primary ◆ P509373 ◆ 40 ◆ 10. _a-sza3 a-gar3_ na-ag-[ma-lum] _uru_ x x x{ki} = : AbB-primary ◆ P509373 ◆ 41 ◆ 11. sza _{d}utu_-ha-zi-[ir] isz-tu _mu 7(disz) kam_ id-di-nu-szum = : AbB-primary ◆ P509373 ◆ 42 ◆ 12. u3 i-na _uru_ x-szum{ki} sza-ak-nu id-di-a-am-ma = : AbB-primary ◆ P509373 ◆ 43 ◆ 13. 2(esze3) _a-sza3 szuku_ i-li-ib-bu s,i-bi-it _nagar-mesz_ = : AbB-primary ◆ P509373 ◆ 44 ◆ 14. _a-sza3 a-gar3 uru_ ra-bu-um x [...] = : AbB-primary ◆ P509373 ◆ 45 ◆ 15. x x x x x x [...] = : AbB-primary ◆ P509373 ◆ 46 ◆ $ rest broken = : AbB-primary ◆ P509373 ◆ 48 ◆ $ beginning broken = : AbB-primary ◆ P509373 ◆ 49 ◆ 1'. [x x] x x [...] = : AbB-primary ◆ P509373 ◆ 50 ◆ 2'. [x x] x [...] = : AbB-primary ◆ P509373 ◆ 51 ◆ 3'. [x x] x s,i-bi-it _gir3-se3#-ga#_ = and 27367 more Number of results: TF 27387; GREP 27387
True
T.text()

We apply the T.text() method to each line, using the default text format text-orig-full. We only compare lines containing transcribed material: numbered lines in the source.
The method walks over all signs on the line and represents each sign by means of the feature atf, plus some auxiliary features: atfpre and atfpost (for cluster characters preceding and following the sign reading) and after (for separator characters after the sign: -, :, /, a space, or the empty string).
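As a minimal sketch, assuming the format assembles exactly these features (the authoritative definition of text-orig-full lives in the TF dataset itself; compare the clusterAtf function further below), the contribution of a single sign would look like:

def signAtfSketch(s):
    # cluster openers + the sign's own ATF + cluster closers + separator
    return (
        (F.atfpre.v(s) or "")
        + (F.atf.v(s) or "")
        + (F.atfpost.v(s) or "")
        + (F.after.v(s) or "")
    )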
In rare cases a cluster starts or ends with a space or a hyphen, where the input should rather have been encoded with that space or hyphen just outside the cluster.
We work around those cases, and we check whether we have encountered all listed workarounds.
def tfLineText():
lines = []
for document in F.otype.s("document"):
documentName = F.pnumber.v(document)
srcfile = F.srcfile.v(document)
for line in L.d(document, otype="line"):
if F.lnc.v(line):
continue
face = F.face.v(L.u(line, otype="face")[0])
srcLnNum = F.srcLnNum.v(line)
primeLn = prime if F.primeln.v(line) else ""
ln = F.ln.v(line)
text = T.text(line)
lines.append(
(srcfile, documentName, face, srcLnNum, f"{ln}{primeLn}. {text}")
)
return lines
def methodB1(x):
return x.replace("_-", "-_")
def methodB2(x):
return x.replace("[-", "-[")
def methodE1(x):
return x.replace("-]", "]-")
workarounds = {
("P313391", "reverse", "5"): methodB1,
("P312032", "reverse", "12"): methodB2,
("P345563", "obverse", "4"): methodE1,
("P305773", "reverse", "1"): methodE1,
}
workaroundsApplied = set()
def initWorkarounds():
workaroundsApplied.clear()
def checkWorkarounds(document, face, ln, srcLn):
if (document, face, ln) in workarounds:
workaroundsApplied.add((document, face, ln))
method = workarounds[(document, face, ln)]
srcLn = method(srcLn)
print(f'workaround applied: "{srcLn}"')
return srcLn
def finishWorkarounds():
if set(workarounds) == workaroundsApplied:
print(f"ALL {len(workarounds)} WORKAROUNDS APPLIED")
else:
print("UNAPPLIED WORKAROUNDS:")
for (document, face, ln) in sorted(set(workarounds) - workaroundsApplied):
print(f"\t{document} {face}:{ln}")
def grepLineText(gen):
lines = []
initWorkarounds()
for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
match = transRe.match(srcLn)
if not match:
continue
ln = match.group(1)
srcLn = trimRe.sub(" ", srcLn)
srcLn = checkWorkarounds(document, face, ln, srcLn)
lines.append((srcfile, document, face, srcLnNum, srcLn))
finishWorkarounds()
return lines
COMP.checkSanity(
("contents",),
grepLineText,
tfLineText,
)
workaround applied: "5. 1(disz) _lu2 TUR+DISZ_ szu-nu-ma-_dingir_" workaround applied: "4. ta-asz-pu-ri um-ma at-ti-ma asz-[szum a-di i]-na#-an-na" workaround applied: "1. [1/2(disz) _ma]-na# ku3-babbar_ a-nu-um#-[ma-am]" workaround applied: "12. _iti gu4-si#-[sa2_ ...]" ALL 4 WORKAROUNDS APPLIED HEAD : srcfile ◆ tablet ◆ ln ◆ contents IDENTICAL: all 26406 items = : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ 1. [a-na] _{d}suen_-i-[din-nam] = : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ 2. qi2-bi2-[ma] = : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ 3. um-ma _{d}en-lil2_-sza-du-u2-ni-ma = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ 4. _{d}utu_ u3 _{d}[marduk]_ a-na da-ri-a-[tim] = : AbB-primary ◆ P509373 ◆ obverse ◆ 35 ◆ 5. li-ba-al-li-t,u2-u2-ka = : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ 6. {disz}sze-ep-_{d}suen a2-gal2 [dumu] um-mi-a-mesz_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 37 ◆ 7. ki-a-am u2-lam-mi-da-an-ni um-[ma] szu-u2-[ma] = : AbB-primary ◆ P509373 ◆ obverse ◆ 38 ◆ 8. {disz}sa-am-su-ba-ah-li sza-pi2-ir ma-[tim] = : AbB-primary ◆ P509373 ◆ obverse ◆ 39 ◆ 9. 2(esze3) _a-sza3_ s,i-[bi]-it {disz}[ku]-un-zu-lum _sza3-gud_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 40 ◆ 10. _a-sza3 a-gar3_ na-ag-[ma-lum] _uru_ x x x{ki} = : AbB-primary ◆ P509373 ◆ obverse ◆ 41 ◆ 11. sza _{d}utu_-ha-zi-[ir] isz-tu _mu 7(disz) kam_ id-di-nu-szum = : AbB-primary ◆ P509373 ◆ obverse ◆ 42 ◆ 12. u3 i-na _uru_ x-szum{ki} sza-ak-nu id-di-a-am-ma = : AbB-primary ◆ P509373 ◆ obverse ◆ 43 ◆ 13. 2(esze3) _a-sza3 szuku_ i-li-ib-bu s,i-bi-it _nagar-mesz_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 44 ◆ 14. _a-sza3 a-gar3 uru_ ra-bu-um x [...] = : AbB-primary ◆ P509373 ◆ obverse ◆ 45 ◆ 15. x x x x x x [...] = : AbB-primary ◆ P509373 ◆ reverse ◆ 49 ◆ 1'. [x x] x x [...] = : AbB-primary ◆ P509373 ◆ reverse ◆ 50 ◆ 2'. [x x] x [...] = : AbB-primary ◆ P509373 ◆ reverse ◆ 51 ◆ 3'. [x x] x s,i-bi-it _gir3-se3#-ga#_ = : AbB-primary ◆ P509373 ◆ reverse ◆ 52 ◆ 4'. [x x] x x x-ir ub-lam = : AbB-primary ◆ P509373 ◆ reverse ◆ 53 ◆ 5'. in-na-me-er-ma = and 26386 more Number of results: TF 26406; GREP 26406
True
Clusters are groupings of signs. The transcription uses a variety of brackets for several kinds of clustering. Clusters may be nested, and clusters of different types need not be properly nested: in the corpus form _{d}[marduk]_ a det and a missing cluster sit inside a langalt cluster, while in forms such as _[a-sza3_ a missing cluster starts inside a langalt cluster and closes outside it.
Usually, clusters do not start with an inter-word space or an inter-sign hyphen. But if they do, we work around them by pushing the offending space or hyphen out of the cluster.
See Workarounds above.
See the ORACC ATF docs.
Most clusters are trivial: [...].
We count how many clusters we have of each type.
for (typ, amount) in F.type.freqList("cluster"):
print(f"{typ:<15} {amount:>5} x")
langalt 7600 x missing 7572 x det 6794 x uncertain 1183 x supplied 231 x excised 69 x
We count how much material is in the alternate language (Sumerian) and how much in the main language (Akkadian).
lang = collections.Counter()
altLang = dict(
sux="akk",
akk="sux",
)
skipTypes = {"empty", "comment", "ellipsis", "unknown"}
for d in F.otype.s("document"):
docLang = F.lang.v(d)
for s in L.d(d, otype="sign"):
typ = F.type.v(s)
if typ in skipTypes:
continue
signLang = altLang[docLang] if F.langalt.v(s) else docLang
lang[signLang] += 1
for (ln, amount) in sorted(
lang.items(),
key=lambda x: (-x[1], x[0]),
):
print(f"{ln} {amount:>6} signs")
akk 173823 signs sux 19016 signs
Now we check for each cluster whether the ATF of its material as delivered by TF is equal to the material that we get "directly" by GREPping.
Note, however, that in order to GREP the clusters correctly, we have to do manipulations similar to those we performed when generating the TF.
Clusters are not directly greppable, because:

- ( ) occurs in non-cluster constructs like rrr!(YYY), rrrx(YYY), 333(rrr)
- _ _ uses the same character to open and to close a cluster
- < > and << >> share the characters < and >

So we proceed by escaping all cluster characters first to fresh characters that have none of these problems.
def tfClusters():
clusters = []
for ln in F.otype.s("line"):
lineClusters = []
for c in L.d(ln, "cluster"):
lineClusters.append((F.type.v(c), T.text(c)))
if lineClusters:
(document, face, line) = T.sectionFromNode(ln)
srcfile = F.srcfile.v(ln)
srcLnNum = F.srcLnNum.v(ln)
lineClusters = [
(srcfile, document, face, srcLnNum, typ, f'"{atf}"')
for (typ, atf) in sorted(lineClusters)
]
clusters.extend(lineClusters)
return clusters
inlineCommentRe = re.compile(r"""\s*\(\$.*?\$\)\s*""")
noClusterRe = re.compile(r"""([0-9nx!])\(([A-Za-z0-9,/'#!?*+|.]+)\)""")
bChars = f"""[{clusterBstr}]*"""
eChars = f"""[{clusterEstr}#?!+*]*[ -]*"""
def noClusterRepl(match):
return f"{match.group(1)}§§{match.group(2)}±±"
def noClusterRemove(text):
return text.replace("§§", "(").replace("±±", ")")
def makeClusterEscRepl(cab, cae):
def repl(match):
return f"{cab}{match.group(2)}{cae}"
return repl
clusterEscRe = {}
clusterFindRe = {}
clusterEscRepl = {}
for (cab, cae, cob, coe, ctp) in clusterChars:
if cob == coe:
clusterEscRe[cab] = re.compile(f"""({re.escape(cob)}(.*?){re.escape(coe)})""")
clusterEscRepl[cab] = makeClusterEscRepl(cab, cae)
clusterFindRe[cab] = re.compile(
f"""{bChars}{re.escape(cab)}.+?{re.escape(cae)}{eChars}"""
)
def clusterEsc(text):
text = noClusterRe.sub(noClusterRepl, text)
for (cab, cae, cob, coe, ctp) in clusterChars:
if cob == coe:
text = clusterEscRe[cab].sub(clusterEscRepl[cab], text)
else:
text = text.replace(cob, cab).replace(coe, cae)
return text
def clusterUnesc(text):
for (cab, cae, cob, coe, ctp) in clusterChars:
text = text.replace(cab, cob).replace(cae, coe)
text = noClusterRemove(text)
return text
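A quick round trip on a made-up line shaped like corpus material:

line = "2(esze3) _a-sza3_ s,i-[bi]-it"
esc = clusterEsc(line)
# esc == '2§§esze3±± ┌a-sza3┐ s,i-〖bi〗-it'
# the numeral parentheses are protected, and each real cluster is now
# delimited by single, unambiguous characters
assert clusterUnesc(esc) == line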
def grepClusters(gen):
clusters = []
initWorkarounds()
for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
match = transRe.match(srcLn)
if not match:
continue
ln = match.group(1)
srcLn = match.group(2)
srcLn = checkWorkarounds(document, face, ln, srcLn)
lineClusters = []
srcLn = inlineCommentRe.sub("", srcLn)
srcLn = trimRe.sub(" ", srcLn)
srcLn = clusterEsc(srcLn)
for (cab, cae, cob, coe, ctp) in clusterChars:
css = clusterFindRe[cab].findall(srcLn)
for cs in css:
lineClusters.append((ctp, clusterUnesc(cs)))
lineClusters = [
(srcfile, document, face, srcLnNum, c, f'"{cs}"')
for (c, cs) in sorted(lineClusters)
]
clusters.extend(lineClusters)
finishWorkarounds()
return clusters
COMP.checkSanity(
(
"type",
"cluster",
),
grepClusters,
tfClusters,
)
workaround applied: "1(disz) _lu2 TUR+DISZ_ szu-nu-ma-_dingir_" workaround applied: "ta-asz-pu-ri um-ma at-ti-ma asz-[szum a-di i]-na#-an-na" workaround applied: "[1/2(disz) _ma]-na# ku3-babbar_ a-nu-um#-[ma-am]" workaround applied: "_iti gu4-si#-[sa2_ ...]" ALL 4 WORKAROUNDS APPLIED HEAD : srcfile ◆ tablet ◆ ln ◆ type ◆ cluster IDENTICAL: all 23449 items = : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ det ◆ "_{d}" = : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ langalt ◆ "_{d}suen_-" = : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ missing ◆ "[a-na] " = : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ missing ◆ "[din-nam]" = : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ missing ◆ "[ma]" = : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ det ◆ "_{d}" = : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ langalt ◆ "_{d}en-lil2_-" = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ det ◆ "_{d}" = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ det ◆ "_{d}" = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ langalt ◆ "_{d}[marduk]_ " = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ langalt ◆ "_{d}utu_ " = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ missing ◆ "[marduk]_ " = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ missing ◆ "[tim]" = : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ det ◆ "_{d}" = : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ det ◆ "{disz}" = : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ langalt ◆ "_{d}suen a2-gal2 [dumu] um-mi-a-mesz_" = : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ missing ◆ "[dumu] " = : AbB-primary ◆ P509373 ◆ obverse ◆ 37 ◆ missing ◆ "[ma]" = : AbB-primary ◆ P509373 ◆ obverse ◆ 37 ◆ missing ◆ "[ma] " = : AbB-primary ◆ P509373 ◆ obverse ◆ 38 ◆ det ◆ "{disz}" = and 23429 more Number of results: TF 23449; GREP 23449
True
Every type of cluster corresponds to a sign feature of the same name that has value 1 for each sign that occurs in a cluster of that type.
Per cluster type, we check whether the list of signs inside a cluster corresponds with the signs that have the cluster type feature set to 1.
def clusterAtf(signs):
atf = ""
for (i, s) in enumerate(signs):
atf += F.atfpre.v(s) or ""
atf += F.atf.v(s)
atf += F.atfpost.v(s) or ""
atf += F.after.v(s) or ""
return atf
def checkClusterType(cType, cB, cE):
excluded = {"empty", "comment"}
def getCluster(sign):
if sign is None:
return None
clusters = L.u(sign, otype="cluster")
cTarget = [cluster for cluster in clusters if F.type.v(cluster) == cType]
return cTarget[0] if cTarget else None
def tfClustersType():
clusters = []
for ln in F.otype.s("line"):
if F.comment.v(ln):
continue
(document, face, line) = T.sectionFromNode(ln)
srcfile = F.srcfile.v(ln)
srcLnNum = F.srcLnNum.v(ln)
prevS = None
curCluster = []
for s in L.d(ln, otype="sign"):
sType = F.type.v(s)
if sType in excluded:
continue
isIn = Fs(cType).v(s)
thisC = getCluster(s)
prevC = getCluster(prevS)
if thisC != prevC:
if curCluster:
clusters.append(
(srcfile, document, face, srcLnNum, clusterAtf(curCluster))
)
curCluster = []
if isIn:
curCluster.append(s)
prevS = s
if curCluster:
clusters.append(
(srcfile, document, face, srcLnNum, clusterAtf(curCluster))
)
curCluster = []
return clusters
def grepClustersType(gen):
clusters = []
initWorkarounds()
for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
match = transRe.match(srcLn)
if not match:
continue
ln = match.group(1)
srcLn = match.group(2)
srcLn = checkWorkarounds(document, face, ln, srcLn)
srcLn = inlineCommentRe.sub("", srcLn)
srcLn = trimRe.sub(" ", srcLn)
srcLn = clusterEsc(srcLn)
(cab, cae, cob, coe) = clusterTypeInfo[cType]
css = clusterFindRe[cab].findall(srcLn)
for cs in css:
clusters.append((srcfile, document, face, srcLnNum, clusterUnesc(cs)))
finishWorkarounds()
return clusters
COMP.checkSanity(
("cluster",),
grepClustersType,
tfClustersType,
)
langalt _ _
¶checkClusterType("langalt", "_", "_")
workaround applied: "1(disz) _lu2 TUR+DISZ_ szu-nu-ma-_dingir_" workaround applied: "ta-asz-pu-ri um-ma at-ti-ma asz-[szum a-di i]-na#-an-na" workaround applied: "[1/2(disz) _ma]-na# ku3-babbar_ a-nu-um#-[ma-am]" workaround applied: "_iti gu4-si#-[sa2_ ...]" ALL 4 WORKAROUNDS APPLIED HEAD : srcfile ◆ tablet ◆ ln ◆ cluster IDENTICAL: all 7600 items = : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ _{d}suen_- = : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ _{d}en-lil2_- = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ _{d}utu_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ _{d}[marduk]_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ _{d}suen a2-gal2 [dumu] um-mi-a-mesz_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 39 ◆ _a-sza3_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 39 ◆ _sza3-gud_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 40 ◆ _a-sza3 a-gar3_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 40 ◆ _uru_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 41 ◆ _{d}utu_- = : AbB-primary ◆ P509373 ◆ obverse ◆ 41 ◆ _mu 7(disz) kam_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 42 ◆ _uru_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 43 ◆ _a-sza3 szuku_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 43 ◆ _nagar-mesz_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 44 ◆ _a-sza3 a-gar3 uru_ = : AbB-primary ◆ P509373 ◆ reverse ◆ 51 ◆ _gir3-se3#-ga#_ = : AbB-primary ◆ P509373 ◆ reverse ◆ 54 ◆ _[a-sza3_ = : AbB-primary ◆ P509373 ◆ reverse ◆ 55 ◆ _a-[sza3 a-gar3_ = : AbB-primary ◆ P509373 ◆ reverse ◆ 55 ◆ _uru gan2_ = : AbB-primary ◆ P509373 ◆ reverse ◆ 57 ◆ _a-sza3_ = and 7580 more Number of results: TF 7600; GREP 7600
missing [ ]
¶checkClusterType("missing", "[", "]")
workaround applied: "1(disz) _lu2 TUR+DISZ_ szu-nu-ma-_dingir_" workaround applied: "ta-asz-pu-ri um-ma at-ti-ma asz-[szum a-di i]-na#-an-na" workaround applied: "[1/2(disz) _ma]-na# ku3-babbar_ a-nu-um#-[ma-am]" workaround applied: "_iti gu4-si#-[sa2_ ...]" ALL 4 WORKAROUNDS APPLIED HEAD : srcfile ◆ tablet ◆ ln ◆ cluster IDENTICAL: all 7572 items = : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ [a-na] = : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ [din-nam] = : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ [ma] = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ [marduk]_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ [tim] = : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ [dumu] = : AbB-primary ◆ P509373 ◆ obverse ◆ 37 ◆ [ma] = : AbB-primary ◆ P509373 ◆ obverse ◆ 37 ◆ [ma] = : AbB-primary ◆ P509373 ◆ obverse ◆ 38 ◆ [tim] = : AbB-primary ◆ P509373 ◆ obverse ◆ 39 ◆ [bi]- = : AbB-primary ◆ P509373 ◆ obverse ◆ 39 ◆ [ku]- = : AbB-primary ◆ P509373 ◆ obverse ◆ 40 ◆ [ma-lum] = : AbB-primary ◆ P509373 ◆ obverse ◆ 41 ◆ [ir] = : AbB-primary ◆ P509373 ◆ obverse ◆ 44 ◆ [...] = : AbB-primary ◆ P509373 ◆ obverse ◆ 45 ◆ [...] = : AbB-primary ◆ P509373 ◆ reverse ◆ 49 ◆ [x x] = : AbB-primary ◆ P509373 ◆ reverse ◆ 49 ◆ [...] = : AbB-primary ◆ P509373 ◆ reverse ◆ 50 ◆ [x x] = : AbB-primary ◆ P509373 ◆ reverse ◆ 50 ◆ [...] = : AbB-primary ◆ P509373 ◆ reverse ◆ 51 ◆ [x x] = and 7552 more Number of results: TF 7572; GREP 7572
det { }
¶checkClusterType("det", "{", "}")
workaround applied: "1(disz) _lu2 TUR+DISZ_ szu-nu-ma-_dingir_" workaround applied: "ta-asz-pu-ri um-ma at-ti-ma asz-[szum a-di i]-na#-an-na" workaround applied: "[1/2(disz) _ma]-na# ku3-babbar_ a-nu-um#-[ma-am]" workaround applied: "_iti gu4-si#-[sa2_ ...]" ALL 4 WORKAROUNDS APPLIED HEAD : srcfile ◆ tablet ◆ ln ◆ cluster IDENTICAL: all 6794 items = : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ _{d} = : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ _{d} = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ _{d} = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ _{d} = : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ {disz} = : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ _{d} = : AbB-primary ◆ P509373 ◆ obverse ◆ 38 ◆ {disz} = : AbB-primary ◆ P509373 ◆ obverse ◆ 39 ◆ {disz} = : AbB-primary ◆ P509373 ◆ obverse ◆ 40 ◆ {ki} = : AbB-primary ◆ P509373 ◆ obverse ◆ 41 ◆ _{d} = : AbB-primary ◆ P509373 ◆ obverse ◆ 42 ◆ {ki} = : AbB-primary ◆ P509373 ◆ reverse ◆ 55 ◆ {ki} = : AbB-primary ◆ P509373 ◆ reverse ◆ 60 ◆ _{d} = : AbB-primary ◆ P509374 ◆ obverse ◆ 103 ◆ _{d} = : AbB-primary ◆ P509374 ◆ obverse ◆ 103 ◆ _{d} = : AbB-primary ◆ P509376 ◆ obverse ◆ 206 ◆ {disz} = : AbB-primary ◆ P509376 ◆ obverse ◆ 206 ◆ {d} = : AbB-primary ◆ P509376 ◆ obverse ◆ 206 ◆ {ki} = : AbB-primary ◆ P509376 ◆ obverse ◆ 208 ◆ {d} = : AbB-primary ◆ P509376 ◆ reverse ◆ 220 ◆ {disz} = and 6774 more Number of results: TF 6794; GREP 6794
uncertain ( )
¶checkClusterType("uncertain", "(", ")")
workaround applied: "1(disz) _lu2 TUR+DISZ_ szu-nu-ma-_dingir_" workaround applied: "ta-asz-pu-ri um-ma at-ti-ma asz-[szum a-di i]-na#-an-na" workaround applied: "[1/2(disz) _ma]-na# ku3-babbar_ a-nu-um#-[ma-am]" workaround applied: "_iti gu4-si#-[sa2_ ...]" ALL 4 WORKAROUNDS APPLIED HEAD : srcfile ◆ tablet ◆ ln ◆ cluster IDENTICAL: all 1183 items = : AbB-primary ◆ P481192 ◆ obverse ◆ 460 ◆ (x)] = : AbB-primary ◆ P481192 ◆ reverse ◆ 466 ◆ [(x)] = : AbB-primary ◆ P481192 ◆ reverse ◆ 469 ◆ (x)] = : AbB-primary ◆ P481192 ◆ reverse ◆ 477 ◆ (x)] = : AbB-primary ◆ P481192 ◆ reverse ◆ 477 ◆ [(x) = : AbB-primary ◆ P481192 ◆ reverse ◆ 477 ◆ (x)] = : AbB-primary ◆ P510529 ◆ reverse ◆ 7772 ◆ [(x)] = : AbB-primary ◆ P510530 ◆ obverse ◆ 7821 ◆ [(x)] = : AbB-primary ◆ P510530 ◆ reverse ◆ 7845 ◆ (x)] = : AbB-primary ◆ P510531 ◆ obverse ◆ 7896 ◆ (x)] = : AbB-primary ◆ P510531 ◆ reverse ◆ 7898 ◆ (x)]- = : AbB-primary ◆ P510531 ◆ reverse ◆ 7901 ◆ (x)] = : AbB-primary ◆ P510531 ◆ reverse ◆ 7902 ◆ (x)] = : AbB-primary ◆ P510534 ◆ obverse ◆ 8046 ◆ [(x) = : AbB-primary ◆ P510534 ◆ obverse ◆ 8046 ◆ (x)] = : AbB-primary ◆ P510534 ◆ reverse ◆ 8055 ◆ [(x)] = : AbB-primary ◆ P510534 ◆ reverse ◆ 8067 ◆ (x)]- = : AbB-primary ◆ P510536 ◆ obverse ◆ 8165 ◆ [(x)] = : AbB-primary ◆ P510537 ◆ obverse ◆ 8216 ◆ (x)] = : AbB-primary ◆ P510537 ◆ obverse ◆ 8216 ◆ (x) = and 1163 more Number of results: TF 1183; GREP 1183
supplied < >
¶checkClusterType("supplied", "<", ">")
workaround applied: "1(disz) _lu2 TUR+DISZ_ szu-nu-ma-_dingir_" workaround applied: "ta-asz-pu-ri um-ma at-ti-ma asz-[szum a-di i]-na#-an-na" workaround applied: "[1/2(disz) _ma]-na# ku3-babbar_ a-nu-um#-[ma-am]" workaround applied: "_iti gu4-si#-[sa2_ ...]" ALL 4 WORKAROUNDS APPLIED HEAD : srcfile ◆ tablet ◆ ln ◆ cluster IDENTICAL: all 231 items = : AbB-primary ◆ P389958 ◆ reverse ◆ 523 ◆ <ru>- = : AbB-primary ◆ P510526 ◆ obverse ◆ 7604 ◆ <li-ki-il> = : AbB-primary ◆ P510551 ◆ obverse ◆ 8942 ◆ <ti>- = : AbB-primary ◆ P510552 ◆ obverse ◆ 8992 ◆ <li>- = : AbB-primary ◆ P510552 ◆ obverse ◆ 8993 ◆ <ma> = : AbB-primary ◆ P510559 ◆ obverse ◆ 9402 ◆ <li>- = : AbB-primary ◆ P510561 ◆ obverse ◆ 9503 ◆ <ra>- = : AbB-primary ◆ P510562 ◆ obverse ◆ 9548 ◆ <ma?>- = : AbB-primary ◆ P510571 ◆ reverse ◆ 10054 ◆ <ut>- = : AbB-primary ◆ P510577 ◆ obverse ◆ 10396 ◆ <wi>- = : AbB-primary ◆ P510583 ◆ obverse ◆ 10748 ◆ <isz> = : AbB-primary ◆ P510588 ◆ obverse ◆ 11067 ◆ <wi>- = : AbB-primary ◆ P510591 ◆ reverse ◆ 11292 ◆ <ta>- = : AbB-primary ◆ P510592 ◆ reverse ◆ 11373 ◆ <ti> = : AbB-primary ◆ P510599 ◆ obverse ◆ 11750 ◆ <li>- = : AbB-primary ◆ P510606 ◆ obverse ◆ 12137 ◆ <t,u2>- = : AbB-primary ◆ P510613 ◆ obverse ◆ 12534 ◆ <ma> = : AbB-primary ◆ P510616 ◆ obverse ◆ 12719 ◆ <ta> = : AbB-primary ◆ P510616 ◆ reverse ◆ 12743 ◆ <lu> = : AbB-primary ◆ P510617 ◆ reverse ◆ 12799 ◆ <li>- = and 211 more Number of results: TF 231; GREP 231
excised << >>
¶checkClusterType("excised", "<<", ">>")
workaround applied: "1(disz) _lu2 TUR+DISZ_ szu-nu-ma-_dingir_" workaround applied: "ta-asz-pu-ri um-ma at-ti-ma asz-[szum a-di i]-na#-an-na" workaround applied: "[1/2(disz) _ma]-na# ku3-babbar_ a-nu-um#-[ma-am]" workaround applied: "_iti gu4-si#-[sa2_ ...]" ALL 4 WORKAROUNDS APPLIED HEAD : srcfile ◆ tablet ◆ ln ◆ cluster IDENTICAL: all 69 items = : AbB-primary ◆ P510530 ◆ reverse ◆ 7835 ◆ <<TE>>- = : AbB-primary ◆ P510543 ◆ obverse ◆ 8537 ◆ <<li>>- = : AbB-primary ◆ P510562 ◆ reverse ◆ 9563 ◆ <<KI>> = : AbB-primary ◆ P510573 ◆ reverse ◆ 10149 ◆ <<an-na>> = : AbB-primary ◆ P510576 ◆ reverse ◆ 10329 ◆ <<x>> = : AbB-primary ◆ P510621 ◆ reverse ◆ 13006 ◆ <<ti>> = : AbB-primary ◆ P497370 ◆ obverse ◆ 13101 ◆ <<mar>>- = : AbB-primary ◆ P510634 ◆ obverse ◆ 13743 ◆ <<ma>> = : AbB-primary ◆ P510660 ◆ reverse ◆ 15093 ◆ <<i-na>> = : AbB-primary ◆ P510661 ◆ reverse ◆ 15147 ◆ <<kam iti>> = : AbB-primary ◆ P510686 ◆ obverse ◆ 16380 ◆ <<qa2-be2-e>> = : AbB-primary ◆ P510686 ◆ obverse ◆ 16383 ◆ <<bi>>- = : AbB-primary ◆ P510688 ◆ obverse ◆ 16513 ◆ <<i>> = : AbB-primary ◆ P510725 ◆ obverse ◆ 18485 ◆ <<um>> = : AbB-primary ◆ P510775 ◆ obverse ◆ 21373 ◆ <<ti>> = : AbB-primary ◆ P510798 ◆ obverse ◆ 22630 ◆ <<gur>> = : AbB-primary ◆ P510807 ◆ obverse ◆ 23150 ◆ <<ID>>- = : AbB-primary ◆ P510821 ◆ obverse ◆ 23894 ◆ <<u2-ul>> = : AbB-primary ◆ P510861 ◆ reverse ◆ 26058 ◆ <<u2-sza#>> = : AbB-primary ◆ P413589 ◆ reverse ◆ 27013 ◆ <<sza-li-im>> = and 49 more Number of results: TF 69; GREP 69
Here is an overview of the occurrence of primes.
There are primes within sign readings, where they denote a numerical property.
Primes on column and line numbers denote that the given number deviates from the physical number because of damage.
N.B.: This gathers primes on signs, column numbers and case numbers.
First a bit of exploration.
primeFt = ("primecol", "primeln")
for ft in primeFt:
for (value, frequency) in Fs(ft).freqList():
print(f"{ft:<8}: {frequency:>5} x {value}")
primecol: 4 x 1 primeln : 1825 x 1
We also want to see the node types of primed entities.
for ft in primeFt:
primed = collections.Counter()
for n in Fs(ft).s(1):
primed[F.otype.v(n)] += 1
for x in sorted(primed.items()):
print(f"{ft:<8}: {x[1]:>5} x {x[0]}")
primecol: 4 x line primeln : 1825 x line
Now let us check the primes with grep, directly in the source files.
nonSignStuff = r"""()\[\]{}<>|.#!?+*"""
nonSignRe = re.compile(f"""[{nonSignStuff}]+""")
def tfPrimes():
primes = []
for ln in F.otype.s("line"):
(document, face, line) = T.sectionFromNode(ln)
srcfile = F.srcfile.v(ln)
srcln = F.srcLnNum.v(ln)
primeln = F.primeln.v(ln)
primecol = F.primecol.v(ln)
if primecol and (
not L.p(ln, otype="line") or F.col.v(ln) == F.col.v(L.p(ln, otype="line")[0])
):
primes.append(
(srcfile, document, face, srcln - 1, "column", f"{F.col.v(ln)}{prime}")
)
if primeln:
primes.append(
(srcfile, document, face, srcln, "line", f"{F.ln.v(ln)}{prime}.")
)
for s in L.d(ln, otype="sign"):
reading = F.reading.v(s)
if reading:
if prime in reading:
rep = nonSignRe.sub("", F.atf.v(s))
primes.append((srcfile, document, face, srcln, "sign", rep))
return primes
material = f"""A-Za-z0-9,'/{nonSignStuff}"""
materialP = f"{material}{prime}"
primeRe = re.compile(f"""[{material}]*{prime}[{materialP}]*""")
readingRe = re.compile(r"""!\([^)]+\)""")
def grepPrimes(gen):
primes = []
prevColumn = None
for (src, document, face, column, srcln, line) in gen:
if column and column != prevColumn:
if "'" in column:
primes.append((src, document, face, srcln, "column", column))
prevColumn = column
fields = line.split(maxsplit=1)
lineNum = fields[0]
if prime in lineNum:
primes.append((src, document, face, srcln, "line", lineNum))
if len(fields) != 2:
continue
if lineNum.startswith("$") or lineNum.startswith("#"):
continue
trans = fields[1]
if prime in trans:
trans = readingRe.sub("", trans)
hits = primeRe.findall(trans)
for hit in hits:
hit = nonSignRe.sub("", hit)
primes.append((src, document, face, srcln, "sign", hit))
return primes
COMP.checkSanity(
(
"kind",
"prime",
),
grepPrimes,
tfPrimes,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ kind ◆ prime IDENTICAL: all 1865 items = : AbB-primary ◆ P509373 ◆ reverse ◆ 49 ◆ line ◆ 1'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 50 ◆ line ◆ 2'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 51 ◆ line ◆ 3'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 52 ◆ line ◆ 4'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 53 ◆ line ◆ 5'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 54 ◆ line ◆ 6'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 55 ◆ line ◆ 7'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 56 ◆ line ◆ 8'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 57 ◆ line ◆ 9'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 58 ◆ line ◆ 10'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 59 ◆ line ◆ 11'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 60 ◆ line ◆ 12'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 61 ◆ line ◆ 13'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 62 ◆ line ◆ 14'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 63 ◆ line ◆ 15'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 64 ◆ line ◆ 16'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 65 ◆ line ◆ 17'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 66 ◆ line ◆ 18'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 67 ◆ line ◆ 19'. = : AbB-primary ◆ P481192 ◆ obverse ◆ 448 ◆ line ◆ 1'. = and 1845 more Number of results: TF 1865; GREP 1865
True
Words are space-separated parts of a transcription line (not counting inline comments).
Words have very few features, currently only one: atf.
def tfWords():
words = []
for w in F.otype.s("word"):
(document, face, line) = T.sectionFromNode(w)
ln = L.u(w, otype="line")[0]
d = T.documentNode(document)
srcfile = F.srcfile.v(d)
srcln = F.srcLnNum.v(ln)
atf = F.atf.v(w)
if atf:
words.append((srcfile, document, face, srcln, atf))
return words
commentLineRe = re.compile(r"""^\$\s*(.*)""")
commentInlineRe = re.compile(r"""\(\$ (.*?) \$\)""")
def grepWords(gen):
words = []
initWorkarounds()
for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
match = transRe.match(srcLn)
if not match:
continue
ln = match.group(1)
srcLn = match.group(2)
srcLn = checkWorkarounds(document, face, ln, srcLn)
srcLn = commentInlineRe.sub("", srcLn)
for w in srcLn.split():
words.append((srcfile, document, face, srcLnNum, w))
finishWorkarounds()
return words
COMP.checkSanity(
("sign",),
grepWords,
tfWords,
)
workaround applied: "1(disz) _lu2 TUR+DISZ_ szu-nu-ma-_dingir_" workaround applied: "ta-asz-pu-ri um-ma at-ti-ma asz-[szum a-di i]-na#-an-na" workaround applied: "[1/2(disz) _ma]-na# ku3-babbar_ a-nu-um#-[ma-am]" workaround applied: "_iti gu4-si#-[sa2_ ...]" ALL 4 WORKAROUNDS APPLIED HEAD : srcfile ◆ tablet ◆ ln ◆ sign IDENTICAL: all 76503 items = : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ [a-na] = : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ _{d}suen_-i-[din-nam] = : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ qi2-bi2-[ma] = : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ um-ma = : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ _{d}en-lil2_-sza-du-u2-ni-ma = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ _{d}utu_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ u3 = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ _{d}[marduk]_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ a-na = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ da-ri-a-[tim] = : AbB-primary ◆ P509373 ◆ obverse ◆ 35 ◆ li-ba-al-li-t,u2-u2-ka = : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ {disz}sze-ep-_{d}suen = : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ a2-gal2 = : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ [dumu] = : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ um-mi-a-mesz_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 37 ◆ ki-a-am = : AbB-primary ◆ P509373 ◆ obverse ◆ 37 ◆ u2-lam-mi-da-an-ni = : AbB-primary ◆ P509373 ◆ obverse ◆ 37 ◆ um-[ma] = : AbB-primary ◆ P509373 ◆ obverse ◆ 37 ◆ szu-u2-[ma] = : AbB-primary ◆ P509373 ◆ obverse ◆ 38 ◆ {disz}sa-am-su-ba-ah-li = and 76483 more Number of results: TF 76503; GREP 76503
True
flagMap = {
"#": "damage",
"?": "question",
"!": "remarkable",
"*": "collated",
}
flagChars = list(flagMap.keys())
flagFeatures = list(flagMap.values())
flagNodeOverview = collections.Counter()
flagNodeTypes = set()
for n in N():
for ft in flagFeatures:
value = Fs(ft).v(n)
if not value:
continue
nType = F.otype.v(n)
flagNodeTypes.add(nType)
flagNodeOverview[f"{nType}-{ft}-{value}"] += 1
for (combi, amount) in sorted(flagNodeOverview.items(), key=lambda x: (-x[1], x[0])):
print(f"{amount:>6} x {combi}")
9974 x sign-damage-1 560 x sign-question-1 99 x sign-remarkable-1 13 x sign-collated-1
Let us see whether there are any cooccurrences of flags.
flagCombis = collections.Counter()
for n in N():
if F.otype.v(n) not in flagNodeTypes:
continue
values = []
for ft in flagFeatures:
rawValue = Fs(ft).v(n)
value = (
f'{"*":^10}'
if rawValue is None
else f"{ft:^10}"
if rawValue
else f'{"":^10}'
)
values.append(value)
combi = "-".join(values)
flagCombis[combi] += 1
for (combi, amount) in sorted(flagCombis.items(), key=lambda x: (-x[1], x[0])):
print(f"{amount:>6} x {combi}")
192721 x * - * - * - * 9830 x damage - * - * - * 421 x * - question - * - * 138 x damage - question - * - * 91 x * - * -remarkable- * 9 x * - * - * - collated 5 x damage - * -remarkable- * 2 x * - * -remarkable- collated 1 x * - question -remarkable- collated 1 x damage - * - * - collated
We need to address the question of the order of flags.
A quick inspection of the corpus yields:

- damage-question (#?) is frequent, question-damage (?#) is rare
- damage-remarkable (#!) in all cases
- remarkable-collated (!*) in all cases
- damage-collated (#*) in all cases
- question-remarkable (?!) in all cases

Based on this observation, and assuming that the order between damage and question is not relevant, we produce flags always in the order: damage (#), question (?), remarkable (!), collated (*).
When grepping, we have to normalize ?# to #?.
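A minimal sketch of such a normalization (illustrative only; the check below actually compares flag sets, which tolerates order differences):

FLAG_ORDER = "#?!*"  # damage, question, remarkable, collated

def normalizeFlags(fl):
    # keep each known flag character once, in the canonical order
    return "".join(c for c in FLAG_ORDER if c in fl)

normalizeFlags("?#")  # -> '#?'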
def tfFlags():
discrepancies = collections.Counter()
flags = []
for n in F.otype.s("sign"):
values = [Fs(ft).v(n) for ft in flagFeatures]
if all(value is None for value in values):
continue
fl = ""
for (i, val) in enumerate(values):
if val:
fl += flagChars[i]
checkFl = F.flags.v(n) or ""
if checkFl != fl:
msg = "OK" if set(fl) == set(checkFl) else "PROBLEM"
discrepancies[f"{fl} vs {checkFl} ({msg})"] += 1
(document, face, line) = T.sectionFromNode(n)
ln = L.u(n, otype="line")[0]
d = T.documentNode(document)
srcfile = F.srcfile.v(d)
srcln = F.srcLnNum.v(ln)
opx = F.operator.v(n) == "x"
num = F.type.v(n) == "numeral"
reading = (
F.grapheme.v(n) or F.reading.v(n)
if opx
else F.reading.v(n) or F.grapheme.v(n)
)
br = ")" if num or opx else ""
flags.append((srcfile, document, face, srcln, f"{reading}{br}{checkFl}"))
if not discrepancies:
print("NO DISCREPANCIES")
else:
for (d, amount) in sorted(
discrepancies.items(),
key=lambda x: (-x[1], x[0]),
):
print(f"{d:<4} {amount:>3} x")
return flags
flagsRe = re.compile(r"""[A-Za-z0-9,'.]+\)?[#*!?]+""")
def grepFlags(gen):
flags = []
for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
match = transRe.match(srcLn)
if not match:
continue
srcLn = match.group(2)
srcLn = trimRe.sub(" ", srcLn)
srcLn = srcLn.replace("!(", "§§")
        fls = flagsRe.findall(srcLn)
        for f in fls:
            flags.append((srcfile, document, face, srcLnNum, f.replace("§§", "!(")))
return flags
COMP.checkSanity(
("sign",),
grepFlags,
tfFlags,
)
#? vs ?# (OK) 7 x HEAD : srcfile ◆ tablet ◆ ln ◆ sign IDENTICAL: all 10498 items = : AbB-primary ◆ P509373 ◆ reverse ◆ 51 ◆ se3# = : AbB-primary ◆ P509373 ◆ reverse ◆ 51 ◆ ga# = : AbB-primary ◆ P509375 ◆ obverse ◆ 151 ◆ na# = : AbB-primary ◆ P509375 ◆ obverse ◆ 152 ◆ bi2# = : AbB-primary ◆ P509375 ◆ reverse ◆ 166 ◆ il# = : AbB-primary ◆ P509376 ◆ obverse ◆ 206 ◆ am# = : AbB-primary ◆ P509377 ◆ obverse ◆ 257 ◆ ia# = : AbB-primary ◆ P509377 ◆ obverse ◆ 258 ◆ ma# = : AbB-primary ◆ P509377 ◆ obverse ◆ 260 ◆ ak# = : AbB-primary ◆ P509377 ◆ obverse ◆ 260 ◆ kum# = : AbB-primary ◆ P509377 ◆ reverse ◆ 269 ◆ ak# = : AbB-primary ◆ P509377 ◆ reverse ◆ 271 ◆ ta# = : AbB-primary ◆ P509377 ◆ reverse ◆ 272 ◆ na# = : AbB-primary ◆ P509377 ◆ reverse ◆ 279 ◆ mu# = : AbB-primary ◆ P481190 ◆ obverse ◆ 355 ◆ nu# = : AbB-primary ◆ P481190 ◆ obverse ◆ 355 ◆ ur2# = : AbB-primary ◆ P481190 ◆ obverse ◆ 357 ◆ din# = : AbB-primary ◆ P481190 ◆ obverse ◆ 357 ◆ nam# = : AbB-primary ◆ P481191 ◆ reverse ◆ 410 ◆ szu# = : AbB-primary ◆ P481192 ◆ obverse ◆ 450 ◆ ka# = and 10478 more Number of results: TF 10498; GREP 10498
True
We have arrived at the level of signs.
We will compare them, and all the structure we see in and around them, such as readings, graphemes, numerals, operators and flags.
First, though, we take a glance at what happens between the signs.
There might be material between a sign and the next one (if any).
The most usual separators are the -, which separates signs within a word, and the space, which separates words.
Here is the complete overview.
for (c, amount) in F.after.freqList():
    print(f"{c} {amount:>6} x")
- 118903 x
  100198 x
/     15 x
.      5 x
+      2 x
:      1 x
Now an overview of the types of signs.
signTypes = collections.Counter()
for s in F.otype.s("sign"):
    signTypes[F.type.v(s)] += 1

for (t, amount) in sorted(
    signTypes.items(),
    key=lambda x: (-x[1], x[0]),
):
    print(f"{t:<15} {amount:>6} x")
reading         188292 x
unknown           8761 x
numeral           2184 x
ellipsis          1617 x
grapheme          1272 x
commentline        969 x
complex            122 x
comment              2 x
We check these types individually, from the least frequent to the most frequent.
These are inline comments of the form ($ text $).
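As a quick illustration on a made-up line, here is what that ($ ... $) pattern matches:

demoCommentRe = re.compile(r"\(\$.*?\$\)")
print(demoCommentRe.findall("a-na ($ blank space $) be-li2-ia"))  # ['($ blank space $)']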
def tfSignsComment():
    signs = []
    for s in F.otype.s("sign"):
        typ = F.type.v(s)
        if typ != "comment":
            continue
        (document, face, line) = T.sectionFromNode(s)
        ln = L.u(s, otype="line")[0]
        d = T.documentNode(document)
        srcfile = F.srcfile.v(d)
        srcln = F.srcLnNum.v(ln)
        comment = F.comment.v(s)
        signs.append((srcfile, document, face, srcln, comment))
    return signs

def grepSignsComment(gen):
    signs = []
    for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
        match = transRe.match(srcLn)
        if not match:
            continue
        srcLn = match.group(2)
        cms = commentInlineRe.findall(srcLn)
        for c in cms:
            signs.append((srcfile, document, face, srcLnNum, c))
    return signs

COMP.checkSanity(
    ("sign",),
    grepSignsComment,
    tfSignsComment,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ sign
IDENTICAL: all 2 items
= : AbB-secondary ◆ P275088 ◆ reverse ◆ 23687 ◆ blank space
= : AbB-secondary ◆ P275104 ◆ reverse ◆ 24524 ◆ blank space
= no more items
Number of results: TF 2; GREP 2
True
We check whether all complex signs have come through exactly right.
These are the signs of the form x(ZZZ) and !(ZZZ).
The characters x and ! are called the operators in these complexes.
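For illustration, a hypothetical, simplified pattern (not the complexRe used below) can pick such a complex apart into reading, operator, and grapheme:

demoComplexRe = re.compile(r"([a-z][a-z0-9]*)([!x])\(([^)]+)\)")
m = demoComplexRe.match("ku!(LU)")
print(m.group(1), m.group(2), m.group(3))  # ku ! LU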
Here is the distribution of operators.
for (c, amount) in F.operator.freqList():
    print(f"{c} {amount:>6} x")
!    117 x
x      5 x
We do two checks: an easy check involving the atf feature of a sign, and a more involved check using the operator, reading, and grapheme features of a sign.
def tfComplexes():
    complexes = []
    for s in F.otype.s("sign"):
        if F.type.v(s) != "complex":
            continue
        (document, face, line) = T.sectionFromNode(s)
        ln = L.u(s, otype="line")[0]
        d = T.documentNode(document)
        srcfile = F.srcfile.v(d)
        srcln = F.srcLnNum.v(ln)
        atf = F.atf.v(s)
        complexes.append((srcfile, document, face, srcln, atf))
    return complexes

complexRe = re.compile("""[a-z][a-z,0-9']*[#!?*]*""" r"[!x]\([^)]+\)[#!?*]*")

def grepComplexes(gen):
    complexes = []
    for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
        match = transRe.match(srcLn)
        if not match:
            continue
        srcLn = match.group(2)
        srcLn = commentInlineRe.sub("", srcLn)
        cls = complexRe.findall(srcLn)
        for c in cls:
            complexes.append((srcfile, document, face, srcLnNum, c))
    return complexes

COMP.checkSanity(
    ("complex",),
    grepComplexes,
    tfComplexes,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ complex
IDENTICAL: all 122 items
= : AbB-primary ◆ P510533 ◆ obverse ◆ 7999 ◆ ku!(LU)
= : AbB-primary ◆ P510560 ◆ reverse ◆ 9461 ◆ im!(NIM)
= : AbB-primary ◆ P510560 ◆ reverse ◆ 9462 ◆ tam!(TUM)
= : AbB-primary ◆ P510562 ◆ obverse ◆ 9543 ◆ tum!(TIM)
= : AbB-primary ◆ P510562 ◆ reverse ◆ 9559 ◆ szi!(SZU)
= : AbB-primary ◆ P510564 ◆ reverse ◆ 9677 ◆ bu!(BI)
= : AbB-primary ◆ P510566 ◆ obverse ◆ 9762 ◆ tim!(IM)
= : AbB-primary ◆ P510566 ◆ reverse ◆ 9777 ◆ lam!(IB)
= : AbB-primary ◆ P510569 ◆ obverse ◆ 9920 ◆ ka!(KI)
= : AbB-primary ◆ P510572 ◆ obverse ◆ 10092 ◆ ba!(SZA)
= : AbB-primary ◆ P510578 ◆ obverse ◆ 10463 ◆ tum!(TAM)
= : AbB-primary ◆ P510583 ◆ obverse ◆ 10750 ◆ tim!(TUM)
= : AbB-primary ◆ P510588 ◆ obverse ◆ 11075 ◆ na!(HU)
= : AbB-primary ◆ P510616 ◆ obverse ◆ 12724 ◆ nam!(LAM)
= : AbB-primary ◆ P510616 ◆ obverse ◆ 12725 ◆ nam!(LAM)
= : AbB-primary ◆ P510616 ◆ reverse ◆ 12743 ◆ u2!(NA)
= : AbB-primary ◆ P510623 ◆ reverse ◆ 13178 ◆ ze2!(SZE)
= : AbB-primary ◆ P510626 ◆ obverse ◆ 13333 ◆ mi!(UL)
= : AbB-primary ◆ P510635 ◆ reverse ◆ 13797 ◆ ir!(AR)
= : AbB-primary ◆ P510635 ◆ reverse ◆ 13798 ◆ zimbir!(|UD.KIB.NU|)
= and 102 more
Number of results: TF 122; GREP 122
True
def tfComplexes2():
    complexes = []
    for s in F.otype.s("sign"):
        if F.type.v(s) != "complex":
            continue
        (document, face, line) = T.sectionFromNode(s)
        ln = L.u(s, otype="line")[0]
        d = T.documentNode(document)
        srcfile = F.srcfile.v(d)
        srcln = F.srcLnNum.v(ln)
        fl = F.flags.v(s) or ""
        op = F.operator.v(s)
        atf = (
            f"{F.reading.v(s)}{fl}{op}({F.grapheme.v(s)})"
            if op == "!"
            else f"{F.reading.v(s)}{op}({F.grapheme.v(s)}){fl}"
        )
        complexes.append((srcfile, document, face, srcln, atf))
    return complexes

COMP.checkSanity(
    ("numeral",),
    grepComplexes,
    tfComplexes2,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ numeral
IDENTICAL: all 122 items
= : AbB-primary ◆ P510533 ◆ obverse ◆ 7999 ◆ ku!(LU)
= : AbB-primary ◆ P510560 ◆ reverse ◆ 9461 ◆ im!(NIM)
= : AbB-primary ◆ P510560 ◆ reverse ◆ 9462 ◆ tam!(TUM)
= : AbB-primary ◆ P510562 ◆ obverse ◆ 9543 ◆ tum!(TIM)
= : AbB-primary ◆ P510562 ◆ reverse ◆ 9559 ◆ szi!(SZU)
= : AbB-primary ◆ P510564 ◆ reverse ◆ 9677 ◆ bu!(BI)
= : AbB-primary ◆ P510566 ◆ obverse ◆ 9762 ◆ tim!(IM)
= : AbB-primary ◆ P510566 ◆ reverse ◆ 9777 ◆ lam!(IB)
= : AbB-primary ◆ P510569 ◆ obverse ◆ 9920 ◆ ka!(KI)
= : AbB-primary ◆ P510572 ◆ obverse ◆ 10092 ◆ ba!(SZA)
= : AbB-primary ◆ P510578 ◆ obverse ◆ 10463 ◆ tum!(TAM)
= : AbB-primary ◆ P510583 ◆ obverse ◆ 10750 ◆ tim!(TUM)
= : AbB-primary ◆ P510588 ◆ obverse ◆ 11075 ◆ na!(HU)
= : AbB-primary ◆ P510616 ◆ obverse ◆ 12724 ◆ nam!(LAM)
= : AbB-primary ◆ P510616 ◆ obverse ◆ 12725 ◆ nam!(LAM)
= : AbB-primary ◆ P510616 ◆ reverse ◆ 12743 ◆ u2!(NA)
= : AbB-primary ◆ P510623 ◆ reverse ◆ 13178 ◆ ze2!(SZE)
= : AbB-primary ◆ P510626 ◆ obverse ◆ 13333 ◆ mi!(UL)
= : AbB-primary ◆ P510635 ◆ reverse ◆ 13797 ◆ ir!(AR)
= : AbB-primary ◆ P510635 ◆ reverse ◆ 13798 ◆ zimbir!(|UD.KIB.NU|)
= and 102 more
Number of results: TF 122; GREP 122
True
Comment line signs are artificial signs introduced on comment lines.
Comment lines have no transcribed material, but they annotate the structure ($) or the line contents (#) of other lines.
In order to anchor these comments to the text sequence, we have made extra signs for these lines.
For each comment line there is one such sign, and it has type commentline.
The comments of these lines are stored in the comment feature on those signs.
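A minimal sketch of how such lines can be recognized (a hypothetical stand-in for the commentLineRe that the grep check below relies on):

demoCommentLineRe = re.compile(r"^([$#])\s*(.+)$")
m = demoCommentLineRe.match("$ rest broken")
print(m.group(1), m.group(2))  # $ rest broken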
def tfSignsEmpty():
    comments = []
    for s in F.otype.s("sign"):
        typ = F.type.v(s)
        if typ != "commentline":
            continue
        (document, face, line) = T.sectionFromNode(s)
        ln = L.u(s, otype="line")[0]
        d = T.documentNode(document)
        srcfile = F.srcfile.v(d)
        srcln = F.srcLnNum.v(ln)
        # take the first sign of the line and verify that it is the comment line sign
        s = L.d(ln, otype="sign")
        if not s or F.type.v(s[0]) != "commentline":
            continue
        comment = F.comment.v(s[0])
        ln = F.lnc.v(ln)
        comments.append((srcfile, document, face, srcln, ln, comment))
    return comments

def grepSignsEmpty(gen):
    comments = []
    for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
        match = commentLineRe.match(srcLn)
        if not match:
            continue
        cms = match.group(1)
        # the first character of the line is the kind of the comment: $ or #
        ln = srcLn[0]
        comments.append((srcfile, document, face, srcLnNum, ln, cms))
    return comments

COMP.checkSanity(
    ("kind", "comment"),
    grepSignsEmpty,
    tfSignsEmpty,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ kind ◆ comment
IDENTICAL: all 969 items
= : AbB-primary ◆ P509373 ◆ obverse ◆ 46 ◆ $ ◆ rest broken
= : AbB-primary ◆ P509373 ◆ reverse ◆ 48 ◆ $ ◆ beginning broken
= : AbB-primary ◆ P509375 ◆ obverse ◆ 154 ◆ $ ◆ rest missing
= : AbB-primary ◆ P507628 ◆ obverse ◆ 319 ◆ $ ◆ blank space
= : AbB-primary ◆ P507628 ◆ reverse ◆ 321 ◆ $ ◆ blank space
= : AbB-primary ◆ P481192 ◆ obverse ◆ 447 ◆ $ ◆ beginning broken
= : AbB-primary ◆ P481192 ◆ obverse ◆ 462 ◆ $ ◆ rest broken
= : AbB-primary ◆ P481192 ◆ reverse ◆ 464 ◆ $ ◆ beginning broken
= : AbB-primary ◆ P481192 ◆ reverse ◆ 480 ◆ $ ◆ rest broken
= : AbB-primary ◆ P389958 ◆ obverse ◆ 512 ◆ $ ◆ beginning broken
= : AbB-primary ◆ P389256 ◆ obverse ◆ 556 ◆ $ ◆ beginning broken
= : AbB-primary ◆ P389256 ◆ obverse ◆ 562 ◆ $ ◆ rest broken
= : AbB-primary ◆ P389256 ◆ reverse ◆ 564 ◆ $ ◆ beginning broken
= : AbB-primary ◆ P389256 ◆ reverse ◆ 568 ◆ $ ◆ rest broken
= : AbB-primary ◆ P510534 ◆ obverse ◆ 8049 ◆ $ ◆ rest broken
= : AbB-primary ◆ P510534 ◆ reverse ◆ 8051 ◆ $ ◆ beginning broken
= : AbB-primary ◆ P510536 ◆ reverse ◆ 8180 ◆ $ ◆ single ruling
= : AbB-primary ◆ P510537 ◆ reverse ◆ 8222 ◆ $ ◆ blank space
= : AbB-primary ◆ P510541 ◆ reverse ◆ 8441 ◆ $ ◆ rest broken
= : AbB-primary ◆ P510543 ◆ reverse ◆ 8554 ◆ $ ◆ single ruling
= and 949 more
Number of results: TF 969; GREP 969
True
These are signs that do not contain a reading (lower case name of a transcribed unit) but a grapheme (upper case name of a transcribed unit).
Complex signs that have a grapheme in their x(GGG) or !(GGG) parts are not included.
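A small self-contained illustration on a made-up line, using copies of the grapheme and exclusion patterns defined below:

demoGraphemeRe = re.compile(r"[A-WYZ][A-WYZ,0-9]*")
demoExcludeRe = re.compile(r"[x!]\([^)]+\)")
line = "ARAD ku!(LU) GAN2"
print(demoGraphemeRe.findall(demoExcludeRe.sub("", line)))  # ['ARAD', 'GAN2']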
def tfSignsGrapheme():
    signs = []
    for s in F.otype.s("sign"):
        typ = F.type.v(s)
        if typ != "grapheme":
            continue
        (document, face, line) = T.sectionFromNode(s)
        ln = L.u(s, otype="line")[0]
        d = T.documentNode(document)
        srcfile = F.srcfile.v(d)
        srcln = F.srcLnNum.v(ln)
        d = F.grapheme.v(s)
        signs.append((srcfile, document, face, srcln, d))
    return signs

graphemeRe = re.compile(r"""[A-WYZ][A-WYZ,0-9]*""")
excludeRe = re.compile(r"""[x!]\([^)]+\)""")

def grepSignsGrapheme(gen):
    signs = []
    for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
        match = transRe.match(srcLn)
        if not match:
            continue
        srcLn = match.group(2)
        srcLn = commentInlineRe.sub("", srcLn)
        srcLn = excludeRe.sub("", srcLn)
        data = graphemeRe.findall(srcLn)
        for d in data:
            signs.append((srcfile, document, face, srcLnNum, d))
    return signs

COMP.checkSanity(
    ("sign",),
    grepSignsGrapheme,
    tfSignsGrapheme,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ sign
IDENTICAL: all 1272 items
= : AbB-primary ◆ P481191 ◆ seal 1 ◆ 415 ◆ ARAD
= : AbB-primary ◆ P481192 ◆ obverse ◆ 455 ◆ AD
= : AbB-primary ◆ P481192 ◆ obverse ◆ 455 ◆ DA
= : AbB-primary ◆ P389958 ◆ obverse ◆ 518 ◆ DA
= : AbB-primary ◆ P510527 ◆ reverse ◆ 7673 ◆ SZESZ
= : AbB-primary ◆ P510530 ◆ reverse ◆ 7833 ◆ ARAD
= : AbB-primary ◆ P510530 ◆ reverse ◆ 7835 ◆ TE
= : AbB-primary ◆ P510530 ◆ reverse ◆ 7839 ◆ GAN2
= : AbB-primary ◆ P510530 ◆ reverse ◆ 7844 ◆ ARAD
= : AbB-primary ◆ P510530 ◆ reverse ◆ 7847 ◆ ARAD
= : AbB-primary ◆ P510530 ◆ reverse ◆ 7848 ◆ ARAD
= : AbB-primary ◆ P510534 ◆ reverse ◆ 8054 ◆ ARAD
= : AbB-primary ◆ P510536 ◆ obverse ◆ 8163 ◆ ARAD
= : AbB-primary ◆ P510536 ◆ obverse ◆ 8168 ◆ ARAD
= : AbB-primary ◆ P510537 ◆ obverse ◆ 8216 ◆ SU
= : AbB-primary ◆ P510537 ◆ obverse ◆ 8220 ◆ SU
= : AbB-primary ◆ P510541 ◆ obverse ◆ 8407 ◆ GAN2
= : AbB-primary ◆ P510541 ◆ obverse ◆ 8412 ◆ GAN2
= : AbB-primary ◆ P510541 ◆ obverse ◆ 8416 ◆ GAN2
= : AbB-primary ◆ P510541 ◆ reverse ◆ 8425 ◆ GAN2
= and 1252 more
Number of results: TF 1272; GREP 1272
True
These are signs that are represented as ... (three dots).
def tfSignsEllipsis():
    signs = []
    for s in F.otype.s("sign"):
        typ = F.type.v(s)
        if typ != "ellipsis":
            continue
        (document, face, line) = T.sectionFromNode(s)
        ln = L.u(s, otype="line")[0]
        d = T.documentNode(document)
        srcfile = F.srcfile.v(d)
        srcln = F.srcLnNum.v(ln)
        d = F.grapheme.v(s)
        signs.append((srcfile, document, face, srcln, d))
    return signs

ellipsisRe = re.compile(r"""\.\.\.""")

def grepSignsEllipsis(gen):
    signs = []
    for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
        match = transRe.match(srcLn)
        if not match:
            continue
        srcLn = match.group(2)
        srcLn = commentInlineRe.sub("", srcLn)
        data = ellipsisRe.findall(srcLn)
        for d in data:
            signs.append((srcfile, document, face, srcLnNum, d))
    return signs

COMP.checkSanity(
    ("sign",),
    grepSignsEllipsis,
    tfSignsEllipsis,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ sign
IDENTICAL: all 1617 items
= : AbB-primary ◆ P509373 ◆ obverse ◆ 44 ◆ ...
= : AbB-primary ◆ P509373 ◆ obverse ◆ 45 ◆ ...
= : AbB-primary ◆ P509373 ◆ reverse ◆ 49 ◆ ...
= : AbB-primary ◆ P509373 ◆ reverse ◆ 50 ◆ ...
= : AbB-primary ◆ P509373 ◆ reverse ◆ 63 ◆ ...
= : AbB-primary ◆ P509373 ◆ reverse ◆ 64 ◆ ...
= : AbB-primary ◆ P509373 ◆ reverse ◆ 65 ◆ ...
= : AbB-primary ◆ P509373 ◆ reverse ◆ 66 ◆ ...
= : AbB-primary ◆ P509373 ◆ reverse ◆ 67 ◆ ...
= : AbB-primary ◆ P509374 ◆ obverse ◆ 102 ◆ ...
= : AbB-primary ◆ P509374 ◆ obverse ◆ 105 ◆ ...
= : AbB-primary ◆ P509377 ◆ obverse ◆ 254 ◆ ...
= : AbB-primary ◆ P509377 ◆ obverse ◆ 255 ◆ ...
= : AbB-primary ◆ P509377 ◆ reverse ◆ 268 ◆ ...
= : AbB-primary ◆ P509377 ◆ reverse ◆ 270 ◆ ...
= : AbB-primary ◆ P509377 ◆ reverse ◆ 271 ◆ ...
= : AbB-primary ◆ P509377 ◆ reverse ◆ 272 ◆ ...
= : AbB-primary ◆ P509377 ◆ reverse ◆ 278 ◆ ...
= : AbB-primary ◆ P509377 ◆ reverse ◆ 280 ◆ ...
= : AbB-primary ◆ P481191 ◆ seal 1 ◆ 413 ◆ ...
= and 1597 more
Number of results: TF 1617; GREP 1617
True
We check whether all numerals have come through exactly right.
We do two checks: an easy check involving the atf feature of a sign, and a more involved check using the repeat, fraction, and reading features of a sign.
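In outline, the reconstruction in the second check works like this (a minimal sketch with made-up feature values; the value -1 encodes the unknown repeat n):

def numeralAtf(repeat, fraction, reading):
    # rebuild the ATF of a numeral from the repeat, fraction, and reading features
    if repeat == -1:
        repeat = "n"
    return f"{repeat or fraction}({reading})"

print(numeralAtf(2, None, "esze3"))     # 2(esze3)
print(numeralAtf(None, "5/6", "disz"))  # 5/6(disz)
print(numeralAtf(-1, None, "disz"))     # n(disz)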
def tfNumerals():
    numerals = []
    for s in F.otype.s("sign"):
        if F.type.v(s) != "numeral":
            continue
        (document, face, line) = T.sectionFromNode(s)
        ln = L.u(s, otype="line")[0]
        d = T.documentNode(document)
        srcfile = F.srcfile.v(d)
        srcln = F.srcLnNum.v(ln)
        atf = F.atf.v(s).rstrip(flaggingStr)
        numerals.append((srcfile, document, face, srcln, atf))
    return numerals

numeralRe = re.compile("((?:n|(?:[0-9/]+))" r"\([^)]+\))")

def grepNumerals(gen):
    numerals = []
    for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
        match = transRe.match(srcLn)
        if not match:
            continue
        srcLn = match.group(2)
        srcLn = commentInlineRe.sub("", srcLn)
        nls = numeralRe.findall(srcLn)
        for n in nls:
            numerals.append((srcfile, document, face, srcLnNum, n))
    return numerals

COMP.checkSanity(
    ("numeral",),
    grepNumerals,
    tfNumerals,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ numeral
IDENTICAL: all 2184 items
= : AbB-primary ◆ P509373 ◆ obverse ◆ 39 ◆ 2(esze3)
= : AbB-primary ◆ P509373 ◆ obverse ◆ 41 ◆ 7(disz)
= : AbB-primary ◆ P509373 ◆ obverse ◆ 43 ◆ 2(esze3)
= : AbB-primary ◆ P509374 ◆ obverse ◆ 108 ◆ 1(disz)
= : AbB-primary ◆ P509374 ◆ obverse ◆ 111 ◆ 1(disz)
= : AbB-primary ◆ P509374 ◆ obverse ◆ 113 ◆ 2(disz)
= : AbB-primary ◆ P509374 ◆ reverse ◆ 117 ◆ 2(disz)
= : AbB-primary ◆ P509376 ◆ obverse ◆ 203 ◆ 4(disz)
= : AbB-primary ◆ P509377 ◆ obverse ◆ 259 ◆ 3(u)
= : AbB-primary ◆ P509377 ◆ reverse ◆ 271 ◆ 1(disz)
= : AbB-primary ◆ P509377 ◆ reverse ◆ 271 ◆ 3(disz)
= : AbB-primary ◆ P509377 ◆ reverse ◆ 276 ◆ 3(u)
= : AbB-primary ◆ P509377 ◆ reverse ◆ 277 ◆ 6(disz)
= : AbB-primary ◆ P481191 ◆ obverse ◆ 396 ◆ 2(u)
= : AbB-primary ◆ P481191 ◆ reverse ◆ 406 ◆ 2(u)
= : AbB-primary ◆ P481192 ◆ reverse ◆ 470 ◆ 1(asz)
= : AbB-primary ◆ P481192 ◆ reverse ◆ 472 ◆ 1(asz)
= : AbB-primary ◆ P389958 ◆ obverse ◆ 519 ◆ 5(disz)
= : AbB-primary ◆ P510527 ◆ reverse ◆ 7677 ◆ 1(disz)
= : AbB-primary ◆ P510527 ◆ reverse ◆ 7677 ◆ 5/6(disz)
= and 2164 more
Number of results: TF 2184; GREP 2184
True
def tfNumerals2():
    numerals = []
    for s in F.otype.s("sign"):
        if F.type.v(s) != "numeral":
            continue
        (document, face, line) = T.sectionFromNode(s)
        ln = L.u(s, otype="line")[0]
        d = T.documentNode(document)
        srcfile = F.srcfile.v(d)
        srcln = F.srcLnNum.v(ln)
        repeat = F.repeat.v(s)
        if repeat == -1:
            repeat = "n"
        atf = f"{repeat or F.fraction.v(s)}({F.reading.v(s)})"
        numerals.append((srcfile, document, face, srcln, atf))
    return numerals

COMP.checkSanity(
    ("numeral",),
    grepNumerals,
    tfNumerals2,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ numeral
IDENTICAL: all 2184 items
= : AbB-primary ◆ P509373 ◆ obverse ◆ 39 ◆ 2(esze3)
= : AbB-primary ◆ P509373 ◆ obverse ◆ 41 ◆ 7(disz)
= : AbB-primary ◆ P509373 ◆ obverse ◆ 43 ◆ 2(esze3)
= : AbB-primary ◆ P509374 ◆ obverse ◆ 108 ◆ 1(disz)
= : AbB-primary ◆ P509374 ◆ obverse ◆ 111 ◆ 1(disz)
= : AbB-primary ◆ P509374 ◆ obverse ◆ 113 ◆ 2(disz)
= : AbB-primary ◆ P509374 ◆ reverse ◆ 117 ◆ 2(disz)
= : AbB-primary ◆ P509376 ◆ obverse ◆ 203 ◆ 4(disz)
= : AbB-primary ◆ P509377 ◆ obverse ◆ 259 ◆ 3(u)
= : AbB-primary ◆ P509377 ◆ reverse ◆ 271 ◆ 1(disz)
= : AbB-primary ◆ P509377 ◆ reverse ◆ 271 ◆ 3(disz)
= : AbB-primary ◆ P509377 ◆ reverse ◆ 276 ◆ 3(u)
= : AbB-primary ◆ P509377 ◆ reverse ◆ 277 ◆ 6(disz)
= : AbB-primary ◆ P481191 ◆ obverse ◆ 396 ◆ 2(u)
= : AbB-primary ◆ P481191 ◆ reverse ◆ 406 ◆ 2(u)
= : AbB-primary ◆ P481192 ◆ reverse ◆ 470 ◆ 1(asz)
= : AbB-primary ◆ P481192 ◆ reverse ◆ 472 ◆ 1(asz)
= : AbB-primary ◆ P389958 ◆ obverse ◆ 519 ◆ 5(disz)
= : AbB-primary ◆ P510527 ◆ reverse ◆ 7677 ◆ 1(disz)
= : AbB-primary ◆ P510527 ◆ reverse ◆ 7677 ◆ 5/6(disz)
= and 2164 more
Number of results: TF 2184; GREP 2184
True
These are not unknown signs, but signs that represent unknown readings/graphemes.
They are represented as x or X, n or N.
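A self-contained illustration on a made-up line, using a copy of the unknownRe defined below; because the pattern has two groups, findall yields pairs:

demoUnknownRe = re.compile(r"([xX])|(?:(?:_|\b)([nN])(?:_|\b)(?!\())")
print(demoUnknownRe.findall("szu x nu-um n n(disz)"))  # [('x', ''), ('', 'n')]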
def tfSignsUnknown():
    signs = []
    for s in F.otype.s("sign"):
        typ = F.type.v(s)
        if typ != "unknown":
            continue
        (document, face, line) = T.sectionFromNode(s)
        ln = L.u(s, otype="line")[0]
        d = T.documentNode(document)
        srcfile = F.srcfile.v(d)
        srcln = F.srcLnNum.v(ln)
        d = F.reading.v(s) or F.grapheme.v(s)
        signs.append((srcfile, document, face, srcln, d))
    return signs

unknownRe = re.compile(r"""([xX])|(?:(?:_|\b)([nN])(?:_|\b)(?!\())""")

def grepSignsUnknown(gen):
    signs = []
    for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
        match = transRe.match(srcLn)
        if not match:
            continue
        srcLn = match.group(2)
        srcLn = commentInlineRe.sub("", srcLn)
        srcLn = excludeRe.sub("", srcLn)
        data = unknownRe.findall(srcLn)
        for result in data:
            d = result[0] or result[1]
            signs.append((srcfile, document, face, srcLnNum, d))
    return signs

COMP.checkSanity(
    ("sign",),
    grepSignsUnknown,
    tfSignsUnknown,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ sign
IDENTICAL: all 8761 items
= : AbB-primary ◆ P509373 ◆ obverse ◆ 40 ◆ x
= : AbB-primary ◆ P509373 ◆ obverse ◆ 40 ◆ x
= : AbB-primary ◆ P509373 ◆ obverse ◆ 40 ◆ x
= : AbB-primary ◆ P509373 ◆ obverse ◆ 42 ◆ x
= : AbB-primary ◆ P509373 ◆ obverse ◆ 44 ◆ x
= : AbB-primary ◆ P509373 ◆ obverse ◆ 45 ◆ x
= : AbB-primary ◆ P509373 ◆ obverse ◆ 45 ◆ x
= : AbB-primary ◆ P509373 ◆ obverse ◆ 45 ◆ x
= : AbB-primary ◆ P509373 ◆ obverse ◆ 45 ◆ x
= : AbB-primary ◆ P509373 ◆ obverse ◆ 45 ◆ x
= : AbB-primary ◆ P509373 ◆ obverse ◆ 45 ◆ x
= : AbB-primary ◆ P509373 ◆ reverse ◆ 49 ◆ x
= : AbB-primary ◆ P509373 ◆ reverse ◆ 49 ◆ x
= : AbB-primary ◆ P509373 ◆ reverse ◆ 49 ◆ x
= : AbB-primary ◆ P509373 ◆ reverse ◆ 49 ◆ x
= : AbB-primary ◆ P509373 ◆ reverse ◆ 50 ◆ x
= : AbB-primary ◆ P509373 ◆ reverse ◆ 50 ◆ x
= : AbB-primary ◆ P509373 ◆ reverse ◆ 50 ◆ x
= : AbB-primary ◆ P509373 ◆ reverse ◆ 51 ◆ x
= : AbB-primary ◆ P509373 ◆ reverse ◆ 51 ◆ x
= and 8741 more
Number of results: TF 8761; GREP 8761
True
These are signs that contain a reading (lower case name of a transcribed unit).
We also include the readings of complex signs that have a grapheme in their representations: rrrx(GGG) or rrr!(GGG).
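A small illustration on a made-up line, using a copy of the readingRe defined below; note that x is deliberately absent from the character classes, and that the u of a numeral counts as a reading:

demoReadingRe = re.compile(r"[a-wyz'][a-wyz,0-9']*")
print(demoReadingRe.findall("a-wi-lum x 3(u)"))  # ['a', 'wi', 'lum', 'u']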
def tfSignsReading():
    signs = []
    for s in F.otype.s("sign"):
        typ = F.type.v(s)
        if typ not in {"reading", "complex", "numeral"}:
            continue
        (document, face, line) = T.sectionFromNode(s)
        ln = L.u(s, otype="line")[0]
        d = T.documentNode(document)
        srcfile = F.srcfile.v(d)
        srcln = F.srcLnNum.v(ln)
        d = F.reading.v(s)
        signs.append((srcfile, document, face, srcln, d))
    return signs

readingRe = re.compile(r"""[a-wyz'][a-wyz,0-9']*""")
nExcludeRe = re.compile(r"""(?:_|\b)n(?:_|\b)""")

def grepSignsReading(gen):
    signs = []
    for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
        match = transRe.match(srcLn)
        if not match:
            continue
        srcLn = match.group(2)
        srcLn = commentInlineRe.sub("", srcLn)
        srcLn = nExcludeRe.sub("", srcLn)
        data = readingRe.findall(srcLn)
        for d in data:
            signs.append((srcfile, document, face, srcLnNum, d))
    return signs

COMP.checkSanity(
    ("sign",),
    grepSignsReading,
    tfSignsReading,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ sign
IDENTICAL: all 190598 items
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ a
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ na
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ d
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ suen
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ i
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ din
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ nam
= : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ qi2
= : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ bi2
= : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ ma
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ um
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ ma
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ d
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ en
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ lil2
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ sza
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ du
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ u2
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ ni
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ ma
= and 190578 more
Number of results: TF 190598; GREP 190598
True
Just for redundancy, we compare all simple, non-empty signs in the transcriptions and in TF.
So: no numerals, no rrrx(GGG), no rrr!(GGG).
We do it once based on the atf feature and once based on the other features.
def tfSigns():
    signs = []
    for s in F.otype.s("sign"):
        typ = F.type.v(s)
        if typ in {"complex", "numeral", "commentline"}:
            continue
        (document, face, line) = T.sectionFromNode(s)
        ln = L.u(s, otype="line")[0]
        d = T.documentNode(document)
        srcfile = F.srcfile.v(d)
        srcln = F.srcLnNum.v(ln)
        atf = F.atf.v(s).rstrip(flaggingStr)
        signs.append((srcfile, document, face, srcln, atf))
    return signs

signRe = re.compile(
    r"""x|(?:\.\.\.)|(?:[a-wyzA-WYZ'][a-wyzA-WYZ,0-9']*)|(?:\(\$.*?\$\))"""
)

def grepSigns(gen):
    signs = []
    for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
        match = transRe.match(srcLn)
        if not match:
            continue
        srcLn = match.group(2)
        srcLn = numeralRe.sub("", srcLn)
        srcLn = complexRe.sub("", srcLn)
        sns = signRe.findall(srcLn)
        for s in sns:
            signs.append((srcfile, document, face, srcLnNum, s))
    return signs

COMP.checkSanity(
    ("sign",),
    grepSigns,
    tfSigns,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ sign
IDENTICAL: all 199944 items
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ a
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ na
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ d
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ suen
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ i
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ din
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ nam
= : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ qi2
= : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ bi2
= : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ ma
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ um
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ ma
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ d
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ en
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ lil2
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ sza
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ du
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ u2
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ ni
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ ma
= and 199924 more
Number of results: TF 199944; GREP 199944
True
def tfSigns2():
    signs = []
    for s in F.otype.s("sign"):
        typ = F.type.v(s)
        if typ in {"complex", "numeral", "commentline"}:
            continue
        (document, face, line) = T.sectionFromNode(s)
        ln = L.u(s, otype="line")[0]
        d = T.documentNode(document)
        srcfile = F.srcfile.v(d)
        srcln = F.srcLnNum.v(ln)
        atf = (
            F.reading.v(s)
            if typ == "reading"
            else f"($ {F.comment.v(s)} $)"
            if typ == "comment"
            else F.grapheme.v(s)
            if typ == "grapheme" or typ == "ellipsis"
            else F.reading.v(s) or F.grapheme.v(s)
            if typ == "unknown"
            else "§§§"
        )
        signs.append((srcfile, document, face, srcln, atf))
    return signs

COMP.checkSanity(
    ("sign",),
    grepSigns,
    tfSigns2,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ sign
IDENTICAL: all 199944 items
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ a
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ na
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ d
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ suen
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ i
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ din
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ nam
= : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ qi2
= : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ bi2
= : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ ma
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ um
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ ma
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ d
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ en
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ lil2
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ sza
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ du
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ u2
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ ni
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ ma
= and 199924 more
Number of results: TF 199944; GREP 199944
True
Here ends the checking.
This notebook has tested all patterns and quantities found in the transcriptions.
By a somewhat convoluted GREP we have extracted patterns from the sources.
By somewhat contrived TF alchemy we have produced the same patterns from the Text-Fabric representation of the sources.
Then we have made a rigorous comparison: we have checked whether both methods found exactly the same sequence of values.
And that turned out to be so!