Various checks on the correctness of the transformation from ASCII transcriptions to a Text-Fabric data set.
The diagnostics of the transformation surface issues that may be used to correct mistakes in the sources. Or, equally likely, they correspond to misunderstandings on my (Dirk's) part of the model that underlies the transcriptions.
We will perform grep commands on the source files, and we will traverse nodes in Text-Fabric to collect information.
Then we compare these sets of information.
There is some documentation about the checking software itself.
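The checking helpers come from utils (the Compare class, used as COMP below); its source is not shown here. As a rough sketch, its checkSanity method presumably boils down to comparing two lists of tuples, one gathered by GREPping the sources and one by walking TF nodes (the names below are illustrative, not the real implementation):

def checkSanitySketch(grepItems, tfItems):
    # both arguments are lists of tuples, in source order
    if grepItems == tfItems:
        print(f"IDENTICAL: all {len(tfItems)} items")
        return True
    for (g, t) in zip(grepItems, tfItems):
        if g != t:
            print(f"GREP {g} versus TF {t}")
    print(f"Number of results: TF {len(tfItems)}; GREP {len(grepItems)}")
    return False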
%load_ext autoreload
%autoreload 2
import os
import collections
import re
from tf.app import use
from utils import Compare
A = use("Nino-cunei/oldbabylonian", hoist=globals(), lgc=True)
BASE = os.path.expanduser("~/github")
SOURCE_VERSION = "0.3"
SOURCE_DIR = f"{BASE}/{A.org}/{A.repo}/sources/cdli/transcriptions/{SOURCE_VERSION}"
SOURCE_FILES = """
AbB-primary
AbB-secondary
""".strip().split()
TEMP_DIR = f"{BASE}/_temp"
Using TF app oldbabylonian in /Users/dirk/github/annotation/app-oldbabylonian/code Using Nino-cunei/oldbabylonian/tf - 1.0.4 in /Users/dirk/github
Old Babylonian Letters 1900-1600: Cuneiform tablets : ARK after afterr afteru atf atfpost atfpre author col collated collection comment damage det docnote docnumber excavation excised face flags fraction genre grapheme graphemer graphemeu lang langalt ln lnc lnno material missing museumcode museumname object operator operatorr operatoru otype period pnumber primecol primeln pubdate question reading readingr readingu remarkable remarks repeat srcLn srcLnNum srcfile subgenre supplied sym symr symu trans transcriber translation@ll type uncertain volume oslots
COMP = Compare(TF.api, SOURCE_DIR, SOURCE_FILES, TEMP_DIR)
FACES: bottom case case - lower edge case - obverse case - reverse case - seal envelope envelope - obverse envelope - reverse envelope - seal 1 eyestone - surface a left left edge left side lower edge obverse reverse seal seal 1 seal 2 upper edge EMPTY TABLETS (0):
We make an inventory of all characters that occur on an ATF line in transcribed material.
transRe = re.compile(r"""^([0-9a-zA-Z']*)\.\s+(.+)$""")
trimRe = re.compile(r"""\s\s+""")
prime = "'"
times = "×"
div = "÷"
quad = "|"
clusterChars = (
("┌", "┐", "_", "_", "langalt"),
("◀", "▶", "{", "}", "det"),
("∈", "∋", "(", ")", "uncertain"),
("〖", "〗", "[", "]", "missing"),
("«", "»", "<<", ">>", "excised"),
("⊂", "⊃", "<", ">", "supplied"),
)
clusterType = {x[0]: x[4] for x in clusterChars}
clusterTypeInfo = {x[4]: x[0:-1] for x in clusterChars}
clusterB = {c[0] for c in clusterChars}
clusterE = {c[1] for c in clusterChars}
clusterA = clusterB | clusterE
clusterOB = {c[2] for c in clusterChars}
clusterOE = {c[3] for c in clusterChars}
clusterOA = clusterOB | clusterOE
clusterBstr = "".join(sorted(clusterB))
clusterEstr = "".join(sorted(clusterE))
clusterAstr = "".join(sorted(clusterA))
flaggingStr = "!?*#"
flagging = set(flaggingStr)
separatorStr = "-"
separator = set(separatorStr)
ellips = "…"
unknownStr = "xXnN"
unknown = set(unknownStr) | {ellips}
lowerLetterStr = "abcdefghijklmnopqrstuvwyz"
upperLetterStr = lowerLetterStr.upper()
lowerLetter = set(lowerLetterStr)
upperLetter = set(upperLetterStr)
digitStr = "0123456789"
digit = set(digitStr) | {div}
emph_s = "ş"
emph_S = "Ş"
emph_t = "ţ"
emph_T = "Ţ"
emphatic = {emph_s, emph_S, emph_t, emph_T}
def emphRepl(x):
return (
x.replace("s,", emph_s)
.replace("S,", emph_S)
.replace("t,", emph_t)
.replace("T,", emph_T)
)
inlineCommentRe = re.compile(r"""\(\$.*?\$\)""")
operatorStr = f".+/:{times}"
operator = set(operatorStr)
divRe = re.compile(r"""([0-9])/([0-9])""")
def divRepl(match):
return f"{match.group(1)}{div}{match.group(2)}"
seen = collections.Counter()
for (srcfile, document, face, column, ln, line) in COMP.readCorpora():
match = transRe.match(line)
if not match:
continue
trans = match.group(2)
trans = inlineCommentRe.sub("", trans)
trans = trans.replace("...", ellips)
trans = trans.replace("x(", times)
trans = emphRepl(trans)
trans = divRe.sub(divRepl, trans)
words = trans.split()
for word in words:
        for c in word:
            seen[c] += 1
allChars = collections.defaultdict(dict)
for (c, amount) in seen.items():
if c in lowerLetter:
allChars["lower"][c] = amount
elif c in unknown:
allChars["unknown"][c] = amount
elif c in upperLetter:
allChars["upper"][c] = amount
elif c in digit:
allChars["digit"][c] = amount
elif c in emphatic:
allChars["emphatic"][c] = amount
elif c == prime:
allChars["prime"][c] = amount
elif c == quad:
allChars["quad"][c] = amount
elif c in flagging:
allChars["flagging"][c] = amount
elif c in separator:
allChars["separator"][c] = amount
elif c in operator:
allChars["operator"][c] = amount
elif c in clusterOA:
allChars["cluster"][c] = amount
else:
allChars["rest"][c] = amount
for (kind, data) in sorted(allChars.items()):
print(f"{kind}:")
for (c, amount) in sorted(
data.items(),
key=lambda x: (-x[1], x[0]),
):
print(f"\t{c:<1} {amount:>6}")
cluster: _ 15200 [ 7572 ] 7572 { 6794 } 6794 ) 3489 ( 3484 < 369 > 369 digit: 2 15362 3 5858 4 1412 1 1190 5 424 8 264 6 263 7 146 ÷ 121 0 43 9 36 emphatic: ţ 2212 ş 1748 Ş 5 flagging: # 9974 ? 560 ! 216 * 13 lower: a 83892 i 56380 u 45188 m 34283 s 26373 z 24237 n 21059 l 16466 d 14416 r 14193 t 14124 k 13164 b 12681 e 11430 p 5266 g 4486 h 4243 q 3666 w 1176 y 1 operator: / 15 . 11 × 5 + 2 : 1 prime: ' 38 quad: | 8 separator: - 118903 unknown: x 8729 … 1617 N 192 upper: A 808 I 448 D 337 U 270 R 222 Z 186 G 184 B 153 K 143 S 102 L 61 H 60 M 58 E 54 T 48 P 42 W 9
for (c, amount) in F.lang.freqList():
print(f"{c} {amount:>6} x")
akk 1283 x sux 2 x
In the ATF source, after the line with the P-number (&P...) there is additional identification, usually in the form collection volume, number note.
We give an overview of the collections in which the documents of this corpus are found, and we list the notes, which are really the irregular parts of the identification.
We will not check the TF values with the GREP values for this part of the document identification.
for (c, amount) in F.collection.freqList():
print(f"{c:<8} {amount:>6} x")
AbB 492 x CT 241 x VS 218 x YOS 108 x TCL 105 x LIH 77 x YNER 16 x TLB 10 x BIN 7 x OECT 3 x AJSL 1 x CT43, 1 x JCS 1 x LFBD 1 x RA 1 x RIME 1 x abb 1 x
for (c, amount) in F.docnote.freqList():
print(f"{c:<8} {amount:>6} x")
37 BM 097815 1 x AO 21105 1 x BM 012819 1 x BM 023357 1 x BM 023823 1 x BM 025693 1 x BM 027780 1 x BM 028435 1 x BM 028436 1 x BM 028444 1 x BM 028447 1 x BM 028457 1 x BM 028473 1 x BM 028474 1 x BM 028475 1 x BM 028476 1 x BM 028508 1 x BM 028510 1 x BM 028531 1 x BM 028558 1 x BM 028559 1 x BM 028588 1 x BM 028840 1 x BM 029655 1 x BM 040037 1 x BM 078214 1 x BM 080186 1 x BM 080329 1 x BM 080340 1 x BM 080354 1 x BM 080410 1 x BM 080484 1 x BM 080558 1 x BM 080594 1 x BM 080612 1 x BM 080616 1 x BM 080685 1 x BM 080723 1 x BM 080797 1 x BM 080802 1 x BM 080816 1 x BM 080840 1 x BM 080878 1 x BM 080885 1 x BM 080897 1 x BM 080947 1 x BM 081095 1 x BM 087395 1 x BM 096604 1 x BM 096608 1 x BM 096629 1 x BM 097031 1 x BM 097040 1 x BM 097050 1 x BM 097098 1 x BM 097115 1 x BM 097130 1 x BM 097274 1 x BM 097325 1 x BM 097347 1 x BM 097405 1 x BM 097675 1 x BM 097686 1 x BM 097693 1 x BM 097816 1 x BM 100117 1 x BM 103848 1 x Bu 1888-05-12, 0184 1 x Bu 1888-05-12, 0278 1 x Bu 1888-05-12, 0323 1 x Bu 1888-05-12, 0329 1 x Bu 1888-05-12, 0333 1 x Bu 1888-05-12, 0342 1 x Bu 1888-05-12, 0505 1 x Bu 1888-05-12, 0568 1 x Bu 1888-05-12, 0581 1 x Bu 1888-05-12, 0598 1 x Bu 1888-05-12, 0602 1 x Bu 1888-05-12, 0607 1 x Bu 1888-05-12, 0621 1 x Bu 1888-05-12, 0638 1 x Bu 1888-05-12, 200 1 x Bu 1888-05-12, 207 1 x Bu 1888-05-12, 212 1 x Bu 1891-05-09, 0279 1 x Bu 1891-05-09, 0315 1 x Bu 1891-05-09, 0370 1 x Bu 1891-05-09, 0383 1 x Bu 1891-05-09, 0413 1 x Bu 1891-05-09, 0418 1 x Bu 1891-05-09, 0468 1 x Bu 1891-05-09, 0534 1 x Bu 1891-05-09, 0579a 1 x Bu 1891-05-09, 0585 1 x Bu 1891-05-09, 0587 1 x Bu 1891-05-09, 0790 1 x Bu 1891-05-09, 1154 1 x Bu 1891-05-09, 2185 1 x Bu 1891-05-09, 2187 1 x Bu 1891-05-09, 2194 1 x Bu 1891-05-09, 290 1 x Bu 1891-05-09, 294 1 x Bu 1891-05-09, 325 1 x Bu 1891-05-09, 354 1 x Fs Landsberger 235 1 x ex. 01 1 x no. 2 1 x pp. 980191 no. 1 1 x
We check whether we have the same sequence of document numbers.
In TF, the document number is stored in the feature pnumber.
Note that we also check the order of the documents.
def tfDocuments():
documents = []
for t in F.otype.s("document"):
(document,) = T.sectionFromNode(t)
documents.append((F.srcfile.v(t), document, F.srcLnNum.v(t), F.pnumber.v(t)))
return documents
def grepDocuments(gen):
documents = []
prevTablet = None
for (srcFile, document, face, column, srcLnNum, srcLn) in gen:
if document != prevTablet:
documents.append((srcFile, document, srcLnNum, document))
prevTablet = document
return documents
COMP.checkSanity(
("tablet",),
grepDocuments,
tfDocuments,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ tablet IDENTICAL: all 1285 items = : AbB-primary ◆ P509373 ◆ 27 ◆ P509373 = : AbB-primary ◆ P509374 ◆ 96 ◆ P509374 = : AbB-primary ◆ P509375 ◆ 147 ◆ P509375 = : AbB-primary ◆ P509376 ◆ 196 ◆ P509376 = : AbB-primary ◆ P509377 ◆ 250 ◆ P509377 = : AbB-primary ◆ P507628 ◆ 309 ◆ P507628 = : AbB-primary ◆ P481190 ◆ 349 ◆ P481190 = : AbB-primary ◆ P481191 ◆ 392 ◆ P481191 = : AbB-primary ◆ P481192 ◆ 443 ◆ P481192 = : AbB-primary ◆ P389958 ◆ 508 ◆ P389958 = : AbB-primary ◆ P389256 ◆ 552 ◆ P389256 = : AbB-primary ◆ P510526 ◆ 7593 ◆ P510526 = : AbB-primary ◆ P510527 ◆ 7643 ◆ P510527 = : AbB-primary ◆ P510528 ◆ 7708 ◆ P510528 = : AbB-primary ◆ P510529 ◆ 7753 ◆ P510529 = : AbB-primary ◆ P510530 ◆ 7805 ◆ P510530 = : AbB-primary ◆ P510531 ◆ 7879 ◆ P510531 = : AbB-primary ◆ P510532 ◆ 7931 ◆ P510532 = : AbB-primary ◆ P510533 ◆ 7984 ◆ P510533 = : AbB-primary ◆ P510534 ◆ 8032 ◆ P510534 = and 1265 more Number of results: TF 1285; GREP 1285
True
for (obj, amount) in F.object.freqList():
print(f"{obj:<10} {amount:>5} x")
tablet 2778 x envelope 43 x case 12 x eyestone 1 x
We check whether we see the same faces with GREP and TF.
def tfFaces():
faces = []
for document in F.otype.s("document"):
documentName = F.pnumber.v(document)
srcfile = F.srcfile.v(document)
for face in L.d(document, otype="face"):
typ = F.face.v(face)
firstLine = L.d(face, otype="line")[0]
ln = F.srcLnNum.v(firstLine)
faces.append((srcfile, documentName, ln, typ))
return faces
def grepFaces(gen):
faces = []
prevDocument = None
prevFace = None
for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
if face is None or (prevDocument == document and prevFace == face):
continue
faces.append((srcfile, document, srcLnNum, face))
prevDocument = document
prevFace = face
return faces
COMP.checkSanity(
("face",),
grepFaces,
tfFaces,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ face IDENTICAL: all 2834 items = : AbB-primary ◆ P509373 ◆ 31 ◆ obverse = : AbB-primary ◆ P509373 ◆ 48 ◆ reverse = : AbB-primary ◆ P509374 ◆ 100 ◆ obverse = : AbB-primary ◆ P509374 ◆ 117 ◆ reverse = : AbB-primary ◆ P509375 ◆ 151 ◆ obverse = : AbB-primary ◆ P509375 ◆ 156 ◆ reverse = : AbB-primary ◆ P509376 ◆ 200 ◆ obverse = : AbB-primary ◆ P509376 ◆ 212 ◆ reverse = : AbB-primary ◆ P509377 ◆ 254 ◆ obverse = : AbB-primary ◆ P509377 ◆ 268 ◆ reverse = : AbB-primary ◆ P507628 ◆ 313 ◆ obverse = : AbB-primary ◆ P507628 ◆ 321 ◆ reverse = : AbB-primary ◆ P481190 ◆ 353 ◆ obverse = : AbB-primary ◆ P481190 ◆ 361 ◆ reverse = : AbB-primary ◆ P481191 ◆ 396 ◆ obverse = : AbB-primary ◆ P481191 ◆ 406 ◆ reverse = : AbB-primary ◆ P481191 ◆ 413 ◆ seal 1 = : AbB-primary ◆ P481192 ◆ 447 ◆ obverse = : AbB-primary ◆ P481192 ◆ 464 ◆ reverse = : AbB-primary ◆ P389958 ◆ 512 ◆ obverse = and 2814 more Number of results: TF 2834; GREP 2834
True
We check whether we see the same column and line numbers with GREP and TF.
def tfLines():
lines = []
for document in F.otype.s("document"):
documentName = F.pnumber.v(document)
srcfile = F.srcfile.v(document)
for face in L.d(document, otype="face"):
typ = F.face.v(face)
for line in L.d(face, otype="line"):
srcLn = F.srcLnNum.v(line)
ln = str(F.ln.v(line) or F.lnc.v(line))
if F.primeln.v(line):
ln += "'"
col = str(F.col.v(line) or "")
if F.primecol.v(line):
col += "'"
lines.append((srcfile, documentName, srcLn, typ, col, ln))
return lines
def grepLines(gen):
lines = []
for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
if face is None or column is None:
continue
isComment = srcLn.startswith("$")
if isComment:
ln = srcLn[0]
else:
match = transRe.match(srcLn)
if not match:
continue
ln = match.group(1)
lines.append((srcfile, document, srcLnNum, face, column, ln))
return lines
COMP.checkSanity(
("face", "column", "atf lineno"),
grepLines,
tfLines,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ face ◆ column ◆ atf lineno IDENTICAL: all 27375 items = : AbB-primary ◆ P509373 ◆ 31 ◆ obverse ◆ ◆ 1 = : AbB-primary ◆ P509373 ◆ 32 ◆ obverse ◆ ◆ 2 = : AbB-primary ◆ P509373 ◆ 33 ◆ obverse ◆ ◆ 3 = : AbB-primary ◆ P509373 ◆ 34 ◆ obverse ◆ ◆ 4 = : AbB-primary ◆ P509373 ◆ 35 ◆ obverse ◆ ◆ 5 = : AbB-primary ◆ P509373 ◆ 36 ◆ obverse ◆ ◆ 6 = : AbB-primary ◆ P509373 ◆ 37 ◆ obverse ◆ ◆ 7 = : AbB-primary ◆ P509373 ◆ 38 ◆ obverse ◆ ◆ 8 = : AbB-primary ◆ P509373 ◆ 39 ◆ obverse ◆ ◆ 9 = : AbB-primary ◆ P509373 ◆ 40 ◆ obverse ◆ ◆ 10 = : AbB-primary ◆ P509373 ◆ 41 ◆ obverse ◆ ◆ 11 = : AbB-primary ◆ P509373 ◆ 42 ◆ obverse ◆ ◆ 12 = : AbB-primary ◆ P509373 ◆ 43 ◆ obverse ◆ ◆ 13 = : AbB-primary ◆ P509373 ◆ 44 ◆ obverse ◆ ◆ 14 = : AbB-primary ◆ P509373 ◆ 45 ◆ obverse ◆ ◆ 15 = : AbB-primary ◆ P509373 ◆ 46 ◆ obverse ◆ ◆ $ = : AbB-primary ◆ P509373 ◆ 48 ◆ reverse ◆ ◆ $ = : AbB-primary ◆ P509373 ◆ 49 ◆ reverse ◆ ◆ 1' = : AbB-primary ◆ P509373 ◆ 50 ◆ reverse ◆ ◆ 2' = : AbB-primary ◆ P509373 ◆ 51 ◆ reverse ◆ ◆ 3' = and 27355 more Number of results: TF 27375; GREP 27375
True
Remarks are marked by the # character in lines that are not metadata following the document header. The criterion for a line starting with # to be a remark is that it has a space after the #.
There are also translation lines, starting with #tr.en, but we do not deal with those here.
def tfRemarks():
remarks = []
for ln in F.otype.s("line"):
rmks = F.remarks.v(ln)
if rmks:
for (i, rmk) in enumerate(rmks.split("\n")):
remarks.append((F.srcfile.v(ln), F.srcLnNum.v(ln) + i + 1, rmk))
return remarks
def grepRemarks(gen):
remarks = []
for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
isRemark = srcLn.startswith("#") and len(srcLn) > 1 and srcLn[1] == " "
if isRemark:
remarks.append((srcfile, srcLnNum, srcLn[1:].strip()))
return remarks
COMP.checkSanity(
("remark",),
grepRemarks,
tfRemarks,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ remark IDENTICAL: all 12 items = : AbB-secondary ◆ 11849 ◆ word (li-ba-al-li-t,u2-ka) divided over two lines = : AbB-secondary ◆ 12535 ◆ word (i-li-ka-am) divided over two lines = : AbB-secondary ◆ 15552 ◆ reading i-ba-al-lu-ut, proposed by Von Soden BiOr 23 55 = : AbB-secondary ◆ 15555 ◆ reading szi-'i-it-sa3 proposed by Von Soden BiOr 23 55 = : AbB-secondary ◆ 15559 ◆ reading tu-ut-t,i-bi-ma following Von Soden BiOr 23 55 = : AbB-secondary ◆ 15573 ◆ reading ma-s,a-ra-am proposed by Von Soden BiOr 23 55 = : AbB-secondary ◆ 15575 ◆ reading a-hu-ki propsed by Von Soden BiOr 23 55 = : AbB-secondary ◆ 15577 ◆ reading ki-i ne-em-szi-ma propsed by Von Soden BiOr 23 55 = : AbB-secondary ◆ 15582 ◆ reconstruction of this line propsed by Von Soden BiOr 23 55 = : AbB-secondary ◆ 16226 ◆ reading szu-ku-si propsed by Von Soden BiOr 23 55 = : AbB-secondary ◆ 68946 ◆ reading la-mi! proposed by Von Soden BiOr 39 590 = : AbB-secondary ◆ 69030 ◆ reading la us2-su2-ka-tim proposed by Von Soden BiOr 39 590 = no more items Number of results: TF 12; GREP 12
True
Translations are marked by the # character in lines that are not metadata following the document header.
The # must be immediately followed by tr., then a language code and a colon; the translation comes after that (with white space in between).
languages = [t[12:] for t in Fall() if t.startswith("translation@")]
print(f'languages: {", ".join(languages)}')
def tfTrans():
trans = []
for ln in F.otype.s("line"):
for lc in languages:
trs = Fs(f"translation@{lc}").v(ln)
if trs:
trans.append((F.srcfile.v(ln), F.srcLnNum.v(ln) + 1, lc, trs))
return trans
languages: en
def grepTrans(gen):
trans = []
for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
isTrans = srcLn.startswith("#tr.")
if isTrans:
parts = srcLn[4:].split(":", 1)
if len(parts) > 1:
lc = parts[0].strip()
trs = parts[1].strip()
trans.append((srcfile, srcLnNum, lc, trs))
return trans
COMP.checkSanity(
(
"language",
"translation",
),
grepTrans,
tfTrans,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ language ◆ translation IDENTICAL: all 134 items = : AbB-secondary ◆ 27139 ◆ en ◆ To Šamaš-ḫazir = : AbB-secondary ◆ 27141 ◆ en ◆ speak, = : AbB-secondary ◆ 27143 ◆ en ◆ thus Hammurapi: = : AbB-secondary ◆ 27145 ◆ en ◆ Ilī-ippalsam, the shepherd, = : AbB-secondary ◆ 27147 ◆ en ◆ thus informed me, as follows that one: = : AbB-secondary ◆ 27149 ◆ en ◆ A field of 3 bur3, which through a sealed document of my lord = : AbB-secondary ◆ 27151 ◆ en ◆ was given (lit. sealed) to me— = : AbB-secondary ◆ 27153 ◆ en ◆ 4 years ago Etel-pî-Marduk took it away from me, and = : AbB-secondary ◆ 27155 ◆ en ◆ its barley regularly takes. = : AbB-secondary ◆ 27157 ◆ en ◆ Further, Sîn-iddinam I informed, = : AbB-secondary ◆ 27159 ◆ en ◆ but it was not returned to me; = : AbB-secondary ◆ 27161 ◆ en ◆ Thus he (Ilī-ippalsam) informed me. = : AbB-secondary ◆ 27163 ◆ en ◆ To Sîn-iddinam I (now) have written. = : AbB-secondary ◆ 27165 ◆ en ◆ If, as that Ilī-ippalsam = : AbB-secondary ◆ 27167 ◆ en ◆ said, = : AbB-secondary ◆ 27169 ◆ en ◆ a field of 3 bur3, which in the palace = : AbB-secondary ◆ 27171 ◆ en ◆ was given (lit. sealed) to him, = : AbB-secondary ◆ 27174 ◆ en ◆ Etel-pî-Marduk 4 years ago took away, and = : AbB-secondary ◆ 27176 ◆ en ◆ is ‘eating,′ = : AbB-secondary ◆ 27178 ◆ en ◆ then a more sickening case = and 114 more Number of results: TF 134; GREP 134
True
Comments are marked by the $ character at the start of a line.
There are also inline comments, shaped as ($ ... $), but we do not deal with them here.
Inline comments are treated under signs.
def tfComments():
comments = []
for ln in F.otype.s("line"):
if not F.lnc.v(ln):
continue
comment = F.comment.v(L.d(ln, otype="sign")[0])
if comment:
            # use a fresh name: ln is still needed as a node below
            lnc = F.lnc.v(ln)
            comments.append((F.srcfile.v(ln), F.srcLnNum.v(ln), lnc, comment))
return comments
def grepComments(gen):
comments = []
for (srcfile, document, face, column, ln, line) in gen:
isComment = line.startswith("$")
if isComment:
comments.append((srcfile, ln, line[0], line[1:].strip()))
return comments
COMP.checkSanity(
("comment",),
grepComments,
tfComments,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ comment IDENTICAL: all 969 items = : AbB-primary ◆ 46 ◆ $ ◆ rest broken = : AbB-primary ◆ 48 ◆ $ ◆ beginning broken = : AbB-primary ◆ 154 ◆ $ ◆ rest missing = : AbB-primary ◆ 319 ◆ $ ◆ blank space = : AbB-primary ◆ 321 ◆ $ ◆ blank space = : AbB-primary ◆ 447 ◆ $ ◆ beginning broken = : AbB-primary ◆ 462 ◆ $ ◆ rest broken = : AbB-primary ◆ 464 ◆ $ ◆ beginning broken = : AbB-primary ◆ 480 ◆ $ ◆ rest broken = : AbB-primary ◆ 512 ◆ $ ◆ beginning broken = : AbB-primary ◆ 556 ◆ $ ◆ beginning broken = : AbB-primary ◆ 562 ◆ $ ◆ rest broken = : AbB-primary ◆ 564 ◆ $ ◆ beginning broken = : AbB-primary ◆ 568 ◆ $ ◆ rest broken = : AbB-primary ◆ 8049 ◆ $ ◆ rest broken = : AbB-primary ◆ 8051 ◆ $ ◆ beginning broken = : AbB-primary ◆ 8180 ◆ $ ◆ single ruling = : AbB-primary ◆ 8222 ◆ $ ◆ blank space = : AbB-primary ◆ 8441 ◆ $ ◆ rest broken = : AbB-primary ◆ 8554 ◆ $ ◆ single ruling = and 949 more Number of results: TF 969; GREP 969
True
Metadata comes from lines starting with a # without a space following the #.
We have found metadata for language, for translation (English), and for comments on the contents of lines.
The language is specified for documents, the translation for lines.
def tfMetas():
metas = []
for d in F.otype.s("document"):
lang = F.lang.v(d)
documentName = F.pnumber.v(d)
srcfile = F.srcfile.v(d)
if lang:
srcLn = F.srcLnNum.v(d)
metas.append((srcfile, documentName, srcLn + 1, f"atf: lang = {lang}"))
for ln in L.d(d, otype="line"):
trans = Fs("translation@en").v(ln)
if trans:
srcLn = F.srcLnNum.v(ln)
metas.append((srcfile, documentName, srcLn + 1, f"tr.en: = {trans}"))
return metas
def grepMetas(gen):
metas = []
for (srcfile, document, face, column, ln, line) in gen:
if line.startswith("#") and len(line) > 1 and line[1] != " ":
if line.startswith("#atf:l"):
line = "#atf: l" + line[6:]
fields = line[1:].split(maxsplit=1)
nFields = len(fields)
if nFields == 1:
key = fields[0]
feat = ""
val = ""
else:
(key, val) = fields
feat = ""
if key == "atf:":
fields = val.split(maxsplit=1)
nFields = len(fields)
if nFields == 2:
(feat, val) = fields
if val.startswith("="):
val = val[1:].strip()
metas.append((srcfile, document, ln, f"{key} {feat} = {val}"))
return metas
COMP.checkSanity(
("comment",),
grepMetas,
tfMetas,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ comment IDENTICAL: all 1419 items = : AbB-primary ◆ P509373 ◆ 28 ◆ atf: lang = akk = : AbB-primary ◆ P509374 ◆ 97 ◆ atf: lang = akk = : AbB-primary ◆ P509375 ◆ 148 ◆ atf: lang = akk = : AbB-primary ◆ P509376 ◆ 197 ◆ atf: lang = akk = : AbB-primary ◆ P509377 ◆ 251 ◆ atf: lang = akk = : AbB-primary ◆ P507628 ◆ 310 ◆ atf: lang = akk = : AbB-primary ◆ P481190 ◆ 350 ◆ atf: lang = akk = : AbB-primary ◆ P481191 ◆ 393 ◆ atf: lang = akk = : AbB-primary ◆ P481192 ◆ 444 ◆ atf: lang = akk = : AbB-primary ◆ P389958 ◆ 509 ◆ atf: lang = akk = : AbB-primary ◆ P389256 ◆ 553 ◆ atf: lang = akk = : AbB-primary ◆ P510526 ◆ 7594 ◆ atf: lang = akk = : AbB-primary ◆ P510527 ◆ 7644 ◆ atf: lang = akk = : AbB-primary ◆ P510528 ◆ 7709 ◆ atf: lang = akk = : AbB-primary ◆ P510529 ◆ 7754 ◆ atf: lang = akk = : AbB-primary ◆ P510530 ◆ 7806 ◆ atf: lang = akk = : AbB-primary ◆ P510531 ◆ 7880 ◆ atf: lang = akk = : AbB-primary ◆ P510532 ◆ 7932 ◆ atf: lang = akk = : AbB-primary ◆ P510533 ◆ 7985 ◆ atf: lang = akk = : AbB-primary ◆ P510534 ◆ 8033 ◆ atf: lang = akk = and 1399 more Number of results: TF 1419; GREP 1419
True
We check whether the contents of lines after the line number can be reproduced by means of TF features.
There are two ways to do that: via the feature srcLn and via T.text().

srcLn

This way is rather trivial, but it is applicable to all lines, including comment lines. We also pick up remarks, but not translations.
def tfLineContents():
lines = []
for document in F.otype.s("document"):
documentName = F.pnumber.v(document)
srcfile = F.srcfile.v(document)
for line in L.d(document, otype="line"):
srcLnNum = F.srcLnNum.v(line)
srcLn = F.srcLn.v(line)
lines.append((srcfile, documentName, srcLnNum, srcLn))
remarks = F.remarks.v(line)
if remarks:
for (i, remark) in enumerate(remarks.split("\n")):
lines.append(
(srcfile, documentName, srcLnNum + i + 1, f"# {remark}")
)
return lines
structureChars = set("&@")
def grepLineContents(gen):
lines = []
for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
if (
not srcLn
or srcLn[0] in structureChars
or (srcLn[0] == "#" and len(srcLn) > 1 and srcLn[1] != " ")
):
continue
lines.append((srcfile, document, srcLnNum, srcLn))
return lines
COMP.checkSanity(
("contents",),
grepLineContents,
tfLineContents,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ contents IDENTICAL: all 27387 items = : AbB-primary ◆ P509373 ◆ 31 ◆ 1. [a-na] _{d}suen_-i-[din-nam] = : AbB-primary ◆ P509373 ◆ 32 ◆ 2. qi2-bi2-[ma] = : AbB-primary ◆ P509373 ◆ 33 ◆ 3. um-ma _{d}en-lil2_-sza-du-u2-ni-ma = : AbB-primary ◆ P509373 ◆ 34 ◆ 4. _{d}utu_ u3 _{d}[marduk]_ a-na da-ri-a-[tim] = : AbB-primary ◆ P509373 ◆ 35 ◆ 5. li-ba-al-li-t,u2-u2-ka = : AbB-primary ◆ P509373 ◆ 36 ◆ 6. {disz}sze-ep-_{d}suen a2-gal2 [dumu] um-mi-a-mesz_ = : AbB-primary ◆ P509373 ◆ 37 ◆ 7. ki-a-am u2-lam-mi-da-an-ni um-[ma] szu-u2-[ma] = : AbB-primary ◆ P509373 ◆ 38 ◆ 8. {disz}sa-am-su-ba-ah-li sza-pi2-ir ma-[tim] = : AbB-primary ◆ P509373 ◆ 39 ◆ 9. 2(esze3) _a-sza3_ s,i-[bi]-it {disz}[ku]-un-zu-lum _sza3-gud_ = : AbB-primary ◆ P509373 ◆ 40 ◆ 10. _a-sza3 a-gar3_ na-ag-[ma-lum] _uru_ x x x{ki} = : AbB-primary ◆ P509373 ◆ 41 ◆ 11. sza _{d}utu_-ha-zi-[ir] isz-tu _mu 7(disz) kam_ id-di-nu-szum = : AbB-primary ◆ P509373 ◆ 42 ◆ 12. u3 i-na _uru_ x-szum{ki} sza-ak-nu id-di-a-am-ma = : AbB-primary ◆ P509373 ◆ 43 ◆ 13. 2(esze3) _a-sza3 szuku_ i-li-ib-bu s,i-bi-it _nagar-mesz_ = : AbB-primary ◆ P509373 ◆ 44 ◆ 14. _a-sza3 a-gar3 uru_ ra-bu-um x [...] = : AbB-primary ◆ P509373 ◆ 45 ◆ 15. x x x x x x [...] = : AbB-primary ◆ P509373 ◆ 46 ◆ $ rest broken = : AbB-primary ◆ P509373 ◆ 48 ◆ $ beginning broken = : AbB-primary ◆ P509373 ◆ 49 ◆ 1'. [x x] x x [...] = : AbB-primary ◆ P509373 ◆ 50 ◆ 2'. [x x] x [...] = : AbB-primary ◆ P509373 ◆ 51 ◆ 3'. [x x] x s,i-bi-it _gir3-se3#-ga#_ = and 27367 more Number of results: TF 27387; GREP 27387
True
T.text()

We apply the T.text() method to each line, using the default text format text-orig-full. We only compare lines containing transcribed material: numbered lines in the source.
The method walks over all signs on the line and represents each sign by means of the feature atf, plus some auxiliary features: atfpre and atfpost (for cluster characters preceding and following the sign reading) and after (for separator characters after the sign: -, :, /, a space, or the empty string).
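As a minimal sketch, assuming the format assembles exactly these features (the authoritative definition of text-orig-full lives in the TF dataset itself; compare the clusterAtf function further below), the contribution of a single sign would look like:

def signAtfSketch(s):
    # cluster openers + the sign's own ATF + cluster closers + separator
    return (
        (F.atfpre.v(s) or "")
        + (F.atf.v(s) or "")
        + (F.atfpost.v(s) or "")
        + (F.after.v(s) or "")
    )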
In rare cases a cluster starts or ends with a space or a hyphen, where the input should rather have been encoded with that space or hyphen just outside the cluster.
We work around those cases, and we check whether we have encountered all listed workarounds.
def tfLineText():
lines = []
for document in F.otype.s("document"):
documentName = F.pnumber.v(document)
srcfile = F.srcfile.v(document)
for line in L.d(document, otype="line"):
if F.lnc.v(line):
continue
face = F.face.v(L.u(line, otype="face")[0])
srcLnNum = F.srcLnNum.v(line)
primeLn = prime if F.primeln.v(line) else ""
ln = F.ln.v(line)
text = T.text(line)
lines.append(
(srcfile, documentName, face, srcLnNum, f"{ln}{primeLn}. {text}")
)
return lines
def methodB1(x):
return x.replace("_-", "-_")
def methodB2(x):
return x.replace("[-", "-[")
def methodE1(x):
return x.replace("-]", "]-")
workarounds = {
("P313391", "reverse", "5"): methodB1,
("P312032", "reverse", "12"): methodB2,
("P345563", "obverse", "4"): methodE1,
("P305773", "reverse", "1"): methodE1,
}
workaroundsApplied = set()
def initWorkarounds():
workaroundsApplied.clear()
def checkWorkarounds(document, face, ln, srcLn):
if (document, face, ln) in workarounds:
workaroundsApplied.add((document, face, ln))
method = workarounds[(document, face, ln)]
srcLn = method(srcLn)
print(f'workaround applied: "{srcLn}"')
return srcLn
def finishWorkarounds():
if set(workarounds) == workaroundsApplied:
print(f"ALL {len(workarounds)} WORKAROUNDS APPLIED")
else:
print("UNAPPLIED WORKAROUNDS:")
for (document, face, ln) in sorted(set(workarounds) - workaroundsApplied):
print(f"\t{document} {face}:{ln}")
def grepLineText(gen):
lines = []
initWorkarounds()
for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
match = transRe.match(srcLn)
if not match:
continue
ln = match.group(1)
srcLn = trimRe.sub(" ", srcLn)
srcLn = checkWorkarounds(document, face, ln, srcLn)
lines.append((srcfile, document, face, srcLnNum, srcLn))
finishWorkarounds()
return lines
COMP.checkSanity(
("contents",),
grepLineText,
tfLineText,
)
workaround applied: "5. 1(disz) _lu2 TUR+DISZ_ szu-nu-ma-_dingir_" workaround applied: "4. ta-asz-pu-ri um-ma at-ti-ma asz-[szum a-di i]-na#-an-na" workaround applied: "1. [1/2(disz) _ma]-na# ku3-babbar_ a-nu-um#-[ma-am]" workaround applied: "12. _iti gu4-si#-[sa2_ ...]" ALL 4 WORKAROUNDS APPLIED HEAD : srcfile ◆ tablet ◆ ln ◆ contents IDENTICAL: all 26406 items = : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ 1. [a-na] _{d}suen_-i-[din-nam] = : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ 2. qi2-bi2-[ma] = : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ 3. um-ma _{d}en-lil2_-sza-du-u2-ni-ma = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ 4. _{d}utu_ u3 _{d}[marduk]_ a-na da-ri-a-[tim] = : AbB-primary ◆ P509373 ◆ obverse ◆ 35 ◆ 5. li-ba-al-li-t,u2-u2-ka = : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ 6. {disz}sze-ep-_{d}suen a2-gal2 [dumu] um-mi-a-mesz_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 37 ◆ 7. ki-a-am u2-lam-mi-da-an-ni um-[ma] szu-u2-[ma] = : AbB-primary ◆ P509373 ◆ obverse ◆ 38 ◆ 8. {disz}sa-am-su-ba-ah-li sza-pi2-ir ma-[tim] = : AbB-primary ◆ P509373 ◆ obverse ◆ 39 ◆ 9. 2(esze3) _a-sza3_ s,i-[bi]-it {disz}[ku]-un-zu-lum _sza3-gud_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 40 ◆ 10. _a-sza3 a-gar3_ na-ag-[ma-lum] _uru_ x x x{ki} = : AbB-primary ◆ P509373 ◆ obverse ◆ 41 ◆ 11. sza _{d}utu_-ha-zi-[ir] isz-tu _mu 7(disz) kam_ id-di-nu-szum = : AbB-primary ◆ P509373 ◆ obverse ◆ 42 ◆ 12. u3 i-na _uru_ x-szum{ki} sza-ak-nu id-di-a-am-ma = : AbB-primary ◆ P509373 ◆ obverse ◆ 43 ◆ 13. 2(esze3) _a-sza3 szuku_ i-li-ib-bu s,i-bi-it _nagar-mesz_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 44 ◆ 14. _a-sza3 a-gar3 uru_ ra-bu-um x [...] = : AbB-primary ◆ P509373 ◆ obverse ◆ 45 ◆ 15. x x x x x x [...] = : AbB-primary ◆ P509373 ◆ reverse ◆ 49 ◆ 1'. [x x] x x [...] = : AbB-primary ◆ P509373 ◆ reverse ◆ 50 ◆ 2'. [x x] x [...] = : AbB-primary ◆ P509373 ◆ reverse ◆ 51 ◆ 3'. [x x] x s,i-bi-it _gir3-se3#-ga#_ = : AbB-primary ◆ P509373 ◆ reverse ◆ 52 ◆ 4'. [x x] x x x-ir ub-lam = : AbB-primary ◆ P509373 ◆ reverse ◆ 53 ◆ 5'. in-na-me-er-ma = and 26386 more Number of results: TF 26406; GREP 26406
True
Clusters are groupings of signs. The transcription uses a variety of brackets for several kinds of clustering. Clusters may be nested, and clusters of different types need not be properly nested: in the corpus form _{d}[marduk]_ a det and a missing cluster sit inside a langalt cluster, while in forms such as _[a-sza3_ a missing cluster starts inside a langalt cluster and closes outside it.
Usually, clusters do not start with an inter-word space or an inter-sign hyphen. But if they do, we work around them by pushing the offending space or hyphen out of the cluster.
See Workarounds above.
See the ORACC ATF docs.
Most clusters are trivial: [...].
We count how many clusters we have of each type.
for (typ, amount) in F.type.freqList("cluster"):
print(f"{typ:<15} {amount:>5} x")
langalt 7600 x missing 7572 x det 6794 x uncertain 1183 x supplied 231 x excised 69 x
We count how much material is in the alternate language (Sumerian) and how much in the main language (Akkadian).
lang = collections.Counter()
altLang = dict(
sux="akk",
akk="sux",
)
skipTypes = {"empty", "comment", "ellipsis", "unknown"}
for d in F.otype.s("document"):
docLang = F.lang.v(d)
for s in L.d(d, otype="sign"):
typ = F.type.v(s)
if typ in skipTypes:
continue
signLang = altLang[docLang] if F.langalt.v(s) else docLang
lang[signLang] += 1
for (ln, amount) in sorted(
lang.items(),
key=lambda x: (-x[1], x[0]),
):
print(f"{ln} {amount:>6} signs")
akk 173823 signs sux 19016 signs
Now we check for each cluster whether the ATF of its material as delivered by TF is equal to the material that we get "directly" by GREPping.
Note, however, that in order to GREP the clusters correctly, we have to do manipulations similar to those we performed when generating the TF.
Clusters are not directly greppable, because:

- ( ) occurs in non-cluster constructs like rrr!(YYY), rrrx(YYY), 333(rrr)
- _ _ uses the same character to open and to close a cluster
- < > and << >> share the characters < and >

So we proceed by escaping all cluster characters first to fresh characters that have none of these problems.
def tfClusters():
clusters = []
for ln in F.otype.s("line"):
lineClusters = []
for c in L.d(ln, "cluster"):
lineClusters.append((F.type.v(c), T.text(c)))
if lineClusters:
(document, face, line) = T.sectionFromNode(ln)
srcfile = F.srcfile.v(ln)
srcLnNum = F.srcLnNum.v(ln)
lineClusters = [
(srcfile, document, face, srcLnNum, typ, f'"{atf}"')
for (typ, atf) in sorted(lineClusters)
]
clusters.extend(lineClusters)
return clusters
inlineCommentRe = re.compile(r"""\s*\(\$.*?\$\)\s*""")
noClusterRe = re.compile(r"""([0-9nx!])\(([A-Za-z0-9,/'#!?*+|.]+)\)""")
bChars = f"""[{clusterBstr}]*"""
eChars = f"""[{clusterEstr}#?!+*]*[ -]*"""
def noClusterRepl(match):
return f"{match.group(1)}§§{match.group(2)}±±"
def noClusterRemove(text):
return text.replace("§§", "(").replace("±±", ")")
def makeClusterEscRepl(cab, cae):
def repl(match):
return f"{cab}{match.group(2)}{cae}"
return repl
clusterEscRe = {}
clusterFindRe = {}
clusterEscRepl = {}
for (cab, cae, cob, coe, ctp) in clusterChars:
if cob == coe:
clusterEscRe[cab] = re.compile(f"""({re.escape(cob)}(.*?){re.escape(coe)})""")
clusterEscRepl[cab] = makeClusterEscRepl(cab, cae)
clusterFindRe[cab] = re.compile(
f"""{bChars}{re.escape(cab)}.+?{re.escape(cae)}{eChars}"""
)
def clusterEsc(text):
text = noClusterRe.sub(noClusterRepl, text)
for (cab, cae, cob, coe, ctp) in clusterChars:
if cob == coe:
text = clusterEscRe[cab].sub(clusterEscRepl[cab], text)
else:
text = text.replace(cob, cab).replace(coe, cae)
return text
def clusterUnesc(text):
for (cab, cae, cob, coe, ctp) in clusterChars:
text = text.replace(cab, cob).replace(cae, coe)
text = noClusterRemove(text)
return text
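A quick round trip on a made-up line shaped like corpus material:

line = "2(esze3) _a-sza3_ s,i-[bi]-it"
esc = clusterEsc(line)
# esc == '2§§esze3±± ┌a-sza3┐ s,i-〖bi〗-it'
# the numeral parentheses are protected, and each real cluster is now
# delimited by single, unambiguous characters
assert clusterUnesc(esc) == line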
def grepClusters(gen):
clusters = []
initWorkarounds()
for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
match = transRe.match(srcLn)
if not match:
continue
ln = match.group(1)
srcLn = match.group(2)
srcLn = checkWorkarounds(document, face, ln, srcLn)
lineClusters = []
srcLn = inlineCommentRe.sub("", srcLn)
srcLn = trimRe.sub(" ", srcLn)
srcLn = clusterEsc(srcLn)
for (cab, cae, cob, coe, ctp) in clusterChars:
css = clusterFindRe[cab].findall(srcLn)
for cs in css:
lineClusters.append((ctp, clusterUnesc(cs)))
lineClusters = [
(srcfile, document, face, srcLnNum, c, f'"{cs}"')
for (c, cs) in sorted(lineClusters)
]
clusters.extend(lineClusters)
finishWorkarounds()
return clusters
COMP.checkSanity(
(
"type",
"cluster",
),
grepClusters,
tfClusters,
)
workaround applied: "1(disz) _lu2 TUR+DISZ_ szu-nu-ma-_dingir_" workaround applied: "ta-asz-pu-ri um-ma at-ti-ma asz-[szum a-di i]-na#-an-na" workaround applied: "[1/2(disz) _ma]-na# ku3-babbar_ a-nu-um#-[ma-am]" workaround applied: "_iti gu4-si#-[sa2_ ...]" ALL 4 WORKAROUNDS APPLIED HEAD : srcfile ◆ tablet ◆ ln ◆ type ◆ cluster IDENTICAL: all 23449 items = : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ det ◆ "_{d}" = : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ langalt ◆ "_{d}suen_-" = : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ missing ◆ "[a-na] " = : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ missing ◆ "[din-nam]" = : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ missing ◆ "[ma]" = : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ det ◆ "_{d}" = : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ langalt ◆ "_{d}en-lil2_-" = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ det ◆ "_{d}" = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ det ◆ "_{d}" = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ langalt ◆ "_{d}[marduk]_ " = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ langalt ◆ "_{d}utu_ " = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ missing ◆ "[marduk]_ " = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ missing ◆ "[tim]" = : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ det ◆ "_{d}" = : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ det ◆ "{disz}" = : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ langalt ◆ "_{d}suen a2-gal2 [dumu] um-mi-a-mesz_" = : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ missing ◆ "[dumu] " = : AbB-primary ◆ P509373 ◆ obverse ◆ 37 ◆ missing ◆ "[ma]" = : AbB-primary ◆ P509373 ◆ obverse ◆ 37 ◆ missing ◆ "[ma] " = : AbB-primary ◆ P509373 ◆ obverse ◆ 38 ◆ det ◆ "{disz}" = and 23429 more Number of results: TF 23449; GREP 23449
True
Every type of cluster corresponds to a sign feature of the same name that has value 1 for each sign that occurs in a cluster of that type.
Per cluster type, we check whether the list of signs inside a cluster corresponds with the signs that have the cluster type feature set to 1.
def clusterAtf(signs):
atf = ""
for (i, s) in enumerate(signs):
atf += F.atfpre.v(s) or ""
atf += F.atf.v(s)
atf += F.atfpost.v(s) or ""
atf += F.after.v(s) or ""
return atf
def checkClusterType(cType, cB, cE):
excluded = {"empty", "comment"}
def getCluster(sign):
if sign is None:
return None
clusters = L.u(sign, otype="cluster")
cTarget = [cluster for cluster in clusters if F.type.v(cluster) == cType]
return cTarget[0] if cTarget else None
def tfClustersType():
clusters = []
for ln in F.otype.s("line"):
if F.comment.v(ln):
continue
(document, face, line) = T.sectionFromNode(ln)
srcfile = F.srcfile.v(ln)
srcLnNum = F.srcLnNum.v(ln)
prevS = None
curCluster = []
for s in L.d(ln, otype="sign"):
sType = F.type.v(s)
if sType in excluded:
continue
isIn = Fs(cType).v(s)
thisC = getCluster(s)
prevC = getCluster(prevS)
if thisC != prevC:
if curCluster:
clusters.append(
(srcfile, document, face, srcLnNum, clusterAtf(curCluster))
)
curCluster = []
if isIn:
curCluster.append(s)
prevS = s
if curCluster:
clusters.append(
(srcfile, document, face, srcLnNum, clusterAtf(curCluster))
)
curCluster = []
return clusters
def grepClustersType(gen):
clusters = []
initWorkarounds()
for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
match = transRe.match(srcLn)
if not match:
continue
ln = match.group(1)
srcLn = match.group(2)
srcLn = checkWorkarounds(document, face, ln, srcLn)
srcLn = inlineCommentRe.sub("", srcLn)
srcLn = trimRe.sub(" ", srcLn)
srcLn = clusterEsc(srcLn)
(cab, cae, cob, coe) = clusterTypeInfo[cType]
css = clusterFindRe[cab].findall(srcLn)
for cs in css:
clusters.append((srcfile, document, face, srcLnNum, clusterUnesc(cs)))
finishWorkarounds()
return clusters
COMP.checkSanity(
("cluster",),
grepClustersType,
tfClustersType,
)
langalt _ _
¶checkClusterType("langalt", "_", "_")
workaround applied: "1(disz) _lu2 TUR+DISZ_ szu-nu-ma-_dingir_" workaround applied: "ta-asz-pu-ri um-ma at-ti-ma asz-[szum a-di i]-na#-an-na" workaround applied: "[1/2(disz) _ma]-na# ku3-babbar_ a-nu-um#-[ma-am]" workaround applied: "_iti gu4-si#-[sa2_ ...]" ALL 4 WORKAROUNDS APPLIED HEAD : srcfile ◆ tablet ◆ ln ◆ cluster IDENTICAL: all 7600 items = : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ _{d}suen_- = : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ _{d}en-lil2_- = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ _{d}utu_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ _{d}[marduk]_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ _{d}suen a2-gal2 [dumu] um-mi-a-mesz_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 39 ◆ _a-sza3_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 39 ◆ _sza3-gud_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 40 ◆ _a-sza3 a-gar3_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 40 ◆ _uru_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 41 ◆ _{d}utu_- = : AbB-primary ◆ P509373 ◆ obverse ◆ 41 ◆ _mu 7(disz) kam_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 42 ◆ _uru_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 43 ◆ _a-sza3 szuku_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 43 ◆ _nagar-mesz_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 44 ◆ _a-sza3 a-gar3 uru_ = : AbB-primary ◆ P509373 ◆ reverse ◆ 51 ◆ _gir3-se3#-ga#_ = : AbB-primary ◆ P509373 ◆ reverse ◆ 54 ◆ _[a-sza3_ = : AbB-primary ◆ P509373 ◆ reverse ◆ 55 ◆ _a-[sza3 a-gar3_ = : AbB-primary ◆ P509373 ◆ reverse ◆ 55 ◆ _uru gan2_ = : AbB-primary ◆ P509373 ◆ reverse ◆ 57 ◆ _a-sza3_ = and 7580 more Number of results: TF 7600; GREP 7600
missing [ ]
¶checkClusterType("missing", "[", "]")
workaround applied: "1(disz) _lu2 TUR+DISZ_ szu-nu-ma-_dingir_" workaround applied: "ta-asz-pu-ri um-ma at-ti-ma asz-[szum a-di i]-na#-an-na" workaround applied: "[1/2(disz) _ma]-na# ku3-babbar_ a-nu-um#-[ma-am]" workaround applied: "_iti gu4-si#-[sa2_ ...]" ALL 4 WORKAROUNDS APPLIED HEAD : srcfile ◆ tablet ◆ ln ◆ cluster IDENTICAL: all 7572 items = : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ [a-na] = : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ [din-nam] = : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ [ma] = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ [marduk]_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ [tim] = : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ [dumu] = : AbB-primary ◆ P509373 ◆ obverse ◆ 37 ◆ [ma] = : AbB-primary ◆ P509373 ◆ obverse ◆ 37 ◆ [ma] = : AbB-primary ◆ P509373 ◆ obverse ◆ 38 ◆ [tim] = : AbB-primary ◆ P509373 ◆ obverse ◆ 39 ◆ [bi]- = : AbB-primary ◆ P509373 ◆ obverse ◆ 39 ◆ [ku]- = : AbB-primary ◆ P509373 ◆ obverse ◆ 40 ◆ [ma-lum] = : AbB-primary ◆ P509373 ◆ obverse ◆ 41 ◆ [ir] = : AbB-primary ◆ P509373 ◆ obverse ◆ 44 ◆ [...] = : AbB-primary ◆ P509373 ◆ obverse ◆ 45 ◆ [...] = : AbB-primary ◆ P509373 ◆ reverse ◆ 49 ◆ [x x] = : AbB-primary ◆ P509373 ◆ reverse ◆ 49 ◆ [...] = : AbB-primary ◆ P509373 ◆ reverse ◆ 50 ◆ [x x] = : AbB-primary ◆ P509373 ◆ reverse ◆ 50 ◆ [...] = : AbB-primary ◆ P509373 ◆ reverse ◆ 51 ◆ [x x] = and 7552 more Number of results: TF 7572; GREP 7572
det { }
¶checkClusterType("det", "{", "}")
workaround applied: "1(disz) _lu2 TUR+DISZ_ szu-nu-ma-_dingir_" workaround applied: "ta-asz-pu-ri um-ma at-ti-ma asz-[szum a-di i]-na#-an-na" workaround applied: "[1/2(disz) _ma]-na# ku3-babbar_ a-nu-um#-[ma-am]" workaround applied: "_iti gu4-si#-[sa2_ ...]" ALL 4 WORKAROUNDS APPLIED HEAD : srcfile ◆ tablet ◆ ln ◆ cluster IDENTICAL: all 6794 items = : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ _{d} = : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ _{d} = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ _{d} = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ _{d} = : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ {disz} = : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ _{d} = : AbB-primary ◆ P509373 ◆ obverse ◆ 38 ◆ {disz} = : AbB-primary ◆ P509373 ◆ obverse ◆ 39 ◆ {disz} = : AbB-primary ◆ P509373 ◆ obverse ◆ 40 ◆ {ki} = : AbB-primary ◆ P509373 ◆ obverse ◆ 41 ◆ _{d} = : AbB-primary ◆ P509373 ◆ obverse ◆ 42 ◆ {ki} = : AbB-primary ◆ P509373 ◆ reverse ◆ 55 ◆ {ki} = : AbB-primary ◆ P509373 ◆ reverse ◆ 60 ◆ _{d} = : AbB-primary ◆ P509374 ◆ obverse ◆ 103 ◆ _{d} = : AbB-primary ◆ P509374 ◆ obverse ◆ 103 ◆ _{d} = : AbB-primary ◆ P509376 ◆ obverse ◆ 206 ◆ {disz} = : AbB-primary ◆ P509376 ◆ obverse ◆ 206 ◆ {d} = : AbB-primary ◆ P509376 ◆ obverse ◆ 206 ◆ {ki} = : AbB-primary ◆ P509376 ◆ obverse ◆ 208 ◆ {d} = : AbB-primary ◆ P509376 ◆ reverse ◆ 220 ◆ {disz} = and 6774 more Number of results: TF 6794; GREP 6794
uncertain ( )
¶checkClusterType("uncertain", "(", ")")
workaround applied: "1(disz) _lu2 TUR+DISZ_ szu-nu-ma-_dingir_" workaround applied: "ta-asz-pu-ri um-ma at-ti-ma asz-[szum a-di i]-na#-an-na" workaround applied: "[1/2(disz) _ma]-na# ku3-babbar_ a-nu-um#-[ma-am]" workaround applied: "_iti gu4-si#-[sa2_ ...]" ALL 4 WORKAROUNDS APPLIED HEAD : srcfile ◆ tablet ◆ ln ◆ cluster IDENTICAL: all 1183 items = : AbB-primary ◆ P481192 ◆ obverse ◆ 460 ◆ (x)] = : AbB-primary ◆ P481192 ◆ reverse ◆ 466 ◆ [(x)] = : AbB-primary ◆ P481192 ◆ reverse ◆ 469 ◆ (x)] = : AbB-primary ◆ P481192 ◆ reverse ◆ 477 ◆ (x)] = : AbB-primary ◆ P481192 ◆ reverse ◆ 477 ◆ [(x) = : AbB-primary ◆ P481192 ◆ reverse ◆ 477 ◆ (x)] = : AbB-primary ◆ P510529 ◆ reverse ◆ 7772 ◆ [(x)] = : AbB-primary ◆ P510530 ◆ obverse ◆ 7821 ◆ [(x)] = : AbB-primary ◆ P510530 ◆ reverse ◆ 7845 ◆ (x)] = : AbB-primary ◆ P510531 ◆ obverse ◆ 7896 ◆ (x)] = : AbB-primary ◆ P510531 ◆ reverse ◆ 7898 ◆ (x)]- = : AbB-primary ◆ P510531 ◆ reverse ◆ 7901 ◆ (x)] = : AbB-primary ◆ P510531 ◆ reverse ◆ 7902 ◆ (x)] = : AbB-primary ◆ P510534 ◆ obverse ◆ 8046 ◆ [(x) = : AbB-primary ◆ P510534 ◆ obverse ◆ 8046 ◆ (x)] = : AbB-primary ◆ P510534 ◆ reverse ◆ 8055 ◆ [(x)] = : AbB-primary ◆ P510534 ◆ reverse ◆ 8067 ◆ (x)]- = : AbB-primary ◆ P510536 ◆ obverse ◆ 8165 ◆ [(x)] = : AbB-primary ◆ P510537 ◆ obverse ◆ 8216 ◆ (x)] = : AbB-primary ◆ P510537 ◆ obverse ◆ 8216 ◆ (x) = and 1163 more Number of results: TF 1183; GREP 1183
supplied < >
¶checkClusterType("supplied", "<", ">")
workaround applied: "1(disz) _lu2 TUR+DISZ_ szu-nu-ma-_dingir_" workaround applied: "ta-asz-pu-ri um-ma at-ti-ma asz-[szum a-di i]-na#-an-na" workaround applied: "[1/2(disz) _ma]-na# ku3-babbar_ a-nu-um#-[ma-am]" workaround applied: "_iti gu4-si#-[sa2_ ...]" ALL 4 WORKAROUNDS APPLIED HEAD : srcfile ◆ tablet ◆ ln ◆ cluster IDENTICAL: all 231 items = : AbB-primary ◆ P389958 ◆ reverse ◆ 523 ◆ <ru>- = : AbB-primary ◆ P510526 ◆ obverse ◆ 7604 ◆ <li-ki-il> = : AbB-primary ◆ P510551 ◆ obverse ◆ 8942 ◆ <ti>- = : AbB-primary ◆ P510552 ◆ obverse ◆ 8992 ◆ <li>- = : AbB-primary ◆ P510552 ◆ obverse ◆ 8993 ◆ <ma> = : AbB-primary ◆ P510559 ◆ obverse ◆ 9402 ◆ <li>- = : AbB-primary ◆ P510561 ◆ obverse ◆ 9503 ◆ <ra>- = : AbB-primary ◆ P510562 ◆ obverse ◆ 9548 ◆ <ma?>- = : AbB-primary ◆ P510571 ◆ reverse ◆ 10054 ◆ <ut>- = : AbB-primary ◆ P510577 ◆ obverse ◆ 10396 ◆ <wi>- = : AbB-primary ◆ P510583 ◆ obverse ◆ 10748 ◆ <isz> = : AbB-primary ◆ P510588 ◆ obverse ◆ 11067 ◆ <wi>- = : AbB-primary ◆ P510591 ◆ reverse ◆ 11292 ◆ <ta>- = : AbB-primary ◆ P510592 ◆ reverse ◆ 11373 ◆ <ti> = : AbB-primary ◆ P510599 ◆ obverse ◆ 11750 ◆ <li>- = : AbB-primary ◆ P510606 ◆ obverse ◆ 12137 ◆ <t,u2>- = : AbB-primary ◆ P510613 ◆ obverse ◆ 12534 ◆ <ma> = : AbB-primary ◆ P510616 ◆ obverse ◆ 12719 ◆ <ta> = : AbB-primary ◆ P510616 ◆ reverse ◆ 12743 ◆ <lu> = : AbB-primary ◆ P510617 ◆ reverse ◆ 12799 ◆ <li>- = and 211 more Number of results: TF 231; GREP 231
excised << >>
¶checkClusterType("excised", "<<", ">>")
workaround applied: "1(disz) _lu2 TUR+DISZ_ szu-nu-ma-_dingir_" workaround applied: "ta-asz-pu-ri um-ma at-ti-ma asz-[szum a-di i]-na#-an-na" workaround applied: "[1/2(disz) _ma]-na# ku3-babbar_ a-nu-um#-[ma-am]" workaround applied: "_iti gu4-si#-[sa2_ ...]" ALL 4 WORKAROUNDS APPLIED HEAD : srcfile ◆ tablet ◆ ln ◆ cluster IDENTICAL: all 69 items = : AbB-primary ◆ P510530 ◆ reverse ◆ 7835 ◆ <<TE>>- = : AbB-primary ◆ P510543 ◆ obverse ◆ 8537 ◆ <<li>>- = : AbB-primary ◆ P510562 ◆ reverse ◆ 9563 ◆ <<KI>> = : AbB-primary ◆ P510573 ◆ reverse ◆ 10149 ◆ <<an-na>> = : AbB-primary ◆ P510576 ◆ reverse ◆ 10329 ◆ <<x>> = : AbB-primary ◆ P510621 ◆ reverse ◆ 13006 ◆ <<ti>> = : AbB-primary ◆ P497370 ◆ obverse ◆ 13101 ◆ <<mar>>- = : AbB-primary ◆ P510634 ◆ obverse ◆ 13743 ◆ <<ma>> = : AbB-primary ◆ P510660 ◆ reverse ◆ 15093 ◆ <<i-na>> = : AbB-primary ◆ P510661 ◆ reverse ◆ 15147 ◆ <<kam iti>> = : AbB-primary ◆ P510686 ◆ obverse ◆ 16380 ◆ <<qa2-be2-e>> = : AbB-primary ◆ P510686 ◆ obverse ◆ 16383 ◆ <<bi>>- = : AbB-primary ◆ P510688 ◆ obverse ◆ 16513 ◆ <<i>> = : AbB-primary ◆ P510725 ◆ obverse ◆ 18485 ◆ <<um>> = : AbB-primary ◆ P510775 ◆ obverse ◆ 21373 ◆ <<ti>> = : AbB-primary ◆ P510798 ◆ obverse ◆ 22630 ◆ <<gur>> = : AbB-primary ◆ P510807 ◆ obverse ◆ 23150 ◆ <<ID>>- = : AbB-primary ◆ P510821 ◆ obverse ◆ 23894 ◆ <<u2-ul>> = : AbB-primary ◆ P510861 ◆ reverse ◆ 26058 ◆ <<u2-sza#>> = : AbB-primary ◆ P413589 ◆ reverse ◆ 27013 ◆ <<sza-li-im>> = and 49 more Number of results: TF 69; GREP 69
Here is an overview of the occurrence of primes.
There are primes within sign readings, where they denote a numerical property.
Primes on column and line numbers denote that the given number deviates from the physical number because of damage.
N.B.: This gathers primes on signs, column numbers and case numbers.
First a bit of exploration.
primeFt = ("primecol", "primeln")
for ft in primeFt:
for (value, frequency) in Fs(ft).freqList():
print(f"{ft:<8}: {frequency:>5} x {value}")
primecol: 4 x 1 primeln : 1825 x 1
We also want to see the node types of primed entities.
for ft in primeFt:
primed = collections.Counter()
for n in Fs(ft).s(1):
primed[F.otype.v(n)] += 1
for x in sorted(primed.items()):
print(f"{ft:<8}: {x[1]:>5} x {x[0]}")
primecol: 4 x line primeln : 1825 x line
Now let us check the primes with grep, directly in the source files.
nonSignStuff = r"""()\[\]{}<>|.#!?+*"""
nonSignRe = re.compile(f"""[{nonSignStuff}]+""")
def tfPrimes():
primes = []
for ln in F.otype.s("line"):
(document, face, line) = T.sectionFromNode(ln)
srcfile = F.srcfile.v(ln)
srcln = F.srcLnNum.v(ln)
primeln = F.primeln.v(ln)
primecol = F.primecol.v(ln)
if primecol and (
not L.p(ln, otype="line") or F.col.v(ln) == F.col.v(L.p(ln, otype="line")[0])
):
primes.append(
(srcfile, document, face, srcln - 1, "column", f"{F.col.v(ln)}{prime}")
)
if primeln:
primes.append(
(srcfile, document, face, srcln, "line", f"{F.ln.v(ln)}{prime}.")
)
for s in L.d(ln, otype="sign"):
reading = F.reading.v(s)
if reading:
if prime in reading:
rep = nonSignRe.sub("", F.atf.v(s))
primes.append((srcfile, document, face, srcln, "sign", rep))
return primes
material = f"""A-Za-z0-9,'/{nonSignStuff}"""
materialP = f"{material}{prime}"
primeRe = re.compile(f"""[{material}]*{prime}[{materialP}]*""")
readingRe = re.compile(r"""!\([^)]+\)""")
def grepPrimes(gen):
primes = []
prevColumn = None
for (src, document, face, column, srcln, line) in gen:
if column and column != prevColumn:
if "'" in column:
primes.append((src, document, face, srcln, "column", column))
prevColumn = column
fields = line.split(maxsplit=1)
lineNum = fields[0]
if prime in lineNum:
primes.append((src, document, face, srcln, "line", lineNum))
if len(fields) != 2:
continue
if lineNum.startswith("$") or lineNum.startswith("#"):
continue
trans = fields[1]
if prime in trans:
trans = readingRe.sub("", trans)
hits = primeRe.findall(trans)
for hit in hits:
hit = nonSignRe.sub("", hit)
primes.append((src, document, face, srcln, "sign", hit))
return primes
COMP.checkSanity(
(
"kind",
"prime",
),
grepPrimes,
tfPrimes,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ kind ◆ prime IDENTICAL: all 1865 items = : AbB-primary ◆ P509373 ◆ reverse ◆ 49 ◆ line ◆ 1'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 50 ◆ line ◆ 2'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 51 ◆ line ◆ 3'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 52 ◆ line ◆ 4'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 53 ◆ line ◆ 5'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 54 ◆ line ◆ 6'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 55 ◆ line ◆ 7'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 56 ◆ line ◆ 8'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 57 ◆ line ◆ 9'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 58 ◆ line ◆ 10'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 59 ◆ line ◆ 11'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 60 ◆ line ◆ 12'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 61 ◆ line ◆ 13'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 62 ◆ line ◆ 14'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 63 ◆ line ◆ 15'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 64 ◆ line ◆ 16'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 65 ◆ line ◆ 17'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 66 ◆ line ◆ 18'. = : AbB-primary ◆ P509373 ◆ reverse ◆ 67 ◆ line ◆ 19'. = : AbB-primary ◆ P481192 ◆ obverse ◆ 448 ◆ line ◆ 1'. = and 1845 more Number of results: TF 1865; GREP 1865
True
Words are space-separated parts of a transcription line (not counting inline comments).
Words have very few features, currently only one: atf.
def tfWords():
words = []
for w in F.otype.s("word"):
(document, face, line) = T.sectionFromNode(w)
ln = L.u(w, otype="line")[0]
d = T.documentNode(document)
srcfile = F.srcfile.v(d)
srcln = F.srcLnNum.v(ln)
atf = F.atf.v(w)
if atf:
words.append((srcfile, document, face, srcln, atf))
return words
commentLineRe = re.compile(r"""^\$\s*(.*)""")
commentInlineRe = re.compile(r"""\(\$ (.*?) \$\)""")
def grepWords(gen):
words = []
initWorkarounds()
for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
match = transRe.match(srcLn)
if not match:
continue
ln = match.group(1)
srcLn = match.group(2)
srcLn = checkWorkarounds(document, face, ln, srcLn)
srcLn = commentInlineRe.sub("", srcLn)
for w in srcLn.split():
words.append((srcfile, document, face, srcLnNum, w))
finishWorkarounds()
return words
COMP.checkSanity(
("sign",),
grepWords,
tfWords,
)
workaround applied: "1(disz) _lu2 TUR+DISZ_ szu-nu-ma-_dingir_" workaround applied: "ta-asz-pu-ri um-ma at-ti-ma asz-[szum a-di i]-na#-an-na" workaround applied: "[1/2(disz) _ma]-na# ku3-babbar_ a-nu-um#-[ma-am]" workaround applied: "_iti gu4-si#-[sa2_ ...]" ALL 4 WORKAROUNDS APPLIED HEAD : srcfile ◆ tablet ◆ ln ◆ sign IDENTICAL: all 76503 items = : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ [a-na] = : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ _{d}suen_-i-[din-nam] = : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ qi2-bi2-[ma] = : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ um-ma = : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ _{d}en-lil2_-sza-du-u2-ni-ma = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ _{d}utu_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ u3 = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ _{d}[marduk]_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ a-na = : AbB-primary ◆ P509373 ◆ obverse ◆ 34 ◆ da-ri-a-[tim] = : AbB-primary ◆ P509373 ◆ obverse ◆ 35 ◆ li-ba-al-li-t,u2-u2-ka = : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ {disz}sze-ep-_{d}suen = : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ a2-gal2 = : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ [dumu] = : AbB-primary ◆ P509373 ◆ obverse ◆ 36 ◆ um-mi-a-mesz_ = : AbB-primary ◆ P509373 ◆ obverse ◆ 37 ◆ ki-a-am = : AbB-primary ◆ P509373 ◆ obverse ◆ 37 ◆ u2-lam-mi-da-an-ni = : AbB-primary ◆ P509373 ◆ obverse ◆ 37 ◆ um-[ma] = : AbB-primary ◆ P509373 ◆ obverse ◆ 37 ◆ szu-u2-[ma] = : AbB-primary ◆ P509373 ◆ obverse ◆ 38 ◆ {disz}sa-am-su-ba-ah-li = and 76483 more Number of results: TF 76503; GREP 76503
True
flagMap = {
"#": "damage",
"?": "question",
"!": "remarkable",
"*": "collated",
}
flagChars = list(flagMap.keys())
flagFeatures = list(flagMap.values())
flagNodeOverview = collections.Counter()
flagNodeTypes = set()
for n in N():
for ft in flagFeatures:
value = Fs(ft).v(n)
if not value:
continue
nType = F.otype.v(n)
flagNodeTypes.add(nType)
flagNodeOverview[f"{nType}-{ft}-{value}"] += 1
for (combi, amount) in sorted(flagNodeOverview.items(), key=lambda x: (-x[1], x[0])):
print(f"{amount:>6} x {combi}")
9974 x sign-damage-1 560 x sign-question-1 99 x sign-remarkable-1 13 x sign-collated-1
Let us see whether there are any cooccurrences of flags.
flagCombis = collections.Counter()
for n in N():
if F.otype.v(n) not in flagNodeTypes:
continue
values = []
for ft in flagFeatures:
rawValue = Fs(ft).v(n)
value = (
f'{"*":^10}'
if rawValue is None
else f"{ft:^10}"
if rawValue
else f'{"":^10}'
)
values.append(value)
combi = "-".join(values)
flagCombis[combi] += 1
for (combi, amount) in sorted(flagCombis.items(), key=lambda x: (-x[1], x[0])):
print(f"{amount:>6} x {combi}")
192721 x * - * - * - * 9830 x damage - * - * - * 421 x * - question - * - * 138 x damage - question - * - * 91 x * - * -remarkable- * 9 x * - * - * - collated 5 x damage - * -remarkable- * 2 x * - * -remarkable- collated 1 x * - question -remarkable- collated 1 x damage - * - * - collated
We need to address the question of the order of flags.
A quick inspection of the corpus yields:

- damage-question (#?) is frequent, question-damage (?#) is rare
- damage-remarkable (#!) in all cases
- remarkable-collated (!*) in all cases
- damage-collated (#*) in all cases
- question-remarkable (?!) in all cases

Based on this observation, and assuming that the order between damage and question is not relevant, we produce flags always in the order: damage (#), question (?), remarkable (!), collated (*).
When grepping, we have to normalize ?# to #?.
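A minimal sketch of such a normalization (illustrative only; the check below actually compares flag sets, which tolerates order differences):

FLAG_ORDER = "#?!*"  # damage, question, remarkable, collated

def normalizeFlags(fl):
    # keep each known flag character once, in the canonical order
    return "".join(c for c in FLAG_ORDER if c in fl)

normalizeFlags("?#")  # -> '#?'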
def tfFlags():
discrepancies = collections.Counter()
flags = []
for n in F.otype.s("sign"):
values = [Fs(ft).v(n) for ft in flagFeatures]
if all(value is None for value in values):
continue
fl = ""
for (i, val) in enumerate(values):
if val:
fl += flagChars[i]
checkFl = F.flags.v(n) or ""
if checkFl != fl:
msg = "OK" if set(fl) == set(checkFl) else "PROBLEM"
discrepancies[f"{fl} vs {checkFl} ({msg})"] += 1
(document, face, line) = T.sectionFromNode(n)
ln = L.u(n, otype="line")[0]
d = T.documentNode(document)
srcfile = F.srcfile.v(d)
srcln = F.srcLnNum.v(ln)
opx = F.operator.v(n) == "x"
num = F.type.v(n) == "numeral"
reading = (
F.grapheme.v(n) or F.reading.v(n)
if opx
else F.reading.v(n) or F.grapheme.v(n)
)
br = ")" if num or opx else ""
flags.append((srcfile, document, face, srcln, f"{reading}{br}{checkFl}"))
if not discrepancies:
print("NO DISCREPANCIES")
else:
for (d, amount) in sorted(
discrepancies.items(),
key=lambda x: (-x[1], x[0]),
):
print(f"{d:<4} {amount:>3} x")
return flags
flagsRe = re.compile(r"""[A-Za-z0-9,'.]+\)?[#*!?]+""")
def grepFlags(gen):
flags = []
for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
match = transRe.match(srcLn)
if not match:
continue
srcLn = match.group(2)
srcLn = trimRe.sub(" ", srcLn)
srcLn = srcLn.replace("!(", "§§")
        fls = flagsRe.findall(srcLn)
        for f in fls:
            flags.append((srcfile, document, face, srcLnNum, f.replace("§§", "!(")))
return flags
COMP.checkSanity(
("sign",),
grepFlags,
tfFlags,
)
#? vs ?# (OK) 7 x HEAD : srcfile ◆ tablet ◆ ln ◆ sign IDENTICAL: all 10498 items = : AbB-primary ◆ P509373 ◆ reverse ◆ 51 ◆ se3# = : AbB-primary ◆ P509373 ◆ reverse ◆ 51 ◆ ga# = : AbB-primary ◆ P509375 ◆ obverse ◆ 151 ◆ na# = : AbB-primary ◆ P509375 ◆ obverse ◆ 152 ◆ bi2# = : AbB-primary ◆ P509375 ◆ reverse ◆ 166 ◆ il# = : AbB-primary ◆ P509376 ◆ obverse ◆ 206 ◆ am# = : AbB-primary ◆ P509377 ◆ obverse ◆ 257 ◆ ia# = : AbB-primary ◆ P509377 ◆ obverse ◆ 258 ◆ ma# = : AbB-primary ◆ P509377 ◆ obverse ◆ 260 ◆ ak# = : AbB-primary ◆ P509377 ◆ obverse ◆ 260 ◆ kum# = : AbB-primary ◆ P509377 ◆ reverse ◆ 269 ◆ ak# = : AbB-primary ◆ P509377 ◆ reverse ◆ 271 ◆ ta# = : AbB-primary ◆ P509377 ◆ reverse ◆ 272 ◆ na# = : AbB-primary ◆ P509377 ◆ reverse ◆ 279 ◆ mu# = : AbB-primary ◆ P481190 ◆ obverse ◆ 355 ◆ nu# = : AbB-primary ◆ P481190 ◆ obverse ◆ 355 ◆ ur2# = : AbB-primary ◆ P481190 ◆ obverse ◆ 357 ◆ din# = : AbB-primary ◆ P481190 ◆ obverse ◆ 357 ◆ nam# = : AbB-primary ◆ P481191 ◆ reverse ◆ 410 ◆ szu# = : AbB-primary ◆ P481192 ◆ obverse ◆ 450 ◆ ka# = and 10478 more Number of results: TF 10498; GREP 10498
True
We have arrived at the level of signs.
We will compare them, and all the structure we see in and around them, such as readings, graphemes, numerals, operators and flags.
First, though, we take a glance at what happens between the signs.
There might be material between a sign and the next one (if any).
The most usual separators are the -, which separates signs within a word, and the space, which separates words.
Here is the complete overview.
for (c, amount) in F.after.freqList():
    print(f"{c} {amount:>6} x")
- 118903 x
  100198 x
/     15 x
.      5 x
+      2 x
:      1 x
Now an overview of the types of signs.
signTypes = collections.Counter()
for s in F.otype.s("sign"):
    signTypes[F.type.v(s)] += 1

for (t, amount) in sorted(
    signTypes.items(),
    key=lambda x: (-x[1], x[0]),
):
    print(f"{t:<15} {amount:>6} x")
reading         188292 x
unknown           8761 x
numeral           2184 x
ellipsis          1617 x
grapheme          1272 x
commentline        969 x
complex            122 x
comment              2 x
We check these types individually, from the least frequent to the most frequent.
These are inline comments of the form ($ text $).
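As a quick illustration on a made-up line, here is what that ($ ... $) pattern matches:

demoCommentRe = re.compile(r"\(\$.*?\$\)")
print(demoCommentRe.findall("a-na ($ blank space $) be-li2-ia"))  # ['($ blank space $)']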
def tfSignsComment():
    signs = []
    for s in F.otype.s("sign"):
        typ = F.type.v(s)
        if typ != "comment":
            continue
        (document, face, line) = T.sectionFromNode(s)
        ln = L.u(s, otype="line")[0]
        d = T.documentNode(document)
        srcfile = F.srcfile.v(d)
        srcln = F.srcLnNum.v(ln)
        comment = F.comment.v(s)
        signs.append((srcfile, document, face, srcln, comment))
    return signs

def grepSignsComment(gen):
    signs = []
    for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
        match = transRe.match(srcLn)
        if not match:
            continue
        srcLn = match.group(2)
        cms = commentInlineRe.findall(srcLn)
        for c in cms:
            signs.append((srcfile, document, face, srcLnNum, c))
    return signs

COMP.checkSanity(
    ("sign",),
    grepSignsComment,
    tfSignsComment,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ sign
IDENTICAL: all 2 items
= : AbB-secondary ◆ P275088 ◆ reverse ◆ 23687 ◆ blank space
= : AbB-secondary ◆ P275104 ◆ reverse ◆ 24524 ◆ blank space
= no more items
Number of results: TF 2; GREP 2
True
We check whether all complex signs have come through exactly right.
These are the signs of the form x(ZZZ) and !(ZZZ).
The characters x and ! are called the operators in these complexes.
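For illustration, a hypothetical, simplified pattern (not the complexRe used below) can pick such a complex apart into reading, operator, and grapheme:

demoComplexRe = re.compile(r"([a-z][a-z0-9]*)([!x])\(([^)]+)\)")
m = demoComplexRe.match("ku!(LU)")
print(m.group(1), m.group(2), m.group(3))  # ku ! LU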
Here is the distribution of operators.
for (c, amount) in F.operator.freqList():
    print(f"{c} {amount:>6} x")
!    117 x
x      5 x
We do two checks: an easy check involving the atf feature of a sign, and a more involved check using the operator, reading, and grapheme features of a sign.
def tfComplexes():
    complexes = []
    for s in F.otype.s("sign"):
        if F.type.v(s) != "complex":
            continue
        (document, face, line) = T.sectionFromNode(s)
        ln = L.u(s, otype="line")[0]
        d = T.documentNode(document)
        srcfile = F.srcfile.v(d)
        srcln = F.srcLnNum.v(ln)
        atf = F.atf.v(s)
        complexes.append((srcfile, document, face, srcln, atf))
    return complexes

complexRe = re.compile("""[a-z][a-z,0-9']*[#!?*]*""" r"[!x]\([^)]+\)[#!?*]*")

def grepComplexes(gen):
    complexes = []
    for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
        match = transRe.match(srcLn)
        if not match:
            continue
        srcLn = match.group(2)
        srcLn = commentInlineRe.sub("", srcLn)
        cls = complexRe.findall(srcLn)
        for c in cls:
            complexes.append((srcfile, document, face, srcLnNum, c))
    return complexes

COMP.checkSanity(
    ("complex",),
    grepComplexes,
    tfComplexes,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ complex
IDENTICAL: all 122 items
= : AbB-primary ◆ P510533 ◆ obverse ◆ 7999 ◆ ku!(LU)
= : AbB-primary ◆ P510560 ◆ reverse ◆ 9461 ◆ im!(NIM)
= : AbB-primary ◆ P510560 ◆ reverse ◆ 9462 ◆ tam!(TUM)
= : AbB-primary ◆ P510562 ◆ obverse ◆ 9543 ◆ tum!(TIM)
= : AbB-primary ◆ P510562 ◆ reverse ◆ 9559 ◆ szi!(SZU)
= : AbB-primary ◆ P510564 ◆ reverse ◆ 9677 ◆ bu!(BI)
= : AbB-primary ◆ P510566 ◆ obverse ◆ 9762 ◆ tim!(IM)
= : AbB-primary ◆ P510566 ◆ reverse ◆ 9777 ◆ lam!(IB)
= : AbB-primary ◆ P510569 ◆ obverse ◆ 9920 ◆ ka!(KI)
= : AbB-primary ◆ P510572 ◆ obverse ◆ 10092 ◆ ba!(SZA)
= : AbB-primary ◆ P510578 ◆ obverse ◆ 10463 ◆ tum!(TAM)
= : AbB-primary ◆ P510583 ◆ obverse ◆ 10750 ◆ tim!(TUM)
= : AbB-primary ◆ P510588 ◆ obverse ◆ 11075 ◆ na!(HU)
= : AbB-primary ◆ P510616 ◆ obverse ◆ 12724 ◆ nam!(LAM)
= : AbB-primary ◆ P510616 ◆ obverse ◆ 12725 ◆ nam!(LAM)
= : AbB-primary ◆ P510616 ◆ reverse ◆ 12743 ◆ u2!(NA)
= : AbB-primary ◆ P510623 ◆ reverse ◆ 13178 ◆ ze2!(SZE)
= : AbB-primary ◆ P510626 ◆ obverse ◆ 13333 ◆ mi!(UL)
= : AbB-primary ◆ P510635 ◆ reverse ◆ 13797 ◆ ir!(AR)
= : AbB-primary ◆ P510635 ◆ reverse ◆ 13798 ◆ zimbir!(|UD.KIB.NU|)
= and 102 more
Number of results: TF 122; GREP 122
True
def tfComplexes2():
    complexes = []
    for s in F.otype.s("sign"):
        if F.type.v(s) != "complex":
            continue
        (document, face, line) = T.sectionFromNode(s)
        ln = L.u(s, otype="line")[0]
        d = T.documentNode(document)
        srcfile = F.srcfile.v(d)
        srcln = F.srcLnNum.v(ln)
        fl = F.flags.v(s) or ""
        op = F.operator.v(s)
        atf = (
            f"{F.reading.v(s)}{fl}{op}({F.grapheme.v(s)})"
            if op == "!"
            else f"{F.reading.v(s)}{op}({F.grapheme.v(s)}){fl}"
        )
        complexes.append((srcfile, document, face, srcln, atf))
    return complexes

COMP.checkSanity(
    ("numeral",),
    grepComplexes,
    tfComplexes2,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ numeral
IDENTICAL: all 122 items
= : AbB-primary ◆ P510533 ◆ obverse ◆ 7999 ◆ ku!(LU)
= : AbB-primary ◆ P510560 ◆ reverse ◆ 9461 ◆ im!(NIM)
= : AbB-primary ◆ P510560 ◆ reverse ◆ 9462 ◆ tam!(TUM)
= : AbB-primary ◆ P510562 ◆ obverse ◆ 9543 ◆ tum!(TIM)
= : AbB-primary ◆ P510562 ◆ reverse ◆ 9559 ◆ szi!(SZU)
= : AbB-primary ◆ P510564 ◆ reverse ◆ 9677 ◆ bu!(BI)
= : AbB-primary ◆ P510566 ◆ obverse ◆ 9762 ◆ tim!(IM)
= : AbB-primary ◆ P510566 ◆ reverse ◆ 9777 ◆ lam!(IB)
= : AbB-primary ◆ P510569 ◆ obverse ◆ 9920 ◆ ka!(KI)
= : AbB-primary ◆ P510572 ◆ obverse ◆ 10092 ◆ ba!(SZA)
= : AbB-primary ◆ P510578 ◆ obverse ◆ 10463 ◆ tum!(TAM)
= : AbB-primary ◆ P510583 ◆ obverse ◆ 10750 ◆ tim!(TUM)
= : AbB-primary ◆ P510588 ◆ obverse ◆ 11075 ◆ na!(HU)
= : AbB-primary ◆ P510616 ◆ obverse ◆ 12724 ◆ nam!(LAM)
= : AbB-primary ◆ P510616 ◆ obverse ◆ 12725 ◆ nam!(LAM)
= : AbB-primary ◆ P510616 ◆ reverse ◆ 12743 ◆ u2!(NA)
= : AbB-primary ◆ P510623 ◆ reverse ◆ 13178 ◆ ze2!(SZE)
= : AbB-primary ◆ P510626 ◆ obverse ◆ 13333 ◆ mi!(UL)
= : AbB-primary ◆ P510635 ◆ reverse ◆ 13797 ◆ ir!(AR)
= : AbB-primary ◆ P510635 ◆ reverse ◆ 13798 ◆ zimbir!(|UD.KIB.NU|)
= and 102 more
Number of results: TF 122; GREP 122
True
Comment line signs are artificial signs introduced on comment lines.
Comment lines have no transcribed material, but they annotate the structure ($) or the line contents (#) of other lines.
In order to anchor these comments to the text sequence, we have made extra signs for these lines.
For each comment line there is one such sign, and it has type commentline.
The comments of these lines are stored in the comment feature on those signs.
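A minimal sketch of how such lines can be recognized (a hypothetical stand-in for the commentLineRe that the grep check below relies on):

demoCommentLineRe = re.compile(r"^([$#])\s*(.+)$")
m = demoCommentLineRe.match("$ rest broken")
print(m.group(1), m.group(2))  # $ rest broken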
def tfSignsEmpty():
    comments = []
    for s in F.otype.s("sign"):
        typ = F.type.v(s)
        if typ != "commentline":
            continue
        (document, face, line) = T.sectionFromNode(s)
        ln = L.u(s, otype="line")[0]
        d = T.documentNode(document)
        srcfile = F.srcfile.v(d)
        srcln = F.srcLnNum.v(ln)
        # take the first sign of the line and verify that it is the comment line sign
        s = L.d(ln, otype="sign")
        if not s or F.type.v(s[0]) != "commentline":
            continue
        comment = F.comment.v(s[0])
        ln = F.lnc.v(ln)
        comments.append((srcfile, document, face, srcln, ln, comment))
    return comments

def grepSignsEmpty(gen):
    comments = []
    for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
        match = commentLineRe.match(srcLn)
        if not match:
            continue
        cms = match.group(1)
        # the first character of the line is the kind of the comment: $ or #
        ln = srcLn[0]
        comments.append((srcfile, document, face, srcLnNum, ln, cms))
    return comments

COMP.checkSanity(
    ("kind", "comment"),
    grepSignsEmpty,
    tfSignsEmpty,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ kind ◆ comment
IDENTICAL: all 969 items
= : AbB-primary ◆ P509373 ◆ obverse ◆ 46 ◆ $ ◆ rest broken
= : AbB-primary ◆ P509373 ◆ reverse ◆ 48 ◆ $ ◆ beginning broken
= : AbB-primary ◆ P509375 ◆ obverse ◆ 154 ◆ $ ◆ rest missing
= : AbB-primary ◆ P507628 ◆ obverse ◆ 319 ◆ $ ◆ blank space
= : AbB-primary ◆ P507628 ◆ reverse ◆ 321 ◆ $ ◆ blank space
= : AbB-primary ◆ P481192 ◆ obverse ◆ 447 ◆ $ ◆ beginning broken
= : AbB-primary ◆ P481192 ◆ obverse ◆ 462 ◆ $ ◆ rest broken
= : AbB-primary ◆ P481192 ◆ reverse ◆ 464 ◆ $ ◆ beginning broken
= : AbB-primary ◆ P481192 ◆ reverse ◆ 480 ◆ $ ◆ rest broken
= : AbB-primary ◆ P389958 ◆ obverse ◆ 512 ◆ $ ◆ beginning broken
= : AbB-primary ◆ P389256 ◆ obverse ◆ 556 ◆ $ ◆ beginning broken
= : AbB-primary ◆ P389256 ◆ obverse ◆ 562 ◆ $ ◆ rest broken
= : AbB-primary ◆ P389256 ◆ reverse ◆ 564 ◆ $ ◆ beginning broken
= : AbB-primary ◆ P389256 ◆ reverse ◆ 568 ◆ $ ◆ rest broken
= : AbB-primary ◆ P510534 ◆ obverse ◆ 8049 ◆ $ ◆ rest broken
= : AbB-primary ◆ P510534 ◆ reverse ◆ 8051 ◆ $ ◆ beginning broken
= : AbB-primary ◆ P510536 ◆ reverse ◆ 8180 ◆ $ ◆ single ruling
= : AbB-primary ◆ P510537 ◆ reverse ◆ 8222 ◆ $ ◆ blank space
= : AbB-primary ◆ P510541 ◆ reverse ◆ 8441 ◆ $ ◆ rest broken
= : AbB-primary ◆ P510543 ◆ reverse ◆ 8554 ◆ $ ◆ single ruling
= and 949 more
Number of results: TF 969; GREP 969
True
These are signs that do not contain a reading (lower case name of a transcribed unit) but a grapheme (upper case name of a transcribed unit).
Complex signs that have a grapheme in their x(GGG) or !(GGG) parts are not included.
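A small self-contained illustration on a made-up line, using copies of the grapheme and exclusion patterns defined below:

demoGraphemeRe = re.compile(r"[A-WYZ][A-WYZ,0-9]*")
demoExcludeRe = re.compile(r"[x!]\([^)]+\)")
line = "ARAD ku!(LU) GAN2"
print(demoGraphemeRe.findall(demoExcludeRe.sub("", line)))  # ['ARAD', 'GAN2']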
def tfSignsGrapheme():
    signs = []
    for s in F.otype.s("sign"):
        typ = F.type.v(s)
        if typ != "grapheme":
            continue
        (document, face, line) = T.sectionFromNode(s)
        ln = L.u(s, otype="line")[0]
        d = T.documentNode(document)
        srcfile = F.srcfile.v(d)
        srcln = F.srcLnNum.v(ln)
        d = F.grapheme.v(s)
        signs.append((srcfile, document, face, srcln, d))
    return signs

graphemeRe = re.compile(r"""[A-WYZ][A-WYZ,0-9]*""")
excludeRe = re.compile(r"""[x!]\([^)]+\)""")

def grepSignsGrapheme(gen):
    signs = []
    for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
        match = transRe.match(srcLn)
        if not match:
            continue
        srcLn = match.group(2)
        srcLn = commentInlineRe.sub("", srcLn)
        srcLn = excludeRe.sub("", srcLn)
        data = graphemeRe.findall(srcLn)
        for d in data:
            signs.append((srcfile, document, face, srcLnNum, d))
    return signs

COMP.checkSanity(
    ("sign",),
    grepSignsGrapheme,
    tfSignsGrapheme,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ sign
IDENTICAL: all 1272 items
= : AbB-primary ◆ P481191 ◆ seal 1 ◆ 415 ◆ ARAD
= : AbB-primary ◆ P481192 ◆ obverse ◆ 455 ◆ AD
= : AbB-primary ◆ P481192 ◆ obverse ◆ 455 ◆ DA
= : AbB-primary ◆ P389958 ◆ obverse ◆ 518 ◆ DA
= : AbB-primary ◆ P510527 ◆ reverse ◆ 7673 ◆ SZESZ
= : AbB-primary ◆ P510530 ◆ reverse ◆ 7833 ◆ ARAD
= : AbB-primary ◆ P510530 ◆ reverse ◆ 7835 ◆ TE
= : AbB-primary ◆ P510530 ◆ reverse ◆ 7839 ◆ GAN2
= : AbB-primary ◆ P510530 ◆ reverse ◆ 7844 ◆ ARAD
= : AbB-primary ◆ P510530 ◆ reverse ◆ 7847 ◆ ARAD
= : AbB-primary ◆ P510530 ◆ reverse ◆ 7848 ◆ ARAD
= : AbB-primary ◆ P510534 ◆ reverse ◆ 8054 ◆ ARAD
= : AbB-primary ◆ P510536 ◆ obverse ◆ 8163 ◆ ARAD
= : AbB-primary ◆ P510536 ◆ obverse ◆ 8168 ◆ ARAD
= : AbB-primary ◆ P510537 ◆ obverse ◆ 8216 ◆ SU
= : AbB-primary ◆ P510537 ◆ obverse ◆ 8220 ◆ SU
= : AbB-primary ◆ P510541 ◆ obverse ◆ 8407 ◆ GAN2
= : AbB-primary ◆ P510541 ◆ obverse ◆ 8412 ◆ GAN2
= : AbB-primary ◆ P510541 ◆ obverse ◆ 8416 ◆ GAN2
= : AbB-primary ◆ P510541 ◆ reverse ◆ 8425 ◆ GAN2
= and 1252 more
Number of results: TF 1272; GREP 1272
True
These are signs that are represented as ... (three dots).
def tfSignsEllipsis():
    signs = []
    for s in F.otype.s("sign"):
        typ = F.type.v(s)
        if typ != "ellipsis":
            continue
        (document, face, line) = T.sectionFromNode(s)
        ln = L.u(s, otype="line")[0]
        d = T.documentNode(document)
        srcfile = F.srcfile.v(d)
        srcln = F.srcLnNum.v(ln)
        d = F.grapheme.v(s)
        signs.append((srcfile, document, face, srcln, d))
    return signs

ellipsisRe = re.compile(r"""\.\.\.""")

def grepSignsEllipsis(gen):
    signs = []
    for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
        match = transRe.match(srcLn)
        if not match:
            continue
        srcLn = match.group(2)
        srcLn = commentInlineRe.sub("", srcLn)
        data = ellipsisRe.findall(srcLn)
        for d in data:
            signs.append((srcfile, document, face, srcLnNum, d))
    return signs

COMP.checkSanity(
    ("sign",),
    grepSignsEllipsis,
    tfSignsEllipsis,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ sign
IDENTICAL: all 1617 items
= : AbB-primary ◆ P509373 ◆ obverse ◆ 44 ◆ ...
= : AbB-primary ◆ P509373 ◆ obverse ◆ 45 ◆ ...
= : AbB-primary ◆ P509373 ◆ reverse ◆ 49 ◆ ...
= : AbB-primary ◆ P509373 ◆ reverse ◆ 50 ◆ ...
= : AbB-primary ◆ P509373 ◆ reverse ◆ 63 ◆ ...
= : AbB-primary ◆ P509373 ◆ reverse ◆ 64 ◆ ...
= : AbB-primary ◆ P509373 ◆ reverse ◆ 65 ◆ ...
= : AbB-primary ◆ P509373 ◆ reverse ◆ 66 ◆ ...
= : AbB-primary ◆ P509373 ◆ reverse ◆ 67 ◆ ...
= : AbB-primary ◆ P509374 ◆ obverse ◆ 102 ◆ ...
= : AbB-primary ◆ P509374 ◆ obverse ◆ 105 ◆ ...
= : AbB-primary ◆ P509377 ◆ obverse ◆ 254 ◆ ...
= : AbB-primary ◆ P509377 ◆ obverse ◆ 255 ◆ ...
= : AbB-primary ◆ P509377 ◆ reverse ◆ 268 ◆ ...
= : AbB-primary ◆ P509377 ◆ reverse ◆ 270 ◆ ...
= : AbB-primary ◆ P509377 ◆ reverse ◆ 271 ◆ ...
= : AbB-primary ◆ P509377 ◆ reverse ◆ 272 ◆ ...
= : AbB-primary ◆ P509377 ◆ reverse ◆ 278 ◆ ...
= : AbB-primary ◆ P509377 ◆ reverse ◆ 280 ◆ ...
= : AbB-primary ◆ P481191 ◆ seal 1 ◆ 413 ◆ ...
= and 1597 more
Number of results: TF 1617; GREP 1617
True
We check whether all numerals have come through exactly right.
We do two checks: an easy check involving the atf feature of a sign, and a more involved check using the repeat, fraction, and reading features of a sign.
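In outline, the reconstruction in the second check works like this (a minimal sketch with made-up feature values; the value -1 encodes the unknown repeat n):

def numeralAtf(repeat, fraction, reading):
    # rebuild the ATF of a numeral from the repeat, fraction, and reading features
    if repeat == -1:
        repeat = "n"
    return f"{repeat or fraction}({reading})"

print(numeralAtf(2, None, "esze3"))     # 2(esze3)
print(numeralAtf(None, "5/6", "disz"))  # 5/6(disz)
print(numeralAtf(-1, None, "disz"))     # n(disz)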
def tfNumerals():
    numerals = []
    for s in F.otype.s("sign"):
        if F.type.v(s) != "numeral":
            continue
        (document, face, line) = T.sectionFromNode(s)
        ln = L.u(s, otype="line")[0]
        d = T.documentNode(document)
        srcfile = F.srcfile.v(d)
        srcln = F.srcLnNum.v(ln)
        atf = F.atf.v(s).rstrip(flaggingStr)
        numerals.append((srcfile, document, face, srcln, atf))
    return numerals

numeralRe = re.compile("((?:n|(?:[0-9/]+))" r"\([^)]+\))")

def grepNumerals(gen):
    numerals = []
    for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
        match = transRe.match(srcLn)
        if not match:
            continue
        srcLn = match.group(2)
        srcLn = commentInlineRe.sub("", srcLn)
        nls = numeralRe.findall(srcLn)
        for n in nls:
            numerals.append((srcfile, document, face, srcLnNum, n))
    return numerals

COMP.checkSanity(
    ("numeral",),
    grepNumerals,
    tfNumerals,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ numeral
IDENTICAL: all 2184 items
= : AbB-primary ◆ P509373 ◆ obverse ◆ 39 ◆ 2(esze3)
= : AbB-primary ◆ P509373 ◆ obverse ◆ 41 ◆ 7(disz)
= : AbB-primary ◆ P509373 ◆ obverse ◆ 43 ◆ 2(esze3)
= : AbB-primary ◆ P509374 ◆ obverse ◆ 108 ◆ 1(disz)
= : AbB-primary ◆ P509374 ◆ obverse ◆ 111 ◆ 1(disz)
= : AbB-primary ◆ P509374 ◆ obverse ◆ 113 ◆ 2(disz)
= : AbB-primary ◆ P509374 ◆ reverse ◆ 117 ◆ 2(disz)
= : AbB-primary ◆ P509376 ◆ obverse ◆ 203 ◆ 4(disz)
= : AbB-primary ◆ P509377 ◆ obverse ◆ 259 ◆ 3(u)
= : AbB-primary ◆ P509377 ◆ reverse ◆ 271 ◆ 1(disz)
= : AbB-primary ◆ P509377 ◆ reverse ◆ 271 ◆ 3(disz)
= : AbB-primary ◆ P509377 ◆ reverse ◆ 276 ◆ 3(u)
= : AbB-primary ◆ P509377 ◆ reverse ◆ 277 ◆ 6(disz)
= : AbB-primary ◆ P481191 ◆ obverse ◆ 396 ◆ 2(u)
= : AbB-primary ◆ P481191 ◆ reverse ◆ 406 ◆ 2(u)
= : AbB-primary ◆ P481192 ◆ reverse ◆ 470 ◆ 1(asz)
= : AbB-primary ◆ P481192 ◆ reverse ◆ 472 ◆ 1(asz)
= : AbB-primary ◆ P389958 ◆ obverse ◆ 519 ◆ 5(disz)
= : AbB-primary ◆ P510527 ◆ reverse ◆ 7677 ◆ 1(disz)
= : AbB-primary ◆ P510527 ◆ reverse ◆ 7677 ◆ 5/6(disz)
= and 2164 more
Number of results: TF 2184; GREP 2184
True
def tfNumerals2():
    numerals = []
    for s in F.otype.s("sign"):
        if F.type.v(s) != "numeral":
            continue
        (document, face, line) = T.sectionFromNode(s)
        ln = L.u(s, otype="line")[0]
        d = T.documentNode(document)
        srcfile = F.srcfile.v(d)
        srcln = F.srcLnNum.v(ln)
        repeat = F.repeat.v(s)
        if repeat == -1:
            repeat = "n"
        atf = f"{repeat or F.fraction.v(s)}({F.reading.v(s)})"
        numerals.append((srcfile, document, face, srcln, atf))
    return numerals

COMP.checkSanity(
    ("numeral",),
    grepNumerals,
    tfNumerals2,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ numeral
IDENTICAL: all 2184 items
= : AbB-primary ◆ P509373 ◆ obverse ◆ 39 ◆ 2(esze3)
= : AbB-primary ◆ P509373 ◆ obverse ◆ 41 ◆ 7(disz)
= : AbB-primary ◆ P509373 ◆ obverse ◆ 43 ◆ 2(esze3)
= : AbB-primary ◆ P509374 ◆ obverse ◆ 108 ◆ 1(disz)
= : AbB-primary ◆ P509374 ◆ obverse ◆ 111 ◆ 1(disz)
= : AbB-primary ◆ P509374 ◆ obverse ◆ 113 ◆ 2(disz)
= : AbB-primary ◆ P509374 ◆ reverse ◆ 117 ◆ 2(disz)
= : AbB-primary ◆ P509376 ◆ obverse ◆ 203 ◆ 4(disz)
= : AbB-primary ◆ P509377 ◆ obverse ◆ 259 ◆ 3(u)
= : AbB-primary ◆ P509377 ◆ reverse ◆ 271 ◆ 1(disz)
= : AbB-primary ◆ P509377 ◆ reverse ◆ 271 ◆ 3(disz)
= : AbB-primary ◆ P509377 ◆ reverse ◆ 276 ◆ 3(u)
= : AbB-primary ◆ P509377 ◆ reverse ◆ 277 ◆ 6(disz)
= : AbB-primary ◆ P481191 ◆ obverse ◆ 396 ◆ 2(u)
= : AbB-primary ◆ P481191 ◆ reverse ◆ 406 ◆ 2(u)
= : AbB-primary ◆ P481192 ◆ reverse ◆ 470 ◆ 1(asz)
= : AbB-primary ◆ P481192 ◆ reverse ◆ 472 ◆ 1(asz)
= : AbB-primary ◆ P389958 ◆ obverse ◆ 519 ◆ 5(disz)
= : AbB-primary ◆ P510527 ◆ reverse ◆ 7677 ◆ 1(disz)
= : AbB-primary ◆ P510527 ◆ reverse ◆ 7677 ◆ 5/6(disz)
= and 2164 more
Number of results: TF 2184; GREP 2184
True
These are not unknown signs, but signs that represent unknown readings/graphemes.
They are represented as x or X, n or N.
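A self-contained illustration on a made-up line, using a copy of the unknownRe defined below; because the pattern has two groups, findall yields pairs:

demoUnknownRe = re.compile(r"([xX])|(?:(?:_|\b)([nN])(?:_|\b)(?!\())")
print(demoUnknownRe.findall("szu x nu-um n n(disz)"))  # [('x', ''), ('', 'n')]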
def tfSignsUnknown():
    signs = []
    for s in F.otype.s("sign"):
        typ = F.type.v(s)
        if typ != "unknown":
            continue
        (document, face, line) = T.sectionFromNode(s)
        ln = L.u(s, otype="line")[0]
        d = T.documentNode(document)
        srcfile = F.srcfile.v(d)
        srcln = F.srcLnNum.v(ln)
        d = F.reading.v(s) or F.grapheme.v(s)
        signs.append((srcfile, document, face, srcln, d))
    return signs

unknownRe = re.compile(r"""([xX])|(?:(?:_|\b)([nN])(?:_|\b)(?!\())""")

def grepSignsUnknown(gen):
    signs = []
    for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
        match = transRe.match(srcLn)
        if not match:
            continue
        srcLn = match.group(2)
        srcLn = commentInlineRe.sub("", srcLn)
        srcLn = excludeRe.sub("", srcLn)
        data = unknownRe.findall(srcLn)
        for result in data:
            d = result[0] or result[1]
            signs.append((srcfile, document, face, srcLnNum, d))
    return signs

COMP.checkSanity(
    ("sign",),
    grepSignsUnknown,
    tfSignsUnknown,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ sign
IDENTICAL: all 8761 items
= : AbB-primary ◆ P509373 ◆ obverse ◆ 40 ◆ x
= : AbB-primary ◆ P509373 ◆ obverse ◆ 40 ◆ x
= : AbB-primary ◆ P509373 ◆ obverse ◆ 40 ◆ x
= : AbB-primary ◆ P509373 ◆ obverse ◆ 42 ◆ x
= : AbB-primary ◆ P509373 ◆ obverse ◆ 44 ◆ x
= : AbB-primary ◆ P509373 ◆ obverse ◆ 45 ◆ x
= : AbB-primary ◆ P509373 ◆ obverse ◆ 45 ◆ x
= : AbB-primary ◆ P509373 ◆ obverse ◆ 45 ◆ x
= : AbB-primary ◆ P509373 ◆ obverse ◆ 45 ◆ x
= : AbB-primary ◆ P509373 ◆ obverse ◆ 45 ◆ x
= : AbB-primary ◆ P509373 ◆ obverse ◆ 45 ◆ x
= : AbB-primary ◆ P509373 ◆ reverse ◆ 49 ◆ x
= : AbB-primary ◆ P509373 ◆ reverse ◆ 49 ◆ x
= : AbB-primary ◆ P509373 ◆ reverse ◆ 49 ◆ x
= : AbB-primary ◆ P509373 ◆ reverse ◆ 49 ◆ x
= : AbB-primary ◆ P509373 ◆ reverse ◆ 50 ◆ x
= : AbB-primary ◆ P509373 ◆ reverse ◆ 50 ◆ x
= : AbB-primary ◆ P509373 ◆ reverse ◆ 50 ◆ x
= : AbB-primary ◆ P509373 ◆ reverse ◆ 51 ◆ x
= : AbB-primary ◆ P509373 ◆ reverse ◆ 51 ◆ x
= and 8741 more
Number of results: TF 8761; GREP 8761
True
These are signs that contain a reading (lower case name of a transcribed unit).
We also include the readings of complex signs that have a grapheme in their representations: rrrx(GGG) or rrr!(GGG).
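A small illustration on a made-up line, using a copy of the readingRe defined below; note that x is deliberately absent from the character classes, and that the u of a numeral counts as a reading:

demoReadingRe = re.compile(r"[a-wyz'][a-wyz,0-9']*")
print(demoReadingRe.findall("a-wi-lum x 3(u)"))  # ['a', 'wi', 'lum', 'u']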
def tfSignsReading():
    signs = []
    for s in F.otype.s("sign"):
        typ = F.type.v(s)
        if typ not in {"reading", "complex", "numeral"}:
            continue
        (document, face, line) = T.sectionFromNode(s)
        ln = L.u(s, otype="line")[0]
        d = T.documentNode(document)
        srcfile = F.srcfile.v(d)
        srcln = F.srcLnNum.v(ln)
        d = F.reading.v(s)
        signs.append((srcfile, document, face, srcln, d))
    return signs

readingRe = re.compile(r"""[a-wyz'][a-wyz,0-9']*""")
nExcludeRe = re.compile(r"""(?:_|\b)n(?:_|\b)""")

def grepSignsReading(gen):
    signs = []
    for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
        match = transRe.match(srcLn)
        if not match:
            continue
        srcLn = match.group(2)
        srcLn = commentInlineRe.sub("", srcLn)
        srcLn = nExcludeRe.sub("", srcLn)
        data = readingRe.findall(srcLn)
        for d in data:
            signs.append((srcfile, document, face, srcLnNum, d))
    return signs

COMP.checkSanity(
    ("sign",),
    grepSignsReading,
    tfSignsReading,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ sign
IDENTICAL: all 190598 items
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ a
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ na
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ d
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ suen
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ i
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ din
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ nam
= : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ qi2
= : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ bi2
= : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ ma
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ um
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ ma
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ d
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ en
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ lil2
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ sza
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ du
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ u2
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ ni
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ ma
= and 190578 more
Number of results: TF 190598; GREP 190598
True
Just for redundancy, we compare all simple, non-empty signs in the transcriptions and in TF.
So: no numerals, no rrrx(GGG), no rrr!(GGG).
We do it once based on the atf feature and once based on the other features.
def tfSigns():
    signs = []
    for s in F.otype.s("sign"):
        typ = F.type.v(s)
        if typ in {"complex", "numeral", "commentline"}:
            continue
        (document, face, line) = T.sectionFromNode(s)
        ln = L.u(s, otype="line")[0]
        d = T.documentNode(document)
        srcfile = F.srcfile.v(d)
        srcln = F.srcLnNum.v(ln)
        atf = F.atf.v(s).rstrip(flaggingStr)
        signs.append((srcfile, document, face, srcln, atf))
    return signs

signRe = re.compile(
    r"""x|(?:\.\.\.)|(?:[a-wyzA-WYZ'][a-wyzA-WYZ,0-9']*)|(?:\(\$.*?\$\))"""
)

def grepSigns(gen):
    signs = []
    for (srcfile, document, face, column, srcLnNum, srcLn) in gen:
        match = transRe.match(srcLn)
        if not match:
            continue
        srcLn = match.group(2)
        srcLn = numeralRe.sub("", srcLn)
        srcLn = complexRe.sub("", srcLn)
        sns = signRe.findall(srcLn)
        for s in sns:
            signs.append((srcfile, document, face, srcLnNum, s))
    return signs

COMP.checkSanity(
    ("sign",),
    grepSigns,
    tfSigns,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ sign
IDENTICAL: all 199944 items
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ a
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ na
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ d
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ suen
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ i
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ din
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ nam
= : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ qi2
= : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ bi2
= : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ ma
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ um
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ ma
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ d
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ en
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ lil2
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ sza
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ du
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ u2
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ ni
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ ma
= and 199924 more
Number of results: TF 199944; GREP 199944
True
def tfSigns2():
    signs = []
    for s in F.otype.s("sign"):
        typ = F.type.v(s)
        if typ in {"complex", "numeral", "commentline"}:
            continue
        (document, face, line) = T.sectionFromNode(s)
        ln = L.u(s, otype="line")[0]
        d = T.documentNode(document)
        srcfile = F.srcfile.v(d)
        srcln = F.srcLnNum.v(ln)
        atf = (
            F.reading.v(s)
            if typ == "reading"
            else f"($ {F.comment.v(s)} $)"
            if typ == "comment"
            else F.grapheme.v(s)
            if typ == "grapheme" or typ == "ellipsis"
            else F.reading.v(s) or F.grapheme.v(s)
            if typ == "unknown"
            else "§§§"
        )
        signs.append((srcfile, document, face, srcln, atf))
    return signs

COMP.checkSanity(
    ("sign",),
    grepSigns,
    tfSigns2,
)
HEAD : srcfile ◆ tablet ◆ ln ◆ sign
IDENTICAL: all 199944 items
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ a
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ na
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ d
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ suen
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ i
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ din
= : AbB-primary ◆ P509373 ◆ obverse ◆ 31 ◆ nam
= : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ qi2
= : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ bi2
= : AbB-primary ◆ P509373 ◆ obverse ◆ 32 ◆ ma
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ um
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ ma
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ d
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ en
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ lil2
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ sza
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ du
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ u2
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ ni
= : AbB-primary ◆ P509373 ◆ obverse ◆ 33 ◆ ma
= and 199924 more
Number of results: TF 199944; GREP 199944
True
Here ends the checking.
This notebook has tested all patterns and quantities found in the transcriptions.
By a somewhat convoluted GREP we have extracted patterns from the sources.
By somewhat contrived TF alchemy we have produced the same patterns from the Text-Fabric representation of the sources.
Then we have made a rigorous comparison: we have checked whether both methods found exactly the same sequence of values.
And that turned out to be so!