To get started: consult the start tutorial.
We look at the many similarities between lines in the corpus.
There are ca. 25,000 lines in the corpus; comparing them all pairwise requires roughly 300 million comparisons. That is a costly operation: on this laptop it took six whole minutes.
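As a quick sanity check on that number (a self-contained arithmetic sketch, not part of the corpus code): the number of unordered pairs among n lines is n(n-1)/2.

```python
# Number of unordered line pairs to compare among n lines
n = 25000  # approximate number of lines in the corpus
pairs = n * (n - 1) // 2
print(f"{pairs:,}")  # 312,487,500, i.e. roughly 300 million
```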
The good news is that we have stored the outcome in an extra feature.
This feature is packaged in a TF data module, which we will load below by passing the mod parameter to the use() call.
%load_ext autoreload
%autoreload 2
import collections
from tf.app import use
A = use(
    "Nino-cunei/oldbabylonian",
    mod="Nino-cunei/oldbabylonian/parallels/tf:clone",
    hoist=globals(),
)
This is Text-Fabric 9.2.2 Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html 68 features found and 0 ignored
The new feature is sim and it is an edge feature. It annotates pairs of lines $(l, m)$ where $l$ and $m$ have similar content. The degree of similarity is a percentage (between 90 and 100), and this value is annotated onto the edges.
Here is an example:
exampleLine = F.otype.s("line")[0]
sisters = E.sim.b(exampleLine)
print(f"{len(sisters)} similar lines")
print("\n".join(f"{s[0]} with similarity {s[1]}" for s in sisters[0:10]))
A.table(tuple((s[0],) for s in sisters), end=10)
75 similar lines
235394 with similarity 100
235421 with similarity 100
235434 with similarity 100
235464 with similarity 100
235478 with similarity 100
235503 with similarity 100
235529 with similarity 100
235585 with similarity 100
235615 with similarity 100
235629 with similarity 100
n | p | line |
---|---|---|
1 | P510729 obverse:1 | a-na {d}suen-i-din-[nam] |
2 | P510730 obverse:1 | a-na {d}suen-i-din-nam |
3 | P510731 obverse:1 | a-na {d}suen-i-din-nam |
4 | P510732 obverse:1 | a-na {d}suen#-i-din-nam# |
5 | P497779 obverse:1 | a-na {d}suen#-[i]-din-nam |
6 | P510733 obverse:1 | [a-na] {d}[suen-i-din-nam] |
7 | P510734 obverse:1 | [a-na {d}suen-i-din-nam] |
8 | P510736 obverse:1 | a-na {d}suen-i-din-nam |
9 | P510737 obverse:1 | a-na {d}suen-i-din-nam# |
10 | P370926 obverse:1 | a-na {d}suen-i-din-nam |
Let's first find out the range of similarities:
minSim = None
maxSim = None

for ln in F.otype.s("line"):
    sisters = E.sim.f(ln)
    if not sisters:
        continue
    thisMin = min(s[1] for s in sisters)
    thisMax = max(s[1] for s in sisters)
    if minSim is None or thisMin < minSim:
        minSim = thisMin
    if maxSim is None or thisMax > maxSim:
        maxSim = thisMax

print(f"minimum similarity is {minSim:>3}")
print(f"maximum similarity is {maxSim:>3}")
minimum similarity is  90
maximum similarity is 100
We give a few examples of the least similar lines.
N.B.: when lines are less than 90% similar, they have not made it into the sim feature at all!
We can use a search template to find the lines with similarity exactly 90.
query = """
line
-sim=90> line
"""
In words: find a line connected via a sim edge with value 90 to another line.
results = A.search(query)
0.19s 722 results
Not very many indeed. It seems that lines are either very similar, or not similar at all.
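To see the full picture, one could count how many sim edges carry each similarity value. Here is a self-contained sketch of that counting with made-up edge data (the real computation would iterate E.sim.f over all lines; the edges dict below is purely hypothetical):

```python
import collections

# hypothetical (target, similarity) edges for a few lines
edges = {
    "l1": [("l2", 100), ("l3", 100), ("l4", 90)],
    "l2": [("l3", 100)],
    "l5": [("l6", 95)],
}

# count how many edges carry each similarity value
simDist = collections.Counter(
    v for targets in edges.values() for (_, v) in targets
)
print(sorted(simDist.items()))  # [(90, 1), (95, 1), (100, 3)]
```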
A.table(results, start=1, end=10)
n | p | line | line |
---|---|---|---|
1 | P509373 obverse:10 | _a-sza3 a-gar3_ na-ag-[ma-lum] _uru_ x x x{ki} | _a-[sza3 a-gar3_ na-ag]-ma-lum _uru gan2_ x x{ki} |
2 | P509374 obverse:4 | _{d}utu_ u3 _{d}marduk_ da-ri-[isz] _u4_-[mi x] | {d}utu# u3 {d}marduk# [da-ri-isz _u4_-mi-im] |
3 | P509374 obverse:4 | _{d}utu_ u3 _{d}marduk_ da-ri-[isz] _u4_-[mi x] | _{d}utu_ u3 _{d}marduk_ da-ri-isz u4-mi-im |
4 | P509374 obverse:4 | _{d}utu_ u3 _{d}marduk_ da-ri-[isz] _u4_-[mi x] | {d}utu u3 {d}[marduk da-ri-isz _u4_]-mi#-im |
5 | P509376 obverse:11 | it-ti-szu a-na _a-sza3_ ri-id-ma | [it-ti]-szu#-nu a-na _a-sza3_ ri-id-ma |
6 | P510527 obverse:4 | {d}utu u3 {d}marduk li-ba-al-li-t,u2-ka | {d}utu u3 {d}marduk li-ba-al-li-t,u2-ka!(KI) |
7 | P510527 obverse:4 | {d}utu u3 {d}marduk li-ba-al-li-t,u2-ka | {d}utu u3 {d}marduk tu-ba-al-li-t,u2-ka |
8 | P510529 obverse:4 | {d}utu u3 {d}marduk da-ri-isz _u4_-mi | {d}utu# u3 {d}marduk# [da-ri-isz _u4_-mi-im] |
9 | P510529 obverse:4 | {d}utu u3 {d}marduk da-ri-isz _u4_-mi | _{d}utu_ u3 _{d}marduk_ da-ri-isz u4-mi-im |
10 | P510529 obverse:4 | {d}utu u3 {d}marduk da-ri-isz _u4_-mi | {d}utu u3 {d}[marduk da-ri-isz _u4_]-mi#-im |
In case the ATF flags and clusters are a bit heavy on the eye, you can switch to a more pleasing rich text layout:
A.table(results, start=1, end=10, fmt="layout-orig-rich")
n | p | line | line |
---|---|---|---|
1 | P509373 obverse:10 | a-ša₃ a-gar₃ na-ag-ma-lum uru x x xki | a-ša₃ a-gar₃ na-ag-ma-lum uru gan₂ x xki |
2 | P509374 obverse:4 | dutu u₃ dmarduk da-ri-iš u₄-mi x | dutu u₃ dmarduk da-ri-iš u₄-mi-im |
3 | P509374 obverse:4 | dutu u₃ dmarduk da-ri-iš u₄-mi x | dutu u₃ dmarduk da-ri-iš u₄-mi-im |
4 | P509374 obverse:4 | dutu u₃ dmarduk da-ri-iš u₄-mi x | dutu u₃ dmarduk da-ri-iš u₄-mi-im |
5 | P509376 obverse:11 | it-ti-šu a-na a-ša₃ ri-id-ma | it-ti-šu-nu a-na a-ša₃ ri-id-ma |
6 | P510527 obverse:4 | dutu u₃ dmarduk li-ba-al-li-ṭu₂-ka | dutu u₃ dmarduk li-ba-al-li-ṭu₂-ka=⌈KI⌉ |
7 | P510527 obverse:4 | dutu u₃ dmarduk li-ba-al-li-ṭu₂-ka | dutu u₃ dmarduk tu-ba-al-li-ṭu₂-ka |
8 | P510529 obverse:4 | dutu u₃ dmarduk da-ri-iš u₄-mi | dutu u₃ dmarduk da-ri-iš u₄-mi-im |
9 | P510529 obverse:4 | dutu u₃ dmarduk da-ri-iš u₄-mi | dutu u₃ dmarduk da-ri-iš u₄-mi-im |
10 | P510529 obverse:4 | dutu u₃ dmarduk da-ri-iš u₄-mi | dutu u₃ dmarduk da-ri-iš u₄-mi-im |
Or even in cuneiform unicode:
A.table(results, start=1, end=10, fmt="layout-orig-unicode")
n | p | line | line |
---|---|---|---|
1 | P509373 obverse:10 | 𒀀𒊮 𒀀𒃼 𒈾𒀝𒈠𒈝 𒌷 x x x𒆠 | 𒀀𒊮 𒀀𒃼 𒈾𒀝𒈠𒈝 𒌷 𒃷 x x𒆠 |
2 | P509374 obverse:4 | 𒀭𒌓 𒅇 𒀭𒀫𒌓 𒁕𒊑𒅖 𒌓𒈪 x | 𒀭𒌓 𒅇 𒀭𒀫𒌓 𒁕𒊑𒅖 𒌓𒈪𒅎 |
3 | P509374 obverse:4 | 𒀭𒌓 𒅇 𒀭𒀫𒌓 𒁕𒊑𒅖 𒌓𒈪 x | 𒀭𒌓 𒅇 𒀭𒀫𒌓 𒁕𒊑𒅖 𒌓𒈪𒅎 |
4 | P509374 obverse:4 | 𒀭𒌓 𒅇 𒀭𒀫𒌓 𒁕𒊑𒅖 𒌓𒈪 x | 𒀭𒌓 𒅇 𒀭𒀫𒌓 𒁕𒊑𒅖 𒌓𒈪𒅎 |
5 | P509376 obverse:11 | 𒀉𒋾𒋗 𒀀𒈾 𒀀𒊮 𒊑𒀉𒈠 | 𒀉𒋾𒋗𒉡 𒀀𒈾 𒀀𒊮 𒊑𒀉𒈠 |
6 | P510527 obverse:4 | 𒀭𒌓 𒅇 𒀭𒀫𒌓 𒇷𒁀𒀠𒇷𒌅𒅗 | 𒀭𒌓 𒅇 𒀭𒀫𒌓 𒇷𒁀𒀠𒇷𒌅𒅗=⌈𒆠⌉ |
7 | P510527 obverse:4 | 𒀭𒌓 𒅇 𒀭𒀫𒌓 𒇷𒁀𒀠𒇷𒌅𒅗 | 𒀭𒌓 𒅇 𒀭𒀫𒌓 𒌅𒁀𒀠𒇷𒌅𒅗 |
8 | P510529 obverse:4 | 𒀭𒌓 𒅇 𒀭𒀫𒌓 𒁕𒊑𒅖 𒌓𒈪 | 𒀭𒌓 𒅇 𒀭𒀫𒌓 𒁕𒊑𒅖 𒌓𒈪𒅎 |
9 | P510529 obverse:4 | 𒀭𒌓 𒅇 𒀭𒀫𒌓 𒁕𒊑𒅖 𒌓𒈪 | 𒀭𒌓 𒅇 𒀭𒀫𒌓 𒁕𒊑𒅖 𒌓𒈪𒅎 |
10 | P510529 obverse:4 | 𒀭𒌓 𒅇 𒀭𒀫𒌓 𒁕𒊑𒅖 𒌓𒈪 | 𒀭𒌓 𒅇 𒀭𒀫𒌓 𒁕𒊑𒅖 𒌓𒈪𒅎 |
From now on we forget about the exact level of similarity and just ask whether two lines are "similar", meaning that they have a high degree of similarity.
Before we try to find them, let's see if we can group the lines into clusters of similar lines.
CLUSTER_THRESHOLD = 0.5
def makeClusters():
    A.indent(reset=True)
    chunkSize = 1000
    b = 0  # lines processed in the current progress chunk
    j = 0  # total lines processed
    clusters = []
    for ln in F.otype.s("line"):
        j += 1
        b += 1
        if b == chunkSize:
            b = 0
            A.info(f"{j:>5} lines and {len(clusters):>5} clusters")
        # all lines similar to this line (edges in either direction)
        lSisters = {x[0] for x in E.sim.b(ln)}
        lAdded = False
        for cl in clusters:
            # greedy: join the first cluster of which this line is
            # similar to more than CLUSTER_THRESHOLD of the members
            if len(cl & lSisters) > CLUSTER_THRESHOLD * len(cl):
                cl.add(ln)
                lAdded = True
                break
        if not lAdded:
            clusters.append({ln})
    A.info(f"{j} lines and {len(clusters)} clusters")
    return clusters
clusters = makeClusters()
0.10s  1000 lines and   858 clusters
0.31s  2000 lines and  1691 clusters
0.65s  3000 lines and  2509 clusters
1.12s  4000 lines and  3338 clusters
1.72s  5000 lines and  4135 clusters
2.42s  6000 lines and  4885 clusters
3.21s  7000 lines and  5659 clusters
4.05s  8000 lines and  6358 clusters
5.07s  9000 lines and  7125 clusters
6.23s 10000 lines and  7894 clusters
7.49s 11000 lines and  8715 clusters
8.82s 12000 lines and  9450 clusters
10s 13000 lines and 10166 clusters
12s 14000 lines and 11011 clusters
14s 15000 lines and 11774 clusters
15s 16000 lines and 12592 clusters
17s 17000 lines and 13219 clusters
19s 18000 lines and 13893 clusters
21s 19000 lines and 14637 clusters
24s 20000 lines and 15380 clusters
26s 21000 lines and 16095 clusters
28s 22000 lines and 16799 clusters
31s 23000 lines and 17505 clusters
33s 24000 lines and 18235 clusters
36s 25000 lines and 19005 clusters
39s 26000 lines and 19722 clusters
41s 27000 lines and 20446 clusters
43s 27375 lines and 20735 clusters
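The greedy criterion above can be seen in isolation on toy data: an item joins the first existing cluster of which it is similar to more than half of the members (with CLUSTER_THRESHOLD = 0.5), otherwise it starts a new cluster. A self-contained sketch, where small integers stand in for line nodes and SISTERS is a made-up similarity relation:

```python
CLUSTER_THRESHOLD = 0.5

# hypothetical symmetric similarity relation on toy items
SISTERS = {
    1: {2, 3},
    2: {1, 3},
    3: {1, 2},
    4: {5},
    5: {4},
    6: set(),
}

def makeToyClusters(items, sistersOf):
    clusters = []
    for item in items:
        sisters = sistersOf.get(item, set())
        for cl in clusters:
            # join a cluster if similar to more than half of its members
            if len(cl & sisters) > CLUSTER_THRESHOLD * len(cl):
                cl.add(item)
                break
        else:
            clusters.append({item})
    return clusters

print(makeToyClusters(sorted(SISTERS), SISTERS))  # [{1, 2, 3}, {4, 5}, {6}]
```

Note that the result depends on the order in which items are visited: the clustering is greedy, not globally optimal, which is the trade-off that keeps it fast.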
What is the distribution of the clusters, in terms of how many similar lines they contain? We count them.
clusterSizes = collections.Counter()

for cl in clusters:
    clusterSizes[len(cl)] += 1

for (size, amount) in sorted(
    clusterSizes.items(),
    key=lambda x: (-x[0], x[1]),
):
    print(f"clusters of size {size:>4}: {amount:>5}")
clusters of size 1006:     1
clusters of size  129:     1
clusters of size  126:     1
clusters of size  125:     1
clusters of size   84:     1
clusters of size   78:     1
clusters of size   76:     1
clusters of size   74:     1
clusters of size   69:     1
clusters of size   64:     1
clusters of size   56:     1
clusters of size   52:     1
clusters of size   51:     1
clusters of size   49:     1
clusters of size   48:     1
clusters of size   45:     1
clusters of size   44:     1
clusters of size   43:     1
clusters of size   39:     1
clusters of size   35:     1
clusters of size   34:     1
clusters of size   32:     1
clusters of size   30:     3
clusters of size   29:     1
clusters of size   28:     4
clusters of size   27:     2
clusters of size   26:     2
clusters of size   25:     3
clusters of size   24:     1
clusters of size   23:     3
clusters of size   22:     3
clusters of size   20:     4
clusters of size   19:     2
clusters of size   18:     3
clusters of size   17:     3
clusters of size   16:     2
clusters of size   15:     4
clusters of size   14:     9
clusters of size   13:     7
clusters of size   12:     9
clusters of size   11:    12
clusters of size   10:    17
clusters of size    9:    14
clusters of size    8:    28
clusters of size    7:    30
clusters of size    6:    49
clusters of size    5:    58
clusters of size    4:   123
clusters of size    3:   276
clusters of size    2:   998
clusters of size    1: 19043
Let's investigate some interesting groups that lie in a few sweet spots of this distribution.
All chapters:
See the cookbook for recipes for small, concrete tasks.
CC-BY Dirk Roorda