Signs are the building blocks of the transcriptions. They correspond to the individual "glyphs" on the tablet.
However, we have inserted a few empty signs, which we can leave out later on.
We need a few extra modules.
%load_ext autoreload
%autoreload 2
import os
import collections
from textwrap import dedent
from IPython.display import Markdown
from tf.app import use
A = use("Nino-cunei/uruk", hoist=globals())
Locating corpus resources ...
Name | # of nodes | # slots/node | % coverage |
---|---|---|---|
tablet | 6364 | 22.01 | 100 |
face | 9456 | 14.10 | 95 |
column | 14023 | 9.34 | 93 |
line | 35842 | 3.61 | 92 |
case | 9651 | 3.46 | 24 |
cluster | 32753 | 1.03 | 24 |
quad | 3794 | 2.05 | 6 |
comment | 11090 | 1.00 | 8 |
sign | 140094 | 1.00 | 100 |
The main characteristic of a sign is its grapheme. Everything we do with signs is complicated by the fact that signs can be repeated and augmented with primes, variants, modifiers and flags.
Before we go on, we call up our example tablet.
If you want to output multiple text items in one output cell, you have to print() them.
pNum = "P005381"
query = """
tablet catalogId=P005381
"""
results = A.search(query)
A.show(results, withNodes=True)
0.00s 1 result
result 1
We navigate to the last sign in line 1 in column 2 on the obverse face:
case = A.nodeFromCase((pNum, "obverse:2", "1"))
sign1 = L.d(case, otype="sign")[-1]
print(sign1)
106611
That must be the right bar code.
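To make the navigation with L.d and L.u more concrete, here is a toy model of containment in which every node covers a range of slots, and the signs are the slots themselves. The node numbers, the NODES table and the helpers down and up are hypothetical: this only illustrates the idea and is not Text-Fabric's implementation.

```python
# Toy model of containment by slot ranges (hypothetical data and helpers,
# not Text-Fabric code). Signs are the slots; other nodes span a slot range.
from typing import Dict, List, Tuple

# node -> (otype, first slot, last slot)
NODES: Dict[int, Tuple[str, int, int]] = {
    1: ("sign", 1, 1),
    2: ("sign", 2, 2),
    3: ("sign", 3, 3),
    4: ("sign", 4, 4),
    10: ("case", 1, 3),
    11: ("line", 1, 4),
}


def down(node: int, otype: str) -> List[int]:
    """Nodes of type otype inside node, in slot order (compare L.d(node, otype=...))."""
    _, first, last = NODES[node]
    inside = [
        n
        for n, (t, start, end) in NODES.items()
        if n != node and t == otype and first <= start and end <= last
    ]
    return sorted(inside, key=lambda n: NODES[n][1:])


def up(node: int, otype: str) -> List[int]:
    """Nodes of type otype that contain node (compare L.u(node, otype=...))."""
    _, first, last = NODES[node]
    return [
        n
        for n, (t, start, end) in NODES.items()
        if n != node and t == otype and start <= first and last <= end
    ]


last_sign = down(10, "sign")[-1]  # like L.d(case, otype="sign")[-1]
print(last_sign)                  # 3
print(up(last_sign, "line"))      # [11]
```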
We can retrieve the ATF transliteration:
print(A.atfFromSign(sign1))
GI4~a
Note that we get the ATF for a sign by means of A.atfFromSign(node).
We also get the augments, such as primes, modifiers and variants.
We only get the flags if we ask for them with flags=True.
Take for example the first sign on line 3 in column 1 on the obverse face:
case = A.nodeFromCase((pNum, "obverse:1", "3"))
sign2 = L.d(case, otype="sign")[0]
print(sign2)
print(A.atfFromSign(sign2))
print(A.atfFromSign(sign2, flags=True))
106597
2(N04)
2(N04)#
Secondly, we want to get pointers to the locations of these signs in the corpus.
A.pretty(sign1, withNodes=True)
A.pretty(sign2, withNodes=True)
Click the links below a sign and you are taken to the CDLI page for its tablet.
If we want to enlarge the sign, we can call it up with the lineart function.
N.B.
For concepts that span one or more transliteration lines, such as tablet, face, column, line, case and comment, you can get the source material by requesting the feature srcLn, as we have seen before.
For inline concepts, such as clusters, quads and signs, there are functions in A.
For signs we have:
atfFromSign(sign, flags=False)
Returns the ATF representation for a sign, including primes, repeats, variants, modifiers, and, optionally, flags.
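As an illustration of what "including primes, repeats, variants, modifiers and flags" means, the sketch below composes an ATF string from separate parts. The function atf_from_parts and the order in which it attaches the parts (variant, modifier, primes, then the repeat around them, then the flags) are our assumptions, chosen to reproduce the outputs GI4~a and 2(N04)# above; it is not the code behind A.atfFromSign.

```python
# Hypothetical sketch: compose an ATF sign string from separate parts.
# The order of the parts is an assumption, not Text-Fabric's implementation.
from typing import Optional


def atf_from_parts(
    grapheme: str,
    variant: str = "",
    modifier: str = "",
    prime: int = 0,
    repeat: Optional[int] = None,
    flags: str = "",
    with_flags: bool = False,
) -> str:
    atf = grapheme
    if variant:
        atf += f"~{variant}"      # allograph, e.g. GI4~a
    if modifier:
        atf += f"@{modifier}"     # modifier, e.g. N34@f
    atf += "'" * prime            # assumed position of the primes
    if repeat is not None:
        atf = f"{repeat}({atf})"  # a repeat wraps the sign, e.g. 2(N04)
    if with_flags:
        atf += flags              # flags such as # and ? only on request
    return atf


print(atf_from_parts("GI4", variant="a"))                           # GI4~a
print(atf_from_parts("N04", repeat=2, flags="#"))                   # 2(N04)
print(atf_from_parts("N04", repeat=2, flags="#", with_flags=True))  # 2(N04)#
```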
The unaugmented transliteration of a single sign can be obtained from the feature grapheme:
print(F.grapheme.v(sign1))
print(F.grapheme.v(sign2))
GI4
N04
Let's pretty-print the line in which sign2 occurs:
A.pretty(L.u(sign2, otype="line")[0])
Now we use something we learned before: we want all signs with exactly the grapheme GU7, regardless of augments or flags:
gu7s = F.grapheme.s("GU7")
len(gu7s)
314
Or, with a search template:
results = A.search(
"""
sign grapheme=GU7
"""
)
0.05s 314 results
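F.grapheme.s("GU7") gives back all nodes that have a given feature value. A minimal sketch of such a lookup is an inverted index from values to nodes. The node numbers below are made up, build_index is a hypothetical helper, and this is an illustration rather than Text-Fabric's own implementation.

```python
# Sketch of a value -> nodes index, the kind of lookup F.grapheme.s provides.
# Toy data; node numbers and values are made up for illustration.
from collections import defaultdict
from typing import Dict, List, Tuple

toy_grapheme_of: Dict[int, str] = {101: "GU7", 102: "EN", 103: "GU7", 104: "N01", 105: "GU7"}


def build_index(feature: Dict[int, str]) -> Dict[str, Tuple[int, ...]]:
    index: Dict[str, List[int]] = defaultdict(list)
    for node in sorted(feature):  # visit nodes in ascending order
        index[feature[node]].append(node)
    return {value: tuple(nodes) for value, nodes in index.items()}


index = build_index(toy_grapheme_of)
toy_gu7s = index.get("GU7", ())
print(toy_gu7s, len(toy_gu7s))  # (101, 103, 105) 3
```

Building the index once makes every later lookup by value a single dictionary access.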
table() and show()
The simplest way to show the results is with A.table() for a compact tabular view, or with A.show() for a full context view.
We show a tabular view of the first 3 occurrences, including node numbers. The show view can be quite unwieldy, so there, too, we show only the first 3 results.
A.table(results, withNodes=True, end=3)
n | p | sign |
---|---|---|
1 | P001705 obverse:3:1 | 2456 GU7 |
2 | P001719 obverse:1:1 | 2660 GU7 |
3 | P001951 obverse:3:2 | 3883 GU7 |
A.show(results, end=3, showGraphics=False)
result 1
result 2
result 3
There are a few hundred occurrences. As before, we show a bit more context for the first 10 of them: the full grapheme, with all its augments and flags, and the full source line.
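for g in gu7s[0:10]:
    t = L.u(g, otype="tablet")[0]        # tablet containing the sign
    cl = A.lineFromNode(g)               # line containing the sign
    pNum = T.sectionFromNode(t)[0]       # tablet number, e.g. P001705
    gRep = A.atfFromSign(g, flags=True)  # full ATF, including flags
    line = F.srcLn.v(cl)                 # source line of the transliteration
    print(f"{gRep:<7} {pNum} {line}")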
GU7#?   P001705 1. 2(N47)# 6(N20)# 5(N05)#? 1(N42~a)# 1(N25)# 1(N28~c)#? , GU7#? [...]
GU7#?   P001719 1. [...] 1(N14)# 2(N01)# , GU7#? [...]
GU7     P001951 2. , GU7
GU7#    P002002 1. , [...] EN~a# |SILA3~axSZE~a@t|# X [...] GU7#
GU7#    P002035 2. , GU7#
GU7#    P002062 1. [...] , [...] GU7#
GU7     P002100 1. , [...] GU7
GU7#    P002370 1. 1(N14) 1(N01) , GU7# [...]
GU7     P002510 1. 1(N34) 1(N14) 1(N01) , GIR2~a GU7
GU7     P002524 2. , GU7
We can make it more user-friendly: we can link each occurrence to its page on CDLI, and put everything in a Markdown table.
We have a function to generate the link: A.cdli().
We write a function that builds the Markdown table, because we want to do it again.
First we use the function to show the first 3 and then the first 10 occurrences on screen, and then to write the whole set to a directory on your file system.
def showSigns(signs, amount=None):
    # header of the Markdown table
    markdown = dedent("""\
        sign | tablet | line
        ---- | ------ | ----\
    """).strip()
    for g in signs if amount is None else signs[0:amount]:
        t = L.u(g, otype="tablet")[0]
        cl = A.lineFromNode(g)
        gRep = A.atfFromSign(g, flags=True)
        # escape "|" so that compounds like |SILA3~axSZE~a@t| do not break the table
        line = F.srcLn.v(cl).replace("|", "&#124;")
        markdown += f"\n{gRep} | {A.cdli(t, asString=True)} | {line}"
    markdown += "\n"
    return markdown
Markdown(showSigns(gu7s, 3))
A bit more please ...
Markdown(showSigns(gu7s, 10))
sign | tablet | line |
---|---|---|
GU7#? | P001705 | 1. 2(N47)# 6(N20)# 5(N05)#? 1(N42~a)# 1(N25)# 1(N28~c)#? , GU7#? [...] |
GU7#? | P001719 | 1. [...] 1(N14)# 2(N01)# , GU7#? [...] |
GU7 | P001951 | 2. , GU7 |
GU7# | P002002 | 1. , [...] EN~a# |SILA3~axSZE~a@t|# X [...] GU7# |
GU7# | P002035 | 2. , GU7# |
GU7# | P002062 | 1. [...] , [...] GU7# |
GU7 | P002100 | 1. , [...] GU7 |
GU7# | P002370 | 1. 1(N14) 1(N01) , GU7# [...] |
GU7 | P002510 | 1. 1(N34) 1(N14) 1(N01) , GIR2~a GU7 |
GU7 | P002524 | 2. , GU7 |
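The source lines can contain compound graphemes written between vertical bars, such as |SILA3~axSZE~a@t|, while a bare | inside a Markdown table cell starts a new column. This standalone sketch shows the escaping with the HTML entity &#124;, which renders as a literal bar. The helper markdown_table is hypothetical, and the single row is copied from the output above.

```python
# Sketch: build a Markdown table, escaping "|" inside cells so that ATF
# compounds such as |SILA3~axSZE~a@t| stay within one cell.
from typing import List, Sequence


def markdown_table(header: Sequence[str], rows: List[Sequence[str]]) -> str:
    def cell(text: str) -> str:
        return text.replace("|", "&#124;")  # renders as a literal bar

    lines = [" | ".join(header), " | ".join("---" for _ in header)]
    lines += [" | ".join(cell(c) for c in row) for row in rows]
    return "\n".join(lines) + "\n"


table_md = markdown_table(
    ["sign", "tablet", "line"],
    [["GU7#", "P002002", "1. , [...] EN~a# |SILA3~axSZE~a@t|# X [...] GU7#"]],
)
print(table_md)
```

In the notebook you could display the result with Markdown(table_md).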
We write the whole list to a Markdown file on your local system.
os.makedirs(A.tempDir, exist_ok=True)  # create the temp directory if needed
with open(f"{A.tempDir}/gu7.md", "w", encoding="utf-8") as fh:
    fh.write(showSigns(gu7s))
print(f"data written to file {A.tempDir}/gu7.md")
data written to file /Users/me/text-fabric-data/github/Nino-cunei/uruk/_temp/gu7.md
Have a look!
Tip: open the file in Atom and switch to preview with Ctrl+Shift+M.
Again, the tablet links are clickable, and bring you straight to CDLI.
We use a bit more of the power of Text-Fabric by generating frequency lists.
We just studied the GU7 grapheme a bit. Suppose we want an overview of all graphemes.
There is a generic Text-Fabric function for that: for each feature you can call up a frequency list of its values.
graphemes = F.grapheme.freqList()
len(graphemes)
632
We show the top-20:
graphemes[0:20]
(('…', 29413), ('N01', 21645), ('', 12440), ('X', 6870), ('N14', 5898), ('EN', 1950), ('N34', 1831), ('N57', 1826), ('SZE', 1334), ('GAL', 1180), ('DUG', 1084), ('U4', 1023), ('AN', 1020), ('PAP', 876), ('SAL', 876), ('NUN', 870), ('E2', 854), ('GI', 850), ('BA', 781), ('SANGA', 733))
N.B.:
('', 12440): these are the empty signs. They have been inserted by the conversion to Text-Fabric inside comments, in order to link the comments to the tablets.
('…', 29413): these stand for ... in ATF, usually within an uncertainty cluster [...].
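Conceptually, a frequency list is a count of values, sorted from most to least frequent. A minimal sketch with collections.Counter on toy data: freq_list is a hypothetical helper, and breaking ties alphabetically is our own choice, not necessarily what freqList() does.

```python
# Sketch of a frequency list: (value, count) pairs, most frequent first.
# Toy data; ties are broken alphabetically (our choice).
from collections import Counter
from typing import Dict, List, Tuple


def freq_list(feature: Dict[int, str]) -> List[Tuple[str, int]]:
    counts = Counter(feature.values())
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


toy_graphemes = {1: "N01", 2: "EN", 3: "N01", 4: "", 5: "N01", 6: "EN", 7: "GU7"}
print(freq_list(toy_graphemes))  # [('N01', 3), ('EN', 2), ('', 1), ('GU7', 1)]
```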
We can quickly get an overview of all kinds of augments: primes, variants, modifiers and flags.
The prime is a feature with values 2, 1 or 0: the number of primes. Below you see how often each value occurs. Note that we count all primes here: on signs, case numbers and column numbers.
for (value, frequency) in F.prime.freqList():
    print(f"{frequency:>5} x {value}")
 5164 x 1
    1 x 2
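As a small illustration of the prime values, the sketch below counts the trailing primes on a label. It assumes that primes are written as apostrophes in ATF, as in a column number 2'; prime_count is a hypothetical helper, not part of Text-Fabric.

```python
# Sketch: number of primes on a label, counted as trailing apostrophes.
# Assumption: ATF writes primes as ', e.g. a column number 2'.
def prime_count(label: str) -> int:
    return len(label) - len(label.rstrip("'"))


for label in ["2", "2'", "3''"]:
    print(label, prime_count(label))  # 0, 1 and 2 primes respectively
```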
The variant, or allograph, is what occurs after the grapheme and the ~ symbol. It should consist of digits and/or lowercase letters other than x.
Here is the frequency list of variant values.
for (value, frequency) in F.variant.freqList():
    print(f"{frequency:>5} x {value}")
23162 x a
 3994 x b
 1505 x c
 1308 x a1
  689 x b1
  188 x a2
  183 x d
  125 x b2
   85 x f
   72 x a3
   42 x e
   29 x c2
   22 x c1
   22 x c3
   14 x c5
   12 x a0
   12 x b3
   12 x d1
   12 x v
   11 x c4
    6 x a4
    6 x g
    5 x d2
    4 x d4
    4 x h
    2 x 3a
    2 x d3
    1 x h2
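The rule for variants translates directly into a regular expression: the character class [0-9a-wyz] below is the set of digits and lowercase letters without x. Our reading, which is an assumption, is that x is excluded because it joins the parts of a compound grapheme, as in |SILA3~axSZE~a@t| seen earlier: the variant of SILA3 is a, not ax. The helper variants is hypothetical.

```python
# Sketch: extract variants following the rule above.
# Assumption: x is excluded because it joins parts of compound graphemes.
import re
from typing import List

VARIANT = re.compile(r"~([0-9a-wyz]+)")


def variants(atf: str) -> List[str]:
    return VARIANT.findall(atf)


print(variants("GI4~a"))              # ['a']
print(variants("|SILA3~axSZE~a@t|"))  # ['a', 'a']
```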
The modifier is what occurs after the grapheme and the @ symbol. It consists of digits and/or lowercase letters other than x.
Sometimes a modifier occurs inside a repeat, as in 7(N34@f); in that case we store it in the feature modifierInner.
Here are the frequency lists of modifier and modifierInner values.
for (value, frequency) in F.modifier.freqList():
    print(f"{frequency:>5} x {value}")
  634 x g
  262 x t
   35 x n
    6 x r
    4 x s
    1 x c
    1 x h
for (value, frequency) in F.modifierInner.freqList():
    print(f"{frequency:>5} x {value}")
   25 x f
    1 x r
    1 x v
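The distinction between modifier and modifierInner can be sketched as follows: if the sign is a repeat n(...), a modifier inside the parentheses counts as inner, otherwise as a plain modifier. The helper modifier_features is hypothetical and assumes that flags have already been stripped from the string.

```python
# Hypothetical sketch: decide whether a modifier is inner (inside a repeat)
# or plain. Assumes flags such as # or ? have been stripped already.
import re
from typing import Dict

REPEAT = re.compile(r"^(\d+)\((.*)\)$")
MODIFIER = re.compile(r"@([0-9a-wyz]+)")


def modifier_features(atf: str) -> Dict[str, str]:
    m = REPEAT.match(atf)
    if m:
        inner = MODIFIER.search(m.group(2))
        return {"modifierInner": inner.group(1)} if inner else {}
    outer = MODIFIER.search(atf)
    return {"modifier": outer.group(1)} if outer else {}


print(modifier_features("7(N34@f)"))  # {'modifierInner': 'f'}
print(modifier_features("SZE~a@t"))   # {'modifier': 't'}
print(modifier_features("2(N04)"))    # {}
```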
We make a frequency list of all full signs, i.e. the grapheme including variant, modifier, and prime. We show them as they appear in transcriptions.
We only deal with instances which are not contained in a quad.
This is no longer the frequency distribution of the values of a single feature, so we have to do the coding ourselves.
fullGraphemes = collections.Counter()
for n in F.otype.s("sign"):
    grapheme = F.grapheme.v(n)
    # skip empty signs, ellipses and unidentified signs
    if grapheme in {"", "…", "X"}:
        continue
    fullGrapheme = A.atfFromSign(n)
    fullGraphemes[fullGrapheme] += 1
len(fullGraphemes)
1476
Or with a query, which yields the set of distinct full graphemes:
query = """
sign type=ideograph|numeral
"""
fullGraphemesQ = {A.atfFromSign(r[0]) for r in A.search(query, silent=True)}
len(fullGraphemesQ)
1476
There! We have counted all incarnations of full graphemes, and there are 1476 distinct ones.
We show the top 20, sorted by frequency.
We specify a key function that, given a (value, amount) pair, returns (-amount, value). This determines the order after sorting: signs with a high amount come before signs with a low amount, and signs with equal amounts are ordered by value.
for (value, frequency) in sorted(
    fullGraphemes.items(),
    key=lambda x: (-x[1], x[0]),
)[0:20]:
    print(f"{frequency:>5} x {value}")
12983 x 1(N01)
 3080 x 2(N01)
 2584 x 1(N14)
 1830 x EN~a
 1598 x 3(N01)
 1357 x 2(N14)
 1294 x 5(N01)
 1294 x SZE~a
 1164 x GAL~a
 1117 x 4(N01)
 1022 x U4
 1020 x AN
  999 x 1(N34)
  876 x SAL
  851 x PAP~a
  849 x GI
  791 x 3(N14)
  789 x 1(N57)
  781 x BA
  719 x NUN~a
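The effect of the key (-amount, value) shows up in the output above, where 5(N01) and SZE~a both occur 1294 times. The sketch below reuses a few of those counts to compare the explicit key with collections.Counter.most_common(), which orders equal counts by first insertion rather than alphabetically. The names toy_counts, ranked and by_most_common are only for this illustration.

```python
# Sketch: the (-amount, value) sort key versus Counter.most_common() on ties.
from collections import Counter

toy_counts = {"SZE~a": 1294, "1(N01)": 12983, "5(N01)": 1294, "AN": 1020}

# high counts first, equal counts alphabetically by value
ranked = sorted(toy_counts.items(), key=lambda x: (-x[1], x[0]))
for value, amount in ranked:
    print(f"{amount:>5} x {value}")

# most_common() keeps equal counts in insertion order: SZE~a before 5(N01)
by_most_common = [value for value, _ in Counter(toy_counts).most_common()]
print(by_most_common)
```

The explicit key makes the order of ties independent of the order in which the signs happened to be counted.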
We also want to write the results to files in your _temp directory, within this repo.
writeFreqs writes distribution data of data items called dataName to a file fileName.txt.
In fact, it writes two files:
fileName-alpha.txt, ordered by data item
fileName-freq.txt, ordered by frequency
def writeFreqs(fileName, data, dataName):
    print(f"There are {len(data)} {dataName}s")
    for (sortName, sortKey) in (
        ("alpha", lambda x: (x[0], -x[1])),  # by item, then by descending frequency
        ("freq", lambda x: (-x[1], x[0])),   # by descending frequency, then by item
    ):
        with open(f"{A.tempDir}/{fileName}-{sortName}.txt", "w", encoding="utf-8") as fh:
            for (item, freq) in sorted(data, key=sortKey):
                if item != "":  # leave out the empty signs
                    fh.write(f"{freq:>5} x {item}\n")
writeFreqs("grapheme-plain", F.grapheme.freqList(), "bare grapheme")
There are 632 bare graphemes
writeFreqs("grapheme-full", fullGraphemes.items(), "full grapheme")
There are 1476 full graphemes
Now have a look at your A.tempDir directory and you will see four generated files:
grapheme-plain-alpha.txt (sorted by grapheme)
grapheme-plain-freq.txt (sorted by frequency)
grapheme-full-alpha.txt (sorted by grapheme)
grapheme-full-freq.txt (sorted by frequency)