We just want to grasp what the corpus is about and how we can find our way in the data.
Open a terminal or command prompt and say:
text-fabric oldbabylonian
Wait and see a lot happening before your browser starts up and shows you an interface to the corpus:
Text-Fabric needs an app to deal with the corpus-specific things. It downloads/finds/caches the latest version of the app:
Using TF-app in /Users/dirk/text-fabric-data/annotation/app-oldbabylonian/code:
rv0.2=#4bb2530bfb94dc93601f8b3df7722cb0e5df7a43 (latest release)
It downloads/finds/caches the latest version of the data:
Using data in /Users/dirk/text-fabric-data/Nino-cunei/oldbabylonian/tf/1.0.4:
rv1.4=#43c36d148794e3feeb3dd39e105ce6a4df79c467 (latest release)
The data is preprocessed in order to speed up typical Text-Fabric operations. The result is cached on your computer. Preprocessing costs time, but the next time you use this corpus on this machine, startup will be much quicker.
TF setup done.
Then the app goes on to act as a local web server for the corpus that has just been downloaded; it opens your browser for you and loads the corpus page:
* Running on http://localhost:8106/ (Press CTRL+C to quit)
Opening oldbabylonian in browser
Listening at port 18986
Indeed, that is what you need. Click the vertical Help tab.
From there, click around a little bit. Don't read closely, just note the kinds of information that are presented to you.
Later on, it will make more sense!
First we browse our data. Click the browse button and then, in the table of documents (tablets), click on obverse.
Now you're looking at one side of a tablet: the marks in an ASCII transcription.
Now click the Options tab and select the layout-orig-unicode format to see the same tablet in cuneiform signs.
You can click a triangle to see how a line is broken down:
See that line, starting with the word um-ma, and whose last word ends in the sign ma?
That is a pattern. Let's search for it.
Enter this query in the search pad and press the search icon above it.
line
  =: word
    =: sign reading=um
    <: sign reading=ma
    :=
  < sign reading=ma
  :=
In English:
search all lines that contain a word and a sign, where:
  =: the word starts where the line starts
  the word contains a sign and a sign, where:
    =: the first sign starts where the word starts
    <: the second sign follows the first sign immediately
    := the second sign ends where the word ends
  <  the sign comes after the word
  := the sign ends where the line ends
You can expand results by clicking the triangle.
You can see the result in context by clicking the browse icon.
You can go back to the result list by clicking the results icon.
We see that this line comes at the start of a tablet.
In fact, this pattern corresponds to a heading of a letter.
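The relational operators in the query all talk about where nodes start and end in the sequence of signs. As a rough illustration, here is a minimal pure-Python sketch; it is not how Text-Fabric implements search. It assumes that every node can be reduced to a contiguous span of slots (its first and last sign), and the names `Span`, `starts_together`, `is_letter_heading` and so on are hypothetical. The `<` check in particular only approximates Text-Fabric's ordering of nodes.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical stand-in for a node: its first and last slot."""
    first: int
    last: int


def starts_together(a: Span, b: Span) -> bool:
    """Like `=:`: a and b start at the same slot."""
    return a.first == b.first


def ends_together(a: Span, b: Span) -> bool:
    """Like `:=`: a and b end at the same slot."""
    return a.last == b.last


def immediately_before(a: Span, b: Span) -> bool:
    """Like `<:`: the last slot of a is directly followed by the first slot of b."""
    return a.last + 1 == b.first


def comes_before(a: Span, b: Span) -> bool:
    """Rough approximation of `<` for contiguous spans:
    the earlier start wins; on equal starts, the longer span comes first."""
    return (a.first, -a.last) < (b.first, -b.last)


def embeds(outer: Span, inner: Span) -> bool:
    """What indentation expresses: inner lies within outer."""
    return outer.first <= inner.first and inner.last <= outer.last


def is_letter_heading(line: Span, word: Span, um: Span, ma: Span, lastMa: Span) -> bool:
    """Checks the positional constraints of the template (readings are not checked)."""
    return (
        embeds(line, word) and starts_together(line, word)
        and embeds(word, um) and embeds(word, ma)
        and starts_together(word, um)
        and immediately_before(um, ma)
        and ends_together(word, ma)
        and embeds(line, lastMa)
        and comes_before(word, lastMa)
        and ends_together(line, lastMa)
    )


# A toy line of six signs: "um-ma" on slots 1-2, the final "ma" on slot 6.
toyLine, toyWord = Span(1, 6), Span(1, 2)
print(is_letter_heading(toyLine, toyWord, Span(1, 1), Span(2, 2), Span(6, 6)))  # True
print(is_letter_heading(toyLine, toyWord, Span(1, 1), Span(2, 2), Span(5, 5)))  # False: not at line end
```

The second call fails only on the last constraint: the final ma does not end where the line ends.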
Question: of all 1274 results, how many are on the first line, the second line, the third line, etc.?
This is a typical question where you want to leave the search mode and enter computing mode.
Let's do that!
If you have followed the installation instructions, you are nearly set.
Open your terminal and say
jupyter notebook
Your browser starts up and presents you with a local computing environment where you can run Python programs.
You see cells like the one below, where you can type programming statements and execute them by pressing Shift Enter.
First we load the Text-Fabric module, as follows:
from tf.app import use
Now we load the TF-app for the corpus oldbabylonian, and that app loads the corpus data. We give a name to the result of all that loading: A.
A = use('Nino-cunei/oldbabylonian', hoist=globals())
Locating corpus resources ...
Name | # of nodes | # slots / node | % coverage |
---|---|---|---|
document | 1285 | 158.15 | 100 |
face | 2834 | 71.71 | 100 |
line | 27375 | 7.42 | 100 |
word | 76505 | 2.64 | 100 |
cluster | 23449 | 1.78 | 21 |
sign | 203219 | 1.00 | 100 |
Old Babylonian Letters 1900-1600: Cuneiform tablets (DOI 10.5281/zenodo.2579207)
Some bits are familiar from above, when you ran the text-fabric command in the terminal.
Other bits are links to the documentation; they point to the same places as the links in the Text-Fabric browser.
You see a list of all the data features that have been loaded, and a list of references to the API documentation, which tells you how you can use this data in your program statements.
We do the same search again, but now inside our program.
That means that we can capture the results in a list for further processing.
results = A.search('''
line
  =: word
    =: sign reading=um
    <: sign reading=ma
    :=
  < sign reading=ma
  :=
''')
0.29s 1274 results
In less than a second, we have all the results!
Let's look at the first one:
results[0]
(230790, 258166, 11, 12, 20)
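Each result is a tuple of node numbers: one for the line, one for the word, and one for each of the three signs, in the order in which they occur in the query.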
Here is the second one:
results[1]
(230826, 258317, 359, 360, 366)
And here the last one:
results[-1]
(258128, 334552, 202886, 202887, 202894)
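Since every tuple follows the order of the atoms in the query, you can give its positions names. The sketch below copies the three tuples printed above and uses a hypothetical `Heading` named tuple for the labels. It relies on Text-Fabric numbering the slots (here: the signs) from 1 upward in corpus order, so sign nodes are at most 203219, the number of signs in the node table; treating a line as a contiguous run of signs is an assumption.

```python
from typing import NamedTuple

# The three result tuples printed above.
shown = [
    (230790, 258166, 11, 12, 20),
    (230826, 258317, 359, 360, 366),
    (258128, 334552, 202886, 202887, 202894),
]

# Number of signs (slots) according to the node table above.
MAX_SLOT = 203219


class Heading(NamedTuple):
    """Hypothetical names for the positions in a result tuple."""
    line: int
    word: int
    um: int
    ma: int
    lastMa: int


for result in shown:
    h = Heading(*result)
    # Signs are slots, numbered 1..MAX_SLOT; lines and words come after them.
    assert all(1 <= s <= MAX_SLOT for s in (h.um, h.ma, h.lastMa))
    assert h.line > MAX_SLOT and h.word > MAX_SLOT
    # `<:` makes um and ma adjacent, so their slot numbers differ by one.
    assert h.ma - h.um == 1
    # um is the first and lastMa the last sign of the line
    # (assuming a line is a contiguous run of signs).
    print(h.line, 'has', h.lastMa - h.um + 1, 'signs')
```

Under that assumption, the three lines shown have 10, 8 and 9 signs.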
Now we want to find out something for each result line: which line number does it have among the lines on the same tablet face?
Click the link Feature docs above, and read a bit under Node type line.
There you see that the feature ln is of particular interest to us.
First we get the line number of result 1000:
node = results[999][0]
print(node)
lineNumber = F.ln.v(node)
print(lineNumber)
252681
3
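Conceptually, a node feature such as ln maps node numbers to values, and `.v(node)` looks a value up. The class below is a hypothetical stand-in written only to illustrate that idea with the pair printed above; it is not Text-Fabric's `F` object, and it only assumes that nodes without a value give `None`.

```python
from typing import Dict, Optional


class NodeFeature:
    """Hypothetical stand-in for a node feature like F.ln.

    It only mimics the `.v(node)` lookup: a mapping from node numbers
    to values, returning None for nodes without a value.
    """

    def __init__(self, values: Dict[int, int]) -> None:
        self._values = values

    def v(self, node: int) -> Optional[int]:
        return self._values.get(node)


# Toy data: only the line node printed above.
ln = NodeFeature({252681: 3})
print(ln.v(252681))  # 3
print(ln.v(11))      # None: node 11 is a sign, not a line
```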
Now we collect the set of all line numbers that our result lines have:
{F.ln.v(result[0]) for result in results}
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 20, 21, 31}
What we really want to know is how the result lines are distributed over the line numbers.
import collections

# count how many result lines carry each line number
distribution = collections.Counter()
for result in results:
    lineNumber = F.ln.v(result[0])  # result[0] is the line node
    distribution[lineNumber] += 1
print(distribution)
Counter({3: 834, 2: 110, 4: 102, 6: 42, 7: 37, 5: 33, 8: 31, 9: 16, 10: 13, 1: 11, 12: 9, 11: 8, 13: 8, 16: 5, 15: 4, 14: 4, 20: 2, 17: 2, 31: 1, 19: 1, 21: 1})
An overwhelming majority, 834 of the 1274 results, is on line 3.
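To put a number on "overwhelming", here is a small sketch that copies the distribution printed above into a `Counter` literal (so it runs without the corpus) and computes the share of the most frequent line number.

```python
import collections

# Copy of the distribution printed above.
distribution = collections.Counter({
    3: 834, 2: 110, 4: 102, 6: 42, 7: 37, 5: 33, 8: 31, 9: 16, 10: 13, 1: 11,
    12: 9, 11: 8, 13: 8, 16: 5, 15: 4, 14: 4, 20: 2, 17: 2, 31: 1, 19: 1, 21: 1,
})

total = sum(distribution.values())
topLine, topAmount = distribution.most_common(1)[0]
share = topAmount / total
print(f'{topAmount} of {total} results ({share:.1%}) are on line {topLine}')
# 834 of 1274 results (65.5%) are on line 3
```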
Let's make the output a bit more friendly:
for (lineNumber, amount) in sorted(distribution.items()):
    print(f'line {lineNumber:>2} is home to {amount:>3} results')
line  1 is home to  11 results
line  2 is home to 110 results
line  3 is home to 834 results
line  4 is home to 102 results
line  5 is home to  33 results
line  6 is home to  42 results
line  7 is home to  37 results
line  8 is home to  31 results
line  9 is home to  16 results
line 10 is home to  13 results
line 11 is home to   8 results
line 12 is home to   9 results
line 13 is home to   8 results
line 14 is home to   4 results
line 15 is home to   4 results
line 16 is home to   5 results
line 17 is home to   2 results
line 19 is home to   1 results
line 20 is home to   2 results
line 21 is home to   1 results
line 31 is home to   1 results
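The listing above skips line numbers that have no results at all. A `Counter` returns 0 for keys it has never seen, so a sketch like the following (again on a copy of the printed distribution) can walk over every line number up to the highest one and draw a crude text histogram; the scale of one `#` per started 20 results is an arbitrary choice.

```python
import collections
import math

# Copy of the distribution printed above.
distribution = collections.Counter({
    3: 834, 2: 110, 4: 102, 6: 42, 7: 37, 5: 33, 8: 31, 9: 16, 10: 13, 1: 11,
    12: 9, 11: 8, 13: 8, 16: 5, 15: 4, 14: 4, 20: 2, 17: 2, 31: 1, 19: 1, 21: 1,
})

rows = []
for lineNumber in range(1, max(distribution) + 1):
    amount = distribution[lineNumber]      # 0 for line numbers without results
    bar = '#' * math.ceil(amount / 20)     # one mark per started 20 results
    rows.append(f'line {lineNumber:>2} {amount:>3} {bar}')
print('\n'.join(rows))
```

The empty rows make the gaps visible: no heading was found on line 18 or on lines 22 to 30.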
We can now inspect more closely what is going on, for example where results appear late in the tablet, after line 16:
results16 = A.search('''
line ln>16
  =: word
    =: sign reading=um
    <: sign reading=ma
    :=
  < sign reading=ma
  :=
''')
0.28s 7 results
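The count of 7 can be checked against the distribution computed earlier, without running a new query. This sketch again uses a copy of the printed distribution.

```python
import collections

# Copy of the distribution printed above.
distribution = collections.Counter({
    3: 834, 2: 110, 4: 102, 6: 42, 7: 37, 5: 33, 8: 31, 9: 16, 10: 13, 1: 11,
    12: 9, 11: 8, 13: 8, 16: 5, 15: 4, 14: 4, 20: 2, 17: 2, 31: 1, 19: 1, 21: 1,
})

# Keep only the line numbers above 16, as in the query `line ln>16`.
late = {lineNumber: amount for lineNumber, amount in distribution.items() if lineNumber > 16}
print(sorted(late.items()))  # [(17, 2), (19, 1), (20, 2), (21, 1), (31, 1)]
print(sum(late.values()))    # 7
```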
And we can show them here too:
A.table(results16)
n | p | line | word | sign | sign | sign |
---|---|---|---|---|---|---|
1 | P365130 obverse:20 | um-ma a-ma-na-nu-um-ma | um-ma | um- | ma | ma |
2 | P479269 obverse:20 | um-ma szu-ma | um-ma | um- | ma | ma |
3 | P479269 obverse:31 | um-ma szu-ma | um-ma | um- | ma | ma |
4 | P387306 obverse:19 | um-ma at-ta-a-ma | um-ma | um- | ma | ma |
5 | P387324 obverse:17 | um-ma at!-ta-ma# | um-ma | um- | ma | ma# |
6 | P372422 obverse:17 | um-ma _sag-geme2_-ma | um-ma | um- | ma | ma |
7 | P372422 obverse:21 | um-ma szu-u2-ma | um-ma | um- | ma | ma |
But at this point it might be easier to take the new query back to the Text-Fabric browser and run it there: