To get started: consult start
In a notebook, you can perform searches and view them in a tabular display and zoom in on items with pretty displays.
But there are times that you want to take your results outside Text-Fabric, outside a notebook, outside Python, and just work with them in other programs, such as Excel.
You want to do that not only with query results, but with all kinds of lists of tuples of nodes.
There is a function for that, A.export()
, and here we show what it can do.
%load_ext autoreload
%autoreload 2
import os
from tf.app import use
A = use("Nino-cunei/oldbabylonian", hoist=globals())
This is Text-Fabric 9.2.2 Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html 67 features found and 0 ignored
We write a function that can peek into file on your system, and show the first few lines. We'll use it to inspect the exported files that we are going to produce.
EXPORT_FILE = os.path.expanduser("~/Downloads/results.tsv")
UPTO = 10
def checkout():
with open(EXPORT_FILE, encoding="utf_16") as fh:
for (i, line) in enumerate(fh):
if i >= UPTO:
break
print(line.rstrip("\n"))
Our exported .tsv
files open in Excel without hassle, even if they contain non-latin characters.
That is because TF writes such files in an
encoding that works well with Excel: utf_16_le
.
You can just open them in Excel, there is no need for conversion before or after opening these files.
Should you want to process these files by means of a (Python) program,
take care to read them with encoding utf_16
.
We first run a query in order to export the results.
query = """
line ln<4
=: sign reading=um
<: sign reading=ma
< sign reading=ma
"""
results = A.search(query)
0.45s 1052 results
You can export the table of results to Excel.
The following command writes a tab-separated file results.tsv
to your downloads directory.
You can specify arguments toDir=directory
and toFile=file name
to write to a different file.
If the directory does not exist, it will be created.
We stick to the default, however.
A.export(results)
Check out the contents:
checkout()
R S1 S2 S3 NODE1 TYPE1 TEXT1 ln1 NODE2 TYPE2 TEXT2 reading2 NODE3 TYPE3 TEXT3 reading3 NODE4 TYPE4 TEXT4 reading4 1 P509373 obverse 3 230790 line um-ma _{d}en-lil2_-sza-du-u2-ni-ma 3 11 sign um- um 12 sign ma ma 20 sign ma ma 2 P509374 obverse 3 230826 line um-[ma] _dingir_ x zi x [...] ma 3 359 sign um- um 360 sign [ma] ma 366 sign ma ma 3 P509375 obverse 3 230844 line [um-ma x]-na-su-[x]-ma 3 509 sign [um- um 510 sign ma ma 515 sign ma ma 4 P481190 obverse 3 230916 line um-ma nu#-ur2#-i3-li2-szu-ma 3 1123 sign um- um 1124 sign ma ma 1130 sign ma ma 5 P510526 obverse 3 231004 line um-ma geme2-{d}utu-ma 3 1686 sign um- um 1687 sign ma ma 1691 sign ma ma 6 P510527 obverse 3 231022 line um-ma i-lu-ni-ma 3 1808 sign um- um 1809 sign ma ma 1813 sign ma ma 7 P510528 obverse 3 231055 line [um]-ma# i-lu-ni-ma 3 2161 sign [um]- um 2162 sign ma# ma 2166 sign ma ma 8 P510529 obverse 3 231068 line um-ma i-lu-ni-ma 3 2272 sign um- um 2273 sign ma ma 2277 sign ma ma 9 P510530 obverse 3 231088 line [um-ma {d}na-bi]-um#-na-s,i-ir-ma 3 2401 sign [um- um 2402 sign ma ma 2410 sign ma ma
You see the following columns:
R
the sequence number of the result tuple in the result listS1 S2 S3
the section as P-number, face, line number, in separate columnsNODEi TYPEi
the node and its type, for each node i
in the result tupleTEXTi
the full text of node i
, if the node type admits a concise text representationreading2-4
the value of feature reading
, since our query mentions the feature reading
on nodes 2-4If you do not need the full text of the lines, you can leave them out by specifying a smaller condense type.
The export function provides text for all nodes whose type is not too big. What is too big is determined by the condense type.
In this corpus, the default condense type is line. Node types bigger than lines will not get text.
Now, if we change the condense type to something smaller than line, e.g. word
, the line text will be suppressed.
A.export(results, condenseType="word")
checkout()
R S1 S2 S3 NODE1 TYPE1 ln1 NODE2 TYPE2 TEXT2 reading2 NODE3 TYPE3 TEXT3 reading3 NODE4 TYPE4 TEXT4 reading4 1 P509373 obverse 3 230790 line 3 11 sign um- um 12 sign ma ma 20 sign ma ma 2 P509374 obverse 3 230826 line 3 359 sign um- um 360 sign [ma] ma 366 sign ma ma 3 P509375 obverse 3 230844 line 3 509 sign [um- um 510 sign ma ma 515 sign ma ma 4 P481190 obverse 3 230916 line 3 1123 sign um- um 1124 sign ma ma 1130 sign ma ma 5 P510526 obverse 3 231004 line 3 1686 sign um- um 1687 sign ma ma 1691 sign ma ma 6 P510527 obverse 3 231022 line 3 1808 sign um- um 1809 sign ma ma 1813 sign ma ma 7 P510528 obverse 3 231055 line 3 2161 sign [um]- um 2162 sign ma# ma 2166 sign ma ma 8 P510529 obverse 3 231068 line 3 2272 sign um- um 2273 sign ma ma 2277 sign ma ma 9 P510530 obverse 3 231088 line 3 2401 sign [um- um 2402 sign ma ma 2410 sign ma ma
If we want to see the text in another format, we can specify it:
A.export(results, fmt="text-orig-unicode")
checkout()
R S1 S2 S3 NODE1 TYPE1 TEXT1 ln1 NODE2 TYPE2 TEXT2 reading2 NODE3 TYPE3 TEXT3 reading3 NODE4 TYPE4 TEXT4 reading4 1 P509373 obverse 3 230790 line ๐๐ ๐ญ๐๐ค๐ญ๐บ๐๐๐ 3 11 sign ๐ um 12 sign ๐ ma 20 sign ๐ ma 2 P509374 obverse 3 230826 line ๐๐ ๐ญ x ๐ฃ x โฆ ๐ 3 359 sign ๐ um 360 sign ๐ ma 366 sign ๐ ma 3 P509375 obverse 3 230844 line ๐๐ x๐พ๐ขx๐ 3 509 sign ๐ um 510 sign ๐ ma 515 sign ๐ ma 4 P481190 obverse 3 230916 line ๐๐ ๐ก๐ซ๐๐๐๐ 3 1123 sign ๐ um 1124 sign ๐ ma 1130 sign ๐ ma 5 P510526 obverse 3 231004 line ๐๐ ๐ฉ๐ณ๐ญ๐๐ 3 1686 sign ๐ um 1687 sign ๐ ma 1691 sign ๐ ma 6 P510527 obverse 3 231022 line ๐๐ ๐ฟ๐ป๐๐ 3 1808 sign ๐ um 1809 sign ๐ ma 1813 sign ๐ ma 7 P510528 obverse 3 231055 line ๐๐ ๐ฟ๐ป๐๐ 3 2161 sign ๐ um 2162 sign ๐ ma 2166 sign ๐ ma 8 P510529 obverse 3 231068 line ๐๐ ๐ฟ๐ป๐๐ 3 2272 sign ๐ um 2273 sign ๐ ma 2277 sign ๐ ma 9 P510530 obverse 3 231088 line ๐๐ ๐ญ๐พ๐๐๐พ๐ข๐ ๐ 3 2401 sign ๐ um 2402 sign ๐ ma 2410 sign ๐ ma
If we want to export additional features, we just have to mention them.
In order to do so and not change the result set, put a *
behind the feature.
The *
means: always true, no matter what's in the feature, even if there is nothing in there.
Lets ask for the flags on the first ma
.
query = """
line ln<4
=: sign reading=um
<: sign reading=ma flags*
< sign reading=ma
"""
results = A.search(query)
0.54s 1052 results
The same number of results.
We do the export again and peek at the results.
A.export(results, condenseType="word")
checkout()
R S1 S2 S3 NODE1 TYPE1 ln1 NODE2 TYPE2 TEXT2 reading2 NODE3 TYPE3 TEXT3 flags3 reading3 NODE4 TYPE4 TEXT4 reading4 1 P509373 obverse 3 230790 line 3 11 sign um- um 12 sign ma ma 20 sign ma ma 2 P509374 obverse 3 230826 line 3 359 sign um- um 360 sign [ma] ma 366 sign ma ma 3 P509375 obverse 3 230844 line 3 509 sign [um- um 510 sign ma ma 515 sign ma ma 4 P481190 obverse 3 230916 line 3 1123 sign um- um 1124 sign ma ma 1130 sign ma ma 5 P510526 obverse 3 231004 line 3 1686 sign um- um 1687 sign ma ma 1691 sign ma ma 6 P510527 obverse 3 231022 line 3 1808 sign um- um 1809 sign ma ma 1813 sign ma ma 7 P510528 obverse 3 231055 line 3 2161 sign [um]- um 2162 sign ma# # ma 2166 sign ma ma 8 P510529 obverse 3 231068 line 3 2272 sign um- um 2273 sign ma ma 2277 sign ma ma 9 P510530 obverse 3 231088 line 3 2401 sign [um- um 2402 sign ma ma 2410 sign ma ma
As you see, you have an extra column flags3
.
This gives you a lot of control over the generation of spreadsheets.
You can also export lists of node tuples that are not obtained by a query:
words = F.otype.s("word")[1000:1010]
signs1 = F.otype.s("sign")[100000:100010]
signs2 = F.otype.s("sign")[200000:200010]
tuples = list(zip(words, signs1, signs2))
tuples
[(259163, 100001, 200001), (259164, 100002, 200002), (259165, 100003, 200003), (259166, 100004, 200004), (259167, 100005, 200005), (259168, 100006, 200006), (259169, 100007, 200007), (259170, 100008, 200008), (259171, 100009, 200009), (259172, 100010, 200010)]
Ten rows, each row has a word node and two sign nodes.
The word and the signs in each row do not have any meaningful relationship!
Let's do a bare export:
A.export(tuples)
checkout()
R S1 S2 S3 NODE1 TYPE1 TEXT1 ln1 NODE2 TYPE2 TEXT2 reading2 NODE3 TYPE3 TEXT3 flags3 reading3 1 P510530 obverse 19 259163 word gin2 100001 sign s,i- s,i 200001 sign ka ka 2 P510530 obverse 19 259164 word ku3-babbar_ 100002 sign bu- bu 200002 sign ki- ki 3 P510530 obverse 20 259165 word id-di-nu-ni-ik-ku 100003 sign tam tam 200003 sign ma ma 4 P510530 reverse 1 259166 word u3 100004 sign ia ia 200004 sign ti- ti 5 P510530 reverse 1 259167 word ka-ni-ik 100005 sign ir- ir 200005 sign du du 6 P510530 reverse 1 259168 word szi-ma-tim 100006 sign szi szi 200006 sign ma- ma 7 P510530 reverse 1 259169 word ta-ad-di-in 100007 sign i- i 200007 sign WI?- ? 8 P510530 reverse 2 259170 word mi-nu-u2 100008 sign nu- nu 200008 sign i- i 9 P510530 reverse 2 259171 word a-wa-tum-ma 100009 sign ma ma 200009 sign ma ma
Wait a minute: why are the reading2
and reading3
and flags3
columns showing up?
It is because we have run a query before where we asked for these features.
If we do not want to be influenced by previous things we've run, we need to reset the display:
A.displayReset("tupleFeatures")
Again:
A.export(tuples)
checkout()
R S1 S2 S3 NODE1 TYPE1 TEXT1 NODE2 TYPE2 TEXT2 NODE3 TYPE3 TEXT3 1 P510530 obverse 19 259163 word gin2 100001 sign s,i- 200001 sign ka 2 P510530 obverse 19 259164 word ku3-babbar_ 100002 sign bu- 200002 sign ki- 3 P510530 obverse 20 259165 word id-di-nu-ni-ik-ku 100003 sign tam 200003 sign ma 4 P510530 reverse 1 259166 word u3 100004 sign ia 200004 sign ti- 5 P510530 reverse 1 259167 word ka-ni-ik 100005 sign ir- 200005 sign du 6 P510530 reverse 1 259168 word szi-ma-tim 100006 sign szi 200006 sign ma- 7 P510530 reverse 1 259169 word ta-ad-di-in 100007 sign i- 200007 sign WI?- 8 P510530 reverse 2 259170 word mi-nu-u2 100008 sign nu- 200008 sign i- 9 P510530 reverse 2 259171 word a-wa-tum-ma 100009 sign ma 200009 sign ma
When we exported query results, we could mention features in the query with a *
so that they got exported.
If we do not have a previous query we can achieve the same effect by specifying the desired export features per column.
The display option tupleFeatures
takes care of that.
A.displaySetup(
tupleFeatures=(
(0, "atfpre atf atfpost"),
(1, "symr symu"),
(2, "symr symu"),
)
)
We assign extra features per member of the tuple.
In the above case:
0
) member (the word node), gets features atfpre
(preceding bracketing characters), atf
, atfpost
(subsequent clustering characters);symr
(subscript latin text) and symu
(cuneiform unicode).A.export(tuples)
checkout()
R S1 S2 S3 NODE1 TYPE1 TEXT1 atfpre1 atf1 atfpost1 NODE2 TYPE2 TEXT2 symr2 symu2 NODE3 TYPE3 TEXT3 symr3 symu3 1 P510530 obverse 19 259163 word gin2 gin2 100001 sign s,i- แนฃi ๐ข 200001 sign ka ka ๐ 2 P510530 obverse 19 259164 word ku3-babbar_ ku3-babbar_ 100002 sign bu- bu ๐ 200002 sign ki- ki ๐ 3 P510530 obverse 20 259165 word id-di-nu-ni-ik-ku id-di-nu-ni-ik-ku 100003 sign tam tam ๐ 200003 sign ma ma ๐ 4 P510530 reverse 1 259166 word u3 u3 100004 sign ia ia ๐ 200004 sign ti- ti ๐พ 5 P510530 reverse 1 259167 word ka-ni-ik ka-ni-ik 100005 sign ir- ir ๐ 200005 sign du du ๐บ 6 P510530 reverse 1 259168 word szi-ma-tim szi-ma-tim 100006 sign szi ลกi ๐ 200006 sign ma- ma ๐ 7 P510530 reverse 1 259169 word ta-ad-di-in ta-ad-di-in 100007 sign i- i ๐ฟ 200007 sign WI?- WI ๐ฟ 8 P510530 reverse 2 259170 word mi-nu-u2 mi-nu-u2 100008 sign nu- nu ๐ก 200008 sign i- i ๐ฟ 9 P510530 reverse 2 259171 word a-wa-tum-ma a-wa-tum-ma 100009 sign ma ma ๐ 200009 sign ma ma ๐
You can chain queries like this:
results = (
A.search(
"""
line
cluster type=missing
sign reading=disz repeat>4
"""
)
+ A.search(
"""
line
cluster type=langalt
sign reading=um
"""
)
)
0.17s 4 results 0.17s 15 results
In such cases, it is better to setup the features yourself:
A.displaySetup(
tupleFeatures=(
(0, "lnno"),
(1, "type"),
(2, "langalt missing"),
),
fmt="text-orig-rich",
)
Now we can do a fine export:
A.export(results)
checkout()
R S1 S2 S3 NODE1 TYPE1 TEXT1 lnno1 NODE2 TYPE2 TEXT2 type2 NODE3 TYPE3 TEXT3 langalt3 missing3 1 P510754 obverse 11 235994 line 5(diลก) luโ-meลก an-nu-ti-in luโ la-ga-aลกโผki-meลก 11 207965 cluster 5(diลก) luโ-meลก an-nu-ti- missing 39275 sign 5(diลก) 1 2 P313375 obverse 10 245175 line 1(u) 8(diลก) ginโ kuโ-babbar wa-at-ra-am 10 215367 cluster 1(u) 8(diลก) missing 106850 sign 8(diลก) 1 3 P373035 reverse 3 255006 line ลกa 5(diลก) ginโ kuโ-babbar guโ hi-a 3 223992 cluster ลกa 5(diลก) missing 180799 sign 5(diลก) 1 1 4 P373044 left 1:2 255237 line 1(u) 5(diลก) ma-na siki hi-a ลกi-na 1:2 224274 cluster 1(u) 5(diลก) ma-na missing 182906 sign 5(diลก) 1 1 5 P509373 obverse 6 230793 line diลกโผลกe-ep-dโผsuen aโ-galโ dumu um-mi-a-meลก 6 203234 cluster dโผsuen aโ-galโ dumu um-mi-a-meลก langalt 47 sign um- 1 6 P510550 obverse 4 231520 line aลก-ลกum ha-za-nu-umโผsar 4 203984 cluster ha-za-nu-umโผsar langalt 5899 sign umโผ 1 7 P510550 obverse 7 231523 line ลกum-ma ha-za-nu-umโผsar la ba-aลก-lu-ma 7 203987 cluster ha-za-nu-umโผsar langalt 5924 sign umโผ 1 8 P510550 obverse 9 231525 line ha-za-nu-umโผsar i-ka-am x x-al-la-x-ma 9 203989 cluster ha-za-nu-umโผsar langalt 5941 sign umโผ 1 9 P510550 reverse 1 231528 line a-na mi-nim ha-za-nu-umโผsar 1 203992 cluster ha-za-nu-umโผsar langalt 5971 sign umโผ 1
All chapters:
See the cookbook for recipes for small, concrete tasks.
CC-BY Dirk Roorda