To get started: consult start
In a notebook, you can perform searches and view them in a tabular display and zoom in on items with pretty displays.
But there are times that you want to take your results outside Text-Fabric, outside a notebook, outside Python, and just work with them in other programs, such as Excel.
You want to do that not only with query results, but with all kinds of lists of tuples of nodes.
There is a function for that, A.export()
, and here we show what it can do.
%load_ext autoreload
%autoreload 2
import os
from tf.app import use
A = use("Nino-cunei/oldassyrian", hoist=globals())
This is Text-Fabric 9.2.2 Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html 67 features found and 0 ignored
We write a function that can peek into file on your system, and show the first few lines. We'll use it to inspect the exported files that we are going to produce.
EXPORT_FILE = os.path.expanduser("~/Downloads/results.tsv")
UPTO = 10
def checkout():
with open(EXPORT_FILE, encoding="utf_16") as fh:
for (i, line) in enumerate(fh):
if i >= UPTO:
break
print(line.rstrip("\n"))
Our exported .tsv
files open in Excel without hassle, even if they contain non-latin characters.
That is because TF writes such files in an
encoding that works well with Excel: utf_16_le
.
You can just open them in Excel, there is no need for conversion before or after opening these files.
Should you want to process these files by means of a (Python) program,
take care to read them with encoding utf_16
.
We first run a query in order to export the results.
query = """
line ln<4
=: sign reading=um
<: sign reading=ma
< sign reading=ma
"""
results = A.search(query)
1.66s 1384 results
You can export the table of results to Excel.
The following command writes a tab-separated file results.tsv
to your downloads directory.
You can specify arguments toDir=directory
and toFile=file name
to write to a different file.
If the directory does not exist, it will be created.
We stick to the default, however.
A.export(results)
Check out the contents:
checkout()
R S1 S2 S3 NODE1 TYPE1 TEXT1 ln1 NODE2 TYPE2 TEXT2 reading2 NODE3 TYPE3 TEXT3 reading3 NODE4 TYPE4 TEXT4 reading4 1 P361248 obverse 2 865395 line um-ma _dam-gar3_-ru-tum-ma 2 1001 sign um- um 1002 sign ma ma 1007 sign ma ma 2 P360984 obverse 2 865641 line um-ma wa-la2-wa-la2-ma 2 2636 sign um- um 2637 sign ma ma 2642 sign ma ma 3 P360987 obverse 2 865746 line [um]-ma i-ku-pi2-a-ma 2 3179 sign [um]- um 3180 sign ma ma 3185 sign ma ma 4 P361576 obverse 2 866190 line um-ma szu-su2-en6-ma 2 6070 sign um- um 6071 sign ma ma 6075 sign ma ma 5 P360469 obverse 1 866440 line um-ma ku-lu-ma-a-ma 1 7587 sign um- um 7588 sign ma ma 7591 sign ma- ma 6 P360469 obverse 1 866440 line um-ma ku-lu-ma-a-ma 1 7587 sign um- um 7588 sign ma ma 7593 sign ma ma 7 P360470 obverse 3 866464 line um-ma a-szur3-mu-ta-bi4-il5-ma 3 7763 sign um- um 7764 sign ma ma 7771 sign ma ma 8 P360471 obverse 1 866500 line um-ma e-la-ma-ma# 1 8062 sign um- um 8063 sign ma ma 8066 sign ma- ma 9 P360471 obverse 1 866500 line um-ma e-la-ma-ma# 1 8062 sign um- um 8063 sign ma ma 8067 sign ma# ma
You see the following columns:
R
the sequence number of the result tuple in the result listS1 S2 S3
the section as P-number, face, line number, in separate columnsNODEi TYPEi
the node and its type, for each node i
in the result tupleTEXTi
the full text of node i
, if the node type admits a concise text representationreading2-4
the value of feature reading
, since our query mentions the feature reading
on nodes 2-4If you do not need the full text of the lines, you can leave them out by specifying a smaller condense type.
The export function provides text for all nodes whose type is not too big. What is too big is determined by the condense type.
In this corpus, the default condense type is line. Node types bigger than lines will not get text.
Now, if we change the condense type to something smaller than line, e.g. word
, the line text will be suppressed.
A.export(results, condenseType="word")
checkout()
R S1 S2 S3 NODE1 TYPE1 ln1 NODE2 TYPE2 TEXT2 reading2 NODE3 TYPE3 TEXT3 reading3 NODE4 TYPE4 TEXT4 reading4 1 P361248 obverse 2 865395 line 2 1001 sign um- um 1002 sign ma ma 1007 sign ma ma 2 P360984 obverse 2 865641 line 2 2636 sign um- um 2637 sign ma ma 2642 sign ma ma 3 P360987 obverse 2 865746 line 2 3179 sign [um]- um 3180 sign ma ma 3185 sign ma ma 4 P361576 obverse 2 866190 line 2 6070 sign um- um 6071 sign ma ma 6075 sign ma ma 5 P360469 obverse 1 866440 line 1 7587 sign um- um 7588 sign ma ma 7591 sign ma- ma 6 P360469 obverse 1 866440 line 1 7587 sign um- um 7588 sign ma ma 7593 sign ma ma 7 P360470 obverse 3 866464 line 3 7763 sign um- um 7764 sign ma ma 7771 sign ma ma 8 P360471 obverse 1 866500 line 1 8062 sign um- um 8063 sign ma ma 8066 sign ma- ma 9 P360471 obverse 1 866500 line 1 8062 sign um- um 8063 sign ma ma 8067 sign ma# ma
If we want to see the text in another format, we can specify it:
A.export(results, fmt="text-orig-unicode")
checkout()
R S1 S2 S3 NODE1 TYPE1 TEXT1 ln1 NODE2 TYPE2 TEXT2 reading2 NODE3 TYPE3 TEXT3 reading3 NODE4 TYPE4 TEXT4 reading4 1 P361248 obverse 2 865395 line ๐๐ ๐ฎ๐ผ๐๐๐ 2 1001 sign ๐ um 1002 sign ๐ ma 1007 sign ๐ ma 2 P360984 obverse 2 865641 line ๐๐ ๐๐ฒ๐๐ฒ๐ 2 2636 sign ๐ um 2637 sign ๐ ma 2642 sign ๐ ma 3 P360987 obverse 2 865746 line ๐๐ ๐ฟ๐ช๐๐๐ 2 3179 sign ๐ um 3180 sign ๐ ma 3185 sign ๐ ma 4 P361576 obverse 2 866190 line ๐๐ ๐๐ช๐ ๐ 2 6070 sign ๐ um 6071 sign ๐ ma 6075 sign ๐ ma 5 P360469 obverse 1 866440 line ๐๐ ๐ช๐ป๐ ๐๐ 1 7587 sign ๐ um 7588 sign ๐ ma 7591 sign ๐ ma 6 P360469 obverse 1 866440 line ๐๐ ๐ช๐ป๐ ๐๐ 1 7587 sign ๐ um 7588 sign ๐ ma 7593 sign ๐ ma 7 P360470 obverse 3 866464 line ๐๐ ๐๐๐ฌ๐ซ๐๐๐ 3 7763 sign ๐ um 7764 sign ๐ ma 7771 sign ๐ ma 8 P360471 obverse 1 866500 line ๐๐ ๐๐ท๐ ๐ 1 8062 sign ๐ um 8063 sign ๐ ma 8066 sign ๐ ma 9 P360471 obverse 1 866500 line ๐๐ ๐๐ท๐ ๐ 1 8062 sign ๐ um 8063 sign ๐ ma 8067 sign ๐ ma
If we want to export additional features, we just have to mention them.
In order to do so and not change the result set, put a *
behind the feature.
The *
means: always true, no matter what's in the feature, even if there is nothing in there.
Lets ask for the flags on the first ma
.
query = """
line ln<4
=: sign reading=um
<: sign reading=ma flags*
< sign reading=ma
"""
results = A.search(query)
2.08s 1384 results
The same number of results.
We do the export again and peek at the results.
A.export(results, condenseType="word")
checkout()
R S1 S2 S3 NODE1 TYPE1 ln1 NODE2 TYPE2 TEXT2 reading2 NODE3 TYPE3 TEXT3 flags3 reading3 NODE4 TYPE4 TEXT4 reading4 1 P361248 obverse 2 865395 line 2 1001 sign um- um 1002 sign ma ma 1007 sign ma ma 2 P360984 obverse 2 865641 line 2 2636 sign um- um 2637 sign ma ma 2642 sign ma ma 3 P360987 obverse 2 865746 line 2 3179 sign [um]- um 3180 sign ma ma 3185 sign ma ma 4 P361576 obverse 2 866190 line 2 6070 sign um- um 6071 sign ma ma 6075 sign ma ma 5 P360469 obverse 1 866440 line 1 7587 sign um- um 7588 sign ma ma 7591 sign ma- ma 6 P360469 obverse 1 866440 line 1 7587 sign um- um 7588 sign ma ma 7593 sign ma ma 7 P360470 obverse 3 866464 line 3 7763 sign um- um 7764 sign ma ma 7771 sign ma ma 8 P360471 obverse 1 866500 line 1 8062 sign um- um 8063 sign ma ma 8066 sign ma- ma 9 P360471 obverse 1 866500 line 1 8062 sign um- um 8063 sign ma ma 8067 sign ma# ma
As you see, you have an extra column flags3
.
This gives you a lot of control over the generation of spreadsheets.
You can also export lists of node tuples that are not obtained by a query:
words = F.otype.s("word")[1000:1010]
signs1 = F.otype.s("sign")[100000:100010]
signs2 = F.otype.s("sign")[200000:200010]
tuples = list(zip(words, signs1, signs2))
tuples
[(976132, 100001, 200001), (976133, 100002, 200002), (976134, 100003, 200003), (976135, 100004, 200004), (976136, 100005, 200005), (976137, 100006, 200006), (976138, 100007, 200007), (976139, 100008, 200008), (976140, 100009, 200009), (976141, 100010, 200010)]
Ten rows, each row has a word node and two sign nodes.
The word and the signs in each row do not have any meaningful relationship!
Let's do a bare export:
A.export(tuples)
checkout()
R S1 S2 S3 NODE1 TYPE1 TEXT1 ln1 NODE2 TYPE2 TEXT2 reading2 NODE3 TYPE3 TEXT3 flags3 reading3 1 P360982 reverse 3 976132 word szu-pi3-a-ni-ka3 100001 sign num num 200001 sign [a?]- ? a 2 P360982 reverse 3 976133 word lu 100002 sign 1(u) u 200002 sign ta ta 3 P360982 reverse 4 976134 word u2-sze2-szu-bu-szu-nu 100003 sign 5(disz) disz 200003 sign i- i 4 P360982 reverse 5 976135 word _igi_ 100004 sign _gin2 gin2 200004 sign na na 5 P360982 reverse 5 976136 word a-na-na 100005 sign ku3- ku3 200005 sign bu bu 6 P360982 reverse 6 976137 word _igi_ 100006 sign babbar_ babbar 200006 sign x x 7 P360982 reverse 6 976138 word ha-ma-na-ni 100007 sign sza sza 200007 sign [...] 8 P360982 reverse 7 976139 word _igi_ 100008 sign ta- ta 200008 sign a- a 9 P360982 reverse 7 976140 word ut-ni-ih-szu 100009 sign ad- ad 200009 sign na- na
Wait a minute: why are the reading2
and reading3
and flags3
columns showing up?
It is because we have run a query before where we asked for these features.
If we do not want to be influenced by previous things we've run, we need to reset the display:
A.displayReset("tupleFeatures")
Again:
A.export(tuples)
checkout()
R S1 S2 S3 NODE1 TYPE1 TEXT1 NODE2 TYPE2 TEXT2 NODE3 TYPE3 TEXT3 1 P360982 reverse 3 976132 word szu-pi3-a-ni-ka3 100001 sign num 200001 sign [a?]- 2 P360982 reverse 3 976133 word lu 100002 sign 1(u) 200002 sign ta 3 P360982 reverse 4 976134 word u2-sze2-szu-bu-szu-nu 100003 sign 5(disz) 200003 sign i- 4 P360982 reverse 5 976135 word _igi_ 100004 sign _gin2 200004 sign na 5 P360982 reverse 5 976136 word a-na-na 100005 sign ku3- 200005 sign bu 6 P360982 reverse 6 976137 word _igi_ 100006 sign babbar_ 200006 sign x 7 P360982 reverse 6 976138 word ha-ma-na-ni 100007 sign sza 200007 sign [...] 8 P360982 reverse 7 976139 word _igi_ 100008 sign ta- 200008 sign a- 9 P360982 reverse 7 976140 word ut-ni-ih-szu 100009 sign ad- 200009 sign na-
When we exported query results, we could mention features in the query with a *
so that they got exported.
If we do not have a previous query we can achieve the same effect by specifying the desired export features per column.
The display option tupleFeatures
takes care of that.
A.displaySetup(
tupleFeatures=(
(0, "atfpre atf atfpost"),
(1, "symr symu"),
(2, "symr symu"),
)
)
We assign extra features per member of the tuple.
In the above case:
0
) member (the word node), gets features atfpre
(preceding bracketing characters), atf
, atfpost
(subsequent clustering characters);symr
(subscript latin text) and symu
(cuneiform unicode).A.export(tuples)
checkout()
R S1 S2 S3 NODE1 TYPE1 TEXT1 atfpre1 atf1 atfpost1 NODE2 TYPE2 TEXT2 symr2 symu2 NODE3 TYPE3 TEXT3 symr3 symu3 1 P360982 reverse 3 976132 word szu-pi3-a-ni-ka3 szu-pi3-a-ni-ka3 100001 sign num num ๐ 200001 sign [a?]- a ๐ 2 P360982 reverse 3 976133 word lu lu 100002 sign 1(u) 1(u) ๐ 200002 sign ta ta ๐ซ 3 P360982 reverse 4 976134 word u2-sze2-szu-bu-szu-nu u2-sze2-szu-bu-szu-nu 100003 sign 5(disz) 5(diลก) ๐ 200003 sign i- i ๐ฟ 4 P360982 reverse 5 976135 word _igi_ _igi_ 100004 sign _gin2 ginโ ๐ 200004 sign na na ๐พ 5 P360982 reverse 5 976136 word a-na-na a-na-na 100005 sign ku3- kuโ ๐ฌ 200005 sign bu bu ๐ 6 P360982 reverse 6 976137 word _igi_ _igi_ 100006 sign babbar_ babbar ๐ 200006 sign x x x 7 P360982 reverse 6 976138 word ha-ma-na-ni ha-ma-na-ni 100007 sign sza ลกa ๐ญ 200007 sign [...] โฆ โฆ 8 P360982 reverse 7 976139 word _igi_ _igi_ 100008 sign ta- ta ๐ซ 200008 sign a- a ๐ 9 P360982 reverse 7 976140 word ut-ni-ih-szu ut-ni-ih-szu 100009 sign ad- ad ๐ 200009 sign na- na ๐พ
You can chain queries like this:
results = (
A.search(
"""
line
cluster type=missing
sign reading=disz repeat>4
"""
)
+ A.search(
"""
line
cluster type=langalt
sign reading=um
"""
)
)
0.60s 99 results 0.62s 1 result
In such cases, it is better to setup the features yourself:
A.displaySetup(
tupleFeatures=(
(0, "lnno"),
(1, "type"),
(2, "langalt missing"),
),
fmt="text-orig-rich",
)
Now we can do a fine export:
A.export(results)
checkout()
R S1 S2 S3 NODE1 TYPE1 TEXT1 lnno1 NODE2 TYPE2 TEXT2 type2 NODE3 TYPE3 TEXT3 langalt3 missing3 1 P361431 envelope - reverse 9 865864 line 3(aลก) guโ 5(u) 5(diลก) ma-na uruda sagโโ 9 766932 cluster 3(aลก) guโ 5(u) 5(diลก) ma-na missing 3875 sign 5(diลก) 1 2 P360467 envelope - obverse 5 866400 line 8(diลก) ma-na kuโ-babbar แนฃa-ru-paโ-am 5 767260 cluster 8(diลก) ma- missing 7424 sign 8(diลก) 1 3 P360499 envelope 14 867010 line 1(diลก) ma-na 1(u) 5(diลก) ginโ 14 767648 cluster 1(u) 5(diลก) missing 11267 sign 5(diลก) 1 4 P360526 obverse 5 867983 line 1/2(diลก) ma-na 7(diลก) 1/2(diลก) ginโ 5 768250 cluster 7(diลก) missing 16045 sign 7(diลก) 1 5 P360604 obverse 1 869813 line 5(diลก) qaโ-qaโ-ra-tim 1 769396 cluster 5(diลก) missing 28635 sign 5(diลก) 1 6 P360670 reverse 11 871247 line 6(diลก) ku-ta-nu 1/2(diลก) ma-na-ta 3(diลก) ma-na 11 770396 cluster 6(diลก) ku- missing 38736 sign 6(diลก) 1 7 P360674 obverse 8 871396 line 6(diลก) anลกe 7(diลก) ma-na an-na 8 770505 cluster 7(diลก) missing 40286 sign 7(diลก) 1 8 P360705 reverse 3 872530 line 8(diลก) 1/3(diลก) ma-na an-na a-na ลกu-ma-a 3 771172 cluster 8(diลก) 1/3(diลก) missing 50915 sign 8(diลก) 1 9 P361357 obverse 46 875282 line 7(diลก) da-a-a-num ลกa diโ-na-am i-na 46 773068 cluster 7(diลก) da- missing 71923 sign 7(diลก) 1
All chapters:
See the cookbook for recipes for small, concrete tasks.
CC-BY Dirk Roorda