To get started: consult start
In a notebook, you can perform searches and view them in a tabular display and zoom in on items with pretty displays.
But there are times that you want to take your results outside Text-Fabric, outside a notebook, outside Python, and just work with them in other programs, such as Excel.
You want to do that not only with query results, but with all kinds of lists of tuples of nodes.
There is a function for that, A.export()
, and here we show what it can do.
%load_ext autoreload
%autoreload 2
import os
from tf.app import use
A = use("etcbc/dss", hoist=globals())
This is Text-Fabric 9.2.2 Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html 67 features found and 1 ignored
We write a function that can peek into file on your system, and show the first few lines. We'll use it to inspect the exported files that we are going to produce.
EXPORT_FILE = os.path.expanduser("~/Downloads/results.tsv")
UPTO = 10
def checkout():
with open(EXPORT_FILE, encoding="utf_16") as fh:
for (i, line) in enumerate(fh):
if i >= UPTO:
break
print(line.rstrip("\n"))
Our exported .tsv
files open in Excel without hassle, even if they contain non-latin characters.
That is because TF writes such files in an
encoding that works well with Excel: utf_16_le
.
You can just open them in Excel, there is no need for conversion before or after opening these files.
Should you want to process these files by means of a (Python) program,
take care to read them with encoding utf_16
.
We first run a query in order to export the results.
query = """
line
word vs=hitpael
sign unc
"""
results = A.search(query)
A.table(results, end=10)
0.86s 399 results
n | p | line | word | sign |
---|---|---|---|---|
1 | CD 12:23 | המחנות המתהלכים באלה בקץ הרשעה עד עמוד משוח משיח אהרן | מתהלכים | מ |
2 | CD 12:23 | המחנות המתהלכים באלה בקץ הרשעה עד עמוד משוח משיח אהרן | מתהלכים | כ |
3 | CD 12:23 | המחנות המתהלכים באלה בקץ הרשעה עד עמוד משוח משיח אהרן | מתהלכים | י |
4 | CD 12:23 | המחנות המתהלכים באלה בקץ הרשעה עד עמוד משוח משיח אהרן | מתהלכים | ם |
5 | CD 15:11 | המשפטים עד עמד׳ו לפני המבקר שמה יתפתה ב׳ו בדרש׳ו את׳ו ׃ | יתפתה | ת |
6 | CD 19:4 | ויתהלכו על פי התורה ׃ וכמשפט היסודים כסרך התורה | יתהלכו | י |
7 | 1QS 7:24 | הרבים ללכת בשרירות לב׳ו לוא ישוב אל עצת היחד עוד ׃ ואיש מאנשי היחד אשר יתערב | יתערב | י |
8 | 1QS 7:24 | הרבים ללכת בשרירות לב׳ו לוא ישוב אל עצת היחד עוד ׃ ואיש מאנשי היחד אשר יתערב | יתערב | ת |
9 | 1QSa 1:11 | ורע ׃ ובכן תקבל להעיד עלי׳ו משפטות התורא ולהתיצב במשמע משפטים ׃ | התיצב | ת |
10 | 1QSb 4:2 | ימנה את׳ו והתערב ל׳ו׃ וכליל # ε אנוש ובתענוגות בני אדם ε ך ׃ | התערב | ה |
You can export the table of results to Excel.
The following command writes a tab-separated file results.tsv
to your downloads directory.
You can specify arguments toDir=directory
and toFile=file name
to write to a different file.
If the directory does not exist, it will be created.
We stick to the default, however.
A.export(results)
Check out the contents:
checkout()
R S1 S2 S3 NODE1 TYPE1 TEXT1 NODE2 TYPE2 TEXT2 vs2 NODE3 TYPE3 TEXT3 unc3 1 CD 12 23 1553232 line המחנות המתהלכים באלה בקץ הרשעה עד עמוד משוח משיח אהרן 1610602 word מתהלכים hitpael 10574 sign מ 2 2 CD 12 23 1553232 line המחנות המתהלכים באלה בקץ הרשעה עד עמוד משוח משיח אהרן 1610602 word מתהלכים hitpael 10578 sign כ 2 3 CD 12 23 1553232 line המחנות המתהלכים באלה בקץ הרשעה עד עמוד משוח משיח אהרן 1610602 word מתהלכים hitpael 10579 sign י 1 4 CD 12 23 1553232 line המחנות המתהלכים באלה בקץ הרשעה עד עמוד משוח משיח אהרן 1610602 word מתהלכים hitpael 10580 sign ם 2 5 CD 15 11 1553289 line המשפטים עד עמד׳ו לפני המבקר שמה יתפתה ב׳ו בדרש׳ו את׳ו ׃ 1611517 word יתפתה hitpael 13025 sign ת 2 6 CD 19 4 1553321 line ויתהלכו על פי התורה ׃ וכמשפט היסודים כסרך התורה 1611956 word יתהלכו hitpael 14285 sign י 1 7 1QS 7 24 1553567 line הרבים ללכת בשרירות לב׳ו לוא ישוב אל עצת היחד עוד ׃ ואיש מאנשי היחד אשר יתערב 1616682 word יתערב hitpael 27838 sign י 1 8 1QS 7 24 1553567 line הרבים ללכת בשרירות לב׳ו לוא ישוב אל עצת היחד עוד ׃ ואיש מאנשי היחד אשר יתערב 1616682 word יתערב hitpael 27839 sign ת 2 9 1QSa 1 11 1553680 line ורע ׃ ובכן תקבל להעיד עלי׳ו משפטות התורא ולהתיצב במשמע משפטים ׃ 1618956 word התיצב hitpael 34422 sign ת 2
You see the following columns:
R
the sequence number of the result tuple in the result listS1 S2 S3
the section as scroll name, fragment, and line number, in separate columnsNODEi TYPEi
the node and its type, for each node i in the result tupleTEXTi
the full text of node i
, if the node type admits a concise text representationvs2
unc3
the value of feature vs
on the word and unc
on the sign,since our query mentions them on those nodes.
If you do not need the full text of the lines, you can leave them out by specifying a smaller condense type.
The export function provides text for all nodes whose type is not too big. What is too big is determined by the condense type.
In this corpus, the default condense type is line. Node types bigger than lines will not get text.
Now, if we change the condenseType
to something smaller than line, e.g. word
, the line text will be suppressed.
A.export(results, condenseType="word")
checkout()
R S1 S2 S3 NODE1 TYPE1 NODE2 TYPE2 TEXT2 vs2 NODE3 TYPE3 TEXT3 unc3 1 CD 12 23 1553232 line 1610602 word מתהלכים hitpael 10574 sign מ 2 2 CD 12 23 1553232 line 1610602 word מתהלכים hitpael 10578 sign כ 2 3 CD 12 23 1553232 line 1610602 word מתהלכים hitpael 10579 sign י 1 4 CD 12 23 1553232 line 1610602 word מתהלכים hitpael 10580 sign ם 2 5 CD 15 11 1553289 line 1611517 word יתפתה hitpael 13025 sign ת 2 6 CD 19 4 1553321 line 1611956 word יתהלכו hitpael 14285 sign י 1 7 1QS 7 24 1553567 line 1616682 word יתערב hitpael 27838 sign י 1 8 1QS 7 24 1553567 line 1616682 word יתערב hitpael 27839 sign ת 2 9 1QSa 1 11 1553680 line 1618956 word התיצב hitpael 34422 sign ת 2
If we want to see the text in another format, we can specify it.
Here is the Abegg encoding.
A.export(results, fmt="text-source-full")
checkout()
R S1 S2 S3 NODE1 TYPE1 TEXT1 NODE2 TYPE2 TEXT2 vs2 NODE3 TYPE3 TEXT3 unc3 1 CD 12 23 1553232 line hmjnwt hmthlkyM balh bqX hrCoh od omwd mCwj mCyj ahrN 1610602 word mthlkyM hitpael 10574 sign m 2 2 CD 12 23 1553232 line hmjnwt hmthlkyM balh bqX hrCoh od omwd mCwj mCyj ahrN 1610602 word mthlkyM hitpael 10578 sign k 2 3 CD 12 23 1553232 line hmjnwt hmthlkyM balh bqX hrCoh od omwd mCwj mCyj ahrN 1610602 word mthlkyM hitpael 10579 sign y 1 4 CD 12 23 1553232 line hmjnwt hmthlkyM balh bqX hrCoh od omwd mCwj mCyj ahrN 1610602 word mthlkyM hitpael 10580 sign M 2 5 CD 15 11 1553289 line hmCpfyM od omd/w lpny hmbqr Cmh ytpth b/w bdrC/w at/w . 1611517 word ytpth hitpael 13025 sign t 2 6 CD 19 4 1553321 line wythlkw ol py htwrh . □ wkmCpf hyswdyM ksrK htwrh 1611956 word ythlkw hitpael 14285 sign y 1 7 1QS 7 24 1553567 line hrbyM llkt bCryrwt lb/w lwa yCwb al oxt hyjd owd . wayC manCy hyjd aCr ytorb 1616682 word ytorb hitpael 27838 sign y 1 8 1QS 7 24 1553567 line hrbyM llkt bCryrwt lb/w lwa yCwb al oxt hyjd owd . wayC manCy hyjd aCr ytorb 1616682 word ytorb hitpael 27839 sign t 2 9 1QSa 1 11 1553680 line wro . wbkN tqbl lhoyd oly/w mCpfwt htwra wlhtyxb bmCmo mCpfyM . 1618956 word htyxb hitpael 34422 sign t 2
If we want to export additional features, we just have to mention them.
In order to do so and not change the result set, put a *
behind the feature.
The *
means: always true, no matter what's in the feature, even if there is nothing in there.
Let's ask for the original lexeme and morph tags and whether the sign is a reconstruction.
query = """
line
word vs=hitpael lexo* morpho*
sign unc rec*
"""
results = A.search(query)
2.39s 399 results
The same number of results.
We do the export again and peek at the results.
A.export(results, condenseType="word")
checkout()
R S1 S2 S3 NODE1 TYPE1 NODE2 TYPE2 TEXT2 lexo2 morpho2 vs2 NODE3 TYPE3 TEXT3 rec3 unc3 1 CD 12 23 1553232 line 1610602 word מתהלכים hlK vtPmpa hitpael 10574 sign מ 2 2 CD 12 23 1553232 line 1610602 word מתהלכים hlK vtPmpa hitpael 10578 sign כ 2 3 CD 12 23 1553232 line 1610602 word מתהלכים hlK vtPmpa hitpael 10579 sign י 1 4 CD 12 23 1553232 line 1610602 word מתהלכים hlK vtPmpa hitpael 10580 sign ם 2 5 CD 15 11 1553289 line 1611517 word יתפתה pth_1 vti3ms hitpael 13025 sign ת 2 6 CD 19 4 1553321 line 1611956 word יתהלכו hlK vti3mp hitpael 14285 sign י 1 7 1QS 7 24 1553567 line 1616682 word יתערב orb_2 vti3ms hitpael 27838 sign י 1 8 1QS 7 24 1553567 line 1616682 word יתערב orb_2 vti3ms hitpael 27839 sign ת 2 9 1QSa 1 11 1553680 line 1618956 word התיצב yxb vtc hitpael 34422 sign ת 2
As you see, you have an extra columns lexo2
, morpho2
and rec3
.
This gives you a lot of control over the generation of spreadsheets.
You can also export lists of node tuples that are not obtained by a query:
words = F.otype.s("word")[1000:1010]
signs1 = F.otype.s("sign")[100000:100010]
signs2 = F.otype.s("sign")[200000:200010]
tuples = list(zip(words, signs1, signs2))
tuples
[(1607869, 100001, 200001), (1607870, 100002, 200002), (1607871, 100003, 200003), (1607872, 100004, 200004), (1607873, 100005, 200005), (1607874, 100006, 200006), (1607875, 100007, 200007), (1607876, 100008, 200008), (1607877, 100009, 200009), (1607878, 100010, 200010)]
Ten rows, each row has a word node and two sign nodes.
The word and the signs in each row do not have any meaningful relationship!
Let's do a bare export:
A.export(tuples)
checkout()
R S1 S2 S3 NODE1 TYPE1 TEXT1 NODE2 TYPE2 TEXT2 lexo2 morpho2 vs2 NODE3 TYPE3 TEXT3 rec3 unc3 1 CD 4 9 1607869 word הקים 100001 sign ד 200001 sign ל 1 2 CD 4 9 1607870 word אל 100002 sign ע 200002 sign ת 1 3 CD 4 9 1607871 word ל 100003 sign ת 200003 sign ר 1 4 CD 4 9 1607872 word ראשנים 100004 sign ׳ 200004 sign י 1 5 CD 4 9 1607873 word ל 100005 sign י 200005 sign ע 1 6 CD 4 9 1607874 word כפר 100006 sign ס 200006 sign ש 1 7 CD 4 10 1607875 word על 100007 sign פ 200007 sign ר 1 8 CD 4 10 1607876 word עונותי׳הם 100008 sign ר 200008 sign ל 1 9 CD 4 10 1607877 word כן 100009 sign ת 200009 sign ׳ 1
Wait a minute: why are the lexo2
and morpho2
and rec3
and unc
columns showing up?
It is because we have run a query before where we asked for these features.
If we do not want to be influenced by previous things we've run, we need to reset the display:
A.displayReset("tupleFeatures")
Again:
A.export(tuples)
checkout()
R S1 S2 S3 NODE1 TYPE1 TEXT1 NODE2 TYPE2 TEXT2 NODE3 TYPE3 TEXT3 1 CD 4 9 1607869 word הקים 100001 sign ד 200001 sign ל 2 CD 4 9 1607870 word אל 100002 sign ע 200002 sign ת 3 CD 4 9 1607871 word ל 100003 sign ת 200003 sign ר 4 CD 4 9 1607872 word ראשנים 100004 sign ׳ 200004 sign י 5 CD 4 9 1607873 word ל 100005 sign י 200005 sign ע 6 CD 4 9 1607874 word כפר 100006 sign ס 200006 sign ש 7 CD 4 10 1607875 word על 100007 sign פ 200007 sign ר 8 CD 4 10 1607876 word עונותי׳הם 100008 sign ר 200008 sign ל 9 CD 4 10 1607877 word כן 100009 sign ת 200009 sign ׳
When we exported query results, we could mention features in the query with a *
so that they got exported.
If we do not have a previous query we can achieve the same effect by specifying the desired export features per column.
The display option tupleFeatures
takes care of that.
A.displaySetup(
tupleFeatures=(
(0, "fulle lexe type"),
(1, "glyphe type"),
(2, "glyphe type"),
)
)
We assign extra features per member of the tuple.
In the above case:
0
) member (the word node), gets features fulle
(full transcription in ETCBC encoding),
glyphe
(just the actual signs), type
(type of word);glyphe
and type
(type of sign).A.export(tuples)
checkout()
R S1 S2 S3 NODE1 TYPE1 TEXT1 fulle1 lexe1 type1 NODE2 TYPE2 TEXT2 glyphe2 type2 NODE3 TYPE3 TEXT3 glyphe3 type3 1 CD 4 9 1607869 word הקים HQJm QWm glyph 100001 sign ד D cons 200001 sign ל L cons 2 CD 4 9 1607870 word אל >L >;L_5 glyph 100002 sign ע < cons 200002 sign ת T cons 3 CD 4 9 1607871 word ל L L: glyph 100003 sign ת T cons 200003 sign ר R cons 4 CD 4 9 1607872 word ראשנים R>#NJm RI>COWn glyph 100004 sign ׳ ' sep 200004 sign י J cons 5 CD 4 9 1607873 word ל L L: glyph 100005 sign י J cons 200005 sign ע < cons 6 CD 4 9 1607874 word כפר KPR KPR glyph 100006 sign ס S cons 200006 sign ש # cons 7 CD 4 10 1607875 word על <L <AL_2 glyph 100007 sign פ P cons 200007 sign ר R cons 8 CD 4 10 1607876 word עונותי׳הם <WNWTJ'Hm <@WOn glyph 100008 sign ר R cons 200008 sign ל L cons 9 CD 4 10 1607877 word כן Kn K.;n_2 glyph 100009 sign ת T cons 200009 sign ׳ ' sep
Suppose we want to find signs between uncertainty brackets with glyph he and signs between alternate brackets with glyph aleph.
If you forgot the exact type of cluster to look for, call up the types of clusters:
F.type.freqList(nodeTypes={"cluster"})
(('rec', 93733), ('vac', 3522), ('cor3', 1582), ('unc2', 906), ('rem2', 706), ('alt', 333), ('cor2', 147), ('cor', 95), ('rem', 75))
We need unc2
.
And how do we type the he and the alef? We could copy paste from a Hebrew text, but we can also use the transcription.
Consult the character table (see the link just below the incantation of the DSS) and see that
alef = a
and he = h
in Abegg's encoding.
Now if we want to have the results combined in one table we can chain two queries like this:
results = (
A.search(
"""
line
cluster type=unc2
sign glypho=h
"""
)
+ A.search(
"""
line
cluster type=alt
sign glypho=a
"""
)
)
1.02s 135 results 1.00s 52 results
In such cases, it is better to setup the features yourself:
A.displaySetup(
tupleFeatures=(
(0, "srcLn"),
(1, "type"),
(2, "glypho rec"),
),
fmt="text-source-full",
)
Now we can do a fine export:
A.export(results)
checkout()
R S1 S2 S3 NODE1 TYPE1 TEXT1 srcLn1 NODE2 TYPE2 TEXT2 type2 NODE3 TYPE3 TEXT3 glypho3 rec3 1 1QM 17 2 1554349 line rCo . watmh zkwrw mCpf ndb wabyhwa bny ahrwN aCr htqdC al bmCpf/M loyny kwl hoM . walozr 1431580 cluster h unc2 62403 sign h h 2 1Q20 6 8 1555643 line wnqbN . badyN lbn/y nCyN nsbt mN bnt aj/y wbnt/y lbny aj/y yhbt kdt jwq olma 1433777 cluster hb unc2 108754 sign h h 3 1Q20 6 23 1555658 line -- waCkjt anh nwj jN rbw wqwCf \\\ -- 1433807 cluster h unc2 109305 sign h h 4 1Q20 11 11 1555723 line adyN anh nwj npqt whlkt baroa lawrk/ha wlpwty/ha -- \\ l\\\\\\\\\N 1433909 cluster h unc2 111194 sign h h 5 1Q20 16 16 1555812 line od fynh nhra \\\\\\\\\\\\l \\\\M \\\\\\\\\\\\\\ 1434061 cluster h unc2 115148 sign h h 6 1Q29 f3_4 2 1556404 line -- yhwh alwhy/kmh -- 1435250 cluster h unc2 130666 sign h h 7 4Q88 9 12 1557399 line pry b\\ gpny/hM wlwa 1436941 cluster /h unc2 144782 sign h h 8 4Q163 f12 8 1557890 line -- kaCr xwh -- 1437896 cluster h unc2 154597 sign h h 9 4Q171 f1_2ii 13 1558347 line kya ba ywm/w . pCr/w ol oryxy hbryt aCr bbyt yhwdh aCr 1438738 cluster h unc2 164401 sign h h
All chapters:
See the cookbook for recipes for small, concrete tasks.
CC-BY Dirk Roorda