To get started: consult the start tutorial.
Search in Text-Fabric is a template-based way of looking for structural patterns in your dataset.
Within Text-Fabric we have the unique possibility to combine the ease of formulating search templates for complicated syntactical patterns with the power of programmatically processing the results.
This notebook will show you how to get up and running.
Search is as simple as saying (just an example):
results = A.search(template)
A.show(results)
See all ins and outs in the search template docs.
The ins and outs of installing Text-Fabric, getting the corpus, and initializing a notebook are explained in the start tutorial.
%load_ext autoreload
%autoreload 2
from tf.app import use
A = use("clariah/wp6-missieven", hoist=globals())
We start with the simplest form of issuing a query. Let's look for the words in volume 4, page 239, lines 1 to 8:
All work involved in searching takes place under the hood.
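A template is an ordinary Python string, so it can also be built programmatically, for example to run the same query for other pages. The helper below, `make_template`, is hypothetical and not part of Text-Fabric. It relies on the template rule that an atom indented under another atom must be embedded in it, so each level is indented one step deeper than its parent.

```python
def make_template(volume, page, max_line, indent="  "):
    # Hypothetical helper, not part of Text-Fabric: A.search() only needs a string.
    # Each atom is indented one step deeper than the one above it, which in
    # template syntax means "embedded in the node on the previous line".
    atoms = [
        f"volume n={volume}",
        f"page n={page}",
        f"line n<{max_line}",
        "word",
    ]
    return "\n".join(indent * depth + atom for depth, atom in enumerate(atoms))


query = make_template(4, 239, 9)
print(query)
```

With the corpus loaded, a call such as `A.search(make_template(4, 240, 9))` would then run the same query on the next page.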
query = """
volume n=4
page n=239
line n<9
word
"""
results = A.search(query)
A.table(results, skipCols="1 2 3")
1.80s 61 results
n | p | word |
---|---|---|
1 | 4 239:1 | IV. |
2 | 4 239:1 | RYCKLOFF |
3 | 4 239:1 | VAN |
4 | 4 239:1 | GOENS, |
5 | 4 239:1 | CORNELIS |
6 | 4 239:1 | SPEELMAN, |
7 | 4 239:1 | WILLEM |
8 | 4 239:1 | VAN |
9 | 4 239:2 | OUTHOORN, |
10 | 4 239:2 | JOANNES |
11 | 4 239:2 | CAMPHUYS |
12 | 4 239:2 | EN |
13 | 4 239:2 | JACOB |
14 | 4 239:2 | PITS, |
15 | 4 239:2 | BATAVIA |
16 | 4 239:2 | 3 |
17 | 4 239:2 | augustus |
18 | 4 239:3 | 1678. |
19 | 4 239:4 | 1212, |
20 | 4 239:4 | fol. |
21 | 4 239:4 | 699- |
22 | 4 239:4 | 706, |
23 | 4 239:4 | copie |
24 | 4 239:4 | 1220, |
25 | 4 239:4 | fol. |
26 | 4 239:4 | 96- |
27 | 4 239:4 | 109 |
28 | 4 239:5 | Met |
29 | 4 239:5 | enige |
30 | 4 239:5 | Engelse |
31 | 4 239:5 | schepen |
32 | 4 239:5 | uit |
33 | 4 239:5 | Bantam |
34 | 4 239:5 | verzonden. . |
35 | 4 239:6 | |
36 | 4 239:7 | Vgl. |
37 | 4 239:7 | Daghregisters |
38 | 4 239:7 | 1678, |
39 | 4 239:7 | p. |
40 | 4 239:7 | 189- |
41 | 4 239:7 | 421 |
42 | 4 239:7 | het |
43 | 4 239:7 | huurschip |
44 | 4 239:7 | St. |
45 | 4 239:7 | Andries |
46 | 4 239:7 | kwam |
47 | 4 239:7 | in |
48 | 4 239:8 | slechte |
49 | 4 239:8 | staat |
50 | 4 239:8 | uit |
51 | 4 239:8 | patria |
52 | 4 239:8 | te |
53 | 4 239:8 | Batavia |
54 | 4 239:8 | en |
55 | 4 239:8 | wordX |
56 | 4 239:8 | op |
57 | 4 239:8 | Onrust |
58 | 4 239:8 | gerepareerd; |
59 | 4 239:8 | grote |
60 | 4 239:8 | sterfte |
61 | 4 239:8 | op |
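Each result of `A.search()` is a tuple with one node per line of the template, in template order, so every result of the query above holds a volume, a page, a line and a word node. `skipCols="1 2 3"` leaves out the first three of these positions (counted from 1), which is why only the words show up as result columns. The sketch below imitates that column selection in plain Python; `skip_cols` is a hypothetical helper and the node numbers are invented.

```python
# Hypothetical result tuples: one node per template line, in template order
# (volume, page, line, word). The numbers are made up for illustration.
results = [
    (900004, 800239, 700001, 1),
    (900004, 800239, 700001, 2),
]


def skip_cols(rows, skip):
    """Drop the 1-based column positions listed in `skip`, e.g. "1 2 3"."""
    drop = {int(pos) for pos in skip.split()}
    return [
        tuple(node for pos, node in enumerate(row, start=1) if pos not in drop)
        for row in rows
    ]


print(skip_cols(results, "1 2 3"))  # [(1,), (2,)]
```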
The hyperlinks take us to the online image of this page at the Huygens Institute.
Note that we can choose start and/or end points in the results list.
A.table(results, start=44, end=53, skipCols="1 2")
We can show the results more fully with show():
A.show(results, skipCols="1 2 3", condensed=True, condenseType="line")
[output: 8 condensed results, line 1 to line 8]
Now we pick out all numerical words, or rather, words that contain a digit:
query = """
volume n=4
page n=239
line n<9
word trans~[0-9]
"""
results = A.search(query)
A.show(results, skipCols="1 2 3", condensed=True)
2.97s 11 results
[output: 4 condensed results, line 1 to line 4]
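The condition `trans~[0-9]` is a regular expression that, as far as we understand Text-Fabric's template syntax, only has to occur somewhere in the value, like Python's `re.search` rather than `re.fullmatch`. The sketch below applies the same test to the 61 word forms from the first table, assuming that the displayed forms coincide with the values of `trans`. It finds the same number of words (11) that the search reported.

```python
import re

# Word forms of volume 4, page 239, lines 1 to 8, copied from the first table
# (assumption: the displayed forms equal the values of the trans feature).
words = [
    "IV.", "RYCKLOFF", "VAN", "GOENS,", "CORNELIS", "SPEELMAN,", "WILLEM",
    "VAN", "OUTHOORN,", "JOANNES", "CAMPHUYS", "EN", "JACOB", "PITS,",
    "BATAVIA", "3", "augustus", "1678.", "1212,", "fol.", "699-", "706,",
    "copie", "1220,", "fol.", "96-", "109", "Met", "enige", "Engelse",
    "schepen", "uit", "Bantam", "verzonden. .", "", "Vgl.", "Daghregisters",
    "1678,", "p.", "189-", "421", "het", "huurschip", "St.", "Andries",
    "kwam", "in", "slechte", "staat", "uit", "patria", "te", "Batavia", "en",
    "wordX", "op", "Onrust", "gerepareerd;", "grote", "sterfte", "op",
]

# Keep a word if a digit occurs anywhere in it (search semantics).
pattern = re.compile(r"[0-9]")
hits = [w for w in words if pattern.search(w)]
print(len(hits), hits)
```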
Let's look for all places where there is a remark by the editor:
query = """
word isremark
"""
results = A.search(query)
2.63s 2349087 results
We can narrow down to the page we just inspected:
query = """
volume n=4
page n=239
word isremark
"""
results = A.search(query)
1.72s 198 results
and show the results:
A.show(results, condensed=True)
[output: 18 condensed results, line 1 to line 18]
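The narrowing works because indentation in a template expresses embedding: the word must lie inside page 239, which must lie inside volume 4. The volume line matters because page numbers appear to restart in each volume, judging from the passage labels in this notebook. A bare feature name such as `isremark` only requires the feature to have some value, as far as we understand the template syntax. The toy model below imitates both conditions in plain Python; the corpus, its node numbers and its feature values are entirely made up.

```python
# Made-up toy corpus: word node -> location and isremark value
# (None meaning the feature has no value for that word).
words = {
    1: {"volume": 4, "page": 238, "isremark": 1},
    2: {"volume": 4, "page": 239, "isremark": None},
    3: {"volume": 4, "page": 239, "isremark": 1},
    4: {"volume": 5, "page": 239, "isremark": 1},
}


def remark_words_on(volume, page):
    # Mimics the template: volume n=... / page n=... / word isremark.
    # Embedding turns into a check on where the word sits; the bare feature
    # name turns into "has any value".
    return [
        w
        for w, f in words.items()
        if f["volume"] == volume and f["page"] == page and f["isremark"] is not None
    ]


print(remark_words_on(4, 239))  # [3]
```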
How can we look for special characters?
Let's first see what special characters we have in the corpus.
A = use("clariah/wp6-missieven:clone", hoist=globals())
A.specialCharacters()
Special characters in text-orig-full
à
á
â
ä
ç
È
è
É
é
Ê
ê
Ë
ë
Ï
ï
Ó
ó
Ö
ö
Ü
ü
£
§
©
«
¬
®
°
±
»
¼
½
æ
ƒ
—
‘
’
“
”
„
•
…
€
™
∪
≤
≥
⌊
⌋
■
►
♦
If you click on a character in the notebook, it is copied to the clipboard.
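The list above covers the text format `text-orig-full`. One plausible way to arrive at such a list is to collect every non-ASCII character that occurs in the text. The sketch below does that for a few word forms taken from this notebook and sorts the result by code point. We do not claim that `A.specialCharacters()` works this way; its ordering and grouping evidently differ.

```python
def special_characters(texts):
    # Collect every non-ASCII character in the given strings, sorted by
    # code point. An illustration only, not Text-Fabric's implementation.
    chars = {ch for text in texts for ch in text if ord(ch) > 127}
    return sorted(chars)


# A few word forms taken from the search output in this notebook.
sample = ["„■", "■ »", "■voorsz.", "RYCKLOFF"]
print(special_characters(sample))  # ['»', '„', '■']
```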
We can search for all words with a black square:
results = A.search("""
word trans~■
""")
2.53s 37 results
A.table(results, condensed=True)
n | p | line | word |
---|---|---|---|
1 | 1 31:32 | ■ | |
2 | 1 80:27 | ■willen | |
3 | 1 557:3 | ■ | |
4 | 2 641:42 | ■voorsz. | |
5 | 2 660:41 | ■ | |
6 | 3 118:14 | ■witste | |
7 | 3 662:13 | ■ffi. | |
8 | 4 205:4 | ■ . . | |
9 | 4 208:43 | ■ | |
10 | 4 209:7 | ■ ' | |
11 | 4 758:24 | ■ » | |
12 | 5 336:5 | ■ | |
13 | 5 837:28 | „■ | |
14 | 6 375:16 | ■naar | |
15 | 7 489:38 | ■ - | |
16 | 7 622:2 | ■tg | |
17 | 8 66:34 | ■ | |
18 | 8 66:41 | ■ | |
19 | 9 88:17 | ■ | |
20 | 9 88:23 | ■ | |
21 | 9 94:2 | ■ | |
22 | 9 94:5 | ■ | |
23 | 9 94:9 | ■ | |
24 | 9 94:15 | ■ | |
25 | 9 94:17 | ■ | |
26 | 9 94:22 | ■ | |
27 | 9 94:25 | ■ | |
28 | 9 803:16 | ■' | |
29 | 9 803:29 | ■^ | |
30 | 11 434:20 | ■ / | |
31 | 12 436:7 | ■ | |
32 | 12 436:10 | ■ | |
33 | 12 436:11 | ■ | |
34 | 12 436:17 | ■ | |
35 | 12 436:21 | ■ | |
36 | 13 42:4 | ■ | |
37 | 13 194:45 | ■ » | |
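Because search results are ordinary Python data, they can be processed further. For example, we can count the volumes in which the black square occurs. The sketch below does so on the passage labels ("volume page:line") copied from the table above. In a live session one would rather derive the volume from the result nodes themselves. Text-Fabric's `T.sectionFromNode()` is one candidate, although this notebook does not show that route.

```python
from collections import Counter

# Passage labels of the 37 results, copied from the table above.
passages = [
    "1 31:32", "1 80:27", "1 557:3", "2 641:42", "2 660:41", "3 118:14",
    "3 662:13", "4 205:4", "4 208:43", "4 209:7", "4 758:24", "5 336:5",
    "5 837:28", "6 375:16", "7 489:38", "7 622:2", "8 66:34", "8 66:41",
    "9 88:17", "9 88:23", "9 94:2", "9 94:5", "9 94:9", "9 94:15",
    "9 94:17", "9 94:22", "9 94:25", "9 803:16", "9 803:29", "11 434:20",
    "12 436:7", "12 436:10", "12 436:11", "12 436:17", "12 436:21", "13 42:4",
    "13 194:45",
]

# The first space-separated field of a label is the volume number.
per_volume = Counter(int(label.split()[0]) for label in passages)
for volume, count in sorted(per_volume.items()):
    print(f"volume {volume:>2}: {count}")
```

Volume 9 stands out with 11 occurrences, 7 of them on page 94.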
CC-BY Dirk Roorda