To get started: consult start
A research group at VU University Amsterdam (Piek Vossen, Sophie Arnoult) has applied a Named Entity Recognition (NER) algorithm to this corpus and delivered the results as Text-Fabric features in cltl/voc-missives.
We can use these shared features: they are in export/tf, and we see that they have been produced against version 1.0 of the corpus data.
See entityProto for an exploration of these entities.
Based on that we have created ent nodes for entity occurrences and entity nodes for collections of ent nodes that have the same entity id and entity kind.
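To make the grouping concrete, here is a minimal sketch in plain Python. It is not the code that built the corpus. It only illustrates how occurrence records that share an entity id and entity kind end up in one collection. The node numbers and the helper name `group_occurrences` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical occurrence records: (ent node, entity id, entity kind).
# The node numbers are invented for illustration only.
occurrences = [
    (101, "pieter.both", "PER"),
    (102, "japan", "LOC"),
    (103, "pieter.both", "PER"),
    (104, "japan", "LOC"),
    (105, "pieter.both", "PER"),
]


def group_occurrences(records):
    """Group ent nodes by the pair (entity id, entity kind)."""
    groups = defaultdict(list)
    for node, eid, kind in records:
        groups[(eid, kind)].append(node)
    return dict(groups)


entities = group_occurrences(occurrences)
for (eid, kind), nodes in sorted(entities.items()):
    print(f"{kind} {eid}: {len(nodes)} occurrences {nodes}")
```

Each key of `entities` plays the role of an entity node, and the list behind it plays the role of its ent nodes.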
%load_ext autoreload
%autoreload 2
from tf.app import use
A = use("CLARIAH/wp6-missieven", checkout="latest", hoist=globals())
Locating corpus resources ...
| 1.22s T otype from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 12s T oslots from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.53s T transn from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 11s T punc from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.99s T n from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 6.26s T punco from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 4.51s T puncr from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.41s T puncn from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.00s T title from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 5.42s T transr from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 7.79s T transo from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 14s T trans from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| | 0.21s C __levels__ from otype, oslots, otext
| | 33s C __order__ from otype, oslots, __levels__
| | 1.24s C __rank__ from otype, __order__
| | 28s C __levUp__ from otype, oslots, __rank__
| | 5.25s C __levDown__ from otype, __levUp__, __rank__
| | 2.71s C __characters__ from otext
| | 11s C __boundary__ from otype, oslots, __rank__
| | 1.27s C __sections__ from otype, oslots, otext, __levUp__, __levels__, n, n, n
| | 0.51s C __structure__ from otype, oslots, otext, __rank__, __levUp__, n, title, n
| 0.00s T author from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.00s T authorFull from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.05s T col from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.00s T day from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.05s T eid from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.03s T eoccs from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.00s T isden from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.01s T isemph from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.04s T isfolio from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.35s T isnote from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.00s T isnum from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 5.37s T isorig from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.01s T isq from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.03s T isref from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 3.77s T isremark from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.00s T isspecial from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.00s T issub from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.02s T issuper from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.00s T isund from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.04s T kind from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.02s T mark from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.00s T month from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.05s T note from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.00s T page from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.00s T place from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.00s T rawdate from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.07s T row from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.00s T seq from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.00s T status from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.00s T vol from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.13s T weblink from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.02s T x from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
| 0.00s T year from ~/text-fabric-data/github/CLARIAH/wp6-missieven/tf/1.0e
Name | # of nodes | # slots / node | % coverage |
---|---|---|---|
volume | 14 | 426954.79 | 100 |
letter | 607 | 9847.39 | 100 |
page | 11215 | 532.98 | 100 |
table | 491 | 137.91 | 1 |
para | 34773 | 100.79 | 59 |
remark | 24110 | 97.49 | 39 |
head | 607 | 31.12 | 0 |
note | 12476 | 16.88 | 4 |
line | 526918 | 11.34 | 100 |
row | 8350 | 8.10 | 1 |
entity | 4659 | 6.26 | 0 |
folio | 7899 | 2.63 | 0 |
cell | 32302 | 2.09 | 1 |
ent | 17756 | 1.64 | 0 |
subhead | 1864 | 1.42 | 0 |
word | 5977367 | 1.00 | 100 |
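The coverage column can be checked against the other two columns. The sketch below assumes that coverage is the number of nodes times the average number of slots per node, divided by the total number of slots (the word count) and rounded to a whole percentage. That formula is an assumption, not something the table states; the figures are copied from the table.

```python
# Total number of slots: the number of word nodes in the table.
TOTAL_SLOTS = 5977367

# Figures copied from the table: (number of nodes, average slots per node).
node_types = {
    "para": (34773, 100.79),
    "remark": (24110, 97.49),
    "note": (12476, 16.88),
    "table": (491, 137.91),
}


def coverage(n_nodes, avg_slots, total_slots=TOTAL_SLOTS):
    """Assumed formula: percentage of all slots covered by one node type."""
    return round(100 * n_nodes * avg_slots / total_slots)


for name, (n_nodes, avg_slots) in node_types.items():
    print(f"{name}: {coverage(n_nodes, avg_slots)}%")
```

Under this assumption the computed values agree with the table for these four types (59, 39, 4 and 1 percent), which also explains why sparse types such as ent and entity round down to 0.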
Corpus: General Missives Dutch East India Company 1600-1800 (CLARIAH/wp6-missieven, version 1.0e)
DOI: 10.5281/zenodo.4011801
Source: http://resources.huygens.knaw.nl/retroboeken/generalemissiven
The following snippet shows how the entity and ent nodes hang together.
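# Take the first entity node and follow its eoccs edges to its ent nodes (the occurrences).
firstEntity = F.otype.s("entity")[0]
entityOccurrences = E.eoccs.f(firstEntity)
print(f"entity {firstEntity} is {F.kind.v(firstEntity)} {F.eid.v(firstEntity)} having {len(entityOccurrences)} occs")
# Each occurrence carries the same kind and eid as the entity it belongs to.
for eo in entityOccurrences:
    print(f"ent {eo} is {F.kind.v(eo)} {F.eid.v(eo)} {A.sectionStrFromNode(eo)}")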
entity 6656750 is PER pieter.both having 20 occs
ent 6638994 is PER pieter.both 1 3:1
ent 6639002 is PER pieter.both 1 3:1
ent 6639005 is PER pieter.both 1 3:1
ent 6639023 is PER pieter.both 1 7:1
ent 6639045 is PER pieter.both 1 8:1
ent 6639063 is PER pieter.both 1 16:1
ent 6639067 is PER pieter.both 1 16:1
ent 6639073 is PER pieter.both 1 17:1
ent 6639084 is PER pieter.both 1 18:1
ent 6639086 is PER pieter.both 1 19:1
ent 6639105 is PER pieter.both 1 20:1
ent 6639109 is PER pieter.both 1 20:1
ent 6639115 is PER pieter.both 1 20:1
ent 6639116 is PER pieter.both 1 21:1
ent 6639136 is PER pieter.both 1 27:1
ent 6639138 is PER pieter.both 1 27:1
ent 6639141 is PER pieter.both 1 29:1
ent 6639169 is PER pieter.both 1 33:1
ent 6639183 is PER pieter.both 1 37:1
ent 6639217 is PER pieter.both 1 39:1
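The printed occurrences can be tallied further without going back to the corpus. The sketch below counts occurrences per page from the section strings in the output. It assumes that a section string reads "volume page:line", which is an interpretation of the output rather than something stated here; the helper name `page_of` is made up.

```python
from collections import Counter

# Section strings copied from the output, one per ent node.
sections = [
    "1 3:1", "1 3:1", "1 3:1", "1 7:1", "1 8:1",
    "1 16:1", "1 16:1", "1 17:1", "1 18:1", "1 19:1",
    "1 20:1", "1 20:1", "1 20:1", "1 21:1", "1 27:1",
    "1 27:1", "1 29:1", "1 33:1", "1 37:1", "1 39:1",
]


def page_of(section):
    """Return (volume, page), assuming the format 'volume page:line'."""
    volume, rest = section.split(" ")
    page, _line = rest.split(":")
    return int(volume), int(page)


per_page = Counter(page_of(s) for s in sections)
for (volume, page), count in sorted(per_page.items()):
    print(f"volume {volume}, page {page}: {count}")
```

Inside a live session the same tally could be made on the nodes directly, but working from the printed strings keeps the example independent of a loaded corpus.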
Here we show the NER API that is built into Text-Fabric.
NE = A.makeNer()
results = NE.filterContent(eVals=("japan", "LOC"))
74 lines
NE.showContent(results, start=20)
CC-BY Dirk Roorda