STAM is a data model, and accompanying tooling, for stand-off text annotation that allows researchers and developers to model annotations on text.
An annotation is any kind of remark, classification or tag on any particular portion(s) of a text, on the resource or annotation set as a whole (in which case we can interpret annotations as metadata), or on another annotation (higher-order annotation).
Examples include linguistic annotation, structure/layout annotation, editorial annotation, technical annotation, or whatever else comes to mind. STAM does not define any vocabularies whatsoever. Instead, it provides a framework upon which you can model your annotations using whatever vocabulary you see fit.
The model is thoroughly explained in its specification document. We summarize only the most important data structures here; these have direct counterparts (classes) in the Python library we will be teaching in this tutorial:
Annotation
- An instance of annotation. Associated with an annotation is a Selector to select the target of the annotation, and one or more AnnotationData instances that hold the body or content of the annotation. This is explicitly decoupled from the annotation instance itself, as multiple annotations may hold the very same content.
Selector
- A selector identifies the target of an annotation and the part of the target that the annotation applies to. There are multiple types, which are described in the specification. The TextSelector is an important one that selects a target resource and a specific text selection within it by specifying an offset.
AnnotationData
- A key/value pair that acts as body or content for one or more annotations. The key is a reference to a DataKey, the value is a DataValue. (In certain annotation paradigms, the term feature is also used for this.)
DataKey
- A key as referenced by AnnotationData.
DataValue
- A value with some type information (e.g. string, integer, float).
TextResource
- A textual resource that is made available for annotation. This holds the actual textual content.
TextSelection
- A particular selection of text within a resource, i.e. a subslice of the text.
AnnotationDataSet
- An Annotation Data Set stores the keys (DataKey) and values (AnnotationData) that are used by annotations. It effectively defines a certain vocabulary, i.e. key/value pairs. How broad or narrow the scope of the vocabulary is, is not defined by STAM but is entirely up to the user.
AnnotationStore
- The annotation store is essentially your workspace: it holds all resources, annotation sets (i.e. keys and annotation data) and of course the actual annotations. In the Python implementation it is a memory-based store and you can put as much as you like into it (as long as it fits in memory).
STAM is more than just a theoretical model; we offer practical implementations that allow you to work with it directly. In this tutorial we will be using Python and the Python library stam.
Note: The STAM Python library is a so-called Python binding to a STAM library written in Rust. This means the library is not written in Python but is compiled to machine code and as such offers much better performance.
First of all, you will need to install the STAM Python library from the Python Package Index as follows:
!pip install stam
Requirement already satisfied: stam in ./env/lib/python3.12/site-packages (0.7.0)
text = """
# Consider Phlebas
$ author=Iain M. Banks
## 1
Everything about us,
everything around us,
everything we know [and can know of] is composed ultimately of patterns of nothing;
that’s the bottom line, the final truth.
So where we find we have any control over those patterns,
why not make the most elegant ones, the most enjoyable and good ones,
in our own terms?
## 2
Besides,
it left the humans in the Culture free to take care of the things that really mattered in life,
such as [sports, games, romance,] studying dead languages,
barbarian societies and impossible problems,
and climbing high mountains without the aid of a safety harness.
"""
This format of the text for STAM is in no way prescribed other than:
Before we can do anything we need to import the STAM library:
import stam
Let's add this text as a resource to an annotation store so we can annotate it:
store = stam.AnnotationStore(id="tutorial")
resource_banks = store.add_resource(id="banks", text=text)
Here we passed the text as a string, but it could just as well have been an external text file instead, the filename of which can be passed via the file= keyword argument.
Our example text is a bit Markdown-like: we have a title header "Consider Phlebas", and two subheaders (1 and 2), each containing one quote from the book.
As our first annotations, let's annotate this coarse structure. At this point we already need some vocabulary to express the notions of title header, section header and quote, as STAM does not define any vocabulary. It is up to you to choose how to represent the data.
An annotation data set effectively defines a vocabulary. Let's invent our own simple Annotation Data Set that defines the keys and values we use in this tutorial. In our AnnotationDataSet we can define a DataKey with ID structuretype, and have it take values like titleheader, sectionheader and quote.
We can explicitly add the set and the key. We give the dataset a public ID (tutorial-set), just as we previously assigned a public ID to both the annotation store (tutorial) and the text resource (banks). It is good practice to assign IDs, though you can also let the library auto-generate them for you:
dataset = store.add_dataset("tutorial-set")
key_structuretype = dataset.add_key("structuretype")
To annotate the title header, we need to select the part of the text where it occurs by finding its offset, which consists of a begin and an end position. STAM follows the same indexing convention as Python: positions are 0-indexed Unicode code points (as opposed to UTF-8 bytes) and the end is non-inclusive. After some clumsy manual counting on the source text we find that the following coordinates hold:
assert text[1:19] == "# Consider Phlebas"
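The offset convention just described can be tried out with plain Python string slicing, independently of the stam library. This is only an illustration of the half-open, code-point-based convention; the short sample string is ours, not part of the tutorial data:

```python
# Plain-Python illustration (no stam involved) of the offset convention:
# 0-indexed Unicode code points, end position not included.
sample = "\n# Consider Phlebas\n"

begin, end = 1, 19
selection = sample[begin:end]

# With an exclusive end, the length of a selection is simply end - begin.
assert len(selection) == end - begin == 18

# Adjacent half-open spans [a, b) and [b, c) neither overlap nor leave a gap,
# so together they cover exactly [a, c).
middle = 10
assert sample[begin:middle] + sample[middle:end] == selection

print(repr(selection))
```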
And we make the annotation:
annotation = store.annotate(
target=stam.Selector.textselector(resource_banks, stam.Offset.simple(1,19)),
data={"id": "Data1", "key": key_structuretype, "value": "titleheader", "set": dataset },
id="Annotation1")
A fair amount happened there. We selected a part of the text of resource_banks by offset, and associated AnnotationData with the annotation saying that the structuretype key has the value titleheader, both of which we invented as part of our AnnotationDataSet with ID tutorial-set. Lastly, we assigned an ID to both the AnnotationData and the Annotation as a whole. In this example we reused some of the variables we had created earlier, but we could also have written it out in full, as shown below:
annotation = store.annotate(
target=stam.Selector.textselector(resource_banks, stam.Offset.simple(1,19)),
data={"id": "Data1", "key": "structuretype", "value": "titleheader", "set": "tutorial-set" },
id="Annotation1")
This would also have been perfectly fine, and moreover, it would also work without us explicitly creating the AnnotationDataSet and the key as we did before! Those would have been created automatically on-the-fly. The only disadvantage is that more lookups are needed under the hood, so this is slightly less performant than passing Python variables.
We can inspect the annotation we just added:
print("Annotation ID: ", annotation.id())
print("Target text: ", str(annotation))
print("Data: ")
for data in annotation.data():
    print(" - Data ID: ", data.id())
    print("   Data Key: ", data.key().id())
    print("   Data Value: ", str(data.value()))
Annotation ID: Annotation1
Target text: # Consider Phlebas
Data:
 - Data ID: Data1
   Data Key: structuretype
   Data Value: titleheader
In the above example, we obtained an Annotation instance from the return value of the annotate() method. Once an annotation is in the store, we can retrieve it simply by its public ID using the annotation() method. An exception will be raised if the ID does not exist.
annotation = store.annotation("Annotation1")
A similar pattern holds for almost all other data structures in the STAM model:
dataset = store.dataset("tutorial-set") #AnnotationDataSet
resource_banks = store.resource("banks") #TextResource
key_structuretype = dataset.key("structuretype") #DataKey
data = dataset.annotationdata("Data1") #AnnotationData
There are also shortcut methods available to get keys and data directly from a store, without needing to first retrieve a dataset yourself:
key_structuretype = store.key("tutorial-set","structuretype") #DataKey
data = store.annotationdata("tutorial-set","Data1") #AnnotationData
find_text()
We now continue by adding annotations for the two section headers. Counting offsets manually is rather cumbersome, so we use the find_text() method on TextResource to find our target for annotation:
results = resource_banks.find_text("## 1")
section1 = results[0]
print(f"Text {str(section1)} found at {section1.begin()}:{section1.end()}")
annotation = store.annotate(
target=stam.Selector.textselector(resource_banks, section1.offset()),
data={"id": "Data2", "key": "structuretype", "value": "sectionheader", "set": "tutorial-set" },
id="Annotation2")
Text ## 1 found at 44:48
The find_text() method returns a list of TextSelection instances. These carry an Offset, which is returned by the offset() method. Hooray, no more manual counting!
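To make the idea behind find_text() concrete, here is a minimal plain-Python sketch that returns (begin, end) offsets for every occurrence of a fragment. It is not how stam implements the search; the function name find_all is hypothetical, and reporting only non-overlapping matches is our own assumption:

```python
def find_all(text: str, fragment: str) -> list[tuple[int, int]]:
    """Hypothetical helper: (begin, end) offsets of each non-overlapping occurrence."""
    if not fragment:
        raise ValueError("fragment must not be empty")
    offsets = []
    start = 0
    while (begin := text.find(fragment, start)) != -1:
        end = begin + len(fragment)  # exclusive end, as in STAM and Python
        offsets.append((begin, end))
        start = end  # resume the search after this match
    return offsets


sample = "## 1\nfoo\n## 2\nbar\n"
print(find_all(sample, "## "))
```

Each (begin, end) pair plays the role that the Offset of a TextSelection plays in stam, so sample[begin:end] always gives back the fragment.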
We do the same for the last header:
results = resource_banks.find_text("## 2")
section2 = results[0]
print(f"Text {str(section2)} found at {section2.begin()}:{section2.end()}")
annotation = store.annotate(
target=stam.Selector.textselector(resource_banks, section2.offset()),
data={"id": "Data2", "key": "structuretype", "value": "sectionheader", "set": "tutorial-set" },
id="Annotation3")
Text ## 2 found at 365:369
In the previous code, the attentive reader may have noticed that we are reusing the Data2 ID rather than introducing a new Data3 ID, because the data for both Annotation2 and Annotation3 is, in fact, identical.
This is an important feature of STAM: annotations and their data are decoupled precisely because the data may be referenced by multiple annotations, and if that's the case, we only want to keep the data in memory once. We don't want a copy for every annotation. Say we have AnnotationData with key structuretype and value word, and use that to tag all words in the text; without this decoupling between data and annotations there would be a huge amount of redundancy. The fact that they all share the same data also enables us to quickly look up all those annotations via a reverse index that is kept internally:
for annotationdata in store.data(set="tutorial-set", key="structuretype", value="sectionheader"):
    for annotation in annotationdata.annotations():
        assert annotation.id() in ("Annotation2","Annotation3")
This can also be done in one go, which is typically more performant:
for annotation in store.data(set="tutorial-set", key="structuretype", value="sectionheader").annotations():
    assert annotation.id() in ("Annotation2","Annotation3")
Here we used data() on the store as a whole; this method provides an easy way to retrieve data from scratch. We could also have started from an annotation dataset, or even a key within it, if we already have an instance of it. In that case we use the data() method and pass the key (DataKey), which will act as a filter:
key = dataset.key("structuretype")
for annotation in dataset.data(key, value="sectionheader").annotations():
    assert annotation.id() in ("Annotation2","Annotation3")
However, since we have the key already it is simpler and more performant to use it directly and reduce the example to the following:
key = dataset.key("structuretype")
for annotation in key.data(value="sectionheader").annotations():
    assert annotation.id() in ("Annotation2","Annotation3")
The ability to use any STAM object as a starting point for retrieving other objects is a characteristic of the API. The ability to pass arbitrary objects as a filter is another characteristic that you will find on multiple methods.
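The decoupling of data from annotations, and the reverse index that makes the lookups above fast, can be mimicked in a few lines of plain Python. This toy only shows the principle; the classes Data and ToyStore and their methods are hypothetical and say nothing about how stam stores things internally:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Data:
    """Hypothetical stand-in for AnnotationData: an immutable key/value pair."""
    key: str
    value: object


class ToyStore:
    """Hypothetical store that keeps every distinct Data only once."""

    def __init__(self):
        self.data = {}                    # Data -> the single shared instance
        self.annotations = {}             # annotation id -> (offset, [Data, ...])
        self.reverse = defaultdict(list)  # Data -> ids of annotations using it

    def annotate(self, annotation_id, offset, items):
        shared = []
        for key, value in items:
            candidate = Data(key, value)
            # Reuse an existing equal instance instead of storing a copy.
            shared.append(self.data.setdefault(candidate, candidate))
        self.annotations[annotation_id] = (offset, shared)
        for data in shared:
            self.reverse[data].append(annotation_id)

    def annotations_with(self, key, value):
        # Reverse-index lookup: no need to scan all annotations.
        return list(self.reverse.get(Data(key, value), []))


toy = ToyStore()
toy.annotate("Annotation2", (44, 48), [("structuretype", "sectionheader")])
toy.annotate("Annotation3", (365, 369), [("structuretype", "sectionheader")])
print(toy.annotations_with("structuretype", "sectionheader"))
```

Both annotations end up pointing at one and the same Data object, and finding "all annotations with this data" is a single dictionary lookup rather than a scan.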
The data() method can also be used to search for all values indiscriminately: simply omit the value keyword parameter. Moreover, it can be used to search for non-exact values, using the following keyword arguments:
value_not
- Negates a value
value_greater
- Value must be greater than specified (int or float)
value_less
- Value must be less than specified (int or float)
value_greatereq
- Value must be greater than or equal to specified (int or float)
value_lesseq
- Value must be less than or equal to specified (int or float)
value_in
- Value must match any in the tuple (this is a logical OR statement)
value_not_in
- Value must not match any in the tuple
value_in_range
- Must be a numeric 2-tuple with min and max (inclusive) values
value_not_in_range
- Must be a numeric 2-tuple with min and max (inclusive) values; the value must fall outside this range
The data() method takes filter parameters as positional arguments. You can pass as many as you like. The object you pass as a filter determines what is being filtered: you can pass a DataKey instance, an AnnotationData instance, or even an Annotation. You can also pass the result of earlier data or annotation requests (Data, Annotations). If you want to filter against one/any of multiple values, use a tuple or list of any homogeneous type.
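To pin down what the value filters listed above mean, here is a hypothetical plain-Python predicate that evaluates them for a single value. It only mirrors the semantics as described in the list; the function value_matches is not part of stam, and combining several filters with a logical AND is our assumption:

```python
def value_matches(value, **filters) -> bool:
    """Hypothetical evaluation of data()-style value filters (all must hold)."""
    checks = {
        "value": lambda v, x: v == x,
        "value_not": lambda v, x: v != x,
        "value_greater": lambda v, x: v > x,
        "value_less": lambda v, x: v < x,
        "value_greatereq": lambda v, x: v >= x,
        "value_lesseq": lambda v, x: v <= x,
        "value_in": lambda v, x: v in x,                 # logical OR over x
        "value_not_in": lambda v, x: v not in x,
        "value_in_range": lambda v, x: x[0] <= v <= x[1],  # inclusive bounds
        "value_not_in_range": lambda v, x: not (x[0] <= v <= x[1]),
    }
    for name, argument in filters.items():
        if name not in checks:
            raise TypeError(f"unknown filter: {name}")
        if not checks[name](value, argument):
            return False
    return True


# Which of our 21 line numbers fall in the inclusive range 6..8?
print([n for n in range(1, 22) if value_matches(n, value_in_range=(6, 8))])
```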
Searching for data and then retrieving the corresponding annotations is a very common operation and easily accomplished by simply adding .annotations(), as we've seen in the above examples.
We can apply data filtering operations directly to annotations() using the same keyword arguments we saw for data(). The following example provides results identical to the earlier one, but the way of getting there is slightly different: it takes all annotations first and tests the data filter on each, whereas the other example takes the data first and goes over all annotations that make use of the data:
key = dataset.key("structuretype")
for annotation in store.annotations(key, value="sectionheader"):
    assert annotation.id() in ("Annotation2","Annotation3")
If you're interested in the underlying text selections, you can simply add .textselections(). This chaining of methods on collections is one of the characteristics of the STAM API.
Now we will annotate the quotes themselves. The first one starts after the first subheader (Annotation2) and ends just before the next subheader (Annotation3). That would include some ugly leading and trailing whitespace/newlines, though. We use the textselection() method to obtain a text selection for our computed offset and subsequently strip the whitespace using the strip_text() method, effectively shrinking our text selection a bit:
quote1_selection = resource_banks.textselection(stam.Offset.simple(section1.end(), section2.begin() - 1)).strip_text(" \t\r\n")
quote1 = store.annotate(
target=stam.Selector.textselector(resource_banks, quote1_selection.offset()),
data={"id": "Data3", "key": "structuretype", "value": "quote", "set": "tutorial-set" },
id="AnnotationQuote1")
The second quote goes until the end of the text, which we can retrieve using the textlen() method. This method is preferred over doing things in native Python like len(str(resource_banks)) because it is much more efficient:
quote2_selection = resource_banks.textselection(stam.Offset.simple(section2.end(), resource_banks.textlen())).strip_text(" \t\r\n")
quote2 = store.annotate(
target=stam.Selector.textselector(resource_banks, quote2_selection.offset()),
data={"id": "Data3", "set": "tutorial-set"},
id="AnnotationQuote2")
In this example we also show that, since we reference existing AnnotationData, just specifying the ID and the set suffices. Even shorter and better, you could pass a variable that is an instance of AnnotationData.
There is another structural type we could annotate: the lines, with their corresponding line numbers. This is easy to do by splitting the text on newlines, for which we use the split_text() method on TextResource. As you can see, various Python methods such as split(), strip() and find() have counterparts in STAM with a *_text() suffix, which return TextSelection instances that carry offset information:
for linenr, line in enumerate(resource_banks.split_text("\n")):
    linenr += 1 #make it 1-indexed as is customary for line numbers
    print(f"Line {linenr}: {str(line)}")
    store.annotate(
        target=stam.Selector.textselector(resource_banks, line.offset()),
        data=[
            {"id": "Data4", "key": "structuretype", "value": "line", "set": "tutorial-set" },
            {"id": f"DataLine{linenr}", "key": "linenr", "value": linenr, "set": "tutorial-set" }
        ],
        id=f"AnnotationLine{linenr}")
Line 1:
Line 2: # Consider Phlebas
Line 3: $ author=Iain M. Banks
Line 4:
Line 5: ## 1
Line 6: Everything about us,
Line 7: everything around us,
Line 8: everything we know [and can know of] is composed ultimately of patterns of nothing;
Line 9: that’s the bottom line, the final truth.
Line 10:
Line 11: So where we find we have any control over those patterns,
Line 12: why not make the most elegant ones, the most enjoyable and good ones,
Line 13: in our own terms?
Line 14:
Line 15: ## 2
Line 16: Besides,
Line 17: it left the humans in the Culture free to take care of the things that really mattered in life,
Line 18: such as [sports, games, romance,] studying dead languages,
Line 19: barbarian societies and impossible problems,
Line 20: and climbing high mountains without the aid of a safety harness.
Line 21:
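The offsets that split_text() attaches to each line can be reproduced in plain Python by remembering where each part starts. A sketch with the hypothetical helper split_offsets; its handling of a leading or trailing delimiter is chosen to match str.split(), which, like the output above, yields an empty first and last line:

```python
def split_offsets(text: str, delimiter: str) -> list[tuple[int, int]]:
    """Hypothetical helper: like str.split(delimiter), but return (begin, end) offsets."""
    if not delimiter:
        raise ValueError("delimiter must not be empty")
    offsets = []
    begin = 0
    while (pos := text.find(delimiter, begin)) != -1:
        offsets.append((begin, pos))    # part ends right before the delimiter
        begin = pos + len(delimiter)    # next part starts right after it
    offsets.append((begin, len(text)))  # remainder after the last delimiter
    return offsets


sample = "\n# Title\nfirst line\n"
for linenr, (begin, end) in enumerate(split_offsets(sample, "\n"), start=1):
    print(f"Line {linenr}: {begin}:{end} {sample[begin:end]!r}")
```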
In the line-annotation loop above we also extended our vocabulary on-the-fly with a new key, linenr. All line annotations carry two AnnotationData elements. Remember that we can easily retrieve the data, and any annotations on it, with data() and annotations():
line8 = dataset.data(set="tutorial-set",key="linenr", value=8).annotations(limit=1)[0]
print(str(line8))
everything we know [and can know of] is composed ultimately of patterns of nothing;
Methods that return collections, such as data(), annotations() and textselections(), often take an optional limit parameter (sometimes as a keyword argument, sometimes as a normal parameter). This parameter limits the number of results returned, which can improve performance in certain cases. In the above example we know we're only going to use one result, so it is a good idea to set it. (Here we also happen to know that there is only one result for linenr 8, so strictly speaking the parameter wouldn't be necessary, but we use it anyway to illustrate limit.)
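Why can a limit help? When results are produced lazily, the producer can stop as soon as enough results have been collected. The following plain-Python sketch shows this effect with a generator and itertools.islice; whether stam short-circuits in exactly this way is an assumption, and matching_lines is a hypothetical helper:

```python
from itertools import islice


def matching_lines(lines, predicate, stats):
    """Hypothetical lazy search: yield matching lines, counting inspections."""
    for line in lines:
        stats["inspected"] += 1
        if predicate(line):
            yield line


lines = [f"line {n}" for n in range(1, 1001)]

# With a limit of 1, the search stops at the first hit.
stats = {"inspected": 0}
first = list(islice(matching_lines(lines, lambda line: line.endswith("8"), stats), 1))
print(first, stats["inspected"])

# Without a limit, every line has to be inspected.
stats_all = {"inspected": 0}
every = list(matching_lines(lines, lambda line: line.endswith("8"), stats_all))
print(len(every), stats_all["inspected"])
```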
When annotating, we don't have to work with the resource as a whole; we can also work relative to any text selection we have. Let's take line eight and manually annotate its first word ("everything"):
line8_textselection = line8.textselections(limit=1)[0] #there could be multiple, but in our case so far we only have one
firstword = line8_textselection.textselection(stam.Offset.simple(0,10)) #we make a textselection on a textselection
#internally, the text selection will always use absolute coordinates for the resource:
print(f"Text selection spans: {firstword.begin()}:{firstword.end()}")
annotation = store.annotate(
target=stam.Selector.textselector(resource_banks, firstword.offset()),
data= {"key": "structuretype", "value": "word", "set": "tutorial-set" },
id=f"AnnotationLine8Word1")
Text selection spans: 92:102
We know the first word of line eight is also part of quote one, for which we already made an annotation (AnnotationQuote1) before.
Say we want to know where in quote one the first word of line eight is; we can now easily compute that as follows:
offset = firstword.relative_offset(quote1_selection)
print(f"Offset in quote one: {offset.begin()}:{offset.end()}")
Offset in quote one: 43:53
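Both directions of this conversion are simple arithmetic on absolute offsets. A plain-Python sketch checked against the numbers from this tutorial (the helpers are hypothetical; raising an error for a selection that is not contained in the reference is our own choice, not documented stam behaviour):

```python
def relative_offset(inner: tuple[int, int], outer: tuple[int, int]) -> tuple[int, int]:
    """Express absolute offset `inner` relative to the begin of `outer`."""
    (inner_begin, inner_end), (outer_begin, outer_end) = inner, outer
    if not (outer_begin <= inner_begin <= inner_end <= outer_end):
        raise ValueError("inner is not contained in outer")
    return inner_begin - outer_begin, inner_end - outer_begin


def absolute_offset(relative: tuple[int, int], outer: tuple[int, int]) -> tuple[int, int]:
    """Turn an offset relative to `outer` back into absolute coordinates."""
    outer_begin, outer_end = outer
    begin, end = relative[0] + outer_begin, relative[1] + outer_begin
    if not (outer_begin <= begin <= end <= outer_end):
        raise ValueError("relative offset falls outside outer")
    return begin, end


line8_span = (92, 175)   # absolute offset of AnnotationLine8
quote1_span = (49, 363)  # absolute offset of AnnotationQuote1
word_span = absolute_offset((0, 10), line8_span)  # "everything", as made above
print(word_span, relative_offset(word_span, quote1_span))
```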
While we are at it, another conversion that may come in handy when working at a lower level is the conversion from/to UTF-8 byte offsets. Both STAM and Python use Unicode code points. Internally, STAM already maps these to UTF-8 byte offsets for things like text slicing, but if you need this information you can extract it explicitly:
beginbyte = resource_banks.utf8byte(firstword.begin())
endbyte = resource_banks.utf8byte(firstword.end())
print(f"Byte offset: {beginbyte}:{endbyte}")
#and back again:
beginpos = resource_banks.utf8byte_to_charpos(beginbyte)
endpos = resource_banks.utf8byte_to_charpos(endbyte)
assert beginpos == firstword.begin()
assert endpos == firstword.end()
Byte offset: 92:102
In this case they happen to be equal because everything in our text up to this word is ASCII, but as soon as you deal with multibyte characters (diacritics, other scripts, etc.), they will differ!
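For intuition about what this conversion computes, here is a naive plain-Python version. It re-encodes the prefix on every call, so its cost grows with the position; the helpers char_to_byte and byte_to_char are hypothetical and not how stam implements utf8byte(). The sample is line 9 of our text, whose curly apostrophe (’) takes three bytes in UTF-8:

```python
def char_to_byte(text: str, charpos: int) -> int:
    """UTF-8 byte offset of the code point at position `charpos`."""
    return len(text[:charpos].encode("utf-8"))


def byte_to_char(text: str, bytepos: int) -> int:
    """Code-point position of UTF-8 byte offset `bytepos`.

    Raises UnicodeDecodeError if `bytepos` falls inside a multibyte character.
    """
    return len(text.encode("utf-8")[:bytepos].decode("utf-8"))


line9 = "that’s the bottom line, the final truth."
s_pos = line9.index("s")  # the "s" right after the apostrophe
print(s_pos, char_to_byte(line9, s_pos))
```

Before the apostrophe, character and byte positions coincide; from the apostrophe onward, byte offsets run two ahead.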
What else can we annotate? We can mark all individual words or tokens, effectively performing simple tokenisation. For this, we will use the regular expression search that is built into the STAM library, find_text_regex(). The regular expressions follow Rust's regular expression syntax, which may differ slightly from Python's native implementation.
expressions = [
r"\w+(?:[-_]\w+)*", #this detects words,possibly with hyphens or underscores as part of it
r"[\.\?,/]+", #this detects a variety of punctuation
r"[0-9]+(?:[,\.][0-9]+)*", #this detects numbers, possibly with a fractional part
]
structuretypes = ["word", "punctuation", "number"]
for i, matchresult in enumerate(resource_banks.find_text_regex(expressions)):
    #(we only have one textselection per match, but a regular expression may result in multiple textselections if capture groups are used)
    textselection = matchresult['textselections'][0]
    structuretype = structuretypes[matchresult['expression_index']]
    print(f"Annotating \"{textselection}\" at {textselection.offset()} as {structuretype}")
    store.annotate(
        target=stam.Selector.textselector(resource_banks, textselection.offset()),
        data=[
            {"key": "structuretype", "value": structuretype, "set": "tutorial-set" }
        ],
        id=f"AnnotationToken{i+1}")
Annotating "Consider" at 3:11 as word Annotating "Phlebas" at 12:19 as word Annotating "author" at 22:28 as word Annotating "Iain" at 29:33 as word Annotating "M" at 34:35 as word Annotating "." at 35:36 as punctuation Annotating "Banks" at 37:42 as word Annotating "1" at 47:48 as word Annotating "Everything" at 49:59 as word Annotating "about" at 60:65 as word Annotating "us" at 66:68 as word Annotating "," at 68:69 as punctuation Annotating "everything" at 70:80 as word Annotating "around" at 81:87 as word Annotating "us" at 88:90 as word Annotating "," at 90:91 as punctuation Annotating "everything" at 92:102 as word Annotating "we" at 103:105 as word Annotating "know" at 106:110 as word Annotating "and" at 112:115 as word Annotating "can" at 116:119 as word Annotating "know" at 120:124 as word Annotating "of" at 125:127 as word Annotating "is" at 129:131 as word Annotating "composed" at 132:140 as word Annotating "ultimately" at 141:151 as word Annotating "of" at 152:154 as word Annotating "patterns" at 155:163 as word Annotating "of" at 164:166 as word Annotating "nothing" at 167:174 as word Annotating "that" at 176:180 as word Annotating "s" at 181:182 as word Annotating "the" at 183:186 as word Annotating "bottom" at 187:193 as word Annotating "line" at 194:198 as word Annotating "," at 198:199 as punctuation Annotating "the" at 200:203 as word Annotating "final" at 204:209 as word Annotating "truth" at 210:215 as word Annotating "." 
at 215:216 as punctuation Annotating "So" at 218:220 as word Annotating "where" at 221:226 as word Annotating "we" at 227:229 as word Annotating "find" at 230:234 as word Annotating "we" at 235:237 as word Annotating "have" at 238:242 as word Annotating "any" at 243:246 as word Annotating "control" at 247:254 as word Annotating "over" at 255:259 as word Annotating "those" at 260:265 as word Annotating "patterns" at 266:274 as word Annotating "," at 274:275 as punctuation Annotating "why" at 276:279 as word Annotating "not" at 280:283 as word Annotating "make" at 284:288 as word Annotating "the" at 289:292 as word Annotating "most" at 293:297 as word Annotating "elegant" at 298:305 as word Annotating "ones" at 306:310 as word Annotating "," at 310:311 as punctuation Annotating "the" at 312:315 as word Annotating "most" at 316:320 as word Annotating "enjoyable" at 321:330 as word Annotating "and" at 331:334 as word Annotating "good" at 335:339 as word Annotating "ones" at 340:344 as word Annotating "," at 344:345 as punctuation Annotating "in" at 346:348 as word Annotating "our" at 349:352 as word Annotating "own" at 353:356 as word Annotating "terms" at 357:362 as word Annotating "?" 
at 362:363 as punctuation Annotating "2" at 368:369 as word Annotating "Besides" at 370:377 as word Annotating "," at 377:378 as punctuation Annotating "it" at 379:381 as word Annotating "left" at 382:386 as word Annotating "the" at 387:390 as word Annotating "humans" at 391:397 as word Annotating "in" at 398:400 as word Annotating "the" at 401:404 as word Annotating "Culture" at 405:412 as word Annotating "free" at 413:417 as word Annotating "to" at 418:420 as word Annotating "take" at 421:425 as word Annotating "care" at 426:430 as word Annotating "of" at 431:433 as word Annotating "the" at 434:437 as word Annotating "things" at 438:444 as word Annotating "that" at 445:449 as word Annotating "really" at 450:456 as word Annotating "mattered" at 457:465 as word Annotating "in" at 466:468 as word Annotating "life" at 469:473 as word Annotating "," at 473:474 as punctuation Annotating "such" at 475:479 as word Annotating "as" at 480:482 as word Annotating "sports" at 484:490 as word Annotating "," at 490:491 as punctuation Annotating "games" at 492:497 as word Annotating "," at 497:498 as punctuation Annotating "romance" at 499:506 as word Annotating "," at 506:507 as punctuation Annotating "studying" at 509:517 as word Annotating "dead" at 518:522 as word Annotating "languages" at 523:532 as word Annotating "," at 532:533 as punctuation Annotating "barbarian" at 534:543 as word Annotating "societies" at 544:553 as word Annotating "and" at 554:557 as word Annotating "impossible" at 558:568 as word Annotating "problems" at 569:577 as word Annotating "," at 577:578 as punctuation Annotating "and" at 579:582 as word Annotating "climbing" at 583:591 as word Annotating "high" at 592:596 as word Annotating "mountains" at 597:606 as word Annotating "without" at 607:614 as word Annotating "the" at 615:618 as word Annotating "aid" at 619:622 as word Annotating "of" at 623:625 as word Annotating "a" at 626:627 as word Annotating "safety" at 628:634 as word Annotating "harness" 
at 635:642 as word Annotating "." at 642:643 as punctuation
In this code, each matchresult records which of the three expressions matched, in matchresult['expression_index']. We conveniently use that information to pick the value for structuretype; any value not yet in our vocabulary (AnnotationDataSet) is added on-the-fly.
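Python's own re module can do a comparable multi-expression search if the expressions are combined into one alternation, with each alternative in its own capturing group so that the match reveals which expression fired. This is an analogue, not stam's find_text_regex(): it uses Python's regex syntax rather than Rust's, and tokenize is a hypothetical helper:

```python
import re

expressions = [
    r"\w+(?:[-_]\w+)*",          # words, possibly with hyphens or underscores
    r"[\.\?,/]+",                # punctuation
    r"[0-9]+(?:[,\.][0-9]+)*",   # numbers, possibly with a fractional part
]
structuretypes = ["word", "punctuation", "number"]

# One capturing group per expression; the inner groups are non-capturing,
# so group i + 1 corresponds to expressions[i].
combined = re.compile("|".join(f"({e})" for e in expressions))


def tokenize(text):
    """Return (token, structuretype, begin, end) for every match."""
    return [
        (m.group(), structuretypes[m.lastindex - 1], m.start(), m.end())
        for m in combined.finditer(text)
    ]


print(tokenize("Iain M. Banks"))
```

Because \w also matches digits, in this sketch the word expression claims the digits of "3.14" before the number expression gets a chance; the tutorial output above likewise tags "1" and "2" as words rather than numbers.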
Thus far we have only seen annotations directly on the text, using Selector.textselector(), but STAM has various other selectors. Users may appreciate it if you add a bit of metadata about your texts. In STAM, these are annotations that point at the resource as a whole using a Selector.resourceselector(), rather than at the text specifically. We add one metadata annotation with various new fields:
annotation = store.annotate(
target=stam.Selector.resourceselector(resource_banks),
data=[
{"key": "name", "value": "Culture quotes from Iain Banks", "set": "tutorial-set" },
{"key": "compiler", "value": "Dirk Roorda", "set": "tutorial-set" },
{"key": "source", "value": "https://www.goodreads.com/work/quotes/14366-consider-phlebas", "set": "tutorial-set" },
{"key": "version", "value": "0.2", "set": "tutorial-set" },
],
id="Metadata1")
Similarly, we could annotate an AnnotationDataSet (our vocabulary) with metadata, using a Selector.datasetselector().
print(f"{store.annotations_len()} annotations")
print(f"{store.resources_len()} resource")
print(f"{store.datasets_len()} annotation dataset")
print(f"{dataset.keys_len()} datakeys in our dataset")
print(f"{dataset.data_len()} annotationdata instances in our dataset")
153 annotations
1 resource
1 annotation dataset
6 datakeys in our dataset
31 annotationdata instances in our dataset
If we zoom in on the annotation data in our annotation dataset, we can extract some interesting frequency statistics right away:
for data in dataset:
    count = data.annotations_len()
    print(f"{data.key()}: {data.value()} occurs in {count} annotation(s)")
structuretype: titleheader occurs in 1 annotation(s)
structuretype: sectionheader occurs in 2 annotation(s)
structuretype: quote occurs in 2 annotation(s)
structuretype: line occurs in 21 annotation(s)
linenr: 1 occurs in 1 annotation(s)
linenr: 2 occurs in 1 annotation(s)
linenr: 3 occurs in 1 annotation(s)
linenr: 4 occurs in 1 annotation(s)
linenr: 5 occurs in 1 annotation(s)
linenr: 6 occurs in 1 annotation(s)
linenr: 7 occurs in 1 annotation(s)
linenr: 8 occurs in 1 annotation(s)
linenr: 9 occurs in 1 annotation(s)
linenr: 10 occurs in 1 annotation(s)
linenr: 11 occurs in 1 annotation(s)
linenr: 12 occurs in 1 annotation(s)
linenr: 13 occurs in 1 annotation(s)
linenr: 14 occurs in 1 annotation(s)
linenr: 15 occurs in 1 annotation(s)
linenr: 16 occurs in 1 annotation(s)
linenr: 17 occurs in 1 annotation(s)
linenr: 18 occurs in 1 annotation(s)
linenr: 19 occurs in 1 annotation(s)
linenr: 20 occurs in 1 annotation(s)
linenr: 21 occurs in 1 annotation(s)
structuretype: word occurs in 109 annotation(s)
structuretype: punctuation occurs in 17 annotation(s)
name: Culture quotes from Iain Banks occurs in 1 annotation(s)
compiler: Dirk Roorda occurs in 1 annotation(s)
source: https://www.goodreads.com/work/quotes/14366-consider-phlebas occurs in 1 annotation(s)
version: 0.2 occurs in 1 annotation(s)
We can also aggregate only by key, although that is slightly less informative for our example case:
for key in dataset.keys():
    count = key.annotations_count() #this one is called _count instead of _len because it is not instantaneous like the other one
    print(f"{key} occurs in {count} annotation(s)")
structuretype occurs in 152 annotation(s)
linenr occurs in 21 annotation(s)
name occurs in 1 annotation(s)
compiler occurs in 1 annotation(s)
source occurs in 1 annotation(s)
version occurs in 1 annotation(s)
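The two kinds of aggregation above, per key/value pair and per key, are two different counts. In plain Python they can be sketched with collections.Counter over a hypothetical mapping from annotation IDs to their (key, value) pairs; the sample below is a small made-up toy that mirrors a few annotations from this tutorial, not data read from the store:

```python
from collections import Counter

# Hypothetical toy data: annotation id -> list of (key, value) pairs.
toy_annotations = {
    "Annotation2": [("structuretype", "sectionheader")],
    "Annotation3": [("structuretype", "sectionheader")],
    "AnnotationLine1": [("structuretype", "line"), ("linenr", 1)],
    "AnnotationLine2": [("structuretype", "line"), ("linenr", 2)],
}

# Per key/value pair: how many annotations carry this exact data?
by_data = Counter(pair for pairs in toy_annotations.values() for pair in set(pairs))

# Per key: how many annotations carry any value for this key?
# (A set of keys per annotation ensures each annotation counts once per key.)
by_key = Counter(
    key for pairs in toy_annotations.values() for key in {k for k, _ in pairs}
)

for (key, value), count in by_data.items():
    print(f"{key}: {value} occurs in {count} annotation(s)")
for key, count in by_key.items():
    print(f"{key} occurs in {count} annotation(s)")
```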
Just like we iterated over the annotation dataset above, we can also iterate over various things in the AnnotationStore. Let's write a small script that simply prints out most of the things in our store. At this point, though, the output will get a bit verbose:
print("Datasets:")
for dataset in store.datasets():
    print(f" - ID: {dataset.id()}")
print("Resources:")
for resource in store.resources():
    print(f" - ID: {resource.id()}")
    print(f" - Text length: {resource.textlen()}")
print("Annotations:")
for annotation in store.annotations():
    print(f" - ID: {annotation.id()}")
    print(f"   Target selector type: {annotation.selector_kind()}")
    print(f"   Target resources: {annotation.resources()}")
    print(f"   Target offset: {annotation.offset()}")
    print(f"   Target text: {annotation.text()}")
    print("   Target annotations:", [ a.id() for a in annotation.annotations_in_targets() ])
    print("   Data:")
    for data in annotation:
        print(f"    - ID: {data.id()}")
        print(f"      Set: {data.dataset().id()}")
        print(f"      Key: {data.key()}")
        print(f"      Value: {data.value()}")
Datasets:
 - ID: tutorial-set
Resources:
 - ID: banks
 - Text length: 644
Annotations:
 - ID: Annotation1
   Target selector type: <stam.SelectorKind object at 0x71a048f1a040>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 1:19
   Target text: ['# Consider Phlebas']
   Target annotations: []
   Data:
    - ID: Data1
      Set: tutorial-set
      Key: structuretype
      Value: titleheader
 - ID: Annotation2
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 44:48
   Target text: ['## 1']
   Target annotations: []
   Data:
    - ID: Data2
      Set: tutorial-set
      Key: structuretype
      Value: sectionheader
 - ID: Annotation3
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 365:369
   Target text: ['## 2']
   Target annotations: []
   Data:
    - ID: Data2
      Set: tutorial-set
      Key: structuretype
      Value: sectionheader
 - ID: AnnotationQuote1
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 49:363
   Target text: ['Everything about us,\neverything around us,\neverything we know [and can know of] is composed ultimately of patterns of nothing;\nthat’s the bottom line, the final truth.\n\nSo where we find we have any control over those patterns,\nwhy not make the most elegant ones, the most enjoyable and good ones,\nin our own terms?']
   Target annotations: []
   Data:
    - ID: Data3
      Set: tutorial-set
      Key: structuretype
      Value: quote
 - ID: AnnotationQuote2
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 370:643
   Target text: ['Besides,\nit left the humans in the Culture free to take care of the things that really mattered in life,\nsuch as [sports, games, romance,] studying dead languages,\nbarbarian societies and impossible problems,\nand climbing high mountains without the aid of a safety harness.']
   Target annotations: []
   Data:
    - ID: Data3
      Set: tutorial-set
      Key: structuretype
      Value: quote
 - ID: AnnotationLine1
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 0:0
   Target text: ['']
   Target annotations: []
   Data:
    - ID: Data4
      Set: tutorial-set
      Key: structuretype
      Value: line
    - ID: DataLine1
      Set: tutorial-set
      Key: linenr
      Value: 1
 - ID: AnnotationLine2
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 1:19
   Target text: ['# Consider Phlebas']
   Target annotations: []
   Data:
    - ID: Data4
      Set: tutorial-set
      Key: structuretype
      Value: line
    - ID: DataLine2
      Set: tutorial-set
      Key: linenr
      Value: 2
 - ID: AnnotationLine3
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 20:42
   Target text: ['$ author=Iain M. Banks']
   Target annotations: []
   Data:
    - ID: Data4
      Set: tutorial-set
      Key: structuretype
      Value: line
    - ID: DataLine3
      Set: tutorial-set
      Key: linenr
      Value: 3
 - ID: AnnotationLine4
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 43:43
   Target text: ['']
   Target annotations: []
   Data:
    - ID: Data4
      Set: tutorial-set
      Key: structuretype
      Value: line
    - ID: DataLine4
      Set: tutorial-set
      Key: linenr
      Value: 4
 - ID: AnnotationLine5
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 44:48
   Target text: ['## 1']
   Target annotations: []
   Data:
    - ID: Data4
      Set: tutorial-set
      Key: structuretype
      Value: line
    - ID: DataLine5
      Set: tutorial-set
      Key: linenr
      Value: 5
 - ID: AnnotationLine6
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 49:69
   Target text: ['Everything about us,']
   Target annotations: []
   Data:
    - ID: Data4
      Set: tutorial-set
      Key: structuretype
      Value: line
    - ID: DataLine6
      Set: tutorial-set
      Key: linenr
      Value: 6
 - ID: AnnotationLine7
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 70:91
   Target text: ['everything around us,']
   Target annotations: []
   Data:
    - ID: Data4
      Set: tutorial-set
      Key: structuretype
      Value: line
    - ID: DataLine7
      Set: tutorial-set
      Key: linenr
      Value: 7
 - ID: AnnotationLine8
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 92:175
   Target text: ['everything we know [and can know of] is composed ultimately of patterns of nothing;']
   Target annotations: []
   Data:
    - ID: Data4
      Set: tutorial-set
      Key: structuretype
      Value: line
    - ID: DataLine8
      Set: tutorial-set
      Key: linenr
      Value: 8
 - ID: AnnotationLine9
Target selector type: <stam.SelectorKind object at 0x71a048f1b300> Target resources: [<stam.TextResource object at 0x71a048f1b300>] Target offset: 176:216 Target text: ['that’s the bottom line, the final truth.'] Target annotations: [] Data: - ID: Data4 Set: tutorial-set Key: structuretype Value: line - ID: DataLine9 Set: tutorial-set Key: linenr Value: 9 - ID: AnnotationLine10 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 217:217 Target text: [''] Target annotations: [] Data: - ID: Data4 Set: tutorial-set Key: structuretype Value: line - ID: DataLine10 Set: tutorial-set Key: linenr Value: 10 - ID: AnnotationLine11 Target selector type: <stam.SelectorKind object at 0x71a048f1b300> Target resources: [<stam.TextResource object at 0x71a048f1b300>] Target offset: 218:275 Target text: ['So where we find we have any control over those patterns,'] Target annotations: [] Data: - ID: Data4 Set: tutorial-set Key: structuretype Value: line - ID: DataLine11 Set: tutorial-set Key: linenr Value: 11 - ID: AnnotationLine12 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 276:345 Target text: ['why not make the most elegant ones, the most enjoyable and good ones,'] Target annotations: [] Data: - ID: Data4 Set: tutorial-set Key: structuretype Value: line - ID: DataLine12 Set: tutorial-set Key: linenr Value: 12 - ID: AnnotationLine13 Target selector type: <stam.SelectorKind object at 0x71a048f1b300> Target resources: [<stam.TextResource object at 0x71a048f1b300>] Target offset: 346:363 Target text: ['in our own terms?'] Target annotations: [] Data: - ID: Data4 Set: tutorial-set Key: structuretype Value: line - ID: DataLine13 Set: tutorial-set Key: linenr Value: 13 - ID: AnnotationLine14 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: 
[<stam.TextResource object at 0x71a04998e070>] Target offset: 364:364 Target text: [''] Target annotations: [] Data: - ID: Data4 Set: tutorial-set Key: structuretype Value: line - ID: DataLine14 Set: tutorial-set Key: linenr Value: 14 - ID: AnnotationLine15 Target selector type: <stam.SelectorKind object at 0x71a048f1b300> Target resources: [<stam.TextResource object at 0x71a048f1b300>] Target offset: 365:369 Target text: ['## 2'] Target annotations: [] Data: - ID: Data4 Set: tutorial-set Key: structuretype Value: line - ID: DataLine15 Set: tutorial-set Key: linenr Value: 15 - ID: AnnotationLine16 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 370:378 Target text: ['Besides,'] Target annotations: [] Data: - ID: Data4 Set: tutorial-set Key: structuretype Value: line - ID: DataLine16 Set: tutorial-set Key: linenr Value: 16 - ID: AnnotationLine17 Target selector type: <stam.SelectorKind object at 0x71a048f1b300> Target resources: [<stam.TextResource object at 0x71a048f1b300>] Target offset: 379:474 Target text: ['it left the humans in the Culture free to take care of the things that really mattered in life,'] Target annotations: [] Data: - ID: Data4 Set: tutorial-set Key: structuretype Value: line - ID: DataLine17 Set: tutorial-set Key: linenr Value: 17 - ID: AnnotationLine18 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 475:533 Target text: ['such as [sports, games, romance,] studying dead languages,'] Target annotations: [] Data: - ID: Data4 Set: tutorial-set Key: structuretype Value: line - ID: DataLine18 Set: tutorial-set Key: linenr Value: 18 - ID: AnnotationLine19 Target selector type: <stam.SelectorKind object at 0x71a048f1b300> Target resources: [<stam.TextResource object at 0x71a048f1b300>] Target offset: 534:578 Target text: ['barbarian societies and impossible 
problems,'] Target annotations: [] Data: - ID: Data4 Set: tutorial-set Key: structuretype Value: line - ID: DataLine19 Set: tutorial-set Key: linenr Value: 19 - ID: AnnotationLine20 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 579:643 Target text: ['and climbing high mountains without the aid of a safety harness.'] Target annotations: [] Data: - ID: Data4 Set: tutorial-set Key: structuretype Value: line - ID: DataLine20 Set: tutorial-set Key: linenr Value: 20 - ID: AnnotationLine21 Target selector type: <stam.SelectorKind object at 0x71a048f1b300> Target resources: [<stam.TextResource object at 0x71a048f1b300>] Target offset: 644:644 Target text: [''] Target annotations: [] Data: - ID: Data4 Set: tutorial-set Key: structuretype Value: line - ID: DataLine21 Set: tutorial-set Key: linenr Value: 21 - ID: AnnotationLine8Word1 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 92:102 Target text: ['everything'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken1 Target selector type: <stam.SelectorKind object at 0x71a048f1b300> Target resources: [<stam.TextResource object at 0x71a048f1b300>] Target offset: 3:11 Target text: ['Consider'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken2 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 12:19 Target text: ['Phlebas'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken3 Target selector type: <stam.SelectorKind object at 0x71a048f1b300> Target resources: [<stam.TextResource object at 0x71a048f1b300>] Target offset: 22:28 Target text: ['author'] Target 
annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken4 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 29:33 Target text: ['Iain'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken5 Target selector type: <stam.SelectorKind object at 0x71a048f1b300> Target resources: [<stam.TextResource object at 0x71a048f1b300>] Target offset: 34:35 Target text: ['M'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken6 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 35:36 Target text: ['.'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: punctuation - ID: AnnotationToken7 Target selector type: <stam.SelectorKind object at 0x71a048f1b300> Target resources: [<stam.TextResource object at 0x71a048f1b300>] Target offset: 37:42 Target text: ['Banks'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken8 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 47:48 Target text: ['1'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken9 Target selector type: <stam.SelectorKind object at 0x71a048f1b300> Target resources: [<stam.TextResource object at 0x71a048f1b300>] Target offset: 49:59 Target text: ['Everything'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken10 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 60:65 Target text: 
['about'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken11 Target selector type: <stam.SelectorKind object at 0x71a048f1b300> Target resources: [<stam.TextResource object at 0x71a048f1b300>] Target offset: 66:68 Target text: ['us'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken12 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 68:69 Target text: [','] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: punctuation - ID: AnnotationToken13 Target selector type: <stam.SelectorKind object at 0x71a048f1a040> Target resources: [<stam.TextResource object at 0x71a048f1a040>] Target offset: 70:80 Target text: ['everything'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken14 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 81:87 Target text: ['around'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken15 Target selector type: <stam.SelectorKind object at 0x71a048f1a040> Target resources: [<stam.TextResource object at 0x71a048f1a040>] Target offset: 88:90 Target text: ['us'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken16 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 90:91 Target text: [','] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: punctuation - ID: AnnotationToken17 Target selector type: <stam.SelectorKind object at 0x71a048f1b300> Target resources: [<stam.TextResource object at 0x71a048f1b300>] 
Target offset: 92:102 Target text: ['everything'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken18 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 103:105 Target text: ['we'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken19 Target selector type: <stam.SelectorKind object at 0x71a048f1b300> Target resources: [<stam.TextResource object at 0x71a048f1b300>] Target offset: 106:110 Target text: ['know'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken20 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 112:115 Target text: ['and'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken21 Target selector type: <stam.SelectorKind object at 0x71a048f1b300> Target resources: [<stam.TextResource object at 0x71a048f1b300>] Target offset: 116:119 Target text: ['can'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken22 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 120:124 Target text: ['know'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken23 Target selector type: <stam.SelectorKind object at 0x71a048f1b5a0> Target resources: [<stam.TextResource object at 0x71a048f1b5a0>] Target offset: 125:127 Target text: ['of'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken24 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: 
[<stam.TextResource object at 0x71a04998e070>] Target offset: 129:131 Target text: ['is'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken25 Target selector type: <stam.SelectorKind object at 0x71a048f1b300> Target resources: [<stam.TextResource object at 0x71a048f1b300>] Target offset: 132:140 Target text: ['composed'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken26 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 141:151 Target text: ['ultimately'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken27 Target selector type: <stam.SelectorKind object at 0x71a048f1b300> Target resources: [<stam.TextResource object at 0x71a048f1b300>] Target offset: 152:154 Target text: ['of'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken28 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 155:163 Target text: ['patterns'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken29 Target selector type: <stam.SelectorKind object at 0x71a048f1b5a0> Target resources: [<stam.TextResource object at 0x71a048f1b5a0>] Target offset: 164:166 Target text: ['of'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken30 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 167:174 Target text: ['nothing'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken31 Target selector type: <stam.SelectorKind 
object at 0x71a048f1b300> Target resources: [<stam.TextResource object at 0x71a048f1b300>] Target offset: 176:180 Target text: ['that'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken32 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 181:182 Target text: ['s'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken33 Target selector type: <stam.SelectorKind object at 0x71a048f1b300> Target resources: [<stam.TextResource object at 0x71a048f1b300>] Target offset: 183:186 Target text: ['the'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken34 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 187:193 Target text: ['bottom'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken35 Target selector type: <stam.SelectorKind object at 0x71a048f19230> Target resources: [<stam.TextResource object at 0x71a048f19230>] Target offset: 194:198 Target text: ['line'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken36 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 198:199 Target text: [','] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: punctuation - ID: AnnotationToken37 Target selector type: <stam.SelectorKind object at 0x71a048f19230> Target resources: [<stam.TextResource object at 0x71a048f19230>] Target offset: 200:203 Target text: ['the'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken38 Target 
selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 204:209 Target text: ['final'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken39 Target selector type: <stam.SelectorKind object at 0x71a048f1b5a0> Target resources: [<stam.TextResource object at 0x71a048f1b5a0>] Target offset: 210:215 Target text: ['truth'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken40 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 215:216 Target text: ['.'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: punctuation - ID: AnnotationToken41 Target selector type: <stam.SelectorKind object at 0x71a048f1b5a0> Target resources: [<stam.TextResource object at 0x71a048f1b5a0>] Target offset: 218:220 Target text: ['So'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken42 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 221:226 Target text: ['where'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken43 Target selector type: <stam.SelectorKind object at 0x71a048f1a040> Target resources: [<stam.TextResource object at 0x71a048f1a040>] Target offset: 227:229 Target text: ['we'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken44 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 230:234 Target text: ['find'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: 
word - ID: AnnotationToken45 Target selector type: <stam.SelectorKind object at 0x71a048f1b5a0> Target resources: [<stam.TextResource object at 0x71a048f1b5a0>] Target offset: 235:237 Target text: ['we'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken46 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 238:242 Target text: ['have'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken47 Target selector type: <stam.SelectorKind object at 0x71a048f1a040> Target resources: [<stam.TextResource object at 0x71a048f1a040>] Target offset: 243:246 Target text: ['any'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken48 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 247:254 Target text: ['control'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken49 Target selector type: <stam.SelectorKind object at 0x71a048f1b5a0> Target resources: [<stam.TextResource object at 0x71a048f1b5a0>] Target offset: 255:259 Target text: ['over'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken50 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 260:265 Target text: ['those'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken51 Target selector type: <stam.SelectorKind object at 0x71a048f1b5a0> Target resources: [<stam.TextResource object at 0x71a048f1b5a0>] Target offset: 266:274 Target text: ['patterns'] Target annotations: [] Data: - ID: None Set: 
tutorial-set Key: structuretype Value: word - ID: AnnotationToken52 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 274:275 Target text: [','] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: punctuation - ID: AnnotationToken53 Target selector type: <stam.SelectorKind object at 0x71a048f1b5a0> Target resources: [<stam.TextResource object at 0x71a048f1b5a0>] Target offset: 276:279 Target text: ['why'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken54 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 280:283 Target text: ['not'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken55 Target selector type: <stam.SelectorKind object at 0x71a048f1b300> Target resources: [<stam.TextResource object at 0x71a048f1b300>] Target offset: 284:288 Target text: ['make'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken56 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 289:292 Target text: ['the'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken57 Target selector type: <stam.SelectorKind object at 0x71a048f1b300> Target resources: [<stam.TextResource object at 0x71a048f1b300>] Target offset: 293:297 Target text: ['most'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken58 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 298:305 Target text: ['elegant'] Target 
annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken59 Target selector type: <stam.SelectorKind object at 0x71a048f1b300> Target resources: [<stam.TextResource object at 0x71a048f1b300>] Target offset: 306:310 Target text: ['ones'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken60 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 310:311 Target text: [','] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: punctuation - ID: AnnotationToken61 Target selector type: <stam.SelectorKind object at 0x71a048f1b300> Target resources: [<stam.TextResource object at 0x71a048f1b300>] Target offset: 312:315 Target text: ['the'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken62 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 316:320 Target text: ['most'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken63 Target selector type: <stam.SelectorKind object at 0x71a048f19230> Target resources: [<stam.TextResource object at 0x71a048f19230>] Target offset: 321:330 Target text: ['enjoyable'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken64 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 331:334 Target text: ['and'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken65 Target selector type: <stam.SelectorKind object at 0x71a048f19230> Target resources: [<stam.TextResource object at 0x71a048f19230>] Target offset: 
335:339 Target text: ['good'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken66 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 340:344 Target text: ['ones'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken67 Target selector type: <stam.SelectorKind object at 0x71a048f19230> Target resources: [<stam.TextResource object at 0x71a048f19230>] Target offset: 344:345 Target text: [','] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: punctuation - ID: AnnotationToken68 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 346:348 Target text: ['in'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken69 Target selector type: <stam.SelectorKind object at 0x71a048f19230> Target resources: [<stam.TextResource object at 0x71a048f19230>] Target offset: 349:352 Target text: ['our'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken70 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 353:356 Target text: ['own'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken71 Target selector type: <stam.SelectorKind object at 0x71a048f1a040> Target resources: [<stam.TextResource object at 0x71a048f1a040>] Target offset: 357:362 Target text: ['terms'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken72 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 
0x71a04998e070>] Target offset: 362:363 Target text: ['?'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: punctuation - ID: AnnotationToken73 Target selector type: <stam.SelectorKind object at 0x71a048f1a040> Target resources: [<stam.TextResource object at 0x71a048f1a040>] Target offset: 368:369 Target text: ['2'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken74 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 370:377 Target text: ['Besides'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken75 Target selector type: <stam.SelectorKind object at 0x71a048f1a040> Target resources: [<stam.TextResource object at 0x71a048f1a040>] Target offset: 377:378 Target text: [','] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: punctuation - ID: AnnotationToken76 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 379:381 Target text: ['it'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken77 Target selector type: <stam.SelectorKind object at 0x71a048f1a040> Target resources: [<stam.TextResource object at 0x71a048f1a040>] Target offset: 382:386 Target text: ['left'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken78 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 387:390 Target text: ['the'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken79 Target selector type: <stam.SelectorKind object at 0x71a048f1a040> Target 
resources: [<stam.TextResource object at 0x71a048f1a040>] Target offset: 391:397 Target text: ['humans'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken80 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 398:400 Target text: ['in'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken81 Target selector type: <stam.SelectorKind object at 0x71a048f1a040> Target resources: [<stam.TextResource object at 0x71a048f1a040>] Target offset: 401:404 Target text: ['the'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken82 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 405:412 Target text: ['Culture'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken83 Target selector type: <stam.SelectorKind object at 0x71a048f19230> Target resources: [<stam.TextResource object at 0x71a048f19230>] Target offset: 413:417 Target text: ['free'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken84 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 418:420 Target text: ['to'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken85 Target selector type: <stam.SelectorKind object at 0x71a048f19230> Target resources: [<stam.TextResource object at 0x71a048f19230>] Target offset: 421:425 Target text: ['take'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken86 Target selector type: <stam.SelectorKind 
object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 426:430 Target text: ['care'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken87 Target selector type: <stam.SelectorKind object at 0x71a048f19230> Target resources: [<stam.TextResource object at 0x71a048f19230>] Target offset: 431:433 Target text: ['of'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken88 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 434:437 Target text: ['the'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken89 Target selector type: <stam.SelectorKind object at 0x71a048f1a040> Target resources: [<stam.TextResource object at 0x71a048f1a040>] Target offset: 438:444 Target text: ['things'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken90 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 445:449 Target text: ['that'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken91 Target selector type: <stam.SelectorKind object at 0x71a048f1a040> Target resources: [<stam.TextResource object at 0x71a048f1a040>] Target offset: 450:456 Target text: ['really'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken92 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 457:465 Target text: ['mattered'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken93 
Target selector type: <stam.SelectorKind object at 0x71a048f1a040> Target resources: [<stam.TextResource object at 0x71a048f1a040>] Target offset: 466:468 Target text: ['in'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken94 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 469:473 Target text: ['life'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken95 Target selector type: <stam.SelectorKind object at 0x71a048f19230> Target resources: [<stam.TextResource object at 0x71a048f19230>] Target offset: 473:474 Target text: [','] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: punctuation - ID: AnnotationToken96 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 475:479 Target text: ['such'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken97 Target selector type: <stam.SelectorKind object at 0x71a048f19230> Target resources: [<stam.TextResource object at 0x71a048f19230>] Target offset: 480:482 Target text: ['as'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken98 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 484:490 Target text: ['sports'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken99 Target selector type: <stam.SelectorKind object at 0x71a048f1a040> Target resources: [<stam.TextResource object at 0x71a048f1a040>] Target offset: 490:491 Target text: [','] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: 
punctuation - ID: AnnotationToken100 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 492:497 Target text: ['games'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken101 Target selector type: <stam.SelectorKind object at 0x71a048f1a040> Target resources: [<stam.TextResource object at 0x71a048f1a040>] Target offset: 497:498 Target text: [','] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: punctuation - ID: AnnotationToken102 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 499:506 Target text: ['romance'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken103 Target selector type: <stam.SelectorKind object at 0x71a048f1a040> Target resources: [<stam.TextResource object at 0x71a048f1a040>] Target offset: 506:507 Target text: [','] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: punctuation - ID: AnnotationToken104 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 509:517 Target text: ['studying'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken105 Target selector type: <stam.SelectorKind object at 0x71a048f1a040> Target resources: [<stam.TextResource object at 0x71a048f1a040>] Target offset: 518:522 Target text: ['dead'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken106 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 523:532 Target text: ['languages'] Target 
annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken107 Target selector type: <stam.SelectorKind object at 0x71a048f1a040> Target resources: [<stam.TextResource object at 0x71a048f1a040>] Target offset: 532:533 Target text: [','] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: punctuation - ID: AnnotationToken108 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 534:543 Target text: ['barbarian'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken109 Target selector type: <stam.SelectorKind object at 0x71a048f1a040> Target resources: [<stam.TextResource object at 0x71a048f1a040>] Target offset: 544:553 Target text: ['societies'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken110 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 554:557 Target text: ['and'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken111 Target selector type: <stam.SelectorKind object at 0x71a048f1a040> Target resources: [<stam.TextResource object at 0x71a048f1a040>] Target offset: 558:568 Target text: ['impossible'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken112 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 569:577 Target text: ['problems'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken113 Target selector type: <stam.SelectorKind object at 0x71a048f1a040> Target resources: [<stam.TextResource object at 
0x71a048f1a040>] Target offset: 577:578 Target text: [','] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: punctuation - ID: AnnotationToken114 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 579:582 Target text: ['and'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken115 Target selector type: <stam.SelectorKind object at 0x71a048f1a040> Target resources: [<stam.TextResource object at 0x71a048f1a040>] Target offset: 583:591 Target text: ['climbing'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken116 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 592:596 Target text: ['high'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken117 Target selector type: <stam.SelectorKind object at 0x71a048f1a040> Target resources: [<stam.TextResource object at 0x71a048f1a040>] Target offset: 597:606 Target text: ['mountains'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken118 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 607:614 Target text: ['without'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken119 Target selector type: <stam.SelectorKind object at 0x71a048f1a040> Target resources: [<stam.TextResource object at 0x71a048f1a040>] Target offset: 615:618 Target text: ['the'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken120 Target selector type: <stam.SelectorKind object at 
0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 619:622 Target text: ['aid'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken121 Target selector type: <stam.SelectorKind object at 0x71a048f1b300> Target resources: [<stam.TextResource object at 0x71a048f1b300>] Target offset: 623:625 Target text: ['of'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken122 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 626:627 Target text: ['a'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken123 Target selector type: <stam.SelectorKind object at 0x71a048f1b5a0> Target resources: [<stam.TextResource object at 0x71a048f1b5a0>] Target offset: 628:634 Target text: ['safety'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken124 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [<stam.TextResource object at 0x71a04998e070>] Target offset: 635:642 Target text: ['harness'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: word - ID: AnnotationToken125 Target selector type: <stam.SelectorKind object at 0x71a048f1b5a0> Target resources: [<stam.TextResource object at 0x71a048f1b5a0>] Target offset: 642:643 Target text: ['.'] Target annotations: [] Data: - ID: None Set: tutorial-set Key: structuretype Value: punctuation - ID: Metadata1 Target selector type: <stam.SelectorKind object at 0x71a04998e070> Target resources: [] Target offset: None Target text: [] Target annotations: [] Data: - ID: None Set: tutorial-set Key: name Value: Culture quotes from Iain Banks - ID: None Set: tutorial-set Key: compiler Value: Dirk Roorda - ID: None Set: 
tutorial-set Key: source Value: https://www.goodreads.com/work/quotes/14366-consider-phlebas - ID: None Set: tutorial-set Key: version Value: 0.2
We already introduced the methods annotations(), data() and textselections() in previous sections. They return collections (classes such as Annotations, Data or TextSelections), which in turn contain instances of Annotation, AnnotationData and TextSelection, respectively.
Internally the STAM library maintains various forward and reverse indices, representing relationships between all kinds of entities in the STAM model. The aforementioned methods operate via these indices.
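To make the idea of forward and reverse indices concrete, here is a minimal plain-Python sketch. It does not use the stam library; the two dictionaries are hypothetical stand-ins for the library's internal indices, and the offsets are those of three tokens of line eight ("of", "patterns", "of").

```python
from collections import defaultdict

# Hypothetical forward index: annotation id -> (begin, end) offset it targets.
forward = {
    "AnnotationToken27": (152, 154),  # "of"
    "AnnotationToken28": (155, 163),  # "patterns"
    "AnnotationToken29": (164, 166),  # "of"
}

# Reverse index: (begin, end) offset -> ids of annotations referencing it.
reverse = defaultdict(list)
for annotation_id, offset in forward.items():
    reverse[offset].append(annotation_id)

def annotations_for(offset):
    """Return the ids of annotations that reference exactly this text selection."""
    return reverse.get(offset, [])

print(annotations_for((155, 163)))  # ['AnnotationToken28']
print(annotations_for((0, 5)))      # []
```

The reverse lookup is a single dictionary access instead of a scan over all annotations, which is the kind of saving an index buys you.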
The annotations() method is often a lookup via the reverse index. We have already seen some examples of it. Another nice example of the reverse index is that it allows us to obtain annotations for any arbitrary selection of the text we make:
textselection = resource_banks.textselection(stam.Offset.simple(155, 163))
for annotation in textselection.annotations():
    print(f"- ID: {annotation.id()}")
    print(f"  Text: {str(annotation)}")
    print("  Data:")
    for data in annotation:
        print(f"    {data.key()}={data.value()}")
- ID: AnnotationToken28
  Text: patterns
  Data:
    structuretype=word
Of course, I cheated a bit here and knew in advance there was going to be a match for this offset, but the point to take home is that given any textselection, you can easily get annotations that reference it.
In the above example we iterate over all annotations and then over all the data pertaining to the annotations found. Often, though, you are searching for specific data and would need some kind of extra test in there. This is accomplished by passing filters to the annotations() method, either as positional arguments or as keyword arguments such as value. We have seen an example of this before; here is another:
textselection = resource_banks.textselection(stam.Offset.simple(155, 163))
dataset = store.dataset("tutorial-set")
key = dataset.key("structuretype")
for annotation in textselection.annotations(key, value="word"):
    print(f"- ID: {annotation.id()}")
    print(f"  Text: {str(annotation)}")
- ID: AnnotationToken28
  Text: patterns
The use of filters in methods like annotations() and data() is always preferable to writing the equivalent filtering logic yourself in Python, because the underlying compiled library is more performant, and passing data back and forth to Python always comes with a performance penalty.
In the example above, however, we see that we filter on data, but do not actually get the data that was matched as a return value. If you do want that, you need a two-step process as follows:
textselection = resource_banks.textselection(stam.Offset.simple(155, 163))
dataset = store.dataset("tutorial-set")
key = dataset.key("structuretype")
for annotation in textselection.annotations(key, value="word"):
    print(f"- ID: {annotation.id()}")
    print(f"  Text: {str(annotation)}")
    # second step: retrieve the matching data item itself (at most one)
    annotationdata = annotation.data(key, value="word", limit=1)[0]
    print(f"  Data: {str(annotationdata)}")
- ID: AnnotationToken28
  Text: patterns
  Data: word
Sometimes you don't really care to retrieve the data or the annotations, but merely want to test whether certain data is present on an annotation and get a boolean. For this you can use methods like test_annotations() and test_data(), which take the same filtering parameters as their counterparts annotations() and data(), but instead of returning a collection they simply return a boolean, which is more performant.
The following example confirms that the text selection is indeed a word:
textselection = resource_banks.textselection(stam.Offset.simple(155,163))
dataset = store.dataset("tutorial-set")
key = dataset.key("structuretype")
assert textselection.test_data(key, value="word")
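Why a test method can be cheaper than retrieving a collection can be illustrated in plain Python. This is only an analogy, not the stam library's implementation, and the data items below are hypothetical: collecting matches has to build a new list, while a boolean test can stop at the first match.

```python
# Hypothetical data attached to one annotation, as (key, value) pairs.
annotation_data = [("structuretype", "word"), ("annotator", "someone")]

def matching_data(items, key, value):
    """Collect every match into a new list (analogous to retrieving a collection)."""
    return [item for item in items if item == (key, value)]

def has_data(items, key, value):
    """Return True at the first match, without building a collection."""
    return any(item == (key, value) for item in items)

print(matching_data(annotation_data, "structuretype", "word"))  # [('structuretype', 'word')]
print(has_data(annotation_data, "structuretype", "word"))       # True
print(has_data(annotation_data, "structuretype", "line"))       # False
```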
It is possible to retrieve all known text selections for a given resource. A text selection is 'known' if there is at least one annotation that references it:
for textselection in resource_banks.textselections():
    print(textselection)
# Consider Phlebas Consider Phlebas $ author=Iain M. Banks author Iain M . Banks ## 1 1 Everything about us, everything around us, everything we know [and can know of] is composed ultimately of patterns of nothing; that’s the bottom line, the final truth. So where we find we have any control over those patterns, why not make the most elegant ones, the most enjoyable and good ones, in our own terms? Everything about us, Everything about us , everything around us, everything around us , everything we know [and can know of] is composed ultimately of patterns of nothing; everything we know and can know of is composed ultimately of patterns of nothing that’s the bottom line, the final truth. that s the bottom line , the final truth . So where we find we have any control over those patterns, So where we find we have any control over those patterns , why not make the most elegant ones, the most enjoyable and good ones, why not make the most elegant ones , the most enjoyable and good ones , in our own terms? in our own terms ? ## 2 2 Besides, it left the humans in the Culture free to take care of the things that really mattered in life, such as [sports, games, romance,] studying dead languages, barbarian societies and impossible problems, and climbing high mountains without the aid of a safety harness. Besides, Besides , it left the humans in the Culture free to take care of the things that really mattered in life, it left the humans in the Culture free to take care of the things that really mattered in life , such as [sports, games, romance,] studying dead languages, such as sports , games , romance , studying dead languages , barbarian societies and impossible problems, barbarian societies and impossible problems , and climbing high mountains without the aid of a safety harness. and climbing high mountains without the aid of a safety harness .
It's easy to see how you can combine some of the examples to retrieve all annotations in a reverse way (i.e. via the text).
You can consider a STAM model as a graph in which the annotations, resources and data make up the nodes. The forward and reverse indices encode how these nodes are related and form the edges of the graph. These edges can be traversed in almost any direction using the various methods at your disposal in this STAM library. Methods like data(), annotations() and textselections(), with their filtering abilities, as well as their test counterparts, are essential tools to accomplish this.
Now we get to the fun part. When you select any two parts of a text, i.e. create two text selections, a number of relationships may or may not hold between them: one may embed the other, they may overlap, one may come before or after the other, they may share the same begin or end, or they may be equal.
In STAM, the TextSelectionOperator captures these relationships.
Remember our example in which we annotated the first word of line eight? The text selection for this word is embedded within the text selection for line eight as a whole. We can test that as follows, using the test() method on TextSelection:
assert firstword.test(stam.TextSelectionOperator.embedded(), line8_textselection)
# the reverse then also holds:
assert line8_textselection.test(stam.TextSelectionOperator.embeds(), firstword)
# an embedding is essentially a stricter form of an overlap relation, so this holds too:
assert firstword.test(stam.TextSelectionOperator.overlaps(), line8_textselection)
assert line8_textselection.test(stam.TextSelectionOperator.overlaps(), firstword)
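The core relations can be expressed as simple comparisons on offsets. The following plain-Python sketch assumes half-open [begin, end) offsets and single spans; the real TextSelectionOperator also covers sets of text selections and further variants (such as precedes, succeeds, samebegin and sameend), and the before/after definitions here are an assumption rather than the library's exact semantics. The line-eight span (92, 175) follows from the word offsets printed in this tutorial plus the final ";".

```python
from typing import NamedTuple

class Span(NamedTuple):
    """Simplified text selection: half-open [begin, end) character offsets."""
    begin: int
    end: int

def equals(a: Span, b: Span) -> bool:
    return a == b

def embeds(a: Span, b: Span) -> bool:
    # a fully contains b
    return a.begin <= b.begin and b.end <= a.end

def embedded(a: Span, b: Span) -> bool:
    return embeds(b, a)

def overlaps(a: Span, b: Span) -> bool:
    # the two spans share at least one character
    return a.begin < b.end and b.begin < a.end

def before(a: Span, b: Span) -> bool:
    # assumed semantics: a ends at or before the point where b begins
    return a.end <= b.begin

def after(a: Span, b: Span) -> bool:
    return before(b, a)

line8 = Span(92, 175)
firstword = Span(92, 102)
print(embeds(line8, firstword), embedded(firstword, line8))  # True True
print(overlaps(firstword, line8))                            # True
```

Embedding implies overlap here too, matching the remark in the code above that embedding is a stricter form of overlap.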
Not only can we test any two given text selections; we can also use this functionality to actively find text selections that stand in a particular relationship to another one. In other words, we find related text selections. This is a core feature of the STAM library and a primary method of finding text selections and their annotations. We use the related_text() method for this.
Let's find all text selections (which we previously annotated) in line eight:
for textselection in line8_textselection.related_text(stam.TextSelectionOperator.embeds()):
    print(f"{textselection} @{textselection.offset()}")
everything @92:102
we @103:105
know @106:110
and @112:115
can @116:119
know @120:124
of @125:127
is @129:131
composed @132:140
ultimately @141:151
of @152:154
patterns @155:163
of @164:166
nothing @167:174
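Conceptually, related_text() with embeds() filters the known text selections by the embedding relation. The naive plain-Python equivalent below is a linear scan over offsets copied from the outputs above ("Besides" comes from a later line, so it must not be returned). It shows the semantics only; the library presumably answers this through its indices rather than a scan, and the line-eight span (92, 175) is an assumption derived from the word offsets plus the final ";".

```python
# (text, begin, end) of known text selections, copied from the outputs above.
known = [
    ("everything", 92, 102), ("we", 103, 105), ("know", 106, 110),
    ("and", 112, 115), ("can", 116, 119), ("know", 120, 124),
    ("of", 125, 127), ("is", 129, 131), ("composed", 132, 140),
    ("ultimately", 141, 151), ("of", 152, 154), ("patterns", 155, 163),
    ("of", 164, 166), ("nothing", 167, 174), ("Besides", 370, 377),
]

LINE8 = (92, 175)  # assumed span of line eight, including the final ";"

def embeds(outer, inner):
    """True if the half-open span outer fully contains inner."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

related = [(text, begin, end) for text, begin, end in known if embeds(LINE8, (begin, end))]
for text, begin, end in related:
    print(f"{text} @{begin}:{end}")
```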
Often, what we are interested in is not the text selections as such, but the annotations that reference these text selections.
Simply add .annotations():
for annotation in line8_textselection.related_text(stam.TextSelectionOperator.embeds()).annotations():
    print(f"- ID: {annotation.id()}")
    print(f"  Text: {str(annotation)}")
    print("  Data:")
    for data in annotation:
        print(f"    {data.key()}={data.value()}")
- ID: AnnotationLine8Word1
  Text: everything
  Data:
    structuretype=word
- ID: AnnotationToken17
  Text: everything
  Data:
    structuretype=word
- ID: AnnotationToken18
  Text: we
  Data:
    structuretype=word
- ID: AnnotationToken19
  Text: know
  Data:
    structuretype=word
- ID: AnnotationToken20
  Text: and
  Data:
    structuretype=word
- ID: AnnotationToken21
  Text: can
  Data:
    structuretype=word
- ID: AnnotationToken22
  Text: know
  Data:
    structuretype=word
- ID: AnnotationToken23
  Text: of
  Data:
    structuretype=word
- ID: AnnotationToken24
  Text: is
  Data:
    structuretype=word
- ID: AnnotationToken25
  Text: composed
  Data:
    structuretype=word
- ID: AnnotationToken26
  Text: ultimately
  Data:
    structuretype=word
- ID: AnnotationToken27
  Text: of
  Data:
    structuretype=word
- ID: AnnotationToken28
  Text: patterns
  Data:
    structuretype=word
- ID: AnnotationToken29
  Text: of
  Data:
    structuretype=word
- ID: AnnotationToken30
  Text: nothing
  Data:
    structuretype=word
The related_text() method is available on TextSelection (and TextSelections) and on Annotation (and Annotations); in the latter case it is again a shortcut, so you don't have to retrieve the text selections yourself first. As said before: do use all the shortcuts the library offers. The more the library can do for you, the more performant things are, as it is compiled to machine code rather than written in Python itself.
In the last output, you may notice that we got two annotations for the first word of line eight: one we made manually, and the other came from our regular-expression-based tokeniser.
In the previous example, all we got was data with key structuretype and value word. We could have specifically selected for this by adding some filters to annotations():
key = store.dataset("tutorial-set").key("structuretype")
for annotation in line8_textselection.related_text(stam.TextSelectionOperator.embeds()).annotations(key, value="word"):
    print(f"- ID: {annotation.id()}")
    print(f"  Text: {str(annotation)}")
    print("  Data:")
    for data in annotation:
        print(f"    {data.key()}={data.value()}")
Instead of querying data using the various Python objects and methods we have seen thus far, it is also possible to formulate a query in a query language called STAMQL. The query language is described in detail in its specification (https://github.com/annotation/stam/tree/master/extensions/stam-query). We will only cover some of the basics here and show how to call it from Python.
A query starts with a SELECT statement, followed by a return type specifying what kind of data you want the query to return (ANNOTATION, DATA, TEXT, KEY, DATASET). Then you must specify a variable name to bind the results to (variables in STAMQL always start with a ?), followed by a WHERE clause introducing a series of one or more constraints, each ending with a semicolon.
Let's illustrate all this with an example: we obtain line 8 from our data, which we had explicitly annotated earlier:
query = """
SELECT ANNOTATION ?a WHERE
DATA "tutorial-set" "linenr" = 8;
"""
for result in store.query(query):
annotation = result['a']
assert isinstance(annotation, stam.Annotation)
print("ID: ", annotation.id())
print("Text: ", str(annotation))
ID:  AnnotationLine8
Text:  everything we know [and can know of] is composed ultimately of patterns of nothing;
Here we formulated a query in STAMQL and passed it as a string to the query() method, which gives us the results back as a list of dictionaries. The keys in each dictionary correspond to the variables we bound in the SELECT statement (without the ? prefix). In this case we obtain one result containing one variable, a.
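If you generate queries programmatically, the SELECT / WHERE structure described above can be assembled from its parts. The helper below, build_select, is hypothetical and not part of the stam library; it does no escaping, so values containing double quotes would need extra care. It reproduces the query used above.

```python
def build_select(resulttype, variable, constraints):
    """Assemble a simple STAMQL SELECT query string (hypothetical helper).

    resulttype:  e.g. "ANNOTATION" or "TEXT"
    variable:    variable name without the leading "?"
    constraints: constraint strings without the trailing semicolon
    """
    lines = [f"SELECT {resulttype} ?{variable} WHERE"]
    lines.extend(f"{constraint};" for constraint in constraints)
    return "\n".join(lines) + "\n"

query = build_select("ANNOTATION", "a", ['DATA "tutorial-set" "linenr" = 8'])
print(query)
```

The resulting string could be passed to store.query() like the hand-written query above, and each result would then be looked up with the bare variable name, e.g. result["a"].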
Instead of querying for the annotation, we could have queried directly for the text. We can also add extra constraints, all of which must be satisfied:
query = """
SELECT TEXT ?t WHERE
DATA "tutorial-set" "linenr" = 8;
DATA "tutorial-set" "structuretype" = "line";
"""
for result in store.query(query):
print(result['t'])
everything we know [and can know of] is composed ultimately of patterns of nothing;
Querying for text rather than annotations makes a subtle difference when you add multiple DATA constraints, as we did above. If we query for text, the query selects text that has annotations with the specified data; the data does not necessarily have to pertain to the same annotation, as long as it covers the same text. If you query for annotations and have multiple DATA constraints, then a single annotation must carry all the data items.
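The difference can be modelled in a few lines of plain Python. In this sketch the two annotations are hypothetical: both cover the same span, but each carries only one of the two data items. "Covering the same text" is simplified to "having identical offsets", which is an assumption for illustration only.

```python
# Hypothetical annotations on the same span, each with one data item.
annotations = {
    "A1": {"offset": (92, 175), "data": {("linenr", 8)}},
    "A2": {"offset": (92, 175), "data": {("structuretype", "line")}},
}
wanted = {("linenr", 8), ("structuretype", "line")}

# SELECT ANNOTATION: one annotation must carry all wanted data items.
annotation_hits = [aid for aid, a in annotations.items() if wanted <= a["data"]]

# SELECT TEXT: the data may be spread over several annotations on the same text.
data_per_offset = {}
for a in annotations.values():
    data_per_offset.setdefault(a["offset"], set()).update(a["data"])
text_hits = [offset for offset, data in data_per_offset.items() if wanted <= data]

print(annotation_hits)  # []
print(text_hits)        # [(92, 175)]
```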
The query language supports query composition to chain multiple queries/subqueries together. A subquery is introduced using curly braces. Take a look at the following example, where we again select line 8 and then all words in line 8 (here we use a textual embedding relation):
query = """
SELECT ANNOTATION ?line WHERE
DATA "tutorial-set" "linenr" = 8;
{
SELECT ANNOTATION ?word WHERE
RELATION ?line EMBEDS;
DATA "tutorial-set" "structuretype" = "word";
}
"""
for result in store.query(query):
#the ?line annotation will be returned for each
assert 'line' in result
annotation = result['word']
assert isinstance(annotation, stam.Annotation)
print("ID: ", annotation.id())
print("Text: ", str(annotation))
ID:  AnnotationLine8Word1
Text:  everything
ID:  AnnotationToken17
Text:  everything
ID:  AnnotationToken18
Text:  we
ID:  AnnotationToken19
Text:  know
ID:  AnnotationToken20
Text:  and
ID:  AnnotationToken21
Text:  can
ID:  AnnotationToken22
Text:  know
ID:  AnnotationToken23
Text:  of
ID:  AnnotationToken24
Text:  is
ID:  AnnotationToken25
Text:  composed
ID:  AnnotationToken26
Text:  ultimately
ID:  AnnotationToken27
Text:  of
ID:  AnnotationToken28
Text:  patterns
ID:  AnnotationToken29
Text:  of
ID:  AnnotationToken30
Text:  nothing
The constraint RELATION ?line EMBEDS; in the subquery is essential here. It can be read as "?line embeds ?word" and ensures that there is a specific textual relation between the two SELECT statements. In fact, a subquery is required to have a constraint that refers back to its parent query. Each subquery can itself have a subquery, so you can build long chains.
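One way to read a query with a subquery is as a nested loop: for every binding of ?line, the subquery runs with ?line fixed, and each result carries both variables. The plain-Python sketch below shows that reading only, not how the STAM query engine evaluates queries. The annotation ids and offsets are taken from this tutorial's outputs, while the dictionary layout is an assumption.

```python
# id -> (offset, data); ids and offsets from the outputs in this tutorial.
annotations = {
    "AnnotationLine8": ((92, 175), {"linenr": 8, "structuretype": "line"}),
    "AnnotationToken17": ((92, 102), {"structuretype": "word"}),
    "AnnotationToken18": ((103, 105), {"structuretype": "word"}),
    "AnnotationToken74": ((370, 377), {"structuretype": "word"}),
}

def embeds(outer, inner):
    """True if the half-open span outer fully contains inner."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def run():
    results = []
    # main query: SELECT ANNOTATION ?line WHERE DATA "tutorial-set" "linenr" = 8;
    for line_id, (line_offset, line_data) in annotations.items():
        if line_data.get("linenr") != 8:
            continue
        # subquery, evaluated once per ?line binding:
        # RELATION ?line EMBEDS; DATA "tutorial-set" "structuretype" = "word";
        for word_id, (word_offset, word_data) in annotations.items():
            if word_data.get("structuretype") == "word" and embeds(line_offset, word_offset):
                results.append({"line": line_id, "word": word_id})
    return results

for result in run():
    print(result)
```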
Aside from EMBEDS, there are other relations you can use, such as OVERLAPS, PRECEDES, SUCCEEDS, BEFORE, AFTER, SAMEBEGIN, SAMEEND and EQUALS. These are the STAMQL keywords representing the TextSelectionOperator variants you have already seen.
You can choose whether to express your queries in STAMQL or through Python objects and methods. Internally, the STAM library converts the latter to the former whenever you apply any filtering, so there is not much difference performance-wise. There is, however, some performance overhead in the conversion of results when you call query() explicitly with a STAMQL query.
When calling query(), you may inject context variables yourself via keyword arguments. These are subsequently available for use as constraints in your query. As an example, we repeat the previous query but inject the line variable manually, since we already had an instance of it lying around anyway:
query = """
SELECT ANNOTATION ?word WHERE
RELATION ?line EMBEDS;
DATA "tutorial-set" "structuretype" = "word";
"""
for result in store.query(query, line=line8_textselection):
annotation = result['word']
assert isinstance(annotation, stam.Annotation)
print("ID: ", annotation.id())
print("Text: ", str(annotation))
ID:  AnnotationLine8Word1
Text:  everything
ID:  AnnotationToken17
Text:  everything
ID:  AnnotationToken18
Text:  we
ID:  AnnotationToken19
Text:  know
ID:  AnnotationToken20
Text:  and
ID:  AnnotationToken21
Text:  can
ID:  AnnotationToken22
Text:  know
ID:  AnnotationToken23
Text:  of
ID:  AnnotationToken24
Text:  is
ID:  AnnotationToken25
Text:  composed
ID:  AnnotationToken26
Text:  ultimately
ID:  AnnotationToken27
Text:  of
ID:  AnnotationToken28
Text:  patterns
ID:  AnnotationToken29
Text:  of
ID:  AnnotationToken30
Text:  nothing
All annotations we have made so far reference the text as a whole with absolute offsets via a TextSelector, even though we formulated some of these offsets (the first word of line eight) in relative terms.
STAM also allows you to adopt another annotation paradigm, in which you point an annotation not at a text via a TextSelector, but at another annotation via an AnnotationSelector. That other annotation (or the last one in a chain of however many annotations there are in between) then points at the text with a TextSelector. You can specify an offset, which will then be interpreted relative to the text selection of the targeted annotation:
line8 = store.annotation("AnnotationLine8")
annotation = store.annotate(
target=stam.Selector.annotationselector(line8, stam.Offset.simple(0,10)),
data= {"key": "structuretype", "value": "word", "set": "tutorial-set" },
id=f"AnnotationLine8Word1_explicit")
Here we are effectively annotating an annotation, so we call this a form of higher-order annotation. We explicitly capture and model a relationship. Whether to do this explicitly or use the STAM library's functionality to resolve it implicitly is entirely up to you, the modeller, and your use-case!
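How a relative offset resolves to an absolute one can be sketched in plain Python. The Ann class and resolve() function are hypothetical simplifications, not stam classes: an annotation either points at the text directly or at a parent annotation with an optional relative offset, and resolution walks up the chain until it reaches a text offset. Line eight's absolute span (92, 175) is derived from the word offsets shown earlier plus the final ";".

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Ann:
    """Hypothetical annotation: targets the text directly (text_offset)
    or another annotation (parent), optionally with a relative offset."""
    text_offset: Optional[Tuple[int, int]] = None
    parent: Optional["Ann"] = None
    relative: Optional[Tuple[int, int]] = None

def resolve(ann: Ann) -> Tuple[int, int]:
    """Return the absolute (begin, end) offset an annotation ultimately points at."""
    if ann.text_offset is not None:
        return ann.text_offset
    base_begin, base_end = resolve(ann.parent)
    if ann.relative is None:
        # no offset: the whole text of the targeted annotation
        return (base_begin, base_end)
    begin, end = ann.relative
    if not (0 <= begin <= end <= base_end - base_begin):
        raise ValueError("relative offset falls outside the targeted annotation")
    return (base_begin + begin, base_begin + end)

line8 = Ann(text_offset=(92, 175))
word1 = Ann(parent=line8, relative=(0, 10))
print(resolve(word1))  # (92, 102)
```

This mirrors the explicit annotation created above: offset 0:10 relative to line eight corresponds to the absolute offset 92:102 of "everything".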
We can also do higher-order annotation to associate metadata with annotations, such as encoding the person who did the annotation. In such cases, we can choose not to reference the text at all, because the annotation no longer says something about the text.
line8 = store.annotation("AnnotationLine8")
annotation = store.annotate(
target=stam.Selector.annotationselector(line8),
data= [
{"key": "annotator", "value": "Maarten van Gompel", "set": "tutorial-set" },
{"key": "datetime", "value": "2023-04-18T17:48:56", "set": "tutorial-set" },
],
id=f"AnnotationAnnotator")
Note that we invented some more keys that were added on-the-fly to our annotation dataset (i.e. the vocabulary).
This, too, needn't be a higher-order annotation; you can choose to associate the AnnotationData directly with the annotation. The idea behind an annotation, though, is that once it is made, it is immutable: no adding or editing of annotation data or targets at a later point in time. Information such as the annotator and date/time could well be associated with the annotation upon creation, but sometimes there is data that you want to associate with an annotation at a later point in time. That would be a use case for higher-order annotation.
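The following toy model sketches this idea in plain Python rather than the stam API (ToyAnnotation and about() are hypothetical names). A frozen dataclass refuses in-place edits, so data that arrives later is attached through a second annotation that targets the first one.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class ToyAnnotation:
    """Immutable stand-in for an annotation: an id, a target and (set, key, value) triples."""
    id: str
    target: object  # a (begin, end) text span or another ToyAnnotation
    data: tuple = ()


# Hypothetical offsets for line eight.
line8 = ToyAnnotation("AnnotationLine8", target=(11, 93), data=(("tutorial-set", "linenr", 8),))

# Editing in place is refused because the dataclass is frozen.
try:
    line8.data = ()
    edited = True
except FrozenInstanceError:
    edited = False

# Information that arrives later is attached by a new, higher-order annotation instead.
provenance = ToyAnnotation(
    "AnnotationAnnotator",
    target=line8,
    data=(
        ("tutorial-set", "annotator", "Maarten van Gompel"),
        ("tutorial-set", "datetime", "2023-04-18T17:48:56"),
    ),
)


def about(annotation, annotations):
    """Collect the data triples of all annotations that target `annotation`."""
    return [triple for a in annotations if a.target is annotation for triple in a.data]


print(edited)  # False
print(about(line8, [line8, provenance]))
```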
Rather than point at a single target, sometimes you want to annotate something that cannot be captured by a single simple selector. Take, for example, line eight from our text again:
everything we know [and can know of] is composed ultimately of patterns of nothing
Say we want to annotate the sentence without the portion in square brackets: a single text selection cannot capture this because it is discontinuous. Two text selections, however, do the job. To combine the two text selectors (or any other type of simple selector), STAM has the CompositeSelector:
part1 = line8_textselection.textselection(stam.Offset.simple(0,18))
part2 = line8_textselection.textselection(stam.Offset.simple(37,82))
line8mainsentence = store.annotate(
target=stam.Selector.compositeselector(
stam.Selector.textselector(resource_banks, part1.offset()),
stam.Selector.textselector(resource_banks, part2.offset()),
),
data= [
{"key": "structuretype", "value": "mainsentence", "set": "tutorial-set" },
],
id=f"AnnotationLine8Mainsentence")
If we ask the STAM library to get the text using str(), it will concatenate the parts with a space, which may not always be appropriate:
print(f"\"{line8mainsentence}\"")
assert str(line8mainsentence) == "everything we know is composed ultimately of patterns of nothing"
"everything we know is composed ultimately of patterns of nothing"
Use the text() method instead if you want to retain the parts:
print(line8mainsentence.text())
assert line8mainsentence.text() == ["everything we know", "is composed ultimately of patterns of nothing"]
['everything we know', 'is composed ultimately of patterns of nothing']
In a similar fashion, you can also call the textselections() method to obtain all text selections. We already used this method before and remarked that it always returns a TextSelections collection rather than a single TextSelection; now you know why.
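A plain-Python toy of how one discontinuous target can be rendered both ways, using the same offsets as above on the text of line eight. ToyComposite is a hypothetical name and not one of the stam classes; it only mimics the two behaviours described.

```python
from typing import List, Tuple

line8_text = "everything we know [and can know of] is composed ultimately of patterns of nothing"


class ToyComposite:
    """Several (begin, end) spans that together form one discontinuous target."""

    def __init__(self, text: str, spans: List[Tuple[int, int]]):
        self.source = text
        self.spans = spans

    def text(self) -> List[str]:
        # One string per part, mimicking what text() returns above.
        return [self.source[b:e] for b, e in self.spans]

    def __str__(self) -> str:
        # Parts glued together with a single space, mimicking str() above.
        return " ".join(self.text())


main = ToyComposite(line8_text, [(0, 18), (37, 82)])
print(str(main))    # everything we know is composed ultimately of patterns of nothing
print(main.text())  # ['everything we know', 'is composed ultimately of patterns of nothing']
```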
When the composite selector is used, the target must be interpreted jointly; the annotation applies to the whole composition rather than to individual parts.
There is also the MultiSelector, which selects multiple targets and the annotation applies to each of them individually and independently. It offers a convenient way to express multiple annotations more concisely, conserving memory usage.
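The difference in semantics can be sketched with plain Python tuples. This is a toy model, not the stam API, and the pos key and its value are hypothetical: a composite yields one claim about the combination of spans, whereas a multi selector is shorthand for one independent claim per span.

```python
# Toy model: a "claim" is (data, spans); this is not the stam API.
line8_text = "everything we know [and can know of] is composed ultimately of patterns of nothing"
data = {"set": "tutorial-set", "key": "pos", "value": "verb"}  # hypothetical key/value
targets = [(14, 18), (40, 48)]  # "know" and "composed"

# Composite: ONE claim about both spans jointly.
composite_claims = [(data, tuple(targets))]

# Multi: the same data, expanded into one independent claim per span.
multi_claims = [(data, (span,)) for span in targets]


def covered_text(claim):
    """Return the text of every span a claim covers."""
    _, spans = claim
    return [line8_text[b:e] for b, e in spans]


print([covered_text(c) for c in composite_claims])  # [['know', 'composed']]
print([covered_text(c) for c in multi_claims])      # [['know'], ['composed']]
```

For a part-of-speech tag like this one, the multi reading is the sensible one: the composite reading would claim that "know" and "composed" are jointly a single verb.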
Last, there is the DirectionalSelector which expresses multiple targets with a very specific order that is meaningful. For example, taking line eight again, we can express the dependency relation where the word ultimately is an adverbial modifier to the verb composed:
head = line8_textselection.textselection(stam.Offset.simple(40,48)) # "composed"
dependent = line8_textselection.textselection(stam.Offset.simple(49,59)) # "ultimately"
line8dependency = store.annotate(
target=stam.Selector.directionalselector(
stam.Selector.textselector(resource_banks, head.offset()), # first target: the head
stam.Selector.textselector(resource_banks, dependent.offset()), # second target: the dependent
),
data= [
{"key": "dependency", "value": "advmod", "set": "tutorial-set" },
],
id="AnnotationDependency")
You can interpret the different selectors under a directional selector akin to positional function parameters. You, the modeller, determine how the ordering is interpreted.
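As a toy illustration in plain Python (describe() is a hypothetical helper, not part of stam), reading the ordered targets positionally as (head, dependent) shows why swapping them changes the meaning. The offsets assume the line-eight text quoted earlier, in which "composed" spans 40 to 48 and "ultimately" spans 49 to 59.

```python
# Toy model of a directional target; describe() is a hypothetical helper, not stam.
line8_text = "everything we know [and can know of] is composed ultimately of patterns of nothing"
head = (40, 48)       # "composed"
dependent = (49, 59)  # "ultimately"


def describe(relation, targets, text):
    """Interpret the ordered targets positionally: first is head, second is dependent."""
    (hb, he), (db, de) = targets
    return f"{relation}(head={text[hb:he]}, dependent={text[db:de]})"


print(describe("advmod", (head, dependent), line8_text))
# advmod(head=composed, dependent=ultimately)
print(describe("advmod", (dependent, head), line8_text))
# advmod(head=ultimately, dependent=composed)  <- same spans, different meaning
```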
We already explained why editing an annotation in place is a bad idea and should be avoided: the canonical way to edit an annotation is to remove the old annotation from the store and make a new one. Removing an annotation, or any other STAM object, can be done by passing it to AnnotationStore.remove().
We can dive into the motivation behind this constraint a bit more. From a semantic perspective, annotations are essentially a commentary about something else. If what you comment on is subject to change, possibly without your knowledge, then such a change might invalidate your commentary, as it is no longer the same thing you based your comment on. The STAM model prevents these pitfalls.
Nevertheless, at a lower level there are ways around this constraint. After all, as long as you don't publish the annotations, you have some liberty in editing them. Currently, though, the Python library does not yet expose this.
When using AnnotationStore.remove() on any variable, you must take care yourself not to use that variable again. Also note that removing an item will remove everything that depends on it. So if you remove an item such as an annotation, text resource or data item, then all annotations on it and everything that references it will be removed automatically as well.
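A toy sketch of this cascading behaviour in plain Python. ToyStore is hypothetical: it only tracks which item targets which, and it is not how the stam library is implemented.

```python
class ToyStore:
    """Tracks, for each item id, the ids of the items it targets (toy model, not stam)."""

    def __init__(self):
        self.targets = {}

    def add(self, item_id, targets=()):
        self.targets[item_id] = set(targets)

    def remove(self, item_id):
        """Remove an item and, transitively, everything that targets it."""
        removed = []
        pending = [item_id]
        while pending:
            current = pending.pop()
            if current not in self.targets:
                continue  # already removed via another path
            del self.targets[current]
            removed.append(current)
            # Everything still pointing at the removed item must go as well.
            pending.extend(i for i, ts in self.targets.items() if current in ts)
        return removed


store = ToyStore()
store.add("banks.txt")  # hypothetical text resource id
store.add("AnnotationLine8", targets=["banks.txt"])
store.add("AnnotationLine8Word1_explicit", targets=["AnnotationLine8"])
store.add("AnnotationAnnotator", targets=["AnnotationLine8"])

removed = store.remove("AnnotationLine8")
print(sorted(removed))        # ['AnnotationAnnotator', 'AnnotationLine8', 'AnnotationLine8Word1_explicit']
print(sorted(store.targets))  # ['banks.txt']
```

In this model, editing amounts to a remove() followed by adding a fresh item under a new identifier.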
All this time we've been annotating but have not committed our results to any form of persistent storage. You will likely want to save your annotation store to file, and load it all again at any later point in time.
STAM's canonical serialisation format is STAM JSON:
store.set_filename("tutorial.store.stam.json")
store.save()
The save() method will use the filename that the annotation store was initially loaded from. We had none yet, so we set it via set_filename() first. In our current example, everything is saved into a single JSON file. However, the set_filename() method is also available on AnnotationDataSet and TextResource. If set, these are kept in stand-off files. Annotation data sets usually use STAM JSON, but text resources generally just use plain text. The extension you use determines the file format.
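A hedged sketch of extension-based dispatch in plain Python. The mapping below is an assumption made for illustration, and guess_format() is a hypothetical helper; check the STAM documentation for the extensions the library actually recognises.

```python
from pathlib import Path

# Hypothetical mapping for illustration only; not taken from the stam library.
FORMATS = {
    ".json": "STAM JSON",
    ".csv": "STAM CSV",
    ".txt": "plain text",
}


def guess_format(filename: str) -> str:
    """Pick a serialisation format from the final extension of a filename."""
    suffix = Path(filename).suffix.lower()
    try:
        return FORMATS[suffix]
    except KeyError:
        raise ValueError(f"no format known for extension {suffix!r}") from None


for name in ["tutorial.store.stam.json", "banks.txt", "tutorial.store.stam.csv"]:
    print(name, "->", guess_format(name))
```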
There is also a STAM CSV format, defined as an extension and supported by this library. Whereas the JSON format is very verbose (resulting in large files), the CSV format is more concise.
Loading an annotation store (including all stand-off files) is as simple as:
store2 = stam.AnnotationStore(file="tutorial.store.stam.json")
Having made annotations, you may want to visualize them. This can be done via the view() method on AnnotationStore. It takes as input a selection query and zero or more highlight queries, all in STAMQL, and produces either HTML output or colored ANSI text. The HTML output is a self-contained, standalone document.
The selection query determines what the main selection is and can be anything you can query that has text (i.e. resources, annotations, text selections).
The highlight queries determine which parts of the selections produced by the selection query you want to highlight. Highlighting is done by drawing a line underneath the text and, optionally, by a tag that shows extra information. Specific display options are configurable via attributes (starting with @) that precede the actual STAMQL query.
Tags can be enabled by prepending the query with one of the following attributes:
- @KEYTAG: Outputs a tag with the key, pertaining to the first DATA constraint in the query
- @KEYVALUETAG: Outputs a tag with the key and the value, pertaining to the first DATA constraint in the query
- @VALUETAG: Outputs a tag with the value only, pertaining to the first DATA constraint in the query
- @IDTAG: Outputs a tag with the public identifier of the ANNOTATION that has been selected
If you don't want to match the first DATA constraint but the n-th, then specify a number to refer to the DATA constraint (1-indexed) in the order specified. Note that only DATA constraints are counted:
- @KEYTAG=n: Outputs a tag with the key, pertaining to the n-th DATA constraint in the query
- @KEYVALUETAG=n: Outputs a tag with the key and the value, pertaining to the n-th DATA constraint in the query
- @VALUETAG=n: Outputs a tag with the value only, pertaining to the n-th DATA constraint in the query
Attributes may also be provided for styling HTML output:
- @STYLE=class: Will associate the mentioned CSS class (it's up to you to associate a proper stylesheet). The default one predefines only a few simple classes: italic, bold, red, green, blue, super.
- @HIDE: Do not add the highlight underline and do not add an entry to the legend. This may be useful if you only want to apply @STYLE.
If no attribute is provided, there will be no tags or styling shown for that query, only a highlight underline in the HTML output.
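To make the attribute syntax concrete, here is a naive plain-Python sketch that splits leading attributes off a highlight query and picks the n-th DATA constraint. split_attributes() and nth_data_constraint() are hypothetical helpers, not the parser stam uses, and the sketch ignores quoting, so a semicolon inside a string value would confuse it.

```python
def split_attributes(query: str):
    """Split leading @ATTRIBUTE[=value] tokens off a highlight query (naive sketch)."""
    attributes = {}
    rest = query.strip()
    while rest.startswith("@"):
        token, _, rest = rest.partition(" ")
        name, _, value = token[1:].partition("=")
        attributes[name] = value or None  # None for attributes without a value, e.g. @IDTAG
        rest = rest.lstrip()
    return attributes, rest


def nth_data_constraint(query: str, n: int = 1) -> str:
    """Return the n-th (1-indexed) DATA constraint; other constraints are not counted."""
    _, _, where = query.partition(" WHERE ")
    constraints = [c.strip() for c in where.split(";")]
    data = [c for c in constraints if c.startswith("DATA ")]
    return data[n - 1]


query = ('@KEYVALUETAG=2 @STYLE=italic SELECT ANNOTATION ?w WHERE '
         'DATA "tutorial-set" "structuretype" = "word"; RELATION ?quote EMBEDS; '
         'DATA "tutorial-set" "linenr" = 8;')
attributes, stamql = split_attributes(query)
print(attributes)  # {'KEYVALUETAG': '2', 'STYLE': 'italic'}
n = int(attributes.get("KEYVALUETAG") or 1)
print(nth_data_constraint(stamql, n))  # DATA "tutorial-set" "linenr" = 8
```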
Note: This is the same functionality as is exposed in the collection of command-line tools called stam-tools.
To display HTML in this Jupyter Notebook we import this first:
from IPython.display import display, HTML
Let's take a look at the data we have been creating thus far. Let's first just query for the text of the two quotes our document consists of:
display(HTML(store.view('SELECT ANNOTATION ?quote WHERE DATA "tutorial-set" "structuretype" = "quote";')))
We can add additional queries to highlight parts of this output, such as the words and line eight, both of which we have annotated earlier:
display(HTML(store.view('SELECT ANNOTATION ?quote WHERE DATA "tutorial-set" "structuretype" = "quote";', \
'SELECT ANNOTATION ?word WHERE RELATION ?quote EMBEDS; DATA "tutorial-set" "structuretype" = "word";', \
'SELECT ANNOTATION ?line_8 WHERE RELATION ?quote EMBEDS; DATA "tutorial-set" "linenr" = 8;')))
It is important that highlight queries always reference the variable from the primary selection query (?quote in the above example); otherwise they query too much and performance is drastically suboptimal.
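A rough, regex-based check for this could look as follows. Both helpers are hypothetical and not part of stam or any STAMQL tooling; they only test whether the primary variable is mentioned at all, not whether it is used in a sensible RELATION constraint.

```python
import re


def primary_variable(query: str):
    """Name of the variable bound by the first SELECT, e.g. 'quote' (naive regex)."""
    match = re.search(r"SELECT\s+\w+\s+\?(\w+)", query)
    return match.group(1) if match else None


def references(highlight: str, variable: str) -> bool:
    """True if the highlight query mentions ?variable as a whole word."""
    return re.search(r"\?" + re.escape(variable) + r"\b", highlight) is not None


main = 'SELECT ANNOTATION ?quote WHERE DATA "tutorial-set" "structuretype" = "quote";'
good = 'SELECT ANNOTATION ?word WHERE RELATION ?quote EMBEDS; DATA "tutorial-set" "structuretype" = "word";'
bad = 'SELECT ANNOTATION ?word WHERE DATA "tutorial-set" "structuretype" = "word";'

var = primary_variable(main)
print(var)                   # quote
print(references(good, var))  # True
print(references(bad, var))   # False: this highlight query would scan far too much
```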
We can also output additional tags by prepending an attribute (@IDTAG, @KEYTAG, @VALUETAG or @KEYVALUETAG) to a highlight query:
display(HTML(store.view('SELECT ANNOTATION ?quote WHERE DATA "tutorial-set" "structuretype" = "quote";', \
'@IDTAG SELECT ANNOTATION ?word WHERE RELATION ?quote EMBEDS; DATA "tutorial-set" "structuretype" = "word";', \
'@KEYVALUETAG SELECT ANNOTATION ?line_8 WHERE RELATION ?quote EMBEDS; DATA "tutorial-set" "linenr" = 8;')))
Alternatively, you can output annotations as text with ANSI escape sequences by setting the keyword argument format="ansi". This is designed for terminal output, but it can also be visualised here:
print(store.view('SELECT ANNOTATION ?quote WHERE DATA "tutorial-set" "structuretype" = "quote";', \
'SELECT ANNOTATION ?word WHERE RELATION ?quote EMBEDS; DATA "tutorial-set" "structuretype" = "word";', \
'@KEYVALUETAG SELECT ANNOTATION ?line_8 WHERE RELATION ?quote EMBEDS; DATA "tutorial-set" "linenr" = 8;',
format="ansi"))
Legend: 1. word 2. line 8 ----------------------------------- 1. AnnotationQuote1 ----------------------------------- [Everything] [about] [us], [everything] [around] [us], [[everything]]] [we] [know] [[and] [can] [know] [of]] [is] [composed] [ultimately] [of] [patterns] [of] [nothing];|linenr: 8] [that]’[s] [the] [bottom] [line], [the] [final] [truth]. [So] [where] [we] [find] [we] [have] [any] [control] [over] [those] [patterns], [why] [not] [make] [the] [most] [elegant] [ones], [the] [most] [enjoyable] [and] [good] [ones], [in] [our] [own] [terms]? ----------------------------------- 2. AnnotationQuote2 ----------------------------------- [Besides], [it] [left] [the] [humans] [in] [the] [Culture] [free] [to] [take] [care] [of] [the] [things] [that] [really] [mattered] [in] [life], [such] [as] [[sports], [games], [romance],] [studying] [dead] [languages], [barbarian] [societies] [and] [impossible] [problems], [and] [climbing] [high] [mountains] [without] [the] [aid] [of] [a] [safety] [harness].
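The ANSI output relies on standard terminal escape sequences. A minimal sketch of the mechanism in plain Python follows; the colours and markers stam actually uses are not reproduced, and highlight() and strip_ansi() are hypothetical helpers. A span is wrapped in a style code followed by a reset code, and stripping the codes gives back the plain text.

```python
import re

UNDERLINE = "\x1b[4m"   # SGR 4: underline
RED = "\x1b[31m"        # SGR 31: red foreground
RESET = "\x1b[0m"       # SGR 0: reset all attributes
ANSI_PATTERN = re.compile(r"\x1b\[[0-9;]*m")


def highlight(text: str, begin: int, end: int, style: str = UNDERLINE) -> str:
    """Wrap text[begin:end] in an ANSI style and reset afterwards."""
    return text[:begin] + style + text[begin:end] + RESET + text[end:]


def strip_ansi(text: str) -> str:
    """Remove SGR escape sequences, recovering the plain text."""
    return ANSI_PATTERN.sub("", text)


line = "is composed ultimately of patterns of nothing"
marked = highlight(line, 12, 22, RED)  # "ultimately"
print(marked)  # renders "ultimately" in red in a terminal
```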
This concludes this tutorial. We hope to have shown you how to use the STAM Python library.