STAM Tutorial: Standoff Text Annotation for Pythonistas¶

Introduction¶

STAM is a data model, with accompanying tooling, for stand-off text annotation. It allows researchers and developers to model annotations on text.

An annotation is any kind of remark or classification/tag on a particular portion (or portions) of a text, on a resource or annotation set as a whole (in which case the annotation can be interpreted as metadata), or on another annotation (higher-order annotation).

Examples of annotation may be linguistic annotation, structure/layout annotation, editorial annotation, technical annotation, or whatever comes to mind. STAM does not define any vocabularies whatsoever. Instead, it provides a framework upon which you can model your annotations using whatever you see fit.

The model is thoroughly explained in its specification document. We summarize only the most important data structures here; these have direct counterparts (classes) in the Python library we will be teaching in this tutorial:

Annotation - An instance of annotation. Associated with an annotation is a Selector to select the target of the annotation, and one or more AnnotationData instances that hold the body or content of the annotation. The content is explicitly decoupled from the annotation instance itself, as multiple annotations may hold the very same content.
Selector - A selector identifies the target of an annotation and the part of the target that the annotation applies to. There are multiple types that are described here. The TextSelector is an important one that selects a target resource and a specific text selection within it by specifying an offset.
AnnotationData - A key/value pair that acts as body or content for one or more annotations. The key is a reference to DataKey, the value is a DataValue. (The term feature is also seen for this in certain annotation paradigms)
DataKey - A key as referenced by AnnotationData.
DataValue - A value with some type information (e.g. string, integer, float).
TextResource - A textual resource that is made available for annotation. This holds the actual textual content.
TextSelection - A particular selection of text within a resource, i.e. a subslice of the text.
AnnotationDataSet - An Annotation Data Set stores the keys (DataKey) and values (AnnotationData) that are used by annotations. It effectively defines a certain vocabulary, i.e. key/value pairs. How broad or narrow the scope of the vocabulary is, is not defined by STAM but is entirely up to the user.
AnnotationStore - The annotation store is essentially your workspace: it holds all resources, annotation sets (i.e. keys and annotation data) and of course the actual annotations. In the Python implementation it is a memory-based store, and you can put as much as you like into it (as long as it fits in memory).

STAM is more than just a theoretical model: we offer practical implementations that allow you to work with it directly. In this tutorial we will be using Python and the Python library stam.

Note: The STAM Python library is a so-called Python binding to a STAM library written in Rust. This means the library is not written in Python but is compiled to machine code and as such offers much better performance.

Installation¶

First of all, you will need to install the STAM Python library from the Python Package Index as follows:

In [1]:

!pip install stam

Requirement already satisfied: stam in ./env/lib/python3.12/site-packages (0.7.0)

Annotating from scratch¶

Adding a text¶

Let us start with a mini corpus consisting of two quotes from the book "Consider Phlebas" by renowned sci-fi author Iain M. Banks.

In [2]:

text = """
# Consider Phlebas
$ author=Iain M. Banks

## 1
Everything about us,
everything around us,
everything we know [and can know of] is composed ultimately of patterns of nothing;
that’s the bottom line, the final truth.

So where we find we have any control over those patterns,
why not make the most elegant ones, the most enjoyable and good ones,
in our own terms?

## 2
Besides,
it left the humans in the Culture free to take care of the things that really mattered in life,
such as [sports, games, romance,] studying dead languages,
barbarian societies and impossible problems,
and climbing high mountains without the aid of a safety harness.
"""

The format of the text for STAM is in no way prescribed, other than:

It must be plain text
It must be UTF-8 encoded
It should ideally be in Unicode Normalization Form C. (don't worry if this means nothing to you yet)
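If Unicode normalization is new to you, the following minimal sketch shows what Normalization Form C (NFC) means in practice. It uses only Python's standard unicodedata module; the helper name to_nfc is made up for this illustration and is not part of the STAM library.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return the text in Unicode Normalization Form C (composed form)."""
    return unicodedata.normalize("NFC", text)


# 'e' followed by a combining acute accent: two code points, one visible character
decomposed = "e\u0301"
composed = to_nfc(decomposed)

print(len(decomposed), len(composed))
print(unicodedata.is_normalized("NFC", composed))
```

Since offsets count code points, the same visible text can yield different offsets depending on its normalization form, which is one reason to normalize a text before annotating it.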

Before we can do anything we need to import the STAM library:

In [3]:

import stam

Let's add this text resource to an annotation store so we can annotate it:

In [4]:

store = stam.AnnotationStore(id="tutorial")
resource_banks = store.add_resource(id="banks", text=text)

Here we passed the text as a string, but it could just as well have been an external text file instead, the filename of which can be passed via the file= keyword argument.

Creating an annotation dataset (vocabulary)¶

Our example text is a bit Markdown-like: we have a title header "Consider Phlebas" and two subheaders (1 and 2), each containing one quote from the book.

As our first annotations, let's try to annotate this coarse structure. At this point we're already in need of some vocabulary to express the notions of title header, section header and quote, as STAM does not define any vocabulary. It is up to you to make these choices on how to represent the data.

An annotation data set effectively defines a vocabulary. Let's invent our own simple Annotation Data Set that defines the keys and values we use in this tutorial. In our AnnotationDataSet we can define a DataKey with ID structuretype, and have it take values like titleheader, sectionheader and quote.

We can explicitly add the set and the key. We give the dataset a public ID (tutorial-set), just as we previously assigned a public ID to both the annotation store (tutorial) and the text resource (banks). It is good practice to assign IDs, though you can also let the library auto-generate them for you:

In [5]:

dataset = store.add_dataset("tutorial-set")
key_structuretype = dataset.add_key("structuretype")

The first annotations with text selectors¶

To annotate the title header, we need to select the part of the text where it occurs by finding the offset, which consists of a begin and an end position. STAM follows the same indexing conventions as Python: positions are 0-indexed Unicode code points (as opposed to UTF-8 bytes) and the end position is non-inclusive. After some clumsy manual counting on the source text we discover that the following coordinates hold:

In [6]:

assert text[1:19] == "# Consider Phlebas"
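Manual counting can be double-checked in plain Python, because the offsets follow the same conventions as Python slices. The sketch below works on a shortened copy of the text (named sample here, an assumption made for brevity) and derives the begin and end positions from the header string itself:

```python
# Shortened copy of the tutorial text; only the first lines are needed here.
sample = "\n# Consider Phlebas\n$ author=Iain M. Banks\n"

needle = "# Consider Phlebas"
begin = sample.index(needle)   # 0-indexed position of the first character
end = begin + len(needle)      # the end position is non-inclusive

print(f"{begin}:{end}")        # prints 1:19
assert sample[begin:end] == needle
```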

And we make the annotation:

In [7]:

annotation = store.annotate(
    target=stam.Selector.textselector(resource_banks, stam.Offset.simple(1,19)),
    data={"id": "Data1", "key": key_structuretype, "value": "titleheader", "set": dataset },
    id="Annotation1")

A fair amount happened there. We selected a part of the text of resource_banks by offset, and associated AnnotationData with the annotation saying that the structuretype key has the value titleheader, both of which we invented as part of our AnnotationDataSet with ID tutorial-set. Lastly, we assigned an ID to both the AnnotationData and the Annotation as a whole. In this example we reused some of the variables we had created earlier, but we could also have written everything out in full, as shown below:

annotation = store.annotate(
    target=stam.Selector.textselector(resource_banks, stam.Offset.simple(1,19)),
    data={"id": "Data1", "key": "structuretype", "value": "titleheader", "set": "tutorial-set" },
    id="Annotation1")

This would also have been perfectly fine. Moreover, it would also work without us explicitly creating the AnnotationDataSet and the key as we did before! Those would have been created automatically on-the-fly for us. The only disadvantage is that more lookups are needed under the hood, so this is slightly less performant than passing Python variables.

Inspecting data (1)¶

We can inspect the annotation we just added:

In [8]:

print("Annotation ID: ", annotation.id())
print("Target text: ", str(annotation))
print("Data: ")
for data in annotation.data():
    print(" - Data ID: ", data.id())
    print("   Data Key: ", data.key().id())
    print("   Data Value: ", str(data.value()))

Annotation ID:  Annotation1
Target text:  # Consider Phlebas
Data: 
 - Data ID:  Data1
   Data Key:  structuretype
   Data Value:  titleheader

In the above example, we obtained an Annotation instance from the return value of the annotate() method. Once any annotation is in the store, we can retrieve it simply by its public ID using the annotation() method. An exception will be raised if the ID does not exist.

In [9]:

annotation = store.annotation("Annotation1")

A similar pattern holds for almost all other data structures in the STAM model:

In [10]:

dataset = store.dataset("tutorial-set")            #AnnotationDataSet
resource_banks = store.resource("banks")           #TextResource
key_structuretype = dataset.key("structuretype")   #DataKey
data = dataset.annotationdata("Data1")             #AnnotationData

There are also shortcut methods available to get keys and data directly from a store, without needing to first retrieve a dataset yourself:

In [11]:

key_structuretype = store.key("tutorial-set","structuretype")   #DataKey
data = store.annotationdata("tutorial-set","Data1")             #AnnotationData

Annotating via `find_text()`¶

We now continue by adding annotations for the two section headers. Counting offsets manually is rather cumbersome, so we use the find_text() method on TextResource to find our target for annotation:

In [12]:

results = resource_banks.find_text("## 1")
section1 = results[0]
print(f"Text {str(section1)} found at {section1.begin()}:{section1.end()}")

annotation = store.annotate(
    target=stam.Selector.textselector(resource_banks, section1.offset()),
    data={"id": "Data2", "key": "structuretype", "value": "sectionheader", "set": "tutorial-set" },
    id="Annotation2")

Text ## 1 found at 44:48

The find_text() method returns a list of TextSelection instances. These carry an Offset which is returned by the offset() method. Hooray, no more manual counting!

We do the same for the last header:

In [13]:

results = resource_banks.find_text("## 2")
section2 = results[0]
print(f"Text {str(section2)} found at {section2.begin()}:{section2.end()}")

annotation = store.annotate(
    target=stam.Selector.textselector(resource_banks, section2.offset()),
    data={"id": "Data2", "key": "structuretype", "value": "sectionheader", "set": "tutorial-set" },
    id="Annotation3")

Text ## 2 found at 365:369

Inspecting data (2)¶

In the previous code the attentive reader may have noted that we are reusing the Data2 ID rather than introducing a new Data3 ID. We do so because the data for Annotation2 and Annotation3 is, in fact, identical.

This is an important feature of STAM; annotations and their data are decoupled precisely because the data may be referenced by multiple annotations, and if that's the case, we only want to keep the data in memory once. We don't want a copy for every annotation. Say we have AnnotationData with key structuretype and value word, and use that to tag all words in the text, then it would be a huge amount of redundancy if there was no such decoupling between data and annotations. The fact that they all share the same data, also enables us to quickly look up all those annotations via a reverse index that is kept internally:

In [14]:

for annotationdata in store.data(set="tutorial-set", key="structuretype", value="sectionheader"):
    for annotation in annotationdata.annotations():
        assert annotation.id() in ("Annotation2","Annotation3")

This can also be done in one go, which is typically more performant:

In [15]:

for annotation in store.data(set="tutorial-set", key="structuretype", value="sectionheader").annotations():
    assert annotation.id() in ("Annotation2","Annotation3")

Here we used data() on the store as a whole; this method provides an easy way to retrieve data from scratch. We could also have started from an annotation dataset, or even from a key within it, if we already have an instance of it. In that case we use the dataset's data() method and pass the key (DataKey), which acts as a filter:

In [16]:

key = dataset.key("structuretype")
for annotation in dataset.data(key, value="sectionheader").annotations():
    assert annotation.id() in ("Annotation2","Annotation3")

However, since we have the key already it is simpler and more performant to use it directly and reduce the example to the following:

In [17]:

key = dataset.key("structuretype")
for annotation in key.data(value="sectionheader").annotations():
    assert annotation.id() in ("Annotation2","Annotation3")

The ability to use any STAM object as a departing point for retrieval of other objects is a characteristic of the API. The ability to pass arbitrary objects as a filter is also a characteristic that you will find on multiple methods.

The data() method can also be used to search for all values indiscriminately: simply omit the value keyword parameter. Moreover, it can be used to search for non-exact values, using the following keyword arguments:

value_not - Negates a value
value_greater - Value must be greater than specified (int or float)
value_less - Value must be less than specified (int or float)
value_greatereq - Value must be greater than specified or equal (int or float)
value_lesseq - Value must be less than specified or equal (int or float)
value_in - Value must match any in the tuple (this is a logical OR statement)
value_not_in - Value must not match any in the tuple
value_in_range - Must be a numeric 2-tuple with min and max (inclusive) values
value_not_in_range - Must be a numeric 2-tuple with min and max (inclusive) values
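To make the meaning of these keyword arguments concrete, here is a toy predicate in plain Python that mimics the semantics as described in the list above. It is only an illustration and not how the STAM library evaluates filters; the function name value_matches and the choice to combine multiple filters with a logical AND are assumptions of this sketch.

```python
# Each filter name maps to a test on (value, argument), following the list above.
CHECKS = {
    "value": lambda v, arg: v == arg,
    "value_not": lambda v, arg: v != arg,
    "value_greater": lambda v, arg: v > arg,
    "value_less": lambda v, arg: v < arg,
    "value_greatereq": lambda v, arg: v >= arg,
    "value_lesseq": lambda v, arg: v <= arg,
    "value_in": lambda v, arg: v in arg,
    "value_not_in": lambda v, arg: v not in arg,
    "value_in_range": lambda v, arg: arg[0] <= v <= arg[1],       # inclusive bounds
    "value_not_in_range": lambda v, arg: not (arg[0] <= v <= arg[1]),
}


def value_matches(value, **filters):
    """Hypothetical helper: True if value satisfies all given filters."""
    return all(CHECKS[name](value, arg) for name, arg in filters.items())


print(value_matches(8, value_in_range=(1, 8)))                # True, bounds are inclusive
print(value_matches("word", value_in=("quote", "line")))      # False
print(value_matches(8))                                       # True, no filter given
```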

The data() method takes filter parameters as positional arguments, and you can pass as many as you like. The object you pass as a filter determines what is being filtered: you can pass a DataKey instance, an AnnotationData instance, or even an Annotation. You can also pass the result of earlier data or annotation requests (Data, Annotations). If you want to filter against one/any of multiple values, use a tuple or list of any homogeneous type.

Searching for data and then retrieving the corresponding annotations is a very common operation and easily accomplished by simply adding .annotations(), as we've seen in the above examples.

We can apply data filtering operations directly to annotations(), using the same keyword arguments we saw for data(). The following example gives the same results as the earlier one, but the way of getting there is slightly different: it takes all annotations first and tests the data filter on each, whereas the earlier example takes the data first and goes over all annotations that make use of that data:

key = dataset.key("structuretype")
for annotation in store.annotations(key, value="sectionheader"):
    assert annotation.id() in ("Annotation2","Annotation3")

If you're interested in the underlying text selections, then you can just add .textselections(). This chaining of methods on collections is one of the characteristics of the STAM API.
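The decoupling of annotations from their data, and the reverse index it enables, can be illustrated with a toy model in plain Python. This is a simplified sketch for understanding only: the class ToyStore and its methods are hypothetical and do not reflect how the STAM library is implemented.

```python
from collections import defaultdict


class ToyStore:
    """Hypothetical in-memory store in which key/value data is kept only once."""

    def __init__(self):
        self.data_ids = {}                             # (key, value) -> data ID
        self.annotations = {}                          # annotation ID -> (begin, end, data ID)
        self.annotations_by_data = defaultdict(list)   # reverse index: data ID -> annotation IDs

    def annotate(self, annotation_id, begin, end, key, value):
        # Reuse the existing data ID if this key/value pair was seen before
        data_id = self.data_ids.setdefault((key, value), f"Data{len(self.data_ids) + 1}")
        self.annotations[annotation_id] = (begin, end, data_id)
        self.annotations_by_data[data_id].append(annotation_id)
        return data_id

    def annotations_for(self, key, value):
        # A single lookup in the reverse index, no scan over all annotations
        data_id = self.data_ids.get((key, value))
        return list(self.annotations_by_data.get(data_id, []))


toy = ToyStore()
toy.annotate("Annotation1", 1, 19, "structuretype", "titleheader")
toy.annotate("Annotation2", 44, 48, "structuretype", "sectionheader")
toy.annotate("Annotation3", 365, 369, "structuretype", "sectionheader")

shared = toy.annotations_for("structuretype", "sectionheader")
print(shared)               # ['Annotation2', 'Annotation3']
print(len(toy.data_ids))    # 2 distinct data items for 3 annotations
```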

Annotations via text selections¶

Now we will annotate the quotes themselves. The first one starts after the first subheader (Annotation2) and ends just before the next subheader (Annotation3). That would include some ugly leading and trailing whitespace/newlines, though. We use the textselection() method to obtain a textselection to our computed offset and subsequently strip the whitespace using the strip_text() method, effectively shrinking our textselection a bit:

In [18]:

quote1_selection = resource_banks.textselection(stam.Offset.simple(section1.end(), section2.begin() - 1)).strip_text(" \t\r\n")
quote1 = store.annotate(
    target=stam.Selector.textselector(resource_banks, quote1_selection.offset()),
    data={"id": "Data3", "key": "structuretype", "value": "quote", "set": "tutorial-set" },
    id="AnnotationQuote1")
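What shrinking a text selection by stripping means for the offsets can be sketched in plain Python. The helper strip_offsets below is hypothetical and is not the library's strip_text() implementation; it operates on a shortened sample text (an assumption for brevity) and only shows how the begin and end positions move inwards.

```python
def strip_offsets(text, begin, end, chars=" \t\r\n"):
    """Hypothetical helper: shrink begin:end until neither edge is in chars."""
    while begin < end and text[begin] in chars:
        begin += 1
    while end > begin and text[end - 1] in chars:
        end -= 1
    return begin, end


# Shortened sample with the same structure as the tutorial text
sample = "## 1\nEverything about us,\n\n## 2\n"
raw_begin = len("## 1")                  # right after the first subheader
raw_end = sample.index("## 2") - 1       # just before the second subheader

begin, end = strip_offsets(sample, raw_begin, raw_end)
print(f"{raw_begin}:{raw_end} -> {begin}:{end}")   # prints 4:26 -> 5:25
print(repr(sample[begin:end]))
```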

The second quote runs until the end of the text, the position of which we can retrieve using the textlen() method. This method is preferred over doing things in native Python, like len(str(resource_banks)), because it is far more efficient:

In [19]:

quote2_selection = resource_banks.textselection(stam.Offset.simple(section2.end(), resource_banks.textlen())).strip_text(" \t\r\n")
quote2 = store.annotate(
    target=stam.Selector.textselector(resource_banks, quote2_selection.offset()),
    data={"id": "Data3", "set": "tutorial-set"},
    id="AnnotationQuote2")

In this example we also show that, since we reference existing AnnotationData, just specifying the ID and the set suffices. Or even shorter and better, you could pass a variable that is an instance of AnnotationData.

There is another structural type we could annotate: the lines with corresponding line numbers. This is easy to do by splitting the text on newlines, for which we use the method split_text() on TextResource. As you see, various Python methods such as split(), strip(), find() have counterparts in STAM that have a *_text() suffix and which return TextSelection instances and carry offset information:

In [20]:

for linenr, line in enumerate(resource_banks.split_text("\n")):
    linenr += 1      #make it 1-indexed as is customary for line numbers
    print(f"Line {linenr}: {str(line)}")
    store.annotate(
        target=stam.Selector.textselector(resource_banks, line.offset()),
        data=[ 
            {"id": "Data4", "key": "structuretype", "value": "line", "set": "tutorial-set" },
            {"id": f"DataLine{linenr}", "key": "linenr", "value": linenr, "set": "tutorial-set" }
        ],
        id=f"AnnotationLine{linenr}")

Line 1: 
Line 2: # Consider Phlebas
Line 3: $ author=Iain M. Banks
Line 4: 
Line 5: ## 1
Line 6: Everything about us,
Line 7: everything around us,
Line 8: everything we know [and can know of] is composed ultimately of patterns of nothing;
Line 9: that’s the bottom line, the final truth.
Line 10: 
Line 11: So where we find we have any control over those patterns,
Line 12: why not make the most elegant ones, the most enjoyable and good ones,
Line 13: in our own terms?
Line 14: 
Line 15: ## 2
Line 16: Besides,
Line 17: it left the humans in the Culture free to take care of the things that really mattered in life,
Line 18: such as [sports, games, romance,] studying dead languages,
Line 19: barbarian societies and impossible problems,
Line 20: and climbing high mountains without the aid of a safety harness.
Line 21:

In this example we also extended our vocabulary on-the-fly with a new field linenr. All line annotations carry two AnnotationData elements. Remember we can easily retrieve the data and any annotations on it with data() and annotations():

In [21]:

line8 = dataset.data(set="tutorial-set",key="linenr", value=8).annotations(limit=1)[0]
print(str(line8))

everything we know [and can know of] is composed ultimately of patterns of nothing;

Methods that return collections, such as data(), annotations() and textselections(), often take an optional limit parameter (sometimes as a keyword argument, sometimes as a normal parameter). This parameter limits the number of results returned, which can improve performance in certain cases. In the above example we know we're only going to use one result, so it is a good idea to set it. (Here we also happen to know that there is only one result for linenr 8, so strictly speaking the parameter wouldn't be necessary, but we ignore that for the sake of teaching the use of limit.)

When annotating, we don't have to work with the resource as a whole but can also start relative from any text selection we have. Let's take line eight and annotate the first word of it ("everything") manually:

In [22]:

line8_textselection = line8.textselections(limit=1)[0] #there could be multiple, but in our cases thus-far we only have one
firstword = line8_textselection.textselection(stam.Offset.simple(0,10))  #we make a textselection on a textselection

#internally, the text selection will always use absolute coordinates for the resource:
print(f"Text selection spans: {firstword.begin()}:{firstword.end()}")

annotation = store.annotate(
    target=stam.Selector.textselector(resource_banks, firstword.offset()),
    data= {"key": "structuretype", "value": "word", "set": "tutorial-set" },
    id="AnnotationLine8Word1")

Text selection spans: 92:102

Converting offsets¶

We know the first word of line eight is also part of quote one, for which we already made an annotation (AnnotationQuote1) before. Say we are interested in knowing where in quote one the first word of line eight is located. We can now easily compute this as follows:

In [23]:

offset = firstword.relative_offset(quote1_selection)
print(f"Offset in quote one: {offset.begin()}:{offset.end()}")

Offset in quote one: 43:53
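The arithmetic behind this conversion is simple: subtract the begin position of the containing selection from the absolute positions. A minimal sketch in plain Python follows; the helper to_relative is made up for this illustration. The begin position 49 of quote one follows from the two outputs above (92 - 43), while the end position 363 is an assumption that only matters for the bounds check.

```python
def to_relative(inner, container):
    """Hypothetical helper: express the (begin, end) pair inner relative to container."""
    inner_begin, inner_end = inner
    container_begin, container_end = container
    if not (container_begin <= inner_begin and inner_end <= container_end):
        raise ValueError("inner selection is not fully inside the container")
    return inner_begin - container_begin, inner_end - container_begin


firstword_span = (92, 102)   # absolute offset of the first word of line eight
quote1_span = (49, 363)      # absolute offset of quote one (end is assumed)

relative = to_relative(firstword_span, quote1_span)
print(f"{relative[0]}:{relative[1]}")   # prints 43:53
```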

While we are at it, another conversion option that may come in handy when working at a lower level is the conversion from/to UTF-8 byte offsets. Both STAM and Python use Unicode code points. Internally STAM already maps these to UTF-8 byte offsets for things like text slicing, but if you need this information you can extract it explicitly:

In [24]:

beginbyte = resource_banks.utf8byte(firstword.begin())
endbyte = resource_banks.utf8byte(firstword.end())
print(f"Byte offset: {beginbyte}:{endbyte}")

#and back again:
beginpos = resource_banks.utf8byte_to_charpos(beginbyte)
endpos = resource_banks.utf8byte_to_charpos(endbyte)

assert beginpos == firstword.begin()
assert endpos == firstword.end()

Byte offset: 92:102

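In this case they happen to be equal because all text up to this point is plain ASCII; the typographic apostrophe in "that’s" only occurs further on in the text. As soon as multibyte characters (diacritics, other scripts, etc.) precede the position in question, the byte offsets and character offsets will differ!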
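The difference between character positions and UTF-8 byte positions can be demonstrated in plain Python on a line that does contain a multibyte character. The two helper functions below are hypothetical stand-ins written for this illustration; they are not the library methods used above.

```python
def charpos_to_bytepos(text: str, charpos: int) -> int:
    """Number of UTF-8 bytes that precede the given character position."""
    return len(text[:charpos].encode("utf-8"))


def bytepos_to_charpos(text: str, bytepos: int) -> int:
    """Number of characters that precede the given UTF-8 byte position.

    Raises UnicodeDecodeError if bytepos falls in the middle of a character.
    """
    return len(text.encode("utf-8")[:bytepos].decode("utf-8"))


# U+2019 (right single quotation mark) takes three bytes in UTF-8
line = "that\u2019s the bottom line"

s_char = line.index("s")                    # character position of the 's'
s_byte = charpos_to_bytepos(line, s_char)   # byte position of the 's'

print(f"char {s_char}, byte {s_byte}")      # prints char 5, byte 7
assert bytepos_to_charpos(line, s_byte) == s_char
```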

Tokenisation via regular expressions¶

What else can we annotate? We can mark all individual words or tokens, effectively performing simple tokenisation. For this, we will use the regular expression search that is built into the STAM library, find_text_regex(). The regular expressions follow Rust's regular expression syntax which may differ slightly from Python's native implementation.

In [25]:

expressions = [
    r"\w+(?:[-_]\w+)*", #this detects words,possibly with hyphens or underscores as part of it
    r"[\.\?,/]+", #this detects a variety of punctuation
    r"[0-9]+(?:[,\.][0-9]+)*", #this detects numbers, possibly with a fractional part
]
structuretypes = ["word", "punctuation", "number"]

for i, matchresult in enumerate(resource_banks.find_text_regex(expressions)):
    #(we only have one textselection per match, but a regular expression may result in multiple textselections if capture groups are used)
    textselection = matchresult['textselections'][0]
    structuretype = structuretypes[matchresult['expression_index']]
    print(f"Annotating \"{textselection}\" at {textselection.offset()} as {structuretype}")
    store.annotate(
        target=stam.Selector.textselector(resource_banks, textselection.offset()),
        data=[ 
            {"key": "structuretype", "value": structuretype, "set": "tutorial-set" }
        ],
        id=f"AnnotationToken{i+1}")

Annotating "Consider" at 3:11 as word
Annotating "Phlebas" at 12:19 as word
Annotating "author" at 22:28 as word
Annotating "Iain" at 29:33 as word
Annotating "M" at 34:35 as word
Annotating "." at 35:36 as punctuation
Annotating "Banks" at 37:42 as word
Annotating "1" at 47:48 as word
Annotating "Everything" at 49:59 as word
Annotating "about" at 60:65 as word
Annotating "us" at 66:68 as word
Annotating "," at 68:69 as punctuation
Annotating "everything" at 70:80 as word
Annotating "around" at 81:87 as word
Annotating "us" at 88:90 as word
Annotating "," at 90:91 as punctuation
Annotating "everything" at 92:102 as word
Annotating "we" at 103:105 as word
Annotating "know" at 106:110 as word
Annotating "and" at 112:115 as word
Annotating "can" at 116:119 as word
Annotating "know" at 120:124 as word
Annotating "of" at 125:127 as word
Annotating "is" at 129:131 as word
Annotating "composed" at 132:140 as word
Annotating "ultimately" at 141:151 as word
Annotating "of" at 152:154 as word
Annotating "patterns" at 155:163 as word
Annotating "of" at 164:166 as word
Annotating "nothing" at 167:174 as word
Annotating "that" at 176:180 as word
Annotating "s" at 181:182 as word
Annotating "the" at 183:186 as word
Annotating "bottom" at 187:193 as word
Annotating "line" at 194:198 as word
Annotating "," at 198:199 as punctuation
Annotating "the" at 200:203 as word
Annotating "final" at 204:209 as word
Annotating "truth" at 210:215 as word
Annotating "." at 215:216 as punctuation
Annotating "So" at 218:220 as word
Annotating "where" at 221:226 as word
Annotating "we" at 227:229 as word
Annotating "find" at 230:234 as word
Annotating "we" at 235:237 as word
Annotating "have" at 238:242 as word
Annotating "any" at 243:246 as word
Annotating "control" at 247:254 as word
Annotating "over" at 255:259 as word
Annotating "those" at 260:265 as word
Annotating "patterns" at 266:274 as word
Annotating "," at 274:275 as punctuation
Annotating "why" at 276:279 as word
Annotating "not" at 280:283 as word
Annotating "make" at 284:288 as word
Annotating "the" at 289:292 as word
Annotating "most" at 293:297 as word
Annotating "elegant" at 298:305 as word
Annotating "ones" at 306:310 as word
Annotating "," at 310:311 as punctuation
Annotating "the" at 312:315 as word
Annotating "most" at 316:320 as word
Annotating "enjoyable" at 321:330 as word
Annotating "and" at 331:334 as word
Annotating "good" at 335:339 as word
Annotating "ones" at 340:344 as word
Annotating "," at 344:345 as punctuation
Annotating "in" at 346:348 as word
Annotating "our" at 349:352 as word
Annotating "own" at 353:356 as word
Annotating "terms" at 357:362 as word
Annotating "?" at 362:363 as punctuation
Annotating "2" at 368:369 as word
Annotating "Besides" at 370:377 as word
Annotating "," at 377:378 as punctuation
Annotating "it" at 379:381 as word
Annotating "left" at 382:386 as word
Annotating "the" at 387:390 as word
Annotating "humans" at 391:397 as word
Annotating "in" at 398:400 as word
Annotating "the" at 401:404 as word
Annotating "Culture" at 405:412 as word
Annotating "free" at 413:417 as word
Annotating "to" at 418:420 as word
Annotating "take" at 421:425 as word
Annotating "care" at 426:430 as word
Annotating "of" at 431:433 as word
Annotating "the" at 434:437 as word
Annotating "things" at 438:444 as word
Annotating "that" at 445:449 as word
Annotating "really" at 450:456 as word
Annotating "mattered" at 457:465 as word
Annotating "in" at 466:468 as word
Annotating "life" at 469:473 as word
Annotating "," at 473:474 as punctuation
Annotating "such" at 475:479 as word
Annotating "as" at 480:482 as word
Annotating "sports" at 484:490 as word
Annotating "," at 490:491 as punctuation
Annotating "games" at 492:497 as word
Annotating "," at 497:498 as punctuation
Annotating "romance" at 499:506 as word
Annotating "," at 506:507 as punctuation
Annotating "studying" at 509:517 as word
Annotating "dead" at 518:522 as word
Annotating "languages" at 523:532 as word
Annotating "," at 532:533 as punctuation
Annotating "barbarian" at 534:543 as word
Annotating "societies" at 544:553 as word
Annotating "and" at 554:557 as word
Annotating "impossible" at 558:568 as word
Annotating "problems" at 569:577 as word
Annotating "," at 577:578 as punctuation
Annotating "and" at 579:582 as word
Annotating "climbing" at 583:591 as word
Annotating "high" at 592:596 as word
Annotating "mountains" at 597:606 as word
Annotating "without" at 607:614 as word
Annotating "the" at 615:618 as word
Annotating "aid" at 619:622 as word
Annotating "of" at 623:625 as word
Annotating "a" at 626:627 as word
Annotating "safety" at 628:634 as word
Annotating "harness" at 635:642 as word
Annotating "." at 642:643 as punctuation

In this code, each matchresult tracks which of the three expressions was matched, in matchresult['expression_index']. We conveniently use that information to pick the corresponding value for structuretype; new values are added to our vocabulary (AnnotationDataSet) on-the-fly. Note that the section numbers 1 and 2 come out as word rather than number in the output above, as \w also matches digits.
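For comparison, a rough analogue of this tokenisation can be written with Python's standard re module. This is not how find_text_regex() works internally: combining the three expressions into one alternation with named groups is a choice made for this sketch, and the sample sentence is made up. It does show the same two ideas, namely keeping offsets for every match and recording which expression matched.

```python
import re

# The three expressions from the example above, combined as named alternatives.
# Alternatives are tried left to right, so the word expression wins over number.
TOKEN_PATTERN = re.compile(
    r"(?P<word>\w+(?:[-_]\w+)*)"
    r"|(?P<punctuation>[\.\?,/]+)"
    r"|(?P<number>[0-9]+(?:[,\.][0-9]+)*)"
)


def tokenise(text):
    """Return (token, begin, end, structuretype) tuples for all matches."""
    return [
        (m.group(), m.start(), m.end(), m.lastgroup)
        for m in TOKEN_PATTERN.finditer(text)
    ]


tokens = tokenise("Section 2, dead languages.")
for token, begin, end, structuretype in tokens:
    print(f'"{token}" at {begin}:{end} as {structuretype}')
```

In this sketch the digit 2 is reported as a word too, because \w matches digits and the word alternative is tried first. Placing the number expression before the word expression would change that.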

Annotating Metadata¶

Thus far we have only seen annotations directly on the text, using Selector.textselector(), but STAM has various other selectors. Users may appreciate it if you add a bit of metadata about your texts. In STAM, these are annotations that point at the resource as a whole using Selector.resourceselector(), rather than at the text specifically. We add one metadata annotation with various new fields:

In [26]:

annotation = store.annotate(
    target=stam.Selector.resourceselector(resource_banks),
    data=[ 
        {"key": "name", "value": "Culture quotes from Iain Banks", "set": "tutorial-set" },
        {"key": "compiler", "value": "Dirk Roorda", "set": "tutorial-set" },
        {"key": "source", "value": "https://www.goodreads.com/work/quotes/14366-consider-phlebas", "set": "tutorial-set" },
        {"key": "version", "value": "0.2", "set": "tutorial-set" },
    ],
    id="Metadata1")

Similarly, we could annotate an AnnotationDataSet (our vocabulary) with metadata, using a Selector.datasetselector().

Navigating through your data¶

Basic iterating and counting¶

If you followed all of the previous section, we now have a fair number of annotations. In fact, we have:

In [27]:

print(f"{store.annotations_len()} annotations")
print(f"{store.resources_len()} resource")
print(f"{store.datasets_len()} annotation dataset")
print(f"{dataset.keys_len()} datakeys in our dataset")
print(f"{dataset.data_len()} annotationdata instances in our dataset")

153 annotations
1 resource
1 annotation dataset
6 datakeys in our dataset
31 annotationdata instances in our dataset

If we zoom in on the annotation data in our annotation dataset, we can extract some interesting frequency statistics right away:

In [28]:

for data in dataset:
    count = data.annotations_len()
    print(f"{data.key()}: {data.value()} occurs in {count} annotation(s)")

structuretype: titleheader occurs in 1 annotation(s)
structuretype: sectionheader occurs in 2 annotation(s)
structuretype: quote occurs in 2 annotation(s)
structuretype: line occurs in 21 annotation(s)
linenr: 1 occurs in 1 annotation(s)
linenr: 2 occurs in 1 annotation(s)
linenr: 3 occurs in 1 annotation(s)
linenr: 4 occurs in 1 annotation(s)
linenr: 5 occurs in 1 annotation(s)
linenr: 6 occurs in 1 annotation(s)
linenr: 7 occurs in 1 annotation(s)
linenr: 8 occurs in 1 annotation(s)
linenr: 9 occurs in 1 annotation(s)
linenr: 10 occurs in 1 annotation(s)
linenr: 11 occurs in 1 annotation(s)
linenr: 12 occurs in 1 annotation(s)
linenr: 13 occurs in 1 annotation(s)
linenr: 14 occurs in 1 annotation(s)
linenr: 15 occurs in 1 annotation(s)
linenr: 16 occurs in 1 annotation(s)
linenr: 17 occurs in 1 annotation(s)
linenr: 18 occurs in 1 annotation(s)
linenr: 19 occurs in 1 annotation(s)
linenr: 20 occurs in 1 annotation(s)
linenr: 21 occurs in 1 annotation(s)
structuretype: word occurs in 109 annotation(s)
structuretype: punctuation occurs in 17 annotation(s)
name: Culture quotes from Iain Banks occurs in 1 annotation(s)
compiler: Dirk Roorda occurs in 1 annotation(s)
source: https://www.goodreads.com/work/quotes/14366-consider-phlebas occurs in 1 annotation(s)
version: 0.2 occurs in 1 annotation(s)

We can also aggregate only by key, although that is slightly less informative for our example case:

In [29]:

for key in dataset.keys():
    count = key.annotations_count()   #this one is called _count instead of _len because it is not instantaneous like the other one
    print(f"{key} occurs in {count} annotation(s)")

structuretype occurs in 152 annotation(s)
linenr occurs in 21 annotation(s)
name occurs in 1 annotation(s)
compiler occurs in 1 annotation(s)
source occurs in 1 annotation(s)
version occurs in 1 annotation(s)
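
In this example the per-key numbers equal the sum of the per-value counts from the earlier listing, for instance 1 + 2 + 2 + 21 + 109 + 17 = 152 for `structuretype`. The sketch below reproduces that roll-up in plain Python. It is an illustration rather than the library's method: `totals_per_key` is a hypothetical helper, the rows are copied from the earlier output, and the equality assumes that no annotation carries two data items with the same key, which appears to hold here.

```python
from collections import Counter

def totals_per_key(rows):
    """Sum the per-value annotation counts for each key."""
    totals = Counter()
    for key, _value, count in rows:
        totals[key] += count
    return totals

# (key, value, count) rows copied from the per-data listing earlier in this section
rows = [
    ("structuretype", "titleheader", 1),
    ("structuretype", "sectionheader", 2),
    ("structuretype", "quote", 2),
    ("structuretype", "line", 21),
]
# the 21 linenr values each occur in exactly one annotation
rows += [("linenr", n, 1) for n in range(1, 22)]
rows += [
    ("structuretype", "word", 109),
    ("structuretype", "punctuation", 17),
    ("name", "Culture quotes from Iain Banks", 1),
    ("compiler", "Dirk Roorda", 1),
    ("source", "https://www.goodreads.com/work/quotes/14366-consider-phlebas", 1),
    ("version", "0.2", 1),
]

totals = totals_per_key(rows)
for key, count in totals.items():
    print(f"{key} occurs in {count} annotation(s)")
```

If an annotation did hold several values for the same key, summing per-value counts would count it more than once, so the two numbers would differ.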

Just as we iterated over the annotation dataset above, we can iterate over the other collections in the AnnotationStore: its datasets, resources and annotations. Let's write a small script that prints most of what is in our store. Be warned that the output gets rather verbose at this point:

In [30]:

print("Datasets:")
for dataset in store.datasets():
    print(f" - ID: {dataset.id()}")

print("Resources:")
for resource in store.resources():
    print(f" - ID: {resource.id()}")
    print(f" - Text length: {resource.textlen()}")

print("Annotations:")
for annotation in store.annotations():
    print(f" - ID: {annotation.id()}")
    print(f"   Target selector type: {annotation.selector_kind()}")
    print(f"   Target resources: {annotation.resources()}")
    print(f"   Target offset: {annotation.offset()}")
    print(f"   Target text: {annotation.text()}")
    print("   Target annotations: ", [ a.id() for a in annotation.annotations_in_targets() ])
    print(f"   Data:")
    for data in annotation:
        print(f"    - ID:  {data.id()}")
        print(f"      Set: {data.dataset().id()}")
        print(f"      Key: {data.key()}")
        print(f"      Value: {data.value()}")

Datasets:
 - ID: tutorial-set
Resources:
 - ID: banks
 - Text length: 644
Annotations:
 - ID: Annotation1
   Target selector type: <stam.SelectorKind object at 0x71a048f1a040>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 1:19
   Target text: ['# Consider Phlebas']
   Target annotations:  []
   Data:
    - ID:  Data1
      Set: tutorial-set
      Key: structuretype
      Value: titleheader
 - ID: Annotation2
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 44:48
   Target text: ['## 1']
   Target annotations:  []
   Data:
    - ID:  Data2
      Set: tutorial-set
      Key: structuretype
      Value: sectionheader
 - ID: Annotation3
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 365:369
   Target text: ['## 2']
   Target annotations:  []
   Data:
    - ID:  Data2
      Set: tutorial-set
      Key: structuretype
      Value: sectionheader
 - ID: AnnotationQuote1
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 49:363
   Target text: ['Everything about us,\neverything around us,\neverything we know [and can know of] is composed ultimately of patterns of nothing;\nthat’s the bottom line, the final truth.\n\nSo where we find we have any control over those patterns,\nwhy not make the most elegant ones, the most enjoyable and good ones,\nin our own terms?']
   Target annotations:  []
   Data:
    - ID:  Data3
      Set: tutorial-set
      Key: structuretype
      Value: quote
 - ID: AnnotationQuote2
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 370:643
   Target text: ['Besides,\nit left the humans in the Culture free to take care of the things that really mattered in life,\nsuch as [sports, games, romance,] studying dead languages,\nbarbarian societies and impossible problems,\nand climbing high mountains without the aid of a safety harness.']
   Target annotations:  []
   Data:
    - ID:  Data3
      Set: tutorial-set
      Key: structuretype
      Value: quote
 - ID: AnnotationLine1
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 0:0
   Target text: ['']
   Target annotations:  []
   Data:
    - ID:  Data4
      Set: tutorial-set
      Key: structuretype
      Value: line
    - ID:  DataLine1
      Set: tutorial-set
      Key: linenr
      Value: 1
 - ID: AnnotationLine2
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 1:19
   Target text: ['# Consider Phlebas']
   Target annotations:  []
   Data:
    - ID:  Data4
      Set: tutorial-set
      Key: structuretype
      Value: line
    - ID:  DataLine2
      Set: tutorial-set
      Key: linenr
      Value: 2
 - ID: AnnotationLine3
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 20:42
   Target text: ['$ author=Iain M. Banks']
   Target annotations:  []
   Data:
    - ID:  Data4
      Set: tutorial-set
      Key: structuretype
      Value: line
    - ID:  DataLine3
      Set: tutorial-set
      Key: linenr
      Value: 3
 - ID: AnnotationLine4
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 43:43
   Target text: ['']
   Target annotations:  []
   Data:
    - ID:  Data4
      Set: tutorial-set
      Key: structuretype
      Value: line
    - ID:  DataLine4
      Set: tutorial-set
      Key: linenr
      Value: 4
 - ID: AnnotationLine5
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 44:48
   Target text: ['## 1']
   Target annotations:  []
   Data:
    - ID:  Data4
      Set: tutorial-set
      Key: structuretype
      Value: line
    - ID:  DataLine5
      Set: tutorial-set
      Key: linenr
      Value: 5
 - ID: AnnotationLine6
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 49:69
   Target text: ['Everything about us,']
   Target annotations:  []
   Data:
    - ID:  Data4
      Set: tutorial-set
      Key: structuretype
      Value: line
    - ID:  DataLine6
      Set: tutorial-set
      Key: linenr
      Value: 6
 - ID: AnnotationLine7
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 70:91
   Target text: ['everything around us,']
   Target annotations:  []
   Data:
    - ID:  Data4
      Set: tutorial-set
      Key: structuretype
      Value: line
    - ID:  DataLine7
      Set: tutorial-set
      Key: linenr
      Value: 7
 - ID: AnnotationLine8
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 92:175
   Target text: ['everything we know [and can know of] is composed ultimately of patterns of nothing;']
   Target annotations:  []
   Data:
    - ID:  Data4
      Set: tutorial-set
      Key: structuretype
      Value: line
    - ID:  DataLine8
      Set: tutorial-set
      Key: linenr
      Value: 8
 - ID: AnnotationLine9
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 176:216
   Target text: ['that’s the bottom line, the final truth.']
   Target annotations:  []
   Data:
    - ID:  Data4
      Set: tutorial-set
      Key: structuretype
      Value: line
    - ID:  DataLine9
      Set: tutorial-set
      Key: linenr
      Value: 9
 - ID: AnnotationLine10
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 217:217
   Target text: ['']
   Target annotations:  []
   Data:
    - ID:  Data4
      Set: tutorial-set
      Key: structuretype
      Value: line
    - ID:  DataLine10
      Set: tutorial-set
      Key: linenr
      Value: 10
 - ID: AnnotationLine11
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 218:275
   Target text: ['So where we find we have any control over those patterns,']
   Target annotations:  []
   Data:
    - ID:  Data4
      Set: tutorial-set
      Key: structuretype
      Value: line
    - ID:  DataLine11
      Set: tutorial-set
      Key: linenr
      Value: 11
 - ID: AnnotationLine12
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 276:345
   Target text: ['why not make the most elegant ones, the most enjoyable and good ones,']
   Target annotations:  []
   Data:
    - ID:  Data4
      Set: tutorial-set
      Key: structuretype
      Value: line
    - ID:  DataLine12
      Set: tutorial-set
      Key: linenr
      Value: 12
 - ID: AnnotationLine13
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 346:363
   Target text: ['in our own terms?']
   Target annotations:  []
   Data:
    - ID:  Data4
      Set: tutorial-set
      Key: structuretype
      Value: line
    - ID:  DataLine13
      Set: tutorial-set
      Key: linenr
      Value: 13
 - ID: AnnotationLine14
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 364:364
   Target text: ['']
   Target annotations:  []
   Data:
    - ID:  Data4
      Set: tutorial-set
      Key: structuretype
      Value: line
    - ID:  DataLine14
      Set: tutorial-set
      Key: linenr
      Value: 14
 - ID: AnnotationLine15
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 365:369
   Target text: ['## 2']
   Target annotations:  []
   Data:
    - ID:  Data4
      Set: tutorial-set
      Key: structuretype
      Value: line
    - ID:  DataLine15
      Set: tutorial-set
      Key: linenr
      Value: 15
 - ID: AnnotationLine16
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 370:378
   Target text: ['Besides,']
   Target annotations:  []
   Data:
    - ID:  Data4
      Set: tutorial-set
      Key: structuretype
      Value: line
    - ID:  DataLine16
      Set: tutorial-set
      Key: linenr
      Value: 16
 - ID: AnnotationLine17
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 379:474
   Target text: ['it left the humans in the Culture free to take care of the things that really mattered in life,']
   Target annotations:  []
   Data:
    - ID:  Data4
      Set: tutorial-set
      Key: structuretype
      Value: line
    - ID:  DataLine17
      Set: tutorial-set
      Key: linenr
      Value: 17
 - ID: AnnotationLine18
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 475:533
   Target text: ['such as [sports, games, romance,] studying dead languages,']
   Target annotations:  []
   Data:
    - ID:  Data4
      Set: tutorial-set
      Key: structuretype
      Value: line
    - ID:  DataLine18
      Set: tutorial-set
      Key: linenr
      Value: 18
 - ID: AnnotationLine19
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 534:578
   Target text: ['barbarian societies and impossible problems,']
   Target annotations:  []
   Data:
    - ID:  Data4
      Set: tutorial-set
      Key: structuretype
      Value: line
    - ID:  DataLine19
      Set: tutorial-set
      Key: linenr
      Value: 19
 - ID: AnnotationLine20
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 579:643
   Target text: ['and climbing high mountains without the aid of a safety harness.']
   Target annotations:  []
   Data:
    - ID:  Data4
      Set: tutorial-set
      Key: structuretype
      Value: line
    - ID:  DataLine20
      Set: tutorial-set
      Key: linenr
      Value: 20
 - ID: AnnotationLine21
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 644:644
   Target text: ['']
   Target annotations:  []
   Data:
    - ID:  Data4
      Set: tutorial-set
      Key: structuretype
      Value: line
    - ID:  DataLine21
      Set: tutorial-set
      Key: linenr
      Value: 21
 - ID: AnnotationLine8Word1
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 92:102
   Target text: ['everything']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken1
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 3:11
   Target text: ['Consider']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken2
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 12:19
   Target text: ['Phlebas']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken3
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 22:28
   Target text: ['author']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken4
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 29:33
   Target text: ['Iain']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken5
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 34:35
   Target text: ['M']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken6
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 35:36
   Target text: ['.']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: punctuation
 - ID: AnnotationToken7
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 37:42
   Target text: ['Banks']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken8
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 47:48
   Target text: ['1']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken9
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 49:59
   Target text: ['Everything']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken10
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 60:65
   Target text: ['about']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken11
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 66:68
   Target text: ['us']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken12
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 68:69
   Target text: [',']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: punctuation
 - ID: AnnotationToken13
   Target selector type: <stam.SelectorKind object at 0x71a048f1a040>
   Target resources: [<stam.TextResource object at 0x71a048f1a040>]
   Target offset: 70:80
   Target text: ['everything']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken14
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 81:87
   Target text: ['around']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken15
   Target selector type: <stam.SelectorKind object at 0x71a048f1a040>
   Target resources: [<stam.TextResource object at 0x71a048f1a040>]
   Target offset: 88:90
   Target text: ['us']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken16
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 90:91
   Target text: [',']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: punctuation
 - ID: AnnotationToken17
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 92:102
   Target text: ['everything']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken18
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 103:105
   Target text: ['we']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken19
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 106:110
   Target text: ['know']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken20
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 112:115
   Target text: ['and']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken21
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 116:119
   Target text: ['can']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken22
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 120:124
   Target text: ['know']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken23
   Target selector type: <stam.SelectorKind object at 0x71a048f1b5a0>
   Target resources: [<stam.TextResource object at 0x71a048f1b5a0>]
   Target offset: 125:127
   Target text: ['of']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken24
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 129:131
   Target text: ['is']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken25
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 132:140
   Target text: ['composed']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken26
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 141:151
   Target text: ['ultimately']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken27
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 152:154
   Target text: ['of']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken28
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 155:163
   Target text: ['patterns']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken29
   Target selector type: <stam.SelectorKind object at 0x71a048f1b5a0>
   Target resources: [<stam.TextResource object at 0x71a048f1b5a0>]
   Target offset: 164:166
   Target text: ['of']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken30
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 167:174
   Target text: ['nothing']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken31
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 176:180
   Target text: ['that']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken32
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 181:182
   Target text: ['s']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken33
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 183:186
   Target text: ['the']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken34
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 187:193
   Target text: ['bottom']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken35
   Target selector type: <stam.SelectorKind object at 0x71a048f19230>
   Target resources: [<stam.TextResource object at 0x71a048f19230>]
   Target offset: 194:198
   Target text: ['line']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken36
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 198:199
   Target text: [',']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: punctuation
 - ID: AnnotationToken37
   Target selector type: <stam.SelectorKind object at 0x71a048f19230>
   Target resources: [<stam.TextResource object at 0x71a048f19230>]
   Target offset: 200:203
   Target text: ['the']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken38
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 204:209
   Target text: ['final']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken39
   Target selector type: <stam.SelectorKind object at 0x71a048f1b5a0>
   Target resources: [<stam.TextResource object at 0x71a048f1b5a0>]
   Target offset: 210:215
   Target text: ['truth']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken40
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 215:216
   Target text: ['.']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: punctuation
 - ID: AnnotationToken41
   Target selector type: <stam.SelectorKind object at 0x71a048f1b5a0>
   Target resources: [<stam.TextResource object at 0x71a048f1b5a0>]
   Target offset: 218:220
   Target text: ['So']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken42
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 221:226
   Target text: ['where']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken43
   Target selector type: <stam.SelectorKind object at 0x71a048f1a040>
   Target resources: [<stam.TextResource object at 0x71a048f1a040>]
   Target offset: 227:229
   Target text: ['we']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken44
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 230:234
   Target text: ['find']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken45
   Target selector type: <stam.SelectorKind object at 0x71a048f1b5a0>
   Target resources: [<stam.TextResource object at 0x71a048f1b5a0>]
   Target offset: 235:237
   Target text: ['we']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken46
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 238:242
   Target text: ['have']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken47
   Target selector type: <stam.SelectorKind object at 0x71a048f1a040>
   Target resources: [<stam.TextResource object at 0x71a048f1a040>]
   Target offset: 243:246
   Target text: ['any']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken48
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 247:254
   Target text: ['control']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken49
   Target selector type: <stam.SelectorKind object at 0x71a048f1b5a0>
   Target resources: [<stam.TextResource object at 0x71a048f1b5a0>]
   Target offset: 255:259
   Target text: ['over']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken50
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 260:265
   Target text: ['those']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken51
   Target selector type: <stam.SelectorKind object at 0x71a048f1b5a0>
   Target resources: [<stam.TextResource object at 0x71a048f1b5a0>]
   Target offset: 266:274
   Target text: ['patterns']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken52
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 274:275
   Target text: [',']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: punctuation
 - ID: AnnotationToken53
   Target selector type: <stam.SelectorKind object at 0x71a048f1b5a0>
   Target resources: [<stam.TextResource object at 0x71a048f1b5a0>]
   Target offset: 276:279
   Target text: ['why']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken54
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 280:283
   Target text: ['not']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken55
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 284:288
   Target text: ['make']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken56
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 289:292
   Target text: ['the']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken57
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 293:297
   Target text: ['most']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken58
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 298:305
   Target text: ['elegant']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken59
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 306:310
   Target text: ['ones']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken60
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 310:311
   Target text: [',']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: punctuation
 - ID: AnnotationToken61
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 312:315
   Target text: ['the']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken62
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 316:320
   Target text: ['most']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken63
   Target selector type: <stam.SelectorKind object at 0x71a048f19230>
   Target resources: [<stam.TextResource object at 0x71a048f19230>]
   Target offset: 321:330
   Target text: ['enjoyable']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken64
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 331:334
   Target text: ['and']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken65
   Target selector type: <stam.SelectorKind object at 0x71a048f19230>
   Target resources: [<stam.TextResource object at 0x71a048f19230>]
   Target offset: 335:339
   Target text: ['good']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken66
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 340:344
   Target text: ['ones']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken67
   Target selector type: <stam.SelectorKind object at 0x71a048f19230>
   Target resources: [<stam.TextResource object at 0x71a048f19230>]
   Target offset: 344:345
   Target text: [',']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: punctuation
 - ID: AnnotationToken68
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 346:348
   Target text: ['in']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken69
   Target selector type: <stam.SelectorKind object at 0x71a048f19230>
   Target resources: [<stam.TextResource object at 0x71a048f19230>]
   Target offset: 349:352
   Target text: ['our']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken70
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 353:356
   Target text: ['own']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken71
   Target selector type: <stam.SelectorKind object at 0x71a048f1a040>
   Target resources: [<stam.TextResource object at 0x71a048f1a040>]
   Target offset: 357:362
   Target text: ['terms']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken72
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 362:363
   Target text: ['?']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: punctuation
 - ID: AnnotationToken73
   Target selector type: <stam.SelectorKind object at 0x71a048f1a040>
   Target resources: [<stam.TextResource object at 0x71a048f1a040>]
   Target offset: 368:369
   Target text: ['2']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken74
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 370:377
   Target text: ['Besides']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken75
   Target selector type: <stam.SelectorKind object at 0x71a048f1a040>
   Target resources: [<stam.TextResource object at 0x71a048f1a040>]
   Target offset: 377:378
   Target text: [',']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: punctuation
 - ID: AnnotationToken76
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 379:381
   Target text: ['it']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken77
   Target selector type: <stam.SelectorKind object at 0x71a048f1a040>
   Target resources: [<stam.TextResource object at 0x71a048f1a040>]
   Target offset: 382:386
   Target text: ['left']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken78
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 387:390
   Target text: ['the']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken79
   Target selector type: <stam.SelectorKind object at 0x71a048f1a040>
   Target resources: [<stam.TextResource object at 0x71a048f1a040>]
   Target offset: 391:397
   Target text: ['humans']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken80
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 398:400
   Target text: ['in']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken81
   Target selector type: <stam.SelectorKind object at 0x71a048f1a040>
   Target resources: [<stam.TextResource object at 0x71a048f1a040>]
   Target offset: 401:404
   Target text: ['the']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken82
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 405:412
   Target text: ['Culture']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken83
   Target selector type: <stam.SelectorKind object at 0x71a048f19230>
   Target resources: [<stam.TextResource object at 0x71a048f19230>]
   Target offset: 413:417
   Target text: ['free']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken84
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 418:420
   Target text: ['to']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken85
   Target selector type: <stam.SelectorKind object at 0x71a048f19230>
   Target resources: [<stam.TextResource object at 0x71a048f19230>]
   Target offset: 421:425
   Target text: ['take']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken86
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 426:430
   Target text: ['care']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken87
   Target selector type: <stam.SelectorKind object at 0x71a048f19230>
   Target resources: [<stam.TextResource object at 0x71a048f19230>]
   Target offset: 431:433
   Target text: ['of']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken88
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 434:437
   Target text: ['the']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken89
   Target selector type: <stam.SelectorKind object at 0x71a048f1a040>
   Target resources: [<stam.TextResource object at 0x71a048f1a040>]
   Target offset: 438:444
   Target text: ['things']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken90
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 445:449
   Target text: ['that']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken91
   Target selector type: <stam.SelectorKind object at 0x71a048f1a040>
   Target resources: [<stam.TextResource object at 0x71a048f1a040>]
   Target offset: 450:456
   Target text: ['really']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken92
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 457:465
   Target text: ['mattered']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken93
   Target selector type: <stam.SelectorKind object at 0x71a048f1a040>
   Target resources: [<stam.TextResource object at 0x71a048f1a040>]
   Target offset: 466:468
   Target text: ['in']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken94
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 469:473
   Target text: ['life']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken95
   Target selector type: <stam.SelectorKind object at 0x71a048f19230>
   Target resources: [<stam.TextResource object at 0x71a048f19230>]
   Target offset: 473:474
   Target text: [',']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: punctuation
 - ID: AnnotationToken96
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 475:479
   Target text: ['such']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken97
   Target selector type: <stam.SelectorKind object at 0x71a048f19230>
   Target resources: [<stam.TextResource object at 0x71a048f19230>]
   Target offset: 480:482
   Target text: ['as']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken98
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 484:490
   Target text: ['sports']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken99
   Target selector type: <stam.SelectorKind object at 0x71a048f1a040>
   Target resources: [<stam.TextResource object at 0x71a048f1a040>]
   Target offset: 490:491
   Target text: [',']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: punctuation
 - ID: AnnotationToken100
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 492:497
   Target text: ['games']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken101
   Target selector type: <stam.SelectorKind object at 0x71a048f1a040>
   Target resources: [<stam.TextResource object at 0x71a048f1a040>]
   Target offset: 497:498
   Target text: [',']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: punctuation
 - ID: AnnotationToken102
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 499:506
   Target text: ['romance']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken103
   Target selector type: <stam.SelectorKind object at 0x71a048f1a040>
   Target resources: [<stam.TextResource object at 0x71a048f1a040>]
   Target offset: 506:507
   Target text: [',']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: punctuation
 - ID: AnnotationToken104
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 509:517
   Target text: ['studying']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken105
   Target selector type: <stam.SelectorKind object at 0x71a048f1a040>
   Target resources: [<stam.TextResource object at 0x71a048f1a040>]
   Target offset: 518:522
   Target text: ['dead']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken106
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 523:532
   Target text: ['languages']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken107
   Target selector type: <stam.SelectorKind object at 0x71a048f1a040>
   Target resources: [<stam.TextResource object at 0x71a048f1a040>]
   Target offset: 532:533
   Target text: [',']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: punctuation
 - ID: AnnotationToken108
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 534:543
   Target text: ['barbarian']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken109
   Target selector type: <stam.SelectorKind object at 0x71a048f1a040>
   Target resources: [<stam.TextResource object at 0x71a048f1a040>]
   Target offset: 544:553
   Target text: ['societies']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken110
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 554:557
   Target text: ['and']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken111
   Target selector type: <stam.SelectorKind object at 0x71a048f1a040>
   Target resources: [<stam.TextResource object at 0x71a048f1a040>]
   Target offset: 558:568
   Target text: ['impossible']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken112
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 569:577
   Target text: ['problems']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken113
   Target selector type: <stam.SelectorKind object at 0x71a048f1a040>
   Target resources: [<stam.TextResource object at 0x71a048f1a040>]
   Target offset: 577:578
   Target text: [',']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: punctuation
 - ID: AnnotationToken114
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 579:582
   Target text: ['and']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken115
   Target selector type: <stam.SelectorKind object at 0x71a048f1a040>
   Target resources: [<stam.TextResource object at 0x71a048f1a040>]
   Target offset: 583:591
   Target text: ['climbing']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken116
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 592:596
   Target text: ['high']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken117
   Target selector type: <stam.SelectorKind object at 0x71a048f1a040>
   Target resources: [<stam.TextResource object at 0x71a048f1a040>]
   Target offset: 597:606
   Target text: ['mountains']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken118
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 607:614
   Target text: ['without']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken119
   Target selector type: <stam.SelectorKind object at 0x71a048f1a040>
   Target resources: [<stam.TextResource object at 0x71a048f1a040>]
   Target offset: 615:618
   Target text: ['the']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken120
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 619:622
   Target text: ['aid']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken121
   Target selector type: <stam.SelectorKind object at 0x71a048f1b300>
   Target resources: [<stam.TextResource object at 0x71a048f1b300>]
   Target offset: 623:625
   Target text: ['of']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken122
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 626:627
   Target text: ['a']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken123
   Target selector type: <stam.SelectorKind object at 0x71a048f1b5a0>
   Target resources: [<stam.TextResource object at 0x71a048f1b5a0>]
   Target offset: 628:634
   Target text: ['safety']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken124
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: [<stam.TextResource object at 0x71a04998e070>]
   Target offset: 635:642
   Target text: ['harness']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: word
 - ID: AnnotationToken125
   Target selector type: <stam.SelectorKind object at 0x71a048f1b5a0>
   Target resources: [<stam.TextResource object at 0x71a048f1b5a0>]
   Target offset: 642:643
   Target text: ['.']
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: structuretype
      Value: punctuation
 - ID: Metadata1
   Target selector type: <stam.SelectorKind object at 0x71a04998e070>
   Target resources: []
   Target offset: None
   Target text: []
   Target annotations:  []
   Data:
    - ID:  None
      Set: tutorial-set
      Key: name
      Value: Culture quotes from Iain Banks
    - ID:  None
      Set: tutorial-set
      Key: compiler
      Value: Dirk Roorda
    - ID:  None
      Set: tutorial-set
      Key: source
      Value: https://www.goodreads.com/work/quotes/14366-consider-phlebas
    - ID:  None
      Set: tutorial-set
      Key: version
      Value: 0.2

Finding data¶

We already introduced the methods annotations(), data() and textselections() in previous sections. They return collections (Annotations, Data or TextSelections), which in turn contain instances of Annotation, AnnotationData and TextSelection, respectively.

Internally the STAM library maintains various forward and reverse indices, representing relationships between all kinds of entities in the STAM model. The aforementioned methods operate via these indices.
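
To make the idea of forward and reverse indices concrete, here is a minimal pure-Python sketch. It is not how the STAM library is implemented internally; the dictionaries `forward` and `reverse` are hypothetical stand-ins, filled with two annotation IDs and offsets taken from the output shown earlier.

```python
from collections import defaultdict

# Forward index: annotation ID -> (begin, end) offset of the text it targets.
# Both entries are copied from the output earlier in this tutorial.
forward = {
    "AnnotationToken28": (155, 163),  # 'patterns'
    "AnnotationToken53": (276, 279),  # 'why'
}

# Reverse index: offset -> IDs of all annotations that target that offset.
reverse = defaultdict(list)
for annotation_id, offset in forward.items():
    reverse[offset].append(annotation_id)

# Forward lookup: from an annotation to the text offset it targets.
print(forward["AnnotationToken28"])

# Reverse lookup: from a text offset back to the annotations that reference it.
print(reverse[(155, 163)])
```

The reverse lookup is what makes it cheap to go from a piece of text to its annotations without scanning every annotation.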

The annotations() method is often a lookup via the reverse index. We have already seen some examples of it. Another nice example of the reverse index is that it lets us obtain the annotations for any arbitrary selection of the text:

In [31]:

textselection = resource_banks.textselection(stam.Offset.simple(155,163))
for annotation in textselection.annotations():
    print(f" - ID: {annotation.id()}")
    print(f"   Text: {str(annotation)}")
    print(f"   Data:")
    for data in annotation:
        print(f"      {data.key()}={data.value()}")

 - ID: AnnotationToken28
   Text: patterns
   Data:
      structuretype=word

Of course, I cheated a bit here and knew in advance there was going to be a match for this offset, but the point to take home is that given any textselection, you can easily get annotations that reference it.

In the above example we iterate over all annotations and then over all the data pertaining to the found annotations. Often though, you are searching for specific data and would have some kind of extra test in there. This is accomplished by passing filters to the annotations() method, either as positional arguments or as keyword arguments like value. We have seen an example of this before; here is another:

In [32]:

textselection = resource_banks.textselection(stam.Offset.simple(155,163))
dataset = store.dataset("tutorial-set")
key = dataset.key("structuretype")
for annotation in textselection.annotations(key, value="word"):
    print(f" - ID: {annotation.id()}")
    print(f"   Text: {str(annotation)}")

 - ID: AnnotationToken28
   Text: patterns

The use of filters in methods like annotations() and data() is always preferable to manually writing it out in lower-level code, because the internal library is more performant and passing data back and forth to Python always comes with a performance penalty.

In the example above, however, we see that we filter on data, but do not actually get the data that was matched as a return value. If you do want that, you need a two-step process as follows:

In [33]:

textselection = resource_banks.textselection(stam.Offset.simple(155,163))
dataset = store.dataset("tutorial-set")
key = dataset.key("structuretype")
for annotation in textselection.annotations(key, value="word"):
    print(f" - ID: {annotation.id()}")
    print(f"   Text: {str(annotation)}")
    annotationdata = annotation.data(key, value="word",limit=1)[0]
    print(f"   Data: {str(annotationdata)}")

 - ID: AnnotationToken28
   Text: patterns
   Data: word

Sometimes you don't really care to retrieve the data or the annotations, but merely want to test whether certain data is present on an annotation. For this you can use methods like test_annotations() and test_data(), which take the same keyword parameters for filtering as their counterparts annotations() and data(). Instead of returning a collection they simply return a boolean, which is more performant.

The following example confirms that the text selection is indeed a word:

In [34]:

textselection = resource_banks.textselection(stam.Offset.simple(155,163))
dataset = store.dataset("tutorial-set")
key = dataset.key("structuretype")
assert textselection.test_data(key, value="word")
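
The difference between retrieving matches and merely testing for one can be illustrated in plain Python. This is only an analogy under stated assumptions, not the library's implementation: `find_data` and `has_data` are hypothetical helpers, and the key/value pairs are copied from the Metadata1 output shown earlier.

```python
# Hypothetical stand-in for the data on one annotation: (key, value) pairs,
# copied from the Metadata1 annotation in the output above.
metadata = [
    ("name", "Culture quotes from Iain Banks"),
    ("compiler", "Dirk Roorda"),
    ("version", "0.2"),
]

def find_data(pairs, key, value):
    """Build a list of every matching pair (similar in spirit to data())."""
    return [(k, v) for k, v in pairs if k == key and v == value]

def has_data(pairs, key, value):
    """Stop at the first match and return a boolean (similar in spirit to test_data())."""
    return any(k == key and v == value for k, v in pairs)

matches = find_data(metadata, "compiler", "Dirk Roorda")
present = has_data(metadata, "compiler", "Dirk Roorda")
absent = has_data(metadata, "compiler", "Someone Else")
print(matches, present, absent)
```

The boolean variant never builds a result collection and can stop early, which is the intuition behind the test methods being cheaper.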

It is possible to retrieve all known text selections for a given resource. A text selection is 'known' if there is at least one annotation that references it:

In [35]:

for textselection in resource_banks.textselections():
    print(textselection)

# Consider Phlebas
Consider
Phlebas
$ author=Iain M. Banks
author
Iain
M
.
Banks

## 1
1
Everything about us,
everything around us,
everything we know [and can know of] is composed ultimately of patterns of nothing;
that’s the bottom line, the final truth.

So where we find we have any control over those patterns,
why not make the most elegant ones, the most enjoyable and good ones,
in our own terms?
Everything about us,
Everything
about
us
,
everything around us,
everything
around
us
,
everything we know [and can know of] is composed ultimately of patterns of nothing;
everything
we
know
and
can
know
of
is
composed
ultimately
of
patterns
of
nothing
that’s the bottom line, the final truth.
that
s
the
bottom
line
,
the
final
truth
.

So where we find we have any control over those patterns,
So
where
we
find
we
have
any
control
over
those
patterns
,
why not make the most elegant ones, the most enjoyable and good ones,
why
not
make
the
most
elegant
ones
,
the
most
enjoyable
and
good
ones
,
in our own terms?
in
our
own
terms
?

## 2
2
Besides,
it left the humans in the Culture free to take care of the things that really mattered in life,
such as [sports, games, romance,] studying dead languages,
barbarian societies and impossible problems,
and climbing high mountains without the aid of a safety harness.
Besides,
Besides
,
it left the humans in the Culture free to take care of the things that really mattered in life,
it
left
the
humans
in
the
Culture
free
to
take
care
of
the
things
that
really
mattered
in
life
,
such as [sports, games, romance,] studying dead languages,
such
as
sports
,
games
,
romance
,
studying
dead
languages
,
barbarian societies and impossible problems,
barbarian
societies
and
impossible
problems
,
and climbing high mountains without the aid of a safety harness.
and
climbing
high
mountains
without
the
aid
of
a
safety
harness
.

It's easy to see how you can combine some of the examples to retrieve all annotations in a reverse way (i.e. via the text).

You can consider a STAM model as a graph in which the annotations, resources and data make up the nodes. The forward and reverse indices encode how these nodes are related and form the edges of the graph. These edges can be traversed in almost any direction using the various methods at your disposal in this STAM library. Methods like data(), annotations() and textselections(), their filtering abilities, and their test counterparts are essential tools to accomplish this.

Text Relations¶

Now we get to the fun part. When you select any two parts of a text, i.e. create two text selections, then between these text selections there can be a number of relationships that hold true or not:

The text selections may overlap
The text selections may be embedded entirely in one another (one overlaps fully with the other)
The text selections may come before or after another with any amount of distance in between
The text selections may succeed or precede one another: one ends exactly where the other begins, or vice versa.
The text selections may have the very same begin and/or end offset

In STAM, the TextSelectionOperator captures these relationships.
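
The relationships listed above can be sketched as simple comparisons on begin and end offsets. The following is a pure-Python illustration, not the STAM implementation: the `Span` class and the predicate functions are hypothetical names, and the exact semantics of each operator in STAM are defined by its specification. The offsets are taken from output in this tutorial; the span 92:174 is an assumed range running from the first to the last word of one line.

```python
from typing import NamedTuple

class Span(NamedTuple):
    """A text selection reduced to a begin (inclusive) and end (exclusive) offset."""
    begin: int
    end: int

def overlaps(a: Span, b: Span) -> bool:
    """The two spans share at least one character."""
    return a.begin < b.end and b.begin < a.end

def embeds(a: Span, b: Span) -> bool:
    """Span b lies entirely within span a."""
    return a.begin <= b.begin and b.end <= a.end

def before(a: Span, b: Span) -> bool:
    """Span a ends at or before the point where b begins (any distance)."""
    return a.end <= b.begin

def precedes(a: Span, b: Span) -> bool:
    """Span a ends exactly where b begins."""
    return a.end == b.begin

def same_begin(a: Span, b: Span) -> bool:
    """Both spans start at the same offset."""
    return a.begin == b.begin

line_words = Span(92, 174)   # assumed span: first to last word of the line
everything = Span(92, 102)   # 'everything'
patterns = Span(155, 163)    # 'patterns'
ones = Span(306, 310)        # 'ones'
comma = Span(310, 311)       # ',' directly after 'ones'

print(embeds(line_words, patterns), overlaps(patterns, line_words))
print(before(everything, patterns), precedes(everything, patterns))
print(precedes(ones, comma), same_begin(line_words, everything))
```

Note that embedding implies overlap, and directly preceding implies coming before, so several relationships can hold for the same pair of spans.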

Remember our example in which we annotated the first word of line eight? The textselection for this word is embedded within the textselection for line eight as a whole. We can test that as follows using the test() method on TextSelection:

In [36]:

assert firstword.test(stam.TextSelectionOperator.embedded(), line8_textselection)

# the reverse then also holds:
assert line8_textselection.test(stam.TextSelectionOperator.embeds(), firstword)

# an embedding is essentially a stricter form of an overlap relation, so this holds too:
assert firstword.test(stam.TextSelectionOperator.overlaps(), line8_textselection)
assert line8_textselection.test(stam.TextSelectionOperator.overlaps(), firstword)

Not only can we test any given text selections, we can also use this functionality to actively find text selections that are in a particular relationship with another. In other words, we find related text selections. This is a core feature of the STAM library and a primary method of finding text selections and their annotations. We use the related_text() method for this.

Let's find all text selections (which we previously annotated) in line eight:

In [37]:

for textselection in line8_textselection.related_text(stam.TextSelectionOperator.embeds()):
    print(f"{textselection} @{textselection.offset()}")

everything @92:102
we @103:105
know @106:110
and @112:115
can @116:119
know @120:124
of @125:127
is @129:131
composed @132:140
ultimately @141:151
of @152:154
patterns @155:163
of @164:166
nothing @167:174
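
The offsets in this listing can be reproduced without the library, which may help to see what "embedded in line eight" means in terms of absolute positions. The sketch below tokenises the text of line eight with a simple \w+ regular expression (an assumption standing in for the tutorial's tokeniser) and shifts each match by 92, the begin offset of the line as it appears in the output above. The function name embedded_words is hypothetical.

```python
import re

LINE8 = "everything we know [and can know of] is composed ultimately of patterns of nothing;"
LINE8_BEGIN = 92  # absolute begin offset of line eight, taken from the output above


def embedded_words(line: str, line_begin: int) -> list[tuple[str, int, int]]:
    """Return (text, begin, end) for every word, in absolute half-open offsets."""
    return [
        (m.group(), line_begin + m.start(), line_begin + m.end())
        for m in re.finditer(r"\w+", line)
    ]


words = embedded_words(LINE8, LINE8_BEGIN)
for text, begin, end in words:
    # every word falls within the line's own span, i.e. the line embeds it
    assert LINE8_BEGIN <= begin and end <= LINE8_BEGIN + len(LINE8)
    print(f"{text} @{begin}:{end}")
```

The library does the equivalent lookup over the text selections it already knows about, rather than re-tokenising the text.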

Often, what we are interested in is not the text selections as such, but the annotations that reference these text selections. Simply add .annotations():

In [38]:

for annotation in line8_textselection.related_text(stam.TextSelectionOperator.embeds()).annotations():
    print(f" - ID: {annotation.id()}")
    print(f"   Text: {str(annotation)}")
    print(f"   Data:")
    for data in annotation:
        print(f"      {data.key()}={data.value()}")

 - ID: AnnotationLine8Word1
   Text: everything
   Data:
      structuretype=word
 - ID: AnnotationToken17
   Text: everything
   Data:
      structuretype=word
 - ID: AnnotationToken18
   Text: we
   Data:
      structuretype=word
 - ID: AnnotationToken19
   Text: know
   Data:
      structuretype=word
 - ID: AnnotationToken20
   Text: and
   Data:
      structuretype=word
 - ID: AnnotationToken21
   Text: can
   Data:
      structuretype=word
 - ID: AnnotationToken22
   Text: know
   Data:
      structuretype=word
 - ID: AnnotationToken23
   Text: of
   Data:
      structuretype=word
 - ID: AnnotationToken24
   Text: is
   Data:
      structuretype=word
 - ID: AnnotationToken25
   Text: composed
   Data:
      structuretype=word
 - ID: AnnotationToken26
   Text: ultimately
   Data:
      structuretype=word
 - ID: AnnotationToken27
   Text: of
   Data:
      structuretype=word
 - ID: AnnotationToken28
   Text: patterns
   Data:
      structuretype=word
 - ID: AnnotationToken29
   Text: of
   Data:
      structuretype=word
 - ID: AnnotationToken30
   Text: nothing
   Data:
      structuretype=word

The related_text() method is available on TextSelection (and TextSelections) and on Annotation (and Annotations); the latter is again a shortcut so you don't have to retrieve the text selections yourself first. As said before: do use all the shortcuts the library offers, because the more the library can do for you, the better the performance, as it is compiled to machine code and not written in Python itself.

In the last output, you may note that we got two annotations for the first word of line eight. That is because we made one manually and the other via our regular-expression-based tokeniser.

In the previous example all we got was data with key structuretype and value word. We could have specifically selected for this by adding some filters to annotations():

In [39]:

key = store.dataset("tutorial-set").key("structuretype")
for annotation in line8_textselection.related_text(stam.TextSelectionOperator.embeds()).annotations(key, value="word"):
    print(f" - ID: {annotation.id()}")
    print(f"   Text: {str(annotation)}")
    print(f"   Data:")
    for data in annotation:
        print(f"      {data.key()}={data.value()}")

STAM Query error: [StamError] QuerySyntaxError: Malformed query: Variable ?v1 of type KEY not found - QUERY DEBUG: [...] ()

Querying with STAMQL¶

Instead of querying data using the various Python objects and methods we have seen thus far, it is also possible to formulate a query in a query language called STAMQL. The query language is described in detail in the STAM Query extension documentation (https://github.com/annotation/stam/tree/master/extensions/stam-query). We will only cover some of the basics here and show how to call it from Python.

A query starts with a SELECT statement, followed by a return type specifying what kind of data you want the query to return (ANNOTATION, DATA, TEXT, KEY, DATASET). Then you specify a variable name to bind the results to (variables always start with a ? in STAMQL), followed by a WHERE statement that introduces a series of one or more constraints, each ending with a semicolon.

Let's illustrate all this with an example: we obtain line 8 from our data, which we had explicitly annotated earlier:

In [40]:

query = """
SELECT ANNOTATION ?a WHERE
    DATA "tutorial-set" "linenr" = 8;
"""

for result in store.query(query):
    annotation = result['a']
    assert isinstance(annotation, stam.Annotation)
    print("ID: ", annotation.id())
    print("Text: ", str(annotation))

ID:  AnnotationLine8
Text:  everything we know [and can know of] is composed ultimately of patterns of nothing;

Here we formulated a query in STAMQL and passed it to the query() method as a string, which gives us the results back as a list of dictionaries. The keys in each dictionary correspond to the variable names we bound in the SELECT statement (without the ? prefix). In this case we obtain one result containing one variable, a.
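
The anatomy of such a query (SELECT, a return type, a variable, WHERE, and semicolon-terminated constraints) can be sketched with plain string handling. The helper build_select below is hypothetical and not part of the STAM library; it only assembles the query text, which you would then pass to the query() method yourself.

```python
def build_select(result_type: str, variable: str, constraints: list[str]) -> str:
    """Assemble 'SELECT <type> ?<variable> WHERE <constraint>; ...' (hypothetical helper)."""
    lines = [f"SELECT {result_type} ?{variable} WHERE"]
    # every constraint is terminated by a semicolon
    lines += [f"    {constraint};" for constraint in constraints]
    return "\n".join(lines)


query = build_select("ANNOTATION", "a", ['DATA "tutorial-set" "linenr" = 8'])
print(query)
```

Because the variable is named a here, each result dictionary would be keyed by "a", matching the example above.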

Instead of querying for the annotation, we could have queried directly for the text as well. We can also add extra constraints, all of which must be satisfied:

In [41]:

query = """
SELECT TEXT ?t WHERE
    DATA "tutorial-set" "linenr" = 8;
    DATA "tutorial-set" "structuretype" = "line";
"""

for result in store.query(query):
    print(result['t'])

everything we know [and can know of] is composed ultimately of patterns of nothing;

Querying for text rather than annotations has a subtle difference when you add multiple DATA constraints like we did above. If we query for text, then it selects text which has annotations with the specified data. The data does not necessarily have to pertain to the same annotation (as long as it covers the same text). If you query for annotations and have multiple DATA constraints, then a single annotation must have both data items.
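
The difference can be sketched in plain Python with a few mock annotations. The data below is illustrative only and deliberately splits the two data items over two annotations on the same text (unlike the tutorial's own line annotation); it models the semantics described above, not the library's implementation.

```python
# Mock annotations: (id, (begin, end), data). All values are illustrative.
annotations = [
    ("A1", (92, 175), {"linenr": 8}),
    ("A2", (92, 175), {"structuretype": "line"}),
    ("A3", (0, 20), {"linenr": 1, "structuretype": "line"}),
]
wanted = {"linenr": 8, "structuretype": "line"}


def has(data: dict, key: str, value) -> bool:
    """True if the annotation data contains this key with this value."""
    return data.get(key) == value


# ANNOTATION semantics: a single annotation must carry every requested data item.
annotation_hits = [
    aid for aid, _, data in annotations
    if all(has(data, k, v) for k, v in wanted.items())
]

# TEXT semantics: the data items may be spread over several annotations,
# as long as those annotations cover the same text.
spans = {span for _, span, _ in annotations}
text_hits = [
    span for span in sorted(spans)
    if all(
        any(s == span and has(data, k, v) for _, s, data in annotations)
        for k, v in wanted.items()
    )
]

print(annotation_hits)  # no single annotation has both items
print(text_hits)        # but one stretch of text is covered by both
```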

The query language supports query composition to chain multiple queries/subqueries together. A subquery is introduced using curly braces. Take a look at the following example where we again select line 8, and then all words in line 8 (here we use a textual embedding relation):

In [42]:

query = """
SELECT ANNOTATION ?line WHERE
    DATA "tutorial-set" "linenr" = 8;
{
    SELECT ANNOTATION ?word WHERE
        RELATION ?line EMBEDS;
        DATA "tutorial-set" "structuretype" = "word";
}

"""

for result in store.query(query):
    #the ?line annotation will be returned for each
    assert 'line' in result
    annotation = result['word']
    assert isinstance(annotation, stam.Annotation)
    print("ID: ", annotation.id())
    print("Text: ", str(annotation))

ID:  AnnotationLine8Word1
Text:  everything
ID:  AnnotationToken17
Text:  everything
ID:  AnnotationToken18
Text:  we
ID:  AnnotationToken19
Text:  know
ID:  AnnotationToken20
Text:  and
ID:  AnnotationToken21
Text:  can
ID:  AnnotationToken22
Text:  know
ID:  AnnotationToken23
Text:  of
ID:  AnnotationToken24
Text:  is
ID:  AnnotationToken25
Text:  composed
ID:  AnnotationToken26
Text:  ultimately
ID:  AnnotationToken27
Text:  of
ID:  AnnotationToken28
Text:  patterns
ID:  AnnotationToken29
Text:  of
ID:  AnnotationToken30
Text:  nothing

The constraint RELATION ?line EMBEDS; in the subquery is essential here. It can be read as "?line embeds ?word" and ensures that there is a specific textual relation between the two select statements. It is even a requirement for a subquery to have a constraint that refers back to the parent query. Each subquery can itself have a subquery, so you can build long chains.
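
One way to picture a query with a subquery is as a pair of nested loops in which the inner loop is filtered by the constraint that refers back to the outer variable. The sketch below uses mock (id, begin, end) tuples with illustrative IDs and offsets; the function run is hypothetical and says nothing about how the library actually evaluates queries.

```python
# Mock data: (id, begin, end). IDs and offsets are illustrative only.
lines = [("Line8", 92, 175)]
words = [("Token17", 92, 102), ("Token18", 103, 105), ("Token31", 176, 180)]


def run(lines, words):
    """Yield one result per (line, word) pair for which the line embeds the word."""
    for line in lines:                       # parent query binds ?line
        _, line_begin, line_end = line
        for word in words:                   # subquery binds ?word
            _, begin, end = word
            # RELATION ?line EMBEDS: the constraint tying ?word back to ?line
            if line_begin <= begin and end <= line_end:
                yield {"line": line[0], "word": word[0]}


results = list(run(lines, words))
for result in results:
    print(result)
```

Each result carries both variables, which is why the ?line annotation is repeated for every word in the real query output.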

Aside from EMBEDS, there are other relations you can use, such as OVERLAPS, PRECEDES, SUCCEEDS, BEFORE, AFTER, SAMEBEGIN, SAMEEND and EQUALS. These are the STAMQL keywords representing the TextSelectionOperator values you have already seen before.

You have the choice whether to express your queries through STAMQL or using Python objects and methods. Internally, the stam library will convert the latter to the former whenever you apply any filtering, so there is not too much difference performance-wise. There is some performance overhead though in the conversion of results when you call query() explicitly with a STAMQL query.

When calling query(), you may inject context variables yourself via keyword arguments. These will subsequently be available for use in the constraints of your query. As an example, we repeat the previous query but inject the line variable manually, as we already had an instance of it lying around anyway:

In [43]:

query = """
SELECT ANNOTATION ?word WHERE
    RELATION ?line EMBEDS;
    DATA "tutorial-set" "structuretype" = "word";
"""

for result in store.query(query, line=line8_textselection):
    annotation = result['word']
    assert isinstance(annotation, stam.Annotation)
    print("ID: ", annotation.id())
    print("Text: ", str(annotation))

ID:  AnnotationLine8Word1
Text:  everything
ID:  AnnotationToken17
Text:  everything
ID:  AnnotationToken18
Text:  we
ID:  AnnotationToken19
Text:  know
ID:  AnnotationToken20
Text:  and
ID:  AnnotationToken21
Text:  can
ID:  AnnotationToken22
Text:  know
ID:  AnnotationToken23
Text:  of
ID:  AnnotationToken24
Text:  is
ID:  AnnotationToken25
Text:  composed
ID:  AnnotationToken26
Text:  ultimately
ID:  AnnotationToken27
Text:  of
ID:  AnnotationToken28
Text:  patterns
ID:  AnnotationToken29
Text:  of
ID:  AnnotationToken30
Text:  nothing

Advanced annotation¶

Higher-order Annotation¶

All annotations we have done so far reference the text as a whole with absolute offsets via a TextSelector, even though we formulated some of these offsets (first word of line eight) in relative terms.

STAM also allows you to adopt another annotation paradigm in which you point an annotation not at a text via TextSelector, but at another annotation via an AnnotationSelector, and that other annotation, or the final one of however many there are in between, points at the text with a TextSelector. You can specify an offset, which will then be interpreted relative to the [text selection of] the targeted annotation:

In [44]:

line8 = store.annotation("AnnotationLine8")
annotation = store.annotate(
    target=stam.Selector.annotationselector(line8, stam.Offset.simple(0,10)),
    data= {"key": "structuretype", "value": "word", "set": "tutorial-set" },
    id=f"AnnotationLine8Word1_explicit")

Here we are effectively annotating an annotation, so we call this a form of higher-order annotation. We explicitly capture and model a relationship. Whether to do this explicitly or use the STAM library's functionality to resolve it implicitly is entirely up to you, the modeller, and your use-case!
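
How a relative offset ends up as an absolute one can be sketched as simple arithmetic. The function resolve below is hypothetical, and the absolute offsets of line eight (92 to 175) are an assumption derived from the earlier output; the library performs this resolution for you.

```python
def resolve(target: tuple[int, int], relative: tuple[int, int]) -> tuple[int, int]:
    """Map an offset relative to a target's text selection onto absolute offsets."""
    target_begin, target_end = target
    rel_begin, rel_end = relative
    begin, end = target_begin + rel_begin, target_begin + rel_end
    if not (target_begin <= begin <= end <= target_end):
        raise ValueError("relative offset falls outside the target's text selection")
    return begin, end


line8 = (92, 175)                    # assumed absolute offsets of line eight
word1 = resolve(line8, (0, 10))      # the first word, relative to the line
suffix = resolve(word1, (5, 10))     # chains of targets resolve step by step

print(word1, suffix)
```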

We can also do higher-order annotation to associate metadata with annotations, such as encoding the person who did the annotation. In such cases, we can choose not to reference the text at all, because the annotation no longer says something about the text.

In [45]:

line8 = store.annotation("AnnotationLine8")
annotation = store.annotate(
    target=stam.Selector.annotationselector(line8),
    data= [
        {"key": "annotator", "value": "Maarten van Gompel", "set": "tutorial-set" },
        {"key": "datetime", "value": "2023-04-18T17:48:56", "set": "tutorial-set" },
    ],
    id=f"AnnotationAnnotator")

Note that we invented some more keys that were added on-the-fly to our annotation dataset (i.e. the vocabulary).

This, too, needn't be a higher-order annotation: you can choose to associate the AnnotationData directly with the annotation. The idea behind an annotation, though, is that once it is made, it is immutable; no adding/editing of annotation data or targets at later points in time. Information such as annotators and date/time information could well be associated with the annotation upon creation, but sometimes there may be data which you want to associate with an annotation at a later point in time. That would be a use case for higher-order annotation.

Complex selectors¶

Rather than point at a single target, sometimes you want to annotate something that can not be captured by a single simple selector. Take for example, again, line eight from our text:

everything we know [and can know of] is composed ultimately of patterns of nothing

Say we want to annotate the parts of the sentence without the portion in square brackets, then a single text selection could not capture it because it is discontinuous. Two text selections, however, do the job. To combine the two text selectors (or any other type of simple selector) STAM has the CompositeSelector:

In [46]:

part1 = line8_textselection.textselection(stam.Offset.simple(0,18))
part2 = line8_textselection.textselection(stam.Offset.simple(37,82))
line8mainsentence = store.annotate(
    target=stam.Selector.compositeselector(
        stam.Selector.textselector(resource_banks, part1.offset()),
        stam.Selector.textselector(resource_banks, part2.offset()),
    ),
    data= [
        {"key": "structuretype", "value": "mainsentence", "set": "tutorial-set" },
    ],
    id=f"AnnotationLine8Mainsentence")

If we ask the STAM library to get the text using str(), it will concatenate the parts with a space, which may not always be appropriate:

In [47]:

print(f"\"{line8mainsentence}\"")
assert str(line8mainsentence) == "everything we know is composed ultimately of patterns of nothing"

"everything we know is composed ultimately of patterns of nothing"

Use the text() method instead if you want to retain the parts:

In [48]:

print(line8mainsentence.text())
assert line8mainsentence.text() == ["everything we know", "is composed ultimately of patterns of nothing"]

['everything we know', 'is composed ultimately of patterns of nothing']
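
The two offset pairs can be checked against the plain text of line eight with ordinary string slicing. This is only a sketch on a Python string, not a call into the library, but it shows where the parts come from and why joining them with a space is just one of several possible renderings.

```python
LINE8 = "everything we know [and can know of] is composed ultimately of patterns of nothing;"

# the two offset pairs of the composite selector, relative to the line
parts = [LINE8[begin:end] for begin, end in [(0, 18), (37, 82)]]

joined = " ".join(parts)  # comparable to the str() rendering
print(parts)
print(joined)
```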

In a similar fashion, you can also call the textselections() method to obtain all text selections. We already used this method before and remarked that it always returns a TextSelections collection and not just a single TextSelection; now you know why.

When the composite selector is used, the target must be interpreted jointly; the annotation applies to the whole composition rather than to individual parts.

There is also the MultiSelector, which selects multiple targets; the annotation applies to each of them individually and independently. It offers a convenient way to express multiple annotations more concisely, which saves memory.

Last, there is the DirectionalSelector which expresses multiple targets with a very specific order that is meaningful. For example, taking line eight again, we can express the dependency relation where the word ultimately is an adverbial modifier to the verb composed:

In [49]:

# offsets are relative to line eight: "composed" spans 40:48 and "ultimately" spans 49:59
head = line8_textselection.textselection(stam.Offset.simple(40,48))
dependant = line8_textselection.textselection(stam.Offset.simple(49,59))
dependency = store.annotate(
    target=stam.Selector.directionalselector(
        stam.Selector.textselector(resource_banks, head.offset()),
        stam.Selector.textselector(resource_banks, dependant.offset()),
    ),
    data= [
        {"key": "dependency", "value": "advmod", "set": "tutorial-set" },
    ],
    id="AnnotationDependency")

You can interpret the different selectors under a directional selector akin to positional function parameters. You, the modeller, determine how the ordering is interpreted.
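
The analogy with positional parameters can be sketched in plain Python: the same two targets yield a different reading depending on their order. The Dependency tuple and the text_at helper are hypothetical, and the offsets are those of the two words relative to the text of line eight.

```python
from typing import NamedTuple

LINE8 = "everything we know [and can know of] is composed ultimately of patterns of nothing;"


class Dependency(NamedTuple):
    """Hypothetical reading of a two-target directional selector: order carries the roles."""
    head: str
    dependent: str


def text_at(offsets: tuple[int, int]) -> str:
    begin, end = offsets
    return LINE8[begin:end]


# ordered targets, as (begin, end) offsets relative to the line
targets = [(40, 48), (49, 59)]
relation = Dependency(*(text_at(t) for t in targets))
swapped = Dependency(*(text_at(t) for t in reversed(targets)))

print(relation)
print(swapped)  # same targets, opposite interpretation
```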

Editing annotations¶

We already explained that editing annotations in place is a bad idea and should be avoided: the canonical way to edit an annotation is to remove the old annotation from the store and make a new one. Removing an annotation, or any other STAM object, can be done by passing it to AnnotationStore.remove().

We can dive into the motivation behind this constraint a bit more: from a semantic perspective, annotations are essentially a commentary about something else. If the thing you comment on is subject to change, possibly unbeknownst to you, then such a change might invalidate your commentary, as it is no longer the same thing you based your comment on! The STAM model prevents these pitfalls.

Nevertheless, at the low-level there are ways around this constraint. After all, as long as you don't publish the annotations you have some liberty in editing them. Currently though, the Python library does not yet expose this.

When using AnnotationStore.remove() on any variable, you must take care yourself not to use that variable again. Also note that removing an item will remove everything that depends on it. So if you remove an item like an annotation, text resource or data item, then all annotations on it and everything that references it will be automatically removed as well.
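
The cascading effect of a removal can be pictured with a small mock store in which each annotation records what it targets. The dictionary layout, the IDs and the remove function are all hypothetical; this is a sketch of the behaviour described above, not of the library's internals.

```python
# Mock store: annotation id -> id of its target (None means it targets the text directly).
targets = {
    "Line8": None,
    "Line8Word1_explicit": "Line8",   # higher-order annotation on Line8
    "Annotator": "Line8",             # metadata annotation on Line8
    "Line1": None,
}


def remove(store: dict, item: str) -> set:
    """Remove item and, recursively, everything that references it; return the removed ids."""
    removed = {item}
    del store[item]
    # collect the dependents first, so we never mutate the dict while iterating over it
    for dependent in [aid for aid, target in store.items() if target == item]:
        if dependent in store:
            removed |= remove(store, dependent)
    return removed


removed = remove(targets, "Line8")
print(sorted(removed))
print(targets)
```

This is why an edit done as "remove, then re-create" needs some care: anything that pointed at the old annotation is gone too and would have to be re-created as well.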

Saving and loading data¶

All this time we've been annotating but have not committed our results to any form of persistent storage. You will likely want to save your annotation store to file, and load it all again at any later point in time.

STAM's canonical serialisation format is STAM JSON:

In [50]:

store.set_filename("tutorial.store.stam.json")
store.save()

The save() method will use the filename that the annotation store was initially loaded from. We had none yet, so we set it via set_filename() first. In our current example, everything is saved into a single JSON file.

However, the set_filename() method is also available on AnnotationDataSet and TextResource. If set, these are kept in stand-off files. Annotation data sets usually use STAM JSON, but text resources generally just use plain text. The extension you use determines the file format.

There is also a STAM CSV format defined as an extension which is supported by this library. Whereas the JSON format is very verbose (=large files), the CSV is a bit more concise.

Loading an annotation store (including all stand-off files) is as simple as:

In [51]:

store2 = stam.AnnotationStore(file="tutorial.store.stam.json")

Visualising annotations¶

Having made annotations, you may want to visualise them. This can be done via the view() method on AnnotationStore. It takes as input a selection query and zero or more highlight queries, all in STAMQL, and produces either HTML output or coloured ANSI text. The HTML output is a self-contained, standalone document.

The selection query determines what the main selection is and can be anything you can query that has text (i.e. resources, annotations, text selections).

The highlight queries determine what parts of the selections produced by the selection query you want to highlight. Highlighting is done by drawing a line underneath the text and optionally by a tag that shows extra information. Specific display options are configurable via attributes (starting with @) that precede the actual STAMQL query.

Tags can be enabled by prepending the query with one of the following attributes:

@KEYTAG - Outputs a tag with the key, pertaining to the first DATA constraint in the query
@KEYVALUETAG - Outputs a tag with the key and the value, pertaining to the first DATA constraint in the query
@VALUETAG - Outputs a tag with the value only, pertaining to the first DATA constraint in the query
@IDTAG - Outputs a tag with the public identifier of the ANNOTATION that has been selected

If you don't want to match the first DATA constraint but the n-th, then specify a number (1-indexed) that refers to the DATA constraint in the order specified. Note that only DATA constraints are counted:

@KEYTAG=n - Outputs a tag with the key, pertaining to the n-th DATA constraint in the query
@KEYVALUETAG=n - Outputs a tag with the key and the value, pertaining to the n-th DATA constraint in the query
@VALUETAG=n - Outputs a tag with the value only, pertaining to the n-th DATA constraint in the query

Attributes may also be provided for styling HTML output:

@STYLE=class - Will associate the mentioned CSS class with the highlight (it's up to you to provide a proper stylesheet). The default stylesheet predefines only a few simple classes: italic, bold, red, green, blue, super.
@HIDE - Do not add the highlight underline and do not add an entry to the legend. This may be useful if you only want to apply @STYLE.

If no attribute is provided, there will be no tags or styling shown for that query, only a highlight underline in the HTML output.
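
Since attributes simply precede the STAMQL text, a highlight query can be assembled with ordinary string handling. The helper with_attributes is hypothetical, ?quote stands for whatever variable your selection query binds, and combining several attributes separated by spaces is an assumption based on the remark about @HIDE and @STYLE above.

```python
def with_attributes(query: str, *attributes: str) -> str:
    """Prefix a STAMQL query with display attributes such as @KEYVALUETAG (hypothetical helper)."""
    for attribute in attributes:
        if not attribute.startswith("@"):
            raise ValueError(f"attributes start with @, got {attribute!r}")
    return " ".join([*attributes, query])


base = 'SELECT ANNOTATION ?line_8 WHERE RELATION ?quote EMBEDS; DATA "tutorial-set" "linenr" = 8;'
tagged = with_attributes(base, "@KEYVALUETAG")
styled = with_attributes(base, "@STYLE=bold", "@HIDE")  # combination is an assumption

print(tagged)
print(styled)
```

The resulting strings are what you would pass to view() as highlight queries, after the selection query.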

Note: This is the same functionality as is exposed in the collection of command-line tools called stam-tools.

To display HTML in this Jupyter Notebook we import this first:

In [52]:

from IPython.display import display, HTML

Let's take a look at the data we have been creating thus far. Let's first just query for the text of the two quotes our document consists of:

In [53]:

display(HTML(store.view('SELECT ANNOTATION ?quote WHERE DATA "tutorial-set" "structuretype" = "quote";')))

1. AnnotationQuote1

Everything about us,
everything around us,
everything we know [and can know of] is composed ultimately of patterns of nothing;
that’s the bottom line, the final truth.

So where we find we have any control over those patterns,
why not make the most elegant ones, the most enjoyable and good ones,
in our own terms?

2. AnnotationQuote2

Besides,
it left the humans in the Culture free to take care of the things that really mattered in life,
such as [sports, games, romance,] studying dead languages,
barbarian societies and impossible problems,
and climbing high mountains without the aid of a safety harness.

We can add additional queries to highlight parts of this output, such as the words or line eight, both of which we have annotated earlier:

In [54]:

display(HTML(store.view('SELECT ANNOTATION ?quote WHERE DATA "tutorial-set" "structuretype" = "quote";', \
           'SELECT ANNOTATION ?word WHERE RELATION ?quote EMBEDS; DATA "tutorial-set" "structuretype" = "word";', \
           'SELECT ANNOTATION ?line_8 WHERE RELATION ?quote EMBEDS; DATA "tutorial-set" "linenr" = 8;')))

word
line 8

1. AnnotationQuote1

2. AnnotationQuote2

It is important that highlight queries always reference the variable from the primary selection query (?quote in the above example); otherwise they query too much and performance suffers drastically.

We can also output additional tags by prepending an attribute (@IDTAG, @KEYTAG, @VALUETAG or @KEYVALUETAG) to a highlight query:

In [55]:

display(HTML(store.view('SELECT ANNOTATION ?quote WHERE DATA "tutorial-set" "structuretype" = "quote";', \
           '@IDTAG SELECT ANNOTATION ?word WHERE RELATION ?quote EMBEDS; DATA "tutorial-set" "structuretype" = "word";', \
           '@KEYVALUETAG SELECT ANNOTATION ?line_8 WHERE RELATION ?quote EMBEDS; DATA "tutorial-set" "linenr" = 8;')))

word
line 8

1. AnnotationQuote1

EverythingAnnotationToken9 aboutAnnotationToken10 usAnnotationToken11,
everythingAnnotationToken13 aroundAnnotationToken14 usAnnotationToken15,
everythingAnnotationLine8Word1AnnotationToken17AnnotationLine8Word1_explicit weAnnotationToken18 knowAnnotationToken19 [andAnnotationToken20 canAnnotationToken21 knowAnnotationToken22 ofAnnotationToken23] isAnnotationToken24 composedAnnotationToken25 ultimatelyAnnotationToken26 ofAnnotationToken27 patternsAnnotationToken28 ofAnnotationToken29 nothingAnnotationToken30;linenr: 8
thatAnnotationToken31’sAnnotationToken32 theAnnotationToken33 bottomAnnotationToken34 lineAnnotationToken35, theAnnotationToken37 finalAnnotationToken38 truthAnnotationToken39.

SoAnnotationToken41 whereAnnotationToken42 weAnnotationToken43 findAnnotationToken44 weAnnotationToken45 haveAnnotationToken46 anyAnnotationToken47 controlAnnotationToken48 overAnnotationToken49 thoseAnnotationToken50 patternsAnnotationToken51,
whyAnnotationToken53 notAnnotationToken54 makeAnnotationToken55 theAnnotationToken56 mostAnnotationToken57 elegantAnnotationToken58 onesAnnotationToken59, theAnnotationToken61 mostAnnotationToken62 enjoyableAnnotationToken63 andAnnotationToken64 goodAnnotationToken65 onesAnnotationToken66,
inAnnotationToken68 ourAnnotationToken69 ownAnnotationToken70 termsAnnotationToken71?

2. AnnotationQuote2

BesidesAnnotationToken74,
itAnnotationToken76 leftAnnotationToken77 theAnnotationToken78 humansAnnotationToken79 inAnnotationToken80 theAnnotationToken81 CultureAnnotationToken82 freeAnnotationToken83 toAnnotationToken84 takeAnnotationToken85 careAnnotationToken86 ofAnnotationToken87 theAnnotationToken88 thingsAnnotationToken89 thatAnnotationToken90 reallyAnnotationToken91 matteredAnnotationToken92 inAnnotationToken93 lifeAnnotationToken94,
suchAnnotationToken96 asAnnotationToken97 [sportsAnnotationToken98, gamesAnnotationToken100, romanceAnnotationToken102,] studyingAnnotationToken104 deadAnnotationToken105 languagesAnnotationToken106,
barbarianAnnotationToken108 societiesAnnotationToken109 andAnnotationToken110 impossibleAnnotationToken111 problemsAnnotationToken112,
andAnnotationToken114 climbingAnnotationToken115 highAnnotationToken116 mountainsAnnotationToken117 withoutAnnotationToken118 theAnnotationToken119 aidAnnotationToken120 ofAnnotationToken121 aAnnotationToken122 safetyAnnotationToken123 harnessAnnotationToken124.

Alternatively, you can output annotations as text with ANSI escape sequences, by setting the keyword argument format="ansi". This is designed for terminal output, but it can also be visualised here:

In [56]:

print(store.view('SELECT ANNOTATION ?quote WHERE DATA "tutorial-set" "structuretype" = "quote";', \
          'SELECT ANNOTATION ?word WHERE RELATION ?quote EMBEDS; DATA "tutorial-set" "structuretype" = "word";', \
          '@KEYVALUETAG SELECT ANNOTATION ?line_8 WHERE RELATION ?quote EMBEDS; DATA "tutorial-set" "linenr" = 8;',
          format="ansi"))

Legend:
       1. word
       2. line 8

----------------------------------- 1. AnnotationQuote1 -----------------------------------
[Everything] [about] [us],
[everything] [around] [us],
[[everything]]] [we] [know] [[and] [can] [know] [of]] [is] [composed] [ultimately] [of] [patterns] [of] [nothing];|linenr: 8]
[that]’[s] [the] [bottom] [line], [the] [final] [truth].

[So] [where] [we] [find] [we] [have] [any] [control] [over] [those] [patterns],
[why] [not] [make] [the] [most] [elegant] [ones], [the] [most] [enjoyable] [and] [good] [ones],
[in] [our] [own] [terms]?
----------------------------------- 2. AnnotationQuote2 -----------------------------------
[Besides],
[it] [left] [the] [humans] [in] [the] [Culture] [free] [to] [take] [care] [of] [the] [things] [that] [really] [mattered] [in] [life],
[such] [as] [[sports], [games], [romance],] [studying] [dead] [languages],
[barbarian] [societies] [and] [impossible] [problems],
[and] [climbing] [high] [mountains] [without] [the] [aid] [of] [a] [safety] [harness].

This concludes this tutorial. We hope to have shown you how to use the STAM python library.
