Unit 3: Showcase and demos

  • Rus′ primary chronicle (Povest′ vremennyx let, PVL)
  • Samuel Beckett Digital Manuscript Project
  • TextGrid
  • Selma Lagerlöf Archive

CollateX is used in a range of projects, including digital editions, virtual research environments for digital textual scholarship, and digital archives. Some examples follow.

Rus′ primary chronicle (Povest′ vremennyx let, PVL)

What is the PVL?

  • Principal indigenous historical record of the Eastern Slavs (ancestors of modern Russia, Ukraine, Belarus)
  • Incorporates Byzantine chronicles (beginning with the division of the earth among the sons of Noah) and Rusian oral tradition
  • Multiple redactions produced in different monasteries (under the patronage of different princes) through the early 12th century
  • Five principal extant manuscripts, the earliest dated to 1377
  • Traditional scholarship favors the oldest manuscript, although stemmatic analysis suggests that it does not always provide the best text
  • Important in literary, linguistic, and historical study

The Harvard Ukrainian Research Institute edition

Editorial method

  • Full interlinear collation based on new diplomatic transcriptions of all principal witnesses, augmented by others where the principal ones are defective
  • Dynamic critical text (paradosis ‘best reading’)


  • Typeset in troff in the 1980s for paper publication (2004)
  • Converted to SGML, then XML, then HTML for web publication in the 1990s
  • Original collation is line-level and was performed by hand
  • Now using CollateX to introduce word-level collation
  • No funding (= must automate as much as possible)

HTML output



  • What to align (word division? punctuation? page and line breaks?) (Gothenburg tokenization)
  • Diplomatic transcription means lots of insignificant variation (Gothenburg normalization and analysis)
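The two questions above correspond to the tokenization and normalization stages of the Gothenburg model. The following is a minimal sketch, not the PVL project's actual code: it splits witnesses on whitespace and gives each token an original reading (`t`) and a normalized reading (`n`), the token properties used in CollateX's pre-tokenized JSON input. The `tokenize`, `normalize`, and `to_witness` helpers, the normalization rules, and the English sample text are all assumptions made for illustration.

```python
import json
import re
import unicodedata


def tokenize(text):
    """Split on whitespace, keeping trailing whitespace with each token."""
    return re.findall(r"\S+\s*", text)


def normalize(token):
    """Hypothetical normalization: drop diacritics, punctuation, and case."""
    decomposed = unicodedata.normalize("NFD", token.strip())
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_punct = re.sub(r"[^\w]", "", no_marks)
    return no_punct.casefold()


def to_witness(siglum, text):
    """Build one witness with 't' (original) and 'n' (normalized) per token."""
    tokens = [{"t": tok, "n": normalize(tok)} for tok in tokenize(text)]
    return {"id": siglum, "tokens": tokens}


# Invented sample readings, not taken from any manuscript.
collation_input = {
    "witnesses": [
        to_witness("A", "The quick, brown fox."),
        to_witness("B", "the Quick brown fóx"),
    ]
}

print(json.dumps(collation_input, ensure_ascii=False, indent=2))
```

Alignment can then compare the `n` values while the output still reports the diplomatic `t` readings, so orthographic noise does not surface as variation.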


  • Internal markup (deletions, additions, corrections, choices)
  • Text is already subdivided into ~8200 blocks (line-level collation sets), which can be regarded as individual collation tasks for word-level alignment
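Treating each block as its own collation task can be sketched as follows. This is an illustration under stated assumptions, not the edition's pipeline: the `(block_id, siglum, text)` record layout, the sigla, and the placeholder readings are hypothetical, and the output uses the plain-text `content` form of CollateX's JSON input.

```python
from collections import defaultdict


def group_by_block(records):
    """Group (block_id, siglum, text) records into one task per block."""
    tasks = defaultdict(dict)
    for block_id, siglum, text in records:
        tasks[block_id][siglum] = text
    return dict(tasks)


def to_collation_input(witness_texts):
    """Build one CollateX-style JSON input for a single block."""
    return {
        "witnesses": [
            {"id": siglum, "content": text}
            for siglum, text in sorted(witness_texts.items())
        ]
    }


# Placeholder records; real blocks, sigla, and readings would come
# from the transcriptions.
records = [
    ("1", "A", "alpha beta gamma"),
    ("1", "B", "alpha gamma"),
    ("2", "A", "delta epsilon"),
    ("2", "B", "delta epsilon zeta"),
    ("2", "C", "delta zeta"),
]

tasks = {
    block_id: to_collation_input(witnesses)
    for block_id, witnesses in group_by_block(records).items()
}

for block_id, task in tasks.items():
    print(block_id, [w["id"] for w in task["witnesses"]])
```

Because the blocks are independent, a witness that is defective in one block is simply absent from that block's task, and each task stays small enough to collate and check on its own.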

For more information, including the full digital edition, see http://pvl.obdurodon.org.

Samuel Beckett Digital Manuscript Project

The Samuel Beckett Digital Manuscript Project aims to bring together the manuscripts of Samuel Beckett's works in digital form and to facilitate genetic research. Beckett's works are of additional interest because the author also translated his own texts.

The example below is drawn from L'Innommable / The Unnamable.

Starting from one manuscript, the user can compare all the versions of a sentence, or of a larger portion of the text, in the synoptic sentence view. The user can collate all the French or all the English versions on the fly using CollateX.

The Beckett Project guarantees the quality of the transcriptions and uses CollateX to collate the versions, but it does not provide a manually corrected apparatus. The collation is performed on texts that have multiple layers, as in the case of additions and deletions.
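One possible way to prepare a multi-layered transcription for collation is to flatten it into one plain-text reading per layer. The sketch below illustrates that idea only and does not describe how the Beckett Project itself handles layers: the un-namespaced `del` and `add` element names, the `layer_text` helper, the witness identifiers, and the sample sentence are all assumptions.

```python
import xml.etree.ElementTree as ET


def layer_text(element, skip_tag):
    """Concatenate text, omitting the content of elements named skip_tag."""
    if element.tag == skip_tag:
        # Drop the element's own content but keep the text that follows it.
        return element.tail or ""
    parts = [element.text or ""]
    for child in element:
        parts.append(layer_text(child, skip_tag))
    parts.append(element.tail or "")
    return "".join(parts)


# Invented sample sentence with one deletion and one addition.
sentence = ET.fromstring(
    "<seg>The <del>old</del><add>grey</add> house stood empty.</seg>"
)

before = layer_text(sentence, "add")  # reading before revision
after = layer_text(sentence, "del")   # reading after revision

# Each layer becomes a plain-text witness in CollateX's JSON input form.
collation_input = {
    "witnesses": [
        {"id": "MS-before", "content": before},
        {"id": "MS-after", "content": after},
    ]
}

print(before)
print(after)
```

Real TEI documents use a namespace and allow nested or overlapping revisions, so a production version would need more careful handling than this two-layer case.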


TextGrid

TextGrid is a workbench for digital editing and a repository for archiving digital texts and editions. It is a virtual research environment that includes an XML editor, support for linguistic analysis, and facilities for working with images and linking them to the text. It also integrates CollateX (available through the Eclipse Marketplace).

In TextGrid, CollateX runs on a collation set, which the user defines by gathering plain-text documents. The results can be shown as an alignment table, as a variant graph, or in XML.

Users can also establish an equivalence set, which defines tokens that are to be treated as identical. The tool then ignores variation among those tokens, so the collation results differ from those produced without the set.
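The effect of an equivalence set can be illustrated with a toy two-witness comparison. This sketch uses `difflib.SequenceMatcher` from the Python standard library as a stand-in aligner, which is not CollateX's alignment algorithm and not TextGrid's implementation; the `canonicalize` and `variants` helpers and the sample readings are hypothetical.

```python
from difflib import SequenceMatcher


def canonicalize(tokens, equivalence_sets):
    """Map every member of an equivalence set to one representative form."""
    lookup = {}
    for group in equivalence_sets:
        representative = sorted(group)[0]
        for form in group:
            lookup[form] = representative
    return [lookup.get(tok, tok) for tok in tokens]


def variants(a, b, equivalence_sets=()):
    """Return the (reading_a, reading_b) pairs where the witnesses differ."""
    keys_a = canonicalize(a, equivalence_sets)
    keys_b = canonicalize(b, equivalence_sets)
    matcher = SequenceMatcher(a=keys_a, b=keys_b, autojunk=False)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Invented sample readings.
witness_a = "the colour of the grey cat".split()
witness_b = "the color of the gray dog".split()
equivalences = [{"colour", "color"}, {"grey", "gray"}]

print(variants(witness_a, witness_b))
print(variants(witness_a, witness_b, equivalences))
```

Comparison runs on the representative forms while the reported readings stay in their original spelling, so declaring two spellings equivalent removes them from the list of variants without altering the transcriptions.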

Selma Lagerlöf Archive

CollateX has been used in the Selma Lagerlöf Archive to collate TEI XML documents while also taking their structure into account. The project was originally supported by Svenska vitterhetssamfundet (SVS; approximately "The Swedish Philological Society") and was later merged into the Swedish Literature Bank. The branded user interface of the Selma Lagerlöf Archive is available at Litteraturbanken.se. The underlying eXist-db xar app, developed by Friprogramvarusyndikatet.se, has been un-branded and will be made publicly available. The app has been tested heavily during an extended limited-availability period, which is now coming to an end. The most recent sponsor, contributing a handful of days of support, is the Swedish Literary Society in Finland (SLS).
