We show how to convert a TEI data source into TF.
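This has two stages: first we convert the TEI into a preliminary TF dataset based on characters, then we add tokens and sentences to it.
A dataset based on characters is precise, but rather inefficient. The second stage makes the dataset much more efficient.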
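To give a feel for the difference in size, here is a minimal sketch that counts how many basic units a piece of text needs when every character is a unit versus when every token is a unit. The function name `slot_counts`, the sample sentence, and the regex tokenizer are assumptions made for illustration only; this is not how TF itself counts or tokenizes.

```python
import re


def slot_counts(text: str) -> dict:
    """Count basic units under two hypothetical choices: characters vs. tokens."""
    # Character-based: every character, including spaces, is one unit.
    char_slots = len(text)
    # Token-based: a stand-in tokenizer (NOT the NLP pipeline used by TF):
    # runs of word characters, or single punctuation marks.
    token_slots = len(re.findall(r"\w+|[^\w\s]", text))
    return {"char": char_slots, "token": token_slots}


# Made-up sample sentence, not taken from the corpus.
sample = "Dear friend, I received your letter today."
counts = slot_counts(sample)
print(counts)  # {'char': 42, 'token': 9}
```

Even in this tiny example the token-based count is several times smaller than the character-based one.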
More ways to do it!
convertExpress: as few commands/feedback/interaction as possible.
We start with a case where the input does not validate.
!tf-fromtei all
Start folder proeftuin:
1 MD letter md 19090216y_IONG_1303.xml
2 MD letter md 19090407y_IONG_1739.xml
3 MD letter md 19090421y_IONG_1304.xml
4 MD letter md 19090426y_IONG_1738.xml
5 MD letter md 19090513y_IONG_1293.xml
6 MD letter md 19090624_IONG_1294.xml
7 MD letter md 19090807y_IONG_1296.xml
8 MD letter md 19090824y_KNAP_1747.xml
9 MD letter md 19090905y_IONG_1295.xml
10 MD letter md 190909XX_QUER_1654.xml
11 MD letter md 19091024_SPOO_0016.xml
12 MD letter md 19091024y_IONG_1297.xml
13 MD letter md 19091025y_IONG_1298.xml
14 MD letter md 19100131_SAAL_ARNO_0018.xml
End folder proeftuin
Start folder backmatter:
15 artwork artworklist artwork.xml
16 TEI bibliolist biblio.xml
End folder backmatter
Validating ...
Validating ...
Validating ...
0 processing instructions encountered.
Namespaces OK
Start folder proeftuin:
1 MD md letter 19090216y_IONG_1303.xml
2 MD md letter 19090407y_IONG_1739.xml
3 MD md letter 19090421y_IONG_1304.xml
4 MD md letter 19090426y_IONG_1738.xml
5 MD md letter 19090513y_IONG_1293.xml
6 MD md letter 19090624_IONG_1294.xml
7 MD md letter 19090807y_IONG_1296.xml
8 MD md letter 19090824y_KNAP_1747.xml
9 MD md letter 19090905y_IONG_1295.xml
10 MD md letter 190909XX_QUER_1654.xml
11 MD md letter 19091024_SPOO_0016.xml
12 MD md letter 19091024y_IONG_1297.xml
13 MD md letter 19091025y_IONG_1298.xml
14 MD md letter 19100131_SAAL_ARNO_0018.xml
End folder proeftuin
Start folder backmatter:
15 artwork artworklist artwork.xml
16 TEI bibliolist biblio.xml
End folder backmatter
App updated
However, the previous version of the TEI is valid, so we revert to it. That is what the tei=-1 argument does.
!tf-fromtei all tei=-1
Start folder proeftuin:
14 19100131_SAAL_ARNO_0018.xml
End folder proeftuin
Validation OK
Namespaces OK
Start folder proeftuin:
14 19100131_SAAL_ARNO_0018.xml
End folder proeftuin
App updated
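The idea of selecting a version relative to the latest one can be sketched as follows. This is only an illustration under stated assumptions: the function `resolve_version` and the version names are hypothetical, and we assume that 0 would denote the latest version and -1 the one before it. It is not the code that tf-fromtei runs.

```python
def resolve_version(available, spec):
    """Pick a version name from `available` (hypothetical helper).

    spec is either an exact version name (str) or an integer <= 0
    that counts back from the latest version: 0 is the latest,
    -1 the one before it, and so on.
    """
    ordered = sorted(available)
    if isinstance(spec, str):
        if spec not in ordered:
            raise ValueError(f"unknown version: {spec}")
        return spec
    if spec > 0:
        raise ValueError("this sketch only handles 0 or negative offsets")
    index = len(ordered) - 1 + spec
    if index < 0:
        raise ValueError(f"no version {-spec} steps before the latest")
    return ordered[index]


# Hypothetical version folder names, for illustration only.
versions = ["2023-01-15", "2023-03-02", "2023-04-20"]
latest = resolve_version(versions, 0)
previous = resolve_version(versions, -1)
print(latest, previous)  # 2023-04-20 2023-03-02
```

The sketch relies on version names that sort in chronological order; with other naming schemes the ordering step would need to change.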
Now we have a preliminary TF dataset to work with. The next step no longer involves the source TEI.
!addnlp all
!tf-fromtei apptoken
0.14s Using NLP pipeline Spacy (en) ...
4.03s NLP done
0.00s Feature overview: 45 for nodes; 1 for edges; 1 configs; 9 computed
App updated with tokens and sentences
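Tokens and sentences can be thought of as spans over the characters of the text. The sketch below shows that idea with a naive regex tokenizer and sentence splitter. Both are stand-ins that we assume purely for illustration: they are not Spacy, and the functions `token_spans` and `sentence_spans` are hypothetical names, not part of TF.

```python
import re


def token_spans(text):
    """Return (start, end, token) triples over character positions."""
    # Stand-in tokenizer: runs of word characters, or single punctuation marks.
    return [(m.start(), m.end(), m.group()) for m in re.finditer(r"\w+|[^\w\s]", text)]


def sentence_spans(text):
    """Return (start, end) pairs; a sentence naively ends at '.', '!' or '?'."""
    spans = []
    for m in re.finditer(r"[^.!?]+[.!?]*", text):
        raw = m.group()
        stripped = raw.strip()
        if stripped:
            # Skip leading whitespace so the span starts at the first real character.
            start = m.start() + (len(raw) - len(raw.lstrip()))
            spans.append((start, start + len(stripped)))
    return spans


# Made-up sample text, not taken from the corpus.
text = "I am well. Are you?"
tokens = token_spans(text)
sentences = sentence_spans(text)
print(tokens[:3])  # [(0, 1, 'I'), (2, 4, 'am'), (5, 9, 'well')]
print(sentences)   # [(0, 10), (11, 19)]
```

A real NLP pipeline makes far better decisions about abbreviations, quotations and the like; the point here is only that each token and sentence can be anchored to character positions.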
The next command produces a zip file to attach to the latest release, so that TF can download the data smoothly.
!tf-zipall
loading tf app ...
Data to be zipped:
OK app (v0.8.2 5843b9) : ~/github/annotation/mondriaan/app
OK main data (v0.8.2 5843b9) : ~/github/annotation/mondriaan/tf/0.8.2
OK graphics (v0.8.2 5843b9) : ~/github/annotation/mondriaan/illustrations
Writing zip file ...
Result: ~/Downloads/github/annotation/mondriaan/complete.zip
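Bundling several folders of a repository into one zip file can be sketched with the Python standard library. This is a minimal illustration, not the implementation of tf-zipall: the function `zip_folders` is hypothetical, and the file names in the demo are placeholders that we create in a temporary directory.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_folders(base, folders, dest):
    """Zip all files under the given folders, with paths relative to base."""
    base = Path(base)
    with zipfile.ZipFile(dest, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for folder in folders:
            for path in sorted((base / folder).rglob("*")):
                if path.is_file():
                    zf.write(path, arcname=path.relative_to(base).as_posix())
    return dest


# Demo in a throwaway directory; folder layout and file names are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "repo"
    (base / "app").mkdir(parents=True)
    (base / "tf" / "0.8.2").mkdir(parents=True)
    (base / "app" / "config.yaml").write_text("x: 1\n")
    (base / "tf" / "0.8.2" / "example.tf").write_text("placeholder\n")
    dest = Path(tmp) / "complete.zip"
    zip_folders(base, ["app", "tf/0.8.2"], dest)
    with zipfile.ZipFile(dest) as zf:
        names = sorted(zf.namelist())

print(names)  # ['app/config.yaml', 'tf/0.8.2/example.tf']
```

Storing paths relative to the repository root keeps the folder structure intact when the zip file is unpacked elsewhere.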
We view the result in the TF browser.
To stop the browser, interrupt the kernel (press i twice).
!tf-fromtei browse
This is Text-Fabric 11.4.3
Starting new kernel listening on 10990
Loading data for annotation/mondriaan. Please wait ...
Setting up TF kernel for annotation/mondriaan
**Locating corpus resources ...**
Using app in ~/github/annotation/mondriaan/app:
repo clone offline under ~/github (local github)
Using data in ~/github/annotation/mondriaan/tf/0.8.2:
repo clone offline under ~/github (local github)
Using data in ~/github/annotation/mondriaan/illustrations:
repo clone offline under ~/github (local github)
<IPython.core.display.HTML object>
TF setup done.
Starting new webserver listening on 20990
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http://localhost:20990
Press CTRL+C to quit
Opening annotation/mondriaan in browser
Press <Ctrl+C> to stop the TF browser
Kernel listening at port 10990
127.0.0.1 - - [02/May/2023 09:19:04] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [02/May/2023 09:19:04] "GET /server/static/display.css HTTP/1.1" 200 -
127.0.0.1 - - [02/May/2023 09:19:04] "GET /server/static/highlight.css HTTP/1.1" 200 -
127.0.0.1 - - [02/May/2023 09:19:04] "GET /server/static/fonts.css HTTP/1.1" 200 -
127.0.0.1 - - [02/May/2023 09:19:04] "GET /server/static/base.css HTTP/1.1" 304 -
127.0.0.1 - - [02/May/2023 09:19:04] "GET /data/static/logo.png HTTP/1.1" 304 -
127.0.0.1 - - [02/May/2023 09:19:04] "GET /server/static/index.css HTTP/1.1" 304 -
127.0.0.1 - - [02/May/2023 09:19:04] "GET /server/static/fontawesome.css HTTP/1.1" 304 -
127.0.0.1 - - [02/May/2023 09:19:04] "GET /server/static/tf3.0.js HTTP/1.1" 304 -
127.0.0.1 - - [02/May/2023 09:19:04] "GET /server/static/jquery.js HTTP/1.1" 304 -
127.0.0.1 - - [02/May/2023 09:19:04] "GET /server/static/icon.png HTTP/1.1" 304 -
127.0.0.1 - - [02/May/2023 09:19:04] "GET /server/static/huc.png HTTP/1.1" 200 -
127.0.0.1 - - [02/May/2023 09:19:04] "GET /server/static/fonts/fa-solid-900.woff2 HTTP/1.1" 304 -
127.0.0.1 - - [02/May/2023 09:19:04] "POST /passage HTTP/1.1" 200 -
127.0.0.1 - - [02/May/2023 09:19:04] "GET /server/static/fonts/fa-regular-400.woff2 HTTP/1.1" 304 -
127.0.0.1 - - [02/May/2023 09:19:04] "POST / HTTP/1.1" 200 -
127.0.0.1 - - [02/May/2023 09:19:04] "GET /server/static/base.css HTTP/1.1" 304 -
127.0.0.1 - - [02/May/2023 09:19:04] "GET /server/static/display.css HTTP/1.1" 304 -
127.0.0.1 - - [02/May/2023 09:19:04] "GET /server/static/highlight.css HTTP/1.1" 304 -
127.0.0.1 - - [02/May/2023 09:19:04] "GET /server/static/fonts.css HTTP/1.1" 304 -
127.0.0.1 - - [02/May/2023 09:19:04] "GET /server/static/index.css HTTP/1.1" 304 -
127.0.0.1 - - [02/May/2023 09:19:04] "GET /server/static/fontawesome.css HTTP/1.1" 304 -
127.0.0.1 - - [02/May/2023 09:19:04] "GET /server/static/jquery.js HTTP/1.1" 304 -
127.0.0.1 - - [02/May/2023 09:19:04] "GET /server/static/tf3.0.js HTTP/1.1" 304 -
127.0.0.1 - - [02/May/2023 09:19:04] "GET /server/static/icon.png HTTP/1.1" 304 -
127.0.0.1 - - [02/May/2023 09:19:04] "GET /server/static/huc.png HTTP/1.1" 304 -
127.0.0.1 - - [02/May/2023 09:19:04] "GET /data/static/logo.png HTTP/1.1" 304 -
127.0.0.1 - - [02/May/2023 09:19:04] "GET /server/static/fonts/fa-solid-900.woff2 HTTP/1.1" 304 -
127.0.0.1 - - [02/May/2023 09:19:04] "GET /server/static/fonts/fa-regular-400.woff2 HTTP/1.1" 304 -
127.0.0.1 - - [02/May/2023 09:19:04] "POST /passage HTTP/1.1" 200 -
127.0.0.1 - - [02/May/2023 09:19:04] "GET /server/static/favicon.ico HTTP/1.1" 304 -
127.0.0.1 - - [02/May/2023 09:19:11] "POST /passage HTTP/1.1" 200 -
127.0.0.1 - - [02/May/2023 09:19:16] "POST /passage HTTP/1.1" 200 -
127.0.0.1 - - [02/May/2023 09:19:16] "GET /local/illustrations/artwork-m-15163.jpg HTTP/1.1" 200 -
127.0.0.1 - - [02/May/2023 09:19:16] "GET /local/illustrations/artwork-m-15164.jpg HTTP/1.1" 200 -
127.0.0.1 - - [02/May/2023 09:19:31] "POST /passage/6 HTTP/1.1" 200 -