#!/usr/bin/env python
# coding: utf-8

# You might want to consider the [start](start.ipynb) of this tutorial.
#
# Short introductions to other TF datasets:
#
# * [Dead Sea Scrolls](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/dss.ipynb)
# * [Old Babylonian Letters](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/oldbabylonian.ipynb)
# * [Quran](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/quran.ipynb)

# # Export
#
# Text-Fabric is not a world to stay in forever.
# When you go to other worlds, you can travel with the corpus data in your backpack.
#
# Here we show two destinations (and one of them is also an origin):
# Pandas and Emdros.
#
# Before we go there, we load the corpus.

# In[1]:


get_ipython().run_line_magic('load_ext', 'autoreload')
get_ipython().run_line_magic('autoreload', '2')


# # Incantation
#
# The ins and outs of installing Text-Fabric, getting the corpus, and initializing a notebook
# are explained in the [start tutorial](start.ipynb).

# In[2]:


from tf.app import use


# In[3]:


A = use("ETCBC/bhsa", hoist=globals())


# # Pandas
#
# The first journey is to [Pandas](https://pandas.pydata.org).
#
# We convert the data to a dataframe, via a tab-separated text file.
#
# The nodes are exported as rows; they correspond to the textual objects
# such as word, phrase, clause, sentence, verse, chapter, book, and a few others.
#
# The BHSA features become the columns, so each row records the values
# that the features have for the corresponding node.
#
# The edges corresponding to the BHSA features *mother*, *functional_parent*,
# and *distributional_parent* are exported as extra columns.
# For each row, such a column holds the target of the corresponding outgoing edge.
#
# We also write the data that says which objects are contained in which.
# To each row we add the following columns:
#
# * for each node type except `word`, there is a column with that node type as name;
#   the value in that column is the node of that type that contains the row's node (if any).
#
# Extra data such as the lexicon (including frequency and rank features),
# phonetic transcription, and ketiv-qere are also included.
#
# While exporting the data to Pandas format, the program composes the big table
# and saves it as a tab-delimited file.
# This is stored in a temporary directory (not visible on GitHub).
#
# This temporary file can also be read by R, but we proceed with Pandas.
# Pandas offers functions in the same spirit as R, but it is more Pythonic and also faster.

# In[4]:


A.exportPandas()


# ## How to use the Pandas file
#
# See [pandas](pandas.ipynb) for a tutorial on how to work with the BHSA as a dataframe.
#
# We collect a few pieces of data that will come in handy.
#
# Here is the first verse node:

# In[5]:


F.otype.s("verse")[0]
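# Such a node is just a number.
# As a quick sketch (an aside, not part of the export workflow), we can inspect it with
# the standard Text-Fabric `T` API that the incantation hoisted into the namespace:

# In[ ]:


firstVerse = F.otype.s("verse")[0]

# T.sectionFromNode() yields the (book, chapter, verse) heading of a node;
# T.text() renders the text of the slots contained in it
print(T.sectionFromNode(firstVerse))
print(T.text(firstVerse))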
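# Back to the exported table: once `exportPandas()` has written its tab-separated file,
# you can load it yourself.
# What follows is a minimal sketch: the file path is a placeholder (the export cell above
# reports the actual location on your system), and we assume the node type shows up in an
# `otype` column, since it is a feature like any other.

# In[ ]:


import pandas as pd

# placeholder path: substitute the file reported by exportPandas()
bhsa = pd.read_csv("_temp/bhsa.tsv", sep="\t", low_memory=False)

# every row is a node; filter on the (assumed) otype column to get the verses
bhsa[bhsa.otype == "verse"].head()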
# # MQL
#
# The next journey is to MQL, a text database format not unlike SQL,
# supported by the Emdros software.
#
# [Emdros](http://emdros.org), written by Ulrik Petersen,
# is a text database system with the powerful *topographic* query language MQL.
# The ideas are based on a model devised by Crist-Jan Doedens in
# [Text Databases: One Database Model and Several Retrieval Languages](https://books.google.nl/books?id=9ggOBRz1dO4C).
#
# Text-Fabric's model of slots, nodes, and edges is a fairly straightforward translation
# of the models of Crist-Jan Doedens and Ulrik Petersen.
#
# [SHEBANQ](https://shebanq.ancient-data.org) uses Emdros to let users execute and save
# MQL queries against the Hebrew Text Database of the ETCBC.
#
# So it is kind of logical and convenient to be able to work with a Text-Fabric resource through MQL.
#
# If you have obtained an MQL dataset somehow, you can turn it into a Text-Fabric dataset
# by `importMQL()`, which we will not show here.
#
# And if you want to export a Text-Fabric dataset to MQL, that is also possible.
#
# After the incantation, you can call `A.exportMQL()` in order to save all loaded features
# into a big MQL dump, which can be imported by an Emdros database.

# In[4]:


A.exportMQL("mybhsa", exportDir="~/Downloads/mql")


# Now you have a file `~/Downloads/mql/mybhsa.mql` of 530 MB.
# You can import it into an Emdros database by saying:
#
#     cd ~/Downloads/mql
#     rm -f mybhsa
#     mql -b 3 < mybhsa.mql
#
# (the `rm` removes a database left over from an earlier import).
#
# The result is an SQLite3 database `mybhsa` in the same directory (168 MB).
# You can run a query against it by creating a text file `test.mql` with this content:
#
#     select all objects where
#     [lex gloss ~ 'make'
#         [word FOCUS]
#     ]
#
# And then say:
#
#     mql -b 3 -d mybhsa test.mql
#
# You will see raw query results: all word occurrences that belong to lexemes
# with `make` in their gloss.
#
# The output is not very pretty, and probably you should use a more visual Emdros tool
# to run those queries.
# You see a lot of node numbers, but the good thing is, you can look those node numbers
# up in Text-Fabric, for instance with `T.sectionFromNode()` and `T.text()`.

# # All steps
#
# * **[start](start.ipynb)** your first step in mastering the Bible computationally
# * **[display](display.ipynb)** become an expert in creating pretty displays of your text structures
# * **[search](search.ipynb)** turbocharge your hand-coding with search templates
# * **[exportExcel](exportExcel.ipynb)** make tailor-made spreadsheets out of your results
# * **[share](share.ipynb)** draw in other people's data and let them use yours
# * **export** export your dataset as an Emdros database
# * **[annotate](annotate.ipynb)** annotate plain text by means of other tools and import the annotations as TF features
# * **[map](map.ipynb)** map somebody else's annotations to a new version of the corpus
# * **[volumes](volumes.ipynb)** work with selected books only
# * **[trees](trees.ipynb)** work with the BHSA data as syntax trees
#
# CC-BY Dirk Roorda