#!/usr/bin/env python
# coding: utf-8
# You might want to consider the [start](start.ipynb) of this tutorial.
#
# Short introductions to other TF datasets:
#
# * [Dead Sea Scrolls](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/dss.ipynb),
# * [Old Babylonian Letters](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/oldbabylonian.ipynb),
# or the
# * [Quran](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/quran.ipynb)
#
# # Export
#
# Text-Fabric is not a world to stay in forever.
# When you go to other worlds, you can travel with the corpus data in your backpack.
#
# Here we show two destinations (and one of them is also an origin):
# Pandas and Emdros.
#
# Before we go there, we load the corpus.
# In[1]:
get_ipython().run_line_magic('load_ext', 'autoreload')
get_ipython().run_line_magic('autoreload', '2')
# # Incantation
#
# The ins and outs of installing Text-Fabric, getting the corpus, and initializing a notebook are
# explained in the [start tutorial](start.ipynb).
# In[2]:
from tf.app import use
# In[3]:
A = use("ETCBC/bhsa", hoist=globals())
# # Pandas
#
# The first journey is to
# [Pandas](https://pandas.pydata.org).
#
# We convert the data to a dataframe, via a tab-separated text file.
#
# The nodes are exported as rows; they correspond to text objects such as word, phrase, clause, sentence, verse, chapter, book, and a few others.
#
# The BHSA features become the columns, so each row tells what values the features have for the corresponding node.
#
# The edges corresponding to the BHSA features *mother*, *functional_parent*, and *distributional_parent* are
# exported as extra columns. For each row, such a column indicates the target of the corresponding outgoing edge.
#
# We also write the data that says which objects are contained in which.
# To each row we add the following columns:
#
# * for each node type except `word`, there is a column with that node type as its name;
#   the value in that column is the node of this type that contains the row node (if any).
#
# Extra data such as lexicon (including frequency and rank features), phonetic transcription, and ketiv-qere are also included.
# While exporting the data to Pandas format, the program
# composes the big table and saves it as a tab-delimited file.
# This is stored in a temporary directory (not visible on GitHub).
#
# This temporary file can also be read by R, but we proceed with Pandas.
# Pandas offers functions in the same spirit as R, but is more Pythonic and also faster.
# In[4]:
A.exportPandas()
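# As a quick sanity check, we can load the exported table back with Pandas.
# This is a minimal sketch: `exportPandas()` reports the actual location of the
# tab-delimited file; the path below is a hypothetical placeholder.
# In[ ]:
import pandas as pd

# Hypothetical location of the exported table; use the path reported by exportPandas().
PANDAS_TSV = "~/text-fabric-data/github/ETCBC/bhsa/_temp/data.tsv"

# Each row is a node; columns are BHSA features, edge columns,
# and one containment column per node type (e.g. "verse", "clause").
df = pd.read_csv(PANDAS_TSV, sep="\t", low_memory=False)

print(df.shape)
print(list(df.columns)[0:10])

# For example: select only the rows that represent word nodes.
words = df[df["otype"] == "word"]
print(len(words))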
# ## How to use the Pandas file
#
# See
# [pandas](pandas.ipynb)
# for a tutorial on how to work with the BHSA as a dataframe.
#
# We collect a few pieces of data that will come in handy.
#
# Here is the first verse node:
# In[5]:
F.otype.s("verse")[0]
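# With the `T` (text) API, hoisted into the namespace by `use()`, we can see
# which passage this node represents. A small illustration:
# In[ ]:
firstVerse = F.otype.s("verse")[0]

# Translate the node to a (book, chapter, verse) reference.
print(T.sectionFromNode(firstVerse))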
# # MQL
#
# The next journey is to MQL, a text-database format not unlike SQL, supported by the Emdros software.
#
# [Emdros](http://emdros.org), written by Ulrik Petersen,
# is a text database system with the powerful *topographic* query language MQL.
# The ideas are based on a model devised by Crist-Jan Doedens in
# [Text Databases: One Database Model and Several Retrieval Languages](https://books.google.nl/books?id=9ggOBRz1dO4C).
#
# Text-Fabric's model of slots, nodes and edges is a fairly straightforward translation of the models of Crist-Jan Doedens and Ulrik Petersen.
#
# [SHEBANQ](https://shebanq.ancient-data.org) uses Emdros to let users execute and save MQL queries against the Hebrew Text Database of the ETCBC.
#
# So it is both natural and convenient to be able to work with a Text-Fabric resource through MQL.
#
# If you have obtained an MQL dataset somehow, you can turn it into a Text-Fabric dataset by `importMQL()`,
# which we will not show here.
#
# And if you want to export a Text-Fabric dataset to MQL, that is also possible.
#
# Once the corpus is loaded, you can call `exportMQL()` in order to save all loaded features
# into a big MQL dump, which can be imported by an Emdros database.
# In[6]:
A.exportMQL("mybhsa", exportDir="~/Downloads/mql")
# Now you have a file `~/Downloads/mql/mybhsa.mql` of 530 MB.
# You can import it into an Emdros database by saying:
#
#     cd ~/Downloads/mql
#     rm -f mybhsa        # remove a previous version of the database, if any
#     mql -b 3 < mybhsa.mql
#
# The result is an SQLite3 database `mybhsa` in the same directory (168 MB).
# You can run a query against it by creating a text file `test.mql` with these contents:
#
#     select all objects where
#     [lex gloss ~ 'make'
#       [word FOCUS]
#     ]
#
# And then say:
#
#     mql -b 3 -d mybhsa test.mql
#
# You will see raw query results: all word occurrences that belong to lexemes with `make` in their gloss.
#
# The output is not very pretty, and for serious work you should probably use a more visual Emdros tool to run such queries.
# You see a lot of node numbers, but the good thing is that you can look those node numbers up in Text-Fabric.
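# For instance, if an MQL result mentions a node number, you can inspect that
# node with the Text-Fabric API (the number below is a hypothetical example):
# In[ ]:
n = 123456  # hypothetical node number copied from an MQL result

print(F.otype.v(n))          # the type of this node
print(T.sectionFromNode(n))  # the passage that contains it
A.pretty(n)                  # rich display of the node in the notebook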
# # All steps
#
# * **[start](start.ipynb)** your first step in mastering the Bible computationally
# * **[display](display.ipynb)** become an expert in creating pretty displays of your text structures
# * **[search](search.ipynb)** turbo charge your hand-coding with search templates
# * **[exportExcel](exportExcel.ipynb)** make tailor-made spreadsheets out of your results
# * **[share](share.ipynb)** draw in other people's data and let them use yours
# * **export** export your dataset as a Pandas table or an Emdros database
# * **[annotate](annotate.ipynb)** annotate plain text by means of other tools and import the annotations as TF features
# * **[map](map.ipynb)** map somebody else's annotations to a new version of the corpus
# * **[volumes](volumes.ipynb)** work with selected books only
# * **[trees](trees.ipynb)** work with the BHSA data as syntax trees
#
# CC-BY Dirk Roorda