You might want to consider the start of this tutorial first. Short introductions to other TF datasets are available as well.
Text-Fabric is not a world to stay in forever. When you go to other worlds, you can travel with the corpus data in your backpack.
Here we show two destinations (and one of them is also an origin): Pandas and Emdros.
Before we go there, we load the corpus.
%load_ext autoreload
%autoreload 2
The ins and outs of installing Text-Fabric, getting the corpus, and initializing a notebook are explained in the start tutorial.
from tf.app import use
A = use("ETCBC/bhsa", hoist=globals())
Locating corpus resources ...
Name | # of nodes | # slots / node | % coverage |
---|---|---|---|
book | 39 | 10938.21 | 100 |
chapter | 929 | 459.19 | 100 |
lex | 9230 | 46.22 | 100 |
verse | 23213 | 18.38 | 100 |
half_verse | 45179 | 9.44 | 100 |
sentence | 63717 | 6.70 | 100 |
sentence_atom | 64514 | 6.61 | 100 |
clause | 88131 | 4.84 | 100 |
clause_atom | 90704 | 4.70 | 100 |
phrase | 253203 | 1.68 | 100 |
phrase_atom | 267532 | 1.59 | 100 |
subphrase | 113850 | 1.42 | 38 |
word | 426590 | 1.00 | 100 |
The first journey is to Pandas.
We convert the data to a data frame, via a tab-separated text file.
The nodes are exported as rows; they correspond to the text objects such as word, phrase, clause, sentence, verse, chapter, book, and a few others.
The BHSA features become the columns, so each row tells what values the features have for the corresponding node.
The edges corresponding to the BHSA features mother, functional_parent, and distributional_parent are exported as extra columns. For each row, such a column indicates the target of the corresponding outgoing edge.
We also write the data that says which objects are contained in which. To each row we add the following columns: for each node type there is a column named after it (in_book, in_chapter, and so on, down to in_word); the value in that column is the node of that type that contains the row node (if any).
Extra data, such as the lexicon (including frequency and rank features), the phonetic transcription, and the ketiv-qere, is also included.
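These containment columns make it easy to relate rows of different node types with an ordinary join. Here is a minimal sketch: the column names (nd, in_verse, gloss, label) follow the export, but the values are made up for illustration.

```python
import pandas as pd

# Hypothetical fragments of the exported table: a few word rows
# and the verse row that contains them (values are made up).
words = pd.DataFrame(
    {"nd": [1, 2], "in_verse": [1414389, 1414389], "gloss": ["in", "beginning"]}
)
verses = pd.DataFrame({"nd": [1414389], "label": ["GEN 01,01"]})

# Attach each word to its containing verse via the in_verse column.
joined = words.merge(
    verses, left_on="in_verse", right_on="nd", suffixes=("_word", "_verse")
)
print(joined[["nd_word", "gloss", "label"]].to_dict("records"))
```

The same pattern works for any of the containment columns (in_clause, in_sentence, and so on).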
While exporting the data to Pandas format, the program composes the big table and saves it as a tab-delimited file, stored in a temporary directory (not visible on GitHub).
This temporary file can also be read by R, but we proceed with Pandas. Pandas offers functions in the same spirit as R, but it is more Pythonic and also faster.
A.exportPandas()
0.00s Create tsv file ... | 2.96s 5% 72342 nodes written | 5.89s 10% 144684 nodes written | 8.80s 15% 217026 nodes written | 12s 20% 289368 nodes written | 15s 25% 361710 nodes written | 18s 30% 434052 nodes written | 21s 35% 506394 nodes written | 24s 40% 578736 nodes written | 27s 45% 651078 nodes written | 30s 50% 723420 nodes written | 33s 55% 795762 nodes written | 36s 60% 868104 nodes written | 39s 65% 940446 nodes written | 42s 70% 1012788 nodes written | 45s 75% 1085130 nodes written | 48s 80% 1157472 nodes written | 50s 85% 1229814 nodes written | 53s 90% 1302156 nodes written | 56s 95% 1374498 nodes written | 59s 95% 1446831 nodes written and done 59s TSV file is ~/text-fabric-data/github/ETCBC/bhsa/_temp/data-2021.tsv 59s Columns 72: 59s nd 59s otype 59s g_cons 59s g_cons_utf8 59s g_lex 59s g_lex_utf8 59s g_word 59s g_word_utf8 59s lex 59s lex_utf8 59s phono 59s phono_trailer 59s qere 59s qere_trailer 59s qere_trailer_utf8 59s qere_utf8 59s trailer 59s trailer_utf8 59s voc_lex_utf8 59s in_book 59s in_chapter 59s in_verse 59s in_lex 59s in_half_verse 59s in_sentence 59s in_sentence_atom 59s in_clause 59s in_clause_atom 59s in_phrase 59s in_phrase_atom 59s in_subphrase 59s in_word 59s crossref 59s mother 59s book 59s chapter 59s code 59s det 59s domain 59s freq_lex 59s function 59s gloss 59s gn 59s label 59s language 59s ls 59s nametype 59s nme 59s nu 59s number 59s pargr 59s pdp 59s pfm 59s prs 59s prs_gn 59s prs_nu 59s prs_ps 59s ps 59s rank_lex 59s rela 59s sp 59s st 59s tab 59s txt 59s typ 59s uvf 59s vbe 59s vbs 59s verse 59s voc_lex 59s vs 59s vt 1m 00s 1446832 rows 1m 00s 273843208 characters 1m 00s Importing into Pandas ... | 0.00s Reading tsv file ... | 13s Done. Size = 104171832 | 13s Saving as Parquet file ... | 19s Saved 1m 19s PD in ~/text-fabric-data/github/ETCBC/bhsa/pandas/data-2021.pd
F.otype.s("verse")[0]
1414389
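Once exportPandas() has run, the resulting file can be loaded and analysed with ordinary Pandas operations (the log above reports it was saved as a Parquet file, so pd.read_parquet on that path should work). Since that file only exists after running the export, the sketch below applies the same kind of operations to a tiny mock table that reuses real column names from the log (nd, otype, in_verse, lex) with made-up values.

```python
import pandas as pd

# In a real session you would read the file that exportPandas() reported, e.g.
#   df = pd.read_parquet("~/text-fabric-data/github/ETCBC/bhsa/pandas/data-2021.pd")
# To keep this sketch self-contained, we use a mock table instead.
df = pd.DataFrame(
    {
        "nd": [1, 2, 3],
        "otype": ["word", "word", "word"],
        "in_verse": [1414389, 1414389, 1414390],
        "lex": ["B", "R>CJT/", "B"],
    }
)

# Count words per verse via the containment column ...
per_verse = df[df.otype == "word"].groupby("in_verse").size().to_dict()

# ... and rank lexemes by how often they occur.
lex_freq = df.lex.value_counts().to_dict()

print(per_verse)  # {1414389: 2, 1414390: 1}
print(lex_freq)   # {'B': 2, 'R>CJT/': 1}
```

On the real table the same two lines give you words-per-verse counts and lexeme frequencies for the whole Hebrew Bible.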
The next journey is to MQL, a text-database format not unlike SQL, supported by the Emdros software.
Emdros, written by Ulrik Petersen, is a text database system built around the powerful topographic query language MQL. Its ideas are based on a model devised by Christ-Jan Doedens in Text Databases: One Database Model and Several Retrieval Languages.
Text-Fabric's model of slots, nodes and edges is a fairly straightforward translation of the models of Christ-Jan Doedens and Ulrik Petersen.
SHEBANQ uses Emdros to let users execute and save MQL queries against the Hebrew Text Database of the ETCBC.
So it is rather logical and convenient to be able to work with a Text-Fabric resource through MQL.
If you have obtained an MQL dataset somehow, you can turn it into a Text-Fabric dataset with importMQL(), which we will not show here.
And if you want to export a Text-Fabric dataset to MQL, that is also possible. After the Fabric(modules=...) call, you can call exportMQL() to save all features of the indicated modules into a big MQL dump, which can be imported into an Emdros database.
A.exportMQL("mybhsa", exportDir="~/Downloads/mql")
0.00s Checking features of dataset mybhsa
| 4m 45s feature "book@am" => "book_am" | 4m 45s feature "book@ar" => "book_ar" | 4m 45s feature "book@bn" => "book_bn" | 4m 45s feature "book@da" => "book_da" | 4m 45s feature "book@de" => "book_de" | 4m 45s feature "book@el" => "book_el" | 4m 45s feature "book@en" => "book_en" | 4m 45s feature "book@es" => "book_es" | 4m 45s feature "book@fa" => "book_fa" | 4m 45s feature "book@fr" => "book_fr" | 4m 45s feature "book@he" => "book_he" | 4m 45s feature "book@hi" => "book_hi" | 4m 45s feature "book@id" => "book_id" | 4m 45s feature "book@ja" => "book_ja" | 4m 45s feature "book@ko" => "book_ko" | 4m 45s feature "book@la" => "book_la" | 4m 45s feature "book@nl" => "book_nl" | 4m 45s feature "book@pa" => "book_pa" | 4m 45s feature "book@pt" => "book_pt" | 4m 45s feature "book@ru" => "book_ru" | 4m 45s feature "book@sw" => "book_sw" | 4m 45s feature "book@syc" => "book_syc" | 4m 45s feature "book@tr" => "book_tr" | 4m 45s feature "book@ur" => "book_ur" | 4m 45s feature "book@yo" => "book_yo" | 4m 45s feature "book@zh" => "book_zh" | 4m 45s feature "omap@2017-2021" => "omap_2017_2021" | 4m 45s feature "omap@c-2021" => "omap_c_2021"
0.02s 118 features to export to MQL ... 0.02s Loading 118 features | 0.07s T crossrefLCS from ~/text-fabric-data/github/ETCBC/parallels/tf/2021 | 0.04s T crossrefSET from ~/text-fabric-data/github/ETCBC/parallels/tf/2021 | 1.20s T dist_unit from ~/text-fabric-data/github/ETCBC/bhsa/tf/2021 | 3.18s T distributional_parent from ~/text-fabric-data/github/ETCBC/bhsa/tf/2021 | 0.77s T freq_occ from ~/text-fabric-data/github/ETCBC/bhsa/tf/2021 | 3.98s T functional_parent from ~/text-fabric-data/github/ETCBC/bhsa/tf/2021 | 0.77s T g_nme from ~/text-fabric-data/github/ETCBC/bhsa/tf/2021 | 0.79s T g_nme_utf8 from ~/text-fabric-data/github/ETCBC/bhsa/tf/2021 | 0.71s T g_pfm from ~/text-fabric-data/github/ETCBC/bhsa/tf/2021 | 0.73s T g_pfm_utf8 from ~/text-fabric-data/github/ETCBC/bhsa/tf/2021 | 0.71s T g_prs from ~/text-fabric-data/github/ETCBC/bhsa/tf/2021 | 0.71s T g_prs_utf8 from ~/text-fabric-data/github/ETCBC/bhsa/tf/2021 | 0.68s T g_uvf from ~/text-fabric-data/github/ETCBC/bhsa/tf/2021 | 0.68s T g_uvf_utf8 from ~/text-fabric-data/github/ETCBC/bhsa/tf/2021 | 0.72s T g_vbe from ~/text-fabric-data/github/ETCBC/bhsa/tf/2021 | 0.70s T g_vbe_utf8 from ~/text-fabric-data/github/ETCBC/bhsa/tf/2021 | 0.70s T g_vbs from ~/text-fabric-data/github/ETCBC/bhsa/tf/2021 | 0.68s T g_vbs_utf8 from ~/text-fabric-data/github/ETCBC/bhsa/tf/2021 | 0.18s T instruction from ~/text-fabric-data/github/ETCBC/bhsa/tf/2021 | 0.18s T is_root from ~/text-fabric-data/github/ETCBC/bhsa/tf/2021 | 0.17s T kind from ~/text-fabric-data/github/ETCBC/bhsa/tf/2021 | 0.68s T kq_hybrid from ~/text-fabric-data/github/ETCBC/bhsa/tf/2021 | 0.69s T kq_hybrid_utf8 from ~/text-fabric-data/github/ETCBC/bhsa/tf/2021 | 0.84s T languageISO from ~/text-fabric-data/github/ETCBC/bhsa/tf/2021 | 0.94s T lex0 from ~/text-fabric-data/github/ETCBC/bhsa/tf/2021 | 0.76s T lexeme_count from ~/text-fabric-data/github/ETCBC/bhsa/tf/2021 | 0.40s T mother_object_type from ~/text-fabric-data/github/ETCBC/bhsa/tf/2021 | 6.39s T 
omap@2017-2021 from ~/text-fabric-data/github/ETCBC/bhsa/tf/2021 | 6.31s T omap@c-2021 from ~/text-fabric-data/github/ETCBC/bhsa/tf/2021 | 0.75s T rank_occ from ~/text-fabric-data/github/ETCBC/bhsa/tf/2021 | 0.17s T root from ~/text-fabric-data/github/ETCBC/bhsa/tf/2021 | 0.83s T suffix_gender from ~/text-fabric-data/github/ETCBC/bhsa/tf/2021 | 0.83s T suffix_number from ~/text-fabric-data/github/ETCBC/bhsa/tf/2021 | 0.82s T suffix_person from ~/text-fabric-data/github/ETCBC/bhsa/tf/2021 39s Writing enumerations book_am : 39 values, 39 not a name, e.g. «መኃልየ_መኃልይ_ዘሰሎሞን» book_ar : 39 values, 39 not a name, e.g. «1_اخبار» book_bn : 39 values, 39 not a name, e.g. «আদিপুস্তক» book_da : 39 values, 13 not a name, e.g. «1.Kongebog» book_de : 39 values, 7 not a name, e.g. «1_Chronik» book_el : 39 values, 39 not a name, e.g. «Άσμα_Ασμάτων» book_en : 39 values, 6 not a name, e.g. «1_Chronicles» book_es : 39 values, 22 not a name, e.g. «1_Crónicas» book_fa : 39 values, 39 not a name, e.g. «استر» book_fr : 39 values, 19 not a name, e.g. «1_Chroniques» book_he : 39 values, 39 not a name, e.g. «איוב» book_hi : 39 values, 39 not a name, e.g. «1_इतिहास» book_id : 39 values, 7 not a name, e.g. «1_Raja-raja» book_ja : 39 values, 39 not a name, e.g. «アモス書» book_ko : 39 values, 39 not a name, e.g. «나훔» book_nl : 39 values, 8 not a name, e.g. «1_Koningen» book_pa : 39 values, 39 not a name, e.g. «1_ਇਤਹਾਸ» book_pt : 39 values, 21 not a name, e.g. «1_Crônicas» book_ru : 39 values, 39 not a name, e.g. «1-я_Паралипоменон» book_sw : 39 values, 6 not a name, e.g. «1_Mambo_ya_Nyakati» book_syc : 39 values, 39 not a name, e.g. «ܐ_ܒܪܝܡܝܢ» book_tr : 39 values, 16 not a name, e.g. «1_Krallar» book_ur : 39 values, 39 not a name, e.g. «احبار» book_yo : 39 values, 8 not a name, e.g. «Amọsi» book_zh : 38 values, 37 not a name, e.g. «以斯帖记» domain : 4 values, 1 not a name, e.g. «?» g_nme : 108 values, 108 not a name, e.g. «» g_nme_utf8 : 106 values, 106 not a name, e.g. 
«» g_pfm : 87 values, 87 not a name, e.g. «» g_pfm_utf8 : 86 values, 86 not a name, e.g. «» g_prs : 127 values, 127 not a name, e.g. «» g_prs_utf8 : 126 values, 126 not a name, e.g. «» g_uvf : 19 values, 19 not a name, e.g. «» g_uvf_utf8 : 17 values, 17 not a name, e.g. «» g_vbe : 101 values, 101 not a name, e.g. «» g_vbe_utf8 : 97 values, 97 not a name, e.g. «» g_vbs : 66 values, 66 not a name, e.g. «» g_vbs_utf8 : 65 values, 65 not a name, e.g. «» instruction : 35 values, 20 not a name, e.g. «.#» nametype : 10 values, 5 not a name, e.g. «gens,topo» nme : 20 values, 7 not a name, e.g. «» pfm : 11 values, 4 not a name, e.g. «» phono_trailer : 4 values, 4 not a name, e.g. «» prs : 22 values, 4 not a name, e.g. «H=» qere_trailer : 5 values, 5 not a name, e.g. «» qere_trailer_utf8: 5 values, 5 not a name, e.g. «» root : 757 values, 212 not a name, e.g. «<Assyrian>» trailer : 13 values, 13 not a name, e.g. «» trailer_utf8 : 13 values, 13 not a name, e.g. «» txt : 136 values, 59 not a name, e.g. «?» uvf : 6 values, 1 not a name, e.g. «>» vbe : 19 values, 6 not a name, e.g. «» vbs : 11 values, 3 not a name, e.g. «>» | 0.36s Writing an all-in-one enum with 232 values 39s Mapping 118 features onto 13 object types 42s Writing 118 features as data in 13 object types | 0.00s word data ... | | 1.24s batch of size 49.9MB with 50000 of 50000 words | | 2.49s batch of size 50.0MB with 50000 of 100000 words | | 3.74s batch of size 50.2MB with 50000 of 150000 words | | 4.99s batch of size 50.2MB with 50000 of 200000 words | | 6.24s batch of size 50.4MB with 50000 of 250000 words | | 7.50s batch of size 50.4MB with 50000 of 300000 words | | 8.76s batch of size 50.5MB with 50000 of 350000 words | | 10s batch of size 50.4MB with 50000 of 400000 words | | 11s batch of size 26.8MB with 26590 of 426590 words | 11s word data: 426590 objects | 0.00s subphrase data ... 
| | 0.18s batch of size 8.6MB with 50000 of 50000 subphrases | | 0.35s batch of size 8.5MB with 50000 of 100000 subphrases | | 0.40s batch of size 2.4MB with 13850 of 113850 subphrases | 0.40s subphrase data: 113850 objects | 0.00s phrase_atom data ... | | 0.26s batch of size 12.0MB with 50000 of 50000 phrase_atoms | | 0.51s batch of size 12.0MB with 50000 of 100000 phrase_atoms | | 0.77s batch of size 12.2MB with 50000 of 150000 phrase_atoms | | 1.03s batch of size 12.2MB with 50000 of 200000 phrase_atoms | | 1.28s batch of size 12.1MB with 50000 of 250000 phrase_atoms | | 1.37s batch of size 4.3MB with 17532 of 267532 phrase_atoms | 1.37s phrase_atom data: 267532 objects | 0.00s phrase data ... | | 0.23s batch of size 10.9MB with 50000 of 50000 phrases | | 0.45s batch of size 11.0MB with 50000 of 100000 phrases | | 0.68s batch of size 11.0MB with 50000 of 150000 phrases | | 0.92s batch of size 11.0MB with 50000 of 200000 phrases | | 1.15s batch of size 11.0MB with 50000 of 250000 phrases | | 1.16s batch of size 724.2KB with 3203 of 253203 phrases | 1.16s phrase data: 253203 objects | 0.00s clause_atom data ... | | 0.32s batch of size 14.4MB with 50000 of 50000 clause_atoms | | 0.59s batch of size 11.7MB with 40704 of 90704 clause_atoms | 0.59s clause_atom data: 90704 objects | 0.00s clause data ... | | 0.28s batch of size 13.3MB with 50000 of 50000 clauses | | 0.49s batch of size 10.2MB with 38131 of 88131 clauses | 0.49s clause data: 88131 objects | 0.00s sentence_atom data ... | | 0.18s batch of size 7.8MB with 50000 of 50000 sentence_atoms | | 0.23s batch of size 2.3MB with 14514 of 64514 sentence_atoms | 0.23s sentence_atom data: 64514 objects | 0.00s sentence data ... | | 0.14s batch of size 6.3MB with 50000 of 50000 sentences | | 0.18s batch of size 1.7MB with 13717 of 63717 sentences | 0.18s sentence data: 63717 objects | 0.00s half_verse data ... 
| | 0.13s batch of size 5.5MB with 45179 of 45179 half_verses | 0.13s half_verse data: 45179 objects | 0.00s verse data ... | | 0.12s batch of size 4.8MB with 23213 of 23213 verses | 0.12s verse data: 23213 objects | 0.00s lex data ... | | 0.16s batch of size 5.5MB with 9230 of 9230 lexs | 0.16s lex data: 9230 objects | 0.00s chapter data ... | | 0.02s batch of size 131.1KB with 929 of 929 chapters | 0.02s chapter data: 929 objects | 0.00s book data ... | | 0.02s batch of size 29.2KB with 39 of 39 books | 0.02s book data: 39 objects 57s MQL in ~/Downloads/mql 57s Done
Now you have a file ~/Downloads/mql/mybhsa.mql of 530 MB.
You can import it into an Emdros database by saying:
cd ~/Downloads/mql
rm mybhsa
mql -b 3 < mybhsa.mql
The result is an SQLite3 database mybhsa in the same directory (168 MB).
You can run a query against it by creating a text file test.mql with this content:
select all objects where
[lex gloss ~ 'make'
  [word FOCUS]
]
And then say
mql -b 3 -d mybhsa test.mql
You will see raw query results: all word occurrences that belong to lexemes with make in their gloss.
It is not very pretty, and you should probably use a more visual Emdros tool to run these queries. You see a lot of node numbers, but the good thing is that you can look those node numbers up in Text-Fabric.
CC-BY Dirk Roorda