Tutorial

This notebook gets you started with using Text-Fabric for coding in the Quran.

Familiarity with the underlying data model is recommended.

Installing Text-Fabric

Python

You need Python on your system. Many systems ship with it, but that is often Python 2, and we need at least Python 3.6.

Install it from python.org or from Anaconda.

TF itself

pip3 install text-fabric

Jupyter notebook

You need Jupyter.

If it is not already installed:

pip3 install jupyter

Tip

If you start computing with this tutorial, first copy its parent directory to somewhere else, outside your clone of the quran repository. If you pull changes from that repository later, your work will not be overwritten. Where you put your tutorial directory is up to you. It will work from any directory.

In [1]:
%load_ext autoreload
%autoreload 2
In [2]:
import os
import collections
In [3]:
from tf.app import use

Quran data

Text-Fabric will fetch a standard set of features for you from the newest GitHub release binaries.

The data will be stored in the text-fabric-data directory in your home directory.

Load Features

The data of the corpus is organized in features. They are columns of data. Think of the text as a gigantic spreadsheet, where row 1 corresponds to the first word, row 2 to the second word, and so on, for all 100,000+ words.

The letters of each word make up one column in that spreadsheet.

The corpus contains ca. 30 columns, not only for the words, but also for textual objects, such as suras, ayas, and word groups.

Instead of putting that information in one big table, the data is organized in separate columns. We call those columns features.
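As a rough mental model only (plain Python, not Text-Fabric's actual storage or API), a feature can be pictured as a mapping from row numbers to values. The feature names and values below are invented for illustration.

```python
# Toy model: each feature is a separate column, stored as {row number: value}.
# The names and values are made up; they are not taken from the corpus.
features = {
    "letters": {1: "abc", 2: "de", 3: "fgh"},
    "pos": {1: "noun", 2: "verb", 3: "noun"},
}


def value(feature_name, row):
    """Look up one cell: the value of a feature for a row (None if absent)."""
    return features[feature_name].get(row)


# Reading one row across all columns rebuilds the spreadsheet view of that row.
row_2 = {name: column.get(2) for (name, column) in features.items()}
print(row_2)
```

Keeping the columns separate means a feature can be loaded, or left out, without touching the others.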

For the very last version, use hot.

For the latest release, use latest.

If you have cloned the repos (TF app and data), use clone.

If you do not want/need to upgrade, leave out the checkout specifiers.

In [5]:
A = use("q-ran/quran", hoist=globals())
TF-app: ~/text-fabric-data/q-ran/quran/app
data: ~/text-fabric-data/q-ran/quran/tf/0.4
This is Text-Fabric 9.2.3
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

40 features found and 0 ignored
Text-Fabric: Text-Fabric API 9.2.3, q-ran/quran/app v3, Search Reference
Data: QURAN, Character table, Feature docs
Features:
Quran

Metadata shared by all features listed below:
acronym: quran
convertedBy: Dirk Roorda and Cornelis van Lit
createdBy: Kais Dukes
createdDate: 2011
dateWritten: 2019-05-13T07:17:55Z to 2019-05-13T07:17:57Z
license1: Open Source, unspecified, see http://corpus.quran.com/releasenotes.jsp
license2: Creative Commons BY-ND 3.0 Unported
source1: Morphology: Quranic Arabic Corpus 0.4 (2011) by Kais Dukes
source1Url: http://corpus.quran.com
source2: Text: Tanzil Quran Text (Uthmani, version 1.0.2)
source2Url: http://tanzil.net/docs/home
writtenBy: Text-Fabric

Features in the order listed (a dash stands for a feature name that is not shown in this output):

name  type  description
----  ----  -----------
a     str   not yet understood
-     str   transliterated text of word
ax    str   not yet understood
-     str   case of word
-     str   role of the word in its word group (prefix, main, or suffix)
-     int   whether the word is definite
f     str   not yet understood
-     str   stem formation of verb
fx    str   not yet understood
gn    str   gender of word (masculine, feminine)
-     str   kind of interjection
l     str   not yet understood
-     str   lemma of word
lx    str   not yet understood
-     str   mood of a verb (subj, jus, ...)
n     str   not yet understood
-     str   Name of sura in Arabic
-     str   Name of sura in English
-     str   Name of sura in Arabic, transliterated
-     str   Name of sura in Arabic, transcribed
nu    str   number of word (singular, dual, plural)
-     int   Number of sura, aya, word group, or word
-     int   ordinal number of sura
-     str   Quran: plain text plus morphological annotations at the word level
pos   str   part-of-speech of word, main class
-     str   part-of-speech of word, refined class
ps    str   person of word (1st, 2nd, 3rd)
-     str   root of word
sp    str   not yet understood
-     str   material between this word and the next
-     str   tense of a verb (perfect, imperfect, ...)
-     str   english translation of whole aya
-     str   type of sura
-     str   unicode arabic text of word
-     str   voice of a verb (active, passive)
w     str   not yet understood
wx    str   not yet understood
-     none  Quran: plain text plus morphological annotations at the word level

Additional metadata on individual features:
Name of sura in Arabic: language: arabic
Name of sura in English: language: english; languageCode: en; languageEnglish: English
Name of sura in Arabic, transliterated: language: arabic
Name of sura in Arabic, transcribed: language: arabic
part-of-speech of word, main class: documentation: http://corpus.quran.com/documentation/tagset.jsp
part-of-speech of word, refined class: documentation: http://corpus.quran.com/documentation/tagset.jsp
english translation of whole aya: translator: Arthur Arberry (1955), https://en.wikipedia.org/wiki/Arthur_John_Arberry
Text-Fabric API: names N F E L T S C TF directly usable

API

At this point it is helpful to take a quick look at the Text-Fabric API documentation (see the links under API Members above).

The most essential thing for now is that we can use F to access the data in the features we've loaded. But there is more, such as N, which helps us to walk over the text, as we will see in a minute.

Counting

In order to get acquainted with the data, we start with the simple task of counting.

Count all nodes

We use the N.walk() generator to walk through the nodes.

We compared the corpus to a gigantic spreadsheet, where the rows correspond to the words. In Text-Fabric, we call the rows slots, because they are the textual positions that can be filled with words.

We also mentioned that there are more textual objects, such as the suras, ayas, and word groups. They also correspond to rows in the big spreadsheet.

In Text-Fabric we call all these rows nodes, and the N.walk() generator carries us through those nodes in the textual order.

Just one extra thing: the info statements generate timed messages. If you use them instead of print you'll get a sense of the amount of time that the various processing steps typically need.

In [6]:
A.indent(reset=True)
A.info("Counting nodes ...")

i = 0
for n in N.walk():
    i += 1

A.info("{} nodes".format(i))
  0.00s Counting nodes ...
  0.03s 218282 nodes
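To make the distinction between slots and other nodes concrete, here is a minimal sketch in plain Python. It does not use Text-Fabric; the node numbers, the type names, and the ordering rule are simplifying assumptions, and Text-Fabric's own textual order has more rules than the one used here.

```python
# Toy corpus: slots 1..4 are words; nodes 5 and 6 are larger objects.
# All numbers and type names here are illustrative assumptions.
otype = {1: "word", 2: "word", 3: "word", 4: "word", 5: "aya", 6: "aya"}
oslots = {5: (1, 2), 6: (3, 4)}  # which slots each non-slot node occupies


def slots_of(node):
    """Slots covered by a node; a slot node covers just itself."""
    return oslots.get(node, (node,))


def walk():
    """Yield nodes in a simplified textual order.

    Nodes that start earlier come first; among nodes starting at the same
    slot, the one covering more slots (the container) comes first.
    """

    def key(node):
        slots = slots_of(node)
        return (min(slots), -len(slots))

    yield from sorted(otype, key=key)


order = list(walk())
print(order)
print(sum(1 for _ in walk()), "nodes")
```

In this toy ordering each container is visited just before the words it contains, which is the intuition behind walking the nodes in textual order.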

What are those nodes?

Every node has a type, like word, or aya, or sura. We know that we have approximately 100,000 words and a few other nodes. But what exactly are they?

Text-Fabric has two special features, otype and oslots, that must occur in every Text-Fabric data set. otype tells you for each node its type, and you can ask for the number of slots in the text.

Here we go!

In [7]:
F.otype.slotType
Out[7]:
'word'
In [8]:
F.otype.maxSlot
Out[8]:
128219
In [9]:
F.otype.maxNode
Out[9]:
218282
In [10]:
F.otype.all
Out[10]:
('manzil',
 'sajda',
 'juz',
 'sura',
 'hizb',
 'ruku',
 'page',
 'aya',
 'lex',
 'group',
 'word')
In [11]:
C.levels.data
Out[11]:
(('manzil', 18317.0, 216987, 216993),
 ('sajda', 6043.066666666667, 218154, 218168),
 ('juz', 4273.966666666666, 212125, 212154),
 ('sura', 1124.7280701754387, 218169, 218282),
 ('hizb', 534.2458333333333, 211885, 212124),
 ('ruku', 230.60971223021582, 217598, 218153),
 ('page', 212.28311258278146, 216994, 217597),
 ('aya', 20.56109685695959, 128220, 134455),
 ('lex', 15.440397350993377, 212155, 216986),
 ('group', 1.6559557788425525, 134456, 211884),
 ('word', 1, 1, 128219))

This is interesting: above you see all the node types, each with the average size of its objects (measured in slots), the node where the type starts, and the node where it ends.
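The start and end nodes already tell us how many nodes each type has. The sketch below copies the rows of Out[11] and assumes that every type occupies one unbroken range of node numbers; the variable names are ours, and no Text-Fabric call is involved.

```python
# Rows copied from Out[11]: (type, average size in slots, first node, last node).
levels = (
    ("manzil", 18317.0, 216987, 216993),
    ("sajda", 6043.066666666667, 218154, 218168),
    ("juz", 4273.966666666666, 212125, 212154),
    ("sura", 1124.7280701754387, 218169, 218282),
    ("hizb", 534.2458333333333, 211885, 212124),
    ("ruku", 230.60971223021582, 217598, 218153),
    ("page", 212.28311258278146, 216994, 217597),
    ("aya", 20.56109685695959, 128220, 134455),
    ("lex", 15.440397350993377, 212155, 216986),
    ("group", 1.6559557788425525, 134456, 211884),
    ("word", 1, 1, 128219),
)
MAX_SLOT = 128219  # F.otype.maxSlot, from Out[8]

# Assumption: node numbers within one type are consecutive,
# so the size of the range is the number of nodes of that type.
counts = {otype: last - first + 1 for (otype, _avg, first, last) in levels}

for (otype, avg, _first, _last) in levels:
    print(f"{counts[otype]:>7} {otype:<7} average {avg:.2f} slots")

print(sum(counts.values()), "nodes in total")
```

For a type whose nodes together cover the whole text, such as sura, the average size is simply MAX_SLOT divided by the number of nodes of that type.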

Count individual object types

This is an intuitive way to count the number of nodes in each type. Note in passing, how we use the indent in conjunction with info to produce neat timed and indented progress messages.

In [12]:
A.indent(reset=True)
A.info("counting objects ...")

for otype in F.otype.all:
    i = 0
    A.indent(level=1, reset=True)

    for n in F.otype.s(otype):
        i += 1

    A.info("{:>7} {}s".format(i, otype))

A.indent(level=0)
A.info("Done")
  0.00s counting objects ...
   |     0.00s       7 manzils
   |     0.00s      15 sajdas
   |     0.00s      30 juzs
   |     0.00s     114 suras
   |     0.00s     240 hizbs
   |     0.00s     556 rukus
   |     0.00s     604 pages
   |     0.00s    6236 ayas
   |     0.00s    4832 lexs
   |     0.01s   77429 groups
   |     0.01s  128219 words
  0.03s Done

Viewing textual objects

We use the A API (the extra power) to peek into the corpus.

Let's inspect some words.

In [13]:
wordShow = (1000, 10000, 100000)
for word in wordShow:
    A.pretty(word)

Feature statistics

F gives access to all features. Every feature has a method freqList() to generate a frequency list of its values, higher frequencies first. Here are the parts of speech:

In [14]:
F.pos.freqList()
Out[14]:
(('pronoun', 29319),
 ('noun', 29049),
 ('verb', 19356),
 ('particle', 13511),
 ('preposition', 13006),
 ('conjunction', 10134),
 ('determiner', 8377),
 ('adjective', 1961),
 ('adverb', 1835),
 ('prefix', 1641),
 ('initials', 30))
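To see what such a frequency list amounts to, here is a small stand-alone sketch with collections.Counter. The function freq_list is a hypothetical helper, not part of Text-Fabric, the values are invented, and breaking ties by value is an assumption of this sketch.

```python
import collections


def freq_list(values):
    """Return (value, count) pairs, highest count first, ties ordered by value."""
    counter = collections.Counter(values)
    return tuple(sorted(counter.items(), key=lambda item: (-item[1], item[0])))


# Invented feature values for six words.
toy_values = ["noun", "verb", "noun", "particle", "verb", "noun"]
result = freq_list(toy_values)
print(result)
```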

Lexeme matters

Top 10 frequent verbs

If we count the frequency of words, we usually mean the frequency of their corresponding roots or lexemes.

Let's start with roots.

In [15]:
verbs = collections.Counter()
A.indent(reset=True)
A.info("Collecting data")

for w in F.otype.s("word"):
    if F.pos.v(w) != "verb":
        continue
    verbs[F.root.v(w)] += 1

A.info("Done")
print(
    "".join(
        "{}: {}\n".format(verb, cnt)
        for (verb, cnt) in sorted(verbs.items(), key=lambda x: (-x[1], x[0]))[0:10]
    )
)
  0.00s Collecting data
  0.05s Done
qwl: 1620
kwn: 1358
Amn: 558
Aty: 535
Elm: 425
jEl: 340
rAy: 315
kfr: 304
jyA: 278
Eml: 276

Now the same with lexemes. There are several methods for working with lexemes.

Method 1: counting words

In [16]:
verbs = collections.Counter()
A.indent(reset=True)
A.info("Collecting data")

for w in F.otype.s("word"):
    if F.pos.v(w) != "verb":
        continue
    verbs[F.lemma.v(w)] += 1

A.info("Done")
print(
    "".join(
        "{}: {}\n".format(verb, cnt)
        for (verb, cnt) in sorted(verbs.items(), key=lambda x: (-x[1], x[0]))[0:10]
    )
)
  0.00s Collecting data
  0.05s Done
qaAla: 1618
kaAna: 1358
'aAmana: 537
Ealima: 382
jaEala: 340
kafara: 289
jaA^'a: 278
Eamila: 276
A^taY: 271
ra'aA: 271
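The root counts and the lemma counts above do not coincide. One plausible reading, offered here as an assumption rather than a statement about the corpus, is that several lemmas can share one root, so a root total is the sum of its lemma totals. The sketch below shows that effect on invented labels (R1, R1-a, and so on are placeholders, not real roots or lemmas).

```python
import collections

# Invented (root, lemma) pairs for four verb occurrences.
occurrences = [("R1", "R1-a"), ("R1", "R1-a"), ("R1", "R1-b"), ("R2", "R2-a")]

by_root = collections.Counter(root for (root, _lemma) in occurrences)
by_lemma = collections.Counter(lemma for (_root, lemma) in occurrences)


def top(counter, n):
    """The n most frequent items; ties are ordered by name, as in the cells above."""
    return sorted(counter.items(), key=lambda x: (-x[1], x[0]))[0:n]


print(top(by_root, 2))
print(top(by_lemma, 2))
```

Sorting on the pair (negative count, name) keeps the result deterministic when two items have the same count.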

Lexeme distribution

Let's do a bit more fancy lexeme stuff.

Hapaxes

A hapax can be found by inspecting lexemes and seeing how many word nodes they are linked to. If that number is one, we have a hapax.

We print the first 10 hapaxes in sorted order.

In [17]:
A.indent(reset=True)

hapax = []
lexIndex = collections.defaultdict(list)

for n in F.otype.s("word"):
    lexIndex[F.lemma.v(n)].append(n)

hapax = dict((lex, occs) for (lex, occs) in lexIndex.items() if len(occs) == 1)

A.info("{} hapaxes found".format(len(hapax)))

for h in sorted(hapax)[0:10]:
    print(f"\t{h}")
  0.05s 1994 hapaxes found
	$aAkilat
	$aAni}
	$aAriko
	$aAwiro
	$aTo_#
	$a`Ti}
	$a`mixa`t
	$a`xiSap
	$afatayon
	$agafa
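The same index-then-filter pattern can be tried without the corpus. In this sketch the (node, lemma) pairs are invented stand-ins for what F.otype.s("word") and F.lemma.v(n) deliver in the cell above.

```python
import collections

# Invented (node, lemma) pairs; in the notebook these come from the corpus.
words = [
    (1, "alpha"),
    (2, "beta"),
    (3, "alpha"),
    (4, "gamma"),
    (5, "beta"),
    (6, "delta"),
]

# Step 1: index every lemma by the nodes in which it occurs.
lex_index = collections.defaultdict(list)
for (node, lemma) in words:
    lex_index[lemma].append(node)

# Step 2: a hapax is a lemma with exactly one occurrence.
hapaxes = {lemma: nodes for (lemma, nodes) in lex_index.items() if len(nodes) == 1}
print(sorted(hapaxes))
```

Because the index keeps the nodes, each hapax can still be traced back to its single occurrence.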

If we want more info on the hapaxes, we get it by means of their nodes. The lexIndex dictionary stores the occurrences of a lexeme as a list of nodes.

Let's get the part of speech and the Arabic form of those 10 hapaxes.

In [18]:
for h in sorted(hapax)[0:10]:
    node = hapax[h][0]
    print(f"\t{F.pos.v(node):<12} {F.unicode.v(node)}")
	noun         شَاكِلَتِ
	noun         شَانِئَ
	verb         شَارِكْ
	verb         شَاوِرْ
	noun         شَطْـَٔ
	noun         شَٰطِئِ
	adjective    شَٰمِخَٰتٍ
	noun         شَٰخِصَةٌ
	noun         شَفَتَيْنِ
	verb         شَغَفَ

Small occurrence base

The occurrence base of a lexeme is the set of suras in which it occurs. Let's look for lexemes that occur in a single sura.

We have already found the hapaxes, so we also make a second list that leaves them out.

In [19]:
A.indent(reset=True)
A.info("Finding single sura lexemes")

lexSuraIndex = {}

for (lex, occs) in lexIndex.items():
    lexSuraIndex[lex] = set(L.u(n, otype="sura")[0] for n in occs)

singleSura = [
    (lex, occs)
    for (lex, occs) in lexIndex.items()
    if len(lexSuraIndex.get(lex, [])) == 1
]
singleSuraWithoutHapax = [(lex, occs) for (lex, occs) in singleSura if len(occs) != 1]

A.info("{} single sura lexemes found".format(len(singleSura)))

for data in (singleSura, singleSuraWithoutHapax):
    print("=====================================")
    for (lex, occs) in sorted(data[0:10]):
        print(
            "{:<15} ({}x) first {:>5} last {:>5}".format(
                lex,
                len(occs),
                "{}:{}".format(*T.sectionFromNode(occs[0])),
                "{}:{}".format(*T.sectionFromNode(occs[-1])),
            )
        )
  0.00s Finding single sura lexemes
  0.61s 2228 single sura lexemes found
=====================================
>aZolama        (1x) first  2:20 last  2:20
Ha*ar           (2x) first  2:19 last 2:243
Say~ib          (1x) first  2:19 last  2:19
baEuwDap        (1x) first  2:26 last  2:26
magoDuwb        (1x) first   1:7 last   1:7
nuqad~isu       (1x) first  2:30 last  2:30
rabiHat         (1x) first  2:16 last  2:16
vamarap         (1x) first  2:25 last  2:25
yasofiku        (2x) first  2:30 last  2:84
{sotawoqada     (1x) first  2:17 last  2:17
=====================================
$aTor           (5x) first 2:144 last 2:150
Ha*ar           (2x) first  2:19 last 2:243
Hayov2          (2x) first 2:144 last 2:150
Hur~            (2x) first 2:178 last 2:178
Sibogap         (2x) first 2:138 last 2:138
baqarap         (4x) first  2:67 last  2:71
huwd2           (3x) first 2:111 last 2:140
taTaw~aEa       (2x) first 2:158 last 2:184
yasofiku        (2x) first  2:30 last  2:84
yataEal~amu     (2x) first 2:102 last 2:102
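The notion of an occurrence base can be illustrated on a toy index. Everything in this sketch is invented: the lexeme labels, the section numbers, and the positions; it mirrors the logic of the cell above without calling Text-Fabric.

```python
# Invented index: lexeme -> list of (section, position) occurrences.
lex_occs = {
    "alpha": [(1, 1), (1, 5)],  # twice, both in section 1
    "beta": [(1, 2), (2, 3)],  # spread over two sections
    "gamma": [(2, 7)],  # occurs once: a hapax
}

# The occurrence base of a lexeme: the set of sections it occurs in.
occurrence_base = {
    lex: {section for (section, _pos) in occs} for (lex, occs) in lex_occs.items()
}

# Lexemes confined to a single section, with and without the hapaxes.
single = sorted(lex for (lex, base) in occurrence_base.items() if len(base) == 1)
single_no_hapax = sorted(lex for lex in single if len(lex_occs[lex]) > 1)

print(single)
print(single_no_hapax)
```

Every hapax trivially has an occurrence base of size one, which is why the second list is the more informative one.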

Confined to suras

As a final exercise with lexemes, let's make a list of all suras, and show their total number of lexemes and the number of lexemes that occur exclusively in that sura.

In [20]:
A.indent(reset=True)
A.info("Making sura-lexeme index")

allSura = collections.defaultdict(set)
allLex = set()

for s in F.otype.s("sura"):
    for w in L.d(s, "word"):
        ln = F.lemma.v(w)
        allSura[s].add(ln)
        allLex.add(ln)

A.info("Found {} lexemes".format(len(allLex)))
  0.00s Making sura-lexeme index
  0.08s Found 4833 lexemes
In [21]:
A.indent(reset=True)
A.info("Finding single sura lexemes")

lexSuraIndex = {}

for (lex, occs) in lexIndex.items():
    lexSuraIndex[lex] = set(L.u(n, otype="sura")[0] for n in occs)

singleSuraLex = collections.defaultdict(set)
for (lex, suras) in lexSuraIndex.items():
    if len(suras) == 1:
        singleSuraLex[list(suras)[0]].add(lex)

singleSura = {sura: len(lexs) for (sura, lexs) in singleSuraLex.items()}

A.info("found {} single sura lexemes".format(sum(singleSura.values())))
  0.00s Finding single sura lexemes
  0.60s found 2228 single sura lexemes
In [22]:
print(
    "{:<30} {:>4} {:>4} {:>4} {:>5}\n{}".format(
        "sura name",
        "sura",
        "#all",
        "#own",
        "%own",
        "-" * 51,
    )
)
suraList = []

for s in F.otype.s("sura"):
    suraName = Fs("name@en").v(s)
    sura = T.suraName(s)
    a = len(allSura[s])
    o = singleSura.get(s, 0)
    p = 100 * o / a
    suraList.append((suraName, sura, a, o, p))

for x in sorted(suraList, key=lambda e: (-e[4], -e[2], e[1])):
    print("{:<30} {:>4} {:>4} {:>4} {:>4.1f}%".format(*x))
sura name                      sura #all #own  %own
---------------------------------------------------
Abundance                       108    9    4 44.4%
Quraysh                         106   16    5 31.2%
The Dawn                        113   17    5 29.4%
The Chargers                    100   32    9 28.1%
Sincerity                       112    9    2 22.2%
The Traducer                    104   28    6 21.4%
The Palm Fibre                  111   21    4 19.0%
The Overwhelming                 88   69   13 18.8%
The Beneficent                   55  142   26 18.3%
The Overthrowing                 81   77   14 18.2%
The Morning Star                 86   44    8 18.2%
The Elephant                    105   22    4 18.2%
The Sun                          91   45    8 17.8%
Defrauding                       83   96   17 17.7%
The Inevitable                   56  206   36 17.5%
The City                         90   63   11 17.5%
The Calamity                    101   24    4 16.7%
Those who drag forth             79  127   21 16.5%
He frowned                       80  103   17 16.5%
The Resurrection                 75  104   17 16.3%
The Morning Hours                93   31    5 16.1%
The Dawn                         89   94   15 16.0%
The Emissaries                   77  108   17 15.7%
Mary                             19  360   52 14.4%
The Reality                      69  157   22 14.0%
The Repentance                    9  638   89 13.9%
The Cave                         18  552   71 12.9%
The Star                         53  188   24 12.8%
The Cow                           2 1137  145 12.8%
Joseph                           12  512   65 12.7%
Mankind                         114   16    2 12.5%
The Pen                          68  171   21 12.3%
The Cloaked One                  74  155   19 12.3%
The Moon                         54  188   22 11.7%
The Enshrouded One               73  129   14 10.9%
The Announcement                 78  122   13 10.7%
The Splitting Open               84   76    8 10.5%
Taa-Haa                          20  483   50 10.4%
The Light                        24  416   43 10.3%
Those drawn up in Ranks          37  360   37 10.3%
Noah                             71  128   13 10.2%
The Table                         5  685   69 10.1%
The letter Saad                  38  338   34 10.1%
Competition                     102   20    2 10.0%
The Night Journey                17  533   53  9.9%
The Clans                        33  454   43  9.5%
The Women                         4  810   75  9.3%
The Pilgrimage                   22  486   44  9.1%
The Fig                          95   34    3  8.8%
Muhammad                         47  239   21  8.8%
The letter Qaaf                  50  207   18  8.7%
The Jinn                         72  139   12  8.6%
The Prophets                     21  423   35  8.3%
The Clot                         96   50    4  8.0%
Sheba                            34  330   26  7.9%
The Cattle                        6  725   57  7.9%
The Family of Imraan              3  761   59  7.8%
The Spoils of War                 8  400   31  7.8%
The Ascending Stairways          70  142   11  7.7%
Man                              76  155   12  7.7%
The Winnowing Winds              51  194   15  7.7%
The Stories                      28  468   36  7.7%
The Mount                        52  182   14  7.7%
The Inner Apartments             49  160   12  7.5%
The Poets                        26  410   30  7.3%
Hud                              11  554   40  7.2%
The Declining Day, Epoch        103   14    1  7.1%
The Bee                          16  552   39  7.1%
The Exile                        59  213   15  7.0%
The Night                        92   57    4  7.0%
The Ant                          27  414   29  7.0%
The Rock                         15  289   20  6.9%
The Heights                       7  819   51  6.2%
The Criterion                    25  372   23  6.2%
The Pleading Woman               58  198   12  6.1%
Ornaments of gold                43  334   20  6.0%
The Victory                      48  253   15  5.9%
The Cleaving                     82   54    3  5.6%
Divorce                          65  148    8  5.4%
Abraham                          14  334   18  5.4%
The Iron                         57  249   13  5.2%
The Thunder                      13  348   18  5.2%
The Believers                    23  392   20  5.1%
The Evidence                     98   59    3  5.1%
The Most High                    87   61    3  4.9%
The Romans                       30  290   14  4.8%
Almsgiving                      107   21    1  4.8%
Yaseen                           36  298   14  4.7%
The Sovereignty                  67  171    8  4.7%
The Consolation                  94   22    1  4.5%
Luqman                           31  248   11  4.4%
The Smoke                        44  183    8  4.4%
The Power, Fate                  97   23    1  4.3%
The Originator                   35  335   14  4.2%
The Prohibition                  66  144    6  4.2%
The Opening                       1   24    1  4.2%
The Constellations               85   77    3  3.9%
Explained in detail              41  311   12  3.9%
The Forgiver                     40  398   15  3.8%
The Earthquake                   99   28    1  3.6%
The Groups                       39  393   14  3.6%
She that is to be examined       60  158    5  3.2%
The Hypocrites                   63  103    3  2.9%
Jonas                            10  486   13  2.7%
Consultation                     42  304    8  2.6%
The Dunes                        46  275    7  2.5%
Crouching                        45  201    5  2.5%
The Ranks                        61  124    3  2.4%
The Spider                       29  336    7  2.1%
The Prostration                  32  193    2  1.0%
Friday                           62  104    1  1.0%
Mutual Disillusion               64  138    1  0.7%
Divine Support                  110   19    0  0.0%
The Disbelievers                109    9    0  0.0%
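
The bookkeeping behind the table above does not depend on Text-Fabric. As a minimal sketch on made-up data (the section names and lexemes below are hypothetical, not taken from the corpus), the same "own lexeme" count and sort order can be reproduced with plain dictionaries:

```python
import collections

# Hypothetical toy corpus: section name -> lexemes of its words (not real data)
toy = {
    "A": ["sun", "moon", "sun", "star"],
    "B": ["moon", "river"],
    "C": ["river", "moon"],
}

# lexeme -> set of sections in which it occurs
sections_per_lexeme = collections.defaultdict(set)
for section, lexemes in toy.items():
    for lex in lexemes:
        sections_per_lexeme[lex].add(section)

rows = []
for section, lexemes in toy.items():
    distinct = set(lexemes)
    # a lexeme is "own" if it occurs in exactly one section
    own = {lex for lex in distinct if len(sections_per_lexeme[lex]) == 1}
    rows.append((section, len(distinct), len(own), 100 * len(own) / len(distinct)))

# Same ordering idea as the table: highest %own first, then larger sections, then name
rows.sort(key=lambda r: (-r[3], -r[1], r[0]))

for row in rows:
    print("{:<4} {:>4} {:>4} {:>5.1f}%".format(*row))
```

Negating the numeric keys gives a descending sort on those columns while the name still sorts ascending, which is the trick used in the sorted() call of cell In [22].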

For all section types

What we did for suras, we can also do for the other section types.

We generalize the task into a function that accepts the kind of section as a parameter. Then we can call that function for all our section types.

In [23]:
def lexBase(section):
    # make indices
    lexemesPerSection = {}
    sectionsPerLexeme = {}
    for s in F.otype.s(section):
        for w in L.d(s, otype="word"):
            lex = F.lemma.v(w)
            lexemesPerSection.setdefault(s, set()).add(lex)
            sectionsPerLexeme.setdefault(lex, set()).add(s)

    print(
        "{:<10} {:>4} {:>4} {:>5}\n{}".format(
            section,
            "#all",
            "#own",
            "%own",
            "-" * 26,
        )
    )
    sectionList = []

    for s in F.otype.s(section):
        n = F.number.v(s)
        myLexes = lexemesPerSection[s]
        a = len(myLexes)
        o = len([lex for lex in myLexes if len(sectionsPerLexeme[lex]) == 1])
        p = 100 * o / a
        sectionList.append((n, a, o, p))

    for x in sorted(sectionList, key=lambda e: (-e[3], -e[1], e[0])):
        print("{:<10} {:>4} {:>4} {:>4.1f}%".format(*x))
    print("=" * 26)
In [24]:
for section in (
    "manzil",
    #  'sajda',
    #  'juz',
    #  'ruku',
    #  'hizb',
    #  'page',
):
    lexBase(section)
manzil     #all #own  %own
--------------------------
7          2120  685 32.3%
4          1907  415 21.8%
1          1694  302 17.8%
2          1773  316 17.8%
5          1580  235 14.9%
3          1493  222 14.9%
6          1516  215 14.2%
==========================

Layer API

We travel upwards and downwards, forwards and backwards through the nodes. The Layer-API (L) provides functions: u() for going up, and d() for going down, n() for going to next nodes and p() for going to previous nodes.

These directions are indirect notions: nodes are just numbers, but by means of the oslots feature they are linked to slots. One node contains another node if the one is linked to a set of slots that contains the set of slots that the other is linked to. And one node is next or previous to another if its slots follow or precede the slots of the other one.

L.u(node) Up is going to nodes that embed node.

L.d(node) Down is the opposite direction, to those that are contained in node.

L.n(node) Next are the next adjacent nodes, i.e. nodes whose first slot comes immediately after the last slot of node.

L.p(node) Previous are the previous adjacent nodes, i.e. nodes whose last slot comes immediately before the first slot of node.

All these functions yield nodes of all possible otypes. By passing an optional parameter, you can restrict the results to nodes of that type.

The results are ordered according to the order of things in the text.

The functions always return a tuple, even if there is just one node in the result.
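
To make the slot-based definitions concrete, here is a minimal sketch that mimics the four directions on a toy corpus. It is not how Text-Fabric implements L: the node numbers, the oslots dictionary and the helper names up, down, nxt and prev are all assumptions made for illustration, and containment is simplified to a proper superset of slots.

```python
# Toy data: node -> (type, set of slots). Nodes 1..4 are the slots (words) themselves.
oslots = {
    1: ("word", {1}),
    2: ("word", {2}),
    3: ("word", {3}),
    4: ("word", {4}),
    10: ("aya", {1, 2}),
    11: ("aya", {3, 4}),
    20: ("sura", {1, 2, 3, 4}),
}


def _select(nodes, otype):
    """Keep nodes of the requested type (or all if otype is None), as a tuple."""
    return tuple(n for n in nodes if otype is None or oslots[n][0] == otype)


def up(node, otype=None):
    """Nodes whose slot set is a proper superset of the slots of node."""
    slots = oslots[node][1]
    return _select((n for n in oslots if oslots[n][1] > slots), otype)


def down(node, otype=None):
    """Nodes whose slot set is a proper subset of the slots of node."""
    slots = oslots[node][1]
    return _select((n for n in oslots if oslots[n][1] < slots), otype)


def nxt(node, otype=None):
    """Nodes whose first slot comes immediately after the last slot of node."""
    after = max(oslots[node][1]) + 1
    return _select((n for n in oslots if min(oslots[n][1]) == after), otype)


def prev(node, otype=None):
    """Nodes whose last slot comes immediately before the first slot of node."""
    before = min(oslots[node][1]) - 1
    return _select((n for n in oslots if max(oslots[n][1]) == before), otype)


print(up(1))          # every node that embeds word 1
print(up(1, "sura"))  # still a tuple, even with a single result
print(down(20, "aya"))
print(nxt(10))
print(prev(11, "word"))
```

The toy version returns nodes in dictionary order; it does not reproduce the text order that the real functions guarantee.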

Going up

We go from the first word to the sura that contains it. Note the [0] at the end. You expect one sura, yet L returns a tuple. To get the only element of that tuple, you need to do that [0].

If you are like me, you keep forgetting it, and that will lead to weird error messages later on.

In [25]:
firstSura = L.u(1, otype="sura")[0]
print(firstSura)
218169
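
One way to guard against forgetting the [0] is a tiny helper that unpacks a tuple result and copes with empty results. The function first below is a hypothetical convenience, not part of Text-Fabric; it is demonstrated on plain tuples shaped like the results of L.u().

```python
def first(nodes, default=None):
    """Return the first element of a tuple of nodes, or default if it is empty."""
    return nodes[0] if nodes else default


# Tuples shaped like L results: one hit, and no hit at all
print(first((218169,)))
print(first(()))
print(first((), default="x"))
```

With the API loaded, one would write first(L.u(1, otype="sura")) instead of indexing by hand.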

And let's see all the containing objects of word 3:

In [26]:
w = 3
for otype in F.otype.all:
    if otype == F.otype.slotType:
        continue
    up = L.u(w, otype=otype)
    upNode = "x" if len(up) == 0 else up[0]
    print("word {} is contained in {} {}".format(w, otype, upNode))
word 3 is contained in manzil 216987
word 3 is contained in sajda x
word 3 is contained in juz 212125
word 3 is contained in sura 218169
word 3 is contained in hizb 211885
word 3 is contained in ruku 217598
word 3 is contained in page 216994
word 3 is contained in aya 128220
word 3 is contained in lex 212156
word 3 is contained in group 134457

Going next

Let's go to the next nodes of the first sura.

In [27]:
afterFirstSura = L.n(firstSura)
for n in afterFirstSura:
    print(
        "{:>7}: {:<13} first slot={:<6}, last slot={:<6}".format(
            n,
            F.otype.v(n),
            E.oslots.s(n)[0],
            E.oslots.s(n)[-1],
        )
    )
secondSura = L.n(firstSura, otype="sura")[0]
     49: word          first slot=49    , last slot=49    
 134485: group         first slot=49    , last slot=49    
 128227: aya           first slot=49    , last slot=49    
 216995: page          first slot=49    , last slot=112   
 217599: ruku          first slot=49    , last slot=149   
 218170: sura          first slot=49    , last slot=10291 

Going previous

And let's see what is right before the second sura.

In [28]:
for n in L.p(secondSura):
    print(
        "{:>7}: {:<13} first slot={:<6}, last slot={:<6}".format(
            n,
            F.otype.v(n),
            E.oslots.s(n)[0],
            E.oslots.s(n)[-1],
        )
    )
 218169: sura          first slot=1     , last slot=48    
 217598: ruku          first slot=1     , last slot=48    
 216994: page          first slot=1     , last slot=48    
 128226: aya           first slot=34    , last slot=48    
 134484: group         first slot=47    , last slot=48    
     48: word          first slot=48    , last slot=48    

Going down

We go to the ayas of the second sura, and just count them.

In [29]:
ayas = L.d(secondSura, otype="aya")
print(len(ayas))
286

The first aya

We pick the first aya and the first word, and explore what is above and below them.

In [30]:
for n in [1, L.u(1, otype="aya")[0]]:
    A.indent(level=0)
    A.info("Node {}".format(n), tm=False)
    A.indent(level=1)
    A.info("UP", tm=False)
    A.indent(level=2)
    A.info("\n".join(["{:<15} {}".format(u, F.otype.v(u)) for u in L.u(n)]), tm=False)
    A.indent(level=1)
    A.info("DOWN", tm=False)
    A.indent(level=2)
    A.info("\n".join(["{:<15} {}".format(u, F.otype.v(u)) for u in L.d(n)]), tm=False)
A.indent(level=0)
A.info("Done", tm=False)
Node 1
   |   UP
   |      |   134456          group
   |      |   128220          aya
   |      |   216994          page
   |      |   217598          ruku
   |      |   218169          sura
   |      |   211885          hizb
   |      |   212125          juz
   |      |   216987          manzil
   |   DOWN
   |      |   
Node 128220
   |   UP
   |      |   216994          page
   |      |   217598          ruku
   |      |   218169          sura
   |      |   211885          hizb
   |      |   212125          juz
   |      |   216987          manzil
   |   DOWN
   |      |   134456          group
   |      |   1               word
   |      |   2               word
   |      |   134457          group
   |      |   3               word
   |      |   134458          group
   |      |   4               word
   |      |   5               word
   |      |   134459          group
   |      |   6               word
   |      |   7               word
Done

Text API

So far, we have mainly seen nodes and their numbers, and the names of node types. You would almost forget that we are dealing with text. So let's try to see some text.

In the same way as F gives access to feature data, T gives access to the text. That is also feature data, but you can tell Text-Fabric which features are specifically carrying the text, and in return Text-Fabric offers you a Text API: T.

Formats

Arabic text can be represented in a number of ways:

  • in transliteration, or in Arabic characters,
  • showing the actual text or only the lexemes, or roots.

If you wonder where the information about text formats is stored: not in the program text-fabric, but in the data set. It has a feature otext, which specifies the formats and which features must be used to produce them. otext is the third special feature in a TF data set, next to otype and oslots. It is an optional feature. If it is absent, there will be no T API.
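
To picture what such a format specification does, here is a minimal sketch. It assumes templates in which {name} stands for the value of feature name; the templates themselves, the feature names ascii, lemma, root and space, and the rows of the toy table are illustrative assumptions and are not read from the otext feature of this data set. The values are copied from the first aya as it is printed later in this section, simplified to one row per whitespace-separated unit.

```python
# Hypothetical format templates: {name} is replaced by the value of feature `name`
formats = {
    "text-trans-full": "{ascii}{space}",
    "lex-trans-full": "{lemma}{space}",
    "root-trans-full": "{root}{space}",
}

# Toy feature table for the first aya (feature names are assumptions)
words = [
    {"ascii": "bisomi", "lemma": "{som", "root": "smw", "space": " "},
    {"ascii": "{ll~ahi", "lemma": "{ll~ah", "root": "Alh", "space": " "},
    {"ascii": "{lr~aHoma`ni", "lemma": "r~aHoma`n", "root": "rHm", "space": " "},
    {"ascii": "{lr~aHiymi", "lemma": "r~aHiym", "root": "rHm", "space": ""},
]


def render(word_rows, fmt):
    """Fill the template of fmt for every row and concatenate the results."""
    template = formats[fmt]
    return "".join(template.format_map(row) for row in word_rows)


for fmt in sorted(formats):
    print("{}:\n\t{}".format(fmt, render(words, fmt)))
```

The point of the design is that adding a format means adding a template to the data, not changing the program.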

Here is a list of all available formats in this data set.

In [31]:
sorted(T.formats)
Out[31]:
['lex-trans-full', 'root-trans-full', 'text-orig-full', 'text-trans-full']

Using the formats

We can also make pretty displays in other formats:

In [32]:
for word in wordShow:
    A.pretty(word, fmt="text-trans-full")

Now let's use those formats to print out the first aya of the Quran.

In [33]:
a1 = F.otype.s("aya")[0]

for fmt in sorted(T.formats):
    print("{}:\n\t{}".format(fmt, T.text(a1, fmt=fmt, descend=True)))
lex-trans-full:
	{som {ll~ah r~aHoma`n r~aHiym
root-trans-full:
	smw Alh rHm rHm
text-orig-full:
	بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ
text-trans-full:
	bisomi {ll~ahi {lr~aHoma`ni {lr~aHiymi

If we do not specify a format, the default format is used (text-orig-full).

In [34]:
print(T.text(a1))
بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ

Whole text in all formats in about a second

Part of the pleasure of working with computers is that they can crunch massive amounts of data. The text of the Quran is a piece of cake.

It takes less than a second to have that cake and eat it, in nearly a handful of formats.

In [35]:
A.indent(reset=True)
A.info("writing plain text of whole Quran in all formats")

text = collections.defaultdict(list)

for a in F.otype.s("aya"):
    words = L.d(a, "word")
    for fmt in sorted(T.formats):
        text[fmt].append(T.text(words, fmt=fmt))

A.info("done {} formats".format(len(text)))

for fmt in sorted(text):
    print("{}\n{}\n".format(fmt, "\n".join(text[fmt][0:5])))
  0.00s writing plain text of whole Quran in all formats
  0.90s done 4 formats
lex-trans-full
{som {ll~ah r~aHoma`n r~aHiym
Hamod {ll~ah rab~ Ea`lamiyn
r~aHoma`n r~aHiym
ma`lik yawom diyn
<iy~aA Eabada <iy~aA {sotaEiynu

root-trans-full
smw Alh rHm rHm
Hmd Alh rbb Elm
rHm rHm
mlk ywm dyn
 Ebd  Ewn

text-orig-full
بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ
ٱلْحَمْدُ لِلَّهِ رَبِّ ٱلْعَٰلَمِينَ
ٱلرَّحْمَٰنِ ٱلرَّحِيمِ
مَٰلِكِ يَوْمِ ٱلدِّينِ
إِيَّاكَ نَعْبُدُ وَإِيَّاكَ نَسْتَعِينُ

text-trans-full
bisomi {ll~ahi {lr~aHoma`ni {lr~aHiymi
{loHamodu lil~ahi rab~i {loEa`lamiyna
{lr~aHoma`ni {lr~aHiymi
ma`liki yawomi {ld~iyni
<iy~aAka naEobudu wa<iy~aAka nasotaEiynu

The full plain text

We write a few formats to file, in your Downloads folder.

In [36]:
orig = "text-orig-full"
trans = "text-trans-full"
for fmt in (orig, trans):
    with open(os.path.expanduser(f"~/Downloads/Quran-{fmt}.txt"), "w", encoding="utf8") as f:
        f.write("\n".join(text[fmt]))
In [37]:
!head -n 20 ~/Downloads/Quran-{orig}.txt
بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ
ٱلْحَمْدُ لِلَّهِ رَبِّ ٱلْعَٰلَمِينَ
ٱلرَّحْمَٰنِ ٱلرَّحِيمِ
مَٰلِكِ يَوْمِ ٱلدِّينِ
إِيَّاكَ نَعْبُدُ وَإِيَّاكَ نَسْتَعِينُ
ٱهْدِنَا ٱلصِّرَٰطَ ٱلْمُسْتَقِيمَ
صِرَٰطَ ٱلَّذِينَ أَنْعَمْتَ عَلَيْهِمْ غَيْرِ ٱلْمَغْضُوبِ عَلَيْهِمْ وَلَا ٱلضَّآلِّينَ
الٓمٓ
ذَٰلِكَ ٱلْكِتَٰبُ لَا رَيْبَ فِيهِ هُدًى لِّلْمُتَّقِينَ
ٱلَّذِينَ يُؤْمِنُونَ بِٱلْغَيْبِ وَيُقِيمُونَ ٱلصَّلَوٰةَ وَمِمَّا رَزَقْنَٰهُمْ يُنفِقُونَ
وَٱلَّذِينَ يُؤْمِنُونَ بِمَآ أُنزِلَ إِلَيْكَ وَمَآ أُنزِلَ مِن قَبْلِكَ وَبِٱلْءَاخِرَةِ هُمْ يُوقِنُونَ
أُو۟لَٰٓئِكَ عَلَىٰ هُدًى مِّن رَّبِّهِمْ وَأُو۟لَٰٓئِكَ هُمُ ٱلْمُفْلِحُونَ
إِنَّ ٱلَّذِينَ كَفَرُوا۟ سَوَآءٌ عَلَيْهِمْ ءَأَنذَرْتَهُمْ أَمْ لَمْ تُنذِرْهُمْ لَا يُؤْمِنُونَ
خَتَمَ ٱللَّهُ عَلَىٰ قُلُوبِهِمْ وَعَلَىٰ سَمْعِهِمْ وَعَلَىٰٓ أَبْصَٰرِهِمْ غِشَٰوَةٌ وَلَهُمْ عَذَابٌ عَظِيمٌ
وَمِنَ ٱلنَّاسِ مَن يَقُولُ ءَامَنَّا بِٱللَّهِ وَبِٱلْيَوْمِ ٱلْءَاخِرِ وَمَا هُم بِمُؤْمِنِينَ
يُخَٰدِعُونَ ٱللَّهَ وَٱلَّذِينَ ءَامَنُوا۟ وَمَا يَخْدَعُونَ إِلَّآ أَنفُسَهُمْ وَمَا يَشْعُرُونَ
فِى قُلُوبِهِم مَّرَضٌ فَزَادَهُمُ ٱللَّهُ مَرَضًا وَلَهُمْ عَذَابٌ أَلِيمٌۢ بِمَا كَانُوا۟ يَكْذِبُونَ
وَإِذَا قِيلَ لَهُمْ لَا تُفْسِدُوا۟ فِى ٱلْأَرْضِ قَالُوٓا۟ إِنَّمَا نَحْنُ مُصْلِحُونَ
أَلَآ إِنَّهُمْ هُمُ ٱلْمُفْسِدُونَ وَلَٰكِن لَّا يَشْعُرُونَ
وَإِذَا قِيلَ لَهُمْ ءَامِنُوا۟ كَمَآ ءَامَنَ ٱلنَّاسُ قَالُوٓا۟ أَنُؤْمِنُ كَمَآ ءَامَنَ ٱلسُّفَهَآءُ أَلَآ إِنَّهُمْ هُمُ ٱلسُّفَهَآءُ وَلَٰكِن لَّا يَعْلَمُونَ
In [38]:
!head -n 20 ~/Downloads/Quran-{trans}.txt
bisomi {ll~ahi {lr~aHoma`ni {lr~aHiymi
{loHamodu lil~ahi rab~i {loEa`lamiyna
{lr~aHoma`ni {lr~aHiymi
ma`liki yawomi {ld~iyni
<iy~aAka naEobudu wa<iy~aAka nasotaEiynu
{hodinaA {lS~ira`Ta {lomusotaqiyma
Sira`Ta {l~a*iyna >anoEamota Ealayohimo gayori {lomagoDuwbi Ealayohimo walaA {lD~aA^l~iyna
Al^m^
*a`lika {lokita`bu laA rayoba fiyhi hudFY l~ilomut~aqiyna
{l~a*iyna yu&ominuwna bi{logayobi wayuqiymuwna {lS~alaw`pa wamim~aA razaqona`humo yunfiquwna
wa{l~a*iyna yu&ominuwna bimaA^ >unzila <ilayoka wamaA^ >unzila min qabolika wabi{lo'aAxirapi humo yuwqinuwna
>uw@la`^}ika EalaY` hudFY m~in r~ab~ihimo wa>uw@la`^}ika humu {lomufoliHuwna
<in~a {l~a*iyna kafaruwA@ sawaA^'N Ealayohimo 'a>an*arotahumo >amo lamo tun*irohumo laA yu&ominuwna
xatama {ll~ahu EalaY` quluwbihimo waEalaY` samoEihimo waEalaY`^ >aboSa`rihimo gi$a`wapN walahumo Ea*aAbN EaZiymN
wamina {ln~aAsi man yaquwlu 'aAman~aA bi{ll~ahi wabi{loyawomi {lo'aAxiri wamaA hum bimu&ominiyna
yuxa`diEuwna {ll~aha wa{l~a*iyna 'aAmanuwA@ wamaA yaxodaEuwna <il~aA^ >anfusahumo wamaA ya$oEuruwna
fiY quluwbihim m~araDN fazaAdahumu {ll~ahu maraDFA walahumo Ea*aAbN >aliymN[ bimaA kaAnuwA@ yako*ibuwna
wa<i*aA qiyla lahumo laA tufosiduwA@ fiY {lo>aroDi qaAluw^A@ <in~amaA naHonu muSoliHuwna
>alaA^ <in~ahumo humu {lomufosiduwna wala`kin l~aA ya$oEuruwna
wa<i*aA qiyla lahumo 'aAminuwA@ kamaA^ 'aAmana {ln~aAsu qaAluw^A@ >anu&ominu kamaA^ 'aAmana {ls~ufahaA^'u >alaA^ <in~ahumo humu {ls~ufahaA^'u wala`kin l~aA yaEolamuwna

Sections

A section is a sura or an aya. Knowledge of sections is not baked into Text-Fabric. The config feature otext.tf may specify two or three section levels, and tell what the corresponding node types and features are.

From that knowledge Text-Fabric can construct mappings from nodes to sections, e.g. from aya nodes to tuples of the form:

(sura number, aya number)

Here are examples of getting the section that corresponds to a node and vice versa.

NB: sectionFromNode always delivers an aya specification, either from the first slot belonging to that node, or, if lastSlot=True is passed, from the last slot belonging to that node.

In [39]:
for x in (
    ("sura, aya of first word", T.sectionFromNode(1)),
    ("node of 1:1", T.nodeFromSection((1, 1))),
    ("node of 2:1", T.nodeFromSection((2, 1))),
    ("node of sura 1", T.nodeFromSection((1,))),
    ("section of sura node", T.sectionFromNode(211890)),
    ("section of aya node", T.sectionFromNode(210000)),
    ("section of juz node", T.sectionFromNode(216850)),
    ("idem, now last word", T.sectionFromNode(216850, lastSlot=True)),
):
    print("{:<30} {}".format(*x))
sura, aya of first word        (1, 1)
node of 1:1                    128220
node of 2:1                    128227
node of sura 1                 218169
section of sura node           (2, 92)
section of aya node            (80, 23)
section of juz node            (85, 2)
idem, now last word            (85, 2)
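
How a node is mapped to a section can be pictured with a small stand-in. The sketch below is not the Text-Fabric implementation; the slot ranges, the aya_of_slot table and the function section_from_node are hypothetical, and only illustrate the role of the first and the last slot.

```python
# Toy corpus: each slot (word) belongs to exactly one (sura, aya) section
aya_of_slot = {
    1: (1, 1), 2: (1, 1),
    3: (1, 2), 4: (1, 2),
    5: (2, 1), 6: (2, 1),
}

# Toy nodes of other types, given by the slots they cover
slots_of_node = {
    100: [1, 2, 3, 4],  # a sura-like node covering sura 1
    200: [3, 4, 5],     # a page-like node that straddles two suras
}


def section_from_node(node, last_slot=False):
    """Section of the first slot of a node, or of its last slot if requested."""
    slots = slots_of_node.get(node, [node])  # a slot node covers just itself
    return aya_of_slot[slots[-1] if last_slot else slots[0]]


print(section_from_node(1))
print(section_from_node(200))
print(section_from_node(200, last_slot=True))
```

For a node that lies within a single aya both calls give the same answer; they differ only for nodes that cross an aya boundary, such as the page-like node in the toy data.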

The other sectional units in the Quran (manzil, sajda, juz, ruku, hizb, page) are not associated with special Text-Fabric functions in this data set, although we could have chosen to use two or three of them instead of sura and aya.

But TF also offers the possibility to define your own sections, independent from and more flexible than the sections defined above.

For a bit more on sections, consult the sections recipe in the cookbook.

Translations

This data source contains English (by Arberry) and Dutch (by Leemhuis) translations of the Quran. They are stored in the features translation@en and translation@nl for aya nodes.

Let's get the translations of sura 107, together with the Arabic original.

The translation features are not loaded by default, so we load them first.

In [40]:
TF.load("translation@en translation@nl", add=True)

sura = 107

suraNode = T.suraNode(sura)
print(F.name.v(suraNode))

for ayaNode in L.d(suraNode, otype="aya"):
    print(f"{F.number.v(ayaNode)}")
    print(T.text(ayaNode))
    print(Fs("translation@en").v(ayaNode))
    print(Fs("translation@nl").v(ayaNode))
  0.00s All additional features loaded - for details use TF.isLoaded()
الماعون
1
أَرَءَيْتَ ٱلَّذِى يُكَذِّبُ بِٱلدِّينِ
Hast thou seen him who cries lies to the Doom?
Heb jij hem gezien die de godsdienst loochent?
2
فَذَٰلِكَ ٱلَّذِى يَدُعُّ ٱلْيَتِيمَ
That is he who repulses the orphan
Dat is hij die de wees wegduwt
3
وَلَا يَحُضُّ عَلَىٰ طَعَامِ ٱلْمِسْكِينِ
and urges not the feeding of the needy.
en die er niet op aandringt de behoeftige voedsel te geven.
4
فَوَيْلٌ لِّلْمُصَلِّينَ
So woe to those that pray
En wee hen die de salaat bidden
5
ٱلَّذِينَ هُمْ عَن صَلَاتِهِمْ سَاهُونَ
and are heedless of their prayers,
die hun salaat veronachtzamen,
6
ٱلَّذِينَ هُمْ يُرَآءُونَ
to those who make display
die vertoon willen maken
7
وَيَمْنَعُونَ ٱلْمَاعُونَ
and refuse charity.
en die de hulpverlening weigeren.

Next steps

  • display: become an expert in creating pretty displays of your text structures
  • search: turbo charge your hand-coding with search templates
  • exportExcel: make tailor-made spreadsheets out of your results
  • share: draw in other people's data and let them use yours
  • similarAyas: spot the similarities between lines
  • rings: ring structures in sura 2

CC-BY Dirk Roorda