This notebook gets you started with using Text-Fabric for coding with the Quran.
Familiarity with the underlying data model is recommended.
You need to have Python on your system. Most systems have it out of the box, but alas, that is Python 2, and we need at least Python 3.6.
Install it from python.org or from Anaconda.
Then install Text-Fabric:
pip3 install text-fabric
You need Jupyter.
If it is not already installed:
pip3 install jupyter
If you start computing with this tutorial, first copy its parent directory to somewhere else,
outside your quran
directory.
If you pull changes from the quran
repository later, your work will not be overwritten.
Where you put your tutorial directory is up to you.
It will work from any directory.
%load_ext autoreload
%autoreload 2
import os
import collections
from tf.app import use
Text-Fabric will fetch a standard set of features for you from the newest GitHub release binaries.
The data will be stored in the text-fabric-data directory in your home directory.
The data of the corpus is organized in features. They are columns of data. Think of the text as a gigantic spreadsheet, where row 1 corresponds to the first word, row 2 to the second word, and so on, for all 100,000+ words.
The letters of each word form a column in that spreadsheet.
The corpus contains some 40 columns, not only for the words, but also for textual objects, such as suras, ayas, and word groups.
Instead of putting that information in one big table, the data is organized in separate columns. We call those columns features.
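The feature idea can be sketched in plain Python. Here is a toy model (the node numbers and values below are invented for illustration; they are not the real Text-Fabric data structures):

```python
# Toy model of features as columns: each feature maps a row number
# (a node) to a value. The values below are made up for illustration.
pos = {1: "noun", 2: "verb", 3: "preposition"}   # the "pos" column
lemma = {1: "kitaAb", 2: "qaAla", 3: "min"}      # the "lemma" column

# Reading a cell of the spreadsheet is a lookup in one column:
print(pos[2], lemma[2])  # -> verb qaAla
```

In Text-Fabric the same lookup is written as a feature method call, as we will see below.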
For the very latest version, use hot.
For the latest release, use latest.
If you have cloned the repos (TF app and data), use clone.
If you do not want/need to upgrade, leave out the checkout specifiers.
A = use("q-ran/quran", hoist=globals())
This is Text-Fabric 9.2.3 Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html 40 features found and 0 ignored
At this point it is helpful to throw a quick glance at the text-fabric API documentation (see the links under API Members above).
The most essential thing for now is that we can use F
to access the data in the features
we've loaded.
But there is more, such as N
, which helps us to walk over the text, as we see in a minute.
In order to get acquainted with the data, we start with the simple task of counting.
We use the
N.walk()
generator
to walk through the nodes.
We compared the corpus to a gigantic spreadsheet, where the rows correspond to the words.
In Text-Fabric, we call the rows slots
, because they are the textual positions that can be filled with words.
We also mentioned that there are more textual objects: the word groups, ayas, suras, and so on. They also correspond to rows in the big spreadsheet.
In Text-Fabric we call all these rows nodes, and the N()
generator
carries us through those nodes in the textual order.
Just one extra thing: the info
statements generate timed messages.
If you use them instead of print
you'll get a sense of the amount of time that
the various processing steps typically need.
A.indent(reset=True)
A.info("Counting nodes ...")
i = 0
for n in N.walk():
    i += 1
A.info("{} nodes".format(i))
0.00s Counting nodes ... 0.03s 218282 nodes
Every node has a type, like word, or aya, or sura. We know that we have approximately 100,000 words and a few other nodes. But what exactly are they?
Text-Fabric has two special features, otype
and oslots
, that must occur in every Text-Fabric data set.
otype
tells you the type of each node, and you can ask for the number of slots in the text.
Here we go!
F.otype.slotType
'word'
F.otype.maxSlot
128219
F.otype.maxNode
218282
F.otype.all
('manzil', 'sajda', 'juz', 'sura', 'hizb', 'ruku', 'page', 'aya', 'lex', 'group', 'word')
C.levels.data
(('manzil', 18317.0, 216987, 216993), ('sajda', 6043.066666666667, 218154, 218168), ('juz', 4273.966666666666, 212125, 212154), ('sura', 1124.7280701754387, 218169, 218282), ('hizb', 534.2458333333333, 211885, 212124), ('ruku', 230.60971223021582, 217598, 218153), ('page', 212.28311258278146, 216994, 217597), ('aya', 20.56109685695959, 128220, 134455), ('lex', 15.440397350993377, 212155, 216986), ('group', 1.6559557788425525, 134456, 211884), ('word', 1, 1, 128219))
This is interesting: above you see all the textual objects, with the average size of their objects, the node where they start, and the node where they end.
This also gives an intuitive way to count the number of nodes of each type.
Note in passing, how we use the indent
in conjunction with info
to produce neat timed
and indented progress messages.
A.indent(reset=True)
A.info("counting objects ...")
for otype in F.otype.all:
    i = 0
    A.indent(level=1, reset=True)
    for n in F.otype.s(otype):
        i += 1
    A.info("{:>7} {}s".format(i, otype))
A.indent(level=0)
A.info("Done")
0.00s counting objects ... | 0.00s 7 manzils | 0.00s 15 sajdas | 0.00s 30 juzs | 0.00s 114 suras | 0.00s 240 hizbs | 0.00s 556 rukus | 0.00s 604 pages | 0.00s 6236 ayas | 0.00s 4832 lexs | 0.01s 77429 groups | 0.01s 128219 words 0.03s Done
We use the A API (the extra power) to peek into the corpus.
Let's inspect some words.
wordShow = (1000, 10000, 100000)
for word in wordShow:
    A.pretty(word)
F
gives access to all features.
Every feature has a method
freqList()
to generate a frequency list of its values, higher frequencies first.
Here are the parts of speech:
F.pos.freqList()
(('pronoun', 29319), ('noun', 29049), ('verb', 19356), ('particle', 13511), ('preposition', 13006), ('conjunction', 10134), ('determiner', 8377), ('adjective', 1961), ('adverb', 1835), ('prefix', 1641), ('initials', 30))
verbs = collections.Counter()
A.indent(reset=True)
A.info("Collecting data")
for w in F.otype.s("word"):
    if F.pos.v(w) != "verb":
        continue
    verbs[F.root.v(w)] += 1
A.info("Done")
print(
    "".join(
        "{}: {}\n".format(verb, cnt)
        for (verb, cnt) in sorted(verbs.items(), key=lambda x: (-x[1], x[0]))[0:10]
    )
)
0.00s Collecting data 0.05s Done qwl: 1620 kwn: 1358 Amn: 558 Aty: 535 Elm: 425 jEl: 340 rAy: 315 kfr: 304 jyA: 278 Eml: 276
Now the same with lexemes. There are several methods for working with lexemes.
verbs = collections.Counter()
A.indent(reset=True)
A.info("Collecting data")
for w in F.otype.s("word"):
    if F.pos.v(w) != "verb":
        continue
    verbs[F.lemma.v(w)] += 1
A.info("Done")
print(
    "".join(
        "{}: {}\n".format(verb, cnt)
        for (verb, cnt) in sorted(verbs.items(), key=lambda x: (-x[1], x[0]))[0:10]
    )
)
0.00s Collecting data 0.05s Done qaAla: 1618 kaAna: 1358 'aAmana: 537 Ealima: 382 jaEala: 340 kafara: 289 jaA^'a: 278 Eamila: 276 A^taY: 271 ra'aA: 271
A.indent(reset=True)
lexIndex = collections.defaultdict(list)
for n in F.otype.s("word"):
    lexIndex[F.lemma.v(n)].append(n)
hapax = dict((lex, occs) for (lex, occs) in lexIndex.items() if len(occs) == 1)
A.info("{} hapaxes found".format(len(hapax)))
for h in sorted(hapax)[0:10]:
    print(f"\t{h}")
0.05s 1994 hapaxes found $aAkilat $aAni} $aAriko $aAwiro $aTo_# $a`Ti} $a`mixa`t $a`xiSap $afatayon $agafa
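Since every feature has a freqList() method, hapaxes can also be read off a frequency list: they are the values with frequency 1. Here is a sketch over data shaped like freqList() output (the first two pairs echo real counts seen in this notebook; the last two lemmas are invented):

```python
# freqList() yields (value, frequency) pairs, highest frequency first.
# Two real counts from above, plus two made-up rare lemmas:
freqList = (("qaAla", 1618), ("kaAna", 1358), ("rareA", 1), ("rareB", 1))

# A hapax is any value that occurs exactly once:
hapaxes = [value for (value, freq) in freqList if freq == 1]
print(hapaxes)  # -> ['rareA', 'rareB']
```

On the real data, the same comprehension over F.lemma.freqList() should reproduce the 1994 hapaxes found above, although it gives only the lemmas, not their occurrence nodes.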
If we want more info on a hapax, we can get it by means of its node.
The lexIndex
dictionary stores the occurrences of a lexeme as a list of nodes.
Let's get the part of speech and the Arabic form of those 10 hapaxes.
for h in sorted(hapax)[0:10]:
    node = hapax[h][0]
    print(f"\t{F.pos.v(node):<12} {F.unicode.v(node)}")
noun شَاكِلَتِ noun شَانِئَ verb شَارِكْ verb شَاوِرْ noun شَطْـَٔ noun شَٰطِئِ adjective شَٰمِخَٰتٍ noun شَٰخِصَةٌ noun شَفَتَيْنِ verb شَغَفَ
The occurrence base of a lexeme is the set of suras in which it occurs. Let's look for lexemes that occur in a single sura.
Since we have already found the hapaxes, we also compile the list with the hapaxes left out.
A.indent(reset=True)
A.info("Finding single sura lexemes")
lexSuraIndex = {}
for (lex, occs) in lexIndex.items():
    lexSuraIndex[lex] = set(L.u(n, otype="sura")[0] for n in occs)
singleSura = [
    (lex, occs)
    for (lex, occs) in lexIndex.items()
    if len(lexSuraIndex.get(lex, [])) == 1
]
singleSuraWithoutHapax = [(lex, occs) for (lex, occs) in singleSura if len(occs) != 1]
A.info("{} single sura lexemes found".format(len(singleSura)))
for data in (singleSura, singleSuraWithoutHapax):
    print("=====================================")
    for (lex, occs) in sorted(data[0:10]):
        print(
            "{:<15} ({}x) first {:>5} last {:>5}".format(
                lex,
                len(occs),
                "{}:{}".format(*T.sectionFromNode(occs[0])),
                "{}:{}".format(*T.sectionFromNode(occs[-1])),
            )
        )
0.00s Finding single sura lexemes 0.61s 2228 single sura lexemes found ===================================== >aZolama (1x) first 2:20 last 2:20 Ha*ar (2x) first 2:19 last 2:243 Say~ib (1x) first 2:19 last 2:19 baEuwDap (1x) first 2:26 last 2:26 magoDuwb (1x) first 1:7 last 1:7 nuqad~isu (1x) first 2:30 last 2:30 rabiHat (1x) first 2:16 last 2:16 vamarap (1x) first 2:25 last 2:25 yasofiku (2x) first 2:30 last 2:84 {sotawoqada (1x) first 2:17 last 2:17 ===================================== $aTor (5x) first 2:144 last 2:150 Ha*ar (2x) first 2:19 last 2:243 Hayov2 (2x) first 2:144 last 2:150 Hur~ (2x) first 2:178 last 2:178 Sibogap (2x) first 2:138 last 2:138 baqarap (4x) first 2:67 last 2:71 huwd2 (3x) first 2:111 last 2:140 taTaw~aEa (2x) first 2:158 last 2:184 yasofiku (2x) first 2:30 last 2:84 yataEal~amu (2x) first 2:102 last 2:102
As a final exercise with lexemes, let's make a list of all suras, showing for each its total number of lexemes and the number of lexemes that occur exclusively in that sura.
A.indent(reset=True)
A.info("Making sura-lexeme index")
allSura = collections.defaultdict(set)
allLex = set()
for s in F.otype.s("sura"):
    for w in L.d(s, "word"):
        ln = F.lemma.v(w)
        allSura[s].add(ln)
        allLex.add(ln)
A.info("Found {} lexemes".format(len(allLex)))
0.00s Making sura-lexeme index 0.08s Found 4833 lexemes
A.indent(reset=True)
A.info("Finding single sura lexemes")
lexSuraIndex = {}
for (lex, occs) in lexIndex.items():
    lexSuraIndex[lex] = set(L.u(n, otype="sura")[0] for n in occs)
singleSuraLex = collections.defaultdict(set)
for (lex, suras) in lexSuraIndex.items():
    if len(suras) == 1:
        singleSuraLex[list(suras)[0]].add(lex)
singleSura = {sura: len(lexs) for (sura, lexs) in singleSuraLex.items()}
A.info("found {} single sura lexemes".format(sum(singleSura.values())))
0.00s Finding single sura lexemes 0.60s found 2228 single sura lexemes
print(
    "{:<30} {:>4} {:>4} {:>4} {:>5}\n{}".format(
        "sura name",
        "sura",
        "#all",
        "#own",
        "%own",
        "-" * 51,
    )
)
suraList = []
for s in F.otype.s("sura"):
    suraName = Fs("name@en").v(s)
    sura = F.number.v(s)
    a = len(allSura[s])
    o = singleSura.get(s, 0)
    p = 100 * o / a
    suraList.append((suraName, sura, a, o, p))
for x in sorted(suraList, key=lambda e: (-e[4], -e[2], e[1])):
    print("{:<30} {:>4} {:>4} {:>4.1f}%".format(*x))
sura name sura #all #own %own --------------------------------------------------- Abundance 108 9 4 44.4% Quraysh 106 16 5 31.2% The Dawn 113 17 5 29.4% The Chargers 100 32 9 28.1% Sincerity 112 9 2 22.2% The Traducer 104 28 6 21.4% The Palm Fibre 111 21 4 19.0% The Overwhelming 88 69 13 18.8% The Beneficent 55 142 26 18.3% The Overthrowing 81 77 14 18.2% The Morning Star 86 44 8 18.2% The Elephant 105 22 4 18.2% The Sun 91 45 8 17.8% Defrauding 83 96 17 17.7% The Inevitable 56 206 36 17.5% The City 90 63 11 17.5% The Calamity 101 24 4 16.7% Those who drag forth 79 127 21 16.5% He frowned 80 103 17 16.5% The Resurrection 75 104 17 16.3% The Morning Hours 93 31 5 16.1% The Dawn 89 94 15 16.0% The Emissaries 77 108 17 15.7% Mary 19 360 52 14.4% The Reality 69 157 22 14.0% The Repentance 9 638 89 13.9% The Cave 18 552 71 12.9% The Star 53 188 24 12.8% The Cow 2 1137 145 12.8% Joseph 12 512 65 12.7% Mankind 114 16 2 12.5% The Pen 68 171 21 12.3% The Cloaked One 74 155 19 12.3% The Moon 54 188 22 11.7% The Enshrouded One 73 129 14 10.9% The Announcement 78 122 13 10.7% The Splitting Open 84 76 8 10.5% Taa-Haa 20 483 50 10.4% The Light 24 416 43 10.3% Those drawn up in Ranks 37 360 37 10.3% Noah 71 128 13 10.2% The Table 5 685 69 10.1% The letter Saad 38 338 34 10.1% Competition 102 20 2 10.0% The Night Journey 17 533 53 9.9% The Clans 33 454 43 9.5% The Women 4 810 75 9.3% The Pilgrimage 22 486 44 9.1% The Fig 95 34 3 8.8% Muhammad 47 239 21 8.8% The letter Qaaf 50 207 18 8.7% The Jinn 72 139 12 8.6% The Prophets 21 423 35 8.3% The Clot 96 50 4 8.0% Sheba 34 330 26 7.9% The Cattle 6 725 57 7.9% The Family of Imraan 3 761 59 7.8% The Spoils of War 8 400 31 7.8% The Ascending Stairways 70 142 11 7.7% Man 76 155 12 7.7% The Winnowing Winds 51 194 15 7.7% The Stories 28 468 36 7.7% The Mount 52 182 14 7.7% The Inner Apartments 49 160 12 7.5% The Poets 26 410 30 7.3% Hud 11 554 40 7.2% The Declining Day, Epoch 103 14 1 7.1% The Bee 16 552 39 7.1% The Exile 59 213 15 7.0% The 
Night 92 57 4 7.0% The Ant 27 414 29 7.0% The Rock 15 289 20 6.9% The Heights 7 819 51 6.2% The Criterion 25 372 23 6.2% The Pleading Woman 58 198 12 6.1% Ornaments of gold 43 334 20 6.0% The Victory 48 253 15 5.9% The Cleaving 82 54 3 5.6% Divorce 65 148 8 5.4% Abraham 14 334 18 5.4% The Iron 57 249 13 5.2% The Thunder 13 348 18 5.2% The Believers 23 392 20 5.1% The Evidence 98 59 3 5.1% The Most High 87 61 3 4.9% The Romans 30 290 14 4.8% Almsgiving 107 21 1 4.8% Yaseen 36 298 14 4.7% The Sovereignty 67 171 8 4.7% The Consolation 94 22 1 4.5% Luqman 31 248 11 4.4% The Smoke 44 183 8 4.4% The Power, Fate 97 23 1 4.3% The Originator 35 335 14 4.2% The Prohibition 66 144 6 4.2% The Opening 1 24 1 4.2% The Constellations 85 77 3 3.9% Explained in detail 41 311 12 3.9% The Forgiver 40 398 15 3.8% The Earthquake 99 28 1 3.6% The Groups 39 393 14 3.6% She that is to be examined 60 158 5 3.2% The Hypocrites 63 103 3 2.9% Jonas 10 486 13 2.7% Consultation 42 304 8 2.6% The Dunes 46 275 7 2.5% Crouching 45 201 5 2.5% The Ranks 61 124 3 2.4% The Spider 29 336 7 2.1% The Prostration 32 193 2 1.0% Friday 62 104 1 1.0% Mutual Disillusion 64 138 1 0.7% Divine Support 110 19 0 0.0% The Disbelievers 109 9 0 0.0%
What we did for suras, we can also do for the other section types.
We generalize the task into a function that accepts the section type as a parameter. Then we can call that function for all our section types.
def lexBase(section):
    # make indices
    lexemesPerSection = {}
    sectionsPerLexeme = {}
    for s in F.otype.s(section):
        for w in L.d(s, otype="word"):
            lex = F.lemma.v(w)
            lexemesPerSection.setdefault(s, set()).add(lex)
            sectionsPerLexeme.setdefault(lex, set()).add(s)
    print(
        "{:<10} {:>4} {:>4} {:>5}\n{}".format(
            section,
            "#all",
            "#own",
            "%own",
            "-" * 26,
        )
    )
    sectionList = []
    for s in F.otype.s(section):
        n = F.number.v(s)
        myLexes = lexemesPerSection[s]
        a = len(myLexes)
        o = len([lex for lex in myLexes if len(sectionsPerLexeme[lex]) == 1])
        p = 100 * o / a
        sectionList.append((n, a, o, p))
    for x in sorted(sectionList, key=lambda e: (-e[3], -e[1], e[0])):
        print("{:<10} {:>4} {:>4} {:>4.1f}%".format(*x))
    print("=" * 26)
for section in (
    "manzil",
    # 'sajda',
    # 'juz',
    # 'ruku',
    # 'hizb',
    # 'page',
):
    lexBase(section)
manzil #all #own %own -------------------------- 7 2120 685 32.3% 4 1907 415 21.8% 1 1694 302 17.8% 2 1773 316 17.8% 5 1580 235 14.9% 3 1493 222 14.9% 6 1516 215 14.2% ==========================
We travel upwards and downwards, forwards and backwards through the nodes.
The Layer-API (L
) provides functions: u()
for going up, and d()
for going down,
n()
for going to next nodes and p()
for going to previous nodes.
These directions are indirect notions: nodes are just numbers, but by means of the
oslots
feature they are linked to slots. One node contains another node if the one is linked to a set of slots that contains the set of slots that the other is linked to.
And one node is next or previous to another if its slots follow or precede the slots of the other one.
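The containment rule can be mimicked with plain Python sets. This is a minimal sketch with made-up node numbers and slot sets, not the real oslots data:

```python
# Each node is linked to a set of slots; containment is set inclusion.
oslots = {
    100: {1, 2, 3, 4},  # an aya-like node occupying slots 1-4
    200: {2, 3},        # a group-like node occupying slots 2-3
}

def contains(a, b):
    """Node a contains node b if a's slot set is a superset of b's."""
    return oslots[a] >= oslots[b]

print(contains(100, 200))  # -> True: node 100 embeds node 200
print(contains(200, 100))  # -> False
```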
L.u(node)
Up is going to nodes that embed node
.
L.d(node)
Down is the opposite direction, to those that are contained in node
.
L.n(node)
Next are the next adjacent nodes, i.e. nodes whose first slot comes immediately after the last slot of node
.
L.p(node)
Previous are the previous adjacent nodes, i.e. nodes whose last slot comes immediately before the first slot of node
.
All these functions yield nodes of all possible node types. By passing an optional parameter, you can restrict the results to nodes of that type.
The results are ordered according to the order of things in the text.
The functions always return a tuple, even if there is just one node in the result.
We go from the first word to the sura that contains it.
Note the [0] at the end. You expect one sura, yet L returns a tuple.
To get the only element of that tuple, you need the [0].
If you are like me, you keep forgetting it, and that will lead to weird error messages later on.
firstSura = L.u(1, otype="sura")[0]
print(firstSura)
218169
And let's see all the containing objects of word 3:
w = 3
for otype in F.otype.all:
if otype == F.otype.slotType:
continue
up = L.u(w, otype=otype)
upNode = "x" if len(up) == 0 else up[0]
print("word {} is contained in {} {}".format(w, otype, upNode))
word 3 is contained in manzil 216987 word 3 is contained in sajda x word 3 is contained in juz 212125 word 3 is contained in sura 218169 word 3 is contained in hizb 211885 word 3 is contained in ruku 217598 word 3 is contained in page 216994 word 3 is contained in aya 128220 word 3 is contained in lex 212156 word 3 is contained in group 134457
Let's go to the next nodes of the first sura.
afterFirstSura = L.n(firstSura)
for n in afterFirstSura:
    print(
        "{:>7}: {:<13} first slot={:<6}, last slot={:<6}".format(
            n,
            F.otype.v(n),
            E.oslots.s(n)[0],
            E.oslots.s(n)[-1],
        )
    )
secondSura = L.n(firstSura, otype="sura")[0]
49: word first slot=49 , last slot=49 134485: group first slot=49 , last slot=49 128227: aya first slot=49 , last slot=49 216995: page first slot=49 , last slot=112 217599: ruku first slot=49 , last slot=149 218170: sura first slot=49 , last slot=10291
And let's see what is right before the second sura.
for n in L.p(secondSura):
    print(
        "{:>7}: {:<13} first slot={:<6}, last slot={:<6}".format(
            n,
            F.otype.v(n),
            E.oslots.s(n)[0],
            E.oslots.s(n)[-1],
        )
    )
218169: sura first slot=1 , last slot=48 217598: ruku first slot=1 , last slot=48 216994: page first slot=1 , last slot=48 128226: aya first slot=34 , last slot=48 134484: group first slot=47 , last slot=48 48: word first slot=48 , last slot=48
We go to the ayas of the second sura, and just count them.
ayas = L.d(secondSura, otype="aya")
print(len(ayas))
286
We pick the first aya and the first word, and explore what is above and below them.
for n in [1, L.u(1, otype="aya")[0]]:
    A.indent(level=0)
    A.info("Node {}".format(n), tm=False)
    A.indent(level=1)
    A.info("UP", tm=False)
    A.indent(level=2)
    A.info("\n".join(["{:<15} {}".format(u, F.otype.v(u)) for u in L.u(n)]), tm=False)
    A.indent(level=1)
    A.info("DOWN", tm=False)
    A.indent(level=2)
    A.info("\n".join(["{:<15} {}".format(u, F.otype.v(u)) for u in L.d(n)]), tm=False)
A.indent(level=0)
A.info("Done", tm=False)
Node 1 | UP | | 134456 group | | 128220 aya | | 216994 page | | 217598 ruku | | 218169 sura | | 211885 hizb | | 212125 juz | | 216987 manzil | DOWN | | Node 128220 | UP | | 216994 page | | 217598 ruku | | 218169 sura | | 211885 hizb | | 212125 juz | | 216987 manzil | DOWN | | 134456 group | | 1 word | | 2 word | | 134457 group | | 3 word | | 134458 group | | 4 word | | 5 word | | 134459 group | | 6 word | | 7 word Done
So far, we have mainly seen nodes and their numbers, and the names of node types. You would almost forget that we are dealing with text. So let's try to see some text.
In the same way as F
gives access to feature data,
T
gives access to the text.
That is also feature data, but you can tell Text-Fabric which features are specifically
carrying the text, and in return Text-Fabric offers you
a Text API: T
.
Arabic text can be represented in a number of ways:
If you wonder where the information about text formats is stored:
not in the program text-fabric, but in the data set.
It has a feature otext
, which specifies the formats and which features
must be used to produce them. otext
is the third special feature in a TF data set,
next to otype
and oslots
.
It is an optional feature.
If it is absent, there will be no T
API.
Here is a list of all available formats in this data set.
sorted(T.formats)
['lex-trans-full', 'root-trans-full', 'text-orig-full', 'text-trans-full']
Now let's use those formats to print out the first aya of the Quran.
a1 = F.otype.s("aya")[0]
for fmt in sorted(T.formats):
    print("{}:\n\t{}".format(fmt, T.text(a1, fmt=fmt, descend=True)))
lex-trans-full: {som {ll~ah r~aHoma`n r~aHiym root-trans-full: smw Alh rHm rHm text-orig-full: بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ text-trans-full: bisomi {ll~ahi {lr~aHoma`ni {lr~aHiymi
If we do not specify a format, the default format is used (text-orig-full
).
print(T.text(a1))
بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ
Part of the pleasure of working with computers is that they can crunch massive amounts of data. The text of the Quran is a piece of cake.
It takes less than a second to have that cake and eat it, in a handful of formats.
A.indent(reset=True)
A.info("writing plain text of whole Quran in all formats")
text = collections.defaultdict(list)
for a in F.otype.s("aya"):
    words = L.d(a, "word")
    for fmt in sorted(T.formats):
        text[fmt].append(T.text(words, fmt=fmt))
A.info("done {} formats".format(len(text)))
for fmt in sorted(text):
    print("{}\n{}\n".format(fmt, "\n".join(text[fmt][0:5])))
0.00s writing plain text of whole Quran in all formats 0.90s done 4 formats lex-trans-full {som {ll~ah r~aHoma`n r~aHiym Hamod {ll~ah rab~ Ea`lamiyn r~aHoma`n r~aHiym ma`lik yawom diyn <iy~aA Eabada <iy~aA {sotaEiynu root-trans-full smw Alh rHm rHm Hmd Alh rbb Elm rHm rHm mlk ywm dyn Ebd Ewn text-orig-full بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ ٱلْحَمْدُ لِلَّهِ رَبِّ ٱلْعَٰلَمِينَ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ مَٰلِكِ يَوْمِ ٱلدِّينِ إِيَّاكَ نَعْبُدُ وَإِيَّاكَ نَسْتَعِينُ text-trans-full bisomi {ll~ahi {lr~aHoma`ni {lr~aHiymi {loHamodu lil~ahi rab~i {loEa`lamiyna {lr~aHoma`ni {lr~aHiymi ma`liki yawomi {ld~iyni <iy~aAka naEobudu wa<iy~aAka nasotaEiynu
We write a few formats to file, in your Downloads
folder.
orig = "text-orig-full"
trans = "text-trans-full"
for fmt in (orig, trans):
    with open(os.path.expanduser(f"~/Downloads/Quran-{fmt}.txt"), "w", encoding="utf-8") as f:
        f.write("\n".join(text[fmt]))
!head -n 20 ~/Downloads/Quran-{orig}.txt
بِسْمِ ٱللَّهِ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ ٱلْحَمْدُ لِلَّهِ رَبِّ ٱلْعَٰلَمِينَ ٱلرَّحْمَٰنِ ٱلرَّحِيمِ مَٰلِكِ يَوْمِ ٱلدِّينِ إِيَّاكَ نَعْبُدُ وَإِيَّاكَ نَسْتَعِينُ ٱهْدِنَا ٱلصِّرَٰطَ ٱلْمُسْتَقِيمَ صِرَٰطَ ٱلَّذِينَ أَنْعَمْتَ عَلَيْهِمْ غَيْرِ ٱلْمَغْضُوبِ عَلَيْهِمْ وَلَا ٱلضَّآلِّينَ الٓمٓ ذَٰلِكَ ٱلْكِتَٰبُ لَا رَيْبَ فِيهِ هُدًى لِّلْمُتَّقِينَ ٱلَّذِينَ يُؤْمِنُونَ بِٱلْغَيْبِ وَيُقِيمُونَ ٱلصَّلَوٰةَ وَمِمَّا رَزَقْنَٰهُمْ يُنفِقُونَ وَٱلَّذِينَ يُؤْمِنُونَ بِمَآ أُنزِلَ إِلَيْكَ وَمَآ أُنزِلَ مِن قَبْلِكَ وَبِٱلْءَاخِرَةِ هُمْ يُوقِنُونَ أُو۟لَٰٓئِكَ عَلَىٰ هُدًى مِّن رَّبِّهِمْ وَأُو۟لَٰٓئِكَ هُمُ ٱلْمُفْلِحُونَ إِنَّ ٱلَّذِينَ كَفَرُوا۟ سَوَآءٌ عَلَيْهِمْ ءَأَنذَرْتَهُمْ أَمْ لَمْ تُنذِرْهُمْ لَا يُؤْمِنُونَ خَتَمَ ٱللَّهُ عَلَىٰ قُلُوبِهِمْ وَعَلَىٰ سَمْعِهِمْ وَعَلَىٰٓ أَبْصَٰرِهِمْ غِشَٰوَةٌ وَلَهُمْ عَذَابٌ عَظِيمٌ وَمِنَ ٱلنَّاسِ مَن يَقُولُ ءَامَنَّا بِٱللَّهِ وَبِٱلْيَوْمِ ٱلْءَاخِرِ وَمَا هُم بِمُؤْمِنِينَ يُخَٰدِعُونَ ٱللَّهَ وَٱلَّذِينَ ءَامَنُوا۟ وَمَا يَخْدَعُونَ إِلَّآ أَنفُسَهُمْ وَمَا يَشْعُرُونَ فِى قُلُوبِهِم مَّرَضٌ فَزَادَهُمُ ٱللَّهُ مَرَضًا وَلَهُمْ عَذَابٌ أَلِيمٌۢ بِمَا كَانُوا۟ يَكْذِبُونَ وَإِذَا قِيلَ لَهُمْ لَا تُفْسِدُوا۟ فِى ٱلْأَرْضِ قَالُوٓا۟ إِنَّمَا نَحْنُ مُصْلِحُونَ أَلَآ إِنَّهُمْ هُمُ ٱلْمُفْسِدُونَ وَلَٰكِن لَّا يَشْعُرُونَ وَإِذَا قِيلَ لَهُمْ ءَامِنُوا۟ كَمَآ ءَامَنَ ٱلنَّاسُ قَالُوٓا۟ أَنُؤْمِنُ كَمَآ ءَامَنَ ٱلسُّفَهَآءُ أَلَآ إِنَّهُمْ هُمُ ٱلسُّفَهَآءُ وَلَٰكِن لَّا يَعْلَمُونَ
!head -n 20 ~/Downloads/Quran-{trans}.txt
bisomi {ll~ahi {lr~aHoma`ni {lr~aHiymi {loHamodu lil~ahi rab~i {loEa`lamiyna {lr~aHoma`ni {lr~aHiymi ma`liki yawomi {ld~iyni <iy~aAka naEobudu wa<iy~aAka nasotaEiynu {hodinaA {lS~ira`Ta {lomusotaqiyma Sira`Ta {l~a*iyna >anoEamota Ealayohimo gayori {lomagoDuwbi Ealayohimo walaA {lD~aA^l~iyna Al^m^ *a`lika {lokita`bu laA rayoba fiyhi hudFY l~ilomut~aqiyna {l~a*iyna yu&ominuwna bi{logayobi wayuqiymuwna {lS~alaw`pa wamim~aA razaqona`humo yunfiquwna wa{l~a*iyna yu&ominuwna bimaA^ >unzila <ilayoka wamaA^ >unzila min qabolika wabi{lo'aAxirapi humo yuwqinuwna >uw@la`^}ika EalaY` hudFY m~in r~ab~ihimo wa>uw@la`^}ika humu {lomufoliHuwna <in~a {l~a*iyna kafaruwA@ sawaA^'N Ealayohimo 'a>an*arotahumo >amo lamo tun*irohumo laA yu&ominuwna xatama {ll~ahu EalaY` quluwbihimo waEalaY` samoEihimo waEalaY`^ >aboSa`rihimo gi$a`wapN walahumo Ea*aAbN EaZiymN wamina {ln~aAsi man yaquwlu 'aAman~aA bi{ll~ahi wabi{loyawomi {lo'aAxiri wamaA hum bimu&ominiyna yuxa`diEuwna {ll~aha wa{l~a*iyna 'aAmanuwA@ wamaA yaxodaEuwna <il~aA^ >anfusahumo wamaA ya$oEuruwna fiY quluwbihim m~araDN fazaAdahumu {ll~ahu maraDFA walahumo Ea*aAbN >aliymN[ bimaA kaAnuwA@ yako*ibuwna wa<i*aA qiyla lahumo laA tufosiduwA@ fiY {lo>aroDi qaAluw^A@ <in~amaA naHonu muSoliHuwna >alaA^ <in~ahumo humu {lomufosiduwna wala`kin l~aA ya$oEuruwna wa<i*aA qiyla lahumo 'aAminuwA@ kamaA^ 'aAmana {ln~aAsu qaAluw^A@ >anu&ominu kamaA^ 'aAmana {ls~ufahaA^'u >alaA^ <in~ahumo humu {ls~ufahaA^'u wala`kin l~aA yaEolamuwna
A section in the Quran is a sura or an aya.
Knowledge of sections is not baked into Text-Fabric.
The config feature otext.tf
may specify two or three section levels, and tell
what the corresponding node types and features are.
From that knowledge it can construct mappings from nodes to sections, e.g. from aya nodes to tuples of the form:
(sura number, aya number)
Here are examples of getting the section that corresponds to a node and vice versa.
NB: sectionFromNode always delivers an aya specification, either from the first slot belonging to that node, or, if you pass lastSlot=True, from the last slot belonging to that node.
for x in (
    ("sura, aya of first word", T.sectionFromNode(1)),
    ("node of 1:1", T.nodeFromSection((1, 1))),
    ("node of 2:1", T.nodeFromSection((2, 1))),
    ("node of sura 1", T.nodeFromSection((1,))),
    ("section of sura node", T.sectionFromNode(211890)),
    ("section of aya node", T.sectionFromNode(210000)),
    ("section of juz node", T.sectionFromNode(216850)),
    ("idem, now last word", T.sectionFromNode(216850, lastSlot=True)),
):
    print("{:<30} {}".format(*x))
sura, aya of first word (1, 1) node of 1:1 128220 node of 2:1 128227 node of sura 1 218169 section of sura node (2, 92) section of aya node (80, 23) section of juz node (85, 2) idem, now last word (85, 2)
The other sectional units in the Quran, manzil
, sajda
, juz
, ruku
, hizb
, page
are not associated with special Text-Fabric functions in this data set, although we could have
chosen to use two or three of them instead of sura and aya.
But, TF also offers the possibility to define your own sections, independent from and more flexible than the sections defined above.
For a bit more on sections, consult the sections recipe in the cookbook.
This data source contains English (by Arberry) and Dutch (by Leemhuis) translations of the Quran.
They are stored in the features translation@en
and translation@nl
for aya nodes.
Let's get the translations of sura 107, together with the arabic original.
The translation features are not loaded by default, so we load them first.
TF.load("translation@en translation@nl", add=True)
sura = 107
suraNode = T.nodeFromSection((sura,))
print(F.name.v(suraNode))
for ayaNode in L.d(suraNode, otype="aya"):
    print(f"{F.number.v(ayaNode)}")
    print(T.text(ayaNode))
    print(Fs("translation@en").v(ayaNode))
    print(Fs("translation@nl").v(ayaNode))
0.00s All additional features loaded - for details use TF.isLoaded() الماعون 1 أَرَءَيْتَ ٱلَّذِى يُكَذِّبُ بِٱلدِّينِ Hast thou seen him who cries lies to the Doom? Heb jij hem gezien die de godsdienst loochent? 2 فَذَٰلِكَ ٱلَّذِى يَدُعُّ ٱلْيَتِيمَ That is he who repulses the orphan Dat is hij die de wees wegduwt 3 وَلَا يَحُضُّ عَلَىٰ طَعَامِ ٱلْمِسْكِينِ and urges not the feeding of the needy. en die er niet op aandringt de behoeftige voedsel te geven. 4 فَوَيْلٌ لِّلْمُصَلِّينَ So woe to those that pray En wee hen die de salaat bidden 5 ٱلَّذِينَ هُمْ عَن صَلَاتِهِمْ سَاهُونَ and are heedless of their prayers, die hun salaat veronachtzamen, 6 ٱلَّذِينَ هُمْ يُرَآءُونَ to those who make display die vertoon willen maken 7 وَيَمْنَعُونَ ٱلْمَاعُونَ and refuse charity. en die de hulpverlening weigeren.
CC-BY Dirk Roorda