
Attributives

We want to list all nouns together with their adjectival modifiers. To that end we produce a tab-separated file of phrases that contain a noun with adjectival modifiers. The columns are:

passage label
phrase text
phrase gloss
head of an attributive subphrase
attributive subphrase
number of words in the head
number of nouns in the head

Hebrew text is represented in ETCBC consonantal transcription, which makes it easy to import into Excel. Generating fully vocalized Hebrew is not difficult, but then you need OpenOffice to open the csv file. We write both variants, each to its own file.
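If you prefer to process the file with Python instead of a spreadsheet, the standard csv module can read it. The following is a minimal sketch, assuming the column layout listed above. The sample lines are copied from the output excerpt at the end of this notebook, and an in-memory buffer stands in for the real file.

```python
import csv
import io

# Header plus two rows copied from the output excerpt in this notebook.
# In practice you would open attributives_ec.csv instead of a StringIO.
sample = (
    "passage\tphrase_text\tphrase_gloss\thead\tattributive\t#words_mother\t#nouns_mother\n"
    "Genesis 1:8\tJWM #NJ00 \tday second\tJWM \t#NJ00 \t1\t1\n"
    "Genesis 1:13\tJWM #LJ#J00 \tday third\tJWM \t#LJ#J00 \t1\t1\n"
)

# QUOTE_NONE keeps any quote characters in the text literal
reader = csv.DictReader(io.StringIO(sample), delimiter="\t", quoting=csv.QUOTE_NONE)
rows = list(reader)

for row in rows:
    # the text columns carry a trailing space, so strip before use
    print(row["passage"], "|", row["head"].strip(), "|", row["attributive"].strip())
```

All values arrive as strings, so the two count columns need an int() conversion before you do arithmetic on them.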

In [1]:

import sys, os
import collections

import laf
from laf.fabric import LafFabric
from etcbc.preprocess import prepare
fabric = LafFabric()

  0.00s This is LAF-Fabric 4.8.3
API reference: http://laf-fabric.readthedocs.org/en/latest/texts/API-reference.html
Feature doc: https://shebanq.ancient-data.org/static/docs/featuredoc/texts/welcome.html

Loading the feature data

In [2]:

version = '4b'
API = fabric.load('etcbc{}'.format(version), 'lexicon', 'adjectives', {
    "xmlids": {"node": False, "edge": False},
    "features": ('''
        otype 
        function rela sp
        gloss
        g_word_utf8 trailer_utf8
        book chapter verse number
    ''',
    '''
        mother
    '''),
    "prepare": prepare,
    "primary": False,
}, verbose='DETAIL')
exec(fabric.localnames.format(var='fabric'))

  0.00s LOADING API: please wait ... 
  0.01s DETAIL: COMPILING m: etcbc4b: UP TO DATE
  0.01s USING main: etcbc4b DATA COMPILED AT: 2015-11-02T15-08-56
  0.01s DETAIL: COMPILING a: lexicon: UP TO DATE
  0.01s USING annox: lexicon DATA COMPILED AT: 2016-07-08T14-32-54
  0.03s DETAIL: load main: G.node_anchor_min
  0.12s DETAIL: load main: G.node_anchor_max
  0.21s DETAIL: load main: G.node_sort
  0.27s DETAIL: load main: G.node_sort_inv
  0.65s DETAIL: load main: G.edges_from
  0.71s DETAIL: load main: G.edges_to
  0.77s DETAIL: load main: F.etcbc4_db_otype [node] 
  1.36s DETAIL: load main: F.etcbc4_ft_function [node] 
  1.47s DETAIL: load main: F.etcbc4_ft_g_word_utf8 [node] 
  1.74s DETAIL: load main: F.etcbc4_ft_number [node] 
  2.61s DETAIL: load main: F.etcbc4_ft_rela [node] 
  3.15s DETAIL: load main: F.etcbc4_ft_sp [node] 
  3.49s DETAIL: load main: F.etcbc4_ft_trailer_utf8 [node] 
  3.75s DETAIL: load main: F.etcbc4_sft_book [node] 
  3.78s DETAIL: load main: F.etcbc4_sft_chapter [node] 
  3.80s DETAIL: load main: F.etcbc4_sft_verse [node] 
  3.81s DETAIL: load main: F.etcbc4_ft_mother [e] 
  3.93s DETAIL: load main: C.etcbc4_ft_mother -> 
  4.28s DETAIL: load main: C.etcbc4_ft_mother <- 
  4.41s DETAIL: load annox lexicon: F.etcbc4_lex_gloss [node] 
  4.61s LOGFILE=/Users/dirk/laf/laf-fabric-output/etcbc4b/adjectives/__log__adjectives.txt
  4.63s INFO: LOADING PREPARED data: please wait ... 
  4.63s prep prep: G.node_sort
  4.69s prep prep: G.node_sort_inv
  5.12s prep prep: L.node_up
  7.93s prep prep: L.node_down
    13s prep prep: V.verses
    13s prep prep: V.books_la
    13s ETCBC reference: http://laf-fabric.readthedocs.org/en/latest/texts/ETCBC-reference.html
    15s INFO: LOADED PREPARED data
    15s INFO: DATA LOADED FROM SOURCE etcbc4b AND ANNOX lexicon FOR TASK adjectives AT 2016-11-18T18-41-00

Collect data

We need the objects that act as mother to one or more attributive subphrases. A subphrase is attributive when its rela feature has the value atr.

Let us first collect subphrases having rela = atr.

In [3]:

attr_subphrases = set()
inf('Finding subphrases ...')
for s in F.otype.s('subphrase'):
    if F.rela.v(s) != 'atr':
        continue
    attr_subphrases.add(s)
inf('{} attributive subphrases'.format(len(attr_subphrases)))

  7.65s Finding subphrases ...
  8.88s 3106 attributive subphrases
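The same selection can be written as a set comprehension. The sketch below runs without LAF-Fabric: the dictionary rela is a hypothetical stand-in for the rela feature, and its node numbers and values are made up for illustration.

```python
# Hypothetical stand-in for the rela feature: node id -> value.
# Node ids and the non-atr values are invented for this illustration.
rela = {101: 'atr', 102: 'rec', 103: None, 104: 'atr', 105: 'par'}

def attributive_nodes(rela_by_node):
    """Return the set of nodes whose rela value is 'atr'."""
    return {node for (node, value) in rela_by_node.items() if value == 'atr'}

attr_nodes = attributive_nodes(rela)
print('{} attributive subphrases'.format(len(attr_nodes)))
```

With the real data the comprehension would iterate over F.otype.s('subphrase') and test F.rela.v(s), exactly as the loop above does.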

Now let us look up the mother of each of those subphrases. A subphrase without a mother is left out. A subphrase should not have multiple mothers, but we check that anyway.

In [4]:

attr_subphrase_mother = dict()
multiple_mothers = set()
no_mothers = set()
for s in attr_subphrases:
    mothers = list(C.mother.v(s))
    if len(mothers) == 0:
        no_mothers.add(s)
        continue
    if len(mothers) > 1: 
        multiple_mothers.add(s)
        continue
    attr_subphrase_mother[s] = mothers[0]
if len(multiple_mothers):
    msg('{} subphrases with multiple mothers'.format(len(multiple_mothers)))
else:
    inf('No subphrases with multiple mothers')
if len(no_mothers):
    msg('{} subphrases without mothers'.format(len(no_mothers)))
else:
    inf('No subphrases without mothers')

inf('{} attributive subphrases with a single mother'.format(len(attr_subphrase_mother)))

    15s No subphrases with multiple mothers

    15s 12 subphrases without mothers

    15s 3094 attributive subphrases with a single mother
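The three outcomes partition the attributive subphrases, so the counts must add up. Here is a small sketch of that bookkeeping. The dictionary mothers_of is a hypothetical stand-in for the mother edges, with invented node numbers; the final check uses the counts reported above.

```python
# Hypothetical stand-in for the mother edges: subphrase -> list of mothers.
mothers_of = {1: [10], 2: [], 3: [11], 4: [12, 13]}

single = {}         # subphrase -> its only mother
without = set()     # subphrases that have no mother
multiple = set()    # subphrases that have more than one mother
for (s, mothers) in mothers_of.items():
    if len(mothers) == 0:
        without.add(s)
    elif len(mothers) > 1:
        multiple.add(s)
    else:
        single[s] = mothers[0]

# every subphrase lands in exactly one bucket
assert len(single) + len(without) + len(multiple) == len(mothers_of)

# the same bookkeeping with the counts reported above:
# 3106 attributive subphrases, 12 without a mother, 0 with multiple mothers
assert 3106 - 12 - 0 == 3094
```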

Let us get some information about the mothers of those subphrases. What kind of objects are they?

In [5]:

# count the object types of the mothers
mother_types = collections.Counter()
for (s, m) in attr_subphrase_mother.items():
    mother_types[F.otype.v(m)] += 1

for t in sorted(mother_types):
    print('{:>4} subphrases with a mother of type {}'.format(mother_types[t], t))

3094 subphrases with a mother of type subphrase

So the mother is always a subphrase. What about the length of that subphrase?

In [6]:

mother_length = collections.Counter()
for (s, m) in attr_subphrase_mother.items():
    mother_length[len(L.d('word', m))] +=1

for t in sorted(mother_length):
    print('{:>4} subphrases with a mother of length {:>2}'.format(mother_length[t], t))

2085 subphrases with a mother of length  1
 919 subphrases with a mother of length  2
  62 subphrases with a mother of length  3
  14 subphrases with a mother of length  4
  11 subphrases with a mother of length  5
   1 subphrases with a mother of length  7
   1 subphrases with a mother of length  8
   1 subphrases with a mother of length  9

How many nouns does the mother contain?

In [7]:

mother_nouns = collections.Counter()
for (s, m) in attr_subphrase_mother.items():
    mother_nouns[len([w for w in L.d('word', m) if F.sp.v(w) == 'subs'])] +=1

for t in sorted(mother_nouns):
    print('{:>4} subphrases with a mother having {:>2} nouns'.format(mother_nouns[t], t))

  63 subphrases with a mother having  0 nouns
2867 subphrases with a mother having  1 nouns
 137 subphrases with a mother having  2 nouns
  12 subphrases with a mother having  3 nouns
   6 subphrases with a mother having  4 nouns
   8 subphrases with a mother having  5 nouns
   1 subphrases with a mother having  6 nouns
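As a sanity check, both distributions should account for every subphrase with a single mother. The sketch below copies the counts from the two outputs above into plain dictionaries and verifies that each adds up to 3094. It also computes the share of mothers with exactly one noun.

```python
# Counts copied from the outputs above: length in words -> number of subphrases
mother_length = {1: 2085, 2: 919, 3: 62, 4: 14, 5: 11, 7: 1, 8: 1, 9: 1}
# Counts copied from the outputs above: number of nouns -> number of subphrases
mother_nouns = {0: 63, 1: 2867, 2: 137, 3: 12, 4: 6, 5: 8, 6: 1}

total = 3094  # attributive subphrases with a single mother

assert sum(mother_length.values()) == total
assert sum(mother_nouns.values()) == total

# share of attributive subphrases whose mother contains exactly one noun
share_one_noun = mother_nouns[1] / total
print('{:.1%} of the mothers contain exactly one noun'.format(share_one_noun))
```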

Generating output

Let us now assemble all data into the final output. We also produce a header row with the column names.

In [8]:

fields = '''
    passage
    phrase_text
    phrase_gloss
    head
    attributive
    #words_mother
    #nouns_mother
'''.strip().split()
nfields = len(fields)
row_template = ('{}\t' * (nfields - 1))+'{}\n'
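To see what the template does, here is a self-contained sketch that rebuilds it and formats one row. The values are taken from the first data line of the output excerpt at the end of this notebook.

```python
fields = [
    'passage', 'phrase_text', 'phrase_gloss', 'head',
    'attributive', '#words_mother', '#nouns_mother',
]
nfields = len(fields)

# six '{}\t' slots followed by a final '{}\n': one slot per column
row_template = ('{}\t' * (nfields - 1)) + '{}\n'

row = row_template.format(
    'Genesis 1:8', 'JWM #NJ00 ', 'day second', 'JWM ', '#NJ00 ', 1, 1,
)
print(repr(row))
```

Because the template is built from nfields, adding a column to fields automatically adds a slot to every row.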

In [9]:

of_path_template = 'attributives_{}.csv'
# ec = ETCBC consonantal transcription, ha = fully pointed Hebrew
for fmt in ['ec', 'ha']:
    of_path = of_path_template.format(fmt)
    # the with block closes the file even if writing a row fails
    with open(of_path, 'w') as of:
        of.write('{}\n'.format('\t'.join(fields)))
        for s in sorted(attr_subphrase_mother, key=NK):
            sw = list(L.d('word', s))       # words of the attributive subphrase
            p = L.u('phrase', s)            # phrase that contains the subphrase
            pw = list(L.d('word', p))       # words of that phrase
            m = attr_subphrase_mother[s]    # mother (head) of the subphrase
            mw = list(L.d('word', m))       # words of the head

            of.write(row_template.format(
                T.passage(s),
                T.words(pw, fmt=fmt).replace('\n', ' '),
                ' '.join(F.gloss.v(w) for w in pw),
                T.words(mw, fmt=fmt).replace('\n', ' '),
                T.words(sw, fmt=fmt).replace('\n', ' '),
                len(mw),
                len([w for w in mw if F.sp.v(w) == 'subs']),
            ))

    inf('Written {} lines to {}'.format(len(attr_subphrase_mother) + 1, of_path))

    25s Written 3095 lines to attributives_ec.csv
    25s Written 3095 lines to attributives_ha.csv
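The cell above builds each line by hand and replaces newlines in the text, so that a field can never break a row. An alternative, not used in this notebook, is to let the standard csv module do the writing: it quotes any field that contains a tab or a newline. The sketch below writes to an in-memory buffer. The helper write_rows is hypothetical, and the two sample rows are taken from the output excerpt at the end of this notebook.

```python
import csv
import io

fields = [
    'passage', 'phrase_text', 'phrase_gloss', 'head',
    'attributive', '#words_mother', '#nouns_mother',
]

def write_rows(handle, header, rows):
    """Write a header and rows as tab-separated lines; return the line count."""
    writer = csv.writer(handle, delimiter='\t', lineterminator='\n')
    writer.writerow(header)
    writer.writerows(rows)
    return len(rows) + 1

sample_rows = [
    ('Genesis 1:8', 'JWM #NJ00 ', 'day second', 'JWM ', '#NJ00 ', 1, 1),
    ('Genesis 1:13', 'JWM #LJ#J00 ', 'day third', 'JWM ', '#LJ#J00 ', 1, 1),
]

buffer = io.StringIO()
n_lines = write_rows(buffer, fields, sample_rows)
print('Written {} lines'.format(n_lines))
```

For plain rows like these the result is identical to what the template produces; the difference only shows when a field contains a tab, a newline or a quote character.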

Results

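The two output files contain the same rows: one in ETCBC consonantal transcription (attributives_ec.csv) and one in fully pointed Hebrew (attributives_ha.csv).

[Screenshot made in the Numbers program]

The first 1000 characters of the consonantal file: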

In [10]:

with open(of_path_template.format('ec')) as f:
    print(f.read()[0:1000])

passage	phrase_text	phrase_gloss	head	attributive	#words_mother	#nouns_mother
Genesis 1:8	JWM #NJ00 	day second	JWM 	#NJ00 	1	1
Genesis 1:13	JWM #LJ#J00 	day third	JWM 	#LJ#J00 	1	1
Genesis 1:16	>T&#NJ HM>RT HGDLJM 	<object marker> two the lamp the great	HM>RT 	HGDLJM 	2	1
Genesis 1:16	>T&HM>WR HGDL LMM#LT HJWM 	<object marker> the lamp the great to dominion the day	HM>WR 	HGDL 	2	1
Genesis 1:16	>T&HM>WR HQVN LMM#LT HLJLH 	<object marker> the lamp the small to dominion the night	HM>WR 	HQVN 	2	1
Genesis 1:19	JWM RBJ<J00 	day fourth	JWM 	RBJ<J00 	1	1
Genesis 1:20	#RY NP# XJH 	swarming creatures soul alive	NP# 	XJH 	1	1
Genesis 1:21	>T&HTNJNM HGDLJM W>T KL&NP# 	<object marker> the sea-monster the great and <object marker> whole soul	HTNJNM 	HGDLJM 	2	1
Genesis 1:23	JWM XMJ#J00 	day fifth	JWM 	XMJ#J00 	1	1
Genesis 1:24	NP# XJH 	soul alive	NP# 	XJH 	1	1
Genesis 1:30	NP# XJH 	soul alive	NP# 	XJH 	1	1
Genesis 2:2	BJWM H#BJ<J 	in the day the seventh	JWM 	H#BJ<J 	2	1
Genesis 2:2	BJWM H#BJ<J 	i
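Once the file exists, simple questions are easy to answer, for example which heads occur most often. The sketch below is only an illustration: it counts the head column of the twelve complete rows shown in the excerpt above, with the values typed in by hand rather than read from the file.

```python
import collections

# The head column of the twelve complete rows in the excerpt above,
# including the trailing space that the file contains.
heads = [
    'JWM ', 'JWM ', 'HM>RT ', 'HM>WR ', 'HM>WR ', 'JWM ',
    'NP# ', 'HTNJNM ', 'JWM ', 'NP# ', 'NP# ', 'JWM ',
]

# strip the trailing space so that equal heads are counted together
head_counts = collections.Counter(head.strip() for head in heads)

for (head, count) in head_counts.most_common(2):
    print('{:>2} x {}'.format(count, head))
```

Run on the whole file, the same counter would give the frequency of every head in the corpus.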
