Social Network Analysis of Leviticus 17-26

This notebook combines the participant references and semantic roles computed in other phases of this research project. The two data types are combined into a social network model, which is then explored with social network analysis (SNA) tools. The first SNA measures are given in this notebook, while more detailed studies of participant roles are reserved for other notebooks in this repo.

Contents

  1. Import of data
  2. Cross-tabulating participant and semantic roles
  3. Creation of network model
  4. Validation of the model
  5. First social network analyses
In [1]:
#Dataset path
PATH = 'datasets/'

import csv, collections, html
from operator import itemgetter
import pandas as pd
import numpy as np
import scipy
import matplotlib.pyplot as plt
import seaborn as sns
from adjustText import adjust_text
import networkx as nx
import forceatlas2
import random
Warning: uncompiled fa2util module.  Compile with cython for a 10-100x speed boost.

1. Import data

In [2]:
#Importing the Hebrew data and Text-Fabric
from tf.app import use
A = use('bhsa', hoist=globals(), mod='etcbc/heads/tf')
TF-app: C:\Users\Ejer/text-fabric-data/annotation/app-bhsa/code
data: C:\Users\Ejer/text-fabric-data/etcbc/bhsa/tf/c
data: C:\Users\Ejer/text-fabric-data/etcbc/phono/tf/c
data: C:\Users\Ejer/text-fabric-data/etcbc/parallels/tf/c
data: C:\Users\Ejer/text-fabric-data/etcbc/heads/tf/c
Text-Fabric API: names N F E L T S C TF directly usable

1.a Import of participant reference data:

In [3]:
df = pd.read_csv(f'{PATH}participants_FINAL.csv')
df.columns = ['participant','refs']
df.head()
Out[3]:
participant refs
0 JHWH 944128 946176 946179 946182 946184 944142 9441...
1 MCH= 945152 945537 945155 945540 945547 945555 9449...
2 >HRN 944640 944641 65555 944662 65561 944666 944667...
3 BN JFR>L 67584 944132 944133 944139 946216 946217 94417...
4 >JC >JC 945664 64514 945666 945668 944135 944136 94567...

The references are transformed to lists, and the frequency of each participant in the corpus is counted:

In [4]:
ref_list = []
participant_freq = []

for row in df.iterrows():
    refs = [int(r) for r in row[1].refs.split()]
    ref_list.append(refs)
    participant_freq.append(len(refs))
    
df.insert(2, 'ref_list', ref_list)
df.insert(3, 'freq', participant_freq)
In [5]:
df.head()
Out[5]:
participant refs ref_list freq
0 JHWH 944128 946176 946179 946182 946184 944142 9441... [944128, 946176, 946179, 946182, 946184, 94414... 476
1 MCH= 945152 945537 945155 945540 945547 945555 9449... [945152, 945537, 945155, 945540, 945547, 94555... 60
2 >HRN 944640 944641 65555 944662 65561 944666 944667... [944640, 944641, 65555, 944662, 65561, 944666,... 164
3 BN JFR>L 67584 944132 944133 944139 946216 946217 94417... [67584, 944132, 944133, 944139, 946216, 946217... 579
4 >JC >JC 945664 64514 945666 945668 944135 944136 94567... [945664, 64514, 945666, 945668, 944135, 944136... 277
In [6]:
print(f'Number of participants: {len(df)}')
Number of participants: 75

Two functions fetch the participant label from any given word or phrase in the text.

In [7]:
def getLabel(ref, df=df):
    '''
    This function fetches the actor/participant reference from the participant dataframe.
    '''
    
    actor_list = []
    
    for row in df.iterrows():
        if ref in row[1].ref_list:
            actor_list.append(row[1].participant)
    
    return actor_list

def Actor(ref, df=df):
    '''
    This function takes a reference as input and returns the participant label. Phrases are treated differently, because 
    non-verbal phrases require additional measures to find the nominal head of the phrase and return the label for that 
    particular constituent.
    '''
    
    nom_head = E.nhead.t(ref) #Finding the nominal head(s) of the phrase
    
    if F.otype.v(ref) == 'word': #Identifying object suffixes
        return getLabel(ref, df=df)
    
    elif F.typ.v(ref) == 'VP':
        return getLabel(L.d(ref, 'phrase_atom')[0], df=df)
    
    elif F.typ.v(ref) == 'PP':
        if len(nom_head) > 1:
            return getLabel(L.d(ref, 'phrase_atom')[0], df=df)
        if nom_head != E.head.t(ref): #If equal, the reference is a simple preposition with a suffix
            return getLabel(L.u(nom_head[0], 'phrase_atom')[0], df=df)
        else:
            if getLabel(E.head.t(ref)[0], df=df):
                return getLabel(E.head.t(ref)[0], df=df)
            else:
                return getLabel(L.u(nom_head[0], 'phrase_atom')[0], df=df)
        
    elif F.typ.v(ref) in {'NP','PrNP','PPrP','DPrP','CP'}:
        return getLabel(L.u(nom_head[0], 'phrase_atom')[0], df=df)
        
    else:
        return "error"

#Actor(65418)

1.b Import agency ranks of participants

In [8]:
ranks_df = pd.read_csv(f'{PATH}role_ranks.csv', index_col=0)
ranks_df.head()
Out[8]:
Vol Inst Aff neg role new_role new_rank rank
688348 y y n NaN Agent Agent 5 5
688349 y n y NaN Volitional Undergoer Volitional Undergoer -1 -1
688350 y y n NaN Agent Agent 5 5
688351 y y n NaN Agent Agent 5 5
688352 y n y NaN Volitional Undergoer Volitional Undergoer -1 -1

A function is defined to return the agency of any given reference:

In [9]:
def Agency(ref, colname, df=ranks_df):
    
    if ref in list(df.index):
        return df[df.index == ref][colname].item()

#Agency(68032, 'new_rank')

2. Cross-tabulating participants and roles

This section cross-tabulates the participant and role data to calculate the mean agency of each participant.

In [10]:
actor_list = [Actor(ph) for ph in list(ranks_df.index)]
ranks_df.insert(8, 'Actor', actor_list) #The actor is inserted as a new column
In [11]:
ranks_df.head()
Out[11]:
Vol Inst Aff neg role new_role new_rank rank Actor
688348 y y n NaN Agent Agent 5 5 [JHWH]
688349 y n y NaN Volitional Undergoer Volitional Undergoer -1 -1 [MCH=]
688350 y y n NaN Agent Agent 5 5 [JHWH]
688351 y y n NaN Agent Agent 5 5 [MCH=]
688352 y n y NaN Volitional Undergoer Volitional Undergoer -1 -1 [>HRN, BN JFR>L, BN >HRN]

Cross-tabulation of the data to count how often each participant obtains a certain agency level:

In [15]:
dic = collections.defaultdict(lambda: collections.defaultdict(int))

for row in ranks_df.iterrows():
    for n in row[1].Actor:
        dic[n][row[1].new_rank] += 1
        
agency_df = pd.DataFrame(dic).fillna(0).astype('Int64').T
agency_df = agency_df[[5,4,3,1,0,-1,-2]]
agency_df.head()
Out[15]:
5 4 3 1 0 -1 -2
JHWH 118 0 1 8 29 30 17
MCH= 36 0 1 0 1 19 0
>HRN 16 0 11 31 1 19 10
BN JFR>L 99 0 44 72 28 83 31
BN >HRN 16 0 6 22 5 17 5
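
The same table could have been built directly with pandas (a sketch; `explode` turns each Actor list into one row per participant):

In [ ]:
#Sketch: equivalent cross-tabulation via pandas (explode requires pandas >= 0.25)
exploded = ranks_df.explode('Actor')
pd.crosstab(exploded['Actor'], exploded['new_rank'])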

The mean agency is calculated as the weighted mean of the agency levels:

In [16]:
agency_mean = []

for row in agency_df.iterrows():
    n=0
    total = 0
    for v in row[1]:
        total += (v * agency_df.columns[n])
        n+=1
    agency_mean.append(round(total/row[1].sum(), 3))
    
agency_df.insert(7, 'mean', agency_mean)

#Inserting labels (label_gloss is defined in cell [14] below)
labels = [label_gloss[l] if l in label_gloss else l for l in list(agency_df.index)]
agency_df.insert(0, 'label', labels)
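
Equivalently, the weighted mean can be computed in vectorized form (a sketch; assumes the integer rank columns above):

In [ ]:
#Vectorized alternative (sketch): weighted mean over the rank columns
rank_cols = [5, 4, 3, 1, 0, -1, -2]
(agency_df[rank_cols] * rank_cols).sum(axis=1) / agency_df[rank_cols].sum(axis=1)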
In [17]:
agency_df = agency_df[agency_df.sum(axis=1) > 20]
agency_df.sort_values(by='mean', ascending=False)
Out[17]:
label 5 4 3 1 0 -1 -2 mean
MCH= Moses 36 0 1 0 1 19 0 2.877
JHWH YHWH 118 0 1 8 29 30 17 2.645
>JC >JC an_Israelite 60 0 22 7 4 6 38 2.124
2ms 2msg 21 0 10 57 8 8 2 1.698
BN JFR>L Israelites 99 0 44 72 28 83 31 1.552
GR sojourner 45 0 16 5 13 9 38 1.532
BN >HRN Aaron's_sons 16 0 6 22 5 17 5 1.310
>HRN Aaron 16 0 11 31 1 19 10 1.193
>X -2ms brother 11 0 3 1 16 10 13 0.537
HM remnants 3 2 4 0 2 5 13 0.138
<M foreign_nations 3 0 1 0 5 3 10 -0.227

3. Creating nodes and edges

The network model combines participant data and semantic roles. The primary principle is to isolate those clauses in which at least two participants occur (they can be identical), which means that isolated participants are ignored. Secondly, edges are drawn from the participant with the highest agency level in a clause to the other participants of that clause. We assume that the participant with the highest agency level is also the most active in the event and is therefore its source.
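
As a minimal illustration of the principle (a toy sketch with hypothetical references and ranks, not the real clause inventory):

In [ ]:
#Toy sketch: sort a clause inventory by rank and draw edges from the top-ranked participant
inventory = [{'ref': 1, 'rank': -1}, {'ref': 2, 'rank': 5}, {'ref': 3, 'rank': 0}]
ranked = sorted(inventory, key=itemgetter('rank'), reverse=True)
[(ranked[0]['ref'], u['ref']) for u in ranked[1:]] #[(2, 3), (2, 1)]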

In [18]:
def createEdges(colname, df=df, ranks_df=ranks_df, verb_list = [], relation='function', label_text='gloss', mode=str()):
    '''
    Input: dictionary of actors + nodes (references), plus preferred text type, that is, English gloss (default)
        or transcription of the Hebrew lexeme (= trans)
        colname is name of the rank column (usually "rank" or "new_rank")
    Output: dictionary of edges and labels
    '''
    
    error_list = []
    
    #Finding intersection between nodes
    clause_node_list = []
    for i, row in df.iterrows():
        refs = [int(r) for r in row.refs.split()]
        clause_node_list += list(set([L.u(n, 'clause')[0] for n in refs]))
        
    #Intersections are calculated by counting the frequency of unique clauses. If a clause appears more than once, there is
    #an intersection
    counter = collections.Counter(clause_node_list)
    intersection = [n for n in counter if counter[n] > 1]
         
    edges = []
    
    if intersection:
        
        for cl in intersection: #Looping over clauses with intersecting actors
            
            clause_inventory = []
            pred = False
            
            for ph in L.d(cl, 'phrase'):
                ph_info = {}
                sfx_info = {} #Dictionary for object suffixes
                
                rank = Agency(ph, colname, ranks_df)
                
                #Get verb gloss if Predicate
                if F.function.v(ph) in {'Pred','PreS','PreO','PtcO','PreC'}:
                    pred = True
                    
                    #Finding verb gloss:
                    for w in L.d(ph, 'word'):
                        if F.sp.v(w) == 'verb':
                            pred_gloss, pred_lex = F.gloss.v(L.u(w, 'lex')[0]), F.lex.v(w)
                
                #If the phrase is annotated with a rank (agency), it is fetched.
                if rank or rank == 0:
                    
                    ph_info['ref'] = ph
                    ph_info['function'] = F.function.v(ph)
                    ph_info['rank'] = rank
                    
                    clause_inventory.append(ph_info)
                    
                #If object suffix, the suffix info is stored separately and added to the clause inventory
                if F.function.v(ph) in {'PreO','PtcO'}:
                    for w in L.d(ph, 'word'):
                        if F.sp.v(w) == 'verb' and (Agency(w, colname, ranks_df) or Agency(w, colname, ranks_df) == 0):
                            sfx_info['ref'] = w
                            sfx_info['function'] = F.function.v(ph)
                            sfx_info['rank'] = Agency(w, colname, ranks_df)
                                
                            clause_inventory.append(sfx_info)
            
            if pred and pred_lex != 'HJH[' and len(clause_inventory) > 1:
                ranked = sorted(clause_inventory, key=itemgetter('rank'), reverse = True)                    
                    
                #Getting Actor and labels
                Actor_ref = ranked[0]['ref']
                Actor_rank = ranked[0]['rank']
                Actors = Actor(Actor_ref, df=df) #A list of Actors
                
                if Actors == 'error':
                    error_list.append((cl, Actor_ref))
                        
                #Creating edges from Actor to Undergoer(s)
                for Undergoer in ranked[1:]:
                    Undergoer_ref = Undergoer['ref']
                    Undergoer_rank = Undergoer['rank']
                    Undergoers = Actor(Undergoer_ref, df=df)
                    
                    if Undergoers == 'error':
                        error_list.append((cl, Undergoer_ref))
                    
                    if (Actors and Undergoers) and (Undergoers != 'error') and (Actors != 'error'):
                        for A in Actors:
                            for U in Undergoers:
                                
                                if mode == 'one-mode':
                                    edge = (A, Actor_ref, Actor_rank, U, Undergoer_ref, Undergoer_rank, pred_gloss, cl)
                                    edges.append(edge)
                                elif mode == 'two-mode':
                                    Actor_edge = (A, Actor_ref, Actor_rank, pred_gloss, cl)
                                    Undergoer_edge = (pred_gloss, U, Undergoer_ref, Undergoer_rank, cl)
                                    edges.append(Actor_edge), edges.append(Undergoer_edge)
                                else:
                                    print("You need to specify mode")
                                
        return edges, error_list

Two models are created to account for the two versions of the agency data. The 'old' data does not account for negations in the clause, while the 'new' data involves a recalculation of the agency (NB: the recalculation is done in another notebook)

In [19]:
old_edges = createEdges(colname='rank',df=df, mode='one-mode')
print(len(old_edges[0]))

#With new ranks because of negatives (e.g. Agent -> Frustrative)
new_edges = createEdges(colname='new_rank',df=df,mode='one-mode')
482

Explore errors:

In [ ]:
errors = old_edges[1]
In [ ]:
for e in errors:
    A.pretty(e[0], highlights={e[1]:'gold'})

Both errors concern adverbial phrases referring to a location, so they are not important.

We will remove edges for which both the Actor and the Undergoer have an agency of 0 (Neutral). In these cases there is no interaction, so those relations are not important:

In [20]:
def removeNeutral(edge_list):
    upd_edge_list = []
    
    for e in edge_list[0]:
        Actor_rank = e[2]
        Undergoer_rank = e[5]
        
        if Actor_rank == 0 and Undergoer_rank == 0:
            continue
        else:
            upd_edge_list.append(e)
            
    return upd_edge_list
            
old_edges = removeNeutral(old_edges)
new_edges = removeNeutral(new_edges)

4. Validation and export of the network model

4a. Validation

Before the final export, the edges need review. Several issues need validation:

  • Are all relevant clauses included?
  • Are the participants annotated correctly?
  • Are the roles annotated correctly?

The review is carried out manually but assisted by an interface and color-coding: 'green' signals that the clause is included in the network, 'salmon' signals its absence.

In [ ]:
first_verse = T.nodeFromSection(('Leviticus',17,1))
last_verse = T.nodeFromSection(('Leviticus',26,46))

clauses = range(L.d(first_verse, 'clause')[0], L.d(last_verse, 'clause')[0]+2)
verbal_clauses = []
for cl in clauses:
    pred = False
    for ph in L.d(cl, 'phrase'):
        if F.function.v(ph) in {'Pred','PreS','PreO','PtcO','PreC'}:
            pred = True
            for w in L.d(ph, 'word'):
                if F.sp.v(w) == 'verb' and F.lex.v(w) != 'HJH[':
                    verbal_clauses.append(cl)

print(f'Number of clauses to review: {len(verbal_clauses)}')
In [ ]:
def validate(clauses, edges, n):
    print(f'Nr {n}: {clauses[n]}')
    
    df = pd.DataFrame(edges)
    edge_clauses = list(df[7])
    
    if clauses[n] in edge_clauses:
        subset = df[df[7] == clauses[n]]
        
        for i, row in subset.iterrows():
            print(f'Actor: {row[0]} - Agency: {row[2]}')
            print(f'Undergoer: {row[3]} - Agency: {row[5]}\n')
        
        A.pretty(clauses[n], highlights={clauses[n]:'lightgreen'})
    
    else:
        A.pretty(clauses[n], highlights={clauses[n]:'salmon'})
In [ ]:
n=0
In [ ]:
validate(verbal_clauses, old_edges, n)
n+=1

-----Update: All corrections made -----

Lev 17

  • BN >HRN added to participant: Need to be listed as part-whole relations across the entire text
  • 'MN QRB/ <M/ ->JC >JC' added to Nodes
  • KHN added to roles
  • 'F<JR=' added to Nodes
  • 'B TWK/ -BJT JFR>L' and 'B TWK/ -BJT JFR>L#2' added to Nodes
  • L KM added to Roles

Lev 18

  • 'CH >B -2ms' added to Nodes
  • 'CH >X -2ms' added to Nodes
  • '<RWH/ ->CH#2' corrected in Nodes
  • 'MN QRB/ <M/ -NPC' added in Nodes
  • 'L PNH/ -BN JFR>L' added in Nodes

Lev 19

  • '>B >JC' and '>M >JC' added to participants: Need to be listed as part-whole relations across the entire text
  • '<NJ' added to participants
  • 'B CM/ -JHWH' and '>T CM/ >LHJM/ -2ms' added to Nodes
  • 'XRC=/' added to Nodes
  • 'PNH/ DL/' and 'PNH/ GDWL/' added to Nodes
  • '>T BN/ <M/ -2ms' changed in collocations
  • Changing Aktionsart of QWM (4_Export_Aktionsart)
  • 'PNH/ ZQN/' added to Nodes

Lev 20

  • '>T CM/ QDC/ -JHWH' added to Nodes
  • '>JC' changed role
  • '>T >CH/ >JC/' changed role
  • 'MN QRB/ <M/ ->JC', 'MN QRB/ <M/ -KL', 'MN QRB/ <M/ -NPC', and 'MN QRB/ <M/ -CNJM -' added to Nodes

Lev 21

  • Passive corrected in 2c_Instigation
  • MCH changed in participants
  • '>CH_2' added in participants
  • BN JFR>L added in participants: Need to be listed as part-whole relations across the entire text

Lev 22

  • NPC_2 added in participants (Aaron's offspring)
  • NPC_3 added in participants (A chattel-slave)
  • Make sure that the compound reference ">HRN BN >HRN" is only deleted if the references have been successfully distributed to either >HRN or BN >HRN
  • BN JFR>L changed in participants
  • Hypernyms across the text: >JC (top-level) refers both to GR and a native
  • '>JC#2' added to Nodes: Refers to "any man" within the household of the priest

Lev 23

  • Hypernyms across the text: 'L H <NJ/ W L H GR/' refers to the poor and the stranger. If this hypernym is removed before the references are distributed to the involved participants, the references will be missing.

Lev 24

  • BN >CH changed in Affectedness
  • 3mp removed from Participants (one instance)
  • JHWH changed in Affectedness

Lev 25

  • When '<BD -2ms>MH -2ms' is removed (because it is a hypernym), the participants are missing. Hypernyms need to be constructed at top-level before removal.
  • Skip clauses with HJH? They are not interactions
  • 'JD ->JC', 'JD ->X -2ms' and 'JD GR TWCB' added to Nodes (as synonyms)
  • '>X -2ms' changed in Affectedness
  • GR TWCB#2 has been changed in Participants to distinguish "your brother" from "foreigners" although they are sometimes given the same label.

Lev 26

  • NPL changed in Aktionsart to stative
  • XMC changed in Nodes from XMC/
  • PNH changed in Affectedness
  • 'L PNH/ >JB[ -<M' changed in Nodes

4b. Export

Export nodes

In [21]:
new_df = pd.DataFrame(new_edges)
In [22]:
old_df = pd.DataFrame(old_edges)
old_df.insert(3, 'new_rank_Actor', new_df.iloc[:,2])
old_df.insert(7, 'new_rank_Undergoer', new_df.iloc[:,5])
old_df.head()
Out[22]:
0 1 2 new_rank_Actor 3 4 5 new_rank_Undergoer 6 7
0 BN >HRN 690343 5 5 JHWH 690347 0 0 swing 440323
1 JHWH 690383 5 5 MCH= 690384 -1 -1 speak 440335
2 BN JFR>L 690397 5 5 JHWH 690399 -1 -1 approach 440341
3 JHWH 690402 5 5 MCH= 690403 -1 -1 speak 440342
4 BN JFR>L 690415 5 5 JHWH 690417 -1 -1 approach 440347

The labels (generated from the ETCBC transliteration) will be replaced with more readable ones:

In [14]:
label_gloss = {'>CH BN -2ms': 'daughter-in-law',
               '>DM': 'human_being',
               'GR': 'sojourner',
               '>CH#2': 'woman_in_menstruation',
               '>X -2ms': 'brother',
               'BFR/ BN/ -<M': 'children',
               '>T BT/ BN/ ->CH W >T BT/ BT/ ->CH': 'granddaughter_of_woman',
               'MLK=': 'idols',
               'NPC#3': 'slave',
               '<M': 'foreign_nations',
               '>T >CH/ >JC/': "fellow's_wife",
               'ZR': 'lay-person',
               '>B -2ms': 'father',
               'JHWH': 'YHWH',
               '2mp_sfx': '2mpl',
               '>HRN': 'Aaron',
               'MN >JC/ ->CH': 'husband',
               'DWDH -2ms': 'aunt-in-law',
               '>CH >M ->CH': 'woman_and_her_mother',
               'RDP': 'no-one',
               'BN >CH': 'blasphemer',
               'XRC=/': 'deaf',
               'BN JFR>L': 'Israelites',
               'C>R >B -2ms': 'aunt',
               'KL': 'group_of_people',
               '<RWH/ -<RWH -2ms': 'granddaughter',
               'PNH/ ZQN/': 'elderly',
               'BTWLH/': 'virgin',
               '>JC >JC': 'an_Israelite',
               'BN ->X -2ms': 'son_of_brother',
               'QNH': 'purchaser',
               '>JC >CH': 'man/woman',
               '<RWH/ >CH/ W BT/ ->CH': 'woman_and_her_daughter',
               '3mp': 'witnesses',
               '>L MCPXT/ ->JC': 'clan',
               'BT >B -2msBT >M -2ms': 'sister',
               'PNH/ GDWL/': 'rich',
               '>XD': "brother's_brother",
               '>T== ZKR=/': 'male',
               '2ms': '2msg',
               '>XWT ->CH': 'sister_of_woman',
               'BN TWCB': 'sons_of_sojourners',
               '>M -2ms': 'mother',
               'L >JC/': 'man',
               'ZR< ->JC': 'offspring',
               'PNH/ DL/': 'poor',
               'L PNH/ <WR/': 'blind',
               '>CH': 'woman',
               '>CH >B -2ms': "father's_wife",
               'MCH=': 'Moses',
               'BN >HRN': "Aaron's_sons",
               'BT -2ms': 'daughter',
               'CPXH': 'handmaid',
               'C>R -HW>': 'relative',
               '>LMNH GRC XLL': 'widowed/expelled/defiled_woman',
               'HM': 'remnants',
               '>T PGR/ -<M': 'corpse',
               'DWD ->X -2ms': "brother's_uncle",
               'B <M/ -2ms': 'kinsmen'
}
In [24]:
edges_df = old_df

Source = []
Target = []

for n, row in edges_df.iterrows():
    source = row[0]
    target = row[3]
    
    Source.append(label_gloss[source])
    Target.append(label_gloss[target])
    
edges_df.insert(1, 'Source', Source)
edges_df.insert(6, 'Target', Target)
In [25]:
edges_df
Out[25]:
0 Source 1 2 new_rank_Actor Target 3 4 5 new_rank_Undergoer 6 7
0 BN >HRN Aaron's_sons 690343 5 5 YHWH JHWH 690347 0 0 swing 440323
1 JHWH YHWH 690383 5 5 Moses MCH= 690384 -1 -1 speak 440335
2 BN JFR>L Israelites 690397 5 5 YHWH JHWH 690399 -1 -1 approach 440341
3 JHWH YHWH 690402 5 5 Moses MCH= 690403 -1 -1 speak 440342
4 BN JFR>L Israelites 690415 5 5 YHWH JHWH 690417 -1 -1 approach 440347
... ... ... ... ... ... ... ... ... ... ... ... ...
472 DWD ->X -2ms brother's_uncle 691326 5 5 brother >X -2ms 68032 -1 -1 redeem 440637
473 L >JC/ man 689041 0 0 handmaid CPXH 689040 -2 -2 spend autumn 439885
474 MN >JC/ ->CH husband 689652 5 5 widowed/expelled/defiled_woman >LMNH GRC XLL 689651 -2 -2 drive out 440088
475 3mp witnesses 690660 5 5 blasphemer BN >CH 66980 -2 -2 settle 440424
476 3mp witnesses 690675 5 5 blasphemer BN >CH 690677 0 0 support 440429

477 rows × 12 columns

The weight of a tie between two participants is defined as the squared difference between the Actor rank and the Undergoer rank: weight = (Actor_rank - Undergoer_rank)^2. Two weight columns are created, one for the original rank and one for the new rank (which takes negations into account):

In [26]:
old_weight = (edges_df[2]-edges_df[5])**2
new_weight = (edges_df['new_rank_Actor']-edges_df['new_rank_Undergoer'])**2

#Insert weights: the squared difference between the Actor rank and the Undergoer rank
edges_df.insert(12, 'old_weight', old_weight)
edges_df.insert(13, 'new_weight', new_weight)
In [27]:
edges_df.head()
Out[27]:
0 Source 1 2 new_rank_Actor Target 3 4 5 new_rank_Undergoer 6 7 old_weight new_weight
0 BN >HRN Aaron's_sons 690343 5 5 YHWH JHWH 690347 0 0 swing 440323 25 25
1 JHWH YHWH 690383 5 5 Moses MCH= 690384 -1 -1 speak 440335 36 36
2 BN JFR>L Israelites 690397 5 5 YHWH JHWH 690399 -1 -1 approach 440341 36 36
3 JHWH YHWH 690402 5 5 Moses MCH= 690403 -1 -1 speak 440342 36 36
4 BN JFR>L Israelites 690415 5 5 YHWH JHWH 690417 -1 -1 approach 440347 36 36

We produce two files, one for dynamic networks and one for static networks (only the static export is shown here):

In [28]:
static = edges_df[['Source','new_rank_Actor','Target','new_rank_Undergoer',6,'new_weight',7]]
static.columns = ['Source_label','Source_agency','Target_label','Target_agency','Label','Weight','Clause'] #Names match the columns used in sections 4.c and 5

#Export
static.to_excel('Lev17-26.edges.Static.xlsx', index=None)

4.c Compare with older datasets

For the sake of consistency, it is possible to easily compare the changes that are made in new models in comparison to old ones. This helps to update the data without going through a manual validation.

In [ ]:
data_old = pd.read_excel('Lev17-26.edges.Static_Old.xlsx')
data_new = pd.read_excel('Lev17-26.edges.Static.xlsx')
In [ ]:
data_new.head()
In [ ]:
len(data_new)-len(data_old)
i. Check if edges have been removed or added
In [ ]:
review_edges1 = []
review_edges2 = []

for n, row in data_new.iterrows():
    if row.Clause in list(data_old.Clause):
        subset_old = data_old[data_old.Clause == row.Clause]
        match = False
        for n1, row1 in subset_old.iterrows():
            if row1.Source_label == row.Source_label and row1.Target_label == row.Target_label and row1.Label == row.Label:
                match = True
        if not match:
            review_edges1.append(row.Clause)        
    else:
        review_edges1.append(row.Clause) #Clause is added in new dataset
        
for n, row in data_old.iterrows():
    if row.Clause in list(data_new.Clause): #Compare against the new dataset
        subset_new = data_new[data_new.Clause == row.Clause]
        match = False
        for n1, row1 in subset_new.iterrows():
            if row1.Source_label == row.Source_label and row1.Target_label == row.Target_label and row1.Label == row.Label:
                match = True
        if not match:
            review_edges2.append(row.Clause)        
    else:
        review_edges2.append(row.Clause) #Clause was removed from the new dataset
In [ ]:
#review_edges1
#review_edges2
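
A more compact alternative (a sketch): an outer merge with an indicator column flags added and removed edges in one pass.

In [ ]:
#Sketch: compare the two edge lists via an outer merge
key_cols = ['Source_label', 'Target_label', 'Label', 'Clause']
merged = data_old.merge(data_new[key_cols], on=key_cols, how='outer', indicator=True)
added = merged[merged['_merge'] == 'right_only'] #Edges only in the new dataset
removed = merged[merged['_merge'] == 'left_only'] #Edges only in the old dataset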
ii. Check if identical edges have the same weight
In [ ]:
review_edges3 = []

for n, row in data_new.iterrows():
    if row.Clause in list(data_old.Clause):
        subset_old = data_old[data_old.Clause == row.Clause]
        match = False
        for n1, row1 in subset_old.iterrows():
            if row1.Source_label == row.Source_label and row1.Target_label == row.Target_label and row1.Label == row.Label and row1.Weight == row.Weight:
                match = True
        if not match:
            review_edges3.append(row.Clause)        
    else:
        review_edges3.append(row.Clause) #Clause is added in new dataset
In [ ]:
review_edges3 = [e for e in review_edges3 if e not in review_edges1 and e not in review_edges2]
In [ ]:
review_edges3

5. Social Network Analysis

The network model can now be explored with SNA tools, in this case NetworkX.

5.a Visualization

In [ ]:
data = pd.read_excel('Lev17-26.edges.Static.xlsx')
data.head()
In [ ]:
G = nx.MultiGraph()

for n, row in data.iterrows():  
    G.add_edge(row.Source_label, row.Target_label)
    
pos = { i : (random.random(), random.random()) for i in G.nodes()}
l = forceatlas2.forceatlas2_networkx_layout(G, pos, niter=2000, gravity=30, scalingRatio=2.0)
In [ ]:
weight = collections.Counter(G.edges()) #Number of parallel edges per node pair

for u, v, d in G.edges(data=True):
    d['weight'] = weight[u, v] #Edge weight = multiplicity of the tie
In [ ]:
plt.figure(figsize = (15,15))

nx.draw_networkx(G, l, node_color='violet', node_size=[n[1]*10 for n in G.degree()], 
                 edge_color='grey', width=[d['weight']/3 for _, _, d in G.edges(data=True)])

plt.axis('off')
plt.margins(x=0.1, y=0.1)

plt.savefig('screenshots/Leviticus_SNA.png', dpi=500)
plt.show()

Number of nodes and edges:

In [ ]:
print(f'Nodes: {len(G.nodes())}\nEdges: {len(G.edges())}')

Having created the edges and built a multigraph (a directed version, MultiDiGraph, follows below), we can now explore the resulting network. We will begin with a general inspection:

5.b Cohesion and network density

One of the simplest measures of cohesion ("knittedness") is density: the number of ties in the network as a proportion of the number of possible ties.
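
For an undirected graph, density is d = 2m / (n(n-1)), with m edges and n nodes. A quick check by hand (a sketch; note that parallel edges of the MultiGraph are all counted, so d can exceed 1):

In [ ]:
#Density by hand (sketch): undirected formula d = 2m / (n(n-1))
n_nodes, n_edges = G.number_of_nodes(), G.number_of_edges()
2 * n_edges / (n_nodes * (n_nodes - 1))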

In [ ]:
nx.density(G)

Density is sensitive to the size of the network: large networks tend to have lower density than small networks, simply because it is more realistic for a member of a small network to be connected to most of the remaining participants.

Therefore, another approach is the average degree (for an undirected graph, 2m/n):

In [ ]:
degree = G.degree()
sum_degree = sum(dict(degree).values())
print(f'Average degree: {sum_degree/len(G.nodes())}')
In [ ]:
G = nx.MultiDiGraph()

for n, row in data.iterrows():  
    G.add_edge(row.Source_label, row.Target_label)
In [ ]:
outdegree_sequence = collections.Counter(sorted([d for n, d in G.out_degree()], reverse=True))
indegree_sequence = collections.Counter(sorted([d for n, d in G.in_degree()], reverse=True))

outdegree_df = pd.DataFrame(outdegree_sequence, index=[0]).T
indegree_df = pd.DataFrame([indegree_sequence]).T
In [ ]:
degree_df = pd.concat([indegree_df, outdegree_df], axis=1, sort=False)
degree_df.columns = ['indegree','outdegree']
degree_df
In [ ]:
fig, ax = plt.subplots(figsize=(15,7))

plt.bar(degree_df.index, degree_df.indegree, width=0.33)
plt.bar(degree_df.index+0.33, degree_df.outdegree, color='tomato', width=0.33)

ax.legend(labels=['indegree', 'outdegree'], fontsize=14)
plt.ylabel("Count", size=14)
plt.xlabel("Degree", size=14)
plt.xticks(size=12)
plt.yticks(size=12)
plt.show()

Cumulative:

In [ ]:
len(G.nodes())
In [ ]:
indegree_cum = [n/len(G.nodes())*100 for n in np.cumsum(degree_df.fillna(0).indegree)]
outdegree_cum = [n/len(G.nodes())*100 for n in np.cumsum(degree_df.fillna(0).outdegree)]
degree_df.insert(2, "indegree_cum (%)", indegree_cum)
degree_df.insert(3, "outdegree_cum (%)", outdegree_cum)
In [ ]:
degree_df

Most connected participants:

In [ ]:
top_degree = sorted(dict(degree).items(), key=itemgetter(1), reverse=True)

A cumulative view:

In [ ]:
cum_degree = pd.DataFrame(top_degree)
cum_degree.columns = ['participant','degree']

degree_cum = [n/(len((G.edges()))*2)*100 for n in np.cumsum(cum_degree.degree)]
cum_degree.insert(2, "degree_cum (%)", degree_cum)
In [ ]:
cum_degree.head(10)

Updated graph:

In [ ]:
fig, ax1 = plt.subplots(figsize=(15,7))
ax2 = ax1.twinx()

ax1.bar(degree_df.index, degree_df.indegree, width=0.33)
ax1.bar(degree_df.index+0.33, degree_df.outdegree, color='tomato', width=0.33)

ax2.plot(degree_df.index, degree_df['indegree_cum (%)'], linestyle='--', alpha=0.5)
ax2.plot(degree_df.index, degree_df['outdegree_cum (%)'], linestyle='--', alpha=0.5)

ax1.legend(frameon=1, labels=['indegree', 'outdegree'], fontsize=14, facecolor='white', framealpha=1)
ax1.set_ylabel("Count", size=14)
ax2.set_ylabel("Cumulative %", size=14)
ax1.set_xlabel("Degree", size=14)
plt.xticks(size=12)
plt.yticks(size=12)
plt.show()

Inspect values:

In [ ]:
G.degree()
In [ ]:
G.out_degree()

Degree proportion of selected participants:

In [ ]:
sel_part = sum(dict(G.degree(['YHWH', 'Moses','Israelites','sojourner','2ms','an_Israelite'])).values())

print(f'{round(sel_part/sum(dict(G.degree()).values())*100, 2)}%')

5.c Reciprocity

Reciprocity concerns whether an interaction from one actor to another is returned, or whether the relation is one-sided. A simple measure of reciprocity is the number of reciprocal ties divided by the total number of ties. For this analysis, we are not interested in the weights of the edges, only in the binary value (connected or not).
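
A toy example of this definition (a sketch; only the a-b tie is returned):

In [ ]:
#Reciprocity by hand (toy sketch): fraction of directed edges whose reverse also exists
toy = nx.DiGraph([('a', 'b'), ('b', 'a'), ('a', 'c')])
toy_edges = set(toy.edges())
sum((v, u) in toy_edges for (u, v) in toy_edges) / len(toy_edges) #2/3, same as nx.reciprocity(toy)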

In [ ]:
digraph = nx.DiGraph()

for n, row in data.iterrows():
    digraph.add_edge(row.Source_label, row.Target_label)
In [ ]:
nx.reciprocity(digraph)
In [ ]:
reci_df = pd.DataFrame([nx.reciprocity(digraph, digraph.nodes())]).T.sort_values(by=0, ascending=False)
In [ ]:
fig, ax = plt.subplots(figsize=(15,5))

plt.bar(reci_df.index, reci_df[0], width=0.33)
plt.ylabel("fraction", size=14)
plt.xticks(size=11, rotation=45, ha='right')
plt.yticks(size=12)

plt.show()

5.d Centrality

We use four measures of node centrality, which together give an image of the core and periphery of the network: in-degree, out-degree, betweenness, and PageRank.

In [ ]:
indegree = nx.in_degree_centrality(digraph)
outdegree = nx.out_degree_centrality(digraph)
betweenness = nx.betweenness_centrality(digraph)
pagerank = nx.pagerank(digraph)
In [ ]:
centrality = pd.DataFrame([indegree, outdegree, betweenness, pagerank]).T
centrality.columns = ['indegree','outdegree','betweenness','pagerank']
centrality

Top ten scores for the centrality measures:

In [ ]:
def top(measure, df=centrality):
    return df.sort_values(by=measure, ascending=False)[measure][:10]

fig, (ax1, ax2, ax3, ax4) = plt.subplots(1, 4, figsize=(15,5), sharey=True)

ax1.bar(top('outdegree').index, top('outdegree'))
ax1.set_title("Outdegree", size=16)
ax2.bar(top('indegree').index, top('indegree'))
ax2.set_title("Indegree", size=16)
ax3.bar(top('betweenness').index, top('betweenness'))
ax3.set_title("Betweenness", size=16)
ax4.bar(top('pagerank').index, top('pagerank'))
ax4.set_title("PageRank", size=16)

for ax in fig.axes:
    plt.sca(ax)
    plt.xticks(rotation=45, ha='right', size=12)

plt.show()