#!/usr/bin/env python
# coding: utf-8

# # scona
#
# scona is a tool to perform network analysis over correlation networks of brain regions.
# This tutorial will go through the basic functionality of scona, taking us from our inputs (a matrix of structural regional measures over subjects) to a report of local network measures for each brain region, and to network-level comparisons against a cohort of random graphs with the same degree sequence.

# In[2]:

import numpy as np
import networkx as nx
import scona as scn
import scona.datasets as datasets


# ### Importing data
#
# A scona analysis starts with four inputs.
# * __regional_measures__
#   A pandas DataFrame with subjects as rows. The columns should include structural measures for each brain region, as well as any subject-wise covariates.
# * __names__
#   A list of names of the brain regions. This will be used to specify which columns of the __regional_measures__ matrix you want to correlate over.
# * __covars__ _(optional)_
#   A list of your covariates. This will be used to specify which columns of __regional_measures__ you wish to correct for.
# * __centroids__
#   A list of tuples representing the Cartesian coordinates of brain regions. This list should be in the same order as the list of brain regions, so that coordinates are correctly assigned to regions. The coordinates are expected to obey the convention that the x=0 plane separates the left and right hemispheres of the brain.

# In[3]:

# Read in sample data from the NSPN WhitakerVertes PNAS 2016 paper.
df, names, covars, centroids = datasets.NSPN_WhitakerVertes_PNAS2016.import_data()

# In[4]:

df.head()


# ### Create a correlation matrix
#
# We calculate the residuals of the matrix df for the columns of names, correcting for the columns in covars.

# In[5]:

df_res = scn.create_residuals_df(df, names, covars)

# In[6]:

df_res

# Now we create a correlation matrix over the columns of df_res.

# In[7]:

M = scn.create_corrmat(df_res, method='pearson')


# ## Create a weighted graph
#
# A short sidenote on the BrainNetwork class: this is a very lightweight subclass of the [`Networkx.Graph`](https://networkx.github.io/documentation/stable/reference/classes/graph.html) class. This means that any methods you can use on a `Networkx.Graph` object can also be used on a `BrainNetwork` object, although the reverse is not true. We have added various methods which allow us to keep track of measures that have already been calculated; this saves a lot of time later on, when one is dealing with 10^3 random graphs.
#
# All scona measures are implemented in such a way that they can be used on a regular `Networkx.Graph` object. For example, instead of `G.threshold(10)` you can use `scn.threshold_graph(G, 10)`.
#
# You can also create a `BrainNetwork` from a `Networkx.Graph` `G`, using `scn.BrainNetwork(network=G)`.
#
# Initialise a weighted graph `G` from the correlation matrix `M`. The `parcellation` and `centroids` arguments are used to label nodes with names and coordinates respectively.

# In[8]:

G = scn.BrainNetwork(network=M, parcellation=names, centroids=centroids)


# ### Threshold to create a binary graph
#
# We threshold G at cost 10 to create a binary graph with 10% as many edges as the complete graph G. Ordinarily when thresholding one takes the 10% of edges with the highest weight. In our case, because we want the resulting graph to be connected, we calculate a minimum spanning tree first. If you want to omit this step, pass the argument `mst=False` to `threshold`. A quick sketch of the module-level equivalent follows.
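# As an aside (a minimal sketch, not part of the original analysis): the same
# thresholding is available as the module-level function `scn.threshold_graph`
# mentioned above, which also works on a plain `Networkx.Graph`. At cost 10,
# roughly 10% of the complete graph's edges should survive.

# In[ ]:

K = scn.threshold_graph(G, 10)                    # equivalent to G.threshold(10)
print(K.number_of_edges() / G.number_of_edges())  # approximately 0.1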
# The threshold method does not edit objects in place.

# In[9]:

H = G.threshold(10)


# ### Calculate nodal summary
#
# `calculate_nodal_measures` will compute and record the following nodal measures:
#
# * average_dist (if centroids are available)
# * total_dist (if centroids are available)
# * betweenness
# * closeness
# * clustering coefficient
# * degree
# * interhem (if centroids are available)
# * interhem_proportion (if centroids are available)
# * nodal partition
# * participation coefficient under the partition calculated above
# * shortest_path_length

# `report_nodal_measures` returns nodal attributes in a DataFrame. Let's try it now.

# In[11]:

H.report_nodal_measures().head()

# Use `calculate_nodal_measures` to fill in a bunch of nodal measures.

# In[12]:

H.calculate_nodal_measures()

# In[14]:

H.report_nodal_measures().head()

# We can also add measures just as one would normally add nodal attributes to a networkx graph.

# In[15]:

nx.set_node_attributes(H, name="hat", values={x: x**2 for x in H.nodes})

# These show up in our DataFrame too.

# In[17]:

H.report_nodal_measures(columns=['name', 'degree', 'hat']).head()


# ### Calculate global measures

# In[18]:

H.calculate_global_measures()

# In[20]:

H.rich_club();


# ## Create a GraphBundle
#
# The `GraphBundle` object is the scona way to handle across-network comparisons. What is it? Essentially it's a python dictionary with `BrainNetwork` objects as values.

# In[21]:

brain_bundle = scn.GraphBundle([H], ['NSPN_cost=10'])

# This creates a dictionary-like object with the BrainNetwork `H` keyed by `'NSPN_cost=10'`.

# In[22]:

brain_bundle

# Now add a series of random graphs created by edge swap randomisation of `H` (keyed by `'NSPN_cost=10'`).

# In[23]:

# Note that 10 is not usually a sufficient number of random graphs for
# meaningful analysis; it is used here to save time.
brain_bundle.create_random_graphs('NSPN_cost=10', 10)

# In[24]:

brain_bundle


# ### Report on a GraphBundle
#
# The following method will calculate global measures (if they have not already been calculated) for all of the graphs in `brain_bundle` and report the results in a DataFrame. We can do the same for rich club coefficients below.

# In[25]:

brain_bundle.report_global_measures()

# In[26]:

brain_bundle.report_rich_club()
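# As a final illustration (a sketch, not part of the original tutorial): with
# the global measures reported above, a common next step is to compare the real
# network to the random graphs, e.g. by z-scoring each measure against the
# random distribution. This assumes the DataFrame returned by
# `report_global_measures` is indexed by the graph names used in the bundle.

# In[ ]:

global_df = brain_bundle.report_global_measures()
real = global_df.loc['NSPN_cost=10']    # measures for the real network
rand = global_df.drop('NSPN_cost=10')   # measures for the 10 random graphs
(real - rand.mean()) / rand.std()       # z-score of each global measure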