#!/usr/bin/env python
# coding: utf-8

# # 8. Create a short complete analysis

# Until now we have only seen pieces of code that perform specific segmentation tasks on images. Typically, however, one runs a complete analysis, including both image processing and some further data analysis.
#
# Here we come back to an earlier dataset where *nuclei* appeared as circles. That dataset was a time-lapse, and we might be interested in knowing how those *nuclei* move over time. So we will have to analyze the image at every time-point, find the position of the *nuclei*, track them and measure the distance traveled.

# In[1]:


# importing packages
import numpy as np
import matplotlib.pyplot as plt
plt.gray();
import skimage.io
# the two submodules below are needed later to label and measure the segmented nuclei
import skimage.morphology
import skimage.measure

# import your function
from course_functions import detect_nuclei, random_cmap

cmap = random_cmap()


# ## 8.1 Remembering previous work

# Let's remember what we did in previous chapters. We opened the tif dataset, selected a specific plane to look at and segmented the *nuclei*. We use the AICS reader again, as we will need to access specific time points instead of loading the entire dataset (see [03-Image_import.ipynb](03-Image_import.ipynb) for details).

# In[2]:


from aicsimageio import AICSImage
img = AICSImage("../Data/30567/30567.tif")


# In[3]:


# load the image to process
image = img.get_image_data("YX", C=0, T=0, Z=3)

# create your mask
nuclei = detect_nuclei(image)

# plot
plt.figure(figsize=(10,10))
plt.imshow(image, cmap = 'gray')
plt.imshow(nuclei, cmap = cmap)
plt.show()


# ## 8.2 Processing a time-lapse

# Thanks to the metadata we also remember the number of time-points and z-planes that are available:

# In[4]:


img.shape


# In[5]:


img.dims


# 72 time points, 2 colors, 5 planes per color.
#
# The *nuclei* are going to move a bit in Z (perpendicular to the image) over time, so it will be more accurate to segment a projection of the entire stack. Again we will use the AICS importer to load a complete stack and project it:

# In[6]:


image_stack = img.get_image_data('ZYX', C=0, T=0)
plt.imshow(np.mean(image_stack, axis = 0));


# Now we can:
# 1. import each time point,
# 2. project the stack,
# 3. segment the image using our previously developed routine,
# 4. use region properties on the segmented image to locate all objects.
#
# A compact sketch of these four steps wrapped into a single helper function is shown just below.
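# Purely as an illustration, these four steps could be bundled into one small function. The name ```centroids_at_time``` is hypothetical (it is not part of ```course_functions```) and the sketch relies on ```img``` and ```detect_nuclei``` defined above; in the cells that follow we instead spell the steps out explicitly, first for a single time point and then in a loop.

# In[ ]:


def centroids_at_time(time):
    """Hypothetical helper: max-project the stack at one time point,
    segment the nuclei and return their centroid coordinates."""
    # 1. import the stack for this time point
    stack = img.get_image_data('ZYX', C=0, T=time)
    # 2. project the stack
    projection = np.max(stack, axis=0)
    # 3. segment with our previously developed routine
    mask = detect_nuclei(projection)
    # 4. locate all objects via region properties
    labels = skimage.morphology.label(mask)
    regions = skimage.measure.regionprops(labels)
    return np.array([r.centroid for r in regions])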
# Let's try with a single time-point first:

# In[7]:


# choose a time
time = 10

# load the stack and segment it
image_stack = img.get_image_data('ZYX', C=0, T=time)
image = np.max(image_stack, axis = 0)
nuclei = detect_nuclei(image)

# find position of nuclei
nuclei_label = skimage.morphology.label(nuclei)
regions = skimage.measure.regionprops(nuclei_label)
centroids = np.array([x.centroid for x in regions])

# plot the result
plt.figure(figsize=(10,10))
plt.imshow(image, cmap = 'gray')
plt.imshow(nuclei, cmap = cmap, vmin = 0, vmax = 1, alpha = 0.6)
plt.plot(centroids[:,1], centroids[:,0], 'o');


# So now we can repeat the same operation for multiple time points and create a list of detected coordinates:

# In[8]:


centroids_time = []
for time in range(10):

    # load the stack and segment it
    image_stack = img.get_image_data('ZYX', C=0, T=time)
    image = np.max(image_stack, axis = 0)
    nuclei = detect_nuclei(image)

    # find position of nuclei
    nuclei_label = skimage.morphology.label(nuclei)
    regions = skimage.measure.regionprops(nuclei_label)
    centroids = np.array([x.centroid for x in regions])

    centroids_time.append(centroids)


# Let's plot all those centroids for all time points:

# In[9]:


for x in centroids_time:
    plt.plot(x[:,1], x[:,0], 'o')


# We definitely see tracks corresponding to single nuclei here. How are we going to track them?

# ## 8.3 Tracking trajectories

# One of the great strengths of Python is the very large number of available packages for all sorts of tasks. For example, if we Google "python tracking", one of the first hits is for the package [trackpy](http://soft-matter.github.io/trackpy/v0.4.2/), which was originally designed to track particle diffusion but can be repurposed for other uses.
#
# Browsing through the documentation and following a quick [walkthrough](http://soft-matter.github.io/trackpy/v0.4.2/tutorial/walkthrough.html), we see that we need the function ```link()``` to perform tracking. That function takes a dataframe as input. A Dataframe is a tabular format implemented by the package Pandas. We are not going into details here, but know that just like Numpy is a foundation of numerical computing, Pandas is a foundation of data science in the Python ecosystem. Here, just think of a Dataframe as a 2D Numpy array where, in addition, each column and row has a label.

# In[10]:


import trackpy
import pandas as pd


# Let's look at the help for the link function:

# In[11]:


help(trackpy.link)


# We see that we have a lot of options, but only two parameters are obligatory: a dataframe ```f``` with columns for x, y and time (frame), and a ```search_range``` defining how distant particles can be within a trajectory.

# ### 8.3.1 The position dataframe

# There are different ways to create a Pandas dataframe. One of the simplest is just to transform a 2D Numpy array and assign names to the columns. Let's try with a single time-point. We had aggregated our detected positions in the list ```centroids_time```. Let's look at the first ten rows of time==0:

# In[12]:


time = 0
centroids_time[time][0:10,:]


# So we see above the x and y positions, and we need to add a column for time (in this case time == 0). We can just concatenate the above array with a column filled with zeros.
# We can do that explicitly:

# In[13]:


# create time column
time_column = time * np.ones((centroids_time[time].shape[0],1))

# concatenate with positions array
pos_time = np.concatenate([centroids_time[time], time_column], axis = 1)


# We can of course do that for all arrays of the ```centroids_time``` list, for example with a list comprehension:

# In[14]:


centroids_time_conc = [np.concatenate([x, time * np.ones((x.shape[0],1))], axis = 1)
                       for time, x in enumerate(centroids_time)]
centroids_time_conc[6][0:10,:]


# Now we have a list of Nx3 dimensional arrays whose columns represent x, y, frame. We can now concatenate all these arrays into one large array whose columns are still x, y, frame:

# In[15]:


centroids_complete = np.concatenate(centroids_time_conc)


# Finally we need to transform this Numpy array into a Pandas dataframe acceptable for trackpy. As mentioned before, we can simply turn a Numpy array into a Dataframe by calling:

# In[16]:


pd.DataFrame(centroids_complete)


# We see that the output is nicely formatted as a table. We also have bold numbers as column and row labels. Remember now that we need the specific labels x, y, frame for trackpy. There are many options when creating a Dataframe, one of them being to give names to the columns like this:

# In[17]:


coords_dataframe = pd.DataFrame(centroids_complete, columns=('x','y','frame'))
coords_dataframe


# That's it! We now have an appropriately formatted dataframe to pass to our ```link``` function. Just a few more words about Pandas dataframes: they work very similarly to other Python objects. For example, to recover a specific column we can use a dictionary-like syntax:

# In[18]:


coords_dataframe['frame']


# The object above is one-dimensional and is called a Pandas Series. Like with Numpy arrays, we can create boolean Series by doing comparisons, e.g.:

# In[19]:


coords_dataframe['frame'] == 9


# Finally, we can use Numpy functions on Pandas objects. For example, to get the maximum value of a column:

# In[20]:


np.max(coords_dataframe['x'])


# ### 8.3.2 Tracking

# As mentioned before, the only additional information we need for tracking is the allowed distance between linked objects in consecutive frames. This has to be adjusted for each case; here we choose 20 pixels. And we can finally do the tracking:

# In[21]:


tracks = trackpy.link(coords_dataframe, search_range=20)


# In[22]:


tracks.head()


# The output is a new dataframe, again with columns *x, y, frame*, as well as an additional one called *particle*. Each row represents one object found at position *x*, *y* and time *frame*. The new column *particle* is an index that tells us to which track each object belongs. Let's try to recover one track. For that we use indexing. It works the same way as for a Numpy array: we create a boolean Series with a comparison (in Numpy e.g. ```myarray > threshold```) and then use it to index the same Dataframe (in Numpy e.g. ```myarray[myarray > threshold]```). For example, we want to recover track number 1:
# - first we create the boolean Series:

# In[23]:


tracks['particle'] == 1


# and then use it for indexing:

# In[24]:


tracks[tracks['particle']==1]


# The dataframe above represents the complete track #1. We can do that for all tracks and plot the *x, y* positions for each track:

# In[25]:


plt.figure(figsize=(10,10))
# loop over all particle ids (0 .. max inclusive)
for particle_id in range(tracks['particle'].max() + 1):
    plt.plot(tracks[tracks.particle==particle_id].y,
             tracks[tracks.particle==particle_id].x, 'o-')
plt.show()


# We see that most tracks seem reasonable.
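# As a side note, trackpy also ships a convenience function, ```plot_traj()```, that draws all trajectories of a linked dataframe in one call. Something like the line below should give a plot similar to the manual loop above; it is shown here only as a sketch and is not run in this notebook:

# In[ ]:


# sketch: let trackpy draw every trajectory of the linked dataframe
trackpy.plot_traj(tracks);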
# Some false positive objects remain, however; they appear only intermittently and their tracks are of course shorter. We can further use trackpy to clean up our result, for example by removing short tracks:

# In[26]:


clean_tracks = trackpy.filter_stubs(tracks, threshold=9)


# In[27]:


plt.figure(figsize=(10,10))
for particle_id in range(tracks['particle'].max() + 1):
    plt.plot(clean_tracks[clean_tracks.particle==particle_id].y,
             clean_tracks[clean_tracks.particle==particle_id].x, 'o-')
plt.show()


# The visual impression is that we have a gradient in the magnitude of displacements. Let's see if we can quantify this effect.

# ## 8.4 Analysing the data

# Such a data analysis could easily be done directly on the tracks Dataframe (a sketch of that approach is shown at the end of this section). However, since this course does not cover Pandas in detail, we will transform the Dataframe into a Numpy array and do the analysis on the latter:

# In[28]:


tracks_np = np.array(clean_tracks)
tracks_np


# Now that we are back to a regular array, we know for example how to extract all points belonging to a given trajectory. Trajectory IDs are in the 4th column, so e.g. the track with ID 1 is:

# In[29]:


track1 = tracks_np[tracks_np[:,3] == 1]
track1


# We now want to measure the total displacement from time=0 to time=9. For that we recover the first and last time point *x, y* coordinates. We use fancy indexing here, with a list of indices ```[0,-1]``` for the rows:

# In[30]:


track1[[0,-1], 0:2]


# Then we subtract the first row (first time-point) from the second (last time-point):

# In[31]:


np.diff(track1[[0,-1], 0:2], axis = 0)


# Finally we measure the length of this displacement vector:

# In[32]:


np.linalg.norm(np.diff(track1[[0,-1], 0:2], axis = 0))


# We can now execute this operation for all track IDs within a for loop and append the values to a list:

# In[33]:


max_dist = []
for x in np.unique(tracks_np[:,3]):

    track1 = tracks_np[tracks_np[:,3] == x]
    dist = np.linalg.norm(np.diff(track1[[0,-1], 0:2], axis = 0))
    max_dist.append(dist)

max_dist = np.array(max_dist)


# In[34]:


max_dist


# Naturally we could also combine all these steps, including the for loop, into a single list comprehension. Knowing each step above, you should be able to understand this:

# In[35]:


max_distances = np.array([np.linalg.norm(np.diff(tracks_np[tracks_np[:,3] == x][[0,-1], 0:2], axis = 0))
                          for x in np.unique(tracks_np[:,3])])
max_distances


# Note that while this expression fits on a single line, it is harder to understand than the for loop. One should always find the right balance between conciseness and readability, but it is often a question of taste.
#
# To know whether we indeed have a gradient along the horizontal axis, we now need to measure the average position of each track. So for each track ID, we recover the position *y* (index 1) for all frames and calculate the mean:

# In[36]:


mean_pos = np.array([np.mean(tracks_np[tracks_np[:,3] == x, 1]) for x in np.unique(tracks_np[:,3])])
mean_pos


# Now that we have the average *y* position of each track and the displacement between first and last point, we can plot this:

# In[37]:


fig, ax = plt.subplots(2, 1, figsize=(5,10))
for particle_id in range(tracks['particle'].max() + 1):
    ax[0].plot(clean_tracks[clean_tracks.particle==particle_id].y,
               clean_tracks[clean_tracks.particle==particle_id].x, 'o-')
ax[0].set_title('2D nuclei positions over time')
ax[1].plot(mean_pos, max_dist, 'o')
ax[1].set_title('Nuclei displacements vs. mean particle positions');


# And yes, we managed to clearly show that there is a gradient of displacement along the horizontal axis.
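# For completeness, here is the Pandas-only sketch mentioned at the start of this section: with ```groupby``` the per-track displacement and mean *y* position can be computed directly on the dataframe. The helper name ```track_stats``` is purely illustrative, and the sketch assumes the ```clean_tracks``` columns *x, y, frame, particle* produced above:

# In[ ]:


def track_stats(df):
    """Illustrative helper: summarise one track (one groupby group)."""
    # make sure points are in temporal order
    df = df.sort_values('frame')
    first = df.iloc[0]
    last = df.iloc[-1]
    # displacement between the first and last point of the track
    displacement = np.sqrt((last['x'] - first['x'])**2 + (last['y'] - first['y'])**2)
    # average y position of the track
    return pd.Series({'mean_y': df['y'].mean(), 'displacement': displacement})

per_track = clean_tracks.groupby('particle').apply(track_stats)
per_track.plot.scatter(x='mean_y', y='displacement');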
# So, in summary, we went from image import to image segmentation, data extraction and data analysis, all in one notebook!