#!/usr/bin/env python
# coding: utf-8

# # MotilA Advanced Batch Processing
# 
# This script demonstrates how to use the **MotilA** pipeline for batch analyzing microglial fine process motility in 4D/5D image stacks.
# 
# ### Overview
# - Scans and processes multiple registered 4D TIFF image stacks from multiple experimental folders.
# - Applies preprocessing steps such as projection, registration, and spectral unmixing.
# - Performs image enhancements like histogram equalization and filtering.
# - Segments microglia, applies thresholding, and quantifies motility.
# - Collects results from all processed stacks into summary tables for cohort analysis.
# 
# ### Workflow
# 1. **Batch processing**
#    - Iterates over multiple subject folders (`ID_list`) and searches for experiment folders (`project_tag`).
#    - Processes microglial motility for each dataset according to predefined settings.
#    - Saves analysis results per subject in a structured output folder.
# 2. **Batch collection**
#    - Gathers and combines results across all processed datasets.
#    - Saves consolidated results into a cohort-level output directory.
# 
# ### Usage
# - **Modify parameters:** Adjust paths, projection settings, thresholding methods, and filter settings.
# - **Run the script:** Execute the script to batch process and collect results.
# - **Check outputs:** Processed images, histograms, and motility metrics are saved in structured folders for further analysis and parameter tuning.
# 
# ### Dependencies
# - Requires **MotilA** to be installed and accessible (available in the GitHub repository).
# 
# ### Author
# Fabrizio Musacchio,  March 20, 2025

# ## Installation
# First, download the MotilA repository from GitHub. 
# 
# Then, create and activate a conda environment with the following packages:
# 
# ```bash
# conda create -n motila python=3.12 mamba -y
# conda activate motila
# mamba mamba install -y numpy scipy matplotlib scikit-image scikit-learn pandas tifffile zarr numcodecs openpyxl xlrd ipywidgets ipykernel ipympl
# ```
# 
# We have tested MotilA for Python 3.9 and higher. If you encounter any issues, please let us know.
# 
# **Don't forget to select the correct kernel in Jupyter Notebook (!)**

# ## 1. Import libraries and download the example data set
# First, we import MotilA and other required libraries:

# In[1]:


import sys
from pathlib import Path
sys.path.append(str(Path.cwd().parent))
from motila import motila as mt


# **Note**: `sys.path.append('../motila')` is used to add the MotilA directory to the system path – relative to the current working directory. If you execute this notebook from a different location, you may need to adjust the path accordingly.
# 
# You can verify the correct import by running the following cell:

# In[2]:


mt.hello_world()


# **Before you proceed, please make sure that you have downloaded the example data from Zenodo:** 
# 
# * [Gockel &  Nieves-Rivera (2025), doi: 10.5281/zenodo.15061566](https://zenodo.org/records/15061566)
# 
# Place the downloaded and extracted data set into the `example project` folder.

# ## 2. Define MotilA parameters
# Next, we define the parameters for the MotilA pipeline.
# 
# ### Input/Outut parameters
# Define the input/output path and folder and file tags for the batch processing:

# In[3]:


PROJECT_Path = "../example project/Data/"
                                          # define the path to the project folder; can be absolute or relative to the
                                          # location of this script

ID_list = ["ID240103_P17_1", "ID240321_P17_3"]
                                          # define the list of all IDs to be processed in PROJECT_Path; 
                                          # names must be exact names of the ID folders

project_tag = "TP000"                     # define the tag of the project (folder) to be analyzed;
                                          # all folders in the ID-folders containing this tag will be processed; 
                                          # can be just a part of the tag (will be searched for in the folder name)

reg_tif_file_folder = "registered"        # name of folder within the (found) project_tag-folder containing the 
                                          # registered tif files; must be exact
reg_tif_file_tag = "reg"                  # a Tif file containing this tag will be processed within the reg_tif_file_folder;
                                          # if multiple files containing this tag, folder will be skipped (!)

RESULTS_foldername = f"../motility_analysis/" 
                                          # define the folder name (not the full path!) where the results will be saved
                                          # within each project_tag-folder; can also be relative to the project_tag-folder
                                          # (e.g. "../motility_analysis/"); default destination will be inside the
                                          # reg_tif_file_folder folder

metadata_file = "metadata.xls"            # name of the metadata file in the project_tag-folder; must be exact
                                          # use template provided in the MotilA repository to create the metadata file


# **Note**: By placing an excel file (e.g., `metadata.xls`) in the `project_tag` folder for each animal ID folder (listed in `ID_list`), the following parameters set in this notebook will be overwritten by the parameters in the excel file: 
# 
# * `two_channel_default`: True/False
# * `MG_channel_default`: 0/1
# * `N_channel_default`: 0/1
# * `spectral_unmixing`: True/False
# * `projection_center_default`: integer
# 
# This allows for individual settings for each dataset.
# 
# The table below shows an example of the content of the `metadata.xls` file:
# 
# | Two Channel | Registration Channel | Registration Co-Channel | Microglia Channel | Neuron Channel | Spectral Unmixing | Projection Center 1 |
# | ----------- | -------------------- | ----------------------- | ----------------- | -------------- | ----------------- | ------------------- |
# | True        | 1                    | 0                       | 0                 | 1              | False             | 28                  |
# 
# A template for this excel files is provided in the *MotilA* repository. In this template, ignore the columns `Registration Channel`  and `Registration Co-Channel` as they are not used in this pipeline.
# 
# You can add several projection centers (`Projection Center 1`, `Projection Center 2`, etc.) to the excel file. The pipeline will then create a projection for each center along with the corresponding analysis results.

# ### Projection parameters
# MotilA will generate maximum intensity projections along the specified axes. For this, we need to define the projection center and ranges:

# In[3]:


# define projection settings:
projection_layers_default = 44 # define number of z-layers to project for motility analysis
projection_center_default = 23 # define the center slice of the projection; a sub-stack of +/- projection_layers will be projected;
                               # if metadata.xls is present in project_tag folder, this value is ignored and
                               # the value from the metadata.xls is used instead (in batch processing only!)


# ### Clear previous results?
# Define whether to clear the output folder before running the pipeline:

# In[4]:


# previous results settings:
clear_previous_results = True # set to True if all files in RESULTS_Path folder should be deleted before processing


# ### Thresholding parameters
# Define the thresholding method and parameters for segmenting microglia. 
# 
# As a **thresholding method** (`threshold_method`), you can choose between `otsu`, `li`, `isodata`, `mean`, `triangle`, `yen`, and `minimum`.
# 
# **`blob_pixel_threshold`** defines the minimum number of pixels for a blob to be considered a microglial cell.
# 
# With **`compare_all_threshold_methods`**, a plot is generated comparing all thresholding methods listed above to facilitate the selection of the best method.

# In[5]:


# thresholding settings:
threshold_method = "otsu"     # choose from: otsu, li, isodata, mean, triangle, yen, minimum
blob_pixel_threshold = 100    # define the threshold for the minimal pixel area of a blob during the segmentation
compare_all_threshold_methods = True # if True, all threshold methods will be compared and saved in the plot folder


# ### Image enhancement parameters
# Define the parameters for enhancing the images, such as histogram equalization and filtering.
# 
# With **`hist_equalization`** set to `True`, the pipeline will apply histogram equalization WITHIN each time (3D) stack. This enhances the contrast within each 3D stack.
# 
# With `hist_match` set to `True`, the pipeline will apply histogram matching BETWEEN the time (3D) stacks. This homogenizes the intensity distribution across the time stacks and acts as a bleaching correction.

# In[ ]:


# image enhancement settings:
hist_equalization = True              # enhance the histograms WITHIN EACH projected stack: True or False
hist_equalization_clip_limit = 0.05   # clip limit for the histogram equalization (default is 0.05);
                                      # the higher the value, the more intense the contrast enhancement, 
                                      # but also the more noise is amplified  
hist_equalization_kernel_size = None  # kernel size for the histogram equalization; 
                                      # None (default) for automatic, or use a tuple (x,y) for a fixed size;
                                      # when using a tuple, you can start increasing the values from multiples
                                      # of 8, e.g., (8,8), (16,16), (24,24), (32,32), ... (128,128), ...
                                      # start increasing the values if the images start to included block artifacts
hist_match = True               # match the histograms   ACROSS the stacks          : True or False
histogram_ref_stack = 0         # define the stack which should be used as reference for the histogram matching


# ### Image filtering parameters
# Define the parameters for filtering the images, such as median filtering and Gaussian smoothing.
# 
# #### Median filtering
# Regarding median filtering, you have the option to filter on the single slices BEFORE the projection (**`median_filter_slices`**) and/or on the projected images (**`median_filter_projections`**). For both options, you can choose from:
# 
# * `False` (no filtering)
# * `square` (square kernel): integer numbers (3, 5, 9)
# * `circular` (disk-shaped kernel; analogous to the median filter in ImageJ/Fiji): only values >= 0.5 allowed/have an effect
# 
# When you apply median filtering, you need to additionally provide the kernel size (**`median_filter_window_slices`** for single slices and **`median_filter_window_projections`** for projections). Depending on the chosen filtering kernel method, you can choose a kernel size as listed above.
# 
# #### Gaussian smoothing
# Gaussian smoothing further  enhances the contrast and reduces noise. Set
# 
# * `gaussian_smoothing` to 0: no smoothing, or
# * `gaussian_smoothing` to a value > 0: the standard deviation of the Gaussian kernel.

# In[7]:


# filter settings:
median_filter_slices             = 'circular' # median filter on SLICES BEFORE projecting
                                        # 'square', 'circular', or False
                                        # circular: floating point numbers allowed, not lower than 0.5 for circular
                                        # square: integer numbers (3, 5, 9)
median_filter_window_slices      = 1.55 # median filter window size on SLICES BEFORE projecting
                                        # circular: only values >= 0.5 allowed/have an effect
                                        # square: integer numbers (3, 5, 9)

median_filter_projections        = 'circular' # median filter on PROJECTIONS
                                        # square, circular, or False
median_filter_window_projections = 1.55 # median filter window size on PROJECTIONS
                                        # circular: only values >= 0.5 allowed/have an effect
                                        # square: integer numbers (3, 5, 9)
gaussian_sigma_proj = 1.00  # standard deviation of Gaussian (blur) filter applied on the projected stack
                            # set to 0 for turning off


# ### Channel parameters
# Define the channel parameters for single-channel or two-channel data:

# In[8]:


# channel settings: 
two_channel_default = True # define if the stack has two channels; if metadata.xls is present, this value is ignored
MG_channel_default  = 0    # define the channel of the Microglia; if metadata.xls is present, this value is ignored
N_channel_default   = 1    # define the channel of the Neurons/2nd channel; if metadata.xls is present, this value is ignored


# **Note**: If you stack contains only one channel, set `two_channel_default = False`; any value set in `N_channel_default` will be ignored.
# 
# **Note**: If `metadata.xls` is present in `project_tag` folder, the above defined values (`two_channel_default`, `MG_channel_defaulta`, `N_channel_default`) are ignored and values from the metadata.xls are used instead  (**in batch processing only!**)

# ### Registration parameters
# *MotilA* provides the option to register the image stacks. Two registration options are available:
# 
# * `regStack3d`: register slices WITHIN each 3D time-stack; `True` or `False`
# * `regStack2d`: register projections on each other;  `True` or `False`
# 
# With `template_mode`you can define the template mode for the registration. Choose between `mean` (default), `median`, `max`, `min`, `std`, and `var`.
# 
# With `max_xy_shift_correction`, you can define the maximum allowed shift in x and y (and z) direction for the registration. This is useful to avoid overcorrection.

# In[9]:


# registration settings:
regStack3d = False          # register slices WITHIN each 3D time-stack; True or False
regStack2d = False          # register projections on each other; True or False
template_mode = "max"       # set the template mode for the 3D registration; defines for both 3D and 2D registration
                            # choose from: mean, median, max, std, var.
max_xy_shift_correction = 100 # set the maximal shift in x/y direction for the 2D registration


# ### Spectral unmixing parameters
# *MotilA* provides the option to perform spectral unmixing on two-channel data. At the moment, only a simple method is implemented, which subtracts the N-channel from the MG-channel. Set `spectral_unmixing` to `True` to enable this feature. 
# 
# With `spectral_unmixing_amplifyer_default` you can define the amplification factor for the MG-channel before subtraction. This can be useful to preserve more information in the MG-channel.
# 
# `spectral_unmixing_median_filter_window` defines the kernel size for median filtering of N-channel before subtraction. This can be useful to reduce noise in the N-channel and, thus, achieve a better unmixing result. Allowed are odd integer numbers (3, 5, 9, ...).

# In[10]:


# spectral unmixing settings:
spectral_unmixing = False                  # perform spectral unmixing; True or False
                                          # if metadata.xls is present in project_tag folder, this value is 
                                          # ignored and the value from the metadata.xls is used instead 
                                          # (in batch processing only!)
spectral_unmixing_amplifyer_default    =1 # amplifies the MG channel (to save more from it)
spectral_unmixing_median_filter_window =3 # must be integer; 1=off, 3=common, 5=strong, 7=very strong


# ## 3. Initialize the logger
# Initialize the logger to track the progress of the pipeline. The log file will be saved in the same folder as this notebook is located.

# In[6]:


# init logger:
log = mt.logger_object()
log.log("logger started for TEST/DEBUG RUN: BATCH RUN.")
log.log("Test project: "+str(PROJECT_Path))
log.log(f"Mouse IDs: {ID_list}")
log.log(f"Group: {project_tag}")


# ## 4. Run the MotilA pipeline
# Finally, we run the *MotilA* pipeline with the defined parameters. Simply execute the following cell to start the processing:

# In[ ]:


mt.batch_process_stacks(PROJECT_Path=PROJECT_Path, 
                        ID_list=ID_list, 
                        project_tag=project_tag, 
                        reg_tif_file_folder=reg_tif_file_folder,
                        reg_tif_file_tag=reg_tif_file_tag,
                        metadata_file=metadata_file,
                        RESULTS_foldername=RESULTS_foldername,
                        MG_channel=MG_channel_default, 
                        N_channel=N_channel_default, 
                        two_channel=two_channel_default,
                        projection_center=projection_center_default, 
                        projection_layers=projection_layers_default,
                        histogram_ref_stack=histogram_ref_stack, 
                        log=log, 
                        blob_pixel_threshold=blob_pixel_threshold,
                        regStack2d=regStack2d, 
                        regStack3d=regStack3d, 
                        template_mode=template_mode,
                        spectral_unmixing=spectral_unmixing, 
                        hist_equalization=hist_equalization, 
                        hist_equalization_clip_limit=hist_equalization_clip_limit,
                        hist_equalization_kernel_size=hist_equalization_kernel_size,
                        hist_match=hist_match,
                        max_xy_shift_correction=max_xy_shift_correction,
                        threshold_method=threshold_method, 
                        compare_all_threshold_methods=compare_all_threshold_methods,
                        gaussian_sigma_proj=gaussian_sigma_proj, 
                        spectral_unmixing_amplifyer=spectral_unmixing_amplifyer_default,
                        median_filter_slices=median_filter_slices, 
                        median_filter_window_slices=median_filter_window_slices,
                        median_filter_projections=median_filter_projections, 
                        median_filter_window_projections=median_filter_window_projections,
                        clear_previous_results=clear_previous_results, 
                        spectral_unmixing_median_filter_window=spectral_unmixing_median_filter_window,
                        debug_output=False)


# ## 5. Assessing your results
# After running the pipeline, you can assess the results in the specified output folder. The results of each processing step described above are saved in separate tif and PDF files. By carefully investigating these results, you can evaluate the quality of the processing and adjust the parameters if necessary.
# 
# An example assessment is given in the  MotilA Quick Start (Single File) notebook.

# ## 6. Batch collection
# After processing all datasets, you can collect the results and save them to a central output folder. This allows you to perform cohort-level analyses and visualize the results across all datasets.

# First, define the parameters for batch collection:

# In[ ]:


RESULTS_Path = "../example project/Analysis/MG_motility/"
                                          # define the path to the results folder; in here, the combined results
                                          # of the cohort analysis will be saved; can be absolute or relative to the
                                          # location of this script

motility_folder = "motility_analysis"     # folder name containing motility analysis results in each ID folder/project_tag folder;
                                          # must be exact; wherein, all projection center folders therein will be processed
                                          # to collect the results


# Then, start the batch collection by executing the following cell:

# In[7]:


mt.batch_collect(PROJECT_Path=PROJECT_Path, 
                 ID_list=ID_list, 
                 project_tag=project_tag, 
                 motility_folder=motility_folder,
                 RESULTS_Path=RESULTS_Path,
                 log=log)


# Let's investigate the cohort results by loading the summary tables:

# In[8]:


import pandas as pd

title = "all_motility.xlsx"
excel_file = Path(RESULTS_Path).joinpath(title)

pixel_area_df = pd.read_excel(excel_file)
pixel_area_df


# This table contains the motility metrics for each dataset for each ID, time lapse stack, project tag, and projection center. You can use this table to perform cohort-level analyses and visualize the results across all datasets.
# 
# Also a summary table with the mean and standard deviation of the motility metrics for each ID and project tag is generated:

# In[9]:


title = "average_motility.xlsx"
excel_file = Path(RESULTS_Path).joinpath(title)

pixel_area_df = pd.read_excel(excel_file)
pixel_area_df


# Similarly, all brightness values and pixel areas are summarized in separate tables for any further analysis:

# In[10]:


title = "all_brightness.xlsx"
excel_file = Path(RESULTS_Path).joinpath(title)

pixel_area_df = pd.read_excel(excel_file)
pixel_area_df


# In[11]:


title = "all_pixel_areas.xlsx"
excel_file = Path(RESULTS_Path).joinpath(title)

pixel_area_df = pd.read_excel(excel_file)
pixel_area_df