#!/usr/bin/env python
# coding: utf-8

# # Effective Stiffness of Fiber Composite
#
# ## Introduction
#
# This example demonstrates the use of a homogenization model to predict the macroscopic stress in fiber-like structures. The structures emulate fiber-reinforced polymer samples. For a summary of homogenization theory and its use with effective stiffness properties, please see the [Effective Stiffness example](http://materialsinnovation.github.io/pymks/rst/stress.html). The example generates a series of random microstructures with various fiber lengths and volume fractions. The structures are used to calibrate and test the homogenization model against the simulated effective stress values.

# In[1]:


import warnings
import os

os.environ["OMP_NUM_THREADS"] = "1"
warnings.filterwarnings('ignore')

import dask.array as da
import numpy as np
import pandas
import matplotlib.pyplot as plt
from sklearn.pipeline import Pipeline
from dask_ml.decomposition import PCA
from dask_ml.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from dask_ml.model_selection import train_test_split
from dask_ml.model_selection import GridSearchCV
from mpl_toolkits.mplot3d import Axes3D

from pymks import (
    generate_multiphase,
    solve_fe,
    plot_microstructures,
    PrimitiveTransformer,
    TwoPointCorrelation,
    FlattenTransformer
)


# In[2]:


#PYTEST_VALIDATE_IGNORE_OUTPUT

get_ipython().run_line_magic('matplotlib', 'inline')
get_ipython().run_line_magic('load_ext', 'autoreload')
get_ipython().run_line_magic('autoreload', '2')


# ## Data Generation
#
# The following generates four different types of microstructures, each with 100 samples of spatial dimensions 31 x 31. The grains have a high aspect ratio to simulate fiber-like structures.

# In[5]:


da.random.seed(10)
np.random.seed(10)

tmp = [
    generate_multiphase(
        shape=(100, 31, 31),
        grain_size=grain_size,
        volume_fraction=volume_fraction,
        chunks=50,
        percent_variance=0.2
    )
    for (grain_size, volume_fraction) in zip(
        [(30, 2), (7, 2), (2, 30), (2, 7)],
        [(0.7, 0.3), (0.6, 0.4), (0.3, 0.7), (0.4, 0.6)]
    )
]

x_data = da.concatenate(tmp).persist()


# In[7]:


y_stress = solve_fe(
    x_data,
    elastic_modulus=(1.3, 2.5),
    poissons_ratio=(0.42, 0.35),
    macro_strain=0.001
)['stress'][..., 0].persist()

y_data = da.average(y_stress.reshape(y_stress.shape[0], -1), axis=1).persist()


# The arrays have been computed by calling the `persist` method.

# In[8]:


x_data


# In[9]:


y_data


# ## View Microstructures

# In[10]:


plot_microstructures(*x_data[::100], cmap='gray', colorbar=False)


# In[11]:


y_stress.shape


# In[12]:


plot_microstructures(*y_stress[::100])


# In[13]:


#PYTEST_VALIDATE_IGNORE_OUTPUT

y_data[::100].compute()


# ## The Model
#
# The following creates a homogenization pipeline and cross-validates the model. For more details about two-point statistics, view the [Checkerboard Microstructure Example](http://materialsinnovation.github.io/pymks/rst/checker_board.html).
#
# ### Note (issue or bug)
#
# There are currently two issues with the pipeline.
#
# - The `svd_solver='full'` argument is required in the pipeline as the results are not correct without it. This might be an issue with Dask's PCA algorithms and needs further investigation.
#
# - Dask's `LinearRegression` module does not seem to solve this problem correctly and gives very different results from Sklearn's. This needs further investigation; Sklearn's is used instead (a quick comparison of the two regressors is sketched below).
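# As an optional, illustrative sanity check (not part of the original analysis), the cell below fits both regressors on the same small synthetic problem so their coefficients can be compared directly. The synthetic data are arbitrary, and `dask_ml.linear_model.LinearRegression` is assumed to be the Dask regressor referred to above.

# In[ ]:


from dask_ml.linear_model import LinearRegression as DaskLinearRegression

# Small synthetic regression problem (illustrative values only)
rng = np.random.RandomState(0)
x_check = rng.rand(100, 3)
y_check = x_check @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.rand(100)

# Scikit-learn's regressor on in-memory NumPy arrays
sklearn_coef = LinearRegression().fit(x_check, y_check).coef_

# Dask-ML's regressor on the same data as chunked Dask arrays
dask_coef = DaskLinearRegression().fit(
    da.from_array(x_check, chunks=50),
    da.from_array(y_check, chunks=50)
).coef_

print('sklearn coefficients:', sklearn_coef)
print('dask-ml coefficients:', dask_coef)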
# In[14]:


pca_steps = [
    ("discretize", PrimitiveTransformer(n_state=2, min_=0.0, max_=1.0)),
    ("correlations", TwoPointCorrelation(
        periodic_boundary=True,
        cutoff=31,
        correlations=[(0, 1), (1, 1)]
    )),
    ('flatten', FlattenTransformer()),
    ('pca', PCA(n_components=3, svd_solver='full', random_state=999))
]

pca_pipeline = Pipeline(steps=pca_steps)

pipeline = Pipeline(steps=pca_steps + [
    ('poly', PolynomialFeatures()),
    ('regressor', LinearRegression())
])


# ## Training and testing data sets
#
# Split the data into training and testing segments for the computations in the rest of the notebook.

# In[15]:


x_train, x_test, y_train, y_test = train_test_split(
    x_data.reshape(x_data.shape[0], -1),
    y_data,
    test_size=0.2,
    random_state=3
)


# ## Visualize the task graph
#
# The pipeline results in an entirely lazy calculation. The task graph can be viewed to check that it remains parallel and doesn't gather all the data. The calculation has not been evaluated yet, since the graph is generated with only a small part of the data.
#
# ### Note (issue or bug)
#
# Note that, for some reason, the `y_train` passed into `fit` must be a NumPy array and doesn't work with a correctly chunked Dask array; see the [github issue](https://github.com/materialsinnovation/pymks/issues/474).

# In[16]:


#PYTEST_VALIDATE_IGNORE_OUTPUT

pca_pipeline.fit(x_train, np.array(y_train)).transform(x_train).visualize()


# ## Cross Validation
#
# Use `GridSearchCV` to optimize the `n_components` and `degree` hyper-parameters for the model. The search covers 1st to 3rd order polynomials and 1 to 6 principal components.

# In[17]:


params_to_tune = {
    'pca__n_components': np.arange(1, 7),
    'poly__degree': np.arange(1, 4)
}

grid_search = GridSearchCV(pipeline, param_grid=params_to_tune).fit(x_train, y_train)


# Display the best model from the grid search and check that it is accurate.
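# Before that, an optional aside (not part of the original notebook): the raw cross-validation table can be inspected directly. The column names follow the `cv_results_` convention also used by the plotting code further below.

# In[ ]:


# Mean test score for each (degree, n_components) combination tried
cv_table = pandas.DataFrame(grid_search.cv_results_)
cv_table[['param_poly__degree', 'param_pca__n_components', 'mean_test_score']]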
# In[18]:


#PYTEST_VALIDATE_IGNORE_OUTPUT

print('Order of Polynomial', grid_search.best_params_.get('poly__degree'))
print('Number of Components', grid_search.best_params_.get('pca__n_components'))
print('R-squared Value', np.allclose(grid_search.score(x_test, y_test), 1.0, rtol=2e-2))


# In[19]:


assert (2 <= grid_search.best_params_.get('poly__degree') <= 3), grid_search.best_params_.get('poly__degree')
assert (4 <= grid_search.best_params_.get('pca__n_components') <= 6), grid_search.best_params_.get('pca__n_components')
assert np.allclose(grid_search.score(x_test, y_test), 1.0, rtol=2e-2), grid_search.score(x_test, y_test)


# In[20]:


def plot_line(x, mean, std_dev, color, label, linestyle):
    plt.fill_between(x, mean - std_dev, mean + std_dev, alpha=0.1, color=color)
    plt.plot(x, mean, 'o-', color=color, label=label, linewidth=2, linestyle=linestyle)

def plot(dfs):
    [plot_line(
        df_.n_comp.astype(int),
        df_.mean_,
        df_.std_,
        df_.color.iloc[0],
        df_.label.iloc[0],
        df_.linestyle.iloc[0]
    ) for df_ in dfs]
    plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0., fontsize=15)
    plt.ticklabel_format(style='sci', axis='y')
    plt.ylabel('R-Squared', fontsize=20)
    plt.xlabel('Number of Components', fontsize=20)
    plt.ylim(0.0, 1.05)
    plt.show()

def add_labels(df):
    return df.assign(
        label=df.degree.map(lambda n: {1: '1st Order', 2: '2nd Order', 3: '3rd Order'}[n]),
        color=df.degree.map(lambda n: {1: '#1a9850', 2: '#f46d43', 3: '#1f78b4'}[n]),
        linestyle=df.degree.map(lambda n: {1: 'solid', 2: 'dotted', 3: 'dashed'}[n])
    )

def rename(df):
    return df.rename(
        dict(
            param_poly__degree='degree',
            param_pca__n_components='n_comp',
            mean_test_score='mean_',
            std_test_score='std_',
        ),
        axis=1
    )

def groupby(df, col):
    gb = df.groupby(col)
    return [gb.get_group(x) for x in gb.groups]

df = groupby(
    add_labels(
        rename(
            pandas.DataFrame(grid_search.cv_results_)
        )
    ),
    'degree'
)

plot(df)


# The best model has `degree` equal to 3 and `n_components` equal to 5.

# In[21]:


model = grid_search.best_estimator_


# ## Structures in PCA space
#
# Plot the samples in 3D principal component space using a pipeline that omits both the polynomial features and the linear regression. This pipeline only transforms the data; it does not make predictions.

# In[22]:


pca_pipeline.set_params(pca__n_components=3);


# Apply the `pca_pipeline` to the training and testing datasets to obtain their principal component values.

# In[23]:


x_trans_train = pca_pipeline.fit(x_train).transform(x_train).compute()
x_trans_test = pca_pipeline.transform(x_test).compute()


# Visualize the principal component values in a 3D plot.

# In[24]:


fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.set_xlabel('Component 1', fontsize=12)
ax.set_ylabel('Component 2', fontsize=12)
ax.set_zlabel('Component 3', fontsize=12)
ax.plot(*x_trans_train.T, 'o', label='Training Data', color='#1a9850')
ax.plot(*x_trans_test.T, 'o', label='Testing Data', color='#f46d43')
plt.title('Low Dimensional Representation', fontsize=15)
plt.legend(bbox_to_anchor=(1.05, 1.0), loc=2, borderaxespad=0., fontsize=15)
plt.show()


# The testing and training data agree, and there is some separation between the fiber types in PC space.

# ## Parity Plot
#
# The calculated average stress is plotted against the stress predicted by the model.
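# Before the plot, an optional aside (not part of the original notebook): the R-squared values of the selected model on the training and testing splits can be printed explicitly, using the same scoring call as the assertions above.

# In[ ]:


print('Train R-squared:', grid_search.score(x_train, y_train))
print('Test R-squared:', grid_search.score(x_test, y_test))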
# In[25]:


y_train_predict = model.predict(x_train)
y_predict = model.predict(x_test)


# In[26]:


fit_data = np.array([y_train.compute(), y_train_predict])
pred_data = np.array([y_test.compute(), y_predict])

y_total = np.concatenate((fit_data, pred_data), axis=-1)
line = np.min(y_total), np.max(y_total)

plt.plot(fit_data[0], fit_data[1], 'o', color='#1a9850', label='Training Data')
plt.plot(pred_data[0], pred_data[1], 'o', color='#f46d43', label='Testing Data')
plt.plot(line, line, '-', linewidth=3, color='#000000')
plt.title('Goodness of Fit', fontsize=20)
plt.xlabel('Actual', fontsize=18)
plt.ylabel('Predicted', fontsize=18)
plt.legend(loc=2, fontsize=15)
plt.show()


# There is a good correlation between the FE results and those predicted by the linkage.
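# A closing usage sketch (not part of the original notebook): apply the fitted model to a freshly generated microstructure. The grain size and volume fraction below are arbitrary illustrative choices, and the input is flattened to match the shape the model was trained on.

# In[ ]:


# Generate one new fiber-like microstructure and predict its average stress
x_new = generate_multiphase(
    shape=(1, 31, 31),
    grain_size=(15, 2),
    volume_fraction=(0.5, 0.5),
    chunks=1,
    percent_variance=0.2
)

print(model.predict(x_new.reshape(x_new.shape[0], -1)))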