#!/usr/bin/env python
# coding: utf-8

# # Additional methods
# 
# This notebook provides an overview of the built-in clustering performance evaluation, ways of accessing the individual labels resulting from clustering, and saving the object to disk.
# 
# ## Clustering performance evaluation
# 
# Clustergram includes handy wrappers around a selection of clustering performance metrics offered by
# `scikit-learn`. Data originally computed on a GPU are converted to NumPy arrays on the fly.
# 
# Let's load the data and fit a clustergram on the Palmer penguins dataset. See the [Introduction](introduction) for an overview of the dataset.

# In[1]:


import seaborn
from sklearn.preprocessing import scale
from clustergram import Clustergram

seaborn.set(style='whitegrid')

df = seaborn.load_dataset('penguins')
data = scale(df.drop(columns=['species', 'island', 'sex']).dropna())

cgram = Clustergram(range(1, 12), n_init=10, verbose=False)
cgram.fit(data)


# ### Silhouette score
# 
# Compute the mean Silhouette Coefficient of all samples. See [`scikit-learn` documentation](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html#sklearn.metrics.silhouette_score) for details. 

# In[2]:


cgram.silhouette_score()


# Once computed, the resulting Series is available as `cgram.silhouette_`. Calling the method again recomputes the score.

# In[3]:


cgram.silhouette_.plot();
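

# The wrapper corresponds to scoring the labels of each tested number of clusters with `scikit-learn`. The following is a minimal, self-contained sketch of that idea on synthetic data (not the penguins), assuming K-Means labels; `blobs`, `scores` and `best_k` are illustrative names, not part of clustergram. The Silhouette Coefficient is undefined for a single cluster, so the sketch starts at `k=2`. Higher values indicate better-separated clusters.

# In[ ]:


import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with three well-separated groups (an assumption for illustration)
blobs, _ = make_blobs(
    n_samples=300, centers=[[0, 0], [10, 0], [0, 10]], cluster_std=0.5, random_state=0
)

# One score per tested k, indexed by k, similar in spirit to `cgram.silhouette_`
scores = pd.Series(
    {
        k: silhouette_score(
            blobs, KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(blobs)
        )
        for k in range(2, 7)
    },
    name="silhouette_score",
)

# The k with the highest mean Silhouette Coefficient
best_k = scores.idxmax()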


# ### Calinski and Harabasz score
# 
# Compute the Calinski and Harabasz score, also known as the Variance Ratio Criterion. See [`scikit-learn` documentation](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.calinski_harabasz_score.html#sklearn.metrics.calinski_harabasz_score) for details.

# In[4]:


cgram.calinski_harabasz_score()


# Once computed, the resulting Series is available as `cgram.calinski_harabasz_`. Calling the method again recomputes the score.

# In[5]:


cgram.calinski_harabasz_.plot();


# ### Davies-Bouldin score
# 
# Compute the Davies-Bouldin score. See [`scikit-learn` documentation](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.davies_bouldin_score.html#sklearn.metrics.davies_bouldin_score) for details.

# In[6]:


cgram.davies_bouldin_score()


# Once computed, the resulting Series is available as `cgram.davies_bouldin_`. Calling the method again recomputes the score.

# In[7]:


cgram.davies_bouldin_.plot();
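

# The metrics are read in opposite directions: higher is better for the Calinski and Harabasz score (and the Silhouette score), while lower is better for the Davies-Bouldin score. The following is a minimal, self-contained sketch on synthetic data, assuming K-Means labels; `points`, `comparison`, `best_by_ch` and `best_by_db` are illustrative names, not part of clustergram.

# In[ ]:


import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score, davies_bouldin_score

# Synthetic data with three well-separated groups (an assumption for illustration)
points, _ = make_blobs(
    n_samples=300, centers=[[0, 0], [10, 0], [0, 10]], cluster_std=0.5, random_state=0
)

# Score the K-Means labels for each tested k with both metrics
rows = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(points)
    rows[k] = {
        "calinski_harabasz": calinski_harabasz_score(points, labels),
        "davies_bouldin": davies_bouldin_score(points, labels),
    }
comparison = pd.DataFrame.from_dict(rows, orient="index")

# Maximise Calinski and Harabasz, minimise Davies-Bouldin
best_by_ch = comparison["calinski_harabasz"].idxmax()
best_by_db = comparison["davies_bouldin"].idxmin()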


# ## Accessing labels
# 
# `Clustergram` stores the resulting labels for each tested number of clusters, which can be accessed as:

# In[8]:


cgram.labels_
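

# A common next step is to pick the labels of one solution and inspect the cluster sizes. The following is a minimal sketch that builds a stand-in table of labels rather than using `cgram.labels_` itself; it assumes a DataFrame layout with one column per tested number of clusters, which should be checked against the actual object. `samples`, `labels`, `labels_k3` and `cluster_sizes` are illustrative names.

# In[ ]:


import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with three well-separated groups of 100 samples each
samples, _ = make_blobs(
    n_samples=300, centers=[[0, 0], [10, 0], [0, 10]], cluster_std=0.5, random_state=0
)

# Stand-in for the labels table: one column of labels per tested k (assumed layout)
labels = pd.DataFrame(
    {
        k: KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(samples)
        for k in range(1, 5)
    }
)

# Labels of the 3-cluster solution and the number of samples in each cluster
labels_k3 = labels[3]
cluster_sizes = labels_k3.value_counts().sort_index()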


# ## Saving clustergram
# 
# If you want to save your computed `clustergram.Clustergram` object to disk, you can use the `pickle` library:

# In[9]:


import pickle

with open('clustergram.pickle', 'wb') as f:
    pickle.dump(cgram, f)


# In[10]:


with open('clustergram.pickle', 'rb') as f:
    loaded = pickle.load(f)
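

# A quick way to confirm that a round trip preserved an object is to compare the restored copy with the original. The following is a minimal sketch using a plain dictionary as a stand-in for the clustergram; `original`, `restored`, `round_trip_ok` and the temporary file name are illustrative. Keep in mind that unpickling can execute arbitrary code, so only load files from sources you trust.

# In[ ]:


import pickle
import tempfile
from pathlib import Path

# Stand-in object; a fitted clustergram would be dumped the same way
original = {"k_range": list(range(1, 12)), "n_init": 10}

# Write to a temporary directory so the sketch leaves no file behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "stand_in.pickle"
    with open(path, "wb") as f:
        pickle.dump(original, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored object is an equal but distinct copy
round_trip_ok = restored == original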