#!/usr/bin/env python
# coding: utf-8

# # Additional methods
# 
# This notebook provides an overview of built-in clustering performance evaluation, ways of accessing individual labels resulting from clustering, and saving the object to disk.
# 
# ## Clustering performance evaluation
# 
# Clustergram includes handy wrappers around a selection of clustering performance metrics offered by
# `scikit-learn`. Data that were originally computed on a GPU are converted to NumPy arrays on the fly.
# 
# Let's load the data and fit a clustergram on the Palmer penguins dataset. See the [Introduction](introduction) for an overview of the dataset.

# In[1]:


import seaborn
from sklearn.preprocessing import scale
from clustergram import Clustergram

seaborn.set(style='whitegrid')

# Keep only the numeric columns, drop incomplete rows, and standardize.
df = seaborn.load_dataset('penguins')
data = scale(df.drop(columns=['species', 'island', 'sex']).dropna())

# Fit a clustering for every k from 1 to 11.
cgram = Clustergram(range(1, 12), n_init=10, verbose=False)
cgram.fit(data)


# ### Silhouette score
# 
# Compute the mean Silhouette Coefficient of all samples. See the [`scikit-learn` documentation](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html#sklearn.metrics.silhouette_score) for details.

# In[2]:


cgram.silhouette_score()


# Once computed, the resulting Series is available as `cgram.silhouette_`. Calling the original method again will recompute the score.

# In[3]:


cgram.silhouette_.plot();


# ### Calinski and Harabasz score
# 
# Compute the Calinski and Harabasz score, also known as the Variance Ratio Criterion. See the [`scikit-learn` documentation](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.calinski_harabasz_score.html#sklearn.metrics.calinski_harabasz_score) for details.

# In[4]:


cgram.calinski_harabasz_score()


# Once computed, the resulting Series is available as `cgram.calinski_harabasz_`. Calling the original method again will recompute the score.

# In[5]:


cgram.calinski_harabasz_.plot();


# ### Davies-Bouldin score
# 
# Compute the Davies-Bouldin score. See the [`scikit-learn` documentation](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.davies_bouldin_score.html#sklearn.metrics.davies_bouldin_score) for details.

# In[6]:


cgram.davies_bouldin_score()


# Once computed, the resulting Series is available as `cgram.davies_bouldin_`. Calling the original method again will recompute the score.

# In[7]:


cgram.davies_bouldin_.plot();


# ## Accessing labels
# 
# `Clustergram` stores the resulting labels for each of the tested options, which can be accessed as:

# In[8]:


cgram.labels_


# ## Saving clustergram
# 
# If you want to save your computed `clustergram.Clustergram` object to disk, you can use the `pickle` library:

# In[9]:


import pickle

# Write the fitted object to a binary file.
with open('clustergram.pickle','wb') as f:
    pickle.dump(cgram, f)


# In[10]:


# Read the object back from the same file.
with open('clustergram.pickle','rb') as f:
    loaded = pickle.load(f)
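

# Before relying on a saved file, it can be worth checking that the object survives the round trip through `pickle`. The sketch below does this with the standard library only. The `original` dictionary is a hypothetical stand-in for a fitted object (it is not the clustergram API), and the `roundtrip` helper and the temporary file name are likewise assumptions made for illustration. Keep in mind that `pickle.load` can execute arbitrary code, so only load files from sources you trust.

# In[11]:


import pickle
import tempfile
from pathlib import Path


def roundtrip(obj, path):
    """Pickle obj to path, load it back, and return the loaded copy."""
    with open(path, 'wb') as f:
        pickle.dump(obj, f)
    with open(path, 'rb') as f:
        return pickle.load(f)


# Hypothetical stand-in for a fitted object: cluster labels keyed by k.
original = {1: [0, 0, 0], 2: [0, 1, 1]}

# Use a temporary directory so the check leaves no file behind.
with tempfile.TemporaryDirectory() as tmp:
    restored = roundtrip(original, Path(tmp) / 'result.pickle')

# The restored object should be an equal but separate copy.
print(restored == original, restored is not original)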