from pymeda import Meda
import pandas as pd
import sklearn.datasets as datasets
First, PyMeda expects the input data to be either a string pointing to a CSV file or a pandas DataFrame. Make sure that the first row contains the feature names and that you remove any columns without meaningful information (e.g. unique identifiers).
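As a rough illustration of that preparation step, the sketch below uses only pandas. The CSV content and the column name sample_id are hypothetical, chosen to show a header row and an identifier column being dropped; nothing here is part of PyMeda itself.

```python
import io

import pandas as pd

# Hypothetical CSV content: the first row holds the feature names,
# and "sample_id" is a unique identifier with no analytical value.
csv_text = """sample_id,sepal_length,sepal_width
a1,5.1,3.5
a2,4.9,3.0
a3,6.2,2.9
"""

# header=0 tells pandas that the first row contains the column names.
raw_df = pd.read_csv(io.StringIO(csv_text), header=0)

# Drop the identifier column before handing the data to an analysis tool.
clean_df = raw_df.drop(columns=["sample_id"])

print(clean_df.columns.tolist())
```

A DataFrame prepared this way could then be passed as the data argument to Meda, in the same way as iris_df in the example further down.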
Secondly, the following plots can be generated using PyMeda:
You can generate each plot individually (if you know which one you want), or use the run_all
function to generate all of the above plots. Note: the pair plot for clustering is not generated when the number of samples exceeds 1000, due to issues with plotly.
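If the pair plot matters for a larger dataset, one possible workaround is to draw a random subsample that stays within the 1000-sample threshold before building the Meda object. This is only a sketch with pandas and NumPy: the data, the column names, and the MAX_SAMPLES constant are hypothetical, and subsampling changes which points get clustered, so whether it is acceptable depends on your analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with more than 1000 samples.
rng = np.random.default_rng(0)
large_df = pd.DataFrame(
    rng.normal(size=(5000, 4)),
    columns=["f1", "f2", "f3", "f4"],
)

# Threshold taken from the note above; the name is our own.
MAX_SAMPLES = 1000

# Subsample without replacement only when the threshold is exceeded.
if len(large_df) > MAX_SAMPLES:
    subset_df = large_df.sample(n=MAX_SAMPLES, random_state=0)
else:
    subset_df = large_df

print(len(subset_df))
```

Fixing random_state keeps the subsample reproducible between runs.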
Lastly, you can save the outputs as an HTML file by using the generate_report
function.
# Create the iris dataset as a pandas DataFrame
iris = datasets.load_iris()
data = iris.data
columns = iris.feature_names
iris_df = pd.DataFrame(data=data, columns=columns)
title = 'Iris Dataset'  # Set title
cluster_levels = 2  # Set number of times to cluster
meda = Meda(data=iris_df, title=title, cluster_levels=cluster_levels)
# You can make individual plots by calling the class methods
meda.correlation_matrix()
meda.cluster_pair_plot()
# You can also make all plots at once
meda.run_all()
# Generate a static HTML report
out_dir = './'
meda.generate_report(out_dir)
Report saved at ./2018-10-01_Iris Dataset.html