In [1]:

from pymeda import Meda

import pandas as pd
import sklearn.datasets as datasets

Running PyMeda

First, PyMeda expects the input data to be either a string containing the path to a CSV file or a pandas DataFrame. Make sure that the first row contains the feature names, and remove any columns that carry no meaningful information (e.g. unique identifiers).
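A minimal sketch of that preparation step, using pandas only. The CSV content and the `sample_id` column are hypothetical and stand in for whatever identifier columns your own data contains.

```python
import io

import pandas as pd

# Hypothetical CSV content; the first row holds the feature names.
csv_text = """sample_id,sepal_length,sepal_width
a1,5.1,3.5
a2,4.9,3.0
a3,4.7,3.2
"""

# read_csv uses the first row as column names by default.
raw_df = pd.read_csv(io.StringIO(csv_text))

# Drop columns without meaningful information, such as unique identifiers.
features_df = raw_df.drop(columns=["sample_id"])

print(features_df.head())
```

The resulting `features_df` is the kind of DataFrame that can be passed as input; with a real file you would pass its path to `pd.read_csv` instead of the in-memory buffer.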

Secondly, the following plots can be generated using PyMeda:

Representative Heatmap - plots all data points as a heatmap
Ridge Line Plot - plots each feature as a density
Location Heatmap - plots mean and median of each feature as heatmap
Location Lines - plots mean and median of each feature as lines
Scree Plot - plots the variance explained by each principal component after running PCA
Correlation Matrix - correlation of features
Hierarchical Gaussian Mixture Model Plots - clustering of data
1. Dendrogram
2. Pair Plot
3. Stacked Means
4. Mean Heatmap
5. Mean Lines
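To make the list above concrete, here is a rough sketch of the quantities behind three of these plots (location, scree and correlation), computed directly with pandas and scikit-learn. This is not PyMeda's implementation; in particular, it assumes raw, unscaled features, and PyMeda may preprocess the data differently.

```python
import pandas as pd
from sklearn import datasets
from sklearn.decomposition import PCA

iris = datasets.load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Location plots: mean and median of each feature.
location = iris_df.agg(["mean", "median"])

# Scree plot: fraction of total variance explained by each principal component.
# Assumption: PCA is fit on the raw features without scaling.
explained = PCA().fit(iris_df.values).explained_variance_ratio_

# Correlation matrix: pairwise Pearson correlation between features.
corr = iris_df.corr()

print(location)
print(explained)
print(corr)
```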

You can run each plot individually (if you know which one you want), or use the run_all function to generate all of the above plots. Note: the pair plot for clustering is not generated when the number of samples exceeds 1000, due to issues with plotly.
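If the pair plot matters for a larger dataset, one possible workaround is to subsample the rows before handing the data to PyMeda. The sketch below uses pandas only; `subsample_for_pair_plot` is a hypothetical helper, not part of PyMeda, and clustering a subsample can give different results from clustering the full data.

```python
import numpy as np
import pandas as pd

MAX_PAIR_PLOT_SAMPLES = 1000  # sample limit quoted in the note above


def subsample_for_pair_plot(df, limit=MAX_PAIR_PLOT_SAMPLES, seed=0):
    """Return df unchanged if it is small enough, else a random subsample of `limit` rows."""
    if len(df) <= limit:
        return df
    return df.sample(n=limit, random_state=seed)


# Synthetic data standing in for a large dataset.
rng = np.random.default_rng(0)
large_df = pd.DataFrame(rng.normal(size=(5000, 4)), columns=["f1", "f2", "f3", "f4"])

small_df = subsample_for_pair_plot(large_df)
print(len(large_df), len(small_df))
```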

Lastly, you can save the outputs as an HTML file with the generate_report function.

In [3]:

# Load the iris dataset into a pandas DataFrame
iris = datasets.load_iris()
data = iris.data
columns = iris.feature_names
iris_df = pd.DataFrame(data=data, columns=columns)

title = 'Iris Dataset'  # Title used for the plots and the report
cluster_levels = 2  # Number of times to cluster

meda = Meda(data=iris_df, title=title, cluster_levels=cluster_levels)

In [4]:

#You can make individual plots by calling the class methods
meda.correlation_matrix()

In [5]:

meda.cluster_pair_plot()

In [6]:

#You can also make all plots at once.
meda.run_all()

In [7]:

#Generate a static HTML report
out_dir = './'
meda.generate_report(out_dir)

Report saved at ./2018-10-01_Iris Dataset.html
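Judging from the output above, the report file name appears to be the current date followed by the title. The sketch below rebuilds that path so the report can be located afterwards; `expected_report_path` is a hypothetical helper, and the naming pattern is an assumption inferred from this single output line rather than documented behavior.

```python
from datetime import date
from pathlib import Path


def expected_report_path(out_dir, title, day=None):
    """Build the path '<out_dir>/<YYYY-MM-DD>_<title>.html' (assumed naming pattern)."""
    day = day or date.today()
    return Path(out_dir) / f"{day.isoformat()}_{title}.html"


report = expected_report_path("./", "Iris Dataset", date(2018, 10, 1))
print(report.name)
print(report.exists())  # True only if a report was actually written there
```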
