In addition to the built-in datasets (e.g. BikeSharing, CaliforniaHousing, TaiwanCredit) provided for demo purposes, PiML supports uploading custom data for model development and validation.
This example notebook demonstrates two ways to upload or read custom data:
- Upload new data through the piml.Experiment.data_loader() widget (file size limit: 10MB).
- Manually read data with pandas.read_csv(), then register it to piml.Experiment.
For simplicity, we use the CASP dataset, which you may download from https://archive.ics.uci.edu/ml/machine-learning-databases/00265/CASP.csv (~3.4MB) and save to your local drive.
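Before using the upload widget, it can help to confirm that the saved file is under the 10MB limit. The sketch below is only an illustration: the helper name `fits_widget_limit` is hypothetical (not part of PiML), and it assumes the limit means 10 * 1024 * 1024 bytes. A small temporary file stands in for the downloaded CSV.

```python
import os
import tempfile

# Assumption: the widget's 10MB limit is read as 10 * 1024 * 1024 bytes.
WIDGET_LIMIT_BYTES = 10 * 1024 * 1024


def fits_widget_limit(path, limit_bytes=WIDGET_LIMIT_BYTES):
    """Return True if the file at `path` is no larger than `limit_bytes`.

    Hypothetical helper for illustration; not a PiML function.
    """
    return os.path.getsize(path) <= limit_bytes


# A tiny temporary file stands in for the CSV saved to your local drive.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("RMSD,F1\n17.284,13558.30\n")
    demo_path = tmp.name

demo_ok = fits_widget_limit(demo_path)
os.remove(demo_path)  # clean up the temporary file
print(demo_ok)
```

For a real file, pass the path where you saved CASP.csv instead of the temporary file.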
Run the following command to install the latest version of PiML:
!pip install piml
from piml import Experiment
exp = Experiment()
# The first way: upload new data from your local drive by the widget (with file size limit 10MB)
# Choose "Upload new data"
exp.data_loader()
# The second way: manually read data from your local drive or URL, then register to piml.Experiment
import pandas as pd
data = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/00265/CASP.csv")
exp.data_loader(data=data)
| | RMSD | F1 | F2 | F3 | F4 | F5 | F6 | F7 | F8 | F9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17.284 | 13558.30 | 4305.35 | 0.31754 | 162.1730 | 1.872791e+06 | 215.3590 | 4287.87 | 102.0 | 27.0302 |
| 1 | 6.021 | 6191.96 | 1623.16 | 0.26213 | 53.3894 | 8.034467e+05 | 87.2024 | 3328.91 | 39.0 | 38.5468 |
| 2 | 9.275 | 7725.98 | 1726.28 | 0.22343 | 67.2887 | 1.075648e+06 | 81.7913 | 2981.04 | 29.0 | 38.8119 |
| 3 | 15.851 | 8424.58 | 2368.25 | 0.28111 | 67.8325 | 1.210472e+06 | 109.4390 | 3248.22 | 70.0 | 39.0651 |
| 4 | 7.962 | 7460.84 | 1736.94 | 0.23280 | 52.4123 | 1.021020e+06 | 94.5234 | 2814.42 | 41.0 | 39.9147 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 45725 | 3.762 | 8037.12 | 2777.68 | 0.34560 | 64.3390 | 1.105797e+06 | 112.7460 | 3384.21 | 84.0 | 36.8036 |
| 45726 | 6.521 | 7978.76 | 2508.57 | 0.31440 | 75.8654 | 1.116725e+06 | 102.2770 | 3974.52 | 54.0 | 36.0470 |
| 45727 | 10.356 | 7726.65 | 2489.58 | 0.32220 | 70.9903 | 1.076560e+06 | 103.6780 | 3290.46 | 46.0 | 37.4718 |
| 45728 | 9.791 | 8878.93 | 3055.78 | 0.34416 | 94.0314 | 1.242266e+06 | 115.1950 | 3421.79 | 41.0 | 35.6045 |
| 45729 | 18.827 | 12732.40 | 4444.36 | 0.34905 | 157.6300 | 1.788897e+06 | 229.4590 | 4626.85 | 141.0 | 29.8118 |
45730 rows × 10 columns
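When reading data manually, a quick sanity check before registering it can catch a missing target column or non-numeric cells. The following is a minimal standard-library sketch, not part of PiML or pandas: `check_numeric_csv` is a hypothetical helper, and the two sample rows are copied from the preview above at displayed precision.

```python
import csv
import io

# Two rows copied from the preview above (displayed precision only).
SAMPLE_CSV = """RMSD,F1,F2,F3,F4,F5,F6,F7,F8,F9
17.284,13558.30,4305.35,0.31754,162.1730,1.872791e+06,215.3590,4287.87,102.0,27.0302
6.021,6191.96,1623.16,0.26213,53.3894,8.034467e+05,87.2024,3328.91,39.0,38.5468
"""


def check_numeric_csv(text, target):
    """Return (n_rows, columns) after checking that `target` is a column
    and that every cell parses as a float.

    Hypothetical helper for illustration; not a PiML function.
    """
    reader = csv.DictReader(io.StringIO(text))
    columns = list(reader.fieldnames or [])
    if target not in columns:
        raise ValueError(f"target column {target!r} not found in {columns}")
    n_rows = 0
    for row in reader:
        for name in columns:
            float(row[name])  # raises ValueError on a non-numeric cell
        n_rows += 1
    return n_rows, columns


n_rows, columns = check_numeric_csv(SAMPLE_CSV, target="RMSD")
print(n_rows, len(columns))  # prints "2 10"
```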
exp.data_summary()
# Select RMSD as the target variable and click "UPDATE"; this is a regression task
exp.data_prepare()
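The data preparation step also involves splitting the data into training and test sets. The sketch below shows what a simple random split over row indices could look like. It is not PiML's implementation: the helper `random_split`, the 20% test ratio, and the seed are all assumptions made for illustration.

```python
import random


def random_split(n_rows, test_ratio=0.2, seed=0):
    """Shuffle row indices and return (train_idx, test_idx).

    Hypothetical helper; the default ratio and seed are illustrative assumptions.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = int(round(n_rows * test_ratio))
    return indices[n_test:], indices[:n_test]


# 45730 is the number of rows in the CASP dataset loaded above.
train_idx, test_idx = random_split(45730, test_ratio=0.2, seed=0)
print(len(train_idx), len(test_idx))  # prints "36584 9146"
```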
exp.feature_select()
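One of the feature selection options is correlation. As a rough illustration of the idea, the sketch below computes the Pearson correlation between F1 and RMSD using only the first five rows shown in the data preview. The `pearson` helper is hypothetical and is not PiML's code; five rows are far too few for a meaningful estimate and are used only to keep the example self-contained.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# First five rows of the preview: target RMSD and feature F1.
rmsd = [17.284, 6.021, 9.275, 15.851, 7.962]
f1 = [13558.30, 6191.96, 7725.98, 8424.58, 7460.84]

r = pearson(f1, rmsd)
print(round(r, 3))
```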
exp.eda()
# Choose GLM, GAM, Tree models with default settings, click "RUN" to train;
# After training is finished, register the three trained models one by one.
exp.model_train()
# Model-agnostic post-hoc explanation by Permutation Feature Importance, PDP (1D and 2D) vs. ALE (1D and 2D), LIME vs. SHAP
exp.model_explain()
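To give a feel for what permutation feature importance measures, here is a minimal standard-library sketch on toy data. It is an illustration of the general technique under stated assumptions, not PiML's implementation: the function names, the toy model, the MSE metric, and the number of repeats are all hypothetical choices.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Average increase in MSE when each feature column is shuffled.

    `predict` maps one row (a list of floats) to a prediction.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mse(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Toy model (assumption): depends only on feature 0."""
    return 3.0 * row[0]


X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [3.0 * row[0] for row in X]

scores = permutation_importance(toy_model, X, y)
print(scores)
```

Because the toy model ignores the second feature, shuffling it leaves the error unchanged, so its importance is zero, while shuffling the first feature increases the error.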
# Model-specific inherent interpretation including feature importance, main effects and pairwise interactions.
exp.model_interpret()
exp.model_diagnose()
exp.model_compare()