Create a rubicon_ml project
from rubicon_ml import Rubicon
rubicon = Rubicon(persistence="memory", auto_git_enabled=True)
project = rubicon.create_project(name="apply schema")
project
<rubicon_ml.client.project.Project at 0x11c99e890>
RandomForestClassifier
Load a training dataset
from sklearn.datasets import load_wine
X, y = load_wine(return_X_y=True, as_frame=True)
Train an instance of the model the schema represents
from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier(
    ccp_alpha=5e-3,
    criterion="log_loss",
    max_features="log2",
    n_estimators=24,
    oob_score=True,
    random_state=121,
)
rfc.fit(X, y)
print(rfc)
RandomForestClassifier(ccp_alpha=0.005, criterion='log_loss', max_features='log2', n_estimators=24, oob_score=True, random_state=121)
Log the model metadata defined in the applied schema to a new experiment in project with project.log_with_schema. Note: project.log_with_schema will infer the correct schema based on the given object to log.
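As a hedged sketch of what that inference could look like (an assumption for illustration, not rubicon_ml's actual lookup code), the inferred name `sklearn__RandomForestClassifier` suggests a `"<top-level package>__<class name>"` pattern derived from the logged object's type:

```python
# Hedged sketch (assumption, not rubicon_ml internals): derive a schema
# name of the form "<top-level package>__<class name>" from the object.
def infer_schema_name(obj):
    cls = type(obj)
    return f"{cls.__module__.split('.')[0]}__{cls.__name__}"

class FakeEstimator:
    pass

# Pretend the class lives in a sklearn submodule, as a real estimator would.
FakeEstimator.__module__ = "sklearn.ensemble._forest"

print(infer_schema_name(FakeEstimator()))  # sklearn__FakeEstimator
```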
experiment = project.log_with_schema(
    rfc,
    experiment_kwargs={  # additional kwargs to be passed to `project.log_experiment`
        "name": "log with schema",
        "model_name": "RandomForestClassifier",
        "description": "logged with the `RandomForestClassifier` `rubicon_schema`",
    },
)
print(f"inferred schema name: {project.schema_['name']}")
experiment
inferred schema name: sklearn__RandomForestClassifier
<rubicon_ml.client.experiment.Experiment at 0x16d392b10>
Each experiment contains all the data represented in the schema - more information on the data captured by a rubicon_schema can be found in the "Representing model metadata with a rubicon_schema" section.
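The schema entries shown below each pair a logged name with an attribute to read off the fitted model. As a minimal sketch of that idea (assumed behavior for illustration, not rubicon_ml's implementation), resolving an entry's `value_attr` is essentially a `getattr` against the model:

```python
# Minimal sketch (assumption, not rubicon_ml internals): a schema entry
# pairs a logged name with a "value_attr" read off the model via getattr,
# which is how fitted attributes like `n_classes_` become logged values.
class DummyModel:
    n_classes_ = 3
    n_outputs_ = 1

schema_metrics = [
    {"name": "n_classes", "value_attr": "n_classes_"},
    {"name": "n_outputs", "value_attr": "n_outputs_"},
]

def resolve_entries(model, entries):
    return {e["name"]: getattr(model, e["value_attr"]) for e in entries}

print(resolve_entries(DummyModel(), schema_metrics))  # {'n_classes': 3, 'n_outputs': 1}
```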
vars(experiment._domain)
{'project_name': 'apply schema', 'id': 'ec4c3ead-3337-4623-9a97-c61f48e8de3d', 'name': 'log with schema', 'description': 'logged with the `RandomForestClassifier` `rubicon_schema`', 'model_name': 'RandomForestClassifier', 'branch_name': 'schema', 'commit_hash': 'c9f696408a03c6a6fbf2fbff39fa48bbf722bae1', 'training_metadata': None, 'tags': [], 'created_at': datetime.datetime(2023, 9, 25, 15, 47, 37, 552091)}
The features and their importances are logged as defined in the schema's "features" section
project.schema_["features"]
[{'names_attr': 'feature_names_in_', 'importances_attr': 'feature_importances_', 'optional': True}]
for feature in experiment.features():
    print(f"{feature.name} ({feature.importance})")
alcohol (0.1276831830349219)
malic_acid (0.03863837532736449)
ash (0.006168227239831861)
alcalinity_of_ash (0.025490751927615605)
magnesium (0.02935763050777937)
total_phenols (0.058427899304369986)
flavanoids (0.15309812550131274)
nonflavanoid_phenols (0.007414542189797497)
proanthocyanins (0.012615187741781065)
color_intensity (0.13608806341133572)
hue (0.0892558912217226)
od280/od315_of_diluted_wines (0.15604181694153108)
proline (0.15972030565063608)
Each parameter and its value are logged as defined in the schema's "parameters" section
project.schema_["parameters"]
[{'name': 'bootstrap', 'value_attr': 'bootstrap'}, {'name': 'ccp_alpha', 'value_attr': 'ccp_alpha'}, {'name': 'class_weight', 'value_attr': 'class_weight'}, {'name': 'criterion', 'value_attr': 'criterion'}, {'name': 'max_depth', 'value_attr': 'max_depth'}, {'name': 'max_features', 'value_attr': 'max_features'}, {'name': 'min_impurity_decrease', 'value_attr': 'min_impurity_decrease'}, {'name': 'max_leaf_nodes', 'value_attr': 'max_leaf_nodes'}, {'name': 'max_samples', 'value_attr': 'max_samples'}, {'name': 'min_samples_split', 'value_attr': 'min_samples_split'}, {'name': 'min_samples_leaf', 'value_attr': 'min_samples_leaf'}, {'name': 'min_weight_fraction_leaf', 'value_attr': 'min_weight_fraction_leaf'}, {'name': 'n_estimators', 'value_attr': 'n_estimators'}, {'name': 'oob_score', 'value_attr': 'oob_score'}, {'name': 'random_state', 'value_attr': 'random_state'}]
for parameter in experiment.parameters():
    print(f"{parameter.name}: {parameter.value}")
bootstrap: True
ccp_alpha: 0.005
class_weight: None
criterion: log_loss
max_depth: None
max_features: log2
min_impurity_decrease: 0.0
max_leaf_nodes: None
max_samples: None
min_samples_split: 2
min_samples_leaf: 1
min_weight_fraction_leaf: 0.0
n_estimators: 24
oob_score: True
random_state: 121
Each metric and its value are logged as defined in the schema's "metrics" section
project.schema_["metrics"]
[{'name': 'classes', 'value_attr': 'classes_'}, {'name': 'n_classes', 'value_attr': 'n_classes_'}, {'name': 'n_features_in', 'value_attr': 'n_features_in_'}, {'name': 'n_outputs', 'value_attr': 'n_outputs_'}, {'name': 'oob_decision_function', 'value_attr': 'oob_decision_function_', 'optional': True}, {'name': 'oob_score', 'value_attr': 'oob_score_', 'optional': True}]
import numpy as np
for metric in experiment.metrics():
    if np.isscalar(metric.value):
        print(f"{metric.name}: {metric.value}")
    else:  # don't print long metrics
        print(f"{metric.name}: ...")
classes: ...
n_classes: 3
n_features_in: 13
n_outputs: 1
oob_decision_function: ...
oob_score: 0.9775280898876404
A copy of the trained model is logged as defined in the schema's "artifacts" section
project.schema_["artifacts"]
['self']
for artifact in experiment.artifacts():
    print(f"{artifact.name}:\n{artifact.get_data(unpickle=True)}")
RandomForestClassifier:
RandomForestClassifier(ccp_alpha=0.005, criterion='log_loss', max_features='log2', n_estimators=24, oob_score=True, random_state=121)
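Because the artifact logged as "self" is a pickled copy of the trained estimator, unpickling it returns an equivalent, immediately usable object. A minimal sketch of that round-trip, using a stand-in object rather than the real estimator:

```python
import pickle
from types import SimpleNamespace

# Sketch: an artifact logged as "self" is a pickled copy of the trained
# object; unpickling (as artifact.get_data(unpickle=True) does) yields an
# equivalent instance with the same fitted state.
model = SimpleNamespace(n_estimators=24, oob_score_=0.9775280898876404)
restored = pickle.loads(pickle.dumps(model))

print(restored.n_estimators)  # 24
print(restored.oob_score_)    # 0.9775280898876404
```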