import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator
h2o.init()
Warning: Version mismatch. H2O is version 3.5.0.99999, but the python package is version UNKNOWN.
H2O cluster uptime: | 58 minutes 48 seconds 43 milliseconds |
H2O cluster version: | 3.5.0.99999 |
H2O cluster name: | ludirehak |
H2O cluster total nodes: | 1 |
H2O cluster total memory: | 4.44 GB |
H2O cluster total cores: | 8 |
H2O cluster allowed cores: | 8 |
H2O cluster healthy: | True |
H2O Connection ip: | 127.0.0.1 |
H2O Connection port: | 54321 |
# _locate is a private helper that finds files inside the h2o git project directory.
from h2o.utils.shared_utils import _locate
prostate = h2o.upload_file(path=_locate("smalldata/logreg/prostate.csv"))
prostate.describe()
Parse Progress: [##################################################] 100%
Uploaded py2a71800e-2ec6-4f71-b955-854a4f22aeb3 into cluster with 380 rows and 9 cols
Rows: 380 Cols: 9
Chunk compression summary:
chunk_type | chunk_name | count | count_percentage | size | size_percentage |
CBS | Bits | 1 | 11.111112 | 118 B | 2.4210093 |
C1N | 1-Byte Integers (w/o NAs) | 5 | 55.555557 | 2.2 KB | 45.958145 |
C2 | 2-Byte Integers | 1 | 11.111112 | 828 B | 16.9881 |
C2S | 2-Byte Fractions | 2 | 22.222223 | 1.6 KB | 34.632744 |
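The count_percentage column in the chunk summary above appears to be each chunk type's share of the frame's 9 chunks. A minimal sketch of that arithmetic, using only the counts copied from the table (the function and variable names are illustrative, not part of the H2O API):

```python
# Chunk counts copied from the compression summary above.
chunk_counts = {"CBS": 1, "C1N": 5, "C2": 1, "C2S": 2}


def count_percentages(counts):
    """Return each chunk type's share of the total chunk count, in percent."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


percentages = count_percentages(chunk_counts)
for name, pct in percentages.items():
    print(f"{name}: {pct:.4f}%")
```

The small differences in the last digits of the reported values (for example 11.111112) are consistent with single-precision rounding.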
Frame distribution summary:
 | size | number_of_rows | number_of_chunks_per_column | number_of_chunks |
172.16.2.37:54321 | 4.8 KB | 380.0 | 1.0 | 9.0 |
mean | 4.8 KB | 380.0 | 1.0 | 9.0 |
min | 4.8 KB | 380.0 | 1.0 | 9.0 |
max | 4.8 KB | 380.0 | 1.0 | 9.0 |
stddev | 0 B | 0.0 | 0.0 | 0.0 |
total | 4.8 KB | 380.0 | 1.0 | 9.0 |
Column-by-Column Summary:
 | ID | CAPSULE | AGE | RACE | DPROS | DCAPS | PSA | VOL | GLEASON |
type | int | int | int | int | int | int | real | real | int |
mins | 1.0 | 0.0 | 43.0 | 0.0 | 1.0 | 1.0 | 0.3 | 0.0 | 0.0 |
maxs | 380.0 | 1.0 | 79.0 | 2.0 | 4.0 | 2.0 | 139.7 | 97.6 | 9.0 |
mean | 190.5 | 0.4 | 66.0 | 1.1 | 2.3 | 1.1 | 15.4 | 15.8 | 6.4 |
sigma | 109.8 | 0.5 | 6.5 | 0.3 | 1.0 | 0.3 | 20.0 | 18.3 | 1.1 |
zero_count | 0 | 227 | 0 | 3 | 0 | 0 | 0 | 167 | 2 |
missing_count | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
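# Convert the response to a factor so H2O treats this as a classification problem.
prostate["CAPSULE"] = prostate["CAPSULE"].asfactor()
model = H2ODeepLearningEstimator(activation="Tanh", hidden=[10, 10, 10], epochs=10000)
# Use every column except the row ID and the response as a predictor.
predictors = [col for col in prostate.columns if col not in ("ID", "CAPSULE")]
model.train(x=predictors, y="CAPSULE", training_frame=prostate)
model.show()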
deeplearning Model Build Progress: [##################################################] 100%
Model Details
=============
H2ODeepLearningEstimator : Deep Learning
Model Key: DeepLearning_model_python_1445544453075_137
Status of Neuron Layers: predicting CAPSULE, 2-class classification, bernoulli distribution, CrossEntropy loss, 322 weights/biases, 8.5 KB, 3,800,000 training samples, mini-batch size 1
layer | units | type | dropout | l1 | l2 | mean_rate | rate_RMS | momentum | mean_weight | weight_RMS | mean_bias | bias_RMS | |
1 | 7 | Input | 0.0 | | | | | | | | | | |
2 | 10 | Tanh | 0.0 | 0.0 | 0.0 | 0.1 | 0.1 | 0.0 | 0.1 | 1.0 | 0.4 | 0.7 | |
3 | 10 | Tanh | 0.0 | 0.0 | 0.0 | 0.1 | 0.1 | 0.0 | 0.0 | 1.4 | 1.1 | 0.7 | |
4 | 10 | Tanh | 0.0 | 0.0 | 0.0 | 0.2 | 0.2 | 0.0 | -0.2 | 1.8 | -0.2 | 0.8 | |
5 | 2 | Softmax | | 0.0 | 0.0 | 0.4 | 0.1 | 0.0 | -0.2 | 5.7 | 0.1 | 0.3 | |
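The "322 weights/biases" figure can be checked by hand from the layer table: 7 input units, three hidden layers of 10 units, and 2 output units. A short sketch, assuming a plain fully connected network with one bias per non-input unit (the helper name is illustrative):

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected feed-forward network."""
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out  # one weight per connection between layers
        total += fan_out           # one bias per unit in the receiving layer
    return total


# Unit counts copied from the "units" column of the layer table.
layer_sizes = [7, 10, 10, 10, 2]
n_params = count_parameters(layer_sizes)
print(n_params)
```

Layer by layer this is (7*10 + 10) + (10*10 + 10) + (10*10 + 10) + (10*2 + 2) = 80 + 110 + 110 + 22.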
ModelMetricsBinomial: deeplearning
** Reported on train data. **
MSE: 0.010708193224
R^2: 0.955478877615
LogLoss: 0.0689458344205
AUC: 0.996818404307
Gini: 0.993636808615
Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.85259659057:
 | 0 | 1 | Error | Rate |
0 | 224.0 | 3.0 | 0.0132 | (3.0/227.0) |
1 | 0.0 | 153.0 | 0.0 | (0.0/153.0) |
Total | 224.0 | 156.0 | 0.0079 | (3.0/380.0) |
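The Error column of the confusion matrix can be reproduced from its four cell counts. The sketch below also derives precision, recall, and F1 at this threshold; the function name and dictionary keys are illustrative, and class 1 is assumed to be the positive class:

```python
def ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def confusion_metrics(tn, fp, fn, tp):
    """Derive error rates, precision, recall, and F1 from a 2x2 confusion matrix."""
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "class0_error": ratio(fp, tn + fp),  # actual 0 predicted as 1
        "class1_error": ratio(fn, fn + tp),  # actual 1 predicted as 0
        "total_error": ratio(fp + fn, tn + fp + fn + tp),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Cell counts copied from the confusion matrix above (rows = actual, columns = predicted).
metrics = confusion_metrics(tn=224, fp=3, fn=0, tp=153)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The class 0 error comes out as 3/227 and the total error as 3/380, which round to the 0.0132 and 0.0079 shown in the table.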
Maximum Metrics: Maximum metrics at their respective thresholds
metric | threshold | value | idx |
max f1 | 0.9 | 1.0 | 107.0 |
max f2 | 0.9 | 1.0 | 107.0 |
max f0point5 | 0.9 | 1.0 | 107.0 |
max accuracy | 0.9 | 1.0 | 107.0 |
max precision | 1.0 | 1.0 | 0.0 |
max absolute_MCC | 0.9 | 1.0 | 107.0 |
max min_per_class_accuracy | 0.9 | 1.0 | 105.0 |
Scoring History:
timestamp | duration | training_speed | epochs | samples | training_MSE | training_r2 | training_logloss | training_AUC | training_classification_error | |
2015-10-22 14:06:21 | 0.000 sec | None | 0.0 | 0.0 | nan | nan | nan | nan | nan | |
2015-10-22 14:06:21 | 0.038 sec | 115151 rows/sec | 10.0 | 3800.0 | 0.2 | 0.2 | 0.6 | 0.8 | 0.3 | |
2015-10-22 14:06:26 | 5.047 sec | 309126 rows/sec | 4100.0 | 1558000.0 | 0.0 | 0.9 | 0.1 | 1.0 | 0.0 | |
2015-10-22 14:06:31 | 10.051 sec | 308783 rows/sec | 8160.0 | 3100800.0 | 0.0 | 0.9 | 0.1 | 1.0 | 0.0 | |
2015-10-22 14:06:34 | 12.360 sec | 307717 rows/sec | 10000.0 | 3800000.0 | 0.0 | 1.0 | 0.1 | 1.0 | 0.0 |
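In this run, the samples column of the scoring history equals the epoch count multiplied by the 380 rows of the training frame. A quick check of that relationship, using the values copied from the table (this describes this output only and is not a general statement about how H2O schedules training):

```python
N_ROWS = 380  # rows in the prostate training frame

# (epochs, samples) pairs copied from the scoring history above.
history = [
    (0.0, 0.0),
    (10.0, 3800.0),
    (4100.0, 1558000.0),
    (8160.0, 3100800.0),
    (10000.0, 3800000.0),
]


def samples_for(epochs, n_rows):
    """Number of training rows processed after the given number of epochs."""
    return epochs * n_rows


for epochs, reported in history:
    print(epochs, samples_for(epochs, N_ROWS), reported)
```

The final row, 10000 epochs over 380 rows, gives the 3,800,000 training samples quoted in the model details.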
predictions = model.predict(prostate)
predictions.show()
H2OFrame with 380 rows and 3 columns:
 | predict | p0 | p1 |
---|---|---|---|
0 | 0 | 9.993875e-01 | 6.125394e-04 |
1 | 0 | 9.999998e-01 | 1.937478e-07 |
2 | 0 | 9.999646e-01 | 3.535732e-05 |
3 | 0 | 1.000000e+00 | 2.235483e-12 |
4 | 0 | 9.999950e-01 | 5.024862e-06 |
5 | 1 | 1.237468e-07 | 9.999999e-01 |
6 | 0 | 9.992793e-01 | 7.206910e-04 |
7 | 0 | 1.000000e+00 | 9.146884e-19 |
8 | 0 | 1.000000e+00 | 8.434714e-13 |
9 | 0 | 9.999994e-01 | 6.112821e-07 |
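The predict column can be related to the probabilities as follows. This sketch assumes the label is 1 whenever p1 reaches the max-F1 threshold reported with the confusion matrix (0.85259659057); that thresholding rule is an assumption about this output rather than something the output states, and the helper name is illustrative:

```python
THRESHOLD = 0.85259659057  # max-F1 threshold from the training metrics

# (predict, p0, p1) for the ten rows displayed above.
rows = [
    (0, 9.993875e-01, 6.125394e-04),
    (0, 9.999998e-01, 1.937478e-07),
    (0, 9.999646e-01, 3.535732e-05),
    (0, 1.000000e+00, 2.235483e-12),
    (0, 9.999950e-01, 5.024862e-06),
    (1, 1.237468e-07, 9.999999e-01),
    (0, 9.992793e-01, 7.206910e-04),
    (0, 1.000000e+00, 9.146884e-19),
    (0, 1.000000e+00, 8.434714e-13),
    (0, 9.999994e-01, 6.112821e-07),
]


def label_from_probability(p1, threshold):
    """Return class 1 when the class-1 probability reaches the threshold."""
    return 1 if p1 >= threshold else 0


labels = [label_from_probability(p1, THRESHOLD) for _, _, p1 in rows]
print(labels)
```

For these ten rows the two class probabilities sum to 1 up to display rounding, and only the row with p1 close to 1 is labelled 1.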
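# Score the model on the training frame itself; the output labels the frame passed to model_performance() as test data.
performance = model.model_performance(prostate)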
performance.show()
ModelMetricsBinomial: deeplearning
** Reported on test data. **
MSE: 0.010708193224
R^2: 0.955478877615
LogLoss: 0.0689458344205
AUC: 0.996804007947
Gini: 0.993608015894
Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.85259659057:
 | 0 | 1 | Error | Rate |
0 | 224.0 | 3.0 | 0.0132 | (3.0/227.0) |
1 | 0.0 | 153.0 | 0.0 | (0.0/153.0) |
Total | 224.0 | 156.0 | 0.0079 | (3.0/380.0) |
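The reported R^2 is numerically consistent with 1 - MSE / Var(y), where Var(y) is the population variance of the 0/1 response. A sketch of that check using the class totals from the confusion matrix (227 negatives, 153 positives); the formula is inferred from the numbers in this output, not taken from H2O documentation:

```python
def r_squared(mse, n_neg, n_pos):
    """Compute 1 - MSE / Var(y) for a binary 0/1 response."""
    n = n_neg + n_pos
    p = n_pos / n                # mean of the 0/1 response
    variance = p * (1.0 - p)     # population variance of a 0/1 variable
    return 1.0 - mse / variance


mse = 0.010708193224  # MSE from the metrics summary above
r2 = r_squared(mse, n_neg=227, n_pos=153)
print(round(r2, 9))
```

Using the sample variance (dividing by n - 1) instead would give a slightly different value, so the population form is the one that matches here.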
Maximum Metrics: Maximum metrics at their respective thresholds
metric | threshold | value | idx |
max f1 | 0.9 | 1.0 | 155.0 |
max f2 | 0.9 | 1.0 | 155.0 |
max f0point5 | 0.9 | 1.0 | 155.0 |
max accuracy | 0.9 | 1.0 | 155.0 |
max precision | 1.0 | 1.0 | 0.0 |
max absolute_MCC | 0.9 | 1.0 | 155.0 |
max min_per_class_accuracy | 0.9 | 1.0 | 153.0 |
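The Gini values in both metric summaries follow from the AUC through the standard relation Gini = 2 * AUC - 1. A minimal check against the two reported AUC values (the function name is illustrative):

```python
def gini_from_auc(auc):
    """Convert an AUC value to the Gini coefficient: Gini = 2 * AUC - 1."""
    return 2.0 * auc - 1.0


train_gini = gini_from_auc(0.996818404307)  # AUC reported on train data
test_gini = gini_from_auc(0.996804007947)   # AUC reported by model_performance()
print(train_gini, test_gini)
```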