In statistics and machine learning, ensemble methods use multiple models to obtain better predictive performance than could be obtained from any of the constituent models (Wikipedia, 2015). This notebook demonstrates an easy way to carry out ensemble learning with H2O models using h2oEnsemble.
We give our users the ability to build, compare and stack different H2O, MXNet, TensorFlow and Caffe models quickly and easily using the H2O platform.
We need three R packages for this demo: h2o, h2oEnsemble and mlbench.
# Load R Packages
suppressPackageStartupMessages(library(h2o))
suppressPackageStartupMessages(library(mlbench)) # for Boston Housing Data
# Install h2oEnsemble from GitHub if needed
# Reference: https://github.com/h2oai/h2o-3/tree/master/h2o-r/ensemble
if (!require(h2oEnsemble)) {
install.packages("https://h2o-release.s3.amazonaws.com/h2o-ensemble/R/h2oEnsemble_0.1.8.tar.gz", repos = NULL)
}
suppressPackageStartupMessages(library(h2oEnsemble)) # for model stacking
The dataset used in this demo is Boston Housing from mlbench; it contains housing values in suburbs of Boston.
Reference: UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Housing)
Source: This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.
Creator: Harrison, D. and Rubinfeld, D.L., 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol.5, 81-102, 1978.
Type: Regression
Dimensions: 506 instances, 13 features and 1 numeric target.
13 Features: crim, zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, b and lstat (in the mlbench version, chas is a factor and the rest are numeric).
Target: medv, the median value of owner-occupied homes in $1000s.
# Import data
data(BostonHousing)
head(BostonHousing)
dim(BostonHousing)
crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | b | lstat | medv |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.00632 | 18 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1 | 296 | 15.3 | 396.90 | 4.98 | 24.0 |
0.02731 | 0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 396.90 | 9.14 | 21.6 |
0.02729 | 0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242 | 17.8 | 392.83 | 4.03 | 34.7 |
0.03237 | 0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 394.63 | 2.94 | 33.4 |
0.06905 | 0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222 | 18.7 | 396.90 | 5.33 | 36.2 |
0.02985 | 0 | 2.18 | 0 | 0.458 | 6.430 | 58.7 | 6.0622 | 3 | 222 | 18.7 | 394.12 | 5.21 | 28.7 |
We want to evaluate the predictive performance on a holdout dataset. The following code splits the Boston Housing data randomly into a training set (400 rows) and a test set (the remaining 106 rows):
# Split data
set.seed(1234)
row_train <- sample(1:nrow(BostonHousing), 400)
train <- BostonHousing[row_train,]
test <- BostonHousing[-row_train,]
# Training data - quick summary
dim(train)
head(train)
summary(train)
  | crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | b | lstat | medv |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
58 | 0.01432 | 100 | 1.32 | 0 | 0.411 | 6.816 | 40.5 | 8.3248 | 5 | 256 | 15.1 | 392.90 | 3.95 | 31.6 |
315 | 0.36920 | 0 | 9.90 | 0 | 0.544 | 6.567 | 87.3 | 3.6023 | 4 | 304 | 18.4 | 395.69 | 9.28 | 23.8 |
308 | 0.04932 | 33 | 2.18 | 0 | 0.472 | 6.849 | 70.3 | 3.1827 | 7 | 222 | 18.4 | 396.90 | 7.53 | 28.2 |
314 | 0.26938 | 0 | 9.90 | 0 | 0.544 | 6.266 | 82.8 | 3.2628 | 4 | 304 | 18.4 | 393.39 | 7.90 | 21.6 |
433 | 6.44405 | 0 | 18.10 | 0 | 0.584 | 6.425 | 74.8 | 2.2004 | 24 | 666 | 20.2 | 97.95 | 12.03 | 16.1 |
321 | 0.16760 | 0 | 7.38 | 0 | 0.493 | 6.426 | 52.3 | 4.5404 | 5 | 287 | 19.6 | 396.90 | 7.20 | 23.8 |
      crim                zn             indus       chas         nox
 Min.   : 0.00632   Min.   :  0.00   Min.   : 0.46   0:370   Min.   :0.3850
 1st Qu.: 0.07782   1st Qu.:  0.00   1st Qu.: 5.13   1: 30   1st Qu.:0.4520
 Median : 0.24751   Median :  0.00   Median : 8.56           Median :0.5380
 Mean   : 3.33351   Mean   : 12.01   Mean   :10.98           Mean   :0.5549
 3rd Qu.: 3.48946   3rd Qu.: 18.50   3rd Qu.:18.10           3rd Qu.:0.6258
 Max.   :73.53410   Max.   :100.00   Max.   :27.74           Max.   :0.8710
       rm             age              dis              rad
 Min.   :3.561   Min.   :  6.20   Min.   : 1.130   Min.   : 1.00
 1st Qu.:5.883   1st Qu.: 47.08   1st Qu.: 2.103   1st Qu.: 4.00
 Median :6.205   Median : 77.75   Median : 3.239   Median : 5.00
 Mean   :6.273   Mean   : 69.25   Mean   : 3.824   Mean   : 9.44
 3rd Qu.:6.626   3rd Qu.: 94.03   3rd Qu.: 5.234   3rd Qu.:24.00
 Max.   :8.780   Max.   :100.00   Max.   :12.127   Max.   :24.00
      tax           ptratio            b              lstat
 Min.   :187.0   Min.   :12.60   Min.   :  2.52   Min.   : 1.73
 1st Qu.:279.0   1st Qu.:17.40   1st Qu.:376.46   1st Qu.: 7.17
 Median :330.0   Median :19.10   Median :391.99   Median :11.25
 Mean   :404.8   Mean   :18.52   Mean   :359.94   Mean   :12.61
 3rd Qu.:666.0   3rd Qu.:20.20   3rd Qu.:396.54   3rd Qu.:16.43
 Max.   :711.0   Max.   :22.00   Max.   :396.90   Max.   :37.97
      medv
 Min.   : 5.00
 1st Qu.:17.27
 Median :21.15
 Mean   :22.51
 3rd Qu.:24.85
 Max.   :50.00
# Test data - quick summary
dim(test)
head(test)
summary(test)
  | crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | b | lstat | medv |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 396.90 | 9.14 | 21.6 |
10 | 0.17004 | 12.5 | 7.87 | 0 | 0.524 | 6.004 | 85.9 | 6.5921 | 5 | 311 | 15.2 | 386.71 | 17.10 | 18.9 |
13 | 0.09378 | 12.5 | 7.87 | 0 | 0.524 | 5.889 | 39.0 | 5.4509 | 5 | 311 | 15.2 | 390.50 | 15.71 | 21.7 |
18 | 0.78420 | 0.0 | 8.14 | 0 | 0.538 | 5.990 | 81.7 | 4.2579 | 4 | 307 | 21.0 | 386.75 | 14.67 | 17.5 |
24 | 0.98843 | 0.0 | 8.14 | 0 | 0.538 | 5.813 | 100.0 | 4.0952 | 4 | 307 | 21.0 | 394.54 | 19.88 | 14.5 |
28 | 0.95577 | 0.0 | 8.14 | 0 | 0.538 | 6.047 | 88.8 | 4.4534 | 4 | 307 | 21.0 | 306.38 | 17.28 | 14.8 |
      crim                zn              indus        chas         nox
 Min.   : 0.00906   Min.   : 0.000   Min.   : 0.740   0:101   Min.   :0.4000
 1st Qu.: 0.09535   1st Qu.: 0.000   1st Qu.: 5.945   1:  5   1st Qu.:0.4480
 Median : 0.30770   Median : 0.000   Median :10.300           Median :0.5350
 Mean   : 4.67018   Mean   : 8.929   Mean   :11.720           Mean   :0.5540
 3rd Qu.: 4.86247   3rd Qu.: 0.000   3rd Qu.:18.100           3rd Qu.:0.6128
 Max.   :88.97620   Max.   :95.000   Max.   :27.740           Max.   :0.8710
       rm             age              dis             rad
 Min.   :4.926   Min.   :  2.90   Min.   :1.202   Min.   : 1.000
 1st Qu.:5.910   1st Qu.: 37.98   1st Qu.:2.084   1st Qu.: 4.000
 Median :6.231   Median : 76.35   Median :3.117   Median : 5.000
 Mean   :6.330   Mean   : 66.01   Mean   :3.686   Mean   : 9.962
 3rd Qu.:6.562   3rd Qu.: 94.35   3rd Qu.:4.906   3rd Qu.:24.000
 Max.   :8.398   Max.   :100.00   Max.   :9.188   Max.   :24.000
      tax           ptratio            b              lstat
 Min.   :193.0   Min.   :13.00   Min.   :  0.32   Min.   : 2.960
 1st Qu.:287.5   1st Qu.:16.60   1st Qu.:368.61   1st Qu.: 6.758
 Median :367.5   Median :18.40   Median :389.75   Median :11.690
 Mean   :421.3   Mean   :18.23   Mean   :344.37   Mean   :12.806
 3rd Qu.:666.0   3rd Qu.:20.20   3rd Qu.:395.49   3rd Qu.:17.407
 Max.   :711.0   Max.   :21.20   Max.   :396.90   Max.   :30.810
      medv
 Min.   : 5.00
 1st Qu.:15.72
 Median :21.45
 Mean   :22.61
 3rd Qu.:26.57
 Max.   :50.00
We are now ready to train regression models using different algorithms in H2O.
Note 1: Although the three algorithms used in this example are different, the core parameters are consistent (see below). This allows H2O users to get quick and easy access to different existing (and future) algorithms with a very shallow learning curve. The core parameters are:

- x = features
- y = target
- training_frame = h_train
Note 2: For model stacking, we need to generate holdout predictions from cross-validation. The parameters required for model stacking are:

- nfolds = 5
- fold_assignment = 'Modulo'
- keep_cross_validation_predictions = TRUE
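Before the R data frames can be converted with as.h2o(), an H2O cluster must be running. If one was not started earlier in the notebook, a minimal sketch for launching a local cluster looks like this (the nthreads setting is an illustrative choice, not something prescribed by the original demo):
# Start (or connect to) a local H2O cluster if one is not already running
# nthreads = -1 uses all available CPU cores (illustrative setting)
h2o.init(nthreads = -1)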
# Convert R data frames into H2O data frames
h_train <- as.h2o(train)
h_test <- as.h2o(test)
# Regression - define features (x) and target (y)
target <- "medv"
features <- setdiff(colnames(train), target)
print(features)
[1] "crim" "zn" "indus" "chas" "nox" "rm" "age" [8] "dis" "rad" "tax" "ptratio" "b" "lstat"
For more information, enter ?h2o.gbm in R to look at the full list of parameters.
# Train a H2O GBM model
model_gbm <- h2o.gbm(x = features, y = target,
training_frame = h_train,
model_id = "h2o_gbm",
learn_rate = 0.1,
learn_rate_annealing = 0.99,
sample_rate = 0.8,
col_sample_rate = 0.8,
nfolds = 5,
fold_assignment = "Modulo",
keep_cross_validation_predictions = TRUE,
ntrees = 100)
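Before moving on to the next algorithm, it can be helpful to check a base learner's cross-validated error. A minimal sketch using h2o.performance, where xval = TRUE reports the metrics computed from the 5-fold cross-validation requested above:
# Inspect the GBM's cross-validated performance (optional)
h2o.performance(model_gbm, xval = TRUE)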
For more information, enter ?h2o.randomForest in R to look at the full list of parameters.
# Train a H2O DRF model
model_drf <- h2o.randomForest(x = features, y = target,
training_frame = h_train,
model_id = "h2o_drf",
nfolds = 5,
fold_assignment = "Modulo",
keep_cross_validation_predictions = TRUE,
ntrees = 100)
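The stacking step below also uses a third model, model_dw (reported as h2o_deepwater in the results), whose training cell is not reproduced in this section. A minimal sketch of how such a model might be trained, assuming a Deep Water-enabled H2O build and illustrative parameter values rather than the exact settings behind the reported numbers:
# Train a H2O Deep Water model (requires a Deep Water-enabled H2O build)
# Note: epochs below is an illustrative assumption, not necessarily the
# value used to produce the results shown later
model_dw <- h2o.deepwater(x = features, y = target,
                          training_frame = h_train,
                          model_id = "h2o_deepwater",
                          epochs = 20,
                          nfolds = 5,
                          fold_assignment = "Modulo",
                          keep_cross_validation_predictions = TRUE)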
Now that we have three different models, we are ready to carry out model stacking.
# Create a list to include all the models for stacking
models <- list(model_dw, model_gbm, model_drf)
# Define a metalearner (one of the H2O supervised machine learning algorithms)
metalearner <- "h2o.glm.wrapper"
# Use h2o.stack() to carry out metalearning
stack <- h2o.stack(models = models,
response_frame = h_train$medv,
metalearner = metalearner)
[1] "Metalearning"
# Finally, we evaluate the predictive performance of the ensemble as well as the individual models.
h2o.ensemble_performance(stack, newdata = h_test)
Base learner performance, sorted by specified metric:
         learner      MSE
1  h2o_deepwater 8.377644
2        h2o_gbm 8.106541
3        h2o_drf 7.443517


H2O Ensemble Performance on <newdata>:
----------------
Family: gaussian

Ensemble performance (MSE): 5.80436983051916
# Use the ensemble to make predictions
yhat_test <- predict(stack, h_test)
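In the h2oEnsemble versions this demo is based on, predict() on a stacked ensemble returns a list holding the ensemble predictions (pred) alongside the base learner predictions (basepred); the field names below are taken from that documentation, so check str(yhat_test) if your version differs.
# Pull the ensemble predictions back into an R data frame
# (the pred / basepred field names are assumptions from the h2oEnsemble docs)
str(yhat_test, max.level = 1)
head(as.data.frame(yhat_test$pred))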