9 Ensembling The Predictions
Load Package And Data
load("../../data/craet_8.Rdata")
library(tidyverse)
library(caret)
#Set Parallel Processing - Decrease computation time
if (!require("doMC")) install.packages("doMC")
library(doMC)
registerDoMC(cores = 4)
Train Multiple Models
So now we have predictions from multiple individual models.To do this we had to run the train() function once for each model, store the models and pass it to the res
library(caretEnsemble)
# Stacking Algorithms - Run multiple algos in one call.
trainControl <- trainControl(method="repeatedcv",
number=10,
repeats=3,
savePredictions=TRUE,
classProbs=TRUE)
algorithmList <- c('rf', 'adaboost', 'earth', 'svmRadial')
set.seed(100)
models <- caretList(Purchase ~ ., data=trainData, trControl=trainControl, methodList=algorithmList)
results <- resamples(models)
summary(results)
##
## Call:
## summary.resamples(object = results)
##
## Models: rf, adaboost, earth, svmRadial
## Number of resamples: 30
##
## Accuracy
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## rf 0.7011494 0.7764706 0.7965116 0.8033148 0.8250684 0.9058824 0
## adaboost 0.6823529 0.7674419 0.7906977 0.7966532 0.8328659 0.8941176 0
## earth 0.7209302 0.7906977 0.8187415 0.8164175 0.8367305 0.8604651 0
## svmRadial 0.7558140 0.7948276 0.8304378 0.8261842 0.8588235 0.9058824 0
##
## Kappa
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## rf 0.3518625 0.5184810 0.5504351 0.5737290 0.6253768 0.8040346 0
## adaboost 0.3349754 0.5046620 0.5686668 0.5711983 0.6423870 0.7831018 0
## earth 0.4102857 0.5609657 0.6148850 0.6095470 0.6580869 0.7147595 0
## svmRadial 0.4685109 0.5645744 0.6326120 0.6285652 0.6993397 0.7996464 0
# Box plots to compare models
scales <- list(x=list(relation="free"), y=list(relation="free"))
bwplot(results, scales=scales)
Combine The Predictions Of Multiple Models To Form A Final Prediction
- One thought: Is it possible to combine these predicted values from multiple models somehow and make a new ensemble that predicts better?
- another thought: using the caretStack(). You just need to make sure you don’t use the same trainControl you used to build the models
# Create the trainControl
set.seed(101)
stackControl <- trainControl(method="repeatedcv",
number=10,
repeats=3,
savePredictions=TRUE,
classProbs=TRUE)
# Ensemble the predictions of `models` to form a new combined prediction based on glm
# 在原有模型的基础上叠加 一般线性模型 作为预测
stack.glm <- caretStack(models, method="glm", metric="Accuracy", trControl=stackControl)
print(stack.glm)
## A glm ensemble of 2 base models: rf, adaboost, earth, svmRadial
##
## Ensemble results:
## Generalized Linear Model
##
## 2571 samples
## 4 predictor
## 2 classes: 'CH', 'MM'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 3 times)
## Summary of sample sizes: 2314, 2314, 2314, 2314, 2313, 2313, ...
## Resampling results:
##
## Accuracy Kappa
## 0.8321128 0.6419638
# Predict on testData
stack_predicteds <- predict(stack.glm, newdata=testData4)
head(stack_predicteds)
## [1] CH CH CH CH CH MM
## Levels: CH MM
save.image("../../data/craet_9.Rdata")
A point to consider: The ensembles tend to perform better if the predictions are less correlated with each other.