Load Package And Data

load("../../data/craet_8.Rdata")
library(tidyverse)
library(caret)
# Set up parallel processing to reduce computation time
# (doMC relies on forking, so it is only available on Unix-like systems, not Windows)
if (!require("doMC")) install.packages("doMC")
library(doMC)
registerDoMC(cores = 4)

Train Multiple Models

So now we have predictions from multiple individual models. To do this, we had to run the train() function once for each model, store the models, and pass them to the res

library(caretEnsemble)

# Train multiple algorithms in a single call with caretList()
# (named fitControl so the object does not shadow the trainControl() function)
fitControl <- trainControl(method="repeatedcv", 
                           number=10, 
                           repeats=3,
                           savePredictions=TRUE, 
                           classProbs=TRUE)

algorithmList <- c('rf', 'adaboost', 'earth', 'svmRadial')

set.seed(100)
models <- caretList(Purchase ~ ., data=trainData, trControl=fitControl, methodList=algorithmList) 

results <- resamples(models)
summary(results)
## 
## Call:
## summary.resamples(object = results)
## 
## Models: rf, adaboost, earth, svmRadial 
## Number of resamples: 30 
## 
## Accuracy 
##                Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
## rf        0.7011494 0.7764706 0.7965116 0.8033148 0.8250684 0.9058824    0
## adaboost  0.6823529 0.7674419 0.7906977 0.7966532 0.8328659 0.8941176    0
## earth     0.7209302 0.7906977 0.8187415 0.8164175 0.8367305 0.8604651    0
## svmRadial 0.7558140 0.7948276 0.8304378 0.8261842 0.8588235 0.9058824    0
## 
## Kappa 
##                Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
## rf        0.3518625 0.5184810 0.5504351 0.5737290 0.6253768 0.8040346    0
## adaboost  0.3349754 0.5046620 0.5686668 0.5711983 0.6423870 0.7831018    0
## earth     0.4102857 0.5609657 0.6148850 0.6095470 0.6580869 0.7147595    0
## svmRadial 0.4685109 0.5645744 0.6326120 0.6285652 0.6993397 0.7996464    0
# Box plots to compare models
scales <- list(x=list(relation="free"), y=list(relation="free"))
bwplot(results, scales=scales)
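With method="repeatedcv", number=10 and repeats=3, each model is scored on 10 held-out folds in each of 3 repetitions, which is why the summary reports 30 resamples. The base-R sketch below shows how such resamples can be generated. The helper make_repeated_folds() is hypothetical and uses plain random folds; caret's own splitters (e.g. createMultiFolds()) additionally stratify the folds by the outcome class.

```r
# Hypothetical helper (not part of caret): training-row indices for
# repeated k-fold CV. Folds are plain random splits, not stratified by class.
make_repeated_folds <- function(n, k = 10, repeats = 3, seed = 100) {
  set.seed(seed)
  folds <- list()
  for (r in seq_len(repeats)) {
    # Randomly assign each of the n rows to one of k folds
    fold_id <- sample(rep(seq_len(k), length.out = n))
    for (f in seq_len(k)) {
      # Training rows are all rows outside the held-out fold f
      folds[[sprintf("Fold%02d.Rep%d", f, r)]] <- which(fold_id != f)
    }
  }
  folds
}

idx <- make_repeated_folds(n = 1000, k = 10, repeats = 3)
length(idx)          # 10 folds x 3 repeats = 30 resamples
unique(lengths(idx)) # each training set holds 900 of the 1000 rows
```

Every row is held out exactly once per repetition, so each model produces 3 out-of-fold predictions per row, and the 30 accuracy values in the summary come from these 30 held-out folds.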

Combine The Predictions Of Multiple Models To Form A Final Prediction

  • One thought: is it possible to combine the predictions of these models into a new ensemble that predicts better?
  • Yes: caretStack() does this by fitting a meta-model on the base models' predictions. Just make sure you don't reuse the trainControl you used to build the base models.
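The idea behind stacking can be sketched in base R. Our understanding is that caretStack() builds its meta-model from the held-out predictions saved during resampling (hence savePredictions=TRUE), so the meta-model never learns from predictions a base model made on its own training rows. The sketch below mimics that on simulated data (not the OJ data above): two deliberately different logistic regressions stand in for the base learners, and a glm meta-model combines their out-of-fold probabilities. The helper make_data() and the formulas are hypothetical; this is a conceptual sketch, not caretStack()'s actual implementation.

```r
# Hypothetical helper: simulate a two-class outcome driven by three features.
make_data <- function(n) {
  x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
  y <- factor(ifelse(x1 + x2 - x3 + rnorm(n) > 0, "MM", "CH"),
              levels = c("CH", "MM"))
  data.frame(y, x1, x2, x3)
}

set.seed(42)
train <- make_data(600)
test  <- make_data(400)

# Two deliberately different base learners (stand-ins for rf, svmRadial, ...)
base_formulas <- list(m1 = y ~ x1 + x2, m2 = y ~ x3)

# Step 1: out-of-fold P(MM) for every training row from each base learner
k <- 5
fold_id <- sample(rep(seq_len(k), length.out = nrow(train)))
oof <- matrix(NA_real_, nrow(train), length(base_formulas),
              dimnames = list(NULL, names(base_formulas)))
for (f in seq_len(k)) {
  in_fold <- fold_id == f
  for (m in names(base_formulas)) {
    fit <- glm(base_formulas[[m]], family = binomial, data = train[!in_fold, ])
    oof[in_fold, m] <- predict(fit, newdata = train[in_fold, ], type = "response")
  }
}

# Step 2: a glm meta-model learns how to weight the out-of-fold predictions
meta_fit <- glm(y ~ m1 + m2, family = binomial,
                data = data.frame(y = train$y, oof))

# Step 3: refit the base learners on all training rows, then feed their
# test-set predictions to the meta-model
full_fits <- lapply(base_formulas, function(fm) glm(fm, family = binomial, data = train))
test_base <- sapply(full_fits, predict, newdata = test, type = "response")
test_stack <- predict(meta_fit, newdata = data.frame(test_base), type = "response")

# Accuracy with a 0.5 cut-off on P(MM)
accuracy <- function(p_mm, y) mean(ifelse(p_mm > 0.5, "MM", "CH") == y)
acc <- c(m1    = accuracy(test_base[, "m1"], test$y),
         m2    = accuracy(test_base[, "m2"], test$y),
         stack = accuracy(test_stack, test$y))
acc
```

Training the meta-model on in-sample predictions instead would reward whichever base learner overfits most, which is why the out-of-fold step matters.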
# Create a separate trainControl for the stacking step
stackControl <- trainControl(method="repeatedcv", 
                             number=10, 
                             repeats=3,
                             savePredictions=TRUE, 
                             classProbs=TRUE)

# Ensemble the predictions of `models`: stack a generalized linear model (glm)
# on top of the base models' predictions to form a new combined prediction
set.seed(101)
stack.glm <- caretStack(models, method="glm", metric="Accuracy", trControl=stackControl)
print(stack.glm)
## A glm ensemble of 2 base models: rf, adaboost, earth, svmRadial
## 
## Ensemble results:
## Generalized Linear Model 
## 
## 2571 samples
##    4 predictor
##    2 classes: 'CH', 'MM' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 2314, 2314, 2314, 2314, 2313, 2313, ... 
## Resampling results:
## 
##   Accuracy   Kappa    
##   0.8321128  0.6419638
# Predict on the test data (testData4)
stack_predicteds <- predict(stack.glm, newdata=testData4)
head(stack_predicteds)
## [1] CH CH CH CH CH MM
## Levels: CH MM
save.image("../../data/craet_9.Rdata")

A point to consider: ensembles tend to perform better when the base models' predictions are less correlated with each other.
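A simple way to see why: if two models' errors each have variance σ² and correlation ρ, the error of their average has variance (σ² + σ² + 2ρσ²)/4 = σ²(1 + ρ)/2. Uncorrelated errors (ρ = 0) halve the variance, while highly correlated errors (ρ near 1) gain almost nothing. The simulation below checks this formula with σ² = 1; it uses simulated numeric errors, not the models above. For the fitted models, caret's modelCor(results) reports the correlation between models' resampled performance, which is a rough guide to how similar they are.

```r
# Simulate errors of two equally noisy models whose errors have correlation rho,
# then measure the error variance of their simple average.
avg_error_var <- function(rho, n = 1e5) {
  e1 <- rnorm(n)
  e2 <- rho * e1 + sqrt(1 - rho^2) * rnorm(n)  # var(e2) = 1, cor(e1, e2) = rho
  var((e1 + e2) / 2)
}

set.seed(1)
rhos <- c(0, 0.5, 0.9)
simulated <- sapply(rhos, avg_error_var)
theory    <- (1 + rhos) / 2   # sigma^2 * (1 + rho) / 2 with sigma^2 = 1
round(rbind(rho = rhos, simulated = simulated, theory = theory), 3)
```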