Load Package And Data

load("../../data/craet_8.Rdata")
library(tidyverse)
library(caret)
# Set up parallel processing to decrease computation time
if (!require("doMC")) install.packages("doMC")
library(doMC)
registerDoMC(cores = 4)

Train Multiple Models

So far we have predictions from multiple individual models. To get them, we had to run the train() function once for each model, store the models, and pass them to the res

library(caretEnsemble)

# Stacking Algorithms - Run multiple algos in one call.
# Named trainCtrl so the variable does not shadow caret's trainControl() function
trainCtrl <- trainControl(method="repeatedcv", 
                          number=10, 
                          repeats=3,
                          savePredictions=TRUE, 
                          classProbs=TRUE)

algorithmList <- c('rf', 'adaboost', 'earth', 'svmRadial')

set.seed(100)
models <- caretList(Purchase ~ ., data=trainData, trControl=trainCtrl, methodList=algorithmList) 

results <- resamples(models)
summary(results)

## 
## Call:
## summary.resamples(object = results)
## 
## Models: rf, adaboost, earth, svmRadial 
## Number of resamples: 30 
## 
## Accuracy 
##                Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
## rf        0.7011494 0.7764706 0.7965116 0.8033148 0.8250684 0.9058824    0
## adaboost  0.6823529 0.7674419 0.7906977 0.7966532 0.8328659 0.8941176    0
## earth     0.7209302 0.7906977 0.8187415 0.8164175 0.8367305 0.8604651    0
## svmRadial 0.7558140 0.7948276 0.8304378 0.8261842 0.8588235 0.9058824    0
## 
## Kappa 
##                Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
## rf        0.3518625 0.5184810 0.5504351 0.5737290 0.6253768 0.8040346    0
## adaboost  0.3349754 0.5046620 0.5686668 0.5711983 0.6423870 0.7831018    0
## earth     0.4102857 0.5609657 0.6148850 0.6095470 0.6580869 0.7147595    0
## svmRadial 0.4685109 0.5645744 0.6326120 0.6285652 0.6993397 0.7996464    0

# Box plots to compare models
scales <- list(x=list(relation="free"), y=list(relation="free"))
bwplot(results, scales=scales)

Combine The Predictions Of Multiple Models To Form A Final Prediction

One thought: is it possible to combine the predicted values from these models into a new ensemble that predicts better?
It is, using caretStack(). Just make sure you don't reuse the same trainControl object you used to build the models.

# Create the trainControl
set.seed(101)
stackControl <- trainControl(method="repeatedcv", 
                             number=10, 
                             repeats=3,
                             savePredictions=TRUE, 
                             classProbs=TRUE)

# Ensemble the predictions of `models` to form a new combined prediction based on glm
# Stack a generalized linear model (glm) on top of the existing models to make the final prediction
stack.glm <- caretStack(models, method="glm", metric="Accuracy", trControl=stackControl)
print(stack.glm)

## A glm ensemble of 2 base models: rf, adaboost, earth, svmRadial
## 
## Ensemble results:
## Generalized Linear Model 
## 
## 2571 samples
##    4 predictor
##    2 classes: 'CH', 'MM' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 2314, 2314, 2314, 2314, 2313, 2313, ... 
## Resampling results:
## 
##   Accuracy   Kappa    
##   0.8321128  0.6419638
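
To make the idea behind a glm stack more concrete, here is a minimal base R sketch of the general technique: a logistic regression is fitted on the base models' predicted probabilities and its output becomes the combined prediction. This is not caretStack()'s actual implementation. The outcome `y` and the two probability columns `p_a` and `p_b` are simulated, made-up stand-ins for real out-of-fold predictions.

```r
set.seed(1)
n <- 400

# Simulated binary outcome (assumption: toy data, not the tutorial's data)
y <- rbinom(n, 1, 0.5)

# Two made-up "base model" probability columns, each a noisy signal of y
p_a <- plogis(2.0 * (y - 0.5) + rnorm(n))
p_b <- plogis(1.5 * (y - 0.5) + rnorm(n))
oof <- data.frame(y = y, p_a = p_a, p_b = p_b)

# Meta-model: logistic regression on the base models' predictions
meta <- glm(y ~ p_a + p_b, data = oof, family = binomial)

# Combined probability and class prediction from the stacked model
stacked_prob  <- predict(meta, type = "response")
stacked_class <- ifelse(stacked_prob > 0.5, 1, 0)

# Accuracy of the stack and of one base model, for a rough comparison
acc_stack <- mean(stacked_class == y)
acc_a     <- mean((p_a > 0.5) == y)
print(c(stack = acc_stack, model_a = acc_a))
```

The coefficients in `coef(meta)` show how much weight the meta-model gives each base model, which is roughly the role the glm plays in the stack above.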

# Predict on testData
stack_predicteds <- predict(stack.glm, newdata=testData4)
head(stack_predicteds)

## [1] CH CH CH CH CH MM
## Levels: CH MM
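
The predictions only become informative once they are compared with the true labels. The sketch below shows one way to do that in base R with a confusion table. The `predicted` and `actual` vectors are toy values invented for illustration, not results from this tutorial.

```r
# Toy labels, made up for illustration only (not taken from the tutorial data)
predicted <- factor(c("CH", "CH", "MM", "MM", "CH", "MM"), levels = c("CH", "MM"))
actual    <- factor(c("CH", "MM", "MM", "MM", "CH", "CH"), levels = c("CH", "MM"))

# Confusion table: rows are predictions, columns are true labels
conf_tab <- table(Predicted = predicted, Actual = actual)
print(conf_tab)

# Accuracy is the share of cases on the diagonal
accuracy <- sum(diag(conf_tab)) / sum(conf_tab)
print(accuracy)
```

With the real objects, `predicted` would be `stack_predicteds` and `actual` would be the true Purchase column of the test set (assumed here to be available as something like `testData4$Purchase`).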

save.image("../../data/craet_9.Rdata")

A point to consider: The ensembles tend to perform better if the predictions are less correlated with each other.
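
A small simulation can illustrate why correlation matters. The sketch below uses a simplified regression setting with made-up data: two predictors have errors of equal size, and only the correlation `rho` between those errors changes. The helper `avg_error()` is hypothetical and exists only for this example.

```r
set.seed(42)
n <- 5000
truth <- rnorm(n)  # simulated target values

# Hypothetical helper: mean squared error of the average of two predictors
# whose errors each have variance 1 and correlation rho with each other
avg_error <- function(rho) {
  shared <- rnorm(n)  # error component common to both predictors
  e1 <- sqrt(rho) * shared + sqrt(1 - rho) * rnorm(n)
  e2 <- sqrt(rho) * shared + sqrt(1 - rho) * rnorm(n)
  p1 <- truth + e1
  p2 <- truth + e2
  mean(((p1 + p2) / 2 - truth)^2)
}

mse_low  <- avg_error(0.1)  # weakly correlated errors
mse_high <- avg_error(0.9)  # strongly correlated errors
print(c(low_correlation = mse_low, high_correlation = mse_high))
```

Averaging cancels more of the error when the two predictors make different mistakes, so the weakly correlated pair ends up with the lower error. For the models trained in this post, caret's `modelCor(results)` reports how correlated their resampled metrics are.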

9 Ensembling The Predictions

Jixing Liu
