Overview of Machine Learning Ensemble Methods
Voting ensembles combine diverse machine learning models using techniques like majority voting or average predictions. The individual models used in the ensemble could be regression or classification-based algorithms. Once the individual models have been trained, the ensemble can be constructed in a couple of different ways.
In regression, ensembles are created by averaging the (weighted or unweighted) predictions. In classification, ensembles are created through majority voting (also called hard voting) of predicted classes or through (weighted or unweighted) average predicted probabilities (also called soft voting).
Voting ensembles can be easily applied in python using these classes in the Sklearn library:
These classes can also be combined with grid search and cross-validation to tune the hyperparameters in the models such as choosing between hard and soft voting or choosing the best weights.
Bagging ensembles are created from several instances of a single algorithm that are each trained on different subsets of the training data.
There are a few methods that fall under the category of bagging ensembles, and they differ only in the way that the subsets are chosen:
- When subsets are formed by sampling the observations with replacement, the method is called bagging (short for bootstrap aggregation).
- When subsets are formed by sampling the observations without replacement, the method is called pasting (not short for anything).
- When subsets are formed by sampling a set of features only, then the method is called random subspaces.
- When subsets are formed by sampling both the features and observations, then the method is called random patches.
The final ensemble is built by aggregating the individual predictions made for each sample of observations. This aggregation is performed either by majority voting or by (unweighted or weighted) averaging.
When bagging is performed on decision trees, this is a special form of ensembling known as Random Forests and contains many unique, complex features compared to other algorithms.
Bagging can be implemented in python with these Sklearn classes:
- Plus also,
Boosting ensembles are formed by incrementally combining a set of weak learners into a single strong learner. The term ‘weak learner’ here represents a model that may only be slightly better than taking a random guess at the prediction. In the case of regression, this guess is made by simply using the mean value of the target variable.
In contrast to bagging, each weak learner is trained on the entire dataset, and no sampling takes place. However, in boosting, each observation in the training data is assigned a weight based on the ability of the algorithm to predict it accurately.
Since ensembles are constructed sequentially, boosting can be very slow and computationally expensive.
There are many implementations for boosting ensembles. Some have become very popular and are the default methods for applying boosting. These are five of the most common boosting implementations used today, as well as their associated python libraries:
- AdaBoost (
- (Standard) Gradient boosting (
- Histogram-based gradient boosting (
- Extreme Gradient Boosting (XGBoost library)
- Light Gradient Boosting Machine (LightGBM library)
Stacking is short for stacked generalization, and it involves a different approach for aggregating the predictions made by the individual models in the ensemble. Instead of averaging or finding the majority vote of a prediction, stacking uses a machine learning model to select the best way to combine the predictions.
Stacking can be applied using python and Sklearn and these libraries:
A special application of stacking is Super Learner Ensembling. This method applies k-fold cross-validation and builds an ensemble from the out-of-fold predictions from each model.
Super learner ensembles can be implemented in python using Sklearn. Alternatively, there is a library called ML-Ensemble (also called mlens) that builds on Sklearn to simplify the process.
Implicit & Explicit Ensembles
Implicit and explicit ensembles are actually single models that are trained in such a way that it behaves like an ensemble of multiple neural networks. This allows ensembles to be built without the additional training costs involved in training multiple models.
Implicit ensembles share their model parameters, and the averaging of the ensemble models is done at test time with a single unthinned network. Some implicit ensemble models are:
- Stochastic depth
- Swapout (hybrid of dropout and stochastic depth)
- Dropout as a bayesian approximation
Explicit ensembles do not share their model parameters and would produce an ensemble model through standard approaches like majority voting or averaging. Some explicit ensemble models are:
- Snapshot ensembling
- Random vector functional link networks
Clustering is an unsupervised machine learning technique that is used to group similar observations together. Ensemble techniques are used in clustering to combine several weak clusters into a strong one. However, each weak cluster must be as different from each other as possible in order for the ensemble to be a significant improvement.
There are a few approaches that can be used to create clusters that differ:
- Using different sampling data
- Using different subsets of features
- Using different clustering methods
- Adding random noise to the models
This technique gains a consensus on the assigned cluster of an observation based on a combination of all other iterations of the clustering algorithm in the ensemble. Clustering ensembles are made up of (1) the generation mechanism and (2) the consensus function. The generation mechanism is simply the set of iterations of the clustering algorithm that will be combined to form the ensemble. The consensus function defines the way that the ensemble will reach a consensus on the clusters. The biggest challenge in clustering ensembles is to produce a combined ensemble (through the consensus function) that improves the result over any single clustering algorithm.
There are 2 primary consensus function approaches:
- Objects co-occurrence: this uses a voting process to assign clusters
- Median partition: this finds clusters through an optimization problem that finds the median partition with respect to the cluster ensemble. The median partition can be thought of as the partition that best summarises the results of all models in the ensemble.
Consensus clustering can be implemented in python using the Cluster Ensembles library. However, some of the more niche implementations of consensus clustering may need to be done from scratch in python as there doesn’t seem to be a lot of support out there for them yet.
I hope this blog post gave you a good high-level overview of some of the common ensemble methods available today. Look out for future blog posts where I dive into some of these methods with detailed descriptions and practical applications in python.
If you have any questions, feel free to reach out on Twitter or leave a comment below!