Advanced Ensemble Classifiers

Tarun Acharya · Jun 14

'Ensemble' is a word of French and Latin origin meaning 'a union of parts'.

The individual classifiers used in practice are prone to errors.

Although such errors are inevitable, they can be reduced by constructing the learning classifier carefully.

Ensemble learning is a way of generating various base classifiers from which a new classifier is derived which performs better than any constituent classifier.

These base classifiers may differ in the algorithm used, hyperparameters, representation or the training set.

The key objective of the ensemble methods is to reduce bias and variance.

The figure shows the basic outline of ensemble techniques.

Some of the advanced ensemble classifiers are:

- Stacking
- Blending
- Bagging
- Boosting

Stacking : Stacking is a method where a single training dataset is given to multiple models, each of which is trained on it.

The training set is further divided using k-fold cross-validation, and the predictions made on the held-out folds are used to form the resultant (meta) model.

Here each model indicates a different algorithm used.

size of the training data = m × n | number of models = M

The predictions made from these M models are used as predictors (features) for the final model.

The variables thus formed are used collectively to predict the final classification with higher accuracy than any single base model.

Blending : Blending is a technique similar to stacking; the only difference is that the dataset is split directly into a training set and a validation (holdout) set instead of using k-fold cross-validation.
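The stacking procedure above can be sketched with scikit-learn's StackingClassifier. The dataset, base models, and meta-model here are illustrative assumptions, not choices made in the article:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy dataset standing in for the m x n training data.
X, y = make_classification(n_samples=500, random_state=0)

# Each base model is a different algorithm; cv=5 is the k-fold split used
# to generate the out-of-fold predictions that train the final estimator.
stack = StackingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier()),
        ("tree", DecisionTreeClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X, y)
print(stack.score(X, y))
```

For blending, one would instead fit the base models on a training split and the final estimator on their predictions over a separate holdout split (for example via `train_test_split`).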

Bagging (Bootstrap aggregation) : In this method, n samples of the training data are generated by drawing data items from the training set with replacement.

In bagging, the items in each sample are chosen uniformly at random, since the data is unweighted.

For every iteration:

- A base model is created on each of these samples.

- The models run in parallel and are independent of each other.

- The final predictions are determined by combining the predictions from all the models.

Together, these models form a stronger combined model that produces higher accuracy.
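A minimal sketch of bagging with scikit-learn's BaggingClassifier; the dataset and parameter values are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# n_estimators bootstrap samples are drawn with replacement (bootstrap=True),
# and one decision tree is trained independently on each sample; predictions
# are combined by voting.
bag = BaggingClassifier(
    DecisionTreeClassifier(random_state=0),
    n_estimators=25,
    bootstrap=True,
    random_state=0,
)
bag.fit(X, y)
print(bag.score(X, y))
```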

The final model is averaged by:

e = (Σ eᵢ) / n

where e₁, e₂, …, eₙ are the base classifiers and e is the final classifier.

Bagging algorithms:

- Bagging meta-estimator
- Random forest

Boosting : Boosting is a self-learning technique.

It learns by assigning weights to the various items in the data.

Boosting starts with equal weights, but after every round each model is assigned a weight based on its performance.

Similarly, after each model is evaluated, the misclassified data items are given more weight so that the next model focuses more on them.

For every iteration, the algorithm:

- weights each training example by how incorrectly it was classified,

- makes a hypothesis, and

- weights the hypothesis.

Thus the final model is derived from various models, each of which focused on a different group of data, by voting on them according to their weights.
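The boosting loop above can be sketched with scikit-learn's AdaBoostClassifier; the dataset and parameter values are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Each weak learner (a depth-1 "stump") is trained in sequence: misclassified
# samples get larger weights for the next round, and each learner receives a
# vote weight based on its accuracy.
boost = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),
    n_estimators=50,
    random_state=0,
)
boost.fit(X, y)
print(boost.score(X, y))
```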

The final model is averaged using the weighted average method:

e = (Σ eᵢwᵢ) / (Σ wᵢ)

where e₁, e₂, …, eₙ are the base classifiers, w₁, w₂, …, wₙ are their weights, n is the number of models, and e is the final classifier.

Boosting algorithms:

- AdaBoost
- GBM
- XGBM
- Light GBM
- CatBoost

Note : In bagging, the models run in parallel and are independent of each other, whereas in boosting the models run in sequence and each depends on the previous models.
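The weighted-average combination can be illustrated with a small numeric sketch; the prediction values and weights below are made up for illustration:

```python
# Hypothetical per-model outputs for one test item (e.g. probability of class 1)
# and the weights the boosting procedure assigned to each model.
predictions = [0.9, 0.4, 0.8]
weights = [3.0, 1.0, 2.0]

# Weighted average: e = (sum of e_i * w_i) / (sum of w_i), so better-performing
# models (larger w_i) pull the combined prediction toward their output.
e = sum(p * w for p, w in zip(predictions, weights)) / sum(weights)
print(e)  # (0.9*3 + 0.4*1 + 0.8*2) / 6
```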
