How to Reduce Variance in the Final Deep Learning Model With a Horizontal Voting Ensemble

Deep learning neural network models can show high variance in performance toward the end of a training run. This makes choosing which model to use as the final model risky, as there is no clear signal as to which model is better than another toward the end of the training run.

The horizontal voting ensemble is a simple method to address this issue: a collection of models saved over contiguous training epochs toward the end of a training run is used as an ensemble, resulting in more stable and, on average, better performance than randomly choosing a single final model.

In this tutorial, you will discover how to reduce the variance of a final deep learning neural network model using a horizontal voting ensemble.

Let's get started.

Photo by Fatima Flores, some rights reserved.

This tutorial is divided into five parts.

Ensemble learning combines the predictions from multiple models. A challenge when using ensemble learning with deep learning methods is that, given the use of very large datasets and large models, a single training run may take days, weeks, or even months. Training multiple models may not be feasible.

An alternative source of models that may contribute to an ensemble is the state of a single model at different points during training.

Horizontal voting is an ensemble method proposed by Jingjing Xie, et al. in their 2013 paper "Horizontal and Vertical Ensemble with Deep Representation for Classification." The method involves using multiple models from a contiguous block of epochs before the end of training in an ensemble to make predictions.

The approach was developed specifically for predictive modeling problems where the training dataset is relatively small compared to the number of predictions required of the model. This results in a model that has high variance in performance during training. In this situation, using the final model, or any given model toward the end of the training process, is
risky given the variance in performance.

… the error rate of classification would first decline and then tend to be stable with the training epoch grows. But when size of labeled training set is too small, the error rate would oscillate […] So it is difficult to choose a "magic" epoch to obtain a reliable output.

— Horizontal and Vertical Ensemble with Deep Representation for Classification, 2013.

Instead, the authors suggest using all of the models from a contiguous block of epochs at the end of training in an ensemble, such as the models from the last 200 epochs. The result is predictions by the ensemble that are as good as or better than those of any single model in the ensemble.

To reduce the instability, we put forward a method called Horizontal Voting. First, networks trained for a relatively stable range of epoch are selected. The predictions of the probability of each label are produced by standard classifiers with top level representation of the selected epoch, and then averaged.

— Horizontal and Vertical Ensemble with Deep Representation for Classification, 2013.

As such, the horizontal voting ensemble provides an ideal method both for cases where a given model requires vast computational resources to train and for cases where final model selection is challenging given the high variance of training due to the use of a relatively small training dataset.

Now that we are familiar with horizontal voting, we can implement the procedure.

We will use a small multi-class classification problem as the basis to demonstrate a horizontal voting ensemble. The scikit-learn library provides the make_blobs() function, which can be used to create a multi-class classification problem with a prescribed number of samples, input variables, classes, and variance of samples within a class.

The problem has two input variables (to represent
the x and y coordinates of the points) and a standard deviation of 2.0 for points within each group. We will use the same random state (seed for the pseudorandom number generator) to ensure that we always get the same data points. The result is the input and output elements of a dataset that we can model.

To get a feeling for the complexity of the problem, we can graph each point on a two-dimensional scatter plot and color each point by class value.

Running the example creates a scatter plot of the entire dataset. We can see that the standard deviation of 2.0 means that the classes are not linearly separable (cannot be separated by a line), causing many ambiguous points. This is desirable, as it means the problem is non-trivial and will allow a neural network model to find many different "good enough" candidate solutions, resulting in high variance.

Scatter Plot of Blobs Dataset With Three Classes and Points Colored by Class Value

Before we define a model, we need to contrive a problem that is appropriate for a horizontal voting ensemble. In our problem, the training dataset is relatively small. Specifically, there is a 10:1 ratio of examples in the holdout dataset to the training dataset.
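As a sketch of this setup, the blobs dataset can be generated and split so that the holdout set is ten times larger than the training set. The cluster standard deviation of 2.0 matches the text, while the total sample count and random seed below are illustrative assumptions:

```python
from sklearn.datasets import make_blobs

# generate a 2D, 3-class blobs dataset; cluster_std=2.0 matches the text,
# while n_samples and random_state are illustrative assumptions
X, y = make_blobs(n_samples=1100, centers=3, n_features=2,
                  cluster_std=2.0, random_state=2)

# contrive a small training set: 10:1 ratio of holdout to training examples
n_train = 100
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
print(trainX.shape, testX.shape)  # (100, 2) (1000, 2)
```

Using the same random state on every run keeps the generated points identical, so the small training set and large holdout set are reproducible.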
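The core of the horizontal voting procedure, averaging the predicted class probabilities of models saved over a contiguous block of final epochs, can be sketched independently of any deep learning framework. The following is a minimal NumPy sketch, not the tutorial's own code; the function name and the toy probability arrays are hypothetical:

```python
import numpy as np

def horizontal_voting(per_model_probs):
    """Average the predicted class probabilities from models saved over
    contiguous final training epochs, then take the argmax as the
    ensemble's predicted class.
    per_model_probs: list of arrays, each of shape (n_samples, n_classes).
    """
    avg = np.mean(np.stack(per_model_probs), axis=0)
    return np.argmax(avg, axis=1)

# toy example: three model "snapshots" that disagree on the second sample
p1 = np.array([[0.8, 0.10, 0.10], [0.4, 0.5, 0.1]])
p2 = np.array([[0.7, 0.20, 0.10], [0.5, 0.4, 0.1]])
p3 = np.array([[0.9, 0.05, 0.05], [0.2, 0.7, 0.1]])
print(horizontal_voting([p1, p2, p3]))  # -> [0 1]
```

Averaging the probabilities before taking the argmax means no single unstable snapshot decides the ensemble's prediction, which is the source of the variance reduction.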
