Modern deep learning libraries such as Keras allow you to define and start fitting a wide range of neural network models in minutes with just a few lines of code.
Nevertheless, it is still challenging to configure a neural network to get good performance on a new predictive modeling problem.
The challenge of getting good performance can be broken down into three main areas: problems with learning, problems with generalization, and problems with predictions.
Once you have diagnosed the specific type of problem that you are having with a network, a suite of classical and modern techniques can then be selected to address the issue and improve performance.
In this post, you will discover a framework for diagnosing performance problems with deep learning models and techniques that you can use to target and improve each specific performance problem.
After reading this post, you will know:
Let’s get started.
Framework for Better Deep Learning
Photo by Anupam_ts, some rights reserved.
This tutorial is divided into seven parts; they are:
Historically, neural network models had to be coded from scratch.
You might spend days or weeks translating poorly described mathematics into code and days or weeks more debugging your code just to get a simple neural network model to run.
Those days are in the past.
Today, you can define and begin fitting most types of neural networks in minutes with just a few lines of code, thanks to open source libraries such as Keras built on top of sophisticated mathematical libraries such as TensorFlow.
This means that standard models such as Multilayer Perceptrons can be developed and evaluated rapidly, as well as more sophisticated models that may previously have been beyond the capabilities of most practitioners to implement such as Convolutional Neural Networks and Recurrent Neural Networks like the Long Short-Term Memory network.
As deep learning practitioners, we live in amazing and productive times.
Nevertheless, even though new neural network models can be defined and evaluated rapidly, there remains little guidance on how to actually configure neural network models in order to get the most out of them.
Configuring neural network models is often referred to as a “dark art.”
This is because there are no hard and fast rules for configuring a network for a given problem.
We cannot analytically calculate the optimal model type or model configuration for a given dataset.
Instead, there are decades’ worth of techniques, heuristics, tips, tricks, and other tacit knowledge spread across code, papers, blog posts, and people’s heads.
A shortcut to configuring a neural network on a problem is to copy the configuration of another network for a similar problem.
But this strategy rarely leads to good results as model configurations are not transferable across problems.
It is also likely that you work on predictive modeling problems that are quite unlike the other problems described in the literature.
Fortunately, there are techniques that are known to address specific issues when configuring and training a neural network that are available in modern deep learning libraries like Keras.
Further, discoveries have been made in the past 5 to 10 years in areas such as activation functions, adaptive learning rates, regularization methods, and ensemble techniques that have been shown to dramatically improve the performance of neural network models regardless of their specific type.
The techniques are available; you just need to know what they are and when to use them.
Unfortunately, you cannot simply grid search across the techniques used to improve deep learning performance.
Almost universally, they uniquely change aspects of the training data, learning process, model architecture, and more.
Instead, you must diagnose the type of performance problem you are having with your model, then carefully choose and evaluate a given intervention tailored to that diagnosed problem.
There are three types of problems that are straightforward to diagnose with regard to poor performance of a deep learning neural network model; they are problems with learning, problems with generalization, and problems with predictions.
This breakdown provides a systematic approach to thinking about the performance of your deep learning model.
There is some natural overlap and interaction between these areas of concern.
For example, problems with learning affect the ability of the model to generalize as well as the variance in the predictions made from a final model.
The sequential relationship between the three areas in the proposed breakdown allows the issue of deep learning model performance to be first isolated, then targeted with a specific technique or methodology.
We can summarize techniques that assist with each of these problems as follows:
Now that we have a framework for systematically diagnosing a performance problem with a deep learning neural network, let’s take a look at some examples of techniques that may be used in each of these three areas of concern.
Better learning techniques are those changes to a neural network model or learning algorithm that improve or accelerate the adaptation of the model weights in response to a training dataset.
In this section, we will review the techniques used to improve the adaptation of the model weights.
This begins with the careful configuration of the hyperparameters related to optimizing the neural network model using the stochastic gradient descent algorithm and updating the weights using the backpropagation of error algorithm; for example:
This also includes simple data preparation and the automatic rescaling of inputs at deeper layers.
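As one illustration of simple data preparation, the sketch below standardizes a single input feature to zero mean and unit standard deviation. Standardization is an assumed example of such preparation rather than a method named in this post, and the function name and sample values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale a list of numbers to zero mean and unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # A constant feature has no spread to rescale; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical raw feature on a large scale (illustrative values only).
raw_feature = [120.0, 150.0, 180.0, 210.0]
scaled = standardize(raw_feature)
```

In practice, the mean and standard deviation would typically be estimated on the training dataset only and then reused to scale the validation and test data.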
Stochastic gradient descent is a general optimization algorithm that can be applied to a wide range of problems.
Nevertheless, the optimization process (or learning process) can become unstable and specific interventions are required; for example:
The limitation of data on some predictive modeling problems can prevent effective learning.
Specialized techniques can be used to jump-start the optimization process, providing a useful initial set of weights or even whole models that can be used for feature extraction; for example:
Are there additional techniques that you use to improve learning?
Let me know in the comments below.
Better generalization techniques are those that change the neural network model or learning algorithm to reduce the effect of the model overfitting the training dataset and improve the performance of the model on a holdout validation or test dataset.
In this section, we will review the techniques to reduce the generalization error of the model during training.
Techniques that are designed to reduce generalization error are commonly referred to as regularization techniques.
Almost universally, regularization is achieved by somehow reducing or limiting model complexity.
Perhaps the most widely understood measure of model complexity is the size or magnitude of the model weights.
Large weights are a sign that a model may be overly specialized to the inputs in the training data, making it unstable when making predictions on new, unseen data.
Keeping weights small via weight regularization is a powerful and widely used technique.
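A minimal sketch of the idea, assuming an L2 (sum of squared weights) penalty: the penalty is added to the training loss so that larger weights cost more. The function names, the strength `lam`, and the sample weights are hypothetical and chosen only for illustration.

```python
def l2_penalty(weights, lam):
    """Sum of squared weights scaled by the regularization strength lam."""
    return lam * sum(w * w for w in weights)


def regularized_loss(data_loss, weights, lam=0.01):
    """Training loss plus the weight-size penalty."""
    return data_loss + l2_penalty(weights, lam)


# Same data loss, different weight magnitudes (illustrative values only).
small = regularized_loss(0.50, [0.1, -0.2, 0.05])
large = regularized_loss(0.50, [3.0, -4.0, 2.5])
```

With the same data loss, the model with larger weights receives the larger total loss, so the optimizer is nudged toward smaller weights.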
Rather than simply encouraging the weights to remain small via an updated loss function, it is possible to force the weights to be small using a constraint.
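One common form of such a constraint is a maximum norm. The sketch below assumes that form: it rescales a weight vector whenever its L2 norm exceeds a limit. The function name and the limit of 2.0 are hypothetical.

```python
import math


def apply_max_norm(weights, max_norm=2.0):
    """Rescale a weight vector so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= max_norm:
        return list(weights)  # already inside the allowed region
    scale = max_norm / norm
    return [w * scale for w in weights]


# A vector with norm 5.0 is scaled back to norm 2.0 (illustrative values).
constrained = apply_max_norm([3.0, 4.0], max_norm=2.0)
```

Unlike a penalty, the constraint leaves weights untouched while they are inside the limit and only intervenes when the limit is exceeded.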
The output of a neural network layer, regardless of where that layer is in the stack of layers, can be thought of as an internal representation or set of extracted features with regard to the input.
Simpler internal representations can have a regularizing effect on the model and can be encouraged through constraints that encourage sparsity (zero values).
Noise can be added to the model to encourage robustness with regard to the raw inputs or outputs from prior layers during training; for example:
Often, overfitting can occur due simply to training the model for too long on the training dataset.
A simple solution is to stop the training early.
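The stopping rule can be sketched as follows, assuming the validation loss is monitored and a hypothetical `patience` setting controls how many non-improving epochs are tolerated. This is a plain-Python illustration of the logic, not the implementation used by any particular library.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs. If that never happens,
    the index of the last epoch is returned.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1


# Hypothetical validation losses that start rising after the third epoch.
stop_at = early_stopping_epoch([0.9, 0.7, 0.6, 0.65, 0.7, 0.8], patience=2)
```

In Keras, similar behavior is available through the `EarlyStopping` callback, which also accepts a `patience` argument.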
Are there additional techniques that you use to improve generalization?
Let me know in the comments below.
Better prediction techniques are those that complement the model training process in order to reduce the variance in the expected performance of the final model.
In this section, we will review the techniques to reduce the expected variance of a final deep learning neural network model.
The variance in the performance of the final model can be reduced by adding bias.
The most common way to introduce bias to the final model is to combine the predictions from multiple models.
This is referred to as ensemble learning.
Beyond reducing the variance in the performance of a final model, ensemble learning can also result in better predictive performance.
Effective ensemble learning methods require that each contributing model have skill, meaning that the models make predictions that are better than random, but that the prediction errors between the models have a low correlation.
This means that the ensemble member models should have skill, but in different ways.
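One way to check whether two members err in different ways is to correlate their prediction errors on the same examples. The sketch below assumes a regression setting and uses the Pearson correlation; the helper names, targets, and predictions are hypothetical.

```python
import math


def prediction_errors(predictions, targets):
    """Signed error of each prediction against its target."""
    return [p - t for p, t in zip(predictions, targets)]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Undefined (division by zero) if either sequence is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical targets and predictions from two ensemble members.
targets = [1.0, 2.0, 3.0, 4.0]
model_a = [1.2, 1.9, 3.3, 3.8]
model_b = [0.9, 2.2, 3.1, 4.3]
error_correlation = pearson(prediction_errors(model_a, targets),
                            prediction_errors(model_b, targets))
```

A value near 1 suggests the two members make much the same mistakes, while values near zero or below suggest their errors may partly cancel when combined.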
This can be achieved by varying one aspect of the ensemble; for example:
The training data can be varied by fitting models on different subsamples of the dataset.
This might involve fitting and retaining models on different randomly selected subsets of the training dataset, retaining models for each fold in a k-fold cross-validation, or retaining models across different samples with replacement using the bootstrap method (e.
Collectively, we can think of these methods as resampling ensembles.
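The bootstrap variant can be sketched as below: each ensemble member gets its own sample of the training rows, drawn with replacement. The function names, the seed, and the toy dataset are hypothetical, and fitting a model on each sample is left out.

```python
import random


def bootstrap_sample(dataset, rng):
    """Draw len(dataset) rows with replacement."""
    return [rng.choice(dataset) for _ in dataset]


def resampled_training_sets(dataset, n_members, seed=1):
    """Build one bootstrap sample per ensemble member."""
    rng = random.Random(seed)  # seeded so the samples are reproducible
    return [bootstrap_sample(dataset, rng) for _ in range(n_members)]


# Toy dataset of ten row identifiers (illustrative only).
rows = list(range(10))
samples = resampled_training_sets(rows, n_members=3)
```

Because rows are drawn with replacement, each sample usually repeats some rows and omits others, which is what makes the fitted members differ.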
Perhaps the simplest way to vary the members of the ensemble involves gathering models from multiple runs of the learning algorithm on the training dataset.
Because the learning algorithm is stochastic, each run will produce a slightly different fit that, in turn, will make slightly different predictions.
Averaging the predictions of the models across multiple runs helps keep performance consistent.
Variations on this approach may involve training models with different hyperparameter configurations.
It can be expensive to train multiple final deep learning models, especially when one model may take days or weeks to fit.
An alternative is to collect models for use as contributing ensemble members during a single training run; for example:
The simplest way to combine the predictions from multiple ensemble members is to calculate the average of the predictions in the case of regression, or the statistical mode or most frequent prediction in the case of classification.
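Both combination rules can be sketched in a few lines. The function names are hypothetical, and each call combines the members' predictions for a single example.

```python
from collections import Counter
from statistics import mean


def combine_regression(predictions):
    """Average the members' numeric predictions for one example."""
    return mean(predictions)


def combine_classification(predictions):
    """Return the most frequent class label among the members.

    Ties are resolved in favor of the label that was seen first.
    """
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical predictions from three ensemble members.
averaged = combine_regression([2.0, 3.0, 4.0])
voted = combine_classification(["cat", "dog", "cat"])
```

For classifiers that output probabilities, a common variation is to average the predicted probabilities per class and then take the class with the largest average.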
Alternatively, the best way to combine the predictions from multiple models can be learned; for example:
As an alternative to combining the predictions from the ensemble members, the models themselves may be combined; for example:
Are there additional techniques that you use to reduce the variance of the final model?
Let me know in the comments below.
We can think of the organization of techniques into the three areas of better learning, generalization, and prediction as a systematic framework for improving the performance of your neural network model.
There are too many techniques to reasonably investigate and evaluate each in your project.
Instead, you need to be methodical and use the techniques in a targeted way to address a defined problem.
The first step in using this framework is to diagnose the performance problem that you are having with your model.
A robust diagnostic tool is to calculate a learning curve of loss and a problem-specific metric (like RMSE for regression or accuracy for classification) on a train and validation dataset over a given number of training epochs.
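As a rough illustration, the final values of the train and validation loss curves can be turned into a first-pass diagnosis with a couple of rules. The function name, the thresholds `target_loss` and `gap_tolerance`, and the rules themselves are assumptions made for this sketch; suitable values depend on the problem and the loss function.

```python
def diagnose(train_loss, val_loss, target_loss=0.1, gap_tolerance=0.05):
    """Classify the final state of a learning curve with simple rules.

    train_loss and val_loss are per-epoch loss values.
    target_loss and gap_tolerance are hypothetical thresholds.
    """
    final_train = train_loss[-1]
    final_val = val_loss[-1]
    if final_train > target_loss:
        # The model has not fit the training data well.
        return "problem with learning"
    if final_val - final_train > gap_tolerance:
        # Good fit on train, noticeably worse on validation.
        return "problem with generalization"
    return "no obvious problem on this run"


# Hypothetical curves: training loss keeps falling, validation loss does not.
result = diagnose([0.9, 0.2, 0.05], [0.9, 0.4, 0.45])
```

A single run cannot reveal a problem with predictions; that would show up as a large spread in the final scores across repeated runs of the same configuration.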
Review the techniques that are designed to address your problem.
Select a technique that appears to be a good fit for your model and problem.
This may require some prior experience with the techniques and may be challenging for a beginner.
Thankfully, there are heuristics and best practices that work well on most problems.
For example:
Pick an intervention, then read up a little on it, including how it works and why it works, and, importantly, find examples of how practitioners before you have used it to get an idea of how you might use it on your problem.
Once you have identified an issue and addressed it with an intervention, repeat the process.
Developing a better model is an iterative process that may require multiple interventions at multiple levels that complement each other.
This is an empirical process.
This means that you are reliant on the robustness of your test harness to give you a reliable summary of performance before and after an intervention.
Spend the time to ensure your test harness is robust, and verify that the train, validation, and test datasets are clean and provide a suitably representative sample of observations from your problem domain.
This section provides more resources on the topic if you are looking to go deeper.
In this post, you discovered a framework for diagnosing performance problems with deep learning models and techniques that you can use to target and improve each specific performance problem.
Specifically, you learned:
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.