We will use the NumPy tensordot() function to compute the tensor product with the required summing over ensemble members; the updated ensemble_predictions() function is listed below.

Next, we must update evaluate_ensemble() to pass along the weights when making the prediction for the ensemble.

We will use a modest-sized ensemble of five members, a size that appeared to perform well in the model averaging ensemble. We can then estimate the performance of each individual model on the test dataset as a reference.

Next, we can use a weight of 1/5 (0.2) for each of the five ensemble members and use the new functions to estimate the performance of a model averaging ensemble, a so-called equal-weight ensemble. We would expect this ensemble to perform as well as or better than any single model.

Finally, we can develop a weighted average ensemble. A simple but exhaustive approach to finding weights for the ensemble members is to grid search values. We can define a coarse grid of weight values from 0.0 to 1.0 in steps of 0.1, then generate all possible five-element vectors with those values (11^5 = 161,051 vectors). Generating all possible combinations is called a Cartesian product, which can be implemented in Python using the itertools.product() function from the standard library.

A limitation of this approach is that the vectors of weights will not, in general, sum to one (called the unit norm), as required. We can force each generated weight vector to have a unit norm by calculating the sum of the absolute weight values (called the L1 norm) and dividing each weight by that value. The normalize() function below implements this hack.

We can now enumerate each weight vector generated by the Cartesian product, normalize it, and evaluate it by making a prediction, keeping the best-performing vector for our final weighted average ensemble. Once the best weights are found, we can report the performance of our weighted average ensemble on the test dataset. We would expect it to be better than the best single model and, ideally, better than the model averaging ensemble.

The complete example is listed below.

Running the example first creates the five single models and evaluates their performance on the test dataset. Your specific results will vary given the stochastic nature of the learning algorithm. On this run, we can see that model 2 has the best solo performance of about 81.7% accuracy.

Next, a model averaging ensemble is created with a performance of about 80.7%, which is reasonable compared to most of the models, but not all.

Next, the grid search is performed. It is fairly slow and may take about twenty minutes on modern hardware. The process could easily be parallelized using libraries such as Joblib. Each time a new top-performing set of weights is discovered, it is reported along with its performance on the test dataset. We can see that during the run, the process discovered that using model 2 alone resulted in good performance, until it was replaced with something better.

We can see that the best performance on this run was achieved using weights that focus only on the first and second models, with an accuracy of 81.8% on the test dataset. This outperforms both the single models and the model averaging ensemble on the same dataset.

An alternative approach to finding weights would be a random search, which has been shown to be effective more generally for model hyperparameter tuning.

An alternative to searching for weight values is to use a directed optimization process. Optimization is a search process. Instead of sampling the space of possible solutions randomly or exhaustively, it uses any available information to choose the next step in the search, such as moving toward a set of weights that has lower error.

The SciPy library offers many excellent optimization algorithms, including local and global search methods. SciPy provides an implementation of the Differential Evolution method. This is one of the few stochastic global search algorithms that “just works” for function optimization with continuous inputs, and it works well. The
differential_evolution() SciPy function requires that a function be specified to evaluate a set of weights and return a score to be minimized. We can minimize the classification error (1 – accuracy). As with the grid search, we must normalize the weight vector before we evaluate it. The loss_function() function below will be used as the evaluation function during the optimization process. We must also specify the bounds of the optimization process.
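The weighted prediction step described earlier can be illustrated without training any models. This is a minimal sketch. It assumes the members' predicted class probabilities have already been stacked into a NumPy array of shape (n_members, n_samples, n_classes). The random probabilities and the helper name weighted_predictions() are hypothetical stand-ins for the Keras model outputs used in the tutorial. The sketch shows that tensordot() with axes=([0], [0]) collapses the member axis into a weighted sum, and that equal weights of 0.2 reproduce the plain model average.

```python
import numpy as np

def weighted_predictions(yhats, weights):
    """Hypothetical helper: weighted sum of member probabilities, then argmax.

    yhats:   array of shape (n_members, n_samples, n_classes)
    weights: array of shape (n_members,)
    """
    # Contract axis 0 of yhats with axis 0 of weights -> (n_samples, n_classes)
    summed = np.tensordot(yhats, weights, axes=([0], [0]))
    # The class with the largest weighted score is the ensemble's prediction
    return summed, np.argmax(summed, axis=1)

# Hypothetical stand-in for five members' softmax outputs: 100 samples, 3 classes
rng = np.random.default_rng(1)
logits = rng.normal(size=(5, 100, 3))
yhats = np.exp(logits) / np.exp(logits).sum(axis=2, keepdims=True)

# Equal-weight ensemble: 1/5 = 0.2 per member
equal = np.full(5, 0.2)
summed, labels = weighted_predictions(yhats, equal)
print(summed.shape, labels.shape)  # (100, 3) (100,)
print(np.allclose(summed, yhats.mean(axis=0)))  # True
```

Because each weight scales a member's whole probability vector, a member given a weight of 0.0 is effectively dropped from the ensemble.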
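The grid search with normalize() described earlier can be sketched in the same way. This is a minimal, hedged example that uses three members rather than five, so that the 11^3 = 1,331 candidate vectors evaluate almost instantly. The synthetic labels and probabilities are hypothetical, as are the helper names accuracy() and grid_search(). normalize() follows the L1-norm hack from the text and returns an all-zero vector unchanged. The all-zero candidate is skipped because it cannot be rescaled.

```python
from itertools import product

import numpy as np

def normalize(weights):
    # L1 norm: sum of absolute weight values
    total = np.linalg.norm(weights, 1)
    # An all-zero vector cannot be rescaled; return it unchanged
    if total == 0.0:
        return weights
    return weights / total

def accuracy(yhats, weights, y):
    # Hypothetical helper: weighted sum over members, argmax, compare to labels
    summed = np.tensordot(yhats, weights, axes=([0], [0]))
    return np.mean(np.argmax(summed, axis=1) == y)

def grid_search(yhats, y):
    # Coarse grid 0.0, 0.1, ..., 1.0 (11 values)
    grid = np.linspace(0.0, 1.0, 11)
    best_score, best_weights = 0.0, None
    # Cartesian product: every vector with one grid value per member
    for candidate in product(grid, repeat=len(yhats)):
        weights = np.array(candidate)
        if weights.sum() == 0.0:
            continue  # skip the all-zero vector
        weights = normalize(weights)
        score = accuracy(yhats, weights, y)
        if score > best_score:
            best_score, best_weights = score, weights
    return best_weights, best_score

# Hypothetical data: 200 samples, 3 classes, 3 members of decreasing quality
rng = np.random.default_rng(7)
y = rng.integers(0, 3, size=200)
onehot = np.eye(3)[y]
members = []
for s in (1.0, 1.5, 2.0):
    raw = onehot + rng.uniform(0.0, s, size=onehot.shape)
    members.append(raw / raw.sum(axis=1, keepdims=True))
yhats = np.stack(members)

weights, score = grid_search(yhats, y)
print(weights, score)
```

Because the comparison uses a strict `>`, the first vector to reach a given accuracy is kept. Ties between weightings are therefore resolved by enumeration order.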
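The loss_function() and bounds described above can be sketched with scipy.optimize.differential_evolution(). This is a minimal example under stated assumptions:

- The three-member synthetic data is hypothetical.
- The bounds of 0.0 to 1.0 per weight mirror the grid search range.
- maxiter=100 and polish=False are illustrative choices, not the tutorial's settings. polish=False is used because the error surface is piecewise constant, so the default gradient-based polishing step adds little.

The optimizer works with unnormalized vectors, so the result is normalized again before use.

```python
import numpy as np
from scipy.optimize import differential_evolution

def normalize(weights):
    # Divide by the L1 norm so the weights sum to one
    total = np.linalg.norm(weights, 1)
    return weights if total == 0.0 else weights / total

def loss_function(weights, yhats, y):
    # Classification error (1 - accuracy) of the normalized weighted ensemble
    weights = normalize(weights)
    summed = np.tensordot(yhats, weights, axes=([0], [0]))
    return 1.0 - np.mean(np.argmax(summed, axis=1) == y)

# Hypothetical data: 200 samples, 3 classes, 3 members of decreasing quality
rng = np.random.default_rng(7)
y = rng.integers(0, 3, size=200)
onehot = np.eye(3)[y]
members = []
for s in (1.0, 1.5, 2.0):
    raw = onehot + rng.uniform(0.0, s, size=onehot.shape)
    members.append(raw / raw.sum(axis=1, keepdims=True))
yhats = np.stack(members)

# One (min, max) pair per ensemble member
bounds = [(0.0, 1.0)] * len(yhats)
result = differential_evolution(
    loss_function, bounds, args=(yhats, y),
    maxiter=100, tol=1e-7, seed=1, polish=False,
)
weights = normalize(result.x)
print(weights, 1.0 - result.fun)  # normalized weights and their accuracy
```

Unlike the grid search, differential evolution treats each weight as a continuous value. It can therefore reach weightings that the coarse 0.1 grid cannot represent.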