Fitting a Neural Network Using Randomized Optimization in Python

How randomized optimization can be used to find the optimal weights for machine learning models, such as neural networks and regression models

Genevieve Hayes, Jan 24

Python’s mlrose package provides functionality for implementing some of the most popular randomization and search algorithms, and applying them to a range of different optimization problem domains.
In this tutorial, we will discuss how mlrose can be used to find the optimal weights for machine learning models, such as neural networks and regression models; that is, how to solve the machine learning weight optimization problem.
This is the third in a series of three tutorials about using mlrose to solve randomized optimization problems.
Part 1 can be found here and Part 2 can be found here.

What is the Machine Learning Weight Optimization Problem?

For a number of different machine learning models, the process of fitting the model parameters involves finding the parameter values that minimize a pre-specified loss function for a given training set.
Examples of such models include neural networks, linear regression models and logistic regression models, and the optimal model weights for such models are typically found using methods such as gradient descent.
However, the problem of fitting the parameters (or weights) of a machine learning model can also be viewed as a continuous-state optimization problem, where the loss function takes the role of the fitness function, and the goal is to minimize this function.
By framing the problem this way, we can use any of the randomized optimization algorithms that are suited to continuous-state optimization problems to fit the model parameters.
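To make this framing concrete, here is a minimal sketch that treats the mean squared error of a small linear model as the fitness function and searches the continuous weight space by pure random sampling. The toy data, the function name `mse_loss` and the search range of -5 to 5 are illustrative assumptions, and the sampling strategy is deliberately naive; it is not one of the algorithms mlrose implements.

```python
import numpy as np

def mse_loss(weights, X, y):
    """Mean squared error of a linear model; used here as the fitness function."""
    return float(np.mean((X @ weights - y) ** 2))

# Toy training set generated from y = 2*x0 - 3*x1 (illustrative data only).
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = X @ np.array([2.0, -3.0])

# Each candidate "state" is a real-valued weight vector in [-5, 5]^2.
candidates = rng.uniform(-5.0, 5.0, size=(2000, 2))
losses = np.array([mse_loss(w, X, y) for w in candidates])

# The best state is simply the candidate with the lowest loss.
best_weights = candidates[np.argmin(losses)]
best_loss = float(losses.min())
baseline_loss = mse_loss(np.zeros(2), X, y)  # loss of the all-zero weight vector
```

Nothing in the search uses gradients: it only needs to evaluate the loss at a candidate weight vector, which is what lets a general-purpose randomized optimizer stand in for gradient descent.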

Solving Machine Learning Weight Optimization Problems with mlrose

mlrose contains built-in functionality for solving the weight optimization problem for three types of machine learning models: (standard) neural networks, linear regression models and logistic regression models.
This is done using the NeuralNetwork(), LinearRegression() and LogisticRegression() classes respectively.
Each of these classes includes a fit method, which implements the three steps for solving an optimization problem defined in the previous tutorials, for a given training set.
That is:
1. Define a fitness function object.
2. Define an optimization problem object.
3. Select and run a randomized optimization algorithm.
However, when fitting a machine learning model, finding the optimal model weights is merely a means to an end.
We want to find the optimal model weights so that we can use our fitted model to predict the labels of future observations as accurately as possible, not because we are actually interested in knowing the optimal weight values.
As a result, the above mentioned classes also include a predict method, which, if called after the fit method, will predict the labels for a given test set using the fitted model.
The steps involved in solving a machine learning weight optimization problem with mlrose are then, typically:
1. Initialize a machine learning weight optimization problem object.
2. Find the optimal model weights for a given training set by calling the fit method of the object initialized in Step 1.
3. Predict the labels for a test set by calling the predict method of the object initialized in Step 1.
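The initialize, fit, predict pattern above can be sketched with a small stand-in class. `HillClimbRegressor` is a hypothetical name invented for this illustration and is not part of mlrose; its fit method walks through the same three steps (fitness function, problem definition, randomized search) using a very simple hill climber on a linear model, with all parameter names and defaults chosen only for the example.

```python
import numpy as np

class HillClimbRegressor:
    """Hypothetical fit/predict wrapper for illustration; not an mlrose class."""

    def __init__(self, step=0.1, max_iters=3000, clip_max=5.0, seed=0):
        self.step = step
        self.max_iters = max_iters
        self.clip_max = clip_max
        self.seed = seed
        self.fitted_weights = None

    def fit(self, X, y):
        X, y = np.asarray(X, float), np.asarray(y, float)
        # Step 1: the fitness function is the training loss of a weight vector.
        fitness = lambda w: float(np.mean((X @ w - y) ** 2))
        # Step 2: the problem is defined by the number of weights and their range.
        n_weights, low, high = X.shape[1], -self.clip_max, self.clip_max
        # Step 3: run a randomized search (a basic hill climber in this sketch).
        rng = np.random.default_rng(self.seed)
        w = rng.uniform(-1, 1, n_weights)
        best = fitness(w)
        for _ in range(self.max_iters):
            cand = w.copy()
            cand[rng.integers(n_weights)] += self.step * rng.choice([-1.0, 1.0])
            cand = np.clip(cand, low, high)
            f = fitness(cand)
            if f < best:  # accept only strict improvements
                w, best = cand, f
        self.fitted_weights = w
        return self

    def predict(self, X):
        return np.asarray(X, float) @ self.fitted_weights

# Illustrative data: a noiseless linear target with three features.
rng = np.random.default_rng(42)
X_train, X_test = rng.normal(size=(80, 3)), rng.normal(size=(20, 3))
true_w = np.array([1.5, -2.0, 0.5])
y_train, y_test = X_train @ true_w, X_test @ true_w

model = HillClimbRegressor()        # 1. initialize the model object
model.fit(X_train, y_train)         # 2. fit the weights on the training set
y_pred = model.predict(X_test)      # 3. predict for the test set
test_mse = float(np.mean((y_pred - y_test) ** 2))
```

Keeping the fitted weights on the object is what allows predict to be called after fit without the caller ever handling the weight values directly.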
To fit the model weights, the user can choose between randomized hill climbing, simulated annealing, the genetic algorithm and gradient descent.
[In mlrose, the gradient descent algorithm is only available for use in solving the machine learning weight optimization problem and has been included primarily for benchmarking purposes, since this is one of the most common algorithms used in fitting neural networks and regression models.]

We will now work through an example to illustrate how mlrose can be used to fit a neural network and a regression model to a given dataset.
Before starting with the example, you will need to import the mlrose and NumPy Python packages.

    import mlrose
    import numpy as np

Example: the Iris Dataset

The Iris dataset is a famous multivariate classification dataset first presented in a 1936 research paper by statistician and biologist Ronald Fisher.
Irises (1889) by Vincent Van Gogh

The dataset contains 150 observations of three classes (species) of iris flowers (50 observations of each class), with each observation providing the sepal length, sepal width, petal length and petal width (i.e. the feature values), as well as the class label (i.e. the target value), of each flower under consideration.
The Iris dataset is included with Python’s sklearn package.
The feature values and label of the first observation in the dataset are shown below, along with the maximum and minimum values of each of the features and the unique label values:

The feature values for Obs 0 are: [5.1 3.5 1.4 0.2]
The feature names are: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
The target value for Obs 0 is: 0
The target name for Obs 0 is: setosa
The minimum values of the four features are: [4.3 2.  1.  0.1]
The maximum values of the four features are: [7.9 4.4 6.9 2.5]
The unique target values are: [0 1 2]

From this we can see that all features in the Iris dataset are numeric, albeit with different ranges, and that the class labels have been represented by integers.
In the next few sections we will show how mlrose can be used to fit a neural network and a logistic regression model to this dataset, to predict the species of an iris flower given its feature values.

Data Pre-Processing

Before we can fit any sort of machine learning model to a dataset, it is necessary to manipulate our data into the form expected by mlrose.
Each of the three machine learning models supported by mlrose expects to receive feature data in the form of a NumPy array, with one row per observation and numeric features only (any categorical features must be one-hot encoded before passing to the machine learning models).
The models also expect to receive the target values as either: a list of numeric values (for regression data); a list of 0–1 indicator values (for binary classification data); or as a NumPy array of one-hot encoded labels, with one row per observation (for multi-class classification data).
In the case of the Iris dataset, all of our features are numeric, so no one-hot encoding is required.
However, it is necessary to one-hot encode the class labels.
In keeping with standard machine learning practice, it is also necessary to split the data into training and test subsets, and since the range of the Iris data varies considerably from feature to feature, to standardize the values of our feature variables.
These pre-processing steps (splitting, scaling and one-hot encoding) are applied before any model is fitted.

Neural Networks

Once the data has been pre-processed, fitting a neural network in mlrose simply involves following the steps listed above.
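Since everything that follows assumes pre-processed arrays, here is a minimal NumPy-only sketch of the pre-processing described in the previous section: a train/test split, scaling of each feature, and one-hot encoding of the class labels. The helper names, the 80/20 split, the choice of min-max scaling and the random stand-in data are all assumptions made for illustration (an 80/20 split is at least consistent with accuracies reported in this article such as 0.625 = 75/120); in practice the real Iris arrays would be used, and library utilities such as those in sklearn are a common alternative.

```python
import numpy as np

def train_test_split_np(X, y, test_size=0.2, seed=3):
    """Shuffle the rows and hold out a fraction of them as the test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(test_size * len(X)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

def min_max_scale(X_train, X_test):
    """Rescale each feature using the training-set minimum and maximum only."""
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant features
    return (X_train - lo) / span, (X_test - lo) / span

def one_hot(labels, n_classes):
    """Turn integer class labels into one row of 0/1 indicators per observation."""
    return np.eye(n_classes)[labels]

# Stand-in data with the same shape as Iris: 150 rows, 4 features, labels 0/1/2.
rng = np.random.default_rng(0)
X = rng.uniform(0.1, 7.9, size=(150, 4))
y = np.repeat([0, 1, 2], 50)

X_train, X_test, y_train, y_test = train_test_split_np(X, y)
X_train_scaled, X_test_scaled = min_max_scale(X_train, X_test)
y_train_hot, y_test_hot = one_hot(y_train, 3), one_hot(y_test, 3)
```

The scaling parameters are computed from the training rows and then reused for the test rows, so no information about the test set leaks into the fitted model.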
Suppose we wish to fit a neural network classifier to the Iris dataset with one hidden layer containing 2 nodes and a ReLU activation function (mlrose supports the ReLU, identity, sigmoid and tanh activation functions).
For this example, we will use the Randomized Hill Climbing algorithm to find the optimal weights, with a maximum of 1000 iterations of the algorithm and 100 attempts to find a better set of weights at each step.
We will also include a bias term; use a step size (learning rate) of 0.0001 (to find neighbors of the current set of weights); and limit our weights to being in the range -5 to 5 (to reduce the landscape over which the algorithm must search in order to find the optimal weights).
This model is then initialized and fitted to our pre-processed data.
Once the model is fitted, we can use it to predict the labels for our training and test sets, and use these predictions to assess the model’s training and test accuracy.
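To show what such a configuration means mechanically, the following NumPy sketch builds a classifier with one hidden layer of 2 ReLU nodes and a bias input, and fits it with a basic randomized hill climber using the settings described above (step 0.0001, at most 1000 iterations, 100 attempts, weights clipped to -5 to 5). It is an illustration under stated assumptions, not mlrose's implementation: the function names, the softmax output, the log-loss fitness, the one-weight-at-a-time neighbor rule and the synthetic data are all choices made for this example.

```python
import numpy as np

def unpack(weights, n_in, n_hidden, n_out):
    """Split a flat weight vector into input->hidden and hidden->output matrices."""
    cut = (n_in + 1) * n_hidden                      # +1 for the bias input
    return (weights[:cut].reshape(n_in + 1, n_hidden),
            weights[cut:].reshape(n_hidden, n_out))

def predict_proba(weights, X, n_hidden, n_out):
    Xb = np.hstack([X, np.ones((len(X), 1))])        # bias as a constant input
    w1, w2 = unpack(weights, X.shape[1], n_hidden, n_out)
    hidden = np.maximum(0.0, Xb @ w1)                # ReLU activation
    logits = hidden @ w2
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)      # softmax over classes

def log_loss(weights, X, y_hot, n_hidden):
    p = predict_proba(weights, X, n_hidden, y_hot.shape[1])
    return float(-np.mean(np.sum(y_hot * np.log(p + 1e-12), axis=1)))

def fit_rhc(X, y_hot, n_hidden=2, step=0.0001, max_iters=1000,
            max_attempts=100, clip=5.0, seed=3):
    """Hill climbing: nudge one random weight by +/- step, keep improvements."""
    rng = np.random.default_rng(seed)
    n_w = (X.shape[1] + 1) * n_hidden + n_hidden * y_hot.shape[1]
    w = rng.uniform(-1, 1, n_w)
    best, attempts = log_loss(w, X, y_hot, n_hidden), 0
    for _ in range(max_iters):
        cand = w.copy()
        cand[rng.integers(n_w)] += step * rng.choice([-1.0, 1.0])
        cand = np.clip(cand, -clip, clip)
        loss = log_loss(cand, X, y_hot, n_hidden)
        if loss < best:
            w, best, attempts = cand, loss, 0
        else:
            attempts += 1
            if attempts >= max_attempts:             # give up after repeated failures
                break
    return w, best

# Synthetic stand-in for the scaled training data: 120 rows, 4 features, 3 classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 40)
X = rng.uniform(0, 1, size=(120, 4)) * 0.3 + y[:, None] * 0.3
y_hot = np.eye(3)[y]

w0, initial_loss = fit_rhc(X, y_hot, max_iters=0)    # starting weights only
w, final_loss = fit_rhc(X, y_hot)
accuracy = float(np.mean(predict_proba(w, X, 2, 3).argmax(axis=1) == y))
```

In this sketch each iteration changes a single weight by 0.0001, so 1000 iterations can move any weight by at most 0.1 from its starting value, which leaves the fitted model very close to its random initialization. The accuracy figures that follow are the article's reported results for the mlrose model, not output from this sketch.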
Training accuracy: 0.45
Test accuracy: 0.533333333333

In this case, our model achieves training accuracy of 45% and test accuracy of 53.3%.
These accuracy levels are better than if the labels were selected at random, but still leave room for improvement.
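As a reference point for "selected at random", the short simulation below estimates the accuracy of guessing uniformly among three classes. It assumes the classes are roughly balanced, as they are in the full Iris dataset; the sample size and seed are arbitrary choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# 30,000 true labels, evenly split across the three classes.
y_true = np.repeat([0, 1, 2], 10_000)

# Guess each label uniformly at random, ignoring the features entirely.
y_random = rng.integers(0, 3, size=y_true.size)

random_accuracy = float(np.mean(y_true == y_random))
expected_accuracy = 1 / 3
```

With three equally likely classes, random guessing is right about one time in three, so accuracies of 45% and 53.3% sit above that baseline, though not by a wide margin.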
We can potentially improve on the accuracy of our model by tuning the parameters we set when initializing the neural network object.
Suppose we decide to change the optimization algorithm to gradient descent, but leave all other model parameters unchanged.
Training accuracy: 0.625
Test accuracy: 0.566666666667

This results in a 39% relative increase in training accuracy to 62.5%, but a much smaller increase in test accuracy to 56.7%.
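The 39% figure is a relative increase rather than a difference in percentage points, which the short calculation below makes explicit. The test accuracies are written as 16/30 and 17/30 because those fractions match the printed decimals; the helper name `relative_increase` is only for this illustration.

```python
# Accuracies reported for the hill climbing and gradient descent models.
train_before, train_after = 0.45, 0.625
test_before, test_after = 16 / 30, 17 / 30   # 0.5333... and 0.5666...

def relative_increase(before, after):
    """Change expressed as a fraction of the starting value."""
    return (after - before) / before

train_gain = relative_increase(train_before, train_after)   # roughly 0.389
test_gain = relative_increase(test_before, test_after)      # 1/16 = 0.0625

# The same changes measured in percentage points.
train_points = (train_after - train_before) * 100           # 17.5 points
test_points = (test_after - test_before) * 100              # about 3.3 points
```

So training accuracy rises by 17.5 percentage points (about 39% in relative terms), while test accuracy rises by roughly 3.3 points, which corresponds to one additional correct prediction out of 30.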

Linear and Logistic Regression Models

Linear and logistic regression models are special cases of neural networks.
A linear regression is a regression neural network with no hidden layers and an identity activation function, while a logistic regression is a classification neural network with no hidden layers and a sigmoid activation function.
As a result, we could fit either of these models to our data using the NeuralNetwork() class, with parameters set appropriately.
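The equivalence can be checked numerically with a small feed-forward sketch. The function `network_output` and its arguments are hypothetical names for this illustration, not mlrose's API: with no hidden layers the weight list holds a single matrix, so the network output reduces to the output activation applied to a linear combination of the inputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def identity(z):
    return z

def network_output(X, weight_matrices, hidden_activation, output_activation):
    """Feed-forward pass; every matrix except the last feeds a hidden layer."""
    a = X
    for W in weight_matrices[:-1]:
        a = hidden_activation(a @ W)
    return output_activation(a @ weight_matrices[-1])

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))      # 5 observations, 4 features
w = rng.normal(size=(4, 1))      # one weight per feature

# No hidden layers: the hidden activation is never applied.
as_network_logistic = network_output(X, [w], np.tanh, sigmoid)
as_network_linear = network_output(X, [w], np.tanh, identity)

# The textbook formulas for the two regression models.
direct_logistic = 1.0 / (1.0 + np.exp(-(X @ w)))
direct_linear = X @ w
```

Both pairs of outputs agree exactly, which is why the regression models can share the same weight-fitting machinery as the neural network.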
For example, suppose we wished to fit a logistic regression to the Iris data using the randomized hill climbing algorithm and all other parameters set as for the example in the previous section.
We could do this by initializing a NeuralNetwork() object with no hidden layers and a sigmoid activation function.
However, for convenience, mlrose provides the LinearRegression() and LogisticRegression() wrapper classes, which simplify model initialization.
In the Iris dataset example, we can thus initialize and fit our logistic regression model using the LogisticRegression() class, which gives the following results:

Training accuracy: 0.191666666667
Test accuracy: 0.0666666666667

This model achieves 19.2% training accuracy and 6.7% test accuracy, which is worse than if we predicted the labels by selecting values at random.
Nevertheless, as in the previous section, we can potentially improve model accuracy by tuning the parameters set at initialization.
Suppose we increase our learning rate to 0.

Training accuracy: 0.683333333333
Test accuracy: 0.7

This results in significant improvements to both training and test accuracy, with training accuracy levels now reaching 68.3% and test accuracy levels reaching 70%.

Summary

In this tutorial we discussed how mlrose can be used to find the optimal weights of three types of machine learning models: neural networks, linear regression models and logistic regression models.
Applying randomized optimization algorithms to the machine learning weight optimization problem is most certainly not the most common approach to solving this problem.
However, it serves to demonstrate the versatility of the mlrose package and of randomized optimization algorithms in general.
To learn more about mlrose, visit the GitHub repository for this package, available here.

About the Author

Genevieve Hayes is a Data Scientist with experience in the insurance, government and education sectors, in both managerial and technical roles.
She holds a PhD in Statistics from the Australian National University and a Master of Science in Computer Science (Machine Learning) from Georgia Institute of Technology.
You can learn more about her here.