Stock Market Prediction by Recurrent Neural Network on LSTM Model
Aniruddha Choudhury
Jan 10

Forecasting stock prices has long been a difficult task for researchers and analysts.

In fact, investors are highly interested in the research area of stock price prediction.

For a good and successful investment, many investors are keen on knowing the future situation of the stock market.

Good, effective prediction systems for the stock market help traders, investors, and analysts by providing supporting information such as the future direction of the market.

In this work, we present a recurrent neural network (RNN) and Long Short-Term Memory (LSTM) approach to predict stock market indices.

Introduction

Stock markets involve many complicated financial indicators, and prices fluctuate violently.

However, as technology advances, the opportunity to earn steady returns from the stock market increases, and experts are better able to identify the most informative indicators for prediction.

The prediction of the market value is of great importance to help in maximizing the profit of stock option purchase while keeping the risk low.

Recurrent neural networks (RNNs) have proved to be among the most powerful models for processing sequential data.

Long Short-Term Memory (LSTM) is one of the most successful RNN architectures.

LSTM introduces the memory cell, a unit of computation that replaces traditional artificial neurons in the hidden layer of the network.

With these memory cells, networks can effectively associate memories with inputs that are remote in time, which makes them well suited to capturing how the structure of data evolves over time, with high predictive capacity.

LSTM Architecture

[Figure: LSTM Architecture]

We will start by implementing the LSTM cell for a single time-step.

Then we can iteratively call it from inside a for-loop to have it process input with Tx time-steps.
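A minimal sketch of that idea in plain Python, assuming the standard LSTM gate equations (forget, input, and output gates plus a candidate cell state). The helper names (`lstm_cell_step`, `lstm_forward`, `init_params`), the weight layout, and the toy sizes are illustrative assumptions rather than code from this article; Keras performs the equivalent computation internally.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Return W @ v + b for a matrix W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_cell_step(x_t, h_prev, c_prev, params):
    """One LSTM time-step: returns the next hidden state and cell state."""
    z = h_prev + x_t  # concatenate previous hidden state and current input
    f = [sigmoid(a) for a in affine(params["Wf"], params["bf"], z)]    # forget gate
    i = [sigmoid(a) for a in affine(params["Wi"], params["bi"], z)]    # input gate
    o = [sigmoid(a) for a in affine(params["Wo"], params["bo"], z)]    # output gate
    g = [math.tanh(a) for a in affine(params["Wc"], params["bc"], z)]  # candidate state
    c_next = [ft * c + it * gt for ft, c, it, gt in zip(f, c_prev, i, g)]
    h_next = [ot * math.tanh(c) for ot, c in zip(o, c_next)]
    return h_next, c_next


def lstm_forward(xs, n_hidden, params):
    """Call the cell once per time-step, carrying h and c forward."""
    h = [0.0] * n_hidden
    c = [0.0] * n_hidden
    hidden_states = []
    for x_t in xs:  # loop over the Tx time-steps
        h, c = lstm_cell_step(x_t, h, c, params)
        hidden_states.append(h)
    return hidden_states


def init_params(n_hidden, n_input, seed=0):
    """Small random weights and zero biases for the four gates (toy initialisation)."""
    rng = random.Random(seed)
    params = {}
    for gate in "fioc":
        params["W" + gate] = [
            [rng.uniform(-0.1, 0.1) for _ in range(n_hidden + n_input)]
            for _ in range(n_hidden)
        ]
        params["b" + gate] = [0.0] * n_hidden
    return params


params = init_params(n_hidden=3, n_input=1)
xs = [[0.1], [0.2], [0.3], [0.4]]  # Tx = 4 time-steps, one feature each
hidden_states = lstm_forward(xs, n_hidden=3, params=params)
```

The loop returns one hidden state per time-step, which corresponds to what `return_sequences=True` exposes in Keras; keeping only the last entry corresponds to the default behaviour.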

[Figure: About the gates]

Methodology

Stage 1: Raw Data: In this stage, historical Google stock price data is collected and used to predict future stock prices.

import pandas as pd
dataset = pd.read_csv('Google_Stock_Price_Train.csv', index_col="Date", parse_dates=True)

[Figure: Google Stock Dataset]

Stage 2: Data Preprocessing: The pre-processing stage involves:

a) Data discretization: part of data reduction, but of particular importance for numerical data.
b) Data transformation: normalization.
c) Data cleaning: filling in missing values.
d) Data integration: integration of data files.

After the dataset is transformed into a clean dataset, it is divided into training and testing sets for evaluation.
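The cleaning, splitting, and normalization steps can be illustrated on a toy price list. This is a sketch under assumptions: forward-filling is just one way to fill missing values (the article does not say which it uses), the 80/20 chronological split is arbitrary, and the helper names are hypothetical. The scaling bounds are taken from the training portion only, which mirrors fitting the scaler on the training set and reusing it on the test set.

```python
def forward_fill(values):
    """Replace None entries with the most recent known value.

    Assumes the first entry is not None.
    """
    filled, last = [], None
    for v in values:
        if v is None:
            v = last
        filled.append(v)
        last = v
    return filled


def min_max_scale(values):
    """Scale values to [0, 1]; also return the bounds so the scaling can be undone."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def inverse_scale(scaled, lo, hi):
    """Map scaled values back to the original price range."""
    return [s * (hi - lo) + lo for s in scaled]


prices = [100.0, None, 110.0, 105.0, None, 120.0]
clean = forward_fill(prices)

# Chronological split: earlier rows for training, later rows for testing
split = int(len(clean) * 0.8)
train, test = clean[:split], clean[split:]

scaled_train, lo, hi = min_max_scale(train)
```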

Creating a data structure with 60 timesteps and 1 output

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Data cleaning: check every column for missing values
dataset.isna().any()

# Training feature. 'Open' is assumed here because the test code later uses that column.
training_set = dataset[['Open']].values

# Feature scaling: normalize prices to the range [0, 1]
sc = MinMaxScaler(feature_range=(0, 1))
training_set_scaled = sc.fit_transform(training_set)

# Creating a data structure with 60 timesteps and 1 output
# (1258 is the number of rows in the training set)
X_train = []
y_train = []
for i in range(60, 1258):
    X_train.append(training_set_scaled[i-60:i, 0])
    y_train.append(training_set_scaled[i, 0])
X_train, y_train = np.array(X_train), np.array(y_train)

# Reshaping to (samples, timesteps, features), as the LSTM layer expects
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))

Stage 3: Feature Extraction: In this stage, only the features that are to be fed to the neural network are chosen.

We choose the feature from among Date, Open, High, Low, Close, and Volume.
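The 60-step windowing applied to the chosen feature can be sketched as a small helper. The function name `make_windows` and the toy series are illustrative assumptions; the logic is the same loop as in the article, with the window length passed as a parameter instead of being hard-coded.

```python
def make_windows(series, timesteps):
    """Pair each run of `timesteps` consecutive values with the value that follows it."""
    X, y = [], []
    for i in range(timesteps, len(series)):
        X.append(series[i - timesteps:i])  # the previous `timesteps` values
        y.append(series[i])                # the value to predict
    return X, y


series = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
X, y = make_windows(series, timesteps=3)

# With 1258 training rows and 60 timesteps this yields 1258 - 60 = 1198 samples
n_samples = len(make_windows(list(range(1258)), timesteps=60)[0])
```

The sample count of 1198 matches the `1198/1198` shown in the training log.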

Building the RNN LSTM model

# Importing the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout

Using TensorFlow backend.

Stage 4: Training Neural Network: In this stage, the data is fed to the neural network, which is trained for prediction starting from randomly initialized weights and biases.

Our LSTM model is a sequential stack of four LSTM layers, each followed by dropout, and a final dense output layer with a linear activation function.
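To get a feel for the size of the network, the trainable parameters can be counted by hand for the layer sizes used in this article's code (four LSTM layers of 50 units on a single input feature, then a one-unit dense output). This sketch assumes the standard LSTM formulation with four gates and a bias per gate; the helper names are hypothetical. Dropout layers add no parameters.

```python
def lstm_param_count(units, input_dim):
    """Four gates, each with input weights, recurrent weights, and a bias vector."""
    return 4 * (units * (input_dim + units) + units)


def dense_param_count(units, input_dim):
    """One weight per input per unit, plus one bias per unit."""
    return units * input_dim + units


layer_params = [
    lstm_param_count(50, 1),    # first LSTM layer sees 1 feature per time-step
    lstm_param_count(50, 50),   # later LSTM layers see the 50 outputs of the previous one
    lstm_param_count(50, 50),
    lstm_param_count(50, 50),
    dense_param_count(1, 50),   # output layer
]
total_params = sum(layer_params)
```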

# Initialising the RNN
regressor = Sequential()

# Adding the first LSTM layer and some Dropout regularisation
regressor.add(LSTM(units=50, return_sequences=True, input_shape=(X_train.shape[1], 1)))
regressor.add(Dropout(0.2))

# Adding a second LSTM layer and some Dropout regularisation
regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))

# Adding a third LSTM layer and some Dropout regularisation
regressor.add(LSTM(units=50, return_sequences=True))
regressor.add(Dropout(0.2))

# Adding a fourth LSTM layer and some Dropout regularisation
regressor.add(LSTM(units=50))
regressor.add(Dropout(0.2))

# Adding the output layer
regressor.add(Dense(units=1))

# Compiling the RNN
regressor.compile(optimizer='adam', loss='mean_squared_error')

# Fitting the RNN to the Training set
regressor.fit(X_train, y_train, epochs=100, batch_size=32)

Optimizer

The type of optimizer used can greatly affect how fast the algorithm converges to the minimum value.

Also, it is important that there is some notion of randomness, to avoid getting stuck in a local minimum and failing to reach the global minimum.

There are a few great algorithms, but I have chosen to use the Adam optimizer.

The Adam optimizer combines the perks of two other optimizers: ADAgrad and RMSprop.

The ADAgrad optimizer essentially uses a different learning rate for every parameter and every time step.

The reasoning behind ADAgrad is that parameters that are updated infrequently should have larger learning rates, while parameters that are updated frequently should have smaller learning rates.

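In other words, the stochastic gradient descent update for ADAgrad becomes

$$\theta_{t+1,i} = \theta_{t,i} - \eta_{t,i}\, g_{t,i}$$

where $g_{t,i}$ is the gradient of the loss with respect to parameter $\theta_i$ at time step $t$.

The learning rate is calculated based on the past gradients that have been computed for each parameter. Hence,

$$\eta_{t,i} = \frac{\eta}{\sqrt{G_{t,ii} + \epsilon}}$$

where $G_t$ is the diagonal matrix of sums of squares of the past gradients, $\eta$ is the base learning rate, and $\epsilon$ is a small constant that avoids division by zero.

The issue with this optimization is that the learning rates keep shrinking as the iterations increase, because the sums in $G_t$ only grow.

RMSprop addresses the diminishing learning rate by replacing the full sum with an exponentially decaying average of past squared gradients, so that only recent gradients matter.

The updates become

$$E[g^2]_t = \gamma\, E[g^2]_{t-1} + (1-\gamma)\, g_t^2$$

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{E[g^2]_t + \epsilon}}\, g_t$$

where $\gamma$ is the decay rate of the average.

Now that we understand how those two optimizers work, we can look into how Adam works.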

Adaptive Moment Estimation, or Adam, is another method that computes adaptive learning rates for each parameter by considering the exponentially decaying average of past squared gradients and the exponentially decaying average of past gradients.

This can be represented as

$$m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t$$

$$v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2$$

where $g_t$ is the gradient at time step $t$ and $\beta_1$, $\beta_2$ are decay rates. Here m and v can be considered estimates of the first and second moments of the gradients respectively, hence the name Adaptive Moment Estimation.

When this was first used, researchers observed an inherent bias towards 0, and they countered it by using the following estimates:

$$\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t}$$

This leads us to the final gradient update rule

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\, \hat{m}_t$$

where $\eta$ is the base learning rate and $\epsilon$ is a small constant that avoids division by zero.

This is the optimizer that I used, and its benefits can be summarized as follows:

The learning rate is different for every parameter and every iteration.

The learning rate does not diminish as it does with ADAgrad.

The gradient update uses the moments of the gradients, allowing for a more statistically sound descent.
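The Adam update can be sketched for a single parameter in a few lines of plain Python. This is an illustration under assumptions: the function name `adam_minimize` is hypothetical, the hyperparameter values are common defaults rather than anything stated in this article, and the objective is a toy quadratic. In the article's model, Keras applies the optimizer through `optimizer='adam'`.

```python
import math


def adam_minimize(grad, theta, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimise a one-dimensional function given its gradient, using Adam."""
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g        # decaying average of gradients
        v = beta2 * v + (1 - beta2) * g * g    # decaying average of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# Toy objective f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3)
theta_star = adam_minimize(lambda th: 2 * (th - 3.0), theta=0.0)
```

Because the step is scaled by the ratio of the two averages, its size stays close to `lr` early on regardless of how large the raw gradient is.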

Regularization

Another important aspect of training the model is making sure the weights do not get too large and start focusing on one data point, hence overfitting.

So we should always include a penalty for large weights (what counts as large depends on the type of regulariser used).

I have chosen to use Tikhonov regularization, which can be thought of as the following minimization problem:

$$\min_{f \in \mathcal{H}} \; \frac{1}{n} \sum_{i=1}^{n} V(f(x_i), y_i) + \lambda \lVert f \rVert_{\mathcal{H}}^2$$

where $V$ is the loss function, $\lambda$ is the regularization strength, and $\mathcal{H}$ is the function space.

The fact that the function space is a Reproducing Kernel Hilbert Space (RKHS) ensures that the notion of a norm exists.

This allows us to encode the notion of the norm into our regularizer.
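For a linear model, a Tikhonov penalty reduces to an L2 penalty on the weights (ridge regression). The sketch below uses a single weight so that the minimiser has a closed form; the function name and data are illustrative assumptions, and the loss is written as a plain sum rather than a mean, which only rescales the penalty strength.

```python
def ridge_weight(xs, ys, lam):
    """Closed-form minimiser of sum((w * x - y)^2) + lam * w^2 for a single weight w."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated by y = 2x

# The fitted weight shrinks towards zero as the penalty strength grows
weights = {lam: ridge_weight(xs, ys, lam) for lam in (0.0, 1.0, 10.0)}
```

With no penalty the weight recovers the true slope of 2, and it shrinks as the penalty grows. In Keras, a comparable weight penalty can be attached to a layer through its `kernel_regularizer` argument, for example with `keras.regularizers.l2`; the model code in this article does not show that step.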

Dropouts

A newer method of preventing overfitting considers what happens when some of the neurons suddenly stop working.

This forces the model not to be overdependent on any group of neurons, and to make use of all of them.

Dropouts make the network more robust, allowing it to predict the trend without relying on any one neuron.
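The mechanism can be sketched in plain Python as "inverted" dropout, where the surviving activations are scaled up during training so that their expected value is unchanged. The function name, seed, and toy activations are illustrative assumptions; the rate of 0.2 matches the `Dropout(0.2)` layers in the model.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate`; rescale survivors by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return list(activations)  # dropout is disabled at prediction time
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(42)
activations = [1.0] * 10000
dropped = dropout(activations, rate=0.2, rng=rng)

zero_fraction = dropped.count(0.0) / len(dropped)   # close to the dropout rate
mean_activation = sum(dropped) / len(dropped)       # close to the original mean of 1.0
```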

[Figure: results of using dropouts]

Stage 5: Output Generation: In this stage, the output value generated by the output layer of the RNN is compared with the target value.

The error, or difference between the target and the obtained output value, is minimized using the backpropagation algorithm, which adjusts the weights and biases of the network.
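The idea of adjusting weights and biases to reduce the error can be shown on a model small enough to differentiate by hand. This sketch fits a single weight and bias by gradient descent on the mean squared error, the same loss the article's model is compiled with; the linear model, data, learning rate, and function names are illustrative assumptions, and a real LSTM obtains its gradients through backpropagation through time.

```python
def mse(preds, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def train_linear(xs, ys, lr=0.05, epochs=500):
    """Fit y ~ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    history = []
    for _ in range(epochs):
        preds = [w * x + b for x in xs]
        history.append(mse(preds, ys))
        # Gradients of the MSE with respect to w and b
        grad_w = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        grad_b = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, history


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated by y = 2x + 1
w, b, history = train_linear(xs, ys)
```

The `history` list plays the role of the per-epoch `loss` values printed during training.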

Epoch 97/100
1198/1198 [==============================] - 6s 5ms/step - loss: 0.0018
Epoch 98/100
1198/1198 [==============================] - 6s 5ms/step - loss: 0.0014
Epoch 99/100
1198/1198 [==============================] - 6s 5ms/step - loss: 0.0014
Epoch 100/100
1198/1198 [==============================] - 6s 5ms/step - loss: 0.0015

We followed the same preprocessing steps for the test data.
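Each test-day prediction needs the 60 days before it, so the test inputs must reach back into the end of the training data. The slice arithmetic can be checked on toy lengths; the function name and the stand-in series are illustrative assumptions, with a window of 3 in place of 60.

```python
def slice_test_inputs(train_series, test_series, timesteps):
    """Return the test values preceded by the last `timesteps` training values."""
    total = train_series + test_series
    return total[len(total) - len(test_series) - timesteps:]


train_series = list(range(100))        # stand-in for 100 training days
test_series = list(range(100, 105))    # stand-in for 5 test days
inputs = slice_test_inputs(train_series, test_series, timesteps=3)

# One window per test day; window i is used to predict inputs[i]
windows = [inputs[i - 3:i] for i in range(3, len(inputs))]
targets = [inputs[i] for i in range(3, len(inputs))]
```

With the article's numbers, 20 test days and 60 timesteps give 80 input rows, which is why the test loop runs over `range(60, 80)`.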

Creating a data structure with 60 timesteps and 1 output

# Getting the real stock price of 2017
dataset_test = pd.read_csv('Google_Stock_Price_Test.csv', index_col="Date", parse_dates=True)
# Assumed: the 'Open' column, matching the predicted feature; needed for the plot
real_stock_price = dataset_test[['Open']].values

# Getting the predicted stock price
dataset_total = pd.concat((dataset['Open'], dataset_test['Open']), axis=0)
# Keep the test rows plus the 60 rows that precede them
inputs = dataset_total[len(dataset_total) - len(dataset_test) - 60:].values
inputs = inputs.reshape(-1, 1)
inputs = sc.transform(inputs)
X_test = []
for i in range(60, 80):
    X_test.append(inputs[i-60:i, 0])
X_test = np.array(X_test)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
predicted_stock_price = regressor.predict(X_test)
# Map the scaled predictions back to actual prices
predicted_stock_price = sc.inverse_transform(predicted_stock_price)

Visualization

All of the analysis above can be implemented with relative ease thanks to Keras and its Sequential API.

import matplotlib.pyplot as plt

# Visualising the results
plt.plot(real_stock_price, color='red', label='Real Google Stock Price')
plt.plot(predicted_stock_price, color='blue', label='Predicted Google Stock Price')
plt.title('Google Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('Google Stock Price')
plt.legend()
plt.show()

[Figure: Prediction Visualization]

General Visualization Analysis

Rolling Mean on Time series

A rolling analysis of a time series model is often used to assess the model’s stability over time.

When analyzing financial time series data using a statistical model, a key assumption is that the parameters of the model are constant over time.
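A rolling mean simply averages the most recent `window` values at each position. The plain-Python sketch below shows the computation on a toy series; the function name and data are illustrative assumptions, and `None` stands in for the NaN that pandas produces while the window is still filling.

```python
def rolling_mean(values, window):
    """Mean of the last `window` values at each position; None until the window is full."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            chunk = values[i + 1 - window:i + 1]
            out.append(sum(chunk) / window)
    return out


closes = [10.0, 12.0, 11.0, 13.0, 15.0]
smoothed = rolling_mean(closes, window=3)
```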

dataset['Close: 30 Day Mean'] = dataset['Close'].rolling(window=30).mean()
dataset[['Close', 'Close: 30 Day Mean']].plot(figsize=(16, 6))

Opening price of the stock over the period:

dataset['Open'].plot(figsize=(16, 6))

Conclusion

The popularity of stock market trading is growing rapidly, which is encouraging researchers to find new methods of prediction using new techniques.

Forecasting techniques help not only researchers but also investors and anyone else dealing with the stock market.

In order to help predict the stock indices, a forecasting model with good accuracy is required.

In this work, we have used one of the most precise forecasting techniques, a Recurrent Neural Network with Long Short-Term Memory units, which can help investors, analysts, or anyone interested in investing in the stock market by giving them a better picture of the market's likely future direction.