Now we can start building our feature set.

We need to use previous monthly sales data to forecast the next ones.

The look-back period may vary for every model.

Ours will be 12 for this example.

So what we need to do is create columns from lag_1 to lag_12 and assign their values using the shift() method:

    #create dataframe for transformation from time series to supervised
    df_supervised = df_diff.drop(['prev_sales'],axis=1)

    #adding lags
    for inc in range(1,13):
        field_name = 'lag_' + str(inc)
        df_supervised[field_name] = df_supervised['diff'].shift(inc)

    #drop null values
    df_supervised = df_supervised.dropna().reset_index(drop=True)

Check out our new dataframe called df_supervised: it now holds the diff column alongside lag_1 through lag_12.

We have our feature set now.
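To see what shift() does in isolation, here is a minimal sketch on a tiny made-up series. The numbers and the names toy and toy_supervised are invented for illustration, and only two lags are built to keep the output short.

```python
import pandas as pd

# Made-up monthly differences, only to show how shift() lines values up.
toy = pd.DataFrame({"diff": [10, -4, 7, 3, -2]})

# lag_k at row t holds the value of diff at row t - k.
for inc in range(1, 3):
    toy["lag_" + str(inc)] = toy["diff"].shift(inc)

# The first rows have no history, so they contain NaN and get dropped.
toy_supervised = toy.dropna().reset_index(drop=True)
print(toy_supervised)
```

With two lags, the first two rows are lost to NaN values; with twelve lags, the first twelve months are lost in the same way.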

Let’s be a bit more curious and ask this question: how useful are our features for prediction?

Adjusted R-squared is the answer. It tells us how well our features (lag_1 to lag_12, in our example) explain the variation in our label (diff).

Let’s see it in an example:

    # Import statsmodels.formula.api
    import statsmodels.formula.api as smf

    # Define the regression formula
    model = smf.ols(formula='diff ~ lag_1', data=df_supervised)

    # Fit the regression
    model_fit = model.fit()

    # Extract the adjusted r-squared
    regression_adj_rsq = model_fit.rsquared_adj
    print(regression_adj_rsq)

So what happened above?

Basically, we fit a linear regression model (OLS, Ordinary Least Squares) and calculate the Adjusted R-squared.

For the example above, we just used lag_1 to see how much it explains the variation in column diff.

The output of this code block shows that lag_1 explains 3% of the variation.

Let’s check out the others: adding four more features increases the score from 3% to 44%.

What is the score if we use the entire feature set? The result is impressive: the score is 98%.
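The comparison above can be sketched without the sales data. The example below computes adjusted R-squared by hand with NumPy on synthetic data, so the scores it prints are not the article's numbers. The helper adjusted_r_squared, the random lags, and the equal weights are assumptions made for illustration.

```python
import numpy as np

def adjusted_r_squared(X, y):
    """Fit OLS with an intercept and return the adjusted R-squared."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    # Penalise the plain R-squared for the number of features p.
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Synthetic stand-in for the lag columns: 12 independent features that
# all contribute equally to the label, plus a little noise.
rng = np.random.default_rng(0)
n_rows, n_lags = 500, 12
lags = rng.normal(size=(n_rows, n_lags))
diff = lags.sum(axis=1) + rng.normal(scale=0.1, size=n_rows)

# Score nested feature sets: the first lag, the first five, all twelve.
scores = {k: adjusted_r_squared(lags[:, :k], diff) for k in (1, 5, 12)}
for k, score in scores.items():
    print(f"first {k} lag(s): adjusted R-squared = {score:.3f}")
```

Unlike plain R-squared, the adjusted score only rises when a new feature explains more than its share of noise, which is why it is a reasonable yardstick for comparing feature sets of different sizes.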

Now we can confidently build our model after scaling our data.

But there is one more step before scaling.

We should split our data into train and test sets.

As the test set, we have selected the last 6 months’ sales.
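The order matters here: splitting first means the scaler only ever sees the training rows. A minimal sketch in plain NumPy, with an invented 10-row table and a hand-written min-max formula standing in for a library scaler (both are assumptions for illustration):

```python
import numpy as np

# Made-up table: 10 monthly rows, 3 columns (label plus two lag features).
data = np.arange(30, dtype=float).reshape(10, 3)

# Chronological hold-out: the last 6 rows are the test set, no shuffling.
train, test = data[:-6], data[-6:]

# Column-wise minimum and maximum come from the training rows only.
col_min, col_max = train.min(axis=0), train.max(axis=0)

def scale(rows, lo=-1.0, hi=1.0):
    """Map each column from [col_min, col_max] to [lo, hi]."""
    return (rows - col_min) / (col_max - col_min) * (hi - lo) + lo

train_scaled, test_scaled = scale(train), scale(test)
print(train_scaled.min(), train_scaled.max())
print(test_scaled.max())
```

Because the made-up data trends upward, the held-out rows land outside the range -1 to 1. That is expected when the range is learned from the training rows alone.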

    #import MinMaxScaler and create a new dataframe for LSTM model
    from sklearn.preprocessing import MinMaxScaler
    df_model = df_supervised.drop(['sales','date'],axis=1)

    #split train and test set
    train_set, test_set = df_model[0:-6].values, df_model[-6:].values

As the scaler, we are going to use MinMaxScaler, which will scale each feature between -1 and 1:

    #apply Min Max Scaler
    scaler = MinMaxScaler(feature_range=(-1, 1))
    scaler = scaler.fit(train_set)

    # reshape training set
    train_set = train_set.reshape(train_set.shape[0], train_set.shape[1])
    train_set_scaled = scaler.transform(train_set)

    # reshape test set
    test_set = test_set.reshape(test_set.shape[0], test_set.shape[1])
    test_set_scaled = scaler.transform(test_set)

Building the LSTM model

Everything is ready to build our first deep learning model.

Let’s create feature and label sets from scaled datasets:

    X_train, y_train = train_set_scaled[:, 1:], train_set_scaled[:, 0:1]
    X_train = X_train.reshape(X_train.shape[0], 1, X_train.shape[1])
    X_test, y_test = test_set_scaled[:, 1:], test_set_scaled[:, 0:1]
    X_test = X_test.reshape(X_test.shape[0], 1, X_test.shape[1])

Let’s fit our LSTM model:

    #Keras imports (skip if already imported earlier)
    from keras.models import Sequential
    from keras.layers import Dense, LSTM

    model = Sequential()
    model.add(LSTM(4, batch_input_shape=(1, X_train.shape[1], X_train.shape[2]), stateful=True))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    #epochs replaces the older nb_epoch argument
    model.fit(X_train, y_train, epochs=100, batch_size=1, verbose=1, shuffle=False)

The code block above prints how the model improves and reduces the error in each epoch.

Let’s do the prediction and see what the results look like:

    y_pred = model.predict(X_test,batch_size=1)

Comparing y_pred with y_test, the results look similar, but this doesn’t tell us much because these are scaled values that represent the difference in sales.
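To see why scaled differences are hard to read, and what undoing the scaling involves, here is a small NumPy sketch. The numbers, the helper names scale and unscale, and the prediction 0.5 are all made up, and the hand-written min-max formulas stand in for a fitted scaler. Since the column ranges are learned on a table that holds the label and the features together, the inverse expects a row of the same width.

```python
import numpy as np

# Made-up training table: column 0 is the label, columns 1-2 are features.
train = np.array([[-50.0, 1.0, 10.0],
                  [0.0, 2.0, 20.0],
                  [150.0, 3.0, 30.0]])
col_min, col_max = train.min(axis=0), train.max(axis=0)

def scale(rows):
    """Map each column from [col_min, col_max] to [-1, 1]."""
    return (rows - col_min) / (col_max - col_min) * 2.0 - 1.0

def unscale(rows):
    """Inverse of scale(): map [-1, 1] back to the original units."""
    return (rows + 1.0) / 2.0 * (col_max - col_min) + col_min

# A hypothetical scaled prediction and the scaled features it came from.
scaled_pred = np.array([[0.5]])
scaled_features = scale(train)[1:2, 1:]

# Rebuild a full-width row, undo the scaling, keep only the label column.
full_row = np.concatenate([scaled_pred, scaled_features], axis=1)
pred_diff = unscale(full_row)[0, 0]
print(pred_diff)
```

A scaled value of 0.5 means nothing on its own; only after the inverse mapping does it become a difference in sales units.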

How can we see the actual sales prediction?

First, we need to do the inverse transformation for scaling:

    #reshape y_pred
    y_pred = y_pred.reshape(y_pred.shape[0], 1, y_pred.shape[1])

    #rebuild test set for inverse transform
    pred_test_set = []
    for index in range(0,len(y_pred)):
        print(np.concatenate([y_pred[index],X_test[index]],axis=1))
        pred_test_set.append(np.concatenate([y_pred[index],X_test[index]],axis=1))

    #reshape pred_test_set
    pred_test_set = np.array(pred_test_set)
    pred_test_set = pred_test_set.reshape(pred_test_set.shape[0], pred_test_set.shape[2])

    #inverse transform
    pred_test_set_inverted = scaler.inverse_transform(pred_test_set)

Second, we need to build the dataframe that has the dates and the predictions.

Transformed predictions show the difference, so we should calculate the predicted sales numbers:

    #create dataframe that shows the predicted sales
    result_list = []
    sales_dates = list(df_sales[-7:].date)
    act_sales = list(df_sales[-7:].sales)
    for index in range(0,len(pred_test_set_inverted)):
        result_dict = {}
        result_dict['pred_value'] = int(pred_test_set_inverted[index][0] + act_sales[index])
        result_dict['date'] = sales_dates[index+1]
        result_list.append(result_dict)
    df_result = pd.DataFrame(result_list)

df_result now holds one pred_value per date.

Great! We’ve predicted the next six months’ sales numbers.
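The arithmetic behind that last step is simple: each predicted sales figure is the previous month's actual sales plus the predicted change. A toy sketch with made-up numbers (the lists below are not the article's data):

```python
# Made-up numbers: 7 months of actual sales and 6 predicted differences.
actual_sales = [100, 110, 105, 120, 130, 125, 140]
predicted_diff = [8.4, -3.2, 12.9, 9.1, -6.5, 13.7]

# Prediction for month t+1 = actual sales in month t + predicted change.
# int() truncates the fractional part, as in the loop above.
predicted_sales = [
    int(actual_sales[i] + predicted_diff[i])
    for i in range(len(predicted_diff))
]
print(predicted_sales)
```

This is why seven months of actual sales are needed for six predictions: the first of the seven only serves as the starting point for the first prediction.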

Let’s check them in the plot to see how good our model is:

    #merge with actual sales dataframe
    df_sales_pred = pd.merge(df_sales,df_result,on='date',how='left')

    #plot actual and predicted
    plot_data = [
        go.Scatter(
            x=df_sales_pred['date'],
            y=df_sales_pred['sales'],
            name='actual'
        ),
        go.Scatter(
            x=df_sales_pred['date'],
            y=df_sales_pred['pred_value'],
            name='predicted'
        )
    ]
    plot_layout = go.Layout(
        title='Sales Prediction'
    )
    fig = go.Figure(data=plot_data, layout=plot_layout)
    pyoff.iplot(fig)

Actual vs predicted: the fit looks pretty good for a simple model.

One improvement we can make to this model is to add holidays, breaks, and other seasonal effects.

They can simply be added as new features.
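As a rough sketch of what such a feature could look like, the example below adds a 0/1 flag for months assumed to carry a holiday effect. The dates, the column name is_holiday_month, and the choice of November and December are all hypothetical; the right rule depends on the business.

```python
import pandas as pd

# Hypothetical monthly frame; the dates and the holiday rule are invented.
df = pd.DataFrame({"date": pd.date_range("2017-09-01", periods=6, freq="MS")})

# Flag months assumed to carry a holiday effect (November and December here).
holiday_months = {11, 12}
df["is_holiday_month"] = df["date"].dt.month.isin(holiday_months).astype(int)
print(df)
```

A column like this would sit next to the lag columns in the feature set, so it would go through the same split and scaling steps as the rest.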

By using this model, we have our baseline sales predictions.

But how can we predict the effect of a promotion on sales? We will look into it in Part 7.

You can find the Jupyter Notebook for this article here.
