Predicting Irish electricity consumption with neural networks

In this example, neural networks are used to forecast the energy consumption of the Dublin City Council Civic Offices between March 2011 and February 2013.

Michael Grogan (MGCodesandStats), Feb 1

Summary of Study

This analysis is divided into two parts:

- The neuralnet library in R is used to predict electricity consumption from various explanatory variables.
- An LSTM network is built with Keras to predict electricity consumption from the time series alone, without any explanatory variables.

The relevant data was sourced from data.gov.ie and met.ie.

Electricity consumption data was provided on an hourly basis, but converted to daily data for the purpose of this analysis.
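The article does not show how the hourly readings were aggregated. As a minimal sketch, assuming a gap-free hourly series that starts at midnight, and assuming that daily consumption is the sum of the 24 hourly readings, the conversion can be written as below. `hourly_to_daily` is a hypothetical helper, not part of the original analysis.

```python
import numpy as np

def hourly_to_daily(hourly_kwh):
    """Sum consecutive blocks of 24 hourly readings into daily totals.

    Assumes (hypothetically) a gap-free series starting at 00:00,
    so that every day contributes exactly 24 readings.
    """
    values = np.asarray(hourly_kwh, dtype=float)
    if values.size % 24 != 0:
        raise ValueError("expected a whole number of days (a multiple of 24 readings)")
    # One row per day, then sum across the 24 hours in each row.
    return values.reshape(-1, 24).sum(axis=1)

# Two synthetic days: 100 kWh every hour, then 200 kWh every hour.
hourly = [100.0] * 24 + [200.0] * 24
print(hourly_to_daily(hourly))  # [2400. 4800.]
```

With real timestamped data, calling `resample('D').sum()` on a pandas Series with a datetime index does the same job without requiring exactly 24 readings per day, although days with missing hours would then be under-counted.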

The variables are as follows:

- eurgbp: EUR/GBP exchange rate
- rain: rainfall
- maxt: maximum temperature
- mint: minimum temperature
- wdsp: wind speed
- sun: sunlight hours
- kwh: electricity consumption (kWh)

Ireland obtains about 45% of its electricity from natural gas, 96% of which is imported from Scotland, so fluctuations in the EUR/GBP exchange rate can have a significant impact on the cost of electricity in Ireland. The exchange rate was therefore included as an explanatory variable.

Since weather conditions also have a significant influence on electricity usage, weather data for the Dublin region was included for the same dates.

Key Findings

Of the two models, the LSTM predicted electricity consumption more accurately, with the training and test predictions closely mirroring actual consumption. It achieved a root mean squared error (RMSE) of 353.25 on the training dataset and 255.13 on the test dataset, measured in kWh against daily consumption of several thousand kWh.

Part 1: neuralnet

A neural network consists of:

- An input layer: takes the explanatory variables from the existing data.
- Hidden layers: intermediate layers of neurons that transform the inputs. Their weights, together with those of the output layer, are optimized through backpropagation to improve the predictive power of the model.
- An output layer: produces the predictions from the values passed on by the hidden layers.

1.1. Data Normalization

The data is normalized and split into training and test data:

```r
# MAX-MIN NORMALIZATION: rescale every column to the [0, 1] range
normalize <- function(x) {
  return ((x - min(x)) / (max(x) - min(x)))
}
maxmindf <- as.data.frame(lapply(fullData, normalize))

# TRAINING AND TEST DATA: first 378 rows for training, remaining 94 rows for testing
trainset <- maxmindf[1:378, ]
testset <- maxmindf[379:472, ]
```

1.2. Neural Network Output

The neural network is then run and the parameters are generated:

```r
# NEURAL NETWORK: two hidden layers (5 and 2 neurons), linear output for regression
library(neuralnet)
nn <- neuralnet(kwh ~ eurgbp + rain + maxt + mint + wdsp + sun,
                data = trainset, hidden = c(5, 2),
                linear.output = TRUE, threshold = 0.01)
nn$result.matrix
```

```
                                      1
error                    2.168927756297
reached.threshold        0.008657878909
steps                  994.000000000000
Intercept.to.1layhid1   -0.943475389102
eurgbp.to.1layhid1       1.221792852624
rain.to.1layhid1         0.222508044224
maxt.to.1layhid1         1.356892947349
mint.to.1layhid1        -0.377284881968
wdsp.to.1layhid1         0.749993672528
sun.to.1layhid1         -0.250669884677
Intercept.to.1layhid2    3.424295572041
eurgbp.to.1layhid2      -4.921292790902
rain.to.1layhid2         3.380551856044
maxt.to.1layhid2        -2.353604121342
mint.to.1layhid2         0.877423599705
wdsp.to.1layhid2        -0.581900515451
sun.to.1layhid2         -7.083263552687
Intercept.to.1layhid3    0.352457802915
eurgbp.to.1layhid3       3.715376984054
rain.to.1layhid3        -1.030450129246
maxt.to.1layhid3        -0.672907974572
mint.to.1layhid3         0.898040603876
wdsp.to.1layhid3        -1.474470972212
sun.to.1layhid3         -1.793900522508
Intercept.to.1layhid4    0.819225033685
eurgbp.to.1layhid4     -16.770362105816
rain.to.1layhid4        -2.483557437596
maxt.to.1layhid4        -0.059472312293
mint.to.1layhid4         2.650852686615
wdsp.to.1layhid4         3.863732942893
sun.to.1layhid4          0.224801123127
Intercept.to.1layhid5  -13.987427433833
eurgbp.to.1layhid5      -1.661519269508
rain.to.1layhid5       -52.279711798215
maxt.to.1layhid5        22.717540151979
mint.to.1layhid5        11.670399514036
wdsp.to.1layhid5         9.713301368020
sun.to.1layhid5         10.804887927196
Intercept.to.2layhid1   -0.834412474581
1layhid.1.to.2layhid1    1.629948945316
1layhid.2.to.2layhid1   -3.064448233097
1layhid.3.to.2layhid1    0.197497636177
1layhid.4.to.2layhid1   -0.370098281335
1layhid.5.to.2layhid1   -0.402324278545
Intercept.to.2layhid2   -1.176093680811
1layhid.1.to.2layhid2    1.312897190062
1layhid.2.to.2layhid2    0.593640022150
1layhid.3.to.2layhid2    1.906008701982
1layhid.4.to.2layhid2    1.811035017074
1layhid.5.to.2layhid2   -0.725078284924
Intercept.to.kwh        -0.093973916107
2layhid.1.to.kwh         0.700847362516
2layhid.2.to.kwh         0.922218125575
```

The fitted network can also be displayed in visual format.

1.3. Model Validation

Then, we validate the model (i.e. test its accuracy) by comparing the consumption in kWh estimated by the neural network with the actual consumption reported in the test set:

```r
# Run the fitted network on the explanatory variables of the test set
temp_test <- subset(testset, select = c("eurgbp", "rain", "maxt", "mint", "wdsp", "sun"))
nn.results <- neuralnet::compute(nn, temp_test)

# Compare actual and predicted consumption (both still normalized)
results <- data.frame(actual = testset$kwh, prediction = nn.results$net.result)
results
```

```
          actual    prediction
379 0.8394856269 0.72836479401
380 0.7976933676 0.72836479401
381 0.8125463657 0.72836479401
382 0.8377382154 0.72836479401
383 0.8394856269 0.72836479401
384 0.8415242737 0.72836479401
...
467 0.7464359625 0.80778769677
468 0.7018769682 0.82063018370
469 0.7004207919 0.78094824279
470 0.6726078249 0.77185373598
471 0.7176036721 0.91671846789
472 0.7199335541 0.80974222504
```

1.4. Accuracy

In the code below, the data is converted back to its original scale and the accuracy is computed as one minus the absolute value of the mean percentage deviation between estimated and actual electricity consumption. This yields an accuracy of 98%, i.e. on average the predictions deviate from actual consumption by about 2%. Because the deviations are signed, positive and negative errors partly cancel in this mean, so it is not a mean absolute deviation.

Note that the data must be converted back into standard values, given that it was previously scaled using the max-min normalization technique:

```r
# Undo the max-min scaling using the range of the original consumption data
predicted <- results$prediction * abs(diff(range(fullData$kwh))) + min(fullData$kwh)
actual <- results$actual * abs(diff(range(fullData$kwh))) + min(fullData$kwh)

# Relative (signed) deviation of each prediction from the actual value
deviation <- (actual - predicted) / actual
comparison <- data.frame(predicted, actual, deviation)

# Accuracy: 1 minus the absolute value of the mean signed deviation
accuracy <- 1 - abs(mean(deviation))
accuracy
```

```
[1] 0.9828191884
```

A mean accuracy of 98% is obtained using a (5,2) hidden configuration.
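The back-conversion above relies on max-min scaling being exactly invertible: multiplying a scaled value by the range (max minus min, which is what `abs(diff(range(...)))` computes) and adding the minimum restores the original value. A minimal Python sketch of the round trip, with hypothetical function names and made-up consumption figures:

```python
import numpy as np

def min_max_normalize(x):
    """Rescale values to [0, 1]: (x - min) / (max - min), as in the R normalize()."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def min_max_denormalize(scaled, original):
    """Undo max-min scaling using the min and max of the original (unscaled) series."""
    original = np.asarray(original, dtype=float)
    scaled = np.asarray(scaled, dtype=float)
    return scaled * (original.max() - original.min()) + original.min()

kwh = np.array([4200.0, 4600.0, 5000.0])   # illustrative daily totals
scaled = min_max_normalize(kwh)             # [0.  0.5 1. ]
restored = min_max_denormalize(scaled, kwh)
print(np.allclose(restored, kwh))           # True
```

Because `normalize` was applied to `fullData` before the train/test split, the minimum and maximum used for scaling include the test period; computing them from the training rows only is a common alternative that keeps test information out of the preprocessing.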

However, since this is a mean accuracy, it does not imply that every prediction generated by the model is this accurate. Indeed, accuracy is lower in certain cases, as a histogram of the deviation shows: plotted with 100 breaks, it indicates that the majority of forecasts fall within 10% of actual consumption.
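To illustrate how a mean accuracy can mask larger individual errors, the sketch below computes, on made-up numbers, the accuracy exactly as the R code does (one minus the absolute value of the mean signed deviation), alongside an accuracy based on the mean absolute deviation and the share of forecasts within 10% of actual consumption. The helper name and the data are hypothetical, not taken from the study.

```python
import numpy as np

def deviation_summary(actual, predicted, tolerance=0.10):
    """Summarize relative errors, using deviation = (actual - predicted) / actual as in the R code."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    deviation = (actual - predicted) / actual
    return {
        # As in the R code: signed errors can cancel inside the mean.
        "accuracy_signed": 1 - abs(deviation.mean()),
        # Based on the mean absolute deviation: errors cannot cancel.
        "accuracy_absolute": 1 - np.abs(deviation).mean(),
        # Fraction of forecasts whose relative error is within the tolerance.
        "share_within_tolerance": np.mean(np.abs(deviation) <= tolerance),
    }

# Illustrative values: one forecast 20% too low, one 20% too high, two exact.
actual = [5000.0, 5000.0, 4500.0, 4500.0]
predicted = [4000.0, 6000.0, 4500.0, 4500.0]
print(deviation_summary(actual, predicted))
```

Here the signed measure reports an accuracy of 1.0, even though the mean absolute deviation is 10% (an accuracy of 0.9) and only half of the forecasts fall within 10% of the actual values.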

A plot of predicted against actual consumption shows that while the neural network's predictions follow the general range of the actual series (i.e. between 4,200 and 5,000 kWh), the model is not particularly adept at predicting the peaks and valleys in the series (periods of abnormally high or low usage).

Part 2: LSTM (Long Short-Term Memory Network)

A shortcoming of traditional feedforward neural network models is that they do not account for dependencies across time series data. When the neural network was generated using neuralnet, all observations were assumed to be independent of each other.

However, this is not necessarily the case.
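Whether consecutive observations are independent can be checked with the lag-1 autocorrelation, the correlation between each value and the one before it. The sketch below (a hypothetical helper applied to synthetic data with a fixed seed) contrasts independent noise with a first-order autoregressive series in which each value keeps 80% of the previous one; a daily consumption series with strong day-to-day persistence would resemble the second case.

```python
import numpy as np

def lag1_autocorrelation(series):
    """Pearson correlation between x[t-1] and x[t] over the overlapping pairs."""
    x = np.asarray(series, dtype=float)
    return np.corrcoef(x[:-1], x[1:])[0, 1]

rng = np.random.default_rng(7)
noise = rng.normal(size=500)          # independent observations

# AR(1) series: each value keeps 80% of the previous one plus fresh noise.
ar1 = np.zeros(500)
for t in range(1, 500):
    ar1[t] = 0.8 * ar1[t - 1] + noise[t]

print("independent:", round(lag1_autocorrelation(noise), 2))  # close to 0
print("AR(1):", round(lag1_autocorrelation(ar1), 2))          # close to 0.8
```

Correlating the overlapping pairs is a close variant of the textbook sample autocorrelation, which uses the overall mean and variance of the series; for long series the two are nearly identical.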

2.1. Issue of Stationarity

Line charts of KWH (consumption) and EUR/GBP show that the KWH time series follows a stationary pattern, stationary meaning that the mean, variance, and autocorrelation structure remain constant over time.

However, when the EUR/GBP exchange rate is plotted over the same period, the data is clearly non-stationary, i.e. the mean, variance, and autocorrelation change over time.

Given that non-stationarity was present in certain explanatory variables, the LSTM model will now be used to predict future values of KWH in the test set, independently of any other explanatory variables.

In other words, the LSTM uses only past values of KWH to predict future values of KWH.
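An informal way to see the difference between the two kinds of series is to split each into consecutive segments and compare their means: for a stationary series the segment means stay close together, whereas a drifting series, such as an exchange rate that trends over time, shows segment means that move. This is a heuristic, not a formal test; the augmented Dickey-Fuller test (`adfuller` in `statsmodels.tsa.stattools`) is the usual formal check. The helper name and both synthetic series below are hypothetical.

```python
import numpy as np

def segment_stats(series, n_segments=4):
    """Return (mean, standard deviation) for each of n_segments consecutive chunks."""
    chunks = np.array_split(np.asarray(series, dtype=float), n_segments)
    return [(chunk.mean(), chunk.std()) for chunk in chunks]

rng = np.random.default_rng(1)
# Synthetic consumption-like series: constant mean and variance.
stationary = 4600 + rng.normal(scale=200, size=400)
# Synthetic exchange-rate-like series: the level drifts upward over time.
trending = 0.80 + 0.0005 * np.arange(400) + rng.normal(scale=0.005, size=400)

for name, series in [("stationary", stationary), ("trending", trending)]:
    means = [round(float(m), 3) for m, _ in segment_stats(series)]
    print(name, "segment means:", means)
```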

The analysis is carried out using the Keras library in Python.

The following guide also provides a detailed overview of predictions with LSTM using a separate example.

2.2. Data Processing

First, the relevant libraries are imported and the data is processed:

```python
# Import libraries
import math
import os

import matplotlib.pyplot as plt
import numpy as np
from pandas import read_csv
from keras.models import Sequential
from keras.layers import Dense, LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error

path = "filepath"
os.chdir(path)
os.getcwd()

# Form dataset matrix: each input holds `previous` consecutive values,
# and the target is the value that immediately follows them
def create_dataset(dataset, previous=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - previous - 1):
        a = dataset[i:(i + previous), 0]
        dataX.append(a)
        dataY.append(dataset[i + previous, 0])
    return np.array(dataX), np.array(dataY)

# fix random seed for reproducibility
np.random.seed(7)

# load dataset (first column only: daily consumption)
dataframe = read_csv('data.csv', usecols=[0], engine='python', skipfooter=3)
dataset = dataframe.values
dataset = dataset.astype('float32')

# normalize dataset with MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)

# Training and test data partition (first 80% for training, in time order)
train_size = int(len(dataset) * 0.8)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size, :], dataset[train_size:len(dataset), :]

# reshape into X=t and Y=t+1
previous = 1
X_train, Y_train = create_dataset(train, previous)
X_test, Y_test = create_dataset(test, previous)

# reshape input to be [samples, time steps, features]
X_train = np.reshape(X_train, (X_train.shape[0], 1, X_train.shape[1]))
X_test = np.reshape(X_test, (X_test.shape[0], 1, X_test.shape[1]))
```

2.3. LSTM Generation and Predictions

Then, the LSTM model is generated and predictions are produced:

```python
# Generate LSTM network: 4 LSTM units followed by a single output neuron
model = Sequential()
model.add(LSTM(4, input_shape=(1, previous)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X_train, Y_train, epochs=100, batch_size=1, verbose=2)

# Generate predictions
trainpred = model.predict(X_train)
testpred = model.predict(X_test)

# Convert predictions and targets back to normal values
trainpred = scaler.inverse_transform(trainpred)
Y_train = scaler.inverse_transform([Y_train])
testpred = scaler.inverse_transform(testpred)
Y_test = scaler.inverse_transform([Y_test])

# calculate RMSE
trainScore = math.sqrt(mean_squared_error(Y_train[0], trainpred[:, 0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(Y_test[0], testpred[:, 0]))
print('Test Score: %.2f RMSE' % (testScore))

# Train predictions, shifted to line up with the original series
trainpredPlot = np.empty_like(dataset)
trainpredPlot[:, :] = np.nan
trainpredPlot[previous:len(trainpred) + previous, :] = trainpred

# Test predictions, placed after the training predictions
testpredPlot = np.empty_like(dataset)
testpredPlot[:, :] = np.nan
testpredPlot[len(trainpred) + (previous * 2) + 1:len(dataset) - 1, :] = testpred

# Plot all predictions (separate names for the line handles, so that
# trainpred and testpred still hold the predictions afterwards)
actual_line, = plt.plot(scaler.inverse_transform(dataset))
train_line, = plt.plot(trainpredPlot)
test_line, = plt.plot(testpredPlot)
plt.title("Predicted vs. Actual Consumption")
plt.show()
```

The model is trained over 100 epochs, and the predictions are generated.
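To make the windowing in `create_dataset` concrete, the sketch below reuses the function from 2.2 on a toy array of six values. With `previous=1`, each input is one day's (scaled) consumption and the target is the next day's. The `- 1` in `range(len(dataset) - previous - 1)` means the last available input/target pair is not used; this only reduces the sample count by one. The toy data is illustrative.

```python
import numpy as np

def create_dataset(dataset, previous=1):
    # Same logic as in 2.2: inputs are `previous` consecutive values,
    # the target is the value that immediately follows them.
    dataX, dataY = [], []
    for i in range(len(dataset) - previous - 1):
        dataX.append(dataset[i:(i + previous), 0])
        dataY.append(dataset[i + previous, 0])
    return np.array(dataX), np.array(dataY)

toy = np.arange(6, dtype="float32").reshape(-1, 1)   # column vector, like `dataset`
X, Y = create_dataset(toy, previous=1)
print(X.ravel())  # [0. 1. 2. 3.]
print(Y)          # [1. 2. 3. 4.]  (the pair 4 -> 5 is dropped by the "- 1")

# Keras LSTM layers expect [samples, time steps, features]; as in the
# article, the window is fed as 1 time step with `previous` features.
X3 = np.reshape(X, (X.shape[0], 1, X.shape[1]))
print(X3.shape)   # (4, 1, 1)
```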

2.4. Accuracy

When the actual consumption (blue line) is plotted together with the training and test predictions (orange and green lines), the series follow each other quite closely, with the exception of certain downward spikes (periods of abnormally low usage).

Here is the output for the final epochs of the 100-epoch training run, followed by the RMSE calculation:

```
Epoch 94/100 - 1s - loss: 0.0108
Epoch 95/100 - 1s - loss: 0.0108
Epoch 96/100 - 1s - loss: 0.0107
Epoch 97/100 - 1s - loss: 0.0108
Epoch 98/100 - 1s - loss: 0.0108
Epoch 99/100 - 1s - loss: 0.0108
Epoch 100/100 - 1s - loss: 0.0109
>>> # calculate RMSE
>>> trainScore = math.sqrt(mean_squared_error(Y_train[0], trainpred[:,0]))
>>> print('Train Score: %.2f RMSE' % (trainScore))
Train Score: 353.25 RMSE
>>> testScore = math.sqrt(mean_squared_error(Y_test[0], testpred[:,0]))
>>> print('Test Score: %.2f RMSE' % (testScore))
Test Score: 255.13 RMSE
```

The model has an RMSE of 353.25 on the training dataset and 255.13 on the test dataset, measured in kWh against daily consumption of several thousand kWh.
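Because the predictions and targets pass through `scaler.inverse_transform` before the RMSE is computed, the scores are in the original units of daily consumption. For min-max scaling this conversion is linear, so an RMSE computed on the scaled [0, 1] values would simply be the original-unit RMSE divided by (max - min). The sketch below checks this on made-up numbers; `rmse` is a hypothetical helper equivalent to `math.sqrt(mean_squared_error(...))`, and the min and max are assumed values.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error: sqrt(mean((actual - predicted) ** 2))."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

actual = np.array([4400.0, 4700.0, 4900.0, 4300.0])
predicted = np.array([4600.0, 4500.0, 4800.0, 4600.0])
lo, hi = 4000.0, 5200.0   # hypothetical min and max of the full series

scaled_actual = (actual - lo) / (hi - lo)
scaled_predicted = (predicted - lo) / (hi - lo)

print(rmse(actual, predicted))                            # about 212.13 (original units)
print(rmse(scaled_actual, scaled_predicted) * (hi - lo))  # the same value, up to rounding
```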

However, this model made its predictions one day ahead, i.e. for period t+1.

How would the model perform over longer time periods, e.g. 10 days or 50 days? Let's find out.

10 days
Training error: 345.31 RMSE
Test error: 283.77 RMSE

50 days
Training error: 288.94 RMSE
Test error: 396.36 RMSE

While the test error was higher across the 10- and 50-day periods (283.77 and 396.36 RMSE, against 255.13 for the one-day period), the increase was not dramatic.

Moreover, the overall errors remain low relative to the series average of 4609 kWh per day.
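The comparison with the average of 4609 kWh per day can be made explicit by expressing each reported RMSE as a share of that average. This is simple arithmetic on the figures quoted above; `relative_errors` is a hypothetical helper.

```python
MEAN_DAILY_KWH = 4609.0  # average daily consumption quoted in the text

# (training RMSE, test RMSE) as reported above for each period.
reported_rmse = {
    "1 day": (353.25, 255.13),
    "10 days": (345.31, 283.77),
    "50 days": (288.94, 396.36),
}

def relative_errors(rmse_by_period, mean_value):
    """Express each (train, test) RMSE pair as a fraction of mean_value."""
    return {p: (tr / mean_value, te / mean_value) for p, (tr, te) in rmse_by_period.items()}

for period, (train, test) in relative_errors(reported_rmse, MEAN_DAILY_KWH).items():
    print(f"{period}: train {train:.1%}, test {test:.1%}")
# 1 day: train 7.7%, test 5.5%
# 10 days: train 7.5%, test 6.2%
# 50 days: train 6.3%, test 8.6%
```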

Conclusion

Of the two neural networks, the LSTM proved more accurate at predicting fluctuations in electricity consumption.

In the case of neuralnet, the model did not fully cope with the non-stationarity present in several of the explanatory variables.

Moreover, factors such as temperature generally follow established historical patterns (abnormal weather, which might affect consumption, being the exception), so much of their influence may already be reflected in the consumption series itself.

In this regard, a traditional neural network with explanatory variables proved less effective in this instance than LSTM, which was able to model fluctuations in consumption without the need for explanatory data.
