Extreme Rare Event Classification using Autoencoders in Keras

In this post, we will learn how to implement an autoencoder for building a rare-event classifier.

We will use the real-world rare-event dataset described in [1].

Chitta Ranjan, May 3

Background

What is an extreme rare event?

In a rare-event problem, we have an unbalanced dataset.

That is, we have far fewer positively labeled samples than negatively labeled ones.

In a typical rare-event problem, the positively labeled data are around 5–10% of the total.

In an extreme rare event problem, we have less than 1% positively labeled data.

For example, in the dataset used here it is around 0.6%.


Such extreme rare event problems are quite common in the real world, for example, sheet breaks and machine failures in manufacturing, or clicks and purchases in online industries.

Classifying these rare events is quite challenging.

Recently, Deep Learning has been quite extensively used for classification.

However, the small number of positively labeled samples limits the application of Deep Learning.

No matter how large the data, the use of Deep Learning gets limited by the amount of positively labeled samples.

Why should we still bother to use Deep Learning?

This is a legitimate question.

Why should we not think of using another Machine Learning approach?

The answer is subjective.

We can always go with a Machine Learning approach.

To make it work, we can undersample the negatively labeled data to get a close-to-balanced dataset.

Since we have about 0.6% positively labeled data, the undersampling will result in a dataset that is roughly 1% of the size of the original data.

A Machine Learning approach, e.g., SVM or Random Forest, will still work on a dataset of this size.

However, it will have limitations in its accuracy.

And we will not utilize the information present in the remaining ~99% of the data.
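The undersampling described above can be sketched in a few lines. This is a minimal illustration on toy labels, assuming a 50/50 balance is the target; the function name `undersample_negatives` and the toy data are hypothetical and not part of the original code.

```python
import random

def undersample_negatives(labels, seed=0):
    """Return sorted row indices: every positive plus an equal-sized random sample of negatives."""
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    # Keep as many negatives as there are positives (a 50/50 balanced set).
    kept_neg = rng.sample(neg_idx, k=min(len(pos_idx), len(neg_idx)))
    return sorted(pos_idx + kept_neg)

# Toy labels: 6 positives in 1,000 rows, i.e. a 0.6% positive rate.
labels = [1] * 6 + [0] * 994
random.Random(1).shuffle(labels)

balanced_idx = undersample_negatives(labels)
fraction_kept = len(balanced_idx) / len(labels)
print(len(balanced_idx), fraction_kept)
```

With a 0.6% positive rate, the balanced subset keeps only about 1.2% of the rows, which is the information loss the text refers to.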

If the data is sufficient, Deep Learning methods are potentially more capable.

They also allow flexibility for model improvement through different architectures.

We will, therefore, attempt to use Deep Learning methods.

In this post, we will learn how we can use a simple dense-layer autoencoder to build a rare event classifier.

The purpose of this post is to demonstrate the implementation of an Autoencoder for extreme rare-event classification.

We will leave the exploration of different architectures and configurations of the Autoencoder to the reader.

Please share in the comments if you find anything interesting.

Autoencoder for Classification

The autoencoder approach for classification is similar to anomaly detection.

In anomaly detection, we learn the pattern of a normal process.

Anything that does not follow this pattern is classified as an anomaly.

For binary classification of rare events, we can use a similar approach using autoencoders (adapted from [2]).

Quick revision: What is an autoencoder?

An autoencoder is made of two modules: encoder and decoder.

The encoder learns the underlying features of a process.

These features are typically in a reduced dimension.

The decoder can recreate the original data from these underlying features.
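To make the encoder/decoder roles concrete, here is a small sketch that uses a linear stand-in for an autoencoder, computed with an SVD rather than trained as a neural network. It is not the Keras model built later in this post; the toy data and the names `encode` and `decode` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples with 5 observed features driven by 2 underlying features.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
x = latent @ mixing

# Linear "autoencoder" from the SVD: the top-2 right singular vectors
# serve as encoder weights, and their transpose as decoder weights.
x_mean = x.mean(axis=0)
_, _, vt = np.linalg.svd(x - x_mean, full_matrices=False)
weights = vt[:2].T                      # shape (5, 2)

def encode(data):
    return (data - x_mean) @ weights    # 5 features -> 2 features

def decode(code):
    return code @ weights.T + x_mean    # 2 features -> 5 features

code = encode(x)
x_hat = decode(code)
print(code.shape, x_hat.shape)
print(np.allclose(x, x_hat))
```

Because the toy data really has only two underlying features, the two-dimensional code is enough to recreate all five observed features.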

Figure 1. Illustration of an autoencoder. [Source: Autoencoder by Prof. Seungchul Lee, iSystems Design Lab]

How to use an Autoencoder for rare-event classification?

We will divide the data into two parts: positively labeled and negatively labeled.

The negatively labeled data is treated as the normal state of the process.

A normal state is when the process is eventless.

We will ignore the positively labeled data, and train an Autoencoder on only negatively labeled data.

This Autoencoder has now learned the features of the normal process.

A well-trained Autoencoder will accurately reconstruct any new data coming from the normal state of the process (as it will have the same pattern or distribution).

Therefore, the reconstruction error will be small.

However, if we try to reconstruct data from a rare event, the Autoencoder will struggle.

This will make the reconstruction error high during the rare event.

We can catch such high reconstruction errors and label them as a rare-event prediction.

This procedure is similar to anomaly detection methods.
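The procedure can be sketched end to end on synthetic data. The example below is only an illustration under stated assumptions: a linear model fitted by SVD stands in for the trained Autoencoder, the data is randomly generated, and the 99th-percentile threshold is an arbitrary choice, not the one used later in this post.

```python
import numpy as np

rng = np.random.default_rng(42)
mixing = rng.normal(size=(2, 6))

def make_normal(n):
    """Normal-state rows: 6 features lying near a 2-D subspace."""
    return rng.normal(size=(n, 2)) @ mixing + 0.05 * rng.normal(size=(n, 6))

train_normal = make_normal(500)                     # negatively labeled data only
new_normal = make_normal(100)                       # unseen eventless data
rare_events = rng.normal(scale=2.0, size=(10, 6))   # rows that break the pattern

# "Train" on normal data only: keep the top-2 principal directions.
mean = train_normal.mean(axis=0)
_, _, vt = np.linalg.svd(train_normal - mean, full_matrices=False)
components = vt[:2]

def reconstruction_error(rows):
    """Per-row mean squared error between rows and their reconstruction."""
    centered = rows - mean
    reconstructed = centered @ components.T @ components
    return np.mean((centered - reconstructed) ** 2, axis=1)

err_normal = reconstruction_error(new_normal)
err_rare = reconstruction_error(rare_events)

# Flag anything whose error exceeds what normal data typically produces.
threshold = np.percentile(reconstruction_error(train_normal), 99)
print(err_normal.mean(), err_rare.mean())
print((err_rare > threshold).mean())
```

Rows that follow the learned pattern reconstruct with a small error, while rows that do not follow it land far above the threshold.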

Implementation

Data and problem

This is binary labeled data from a pulp-and-paper mill for sheet breaks.

Sheet breaks are a severe problem in paper manufacturing.

A single sheet break causes a loss of several thousand dollars, and the mills see at least one break every day.

This causes millions of dollars of yearly losses and work hazards.

Detecting a break event is challenging due to the nature of the process.

As mentioned in [1], even a 5% reduction in the breaks will bring significant benefit to the mills.

The data we have contains about 18k rows collected over 15 days.

The column y contains the binary labels, with 1 denoting a sheet break.

The remaining columns are predictors.

There are about 124 positively labeled samples (~0.6%).


Code

Import the desired libraries.

    %matplotlib inline
    import matplotlib.pyplot as plt
    import seaborn as sns
    import pandas as pd
    import numpy as np
    from pylab import rcParams

    import tensorflow as tf
    from keras.models import Model, load_model
    from keras.layers import Input, Dense
    from keras.callbacks import ModelCheckpoint, TensorBoard
    from keras import regularizers

    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import confusion_matrix, precision_recall_curve
    from sklearn.metrics import recall_score, classification_report, auc, roc_curve
    from sklearn.metrics import precision_recall_fscore_support, f1_score

    from numpy.random import seed
    seed(1)
    from tensorflow import set_random_seed  # TensorFlow 1.x API; TensorFlow 2.x uses tf.random.set_seed
    set_random_seed(2)

    SEED = 123  # used to help randomly select the data points
    DATA_SPLIT_PCT = 0.  # fraction held out at each split; the digits after the decimal point are missing from this copy

    rcParams['figure.figsize'] = 8, 6
    LABELS = ["Normal", "Break"]

Note that we are setting the random seeds for reproducibility of the results.

Data preprocessing

Now, we read and prepare the data.

    df = pd.read_csv("data/processminer-rare-event-mts - data.csv")

The objective of this rare-event problem is to predict a sheet break before it occurs.

We will try to predict the break 4 minutes in advance.

To build this model, we will shift the labels 2 rows up (which corresponds to 4 minutes).
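The effect of such a shift can be seen on a toy label series. This is a small sketch with a hypothetical helper, `shift_labels_up`, assuming one row every 2 minutes; it mirrors what the pandas `shift(-2)` call does to a label column, except that pandas fills the vacated tail with NaN instead of None.

```python
def shift_labels_up(labels, rows):
    """Move every label `rows` positions earlier; the tail has no label left."""
    return labels[rows:] + [None] * rows

# One row per 2 minutes; a break is recorded at row 5.
y = [0, 0, 0, 0, 0, 1, 0, 0]
y_shifted = shift_labels_up(y, rows=2)
print(y_shifted)   # [0, 0, 0, 1, 0, 0, None, None]
```

After the shift, the row that sits 4 minutes before the break carries the positive label, so a model trained on it learns to flag the break in advance.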


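    df.y = df.y.shift(-2)
    # The last two rows have no label after the shift (NaN), so drop them.
    df = df.dropna(subset=['y'])

Now, we divide the data into train, valid, and test sets.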

Then we will take the subset of data with only 0s to train the autoencoder.

    df_train, df_test = train_test_split(df, test_size=DATA_SPLIT_PCT, random_state=SEED)
    df_train, df_valid = train_test_split(df_train, test_size=DATA_SPLIT_PCT, random_state=SEED)

    # Split each set by class, then drop the label column to get the predictors.
    df_train_0 = df_train.loc[df_train['y'] == 0]
    df_train_1 = df_train.loc[df_train['y'] == 1]
    df_train_0_x = df_train_0.drop(['y'], axis=1)
    df_train_1_x = df_train_1.drop(['y'], axis=1)

    df_valid_0 = df_valid.loc[df_valid['y'] == 0]
    df_valid_1 = df_valid.loc[df_valid['y'] == 1]
    df_valid_0_x = df_valid_0.drop(['y'], axis=1)
    df_valid_1_x = df_valid_1.drop(['y'], axis=1)

    df_test_0 = df_test.loc[df_test['y'] == 0]
    df_test_1 = df_test.loc[df_test['y'] == 1]
    df_test_0_x = df_test_0.drop(['y'], axis=1)
    df_test_1_x = df_test_1.drop(['y'], axis=1)

Standardization

It is usually better to use standardized data (rescaled to mean 0 and variance 1) for autoencoders.
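As a rough illustration of what standardization does, the sketch below rescales a single feature column by hand. The helper names and the toy values are hypothetical; like scikit-learn's StandardScaler, it uses the population standard deviation, and the statistics are learned from the training values only.

```python
from statistics import mean, pstdev

def fit_scaler(column):
    """Learn the mean and standard deviation of one feature column."""
    return mean(column), pstdev(column)

def transform(column, mu, sigma):
    """Rescale values with previously learned statistics."""
    return [(v - mu) / sigma for v in column]

train = [10.0, 12.0, 14.0, 16.0, 18.0]   # hypothetical training values
valid = [11.0, 20.0]                     # hypothetical validation values

mu, sigma = fit_scaler(train)                # fitted on training data only
train_scaled = transform(train, mu, sigma)
valid_scaled = transform(valid, mu, sigma)   # reuses the training statistics
print(mean(train_scaled), pstdev(train_scaled))
```

Reusing the training statistics for the validation and test sets keeps all three on the same scale and avoids leaking information from held-out data.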

    # Fit the scaler on the negatively labeled training data only.
    scaler = StandardScaler().fit(df_train_0_x)
    df_train_0_x_rescaled = scaler.transform(df_train_0_x)
    df_valid_0_x_rescaled = scaler.transform(df_valid_0_x)
    df_valid_x_rescaled = scaler.transform(df_valid.drop(['y'], axis=1))
    df_test_0_x_rescaled = scaler.transform(df_test_0_x)
    df_test_x_rescaled = scaler.transform(df_test.drop(['y'], axis=1))

Autoencoder Classifier

Initialization

First, we will initialize the Autoencoder architecture.

We are building a simple autoencoder.

More complex architectures and other configurations should be explored.

    nb_epoch = 100
    batch_size = 128
    input_dim = df_train_0_x_rescaled.shape[1]  # number of predictor variables
    encoding_dim = 32
    hidden_dim = int(encoding_dim / 2)
    learning_rate = 1e-3

    input_layer = Input(shape=(input_dim, ))
    encoder = Dense(encoding_dim, activation="tanh", activity_regularizer=regularizers.l1(learning_rate))(input_layer)
    encoder = Dense(hidden_dim, activation="relu")(encoder)
    decoder = Dense(hidden_dim, activation='tanh')(encoder)
    decoder = Dense(input_dim, activation='relu')(decoder)
    autoencoder = Model(inputs=input_layer, outputs=decoder)

Training

We will train the model and save it in a file.

Saving a trained model is good practice, as it saves time in future analysis.

    autoencoder.compile(metrics=['accuracy'], loss='mean_squared_error', optimizer='adam')

    # Keep the best model on disk and log the run for TensorBoard.
    cp = ModelCheckpoint(filepath="autoencoder_classifier.h5", save_best_only=True, verbose=0)
    tb = TensorBoard(log_dir='./logs', histogram_freq=0, write_graph=True, write_images=True)

    # The input is also the target: the model learns to reconstruct normal data.
    history = autoencoder.fit(df_train_0_x_rescaled, df_train_0_x_rescaled,
                              epochs=nb_epoch,
                              batch_size=batch_size,
                              shuffle=True,
                              validation_data=(df_valid_0_x_rescaled, df_valid_0_x_rescaled),
                              verbose=1,
                              callbacks=[cp, tb]).history

Figure 2. Loss for Autoencoder Training.

Classification

In the following, we show how we can use an Autoencoder reconstruction error for the rare-event classification.

As mentioned before, if the reconstruction error is high, we will classify it as a sheet-break.

We will need to determine the threshold for this.

We will use the validation set to identify the threshold.
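The idea behind choosing a threshold can be sketched without any library: for each candidate threshold, predict a break whenever the reconstruction error exceeds it, and compare precision and recall. The errors, labels, and candidate thresholds below are hypothetical values, and `precision_recall_at` is an illustrative helper rather than part of the original code.

```python
def precision_recall_at(errors, labels, threshold):
    """Precision and recall when rows with error > threshold are predicted positive."""
    predicted = [e > threshold for e in errors]
    tp = sum(p and y == 1 for p, y in zip(predicted, labels))
    fp = sum(p and y == 0 for p, y in zip(predicted, labels))
    fn = sum((not p) and y == 1 for p, y in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical validation-set reconstruction errors and true labels.
errors = [0.1, 0.2, 0.3, 0.4, 0.6, 0.9, 1.2, 1.5]
labels = [0, 0, 0, 0, 1, 0, 1, 1]

for threshold in (0.25, 0.5, 0.8, 1.0):
    precision, recall = precision_recall_at(errors, labels, threshold)
    print(threshold, round(precision, 2), round(recall, 2))
```

Raising the threshold generally trades recall for precision, which is why the choice is made by inspecting both curves together.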

    valid_x_predictions = autoencoder.predict(df_valid_x_rescaled)
    # Per-row mean squared reconstruction error.
    mse = np.mean(np.power(df_valid_x_rescaled - valid_x_predictions, 2), axis=1)
    error_df = pd.DataFrame({'Reconstruction_error': mse, 'True_class': df_valid['y']})

    precision_rt, recall_rt, threshold_rt = precision_recall_curve(error_df.True_class, error_df.Reconstruction_error)
    # precision_rt and recall_rt have one more element than threshold_rt;
    # their first len(threshold_rt) values are the ones aligned with the thresholds.
    plt.plot(threshold_rt, precision_rt[:-1], label="Precision", linewidth=5)
    plt.plot(threshold_rt, recall_rt[:-1], label="Recall", linewidth=5)
    plt.title('Precision and recall for different threshold values')
    plt.xlabel('Threshold')
    plt.ylabel('Precision/Recall')
    plt.legend()
    plt.show()

Figure 3. A threshold of 0.8 should provide a reasonable trade-off between precision and recall.

Now, we will perform classification on the test data.

We should not estimate the classification threshold from the test data.

Doing so would overfit the threshold to the test set and give an overly optimistic estimate of performance.

    test_x_predictions = autoencoder.predict(df_test_x_rescaled)
    mse = np.mean(np.power(df_test_x_rescaled - test_x_predictions, 2), axis=1)
    error_df_test = pd.DataFrame({'Reconstruction_error': mse, 'True_class': df_test['y']})
    error_df_test = error_df_test.reset_index()

    threshold_fixed = 0.8
    groups = error_df_test.groupby('True_class')

    # One scatter series per class, with the threshold drawn as a horizontal line.
    fig, ax = plt.subplots()
    for name, group in groups:
        ax.plot(group.index, group.Reconstruction_error, marker='o', ms=3.5, linestyle='',
                label="Break" if name == 1 else "Normal")
    ax.hlines(threshold_fixed, ax.get_xlim()[0], ax.get_xlim()[1], colors="r", zorder=100, label='Threshold')
    ax.legend()
    plt.title("Reconstruction error for different classes")
    plt.ylabel("Reconstruction error")
    plt.xlabel("Data point index")
    plt.show()

Figure 4. Using threshold = 0.8 for classification. The orange and blue dots above the threshold line represent the True Positives and False Positives, respectively.

As we can see, we have a good number of false positives.

To have a better look, we can see a confusion matrix.

    # Use the test-set errors (error_df_test), not the validation-set errors.
    pred_y = [1 if e > threshold_fixed else 0 for e in error_df_test.Reconstruction_error.values]
    conf_matrix = confusion_matrix(error_df_test.True_class, pred_y)

    plt.figure(figsize=(12, 12))
    sns.heatmap(conf_matrix, xticklabels=LABELS, yticklabels=LABELS, annot=True, fmt="d")
    plt.title("Confusion matrix")
    plt.ylabel('True class')
    plt.xlabel('Predicted class')
    plt.show()

Figure 5. Confusion Matrix on the test predictions.

We could predict 4 out of 21 breaks.

That is around 19%, which is a good recall rate for the paper industry.

The False Positive Rate is around 7%.

This is not ideal but not terrible for a mill.

Still, this model can be further improved to increase the recall rate with a smaller False Positive Rate.
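For reference, both rates can be read directly off a 2x2 confusion matrix. In the sketch below, the 4-of-21 break counts come from the text above, while the normal-class counts (3,000 rows with 210 false alarms) are hypothetical numbers chosen only to illustrate a 7% False Positive Rate; `recall_and_fpr` is an illustrative helper.

```python
def recall_and_fpr(conf_matrix):
    """conf_matrix is [[TN, FP], [FN, TP]], the layout sklearn's
    confusion_matrix produces for labels 0 and 1."""
    (tn, fp), (fn, tp) = conf_matrix
    recall = tp / (tp + fn)                 # share of true breaks that were caught
    false_positive_rate = fp / (fp + tn)    # share of normal rows flagged as breaks
    return recall, false_positive_rate

# 4 of 21 breaks caught; the normal-class counts are hypothetical.
conf_matrix = [[2790, 210], [17, 4]]
recall, fpr = recall_and_fpr(conf_matrix)
print(round(recall, 3), round(fpr, 3))
```

Because normal rows vastly outnumber breaks, even a modest False Positive Rate produces many more false alarms than true detections.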

GitHub repository

The entire code with comments is available here: https://github.com/cran2367/autoencoder_classifier

What can be done better here?

This is multivariate time series data.

We are not taking into account the temporal information/patterns in the data.

In the next post, we will explore if it is possible with an RNN.

We will try an LSTM autoencoder.

Conclusion

We worked on extreme rare event binary labeled data from a paper mill to build an Autoencoder Classifier.

We achieved reasonable accuracy.

The purpose here was to demonstrate the use of a basic Autoencoder for rare event classification.

We will further work on developing other methods, including an LSTM Autoencoder that can extract the temporal features for better accuracy.

References

[1] Ranjan, C., Mustonen, M., Paynabar, K., & Pourak, K. Dataset: Rare Event Classification in Multivariate Time Series. arXiv preprint arXiv:1809.

[2] …com/blog/fraud-detection-with-tensorflow

GitHub repo: https://github.com/cran2367/autoencoder_classifier