Application of Gradient Boosting in Order Book Modeling

Sergey Malchevskiy, Jun 18

Today we are going to create an ML model that forecasts the price movement in the order book.

This article covers a full research cycle: getting data, visualization, feature engineering, modeling, fine-tuning of the algorithm, quality estimation, and so on.

What is an Order Book?

An order book is an electronic list of buy and sell orders for a specific security or financial instrument organized by price level.

An order book lists the number of shares being bid or offered at each price point, or market depth.

Market depth data helps traders determine where the price of a particular security could be heading.

For example, a trader may use market depth data to understand the bid-ask spread for a security, along with the volume accumulating above both figures.

Securities with strong market depth will usually have strong volume and be quite liquid, allowing traders to place large orders without significantly affecting market price.

More information is here.

Pricing scheme

Market depth looks like this; the visualization varies depending on the software.

[Figure: BTC market depth on GDAX]

Another way to visualize an order book is as a list of bids and offers.

[Figure: Order book list]

The mid-price is the price midway between the best ask (the best price at which sellers offer the stock or commodity) and the best bid (the best price offered by buyers of the stock or commodity).

It can simply be defined as the average of the current bid and ask prices being quoted.
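As a minimal sketch of the definition above (the function name and the quotes are hypothetical, chosen only for illustration):

```python
def mid_price(best_ask: float, best_bid: float) -> float:
    """Return the average of the best ask and the best bid quotes."""
    return (best_ask + best_bid) / 2.0


# Hypothetical quotes, for illustration only.
print(mid_price(best_ask=101.0, best_bid=99.0))
```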

Our goal is to create a model that forecasts the mid-price.

Getting the Data

Let’s download the data samples from LOBSTER.

This service provides sample data for Google, Apple, Amazon, Intel, and Microsoft at three market depths (1, 5, and 10 levels).

First of all, I suggest visualizing the mid-price and the difference between ask and bid volumes for all available assets.

We need to import the necessary libraries.

The next code loads the data of a given asset and level from a file.

After that, we can visualize each asset.

MSFT and INTC have slightly strange distributions that differ from the others.

The mid-price graph doesn’t have a single bell curve; it looks like a mixture of two distributions.

Also, the volume difference is too symmetric and differs from other assets.
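The loading step described in this section could be sketched roughly as follows. This is not the original code of the article. It assumes the LOBSTER order book layout (no header row, four columns per level in the order ask price, ask size, bid price, bid size, with prices stored as dollars multiplied by 10000). The function name, the column names, and the choice to sum volumes over all loaded levels are assumptions, and the two sample rows are made up.

```python
import io

import pandas as pd


def load_orderbook(source, level: int) -> pd.DataFrame:
    """Load a LOBSTER-style order book file with `level` price levels."""
    columns = []
    for i in range(1, level + 1):
        columns += [f"ask_price_{i}", f"ask_size_{i}", f"bid_price_{i}", f"bid_size_{i}"]
    book = pd.read_csv(source, header=None, names=columns)

    # Assumption: prices are stored as dollars * 10000.
    price_cols = [c for c in columns if "price" in c]
    book[price_cols] = book[price_cols] / 10000.0

    ask_sizes = [f"ask_size_{i}" for i in range(1, level + 1)]
    bid_sizes = [f"bid_size_{i}" for i in range(1, level + 1)]
    book["mid_price"] = (book["ask_price_1"] + book["bid_price_1"]) / 2.0
    # Assumption: the volume difference is total ask size minus total bid size.
    book["volume_diff"] = book[ask_sizes].sum(axis=1) - book[bid_sizes].sum(axis=1)
    return book


# Two made-up rows standing in for a real one-level file.
sample = io.StringIO("1000500,100,1000000,300\n1000700,200,1000100,150\n")
book = load_orderbook(sample, level=1)
print(book[["mid_price", "volume_diff"]])
```

With a real file, `source` would be the path to the downloaded order book CSV; histograms of `mid_price` and `volume_diff` then give the kind of per-asset view discussed above.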

Feature Engineering

This part is very important because the quality of the model depends directly on it.

We should reflect a wide range of relationships between bids, asks, volumes, and also between different depths of data in these new features.

The next formulas allow us to create these features.

These features are the first part of feature engineering.
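The original formulas are not reproduced here, so the following is only a guess at what a few such features might look like. The names log_return_ask, log_return_bid, and log_ask_div_bid appear later in the article; their definitions below (log return of the best ask, log return of the best bid, log of the ask-to-bid ratio) and the input column names are assumptions.

```python
import numpy as np
import pandas as pd


def add_basic_features(book: pd.DataFrame) -> pd.DataFrame:
    """Add a few price-relationship features (assumed definitions)."""
    out = book.copy()
    # Log return of the best ask and best bid between consecutive rows.
    out["log_return_ask"] = np.log(out["ask_price_1"]).diff()
    out["log_return_bid"] = np.log(out["bid_price_1"]).diff()
    # Log of the ask/bid ratio, a scale-free measure of the spread.
    out["log_ask_div_bid"] = np.log(out["ask_price_1"] / out["bid_price_1"])
    return out


# Made-up quotes, for illustration only.
book = pd.DataFrame(
    {"ask_price_1": [100.0, 101.0, 100.5], "bid_price_1": [99.0, 100.0, 100.0]}
)
features = add_basic_features(book)
print(features)
```

The first row of each return column is NaN because there is no previous row to compare with.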

The second part is adding the lag components.

It means that we shift the given features by a number of time lags and add the shifted copies as columns.

This example shows how it works on the raw dataset (not new features).
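A minimal sketch of the lagging step, assuming pandas and a hypothetical helper named add_lags (the column name and values are made up):

```python
import pandas as pd


def add_lags(frame: pd.DataFrame, columns, n_lags: int) -> pd.DataFrame:
    """Append shifted copies of `columns` for lags 1..n_lags."""
    lagged = {
        f"{col}_lag_{lag}": frame[col].shift(lag)
        for col in columns
        for lag in range(1, n_lags + 1)
    }
    return pd.concat([frame, pd.DataFrame(lagged, index=frame.index)], axis=1)


raw = pd.DataFrame({"ask_price_1": [10.0, 11.0, 12.0, 13.0]})
with_lags = add_lags(raw, ["ask_price_1"], n_lags=2)
print(with_lags)
```

The first n_lags rows contain NaN values in the lagged columns and would typically be dropped before training.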

The next code provides these two parts of feature engineering and adds the target column log_return_mid_price.

Usually, the features look like this:

[Figure: sample of the resulting features]

Modeling via Gradient Boosting and Fine-Tuning

Our goal is to show that training a GBM is performing gradient-descent minimization on some loss function between our true target, $y$, and our approximation, $\hat{y} = F_M(X)$.

That means showing that adding weak models, $\Delta_m(X)$, to our GBM additive model, $F_m(X) = F_{m-1}(X) + \eta \Delta_m(X)$, is performing gradient descent in some way.

It makes sense that nudging our approximation, $\hat{y}$, closer and closer to the true target $y$ would be performing gradient descent.

For example, at each step, the residual gets smaller.

We must be minimizing some function related to the distance between the true target and our approximation.

Let’s revisit our golfer analogy and visualize the squared error between the approximation and the true value.

You can find more information here.
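The idea can be illustrated with a toy example. The sketch below is not CatBoost and not the code of this article: it is a hand-written boosting loop over one-split regression stumps on synthetic data, assuming a squared-error loss. For that loss, the residual y - F is the negative gradient of 0.5 * (y - F)^2 with respect to F, so fitting each weak model to the residual and adding a fraction of it is a gradient-descent step.

```python
import numpy as np


def fit_stump(x, residual):
    """Fit a one-split regression stump; return (threshold, left_value, right_value)."""
    best = None
    for threshold in np.unique(x)[:-1]:  # skip the max so both sides are non-empty
        left = x <= threshold
        left_value, right_value = residual[left].mean(), residual[~left].mean()
        sse = ((residual[left] - left_value) ** 2).sum()
        sse += ((residual[~left] - right_value) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 6.0, 200)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)

learning_rate = 0.3
approximation = np.full_like(y, y.mean())
mse_history = [np.mean((y - approximation) ** 2)]
for _ in range(50):
    residual = y - approximation  # negative gradient of 0.5 * (y - F)^2
    stump = fit_stump(x, residual)
    approximation = approximation + learning_rate * predict_stump(stump, x)
    mse_history.append(np.mean((y - approximation) ** 2))

print(f"MSE before boosting: {mse_history[0]:.4f}, after: {mse_history[-1]:.4f}")
```

Each step can only keep or reduce the training error, because the stump's leaf values are the means of the residuals on each side of the split.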

We will use Yandex’s implementation of gradient boosting, which is called CatBoost.

In most cases, this library is better than the others in speed and quality.

[Figure: Libraries performance]

This algorithm has a few parameters that have a huge impact on the quality:

n_estimators: the maximum number of trees that can be built when solving machine learning problems;
depth: the maximum depth of the trees;
learning_rate: this setting is used for reducing the gradient step. It affects the overall training time: the smaller the value, the more iterations are required for training;
l2_leaf_reg: the coefficient of the L2 regularization term of the cost function. Any positive value is allowed.

Also, we have parameters of the features:

level: market depth;
number of time-steps: how many lags to build.
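For orientation, the parameters listed above could be collected like this. The concrete values are placeholders rather than the tuned values from this study, and the names feature_params, n_lags, and make_model are hypothetical. The sketch assumes the catboost package is installed when make_model is called.

```python
# Placeholder values, not the tuned ones from this article.
model_params = {
    "n_estimators": 500,    # maximum number of trees
    "depth": 6,             # maximum depth of each tree
    "learning_rate": 0.05,  # shrinks each gradient step
    "l2_leaf_reg": 3.0,     # L2 regularization coefficient
    "loss_function": "RMSE",
}

# Feature-side parameters; the key names are assumptions.
feature_params = {"level": 5, "n_lags": 10}


def make_model(params):
    """Build a CatBoost regressor from a parameter dictionary."""
    # Imported lazily so the dictionaries above can be used without catboost.
    from catboost import CatBoostRegressor

    return CatBoostRegressor(**params, verbose=False)
```

A model would then be created with make_model(model_params) and trained with its fit method on the engineered features.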

In theory, each asset could have its own unique set of parameters.

For this task, we should define an objective function that estimates the quality of the model.

One of the best ways to find the optimal parameters is Bayesian optimization.

I described this approach in the previous article.

The loss function is RMSE:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$

The train set consists of the first 50% of the data.

The validation data (the next 25%) is used for fine-tuning the model.

The last 25% of the data is hold-out data, used to test the final result.

After the fine-tuning step, we train the final model on both parts (train and validation sets) and test it on the last part.
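The split and the loss described above can be sketched as follows. The helper names are hypothetical, and the 50/25/25 proportions follow the description, with the middle 25% taken as the validation part.

```python
import numpy as np


def chronological_split(n_rows, train_frac=0.5, valid_frac=0.25):
    """Return slices for train, validation, and hold-out parts, in time order."""
    train_end = int(n_rows * train_frac)
    valid_end = int(n_rows * (train_frac + valid_frac))
    return slice(0, train_end), slice(train_end, valid_end), slice(valid_end, n_rows)


def rmse(y_true, y_pred):
    """Root mean squared error between two equally sized sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


values = np.arange(100)  # stand-in for time-ordered rows
train, valid, test = chronological_split(len(values))
print(len(values[train]), len(values[valid]), len(values[test]))
```

The data is not shuffled: the rows stay in time order so that the model is always evaluated on observations that come after the ones it was trained on.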

Let’s code this.

The do_experiment function is the main function of this research.

This function additionally builds the feature importance of the best model and estimates the quality of the model.

Generally, importance provides a score that indicates how useful or valuable each feature was in the construction of the boosted decision trees within the model.

The more an attribute is used to make key decisions with decision trees, the higher its relative importance.

This importance is calculated explicitly for each attribute in the dataset, allowing attributes to be ranked and compared to each other.

Importance is calculated for a single decision tree by the amount that each attribute split point improves the performance measure, weighted by the number of observations the node is responsible for.

The performance measure may be the purity (Gini index) used to select the split points or another more specific error function.

The feature importances are then averaged across all of the decision trees within the model.

Source here.

Analysis of the Results

The basic metric of success is to get an error lower than the baseline.

It means that the final model has good quality.

The first question is how to measure quality.

It could be squared errors.

After that, we can estimate the interval using the bootstrap method.

The bootstrap sampling, the calculation of statistics, and the interval estimation are implemented in the bs_interval function above.
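For readers without the code at hand, a percentile bootstrap for the mean squared error might look like the sketch below. It is not the article's bs_interval function; the name bootstrap_interval, its defaults, and the synthetic errors are assumptions.

```python
import numpy as np


def bootstrap_interval(values, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of `values`."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row holds the indices of one resample drawn with replacement.
    indices = rng.integers(0, values.size, size=(n_resamples, values.size))
    means = values[indices].mean(axis=1)
    lower, upper = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)


# Synthetic squared errors, for illustration only.
squared_errors = np.random.default_rng(1).normal(size=500) ** 2
lower, upper = bootstrap_interval(squared_errors)
print(f"95% interval for the mean squared error: [{lower:.3f}, {upper:.3f}]")
```

Note that this simple resampling treats the errors as independent, which is only an approximation for time-ordered data.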

[Figure: Bootstrapping]

The second question is what values should be used as a baseline forecast.

Many studies claim that the markets are unpredictable.

Often, the forecasted next price is the same as the last price plus some noise, and it looks like this:

[Figure: Bad stock prediction result]

It means that if we want to forecast the return, it will be around 0 plus noise.

You can find this result in this article by Rafael Schultze-Kraft.

Our baseline is similar.

This approach is implemented in the do_experiment function.

Let’s run the experiment with do_experiment(asset_name), where asset_name is taken from the list (AAPL, AMZN, GOOG, INTC, MSFT).
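As a rough illustration of such a comparison, the sketch below assumes the simplest version of the baseline, a forecast of zero return at every step; the exact baseline inside do_experiment may differ. All numbers are made up.

```python
import numpy as np


def squared_errors(y_true, y_pred):
    """Element-wise squared errors between targets and forecasts."""
    return (np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)) ** 2


# Made-up log returns and model forecasts, for illustration only.
y_true = np.array([0.001, -0.002, 0.0005, 0.0])
model_pred = np.array([0.0008, -0.0015, 0.0002, 0.0001])
baseline_pred = np.zeros_like(y_true)  # assumption: "no change" forecast

model_errors = squared_errors(y_true, model_pred)
baseline_errors = squared_errors(y_true, baseline_pred)
print(model_errors.mean(), baseline_errors.mean())
```

The two arrays of squared errors are what the interval estimation is then applied to.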

The important parameters and metrics are collected in this table.

[Table: Final result table]

AMZN and GOOG have the same optimal parameters.

Often, level and depth have the maximum or close to the maximum value.

As we remember, in the exploratory step at the beginning, the first three assets (AAPL, AMZN, GOOG) had good distributions of ask-bid prices and volumes.

The last two assets (INTC, MSFT) had strange distributions.

This table shows that we got a statistically significant difference in error for AAPL, AMZN, GOOG, and the baseline has been beaten (green color).

The upper bound of the interval for modeling is lower than the lower bound for the baseline.

For INTC we don’t have a significant result: the intervals overlap (grey color).

In the MSFT case, the result is worse than the baseline (red color).

The likely cause is the pattern detected in the distributions (maybe some activity by market makers or other factors).
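The decision rule behind the three colors can be written down as a small helper. The function name, the labels, and the example intervals are hypothetical; the rule itself (non-overlapping intervals count as a significant difference) follows the description above.

```python
def compare_intervals(model_interval, baseline_interval):
    """Classify the model against the baseline by their error intervals."""
    model_low, model_high = model_interval
    base_low, base_high = baseline_interval
    if model_high < base_low:
        return "better"           # model error is significantly lower
    if model_low > base_high:
        return "worse"            # model error is significantly higher
    return "not significant"      # the intervals overlap


# Made-up intervals, for illustration only.
print(compare_intervals((0.8, 0.9), (1.0, 1.2)))
print(compare_intervals((0.9, 1.1), (1.0, 1.2)))
print(compare_intervals((1.3, 1.5), (1.0, 1.2)))
```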

Let’s look at the most important features of the models.

[Figures: Top features for AAPL, AMZN, GOOG, INTC, and MSFT]

As we see, for the successful models the most important features are related to recent values of log_return_ask, log_return_bid, log_ask_div_bid, and so on.

Conclusions

An approach to order book modeling via gradient boosting was suggested.

The code can be found on GitHub.

The feature engineering method was described and formalized.

The feature importances were shown.

Quality estimation was demonstrated.

For some assets, a good result was obtained.

How to improve the result:

Change the number of max_evals in the optimization.
Change max_depth and n_estimators in fitting.
Add new features that are better than the current ones, or combinations of the given features.
Carry out the experiments using more data to get a better model.
Find a history with a larger number of levels in the order book.
Use a model developed specifically for time series (e.g. LSTM, GRU, and so on).
