Is there a way to observe all exchanges in real time and visualize them on a single chart?

In this tutorial we will graph the trades, volume and time delta from trade execution until it reaches our system (an indicator of how close to real time we can get the data).

The goal of the tutorial: a real-time multi-exchange observer

To achieve that, as a first step, we need to capture as much real-time trading data as possible for analysis.
However, the large amount of currency and exchange data requires a scalable system that can ingest and store such volume while keeping latency low.
Otherwise, the system will not stay in sync with the exchange streams.
In this article, we will use Cloud Dataflow and Cloud Bigtable to satisfy those requirements.
Dataflow will provide low latency data streaming ingestion capability while Bigtable will provide low latency storage and time series querying at scale.
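The time delta mentioned at the start of this tutorial can be illustrated with a small sketch. The Python 3 snippet below is a minimal illustration, not code from the tutorial's repository: the function name and the tuple layout (exchange name, execution timestamp, arrival timestamp, both in epoch milliseconds) are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def mean_delay_by_exchange(trades):
    """Average arrival delay in milliseconds for each exchange.

    `trades` is an iterable of (exchange, executed_ms, received_ms) tuples.
    This layout is hypothetical and chosen only for the illustration.
    """
    delays = defaultdict(list)
    for exchange, executed_ms, received_ms in trades:
        # Delta between execution on the exchange and arrival in our system.
        delays[exchange].append(received_ms - executed_ms)
    return {exchange: mean(values) for exchange, values in delays.items()}


if __name__ == "__main__":
    sample = [
        ("Bitfinex", 1546547940000, 1546547940120),
        ("Bitfinex", 1546547941000, 1546547941080),
        ("Bitstamp", 1546547940000, 1546547940300),
    ]
    print(mean_delay_by_exchange(sample))
```

Because the two timestamps come from different clocks, clock skew between an exchange and our system can make individual deltas look too small or even negative.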
Requirements / Solutions

Architectural overview

The “usual” requirement for trading systems is low latency data ingestion.
We extend this requirement with near real-time data storage and querying at scale.
The following list shows what you can learn from this tutorial:

Ingest real-time trading data with low latency from globally scattered data sources / exchanges. Ability to adapt the location of the data ingest worker pipeline. Easily add additional trading pairs / exchanges.
Solution: Dataflow + XChange Reactive Websockets Framework

Demonstrate unbounded streaming source code that is runnable with multiple runners.
Solution: Apache Beam

Strong consistency + linear scalability + super low latency for querying the trading data.
Solution: Bigtable

Easy and automated setup with a project template for orchestration. Example of dynamic variable insertion from a Terraform template into the GCP compute instance.
Solution: Terraform

Querying and visualization: execute time series queries on Bigtable and visualize the results on a webpage.
Solution: Python Flask + Vis.js + Google BigTable Python Client

Architecture / How it works

The source code is written in Java 8 and Python 2.
The code can be divided into five main framework units:

Data ingestion: XChange Stream framework (Github link)
A Java library providing a simple and consistent streaming API for interacting with Bitcoin and other cryptocurrency exchanges via the WebSocket protocol. The XChange library provides new interfaces for the streaming API, and users can subscribe to live updates via the reactive streams of the RxJava library. We use this Java 8 framework to connect to and configure several exchanges (BitFinex, Poloniex, BitStamp, OkCoin, Gemini, HitBTC, Binance…).
Link to the exchange / trading pair configuration code

Parallel processing: Apache Beam (Github link)
Apache Beam is an open source unified programming model to define and execute data processing pipelines, including ETL, batch and stream (continuous) processing. Supported runners: Apache Apex, Apache Flink, Apache Gearpump, Apache Samza, Apache Spark, and Google Cloud Dataflow. We demonstrate how to create an unbounded streaming source/reader and manage basic watermarking, checkpointing and record IDs for data ingestion.
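To make the watermarking, checkpointing and record ID ideas more concrete, here is a toy model in plain Python 3. It is only a sketch under stated assumptions: it does not use the Beam API, the tutorial's actual reader is written in Java, and every name below (Trade, ToyUnboundedReader and their fields) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    """Hypothetical trade record; the field names are assumptions."""
    pair: str
    exchange: str
    trade_id: str
    event_time_ms: int  # when the exchange executed the trade


class ToyUnboundedReader:
    """Plain-Python model of an unbounded reader (not the Beam API)."""

    def __init__(self, feed):
        self._feed = iter(feed)   # any iterable of Trade objects
        self._current = None
        self._records_read = 0
        self._watermark_ms = 0

    def advance(self):
        """Pull the next record; return False if none is available right now."""
        trade = next(self._feed, None)
        if trade is None:
            return False
        self._current = trade
        self._records_read += 1
        # The watermark only moves forward, even if trades arrive out of order.
        self._watermark_ms = max(self._watermark_ms, trade.event_time_ms)
        return True

    def current(self):
        return self._current

    def current_record_id(self):
        """Stable ID for the current record, usable for deduplication."""
        t = self._current
        return f"{t.exchange}|{t.pair}|{t.trade_id}".encode("utf-8")

    def watermark_ms(self):
        return self._watermark_ms

    def checkpoint(self):
        """In this toy model the checkpoint is just the number of records read."""
        return self._records_read
```

A runner can use the record ID to drop duplicates, the watermark to judge how far event time has progressed, and the checkpoint to record how far the reader has got. A live WebSocket feed cannot be replayed, so the checkpoint in this sketch is only a counter.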
Link to the bridge between Beam and XChange Stream framework

BigTable sink: Cloud Bigtable with Beam using the HBase API (Github link)
Connector and writer to Bigtable. We explain here how to create a row key and create a Bigtable mutation function prior to writing to Bigtable.
Link to the BigTable key creation / mutation function

Realtime API endpoint: Flask web server on port 5000 + BigTable client (GitHub link)
Used to query Bigtable and serve as the API endpoint.

JS Flask template
Queries the real-time API endpoint every 500 ms.
Link to the HTML template file
The Flask web server will run in the GCP VM instance.

Pipeline definition

For every exchange + trading pair, we create a different pipeline instance.
The pipeline consists of 3 steps:

1. UnboundedStreamingSource that contains the ‘Unbounded Streaming Source Reader’ (bitStamp2)
2. BigTable pre-writing mutation and key definition (ETH-USD Mut2)
3. BigTable write step (ETH-USD2)

Bigtable row key design decisions

Our DTO looks like this:

We formulated the row key structure in the following way: TradingCurrency#Exchange#SystemTimestampEpoch#SystemNanosTime.

For example, a row key might look like BTC/USD#Bitfinex#1546547940918#63187358085:

BTC/USD: trading pair
Bitfinex: exchange
1546547940918: epoch timestamp (more info)
63187358085: system nano time (more info)

Why do we add nanotime at the end of our key? Our design decision is to avoid multiple versions per row for different trades.
Two DoFn mutations might execute in the same epoch millisecond if there is a streaming sequence of TradeLoad DTOs.
Appending the nanotime subdivides each millisecond into one million additional slots, since one millisecond is 1,000,000 nanoseconds.
If this is not enough for your needs, we recommend hashing the volume / price ratio and attaching the hash at the end of the row key.
Row cells will contain an exact schema replica of the exchange TradeLoad DTO (see earlier in the table above).
This choice lets the key go from the specific part (trading pair, then exchange) to the less specific part (timestamp, then nanotime) and helps avoid hotspots when you query the data.
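The row key layout above can be sketched in a few lines of Python 3. This is an illustration rather than the tutorial's Java implementation: the helper names are hypothetical, and time.monotonic_ns() merely stands in for whatever nano time source the Java code uses. The last helper shows how the pair and exchange prefix yields one contiguous key range to scan.

```python
import time

SEPARATOR = "#"


def build_row_key(pair, exchange, epoch_ms=None, nanos=None):
    """Build TradingCurrency#Exchange#SystemTimestampEpoch#SystemNanosTime."""
    if epoch_ms is None:
        epoch_ms = int(time.time() * 1000)
    if nanos is None:
        # Assumption: a monotonic nanosecond counter plays the nano time role.
        nanos = time.monotonic_ns()
    return SEPARATOR.join([pair, exchange, str(epoch_ms), str(nanos)])


def parse_row_key(key):
    """Split a row key back into its four components."""
    pair, exchange, epoch_ms, nanos = key.split(SEPARATOR)
    return {
        "pair": pair,
        "exchange": exchange,
        "epoch_ms": int(epoch_ms),
        "nanos": int(nanos),
    }


def prefix_scan_range(pair, exchange):
    """Return (start, end) so that start <= key < end matches one pair on one exchange."""
    start = f"{pair}{SEPARATOR}{exchange}{SEPARATOR}"
    # '$' is the character right after '#', so `end` is the first key past the prefix.
    end = f"{pair}{SEPARATOR}{exchange}" + chr(ord(SEPARATOR) + 1)
    return start, end


if __name__ == "__main__":
    key = build_row_key("BTC/USD", "Bitfinex", 1546547940918, 63187358085)
    print(key)
    print(parse_row_key(key))
    print(prefix_scan_range("BTC/USD", "Bitfinex"))
```

Because the timestamp is stored as an unpadded decimal string, keys within a prefix sort chronologically only while all timestamps have the same number of digits, which holds for current 13-digit epoch milliseconds.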
Costs

This tutorial uses billable components of Google Cloud Platform, including Dataflow, Compute Engine, Google Cloud Storage and BigTable.
We recommend cleaning up the project after finishing this tutorial to avoid costs.
Use the Pricing Calculator to generate a cost estimate based on your projected usage.
Environment setup

If you are familiar with Terraform, use the Terraform instructions; they can save you a lot of time.
Otherwise, just continue.

We assume you already have a Google Cloud Project associated with a billing account (otherwise check out the getting-started section).
Log into the console, and activate a cloud console session.

We’ll need a VM to drive the creation of the pipeline, so let’s create one with the following command:

Note how we used the Compute Engine service account with the cloud API scope; we need that to easily build up the environment.
Wait for the VM to come up and SSH into it.

Install the necessary tools (Java, git, Maven, pip, Python 2.7 and the Cloud Bigtable command-line tool cbt) using the following command:

We’ll now enable some APIs, create a Bigtable instance and a bucket:

In this scenario, we use one column family called ‘market’ to simplify the schema design.
For more on that you can read this link.

Ready to go, clone the repo:

Now we will build the pipeline:

You should see this at the end if everything is OK:

Now we can finally start the pipeline:

Please ignore illegal thread pool exceptions.
After a few minutes we can observe the incoming trades by peeking into the Bigtable table.

To observe the Dataflow pipeline, navigate to the console Dataflow page, click the pipeline and check that the job status is Running:

To run the Flask frontend server visualization, navigate to the frontend directory inside our VM and build the Python package.

Visualization

Open firewall port 5000 for visualization:

Link the VM with the firewall rule:

Navigate to the frontend directory.

Find your external IP in the Compute console and open it in your browser with port 5000 at the end, e.g. http://external-ip:5000/stream

You should be able to see the visualization of the aggregated BTC/USD pair on several exchanges (without the predictor part).

The goal of the tutorial: a real-time ‘periscope’ multi-exchange observer

Enjoy!

Cleanup

To save on cost we can clean up the pipeline by running the following command:

Empty and delete the bucket:

Delete the Bigtable instance:

Exit the VM and delete it from the console.
Conclusion

In this article, we have discussed the most important design and technical decisions: i) how to set up and configure a pipeline for ingesting real-time, time-series data from various crypto exchanges, and ii) how to design a suitable data model that facilitates querying and graphing at scale.
Finally, we have provided a tutorial on how to set up and deploy the proposed architecture using GCP.
By following the tutorial steps we managed to establish a connection to multiple exchanges, subscribe to their trade feed, extract and transform these trades into a flexible format to be stored in Bigtable and to be graphed and analyzed.
If our readers show interest (please follow us to do so), we will extend the tutorial with the second part where we will build ML models to predict trends.
Do not hesitate to ask questions in the comment section.