# Nailing The Basics of Pairs Trading with Python

Most stocks follow this trend in the long run, but it’s the short run we have to worry about.

However, with a pair of stocks, if there is a relationship between them such that the RATIO between their prices is mean-reverting, we can take advantage of the pattern and make profits regardless of the trend of the market or the economy.

To make this as concrete as possible, I’ll use an example.

If Apple and Facebook stocks have this type of relationship, their time series will hover relative to each other in a somewhat constant way.

If, for some reason, the two series start to drift apart, I can buy the one that’s lower and short the one that’s higher because I expect them to eventually come back together.

It’s like a toxic relationship where two people can’t stand each other and often drift away but always end up coming back to each other somehow.

There’s a term for this type of time series relationship: cointegration.

I’ll give a more in-depth explanation for cointegration in a bit.

For now, let’s move from the theory and get to some code.

## Generating Fake Securities

Let’s actually implement the concept of pairs trading with some Python code! We’ll start by getting some intuition on how the strategy actually works with some fake time series data.

## Importing Libraries

Let’s get the necessary Python libraries.

For the sake of following along with this guide, it’s best to set the same random seed as me.

`np.random.seed(107)  # So that you can get the same random numbers as me`

Now we generate two fake securities.

Let’s have their returns drawn from a normal distribution and create the time series with random walks.
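Here’s a minimal sketch of what that could look like. The number of days (100), the return distribution (mean 0, standard deviation 1), and the starting level of 50 are my own assumptions for illustration, not values you have to use.

```python
import numpy as np
import pandas as pd

np.random.seed(107)  # same seed as above, so the numbers are repeatable

# Assumed parameters: 100 daily returns drawn from N(0, 1).
X_returns = np.random.normal(0, 1, 100)

# A random walk is the running sum of the returns.
# The +50 is an arbitrary starting level for the fake price.
X = pd.Series(np.cumsum(X_returns), name="X") + 50

print(X.head())
```

Calling `X.plot()` (with matplotlib installed) would draw the series.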

Here is our first security time series.

For the sake of the illustration and for easy intuition, we will generate the Y security to have a clear link with X, so the price of Y should vary in a similar way to X.

What we can do is just take X and shift it up slightly and add some noise from a normal distribution.
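As a rough sketch of that idea: the shift of 5 and the N(0, 1) noise below are assumed values, and X is regenerated here with the same assumed parameters so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

np.random.seed(107)

# Fake security X: a random walk with an arbitrary starting level of 50.
X = pd.Series(np.cumsum(np.random.normal(0, 1, 100)), name="X") + 50

# Fake security Y: X shifted up by 5 (assumed) plus independent N(0, 1) noise.
noise = np.random.normal(0, 1, 100)
Y = (X + 5 + noise).rename("Y")

# Join the two series side by side so they can be plotted together.
prices = pd.concat([X, Y], axis=1)
print(prices.head())
```

With matplotlib installed, `prices.plot()` would put both series on the same chart.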

Create Y, join the two, then plot both at the same time.

Our two security time series.

## Illustrating Cointegration

Let’s get this right off the bat.

Cointegration is NOT the same thing as correlation! Correlation measures how closely two variables move together.

If you’ve studied statistics, you know that correlation is simply the covariance of the two variables normalized by their standard deviations.

Cointegration is slightly different.

It means that the ratio between two series will vary around a mean.

So in an equation like:

Y = αX + e

the leftover term e (that is, the linear combination Y - αX) would be a stationary time series.

Now what is a stationary time series? In simple terms, it’s a time series whose mean and variance stay roughly constant over time, so the values keep hovering around the same level.

What matters most to us is that if a stationary series looks like it’s diverging and getting really high or low, we can expect it to eventually revert back.

Just like how we know that if we throw a football upwards into the air, it will eventually come down.

Likewise with a stationary series, if a time series seems to be drifting away from the mean, it will eventually come back.

Let’s plot the ratio between our two fake securities to show you what cointegration actually looks like.

Plotting the ratio of Y over X.

Notice how the time series tends to revert around the mean?

## Cointegration Test

You now know what it means for two stocks to be cointegrated, but how do we actually quantify and test for cointegration?

The module statsmodels has a good cointegration test that outputs a t-score and a p-value.

It’s a lot of statistical mumbo-jumbo, but the p-value is the probability of seeing a test result at least this extreme if the two series were not actually cointegrated.

In the end, we want to see a low p-value, ideally less than 5%, to give us a clear indicator that the pair of stocks are very likely to be cointegrated.

We imported a cointegration test a bit earlier with the code:

`from statsmodels.tsa.stattools import coint`

Here, `coint` is a function that takes in a pair of securities and outputs a test statistic, a p-value, and a set of critical values. The p-value basically (avoiding the statistics lingo) shows how strong the evidence is that the two time series are cointegrated.

The lower the p-value (ideally lower than 5%), the more likely the two are cointegrated.

The p-value is VERY low, so the two time series are cointegrated.

This result should be very obvious, because we purposely generated the fake securities to be related to each other!

## Clarification of Difference between Cointegration and Correlation

In case you are a bit on the ropes regarding the difference between correlation and cointegration, let me show you some pictures that will make the distinction clear.

High correlation but definitely not cointegrated.

Here is a clear example of two series with high correlation but a high p-value, indicating that they are not cointegrated at all.

Now let’s take a look at the opposite side of the spectrum: two series that have low correlation but are strongly cointegrated.

Perfectly cointegrated yet the correlation is bunk.

Do you get the difference now? I hope so.

## Testing on Historical Data

Let’s move on from the training wheels and get to some REALLY meaty stuff: real data.

## How to actually make a pairs trade

Now that we’ve clearly explained the essence of pairs trading and the concept of cointegration, it’s time to get to the nitty-gritty.

We know that if two time series are cointegrated, they will drift towards and apart from each other around the mean.

We can be confident that if the two series start to diverge, they will eventually converge later.

When the series diverge from one another, we say that the spread is high.

When they drift back towards each other, we say that the spread is low.

We need to buy one security and short the other.

But which ones? Remember the equation we had?

Y = αX + e

As the ratio (Y/X) moves around the mean α, we watch for when X and Y are far apart, which is when the ratio is either well above or well below α.

Then, when the ratio moves back toward its mean, we make money.

In general, we long the security that is underperforming and short the security that is overperforming.

In terms of the equation, when the ratio is smaller than its usual value α, that means that Y is underperforming and X is overperforming, so we buy Y and sell X.

When the ratio is larger than usual, we sell Y and buy X.

## Data Analysis of Stock Market

Before we begin, I’ll first define a function that makes it easy to find cointegrated security pairs using the concepts we’ve already covered.

It’s important to include the market itself in the data because there is such a thing as a confounding variable: two stocks may not actually be cointegrated with each other but with the market, which can mess up our numbers.

I will be using a Python module called Stocker which I cloned from Will Koehrsen’s GitHub.

For libraries, you will also need to install:

- quandl
- fbprophet
- pytrends
- pystan

`!git clone 'https://github.com/WillKoehrsen/Data-Analysis.git'`

`!pip install -U quandl numpy pandas fbprophet matplotlib pytrends pystan`

Now we will create the dataset containing the time series we will analyze.

This section is much more akin to the data cleaning part of data science, which I’m very familiar with.

I’m starting with a handful of stocks.

Obviously this is not all-encompassing.

If you want to analyze more potential pairs, feel free to add more.

This is what you should see in the dataframe.

Now I will add in the ETF (Exchange Traded Fund) data for the S&P 500.

Now I’ll join the two sets into one big set called all_prices.

Now that we’ve got our data, let’s go find some cointegrated pairs.

Looks like we have 4 pairs!

According to this heatmap, which plots the p-values for all of the pairs, we’ve got 4 pairs that appear to be cointegrated.

Let’s plot their ratios on a graph to see what’s going on.

It appears that our first pair, Adobe and Microsoft, has a plot that moves around the mean in the most stable way.

Let’s stick with this pair.

What we need to do next is standardize the ratios, because the absolute ratio on its own doesn’t tell us how unusual a given value is.

We need to use z-scores.

Remember from stats class? The z-score is calculated by:

Z Score (Value) = (Value - Mean) / Standard Deviation

Let’s create a function that calculates the z-scores and plot the z-scores of our first pair, ADBE and MSFT.

See the mean reversion between the z-score range? By adding two other lines at the z-scores of 1 and -1, we can clearly see that, for the most part, any big divergences from the mean eventually converge back.

This is exactly what we want for pairs trading.
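A z-score helper can be as small as this. The function name `zscore` and the toy ratio values are just for illustration; real ratios would come from dividing one price series by the other.

```python
import pandas as pd


def zscore(series):
    """Standardize a series: subtract its mean, divide by its standard deviation."""
    return (series - series.mean()) / series.std()


# Toy ratio values standing in for a real price ratio.
ratios = pd.Series([1.50, 1.55, 1.48, 1.62, 1.45, 1.51])
z = zscore(ratios)
print(z.round(2))
```

One thing to be aware of: this version uses the mean and standard deviation of the whole series, so it peeks at future data. That is fine for a plot, but not for generating live trading signals.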

## Trading Signals

When conducting any type of trading strategy, it’s always important to clearly define and delineate at what point you will actually do a trade.

As in, what is the best INDICATOR that I need to buy or sell a particular stock? That’s what a trading signal is.

Let’s break down a clear plan for creating our trading signals.

## Setup rules

If we’re going to look at our ratio and see if it’s telling us to buy or sell at a particular moment in time, let’s create a prediction variable Y:

Y = Ratio is buy (1) or sell (-1)

Y(t) = Sign(Ratio(t+1) - Ratio(t))

What’s great about pairs trading signals is that we don’t need to know absolutes about where the prices will go; all we need to know is where the ratio is heading: up or down.
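To make the definition of Y concrete, here is a tiny worked example on made-up ratio values. `shift(-1)` pulls tomorrow’s ratio onto today’s row, so the subtraction gives Ratio(t+1) - Ratio(t).

```python
import numpy as np
import pandas as pd

# Made-up ratio values, one per day.
ratio = pd.Series([1.00, 1.02, 1.01, 1.01, 1.05])

# Y(t) = Sign(Ratio(t+1) - Ratio(t))
target = np.sign(ratio.shift(-1) - ratio)
print(target)
```

The last value is NaN because there is no next day to compare against, and a day where the ratio doesn’t change gets a 0 rather than a buy or sell label.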

## Train Test Split

When training and testing a model, it’s common to have splits like 70/30 or 80/20.

Because our data is from 2000-12-12 to 2016-12-12, I’ll split it 11 years (~70%) and 5 years (~30%).
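A date-based split could be sketched like this. The `ratios` series here is a placeholder built on business days over the same date range, since the real one depends on the downloaded prices.

```python
import numpy as np
import pandas as pd

# Placeholder ratio series indexed by business days over the article's date range.
dates = pd.bdate_range("2000-12-12", "2016-12-12")
ratios = pd.Series(np.linspace(1.0, 2.0, len(dates)), index=dates)

# Split chronologically: the first 11 years for training, the rest for testing.
split_date = dates[0] + pd.DateOffset(years=11)
train = ratios[ratios.index < split_date]
test = ratios[ratios.index >= split_date]

print(len(train), len(test), round(len(train) / len(ratios), 3))
```

With time series, the split has to be chronological rather than shuffled. Otherwise the model would get to train on data from the future.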

## Feature Engineering

We need to find out what features are actually important in determining the direction of the ratio moves.

Knowing that the ratios always eventually revert back to the mean, maybe the moving averages and metrics related to the mean will be important.

Let’s try using these features:

- 60 day Moving Average of Ratio
- 5 day Moving Average of Ratio
- 60 day Standard Deviation
- z score

That’s pretty. And enlightening!

Let’s also take a look at the moving average z-scores.
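The features listed above can be computed with pandas rolling windows. In this sketch the ratio series is random placeholder data, and I’m assuming the moving average z-score is the gap between the 5 day and 60 day averages, scaled by the 60 day standard deviation.

```python
import numpy as np
import pandas as pd

np.random.seed(107)

# Placeholder ratio series; the real one would be price of stock 1 / price of stock 2.
ratios = pd.Series(1.5 + 0.1 * np.random.normal(0, 1, 300))

ma_5 = ratios.rolling(window=5).mean()     # 5 day moving average
ma_60 = ratios.rolling(window=60).mean()   # 60 day moving average
std_60 = ratios.rolling(window=60).std()   # 60 day standard deviation

# Assumed definition: how far the short average sits from the long one,
# measured in units of the long-window standard deviation.
zscore_60_5 = (ma_5 - ma_60) / std_60

print(zscore_60_5.dropna().head())
```

Unlike a z-score computed over the whole series, every value here only uses data up to that day, so it can be used for signals without looking ahead. The first 59 values are NaN while the 60 day window fills up.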

Look at the mean reversion.

It’s beautiful.

## Creating a Model

Taking a look at our z-score chart, it’s pretty clear that if the absolute value of the z-score gets too high, it tends to revert back.

We can keep using z-scores of +1 and -1 as thresholds, and we can create a model to generate a trading signal:

- Buy (1) whenever the z-score is below -1.0 because we expect the ratio to increase
- Sell (-1) whenever the z-score is above 1.0 because we expect the ratio to decrease

## Training and Optimizing

How well does our model work on actual data? Bet you’re dying to find out.

Let’s look at our trading signals for the ratios up to around 2009.

This is our training data.
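The threshold rule from the model can be turned into signals with a few lines. The function name `make_signals` and the `entry` parameter are my own naming, and the z-scores below are made up to show each case.

```python
import numpy as np
import pandas as pd


def make_signals(zscores, entry=1.0):
    """Return 1 (buy the ratio), -1 (sell the ratio) or 0 (do nothing) per day."""
    signals = pd.Series(0, index=zscores.index)
    signals[zscores < -entry] = 1    # ratio unusually low: expect it to rise
    signals[zscores > entry] = -1    # ratio unusually high: expect it to fall
    return signals


# Made-up z-scores, including a NaN like the ones at the start of a rolling window.
z = pd.Series([-1.5, -0.2, 0.0, 1.3, np.nan])
signals = make_signals(z)
print(signals.tolist())
```

Buying the ratio means buying the stock in the numerator and selling the one in the denominator; selling the ratio is the reverse. Days with a missing z-score get no signal.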

For the most part, our model is pretty good at picking the right times to trade.

Of course, this is just the trading signals for the ratios. What about the actual stocks?

BOOM! How ’bout dat? That is beautiful.

Now we can clearly see when we should buy or sell on the respective stocks.

Let’s see how much money we can make off of this strategy, shall we?

Let’s define a function that goes through the whole trading signal process.
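Below is a simplified sketch of such a function, not a production backtester. The name `trade`, the window lengths, the exit threshold of 0.5, and the one-unit position size are all assumptions for illustration, and it ignores transaction costs, slippage, and borrowing fees.

```python
import numpy as np
import pandas as pd


def trade(S1, S2, window_short=5, window_long=60, entry=1.0, exit_=0.5):
    """Backtest the z-score rule on two price series and return the profit."""
    ratios = S1 / S2
    ma_short = ratios.rolling(window_short).mean()
    ma_long = ratios.rolling(window_long).mean()
    std_long = ratios.rolling(window_long).std()
    z = (ma_short - ma_long) / std_long

    money, units_s1, units_s2 = 0.0, 0.0, 0.0
    for i in range(len(ratios)):
        p1, p2, r, zi = S1.iloc[i], S2.iloc[i], ratios.iloc[i], z.iloc[i]
        if np.isnan(zi):
            continue  # rolling windows not filled yet
        if zi > entry:
            # Ratio is high: short 1 unit of S1, buy r units of S2 (dollar-neutral).
            money += p1 - p2 * r
            units_s1 -= 1
            units_s2 += r
        elif zi < -entry:
            # Ratio is low: buy 1 unit of S1, short r units of S2 (dollar-neutral).
            money -= p1 - p2 * r
            units_s1 += 1
            units_s2 -= r
        elif abs(zi) < exit_:
            # Ratio is back near its average: close everything at today's prices.
            money += units_s1 * p1 + units_s2 * p2
            units_s1, units_s2 = 0.0, 0.0

    # Value any position still open at the last available prices.
    money += units_s1 * S1.iloc[-1] + units_s2 * S2.iloc[-1]
    return float(money)


# Usage on made-up prices where S1 hovers around 1.5 times S2.
np.random.seed(107)
S2 = pd.Series(np.cumsum(np.random.normal(0, 1, 500)) + 100)
S1 = 1.5 * S2 + np.random.normal(0, 2, 500)
pnl = trade(S1, S2)
print(f"profit on fake data: {pnl:.2f}")
```

Because each entry buys and sells equal dollar amounts, the profit comes entirely from how the two legs move relative to each other between entry and exit.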

Now what happens when we backtest on our training data?

Looks like our strategy is profitable!

Let’s also backtest on our test data (2010 to 2016).

Very nice! Given that this data is occurring smack in the middle of the Great Recession, I’d say that’s not bad!

## Areas of Improvement

Thank you for taking the time to read through this article! Feel free to check out my portfolio site or my GitHub.

By no means is this a perfect strategy and by no means was the implementation depicted in this article the best.

There are several things that can be improved.

Feel free to play around with the notebook or Python files!

### 1. Using more securities and more varied time ranges

For the pairs trading strategy cointegration test, I only used a handful of stocks.

Feel free to test this out on many more, as there are a lot of stocks in the stock market!