# Stock Clustering with Time Series Clustering in R

Yin-Ta Pan, Aug 9, 2018

IMPORTANT: THIS IS NOT INVESTMENT ADVICE.

For a newcomer to the stock market, the sheer number of choices available can keep us from moving forward.

It would be much easier if a tool could classify different stocks based on their historical prices, so that we could then decide on an investment strategy.

For this purpose, time series clustering with the dtwclust package in R is a good fit.

It can compare different stock prices and group them together with a few lines of R code.

## Acquiring Data

As conservative investors, we choose to invest in S&P 500 companies.

It is pretty easy to find a list of S&P 500 companies and their symbols on the internet.

Once we have the list of companies and symbols, we can look up historical stock prices from Yahoo Finance with the quantmod package in R.

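Sample code to access Apple's historical stock price from 7/1/2015 is below:

```r
library(quantmod)

# Download Apple's daily prices from Yahoo Finance, starting 2015-07-01.
# auto.assign = FALSE returns the data instead of assigning it to a variable named "AAPL".
stock <- getSymbols("AAPL", src = "yahoo", from = "2015-07-01", auto.assign = FALSE)
```

## Time Series Clustering

In this analysis, we use stock prices between 7/1/2015 and 8/3/2018, which covers 780 trading days.

For convenience, we take the closing price to represent the price for each day.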

As stock prices for different companies vary a lot, we need to standardize them before comparison.

Here, we calculate the z-score of each price, using the sample mean and sample standard deviation of the corresponding company in our dataset.
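
As a minimal sketch of this standardization step, the snippet below applies a z-score to each company's series. The symbols `AAA` and `BBB` and their prices are made up for illustration, and the helper name `zscore` is our own, not part of any package.

```r
# Standardize one price series: subtract its sample mean, divide by its sample SD.
zscore <- function(price) {
  (price - mean(price)) / sd(price)
}

# Hypothetical closing prices for two companies (made-up numbers).
close_prices <- list(
  AAA = c(10, 12, 11, 13, 14),
  BBB = c(200, 190, 210, 220, 230)
)

# One standardized series per company, kept as a named list.
normalized_price <- lapply(close_prices, zscore)
```

After this step every series has mean 0 and standard deviation 1, so a \$10 stock and a \$200 stock can be compared by shape rather than by price level.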

Without doing any grouping, we can plot the time series of standardized stock prices for the companies in Figure 1.

As we can see, it is hard to tell which companies are worth our money.

In general, however, S&P 500 companies show a rising trend over the last three years.

The coefficient for the regression line is 0.002, indicating that the standardized stock price would increase by 0.002 per trading day, on average.
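
One way such a slope could be obtained is to pool all standardized series and regress price on the day index. The sketch below does this on simulated data; the three series, their trend, and the pooled `lm` fit are our assumptions, since the exact regression setup behind Figure 1 is not spelled out here.

```r
set.seed(1)
n_days <- 100

# Hypothetical standardized series: three noisy, upward-trending companies.
series <- lapply(1:3, function(i) {
  raw <- 0.05 * seq_len(n_days) + rnorm(n_days)
  (raw - mean(raw)) / sd(raw)
})

# Stack all companies into one long table: one row per company-day.
long <- data.frame(
  day   = rep(seq_len(n_days), times = length(series)),
  price = unlist(series)
)

# The slope is the average change in standardized price per trading day.
fit   <- lm(price ~ day, data = long)
slope <- unname(coef(fit)["day"])
```

Because the prices are standardized, the slope is measured in standard deviations per day rather than in dollars.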

That is, if you are rich enough to buy all the stocks in the S&P 500 and hold them for three years, you should see some positive return on your investment, without considering inflation or interest rates.

Figure 1: standardized price time series

However, most of us, or at least I, cannot execute the previous investment strategy.

Therefore, we need to do some grouping based on historical stock price performance.

The dtwclust package in R offers many options, and for more information we strongly recommend two articles: Comparing Time-Series Clustering Algorithms in R Using the dtwclust Package, and Time Series Clustering Along with Optimizations for the Dynamic Time Warping (DTW) Distance.

In this analysis, we first try partitional clustering with Dynamic Time Warping distance (DTW) and Euclidean distance, for 5, 10 and 20 clusters.

The sample code for DTW distance and Euclidean distance with 5 clusters is as follows:

```r
library(dtwclust)

# DTW distance
# The input normalized_price is a list of normalized time-series stock prices
dtw_cluster <- tsclust(normalized_price, type = "partitional", k = 5,
                       distance = "dtw_basic", centroid = "pam",
                       seed = 1234, trace = TRUE,
                       args = tsclust_args(dist = list(window.size = 5)))

# Euclidean distance
eu_cluster <- tsclust(normalized_price, type = "partitional", k = 5,
                      distance = "Euclidean", centroid = "pam",
                      seed = 1234, trace = TRUE)
```

After clustering, we can draw the time series plot for each group to evaluate the groups in Figure 2 and Figure 3.

As we can see, 5 clusters aggregate too many companies together and lose the significant trends, while 20 clusters divide the stocks into many groups with little differentiation between them, so 10 clusters might be a good choice.

In the following analysis, we will focus on the 10 clusters with DTW distance, with window size equal to 5.
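
To give a feel for what the DTW distance and its window do, here is a small base-R sketch of classic DTW restricted to a band around the diagonal. It is not dtwclust's implementation: the function `dtw_window` is our own illustration, it uses the simplest step pattern, and it assumes the window is the band's half-width (how far the two time indices may drift apart).

```r
# Classic DTW between two numeric vectors, restricted to a band of
# half-width `window` around the diagonal.
dtw_window <- function(x, y, window) {
  n <- length(x)
  m <- length(y)
  w <- max(window, abs(n - m))          # band must be wide enough to reach (n, m)
  cost <- matrix(Inf, n + 1, m + 1)     # padded with a border row and column
  cost[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in max(1, i - w):min(m, i + w)) {
      d <- abs(x[i] - y[j])             # local cost: absolute difference
      cost[i + 1, j + 1] <- d + min(cost[i, j + 1],   # advance x only
                                    cost[i + 1, j],   # advance y only
                                    cost[i, j])       # advance both
    }
  }
  cost[n + 1, m + 1]
}

a <- c(0, 0, 1, 2, 1, 0, 0)
b <- c(0, 1, 2, 1, 0, 0, 0)   # same bump as a, one step earlier

euclid <- sqrt(sum((a - b)^2))            # point-by-point comparison
dtw_w0 <- dtw_window(a, b, window = 0)    # no warping allowed
dtw_w1 <- dtw_window(a, b, window = 1)    # one step of warping allowed
```

With no warping, the shifted bump is penalized at every mismatched day. Allowing one step of warping lets the two bumps line up, so `dtw_w1` drops to 0. A small window such as 5 keeps this flexibility while preventing prices from being matched across distant dates.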

Figure 2: clustering with DTW distance for 5, 10 and 20 clusters

Figure 3: clustering with Euclidean distance for 5, 10 and 20 clusters

## Cluster Analysis

We can build a better investment strategy by putting more money into stocks in booming groups and being careful with stocks in slumping clusters.

In Figure 4 we can observe the major trend in each group.

Clearly, groups 1 and 5 show a stronger upward trend in standardized stock price; in contrast, groups 2, 6 and 10 show a somewhat decreasing trend.

Figure 4: 10 clusters with DTW distance

To make it clearer, we can calculate the average gross spread, as part of the investment return, for each group.

In Figure 5 we can see that over the last three years we would, on average, have earned about \$80 per stock by investing in group 1, but lost about \$30 per stock in group 6.

In terms of return rate, if we consider only the investment returns from the price differences, the return rate is about 97% if we put our money in group 1 as an investment portfolio for three years.

Conversely, the return rate for a portfolio of group 6 stocks is around -30%.
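
The calculation can be sketched as follows. The symbols and prices are invented, and we assume the gross spread is the last price minus the first price and that each group's portfolio holds one share of every stock; the exact definitions behind Figure 5 may differ.

```r
# Hypothetical first-day and last-day closing prices with cluster labels.
stocks <- data.frame(
  symbol  = c("AAA", "BBB", "CCC", "DDD"),
  cluster = c(1, 1, 6, 6),
  start   = c(50, 100, 80, 120),
  end     = c(110, 180, 60, 90)
)

# Gross spread: last price minus first price.
stocks$spread <- stocks$end - stocks$start

# Average spread per cluster.
avg_spread <- tapply(stocks$spread, stocks$cluster, mean)

# Return rate of a one-share-per-stock portfolio in each cluster:
# total spread divided by total purchase cost.
return_rate <- tapply(stocks$spread, stocks$cluster, sum) /
  tapply(stocks$start, stocks$cluster, sum)
```

Note that the two numbers answer different questions: the average spread is in dollars and is dominated by high-priced stocks, while the return rate relates the gain to the money invested.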

For now, our investment strategy seems to be better.

Figure 5: gross spread and return rate for each group

However, besides return, we also need to think about risk.

In Figure 6, we have smoothed density estimate plots for each group, based on the differences between the standardized stock prices on 7/1/2015 and 8/3/2018.

Strikingly, the standardized gross spreads for group 1 and group 5 are concentrated above 0, indicating that these groups not only provide better returns but are also less risky.
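
A rough sketch of this kind of comparison, using base R's `density()` on simulated spreads, is shown below. The group names, the simulated values, and the two summary measures are our own illustration, not the data behind Figure 6.

```r
set.seed(42)

# Hypothetical standardized spreads (last minus first standardized price).
spread_by_group <- list(
  group1 = rnorm(50, mean = 2.5, sd = 0.4),
  group6 = rnorm(50, mean = -1.0, sd = 1.2)
)

# Kernel density estimate per group (Gaussian kernel, default bandwidth).
dens <- lapply(spread_by_group, density)

# Two simple risk summaries: the share of stocks that ended higher,
# and the standard deviation of the spread within the group.
share_positive <- sapply(spread_by_group, function(s) mean(s > 0))
spread_sd      <- sapply(spread_by_group, sd)
```

Calling `plot(dens$group1)` and then `lines(dens$group6)` overlays the two curves. A narrow curve sitting entirely above 0 corresponds to a group where nearly every stock gained, which is what "less risky" means in this context.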

If I were able to go back to 2015, I would put all of my money into stocks in group 1 and group 5.

Figure 6: standardized gross spread distribution for each group

Sadly, we cannot travel back in time, so what we can do is put more effort into understanding the composition of each group.

By doing so, at least we can learn something from the past.

Besides, if we believe that the market will not change much (unfortunately, that is hardly likely), we can still put our money in group 1 or group 5.

Figure 7 shows the percentage of each industry sector in each group.

Information Technology is the major sector in the "historically lucrative" group 1, consistent with what we have seen in recent years.

Financials and Industrials might also be good choices, as they make up a share of group 1 and group 5 too.

On the other hand, we should be more careful with the Health Care industry, as it is one of the major sectors in the declining group 6.
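
A sector breakdown like this can be computed with a cross-tabulation. In the sketch below the six companies and their labels are made up; only the sector names come from the discussion above.

```r
# Hypothetical cluster and sector labels for six companies.
members <- data.frame(
  cluster = c(1, 1, 1, 6, 6, 6),
  sector  = c("Information Technology", "Information Technology", "Financials",
              "Health Care", "Health Care", "Industrials")
)

# Count companies by cluster and sector, then convert each row to percentages.
counts     <- table(members$cluster, members$sector)
sector_pct <- round(100 * prop.table(counts, margin = 1), 1)
```

Using `margin = 1` makes each cluster's row sum to 100, so the table reads as "what share of this group belongs to each sector".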

Figure 7: Percentage of different industrial sectors in each group

Apart from looking at industry sectors, we can also look up which groups our target companies belong to.

If they are in group 1 or group 5, we might have greater confidence as they had good performance in the past.

If the companies are in group 6 or group 7, hopefully they are at the end of the night now.
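
Looking up the group of a target company is a simple indexing step once the assignments are named by symbol. The sketch below uses a made-up assignment vector; with a fitted dtwclust model, we would expect the labels to come from its `cluster` slot, for example `setNames(dtw_cluster@cluster, names(normalized_price))`, assuming the input list is named by symbol.

```r
# Hypothetical cluster assignments, named by ticker symbol.
assignment <- c(AAA = 1, BBB = 5, CCC = 6, DDD = 3)

# Companies we are interested in, and the groups that performed well historically.
targets       <- c("AAA", "CCC")
strong_groups <- c(1, 5)

# One row per target: its cluster and whether that cluster was a strong one.
lookup <- data.frame(
  symbol  = targets,
  cluster = unname(assignment[targets]),
  strong  = unname(assignment[targets]) %in% strong_groups
)
```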

## Summary

In this analysis, we use R with the dtwclust package to classify S&P 500 stocks into different groups, based on their historical stock prices.

The discrepancy between groups is large: the best group provides a 97% return on investment over the last three years, while the worst one returns -30%.

Also, the distributions of return, which indicate the risk of the investment, differ greatly from group to group.

With this analysis, we can come up with our own strategy much more easily, as we can point out which companies had more desirable performance in the past.

The analysis also gives us a better understanding of the overall performance of different industries.

However, as every textbook mentions, good performance in the past DOES NOT guarantee good performance in the future.

All of our analysis is based on historical data and therefore the outcome cannot exactly tell what you should do now.

After all, if I knew an investment strategy with a 97% return, I would keep it secret!