Using R in Algorithmic Trading: Back-testing a machine learning strategy that retrains every day

In my last post we went into the world of machine learning by building a simple model, a support vector machine (SVM), to attempt to predict the daily returns of the UUP in R. However, being able or unable to predict the direction of a series is not by itself indicative of actual trading success or failure, since trading returns are clearly not homogeneous (not all bad days are the same and not all good days are the same). This means that a machine learning technique can have quite low accuracy (even below random chance) and still be successful if it tends to correctly predict the days that have the largest returns. When using machine learning strategies for trading it is not only a matter of what percentage of the time you’re right or wrong but of how much money you make or lose when you trade. Because of this it becomes very important to test our models within an actual back-test, so that we can find out whether our machine learning algorithm works in practice when trading (at least from a historical perspective). In today’s tutorial we’re going to learn exactly how to do that.
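Before getting to the back-test, a toy example makes the accuracy versus profitability point concrete. This is only a minimal sketch in base R: the return and prediction vectors are made-up numbers, not output from any model, and the profit is a plain sum of daily returns.

```r
# Hypothetical daily returns (made-up numbers, for illustration only)
actual <- c(0.030, -0.002, -0.003, -0.001, 0.025,
            -0.002, -0.001, -0.004, -0.002, -0.003)

# Hypothetical predictions: 1 = bullish (go long), -1 = bearish (go short)
predicted <- c(1, 1, 1, 1, 1, 1, 1, -1, 1, 1)

# Fraction of days on which the predicted direction was right
accuracy <- mean(sign(actual) == predicted)

# Daily strategy return: gain the move when right, lose it when wrong
strategy <- predicted * actual
total <- sum(strategy)

print(accuracy)  # 3 hits out of 10 days
print(total)     # still positive, because the hits were the big days
```

The predictions are right on only three of the ten days, yet the summed return is positive because two of those three days carry the largest moves.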

library(e1071)
library(quantmod)
library(PerformanceAnalytics)

getSymbols("GLD",src="yahoo", from="1990-01-01")

To do the above we are going to need the following R packages: e1071, quantmod and PerformanceAnalytics. Note that the packages need to be loaded in the exact order above, because e1071 and PerformanceAnalytics both define functions with the same names (skewness and kurtosis) and we want the PerformanceAnalytics versions. The package loaded last masks the earlier one, so you’ll get errors if you load PerformanceAnalytics first and its function names get overwritten by e1071. Today we’re going to run a back-test on a gold ETF (GLD), attempting to predict it using an SVM. Our SVM model will retrain on every bar using the past X bars. First of all we load the libraries, and we then obtain the data for the GLD ETF using the quantmod getSymbols function. Once this is done we will use a simple function, taken from another author’s post, that allows us to create an object containing all the predictors in a very organized manner (I fully recommend that post, which shows you how to create and test a classification model using an SVM on the SPY). I also changed this function so that it returns the plain return of the series as well, so that I can assign it to another array and later use it to get the profit/loss for the strategy. Define the function first in R as detailed below:

addFewerFeatures = function(data)
{
  close = Cl(data)

  returns = na.trim(ROC(close, type="discrete"))

  # n-day returns, each lagged by one bar
  res = merge(na.trim(lag(returns, 1)),
              na.trim(lag(ROC(close, type="discrete", n=2), 1)),
              na.trim(lag(ROC(close, type="discrete", n=3), 1)),
              na.trim(lag(ROC(close, type="discrete", n=5), 1)),
              na.trim(lag(ROC(close, type="discrete", n=10), 1)),
              na.trim(lag(ROC(close, type="discrete", n=20), 1)),
              na.trim(lag(ROC(close, type="discrete", n=50), 1)),
              na.trim(lag(ROC(close, type="discrete", n=100), 1)),
              na.trim(lag(ROC(close, type="discrete", n=150), 1)),
              na.trim(lag(ROC(close, type="discrete", n=200), 1)),
              all = FALSE)

  # other features: 21-bar rolling statistics of the returns, lagged by one bar
  res = merge(res,
              xts(na.trim(lag(rollmean(returns, k=21, align="right"),1))),
              xts(na.trim(lag(rollmedian(returns, k=21, align="right"),1))),
              xts(na.trim(lag(apply.rolling(returns, width=21, FUN=sd),1))),
              xts(na.trim(lag(apply.rolling(returns, width=21, FUN=mad),1))),
              xts(na.trim(lag(apply.rolling(returns, width=21,
                                            align="right", FUN=skewness),1))),
              xts(na.trim(lag(apply.rolling(returns, width=21,
                                            align="right", FUN=kurtosis),1))),
              all = FALSE)

  # add volume
  res = merge(res, xts(na.trim(lag(Vo(data),2))), all=FALSE)

  # add result column (class of the current bar's return) and the raw return
  nextday = ifelse(returns >= 0, 1, -1)
  res = merge(res, nextday, all=FALSE)
  res = merge(res, returns, all=FALSE)

  res <- na.omit(res)

  colnames(res) = c("ROC1", "ROC2", "ROC3", "ROC5", "ROC10",
                    "ROC20", "ROC50", "ROC100", "ROC150", "ROC200",
                    "MEAN", "MEDIAN", "SD", "MAD", "SKEW", "KURTOSIS", "VOLUME1",
                    "output", "returns")

  return(res)
}
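To see what the lagging inside the function does, here is a minimal base R sketch on six made-up closing prices. The helpers roc_discrete and lag_by are hypothetical stand-ins I wrote for this illustration, mimicking a discrete n-bar rate of change and a one-bar forward shift; they are not the TTR or xts implementations.

```r
# Hypothetical closing prices for six consecutive bars
close <- c(100, 102, 101, 104, 103, 106)

# n-bar discrete rate of change: close[t] / close[t - n] - 1 (NA for the first n bars)
roc_discrete <- function(x, n = 1) {
  c(rep(NA_real_, n), x[(n + 1):length(x)] / x[1:(length(x) - n)] - 1)
}

# Shift a vector forward by k bars, so row t holds the value from bar t - k
lag_by <- function(x, k = 1) {
  c(rep(NA_real_, k), x[1:(length(x) - k)])
}

returns   <- roc_discrete(close, 1)             # return of the bar itself (the label source)
roc2_lag1 <- lag_by(roc_discrete(close, 2), 1)  # what a lagged 2-bar ROC column holds
output    <- ifelse(returns >= 0, 1, -1)        # 1 bullish, -1 bearish

aligned <- data.frame(close, returns, roc2_lag1, output)
print(aligned)
```

On row 4 the predictor roc2_lag1 is the two-bar change measured at bar 3 (101/100 - 1), while output is the class of the return realised on bar 4 itself, so each row pairs yesterday's information with today's outcome.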

Note that the function defines 17 different predictors that we use as inputs to predict a binary class (1 bullish, -1 bearish), which is the output we will train our SVM on. Also notice how the function takes advantage of the ROC function and other nifty vector-based functions that are an order of magnitude more efficient than the loop I shared with you in my last post. Once the function is defined we can simply call it to populate an object called “data”. We then create an object called “daily” and assign it the daily returns, after which we delete them from the main data array (because leaving the return in the predictor array would lead to snooping, as the output class is derived directly from it).

data <- addFewerFeatures(GLD)
daily <- data$returns
data$returns <- NULL
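To illustrate why the returns column has to come out of the predictor set, here is a small base R sketch with simulated numbers. The toy data frame and the two threshold "rules" are hypothetical and only stand in for a real classifier.

```r
set.seed(42)

# Simulated daily returns and the class derived from them
returns <- rnorm(250, mean = 0, sd = 0.01)
output  <- ifelse(returns >= 0, 1, -1)

# Toy table: one lagged predictor, the class, and the same-bar return
toy <- data.frame(ROC1 = c(NA, head(returns, -1)),
                  output = output,
                  returns = returns)

# A "model" allowed to look at the same-bar return reproduces the class exactly
leaky_accuracy <- mean(ifelse(toy$returns >= 0, 1, -1) == toy$output)

# The same rule restricted to the lagged predictor has no such shortcut
lagged_accuracy <- mean(ifelse(toy$ROC1 >= 0, 1, -1) == toy$output, na.rm = TRUE)

# Keep the returns aside for the profit/loss calculation, then drop the column
daily <- toy$returns
toy$returns <- NULL

print(c(leaky = leaky_accuracy, lagged = lagged_accuracy))
```

The leaky rule is right every single time, which is exactly the kind of result that should make you suspicious in a back-test.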

After this we can run the back-test for our system. What we do first is choose a learning period (the algorithm will be trained with the past learningPeriod number of bars) and create an empty object called “result” where we will save the daily returns of our trading strategy. In the case below I have chosen a learning period of 200. We then loop through the points of our data array starting at learningPeriod+1, and on each point we create a training subset from the current data point and the learningPeriod bars before it, which we use to train our SVM. After this we get the prediction of the newly built SVM model for the next bar and add a positive or negative return to our result array depending on whether the prediction matches or mismatches the actual output class. Note that in either case (whether we’re right or wrong) I have subtracted 0.0001 from the return, which is the commission burden I have chosen to put on all trades (0.01% of trade volume). I then use the chart functionality of the PerformanceAnalytics library to display a graph of trading progress every 200 bars.

learningPeriod <- 200
result <- c()

for (i in (learningPeriod+1):(length(data[,1])-2)){

  # train on the current bar and the learningPeriod bars before it
  efTrain <- data[(i-learningPeriod):i,]
  r1 <- svm(factor(output) ~ ., data = efTrain, cost = 100, gamma = 0.1)

  # predict the class of the next bar from its 17 predictor columns
  r1.pred <- predict(r1, data[i+1,1:17])
  r1.pred <- data.frame(r1.pred)

  # compare plain numbers so the condition is a single TRUE/FALSE
  if (as.numeric(as.character(r1.pred[1,])) == as.numeric(data$output[i+1])){
    result <- rbind(result,abs(daily[i+1,1])-0.0001) # we won
  } else {
    result <- rbind(result,-abs(daily[i+1,1])-0.0001) # we lost
  }

  # plot progress every 200 bars
  if (i %% 200 == 0){
    charts.PerformanceSummary(result, ylog=TRUE)
  }

}
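The win/lose rule used in the loop can also be written as a single vectorized expression. The base R sketch below, on made-up returns and predictions, checks that adding or subtracting the absolute return is the same as multiplying the predicted direction by the actual return; the variable names are my own and not part of the back-test above.

```r
cost <- 0.0001  # same per-trade commission as in the back-test

# Hypothetical next-bar returns and model predictions
actual    <- c(0.012, -0.007, 0.000, -0.004, 0.009)
output    <- ifelse(actual >= 0, 1, -1)  # actual class, as in the feature function
predicted <- c(1, 1, -1, -1, -1)

# Loop version: +|return| on a match, -|return| on a mismatch, minus the cost
loop_result <- numeric(length(actual))
for (k in seq_along(actual)) {
  if (predicted[k] == output[k]) {
    loop_result[k] <- abs(actual[k]) - cost
  } else {
    loop_result[k] <- -abs(actual[k]) - cost
  }
}

# Vectorized version: trade in the predicted direction and pay the cost
vector_result <- predicted * actual - cost

print(cbind(loop_result, vector_result))
```

Seen this way, the back-test is simply going long or short on every bar according to the prediction and paying the commission each time.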

After this is finished we will have a graph showing the results of the system in the format given by the PerformanceAnalytics package. Since you have an array of returns in xts format, you will also be able to invoke any of the functions within this package to further analyze the statistics of your trading system. The image below shows the return we obtained for this strategy on the GLD ETF. We can see that the machine learning technique used was not able to properly predict directionality in a large variety of cases. This shows, as we also saw in our previous post, that achieving profitable returns using machine learning procedures is not very easy (even when using as many inputs as we have used here).

[Image 2014-09-04_9-12-01: performance summary of the daily retrained SVM strategy on GLD]
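As a rough sketch of the kind of statistics you can pull from a return series, here are a few of them computed by hand in base R on made-up strategy returns. PerformanceAnalytics offers functions such as Return.cumulative and maxDrawdown for this; the code below does not call them, and its conventions (for example the simple arithmetic Sharpe ratio with 252 trading days) are my own assumptions.

```r
# Hypothetical daily strategy returns (made-up numbers)
strategy <- c(0.010, -0.020, 0.015, -0.005, 0.020, -0.030, 0.010)

# Equity curve starting at 1, compounding each daily return
equity <- c(1, cumprod(1 + strategy))

# Total compounded return over the period
cumulative_return <- tail(equity, 1) - 1

# Largest peak-to-valley loss, as a fraction of the running peak
max_drawdown <- max(1 - equity / cummax(equity))

# Simple annualized Sharpe ratio (zero risk-free rate, 252 trading days assumed)
sharpe_annualized <- mean(strategy) / sd(strategy) * sqrt(252)

# Share of days with a positive return
hit_rate <- mean(strategy > 0)

print(c(cumulative = cumulative_return, drawdown = max_drawdown,
        sharpe = sharpe_annualized, hit_rate = hit_rate))
```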

 

However, by choosing inputs a bit better, doing some data pre-processing and modifying the gamma and cost parameters of the SVM, we can indeed obtain some profitable results for the daily retrained SVM on GLD, as shown below (note that the y axis is logarithmic). By using ensemble techniques, including several different machine learning algorithms and choosing even better inputs, we can improve the results below even further; however, we’ll leave this discussion for future posts.

[Image 2014-09-04_9-19-23: performance summary of the improved daily retrained SVM on GLD, logarithmic y axis]
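One possible way to explore the gamma and cost parameters is a grid search with tune.svm from e1071. The sketch below runs it on simulated data and is not the procedure used to obtain the results above: the toy data frame and the grid values are assumptions made for illustration, and the default cross-validation shuffles the rows, so for a real trading series a walk-forward evaluation like the back-test loop would be a more faithful test.

```r
set.seed(1)

# Simulated two-predictor classification problem (purely illustrative)
n <- 120
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$output <- factor(ifelse(toy$x1 + 0.5 * toy$x2 + rnorm(n, sd = 0.5) >= 0, 1, -1))

# Only run the search if e1071 is installed
if (requireNamespace("e1071", quietly = TRUE)) {
  # Try every combination of the listed gamma and cost values
  tuned <- e1071::tune.svm(output ~ ., data = toy,
                           gamma = 10^(-2:0), cost = 10^(0:2))

  # Best combination found and its cross-validated error
  best <- tuned$best.parameters
  print(best)
  print(tuned$best.performance)
}
```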

If you would like to learn more about machine learning techniques and how you too can create your own systems that retrain on every bar and give historically profitable results, please consider joining Asirikuy.com, a website filled with educational videos, trading systems, development and a sound, honest and transparent approach towards automated trading in general. I hope you enjoyed this article! :o)

4 Responses to “Using R in Algorithmic Trading: Back-testing a machine learning strategy that retrains every day”

  1. fab says:

    Aloah,
    I am a bit puzzled. From your previous posts you seemed to be a few steps further along than this post. What happened to your own framework for machine learning strategy creation, and how are you guys doing with the GPU approach on the lower time frames? I recall you concluded that there is more potential in the intraday strategies (with much lower drawdowns). I would be curious to see an update there :-) So to me your post on R strategy building on the DTF seems like a result of some drawbacks on your former track? I cross my fingers that is not the case.

    Greetings and still thanks for this “howto”

    Fab

    • admin says:

      Hi Fab,

      Thanks for your post :o) Don’t worry, we continue to research machine learning on the lower time frames and the creation of systems using data-mining and GPUs. We’re making some good advances in both of these fields and our cloud data-mining is already generating many promising systems (many of them already under live testing). These articles are meant as tutorials for those who don’t have access to our tools but wish to carry out some machine learning experiments using R with freely available data and software. Please don’t take every article I post as the “state of the art” of what we are doing; many are meant as educational tools for those who want to start their journey into our research paths. Thanks again for posting,

      Best Regards,

      Daniel
