I have written several posts in the past about building random forest (RF) models to predict out-of-sample (OS) returns in our price-action based trading system repository at Asirikuy (you can read more here, here and here). So far we have built 7 different models to attempt to solve this problem. Two of these models are meant to predict only the first six months of OS performance for systems that have just been mined (new systems without an OS), while the other 5 aim to provide continuous predictions for the following weeks/months for our trading strategies, training with data that comes solely from the real out-of-sample of the strategies (data that didn’t exist when the systems were mined). Some of these continuous prediction models are single models that use data from all the strategies in the repository, while others are a single model per system that uses market conditions as inputs to make predictions. Today I want to share some data on the latest model I have created to tackle the problem of OS return prediction: what inputs it uses and how it innovates relative to the other models we have.


Of the models we currently have, the most successful have been those built using data from all systems to make predictions with a 180-day lookback window and a 90-day forward-looking window. This means that each example used to build the model takes a past period of 180 days to predict the following 90 days. In essence these models output the probability that a system will give a positive return in the following 90 days, given the system’s characteristics (which never change with time), the system’s performance and some properties of the general market during the 180-day window. To improve on previous models I decided to test a wider array of trading system statistics and to include some volatility measurements of the overall market to see whether these could enhance predictions.
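To make the windowing concrete, here is a minimal sketch of how one system's daily returns could be sliced into training examples. The feature names, the 30-day step between examples and the use of a simple (non-compounded) sum of returns are all assumptions for illustration; the real model uses a much wider set of system and market statistics.

```python
from statistics import mean, pstdev

LOOKBACK_DAYS = 180  # past window used to compute the inputs
FORWARD_DAYS = 90    # future window used to compute the label


def build_examples(daily_returns, step=30):
    """Slice one system's daily returns into (features, label) pairs.

    daily_returns: list of floats, one per day, oldest first.
    step: days between consecutive examples (hypothetical choice).
    """
    examples = []
    last_start = len(daily_returns) - LOOKBACK_DAYS - FORWARD_DAYS
    for start in range(0, last_start + 1, step):
        split = start + LOOKBACK_DAYS
        past = daily_returns[start:split]
        future = daily_returns[split:split + FORWARD_DAYS]
        # Illustrative inputs only; not the variables used in the post.
        features = {
            "past_total_return": sum(past),
            "past_mean_return": mean(past),
            "past_std_return": pstdev(past),
        }
        # Label: was the following 90-day period profitable?
        label = 1 if sum(future) > 0 else 0
        examples.append((features, label))
    return examples
```

A series shorter than 270 days produces no examples at all, since it cannot hold both a full lookback window and a full forward window.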

The graph above shows the final variable importance plot for the random forest (RF) algorithm I ended up with. Compared to other models, the ulcer index and Pearson correlation are now among the most important inputs. I also included the standard deviation and returns for 9 general market ETFs. However, because I included so many new variables, some pruning was necessary to avoid curve-fitting bias, so I removed the 10 least important variables from the model before arriving at the plot above. Importance was computed using only the training data (80% of the available examples), so this step introduces no forward-looking bias.
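The pruning step can be sketched roughly as follows. This assumes the examples are ordered in time and that an importance score per variable has already been obtained from a model fitted on the training portion only; the function names and the example scores are hypothetical.

```python
def chronological_split(examples, train_fraction=0.8):
    """Split time-ordered examples; the most recent part is the testing set."""
    cut = int(len(examples) * train_fraction)
    return examples[:cut], examples[cut:]


def prune_least_important(importances, n_drop=10):
    """Return the variable names kept after dropping the least important.

    importances: dict of variable name -> importance score, computed
    from the training set only so that no testing data leaks in.
    """
    ranked = sorted(importances, key=importances.get, reverse=True)
    n_keep = max(len(ranked) - n_drop, 0)
    return ranked[:n_keep]


# Hypothetical importance scores for 15 made-up variables.
scores = {f"var_{i}": float(i) for i in range(15)}
kept = prune_least_important(scores, n_drop=10)
```

The model would then be refitted on the training set using only the variables in `kept`, and the testing set is touched only for the final evaluation.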


As usual, the Bayes statistics evaluated over the testing set (the last 20% of the data) are the most interesting part of building these models. As you can see, the sensitivity, specificity and posterior probability values are better than those of previous models: in this case we achieve a higher sensitivity at a lower probability threshold, with higher accuracy and posterior probability. This means that we are more certain about positive predictions and obtain a larger positive change in the mean return compared with the average return of the testing set when no predictions are made. In the end we select a larger number of systems more accurately, which leads to an improvement of +250% relative to the mean return without predictions in the testing set. In line with previous models, this value increases even further at a 0.55 probability threshold and then declines at 0.6, as the sensitivity of the model becomes virtually zero (almost no positive predictions are made, so the chance that a few bad selections drag down the result becomes larger).
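For readers who want to reproduce this kind of evaluation, the statistics above can be sketched as follows. I am assuming here that "posterior probability" means the fraction of positive predictions that turned out to be profitable, and that the improvement figure is the relative change of the mean return of the selected examples versus the mean return of the whole testing set; the function and key names are hypothetical.

```python
from statistics import mean


def bayes_statistics(probabilities, future_returns, threshold=0.5):
    """Evaluate positive predictions at a given probability threshold.

    probabilities: predicted probability of a profitable period.
    future_returns: realized forward-window return per test example.
    """
    tp = fp = tn = fn = 0
    selected = []  # returns of the examples predicted as profitable
    for prob, ret in zip(probabilities, future_returns):
        actual_positive = ret > 0
        if prob >= threshold:
            selected.append(ret)
            if actual_positive:
                tp += 1
            else:
                fp += 1
        elif actual_positive:
            fn += 1
        else:
            tn += 1

    baseline = mean(future_returns)  # mean return with no predictions
    change = None
    if selected and baseline != 0:
        change = (mean(selected) - baseline) / abs(baseline) * 100
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "specificity": tn / (tn + fp) if tn + fp else None,
        "posterior": tp / (tp + fp) if tp + fp else None,
        "relative_change_pct": change,
    }
```

Calling this for thresholds such as 0.5, 0.55 and 0.6 shows the trade-off described above: as the threshold rises, fewer examples are selected, so the sensitivity falls and the statistics rest on a shrinking number of positive predictions.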

This new model is a fairly significant advance towards better predictions at lower variable complexity using better predictors. The previous model of this type used a considerably larger number of variables, since no pruning was involved, meaning that it was subject to more bias than this model. With the elimination of predictors that are not very relevant and the use of market volatility descriptors, which do seem to confer an additional edge, we now have the model that has achieved the largest improvements so far within testing sets. As always, it is worth pointing out that the testing set contains some hindsight, as it is data that already exists right now, so we will need to see how this model performs in real out-of-sample testing before saying for sure whether it’s better or worse than our current RF system selection models.


There are, however, certain improvements to the above model, and potentially to our other models, that could take them to even better performance. Next week we will explore the effect of using thresholds for predictions in this model, to see what happens when we try to predict not just whether the period was profitable but whether it was above a certain profitability threshold. If you would like to learn more about machine learning and how you too can use machine learning to select which systems to trade from a repository of nearly 11 thousand, please consider joining Asirikuy.com, a website filled with educational videos, trading systems, development and a sound, honest and transparent approach towards automated trading.