After more than a year of trading GPU-mined, price-action-based strategies, we now have enough data to attempt some initial statistical explorations and see whether we can refine our selection criteria for trading systems any further. Have there been any strong relationships between in-sample and out-of-sample statistics? Could we have foreseen the failure of some of these trading systems? Today I want to talk about some basic statistical analysis of this data, which shows how the in-sample statistical variables of our oldest systems (the first 100, which have more than 1 year of trading) relate to their out-of-sample performance, in particular to their out-of-sample Sharpe ratio and profit. In this case out-of-sample refers to the trading data generated since the strategies were mined, which is a real out-of-sample performance measurement.


Before starting with the analysis it is worth discussing how the systems were initially selected. Several filters are applied to the systems before they are added to the repository, some of them during the data-mining bias testing we do to ensure that our systems have a low probability of being the result of random chance. The systems that are added in the end have a high R² (higher than 0.9), a Sharpe ratio above 0.6 and a low correlation (R<0.5) with all the other strategies within the trading repository. These requirements were derived from my previous research on long-term historical results, where the likelihood of 10-year survival after mining a system on historical data was strongly correlated with high stability (high R²) and at least some minimum threshold for risk-adjusted returns (which we have calculated using both the Burke and Sharpe ratios).
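To make the three thresholds concrete, here is a minimal sketch of such a filter in Python. The `Candidate` class, the `passes_filters` function and the use of plain per-period returns for the correlation check are assumptions made for illustration; they are not the actual repository code. The post only states R<0.5, so the sketch compares the signed correlation rather than its absolute value.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical container for a mined system's in-sample statistics."""
    name: str
    r_squared: float  # stability of the in-sample equity curve
    sharpe: float     # in-sample Sharpe ratio
    returns: list     # in-sample per-period returns


def pearson(x, y):
    """Plain Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def passes_filters(candidate, repository,
                   min_r2=0.9, min_sharpe=0.6, max_corr=0.5):
    """Return True if the candidate meets the thresholds quoted in the post."""
    if candidate.r_squared <= min_r2 or candidate.sharpe <= min_sharpe:
        return False
    # Assumption: signed correlation of returns against every stored system.
    return all(pearson(candidate.returns, s.returns) < max_corr
               for s in repository)
```

A candidate whose returns are a scaled copy of an existing system's would be rejected by the last check even if its R² and Sharpe ratio are acceptable.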

After removing all columns where NaN or Infinity values were present, the first thing I decided to do was a basic correlation analysis to see if there was any obvious correlation between the out-of-sample Sharpe and the in-sample statistical variables calculated using qqpat. The heat map above shows the correlation between the in-sample variables and the out-of-sample variable (os_sharpe). The table below shows the correlation values. As you can see, there is no strong correlation between any in-sample variable and the out-of-sample Sharpe ratio. The strongest correlations are for sharpe.ratio, pearson.correlation and average.drawdown.length, although the magnitude of these correlations is still very low. The out-of-sample absolute profit paints an even less interesting picture, with the correlations dropping significantly, although the most relevant correlations remain sharpe.ratio and pearson.correlation. *That said, the correlations are extremely weak and therefore do not provide a significant avenue for improving the selection criteria.*
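The cleaning and ranking step described above can be sketched in a few lines of plain Python. This is only an illustration: the `rank_correlations` function and the dict-of-columns layout are assumptions, and the statistics themselves would come from qqpat rather than from this code.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def rank_correlations(columns, target="os_sharpe"):
    """Rank in-sample columns by |correlation| with the target column.

    `columns` maps a variable name to one value per system. Columns that
    contain NaN or infinity are dropped first, as are constant columns
    (their correlation is undefined). The target column itself must be
    free of NaN and infinity, otherwise a KeyError is raised.
    """
    clean = {
        name: vals for name, vals in columns.items()
        if all(math.isfinite(v) for v in vals) and len(set(vals)) > 1
    }
    y = clean[target]
    corr = {name: pearson(vals, y)
            for name, vals in clean.items() if name != target}
    return sorted(corr.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

Sorting by absolute value puts strongly negative relationships next to strongly positive ones, which is what matters when looking for any usable signal.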


Since there are no easy-to-see linear relationships within the data, the next step is to build some models to see if we can predict the out-of-sample performance using the in-sample variables. Since we have so many different variables and their relative importance is not well known, a random forest model is an excellent first attempt. We split the systems into two sets (80% training, 20% testing), preserving the distribution of the out-of-sample profit variable, and train a random forest model on the training set containing 80 systems. We then test the model on the remaining 20 systems to see how well it predicts their results, which tells us whether the fit on the training set is just curve fitting with no real substance to the underlying relationships found.
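Preserving the distribution of a continuous variable in an 80/20 split is commonly done by binning it into quantiles and sampling each bin in the same proportion. The sketch below shows that idea with the standard library only. The `stratified_split` function, the choice of five bins and the fixed seed are assumptions for illustration, not a description of how the split was actually implemented here.

```python
import random


def stratified_split(values, test_fraction=0.2, n_bins=5, seed=0):
    """Split indices into train/test while keeping quantile bins balanced.

    `values` holds one number per system (for example its out-of-sample
    profit). Systems are sorted by value, cut into `n_bins` equally sized
    quantile bins, and the same fraction of each bin goes to the test set.
    """
    rng = random.Random(seed)
    order = sorted(range(len(values)), key=lambda i: values[i])
    train, test = [], []
    for b in range(n_bins):
        lo = b * len(order) // n_bins
        hi = (b + 1) * len(order) // n_bins
        bin_idx = order[lo:hi]
        rng.shuffle(bin_idx)  # random choice of test members inside the bin
        n_test = round(len(bin_idx) * test_fraction)
        test.extend(bin_idx[:n_test])
        train.extend(bin_idx[n_test:])
    return sorted(train), sorted(test)
```

With 100 systems this yields 80 training and 20 testing indices, four test systems per profit quintile. The two index lists can then be used to slice the in-sample variables for whichever random forest implementation is used, for example scikit-learn's `RandomForestRegressor`.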

Not surprisingly, and matching some of my previous research on the matter, we find that for high R² and minimum Sharpe systems (like the ones in our price-action-based repository) we cannot come up with a random forest model that has any sort of reasonable accuracy in the prediction of the out-of-sample profit or Sharpe. The image below shows the training and testing results for real os_profit vs. predicted os_profit for the tested systems. As you can see, the in-sample variables used simply have no predictive ability for the out-of-sample variables of these systems. As I have discovered over the past few years, once you select systems that are uncorrelated, highly stable and have some minimum risk-adjusted return, how well they will perform going forward is basically unknown, because the in-sample statistics carry no further information about the future.


The above results show that after selecting systems that are very stable and have acceptable risk-adjusted returns there is not much you can do to predict their future performance, especially over short intervals such as a year. The graph above also highlights why failure detection is very important. Although some of the systems we created during the past year did reach drawdowns in the 10-30% region, we were able to stop them in live trading well before this happened thanks to our worst-case detection mechanisms. Being able to discard strategies once they stop working is just as important as being able to properly select new strategies to trade. If you would like to learn more about how to mine trading strategies and how you too can create a trading strategy portfolio, please consider joining Asirikuy.com, a website filled with educational videos, trading systems, development and a sound, honest and transparent approach towards automated trading.

It looks curve fitted, like this:

http://www.nuclearphynance.com/Show%20Post.aspx?PostIDKey=139614

I wouldn’t argue this to be the case. Almost all systems (more than 95%) of these 100 are within what is expected from their back-testing returns. This means that the very large majority are within plausible 1+ year trading scenarios for the systems. For the large majority, the fact that they have had losses does not mean that they are “not working as expected”, merely that they are within a predictable drawdown scenario according to their historical distribution of returns. Sure, some have indeed failed (less than 5%), which is expected due to data-mining bias, curve-fitting bias, etc.
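One simple way to check whether a live result is "within what is expected" is to bootstrap the historical returns and see where the live figure falls. The sketch below is only an illustration of that idea. The function names, the use of monthly returns, the simple (non-compounded) sum and the independent resampling are all assumptions, and resampling this way ignores any autocorrelation in the returns.

```python
import random


def bootstrap_return_bounds(monthly_returns, months=12, n_sims=5000,
                            level=0.95, seed=0):
    """Estimate a central interval for the total return over `months`.

    Draws `months` returns with replacement from the back-test history,
    sums them, repeats `n_sims` times and returns the (lower, upper)
    bounds of the central `level` share of the simulated totals.
    """
    rng = random.Random(seed)
    totals = sorted(
        sum(rng.choice(monthly_returns) for _ in range(months))
        for _ in range(n_sims)
    )
    tail = (1 - level) / 2
    lower = totals[int(tail * n_sims)]
    upper = totals[int((1 - tail) * n_sims) - 1]
    return lower, upper


def within_expectation(live_total, monthly_returns, **kwargs):
    """True if the live total return lies inside the bootstrapped interval."""
    lower, upper = bootstrap_return_bounds(monthly_returns, **kwargs)
    return lower <= live_total <= upper
```

A system showing a loss after a year can still return `True` here, which is the point made above: a loss inside the historical distribution is not the same as a failure.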

But you trade an “uncorrelated portfolio”, so the strategies should compensate for each other as they do in sample. You trade trend following; you earn money when you have heavy tails.

Try plotting the two distributions, in sample vs. out of sample, for each traded instrument and compare the tails…

Yes, if you do this you’ll find that the out-of-sample has far less pronounced “fat tails” than the in-sample, but the in-sample is a 30-year period while the out-of-sample is far shorter. If you look at periods of the same length in the in-sample with similar kurtosis, you find similar performance for the overall set.
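The comparison described in this reply can be sketched as a rolling excess-kurtosis scan over the in-sample returns, using a window as long as the out-of-sample period. The function names and the `tolerance` value are assumptions for illustration, and the code expects every window to contain at least two distinct return values (a constant window would divide by zero).

```python
def excess_kurtosis(x):
    """Population excess kurtosis (0 for a normal distribution)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / m2 ** 2 - 3.0


def rolling_kurtosis(returns, window):
    """Excess kurtosis of every consecutive window of the given length."""
    return [excess_kurtosis(returns[i:i + window])
            for i in range(len(returns) - window + 1)]


def similar_windows(in_sample, out_of_sample, tolerance=0.5):
    """Start indices of in-sample windows whose kurtosis matches the OOS one.

    The window length equals the out-of-sample length, so only periods of
    the same duration are compared.
    """
    target = excess_kurtosis(out_of_sample)
    window = len(out_of_sample)
    return [i for i, k in enumerate(rolling_kurtosis(in_sample, window))
            if abs(k - target) <= tolerance]
```

The performance of the systems over the returned windows can then be compared with the live period on a like-for-like basis.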