After more than one year of trading: Are there any in-sample/out-of-sample correlations?

After more than a year of trading GPU-mined, price-action-based strategies, we now have enough data to attempt some initial statistical explorations and see whether we can refine our criteria for selecting trading systems any further. Have there been any strong relationships between in-sample statistics and out-of-sample statistics? Could we have foreseen the failure of some of these trading systems? Today I want to talk about some basic statistical analysis of this data, which shows how the in-sample statistical variables of our oldest systems (the first 100, which have more than one year of trading) relate to their out-of-sample performance, in particular to their out-of-sample Sharpe ratio and profit. In this case out-of-sample refers to the trading data generated since the strategies were mined, which makes it a real out-of-sample performance measurement.

Before starting with the analysis, it is worth discussing how the systems were initially selected. Several filters are applied to the systems before they are added to the repository, some of them during the data-mining bias testing we do to ensure that our systems have a low probability of being the result of random chance. The systems that are added in the end have a high R² (higher than 0.9) and a Sharpe ratio above 0.6, and they also exhibit a low correlation (R<0.5) with all the other strategies within the trading repository. These requirements were derived from my previous research on long-term historical results, where the likelihood of a system surviving 10 years after being mined on historical data was strongly correlated with high stability (high R²) and at least some minimum threshold for risk-adjusted returns (which we have calculated using both the Burke and Sharpe ratios).
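To make the three selection criteria concrete, here is a minimal sketch of how such a filter could look. It is not the code used for the repository: the function names are hypothetical, R² is assumed to be the squared Pearson correlation between time and the cumulative (non-compounded) equity curve, the Sharpe ratio is assumed to be annualized from daily returns with 252 periods per year, and the correlation limit is assumed to apply to the absolute value of R.

```python
import numpy as np


def equity_r2(returns):
    """R² of the cumulative equity curve against time (assumed definition)."""
    equity = np.cumsum(returns)
    t = np.arange(len(equity))
    r = np.corrcoef(t, equity)[0, 1]
    return r ** 2


def annualized_sharpe(returns, periods_per_year=252):
    """Mean over standard deviation of returns, scaled to one year."""
    sd = np.std(returns, ddof=1)
    if sd == 0:
        return 0.0
    return np.mean(returns) / sd * np.sqrt(periods_per_year)


def max_abs_correlation(returns, repository):
    """Largest |R| between the candidate and any system already accepted."""
    if not repository:
        return 0.0
    return max(abs(np.corrcoef(returns, other)[0, 1]) for other in repository)


def passes_filters(returns, repository, min_r2=0.9, min_sharpe=0.6, max_corr=0.5):
    """Apply the three thresholds quoted in the text to one candidate system."""
    return bool(
        equity_r2(returns) > min_r2
        and annualized_sharpe(returns) > min_sharpe
        and max_abs_correlation(returns, repository) < max_corr
    )
```

Here `returns` is a 1-D array of per-period returns for the candidate and `repository` is a list of return arrays of the same length for the systems already accepted. The data-mining bias tests mentioned above are a separate step and are not covered by this sketch.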

After removing all columns where NaN or Infinity values were present, the first thing I decided to do was a basic correlation analysis to see if there was any obvious correlation between the out-of-sample Sharpe and the in-sample statistical variables calculated using qqpat. The heat map above shows the correlation between the in-sample variables and the out-of-sample variable (the os_sharpe). The table below shows the correlation values. As you can see, there is no strong correlation between any in-sample variable and the out-of-sample Sharpe ratio. The strongest correlations are for the sharpe.ratio, the pearson.correlation and the average.drawdown.length, although the magnitude of these correlations is still very low. The out-of-sample absolute profit paints an even less interesting picture: the correlations drop significantly, although the most relevant ones remain the sharpe.ratio and the pearson.correlation. In both cases the correlations are extremely weak and therefore do not provide a meaningful avenue for improving the selection criteria.
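The cleaning and correlation step described above can be sketched in a few lines of pandas. This is an illustration rather than the script behind the heat map: it assumes the statistics sit in a DataFrame with one row per system and one column per variable, and the function name `in_sample_correlations` is hypothetical.

```python
import numpy as np
import pandas as pd


def in_sample_correlations(stats, target="os_sharpe"):
    """Pearson correlation of every clean in-sample column with the target."""
    numeric = stats.select_dtypes(include=[np.number])
    # Treat +/- infinity as missing, then drop every column with a gap.
    clean = numeric.replace([np.inf, -np.inf], np.nan).dropna(axis=1)
    if target not in clean.columns:
        raise ValueError(f"target column {target!r} is missing or not clean")
    corr = clean.drop(columns=[target]).corrwith(clean[target])
    # Order by absolute strength so the most related variables come first.
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)
```

Calling it once with `target="os_sharpe"` and once with `target="os_profit"` would give the two sets of values discussed in the text. Note that Pearson correlation only measures linear association, which is why weak values here do not by themselves rule out a non-linear relationship.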

Since there are no easy-to-see linear relationships within the data, the next step is to build some models and see whether we can predict the out-of-sample performance from the in-sample variables. Because we have so many different variables and their relative importance is not well known, a random forest model is an excellent first attempt. We split the systems into two sets (80% training, 20% testing), preserving the distribution of the out-of-sample profit variable, and train a random forest model on the training set containing 80 systems. We then test the model on the remaining 20 systems to see how well it predicts their results. This tells us whether whatever the model learned on the training set reflects real underlying relationships or is just the result of curve fitting.
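For readers who want to reproduce this kind of experiment, the following is a minimal sketch using scikit-learn. It is not the original code: the quantile binning used to preserve the distribution of the target in both sets, the number of trees, the R² score as the accuracy measure and the function name `fit_and_score` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def fit_and_score(X, y, n_bins=5, seed=0):
    """Stratified 80/20 split on a continuous target, then a random forest."""
    y = np.asarray(y, dtype=float)
    # Bin the target into quantiles so both sets cover its whole range.
    edges = np.quantile(y, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(y, edges)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=bins, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    return {
        "n_train": len(y_train),
        "n_test": len(y_test),
        "train_r2": r2_score(y_train, model.predict(X_train)),
        "test_r2": r2_score(y_test, model.predict(X_test)),
    }
```

Here `X` holds the in-sample variables (one row per system) and `y` the out-of-sample profit or Sharpe. A high `train_r2` next to a `test_r2` near or below zero is the signature of curve fitting: a random forest can memorise 80 training systems even when the inputs carry no information about the target.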

Not surprisingly, and matching some of my previous research on the matter, we find that for systems with a high R² and a minimum Sharpe ratio, like the ones in our price-action-based repository, we cannot come up with a random forest model that has any sort of reasonable accuracy in predicting the out-of-sample profit or Sharpe. The image below shows the training and testing results for real os_profit vs. predicted os_profit for the tested systems. As you can see, the in-sample variables simply have no predictive ability over the out-of-sample variables for these systems. As I have found over the past few years, once you select systems that are uncorrelated, highly stable and have some minimum risk-adjusted return, whether any one of them will perform well going forward is basically unknown: the remaining in-sample statistics tell you nothing further, and the future cannot be known in advance.