The prediction of out-of-sample statistics from in-sample values is a recurrent topic in this blog and a fundamental problem for the long term profitable trading of algorithmic strategies. Being able to predict which systems are worth trading and which ones are not from simply looking at how the systems behaved in the past would be an extremely powerful tool and therefore a problem worth studying. Today we are going to look at a fundamental aspect of in-sample/out-of-sample relationships which is how much time we would need to test a strategy for relationships between statistics to become apparent. We will also talk about why these relationships do not show up from day one and what consequences this has when developing and trading algorithmic strategies.
Trading would be simple if we could draw relationships between short-term out-of-sample results and in-sample statistics. If we could say with any degree of certainty that, for example, strategies with a high Sharpe ratio in back-testing would tend to behave in the same manner over the next year, it would be easy to mine and trade strategies successfully without much psychological pain. However, the edge of most trading systems is rather small, so it takes time for the edge to manifest itself, and in the short term it is difficult to distinguish a system that has an edge from a system that is profitable by pure random chance. As with a slightly biased coin, it takes a large number of throws to distinguish it from a fair coin.
But how long does this process take? I know from my previous studies that there are some strong correlations between in-sample and pseudo out-of-sample variables when the pseudo out-of-sample period is long enough. As I have shown previously, you can generate systems with data from 1986-2000 and establish some strong relationships with the 2000-2016 period. But how long would we have to trade to see these relationships? Because of the problem described above, we should not expect those relationships to show in the short term. However, they must evolve gradually with time, since they would not appear all at once when the systems reached their 16th birthday.
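To put a rough number on the coin analogy, here is a minimal sketch that estimates how many throws a one-sided test needs to tell a biased coin from a fair one. It uses the standard normal approximation for a test of a proportion. The function name `flips_needed`, the 5% significance level and the 80% power are my own illustrative assumptions, not values taken from the experiments in this post.

```python
import math
from statistics import NormalDist


def flips_needed(p_biased, alpha=0.05, power=0.80):
    """Approximate number of throws needed to detect a coin with
    P(heads) = p_biased against a fair coin (P(heads) = 0.5).

    Uses the normal approximation for a one-sided test of a proportion.
    alpha and power are illustrative assumptions.
    """
    if not 0.5 < p_biased < 1.0:
        raise ValueError("p_biased must be between 0.5 and 1.0")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)        # quantile for the desired power
    sd_fair = math.sqrt(0.5 * 0.5)               # per-throw std dev, fair coin
    sd_biased = math.sqrt(p_biased * (1.0 - p_biased))
    n = ((z_alpha * sd_fair + z_power * sd_biased) / (p_biased - 0.5)) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # The smaller the edge, the more throws are needed.
    for p in (0.60, 0.55, 0.52, 0.51):
        print(f"P(heads) = {p:.2f}: about {flips_needed(p)} throws")
```

The required number of throws grows with the inverse square of the edge, so halving the edge roughly quadruples the sample needed. A trading system with a small edge and a few dozen trades per year is in the same position.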
To study this I performed 500 system generation runs on the EUR/USD using data from 1986 to 2016, with randomly chosen 3650-day in-sample periods followed by pseudo out-of-sample (pOS) periods of different lengths. Generated systems were only required to have a frequency of at least 10 trades/year and an R² higher than 0.90. Using this methodology I was able to obtain the data shown above. From this data it is clear that the correlation between the in-sample and pseudo out-of-sample profit is almost zero when the pseudo out-of-sample period is short (the first two years) and only starts to grow significantly after the third year of system trading.
This means that there is a relationship between the edge of the systems within the in-sample and the pseudo out-of-sample periods, but it takes a long time before this edge accumulates enough trials to show up in any significant manner. For the first two years the systems are practically indistinguishable from chance, and only in the third year do we start to see a rather small correlation. This correlation then grows substantially with time and reaches quite significant values as we reach year 6 of out-of-sample testing. Sadly I couldn’t go further in this analysis, as the random sampling of periods does not allow a large enough distribution at higher pOS values. I expect the correlation to stabilize at some point, probably at something close to the correlation value in the 1986-2000|2000-2016 split tests I have done before.
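The qualitative effect can be illustrated with a toy model. This is not the generation process used for the results above; it is a sketch under strong assumptions: each hypothetical system has a fixed true daily edge drawn from a normal distribution, daily returns are that edge plus independent Gaussian noise, and there is no mining or selection bias. The names `edge_sd` and `noise_sd` and their values are arbitrary choices of mine. Under these assumptions the correlation between in-sample and out-of-sample profit has a closed form, and a small Monte Carlo run checks it.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def theoretical_corr(n_is, n_os, edge_sd, noise_sd):
    """Closed-form IS/OOS profit correlation in the toy model."""
    ratio = (noise_sd / edge_sd) ** 2  # days needed for edge to match noise
    return 1.0 / math.sqrt((1.0 + ratio / n_is) * (1.0 + ratio / n_os))


def simulated_corr(n_is, n_os, edge_sd, noise_sd, n_systems=5000, seed=0):
    """Monte Carlo estimate of the same correlation.

    The profit over n days is n * edge plus Gaussian noise with standard
    deviation sqrt(n) * noise_sd, which is exact for i.i.d. Gaussian days.
    """
    rng = random.Random(seed)
    is_profit, os_profit = [], []
    for _ in range(n_systems):
        edge = rng.gauss(0.0, edge_sd)  # hypothetical true daily edge
        is_profit.append(n_is * edge + math.sqrt(n_is) * noise_sd * rng.gauss(0.0, 1.0))
        os_profit.append(n_os * edge + math.sqrt(n_os) * noise_sd * rng.gauss(0.0, 1.0))
    return pearson(is_profit, os_profit)


if __name__ == "__main__":
    n_is = 3650                      # in-sample length in days, as in the post
    edge_sd, noise_sd = 0.02, 1.0    # arbitrary illustrative values
    for years in (1, 2, 3, 4, 5, 6):
        n_os = 365 * years
        theory = theoretical_corr(n_is, n_os, edge_sd, noise_sd)
        sim = simulated_corr(n_is, n_os, edge_sd, noise_sd)
        print(f"{years} year(s) out-of-sample: theory {theory:.2f}, simulated {sim:.2f}")
```

In this toy model the correlation rises steadily with the out-of-sample length and flattens towards a ceiling set by the noise in the in-sample estimate. It will not reproduce the exact shape of the real results, since mined systems carry selection bias and market conditions change, but it shows why a short out-of-sample window says little about a small edge.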
The correlation graphs above show how the correlations between variables evolve. It is evident that in the short term we can only draw relationships between variables related to risk and trading frequency, as I discussed in a recent post, while relationships between return-related variables (like the Sharpe, Ulcer Index, Profit, etc.) only start to become significant after more than 5 years have passed. You can see that some squares that are practically white in the 2-year pOS test become much darker in the 6-year correlation graphic. Those are return-related variables that start to correlate when the pOS period becomes long enough.
With all this in mind it starts to become clear that looking at short-term correlations between in-sample and out-of-sample variables is most probably a waste of time. Up until now my live trading experience has perfectly matched the above: over short periods it is not possible to draw meaningful linear or non-linear relationships between IS and OS data, simply because, as with a slightly biased coin, it takes a lot of time for edges to show themselves. If you would like to learn more about trading systems and how we work to develop profitable trading methodologies please consider joining Asirikuy.com, a website filled with educational videos, trading systems, development and a sound, honest and transparent approach towards automated trading.
Quite interesting, as usual :)
Just a quick question: in the first graph in your post, is the y-axis the usual correlation coefficient? If so, do I read it right that after approx. 2200 days (6 years) the overall correlation is 0.4 something? Assuming that the linear relationship holds, it would take about 12-15 years until we would see sufficiently high correlation values to be reasonably sure that our approach has some merit.
Another question would be whether you have applied any measures to stop trading a strategy during the test, like we do in live trading? Otherwise it could happen that a strategy first hits the worst-case scenario but then becomes profitable and correlated again later on, which in live trading we would never see.
Thx for your great work!
Hi Fd,
Thanks for writing! Always happy to read your comments. There are several limitations to this experiment: the results cover only a single generation mechanism using a single symbol on a single timeframe. The above results could definitely change for different generation mechanisms on different symbols and timeframes, so I wouldn’t call these results anything more than indicative right now. It is also very probable that the linear relationship won’t hold. My gut feeling, from my experience with similar previous results, is that at best it will be logarithmic and will tend to 0.6-0.7 in the long term. I doubt it will ever be above 0.8.
You are also right in that this does not cover any other aspects of the methodology, such as the discarding you have mentioned. I have evidence that discarding improves results more than it deteriorates them – systems that fail according to MC tend to take deeper losses afterwards – but that is something we will leave for another post :o). Thanks again for writing and for continuing to be a faithful reader!
Best Regards,
Daniel