Out of Sample Testing: The Power and Correct Execution of This Simulation Technique

When we’re building trading strategies, it becomes clear that one of our main problems is ensuring a high probability of future profitability. This is the central problem in algorithmic trading development, since the most important question is not about future returns but about future survival. Of all the ways in which we can increase the likelihood of future profitability, none is more powerful than out of sample testing, since it gives us an idea of how our system performs outside of its “comfort zone”. However, there are several important things one must take into account when doing out of sample testing to ensure that it is done without any important biases. Over the next few paragraphs I will talk about out of sample testing and how it needs to be done so that results are not driven by hidden factors.

It is certainly true that strategies developed across long periods, even 10 or 20 years, can fail under new market conditions because they are unable to withstand changes in the way the market behaves. To give strategies a real chance of survival, it therefore becomes vital to carry out stress tests that tell us how a strategy behaves when it has not been tailor-made (optimized) to a certain set of market conditions. The idea is to develop a strategy on one set of market conditions and then run it on a set where no optimization has ever taken place.

The classic approach to out of sample testing is quite straightforward. Take a system you have optimized from period X to Y, run it from Y to Z, and then evaluate its characteristics over the Y to Z region to decide whether or not it passes the test. You might think that any strategy which passes this test is “robust”, but in fact that depends on how the strategy that passes the test was obtained. Any strategy evaluated out of sample must go through exactly the same process you would follow if you were to trade the system forward. This means that you should run an optimization, take the best result and run an out of sample test on it. If that result fails, the whole trading logic has no value and no other optimization results are tried.
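A minimal sketch of this classic protocol in Python is shown below. It assumes a hypothetical `backtest` function (here a toy momentum filter on a list of daily returns) and a hypothetical pass criterion of positive out of sample profit; neither comes from the article. What the sketch is meant to show is the order of operations: optimize on X to Y only, keep the single best parameter, and run exactly that one result on Y to Z.

```python
import random

# Hypothetical stand-in for a real backtest: a toy rule that is long when the
# sum of the previous `lookback` daily returns is positive, flat otherwise.
def backtest(lookback, returns):
    profit = 0.0
    for t in range(lookback, len(returns)):
        signal = 1 if sum(returns[t - lookback:t]) > 0 else 0
        profit += signal * returns[t]
    return profit

def classic_oos_test(returns, split, param_grid):
    in_sample = returns[:split]   # X -> Y: the only data used for optimization
    out_sample = returns[split:]  # Y -> Z: never touched by the optimizer
    # Optimize on the in-sample period only and keep the single best result.
    best = max(param_grid, key=lambda p: backtest(p, in_sample))
    # Run that one result out of sample (its warm-up uses out of sample data
    # only); no other candidate from the grid is ever tried here.
    oos_profit = backtest(best, out_sample)
    # Hypothetical pass criterion: the out of sample period must be profitable.
    return best, oos_profit, oos_profit > 0

if __name__ == "__main__":
    rng = random.Random(42)
    daily_returns = [rng.gauss(0.0, 0.01) for _ in range(2000)]
    best, oos, passed = classic_oos_test(daily_returns, 1000, [5, 10, 20, 50])
    print(best, round(oos, 4), "pass" if passed else "fail")
```

If `passed` is false, the protocol described above says the trading logic is discarded; going back to the grid to find a parameter that happens to do well on Y to Z would defeat the test.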

The problem is that if you start “picking” results to get those that perform well in out of sample testing, you are effectively introducing a strong selection bias, which is equivalent to running an optimization over the full testing period. It makes no sense to out of sample test anything but the best optimized result of the “working period”, since in real life you wouldn’t be able to “go back” and “cherry pick” the best system to use. A strategy is therefore robust when it manages to survive many years of out of sample testing or, better yet, when it is able to survive many years of out of sample testing across many different instruments.
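To see why cherry picking is so dangerous, the following sketch simulates many strategies that have no edge at all: every in-sample and out of sample profit is just a sum of zero-mean random trade returns. This is an illustration under simple assumptions, not a result from real market data. Picking the best in-sample strategy and reporting its out of sample profit mimics forward trading, while picking the best out of sample result is the cherry picking described above.

```python
import random

def simulate_selection_bias(n_strategies=200, n_is=500, n_oos=500, seed=1):
    rng = random.Random(seed)
    results = []
    for _ in range(n_strategies):
        # Each hypothetical strategy is pure noise: zero-expectancy trades.
        is_profit = sum(rng.gauss(0.0, 1.0) for _ in range(n_is))
        oos_profit = sum(rng.gauss(0.0, 1.0) for _ in range(n_oos))
        results.append((is_profit, oos_profit))
    # Honest procedure: choose by in-sample profit, then report its OOS profit.
    honest_oos = max(results, key=lambda r: r[0])[1]
    # Cherry picking: look at the OOS results and keep the best one.
    cherry_picked_oos = max(r[1] for r in results)
    return honest_oos, cherry_picked_oos

if __name__ == "__main__":
    honest, cherry = simulate_selection_bias()
    print("honest OOS profit:       ", round(honest, 2))
    print("cherry-picked OOS profit:", round(cherry, 2))
```

By construction the cherry-picked figure can never be worse than the honest one, and with many candidates it is almost always clearly positive even though none of the strategies has any real expectancy.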

Another interesting point is that a straight X->Y->Z approach doesn’t provide the best out of sample testing solution, since you are also effectively introducing a selection bias tied to the Y->Z period: only that one contiguous block of market conditions is ever used as out of sample data. Although this bias is mild, you can obtain better results if you distribute the out of sample periods randomly within the test, so that no particular set of market conditions is singled out as the out of sample period. For example, if I wanted to run a 20 year test with 10 years of optimization and 10 years of out of sample testing, I would choose 10 years at random for optimization and then test on the remaining 10 out of sample years. Any strategy developed this way should be inherently more robust, as it is out of sample tested across non-sequential market conditions.
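The randomized split for the 20 year example could be sketched as follows. This assumes, hypothetically, that the strategy can be optimized and evaluated on whole calendar years independently (for instance by concatenating the trades from the selected years); the years 1993 to 2012 and the seed are arbitrary choices for illustration.

```python
import random

def random_year_split(first_year, last_year, n_optimization, seed=None):
    """Randomly assign whole calendar years to optimization or out of sample.

    Years are drawn without replacement, so the two sets never overlap and
    together cover every year in [first_year, last_year].
    """
    years = list(range(first_year, last_year + 1))
    if not 0 < n_optimization < len(years):
        raise ValueError("n_optimization must leave at least one year on each side")
    rng = random.Random(seed)
    optimization_years = sorted(rng.sample(years, n_optimization))
    chosen = set(optimization_years)
    oos_years = [y for y in years if y not in chosen]
    return optimization_years, oos_years

if __name__ == "__main__":
    opt, oos = random_year_split(1993, 2012, 10, seed=7)
    print("optimize on:", opt)
    print("test on:    ", oos)
```

Passing a fixed seed makes the split reproducible, which helps when the same partition has to be reused across several instruments or test runs.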

Out of sample testing is definitely one of the most powerful ways to validate strategies, giving the user peace of mind in knowing that the strategies have been stress tested against actual market conditions they had “never seen”. Does this mean that out of sample tested strategies cannot fail? Certainly not. Out of sample testing merely shows that a strategy was able to survive outside of its optimization period, and it therefore hints that the strategy can survive changes in market conditions without losing its mathematical expectancy. Remember that we can never guarantee future results, but out of sample tested strategies offer a higher degree of robustness in the sense that they have already done what we want them to do: survive.

If you would like to learn more about my work in automated trading and how you too can develop your own strategies based on sound trading tactics, please consider joining Asirikuy.com, a website filled with educational videos, trading systems, development and a sound, honest and transparent approach towards automated trading in general. I hope you enjoyed this article! :o)


2 Responses to “Out of Sample Testing: The Power and Correct Execution of This Simulation Technique”

  1. Stefan says:

    Hello Daniel,

    you are completely right; all that is left is to apply this knowledge at Asirikuy.com. There you still do not distinguish between the statistics of the optimisation period and the test period, and distributed test periods are not yet used. It would be great if the above insights were also fully applied at Asirikuy.com. It should also be easy to extend the analysis tool for distributed test periods. See also my post on the forum.



    • admin says:

      Hello Stefan,

      Thank you for your comment :o) Yes, you’re right in that several of the above insights are yet to be applied in Asirikuy mainly due to the limitations of our current testing implementations. Right now we are working on powerful C++ solutions for system evaluation which will allow us to implement all these insights into our trading system development. It is certainly very exciting to see all the potential that you can get when you combine the above mentioned walk forward insights with the potential of genetic programming and multi-instrument simultaneous evaluation. What I am trying to say is that we aren’t there yet, but we are working on it :o) Thanks again for your comment,

      Best Regards,

