Reality Check: Why there is no such thing as a true “out-of-sample test” when using historical data

When facing the large number of possible biases that arise from the construction of trading strategies, one of the most common solutions is to test systems on "unseen data" – also known as out-of-sample data – in order to obtain performance measures that are independent of the data used to construct the strategies. However, there are many issues with the use of this "out-of-sample" data, particularly because what we perceive as unseen data never fulfills the conditions needed to occupy this role. In the following paragraphs I am going to explain why there is no such thing as "out-of-sample" data and what this means for strategy development and testing. We will go through the conditions needed for something to be considered truly "out-of-sample" and why historical data can never fulfill this role, something that becomes even clearer with the use of modern computational methods for trading system generation.

What is the objective of an "out-of-sample" test? The real objective of this type of test is to serve as a proxy for live trading. You have generated a strategy using some data and you then want to know what would have happened if you had decided to trade this strategy live. The idea is that if your strategy suffers from any type of testing bias, the out-of-sample test will reveal it by making the strategy fail bluntly. If you test your strategy under unseen conditions before live trading, you then have – in theory – a better chance of withstanding new market conditions, because you have made sure that failure is not due to any bias within the generation process. The objective of the out-of-sample test is to provide us with a live test before the true live test, a verification that our development has followed the right path.

If you want something that is a proxy for live trading, then it needs to fulfill the fundamental characteristics of a live trading exercise. The data needs to be unknown (no crystal ball) and the test can only be carried out once (no redo). These are the two main characteristics that make an OS test using historical data impossible. Historical data is both known and allows repetition, meaning that you can use it as many times as you want and you already know what it looks like. This introduces all the inherent biases related to data snooping and data mining; even if they are not literally within the simulation runs, they are within the entire process, because the data you're using is historical.

Let me explain myself a bit better. Suppose you want to create a system for live trading and you have data from 2000 to 2014. You decide to build your system using data from 2000-2007 and to validate it on data from 2007-2014. You create a system with great 2000-2007 results, the 2007-2014 results are bad, so you go back to the drawing board. You repeat this until you have a system that is optimized on 2000-2007 and still works great on 2007-2014. You're very happy that you finally have something that was designed with 2000-2007 data and worked in a 7-year OS period. However, you shouldn't be this happy, because your process made the OS period part of the in-sample analysis: it was used repeatedly until something that "worked well" was found. When using any OS technique you will always perform tests and modifications until the end result is profitable, regardless of the complexity of the system or testing scheme.
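
To make this concrete, here is a minimal simulation (my own illustration, using purely random data and purely random rules): even when no edge exists at all, the retry-until-the-OS-looks-good loop always "finds" a strategy that passes both periods eventually.

```python
import numpy as np

rng = np.random.default_rng(42)
n_days = 3500                             # roughly 14 years of daily bars
returns = rng.normal(0, 0.01, n_days)     # pure noise: no edge exists here
split = n_days // 2                       # "2000-2007" vs "2007-2014"

trials = 0
while True:
    trials += 1
    signal = rng.choice([-1, 1], size=n_days)   # a random long/short rule
    pnl = signal * returns
    if pnl[:split].sum() > 0.5 and pnl[split:].sum() > 0.5:
        break   # "great" in-sample result AND "great" out-of-sample result

print(f"'Validated' strategy found on pure noise after {trials} trials")
```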

The issue here is that when the possibility of using an OS more than once exists, you have the chance to perform as many trials as required to find a technique that works on that OS – your process is fitting a system to a historical period through trial and error, something you cannot do with live data. This applies to walk-forward optimization (WFO) techniques as well: the OS periods are still historical data and they are still reused; the fitting exercise is simply more complicated because you have to find something that works well across many of these tests. Having a system that works under WFO simply means that you have found a strategy that succeeds at a more complicated fitting exercise, but it is still as biased as any other system developed with historical data, because you cannot remove the bias from a data set that is known and can be reused. It is also unsurprising that many systems with limited degrees of freedom that worked on long-term tests also survive WFO analysis; this is simply because all the OS periods were already introduced in the initial design, when the system was tested across the whole data landscape. When using WFO you will perform variations and repeat tests until you obtain better results, just as you would with a single OS.
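
For reference, a bare-bones walk-forward loop looks roughly like the sketch below (a toy strategy and stand-in optimizer of my own; real WFO tools differ in the details). Note how every "OS" window is just a fixed slice of known history: rerun the loop with different window sizes, grids or rules and you are reusing those very same windows.

```python
import numpy as np

def backtest(prices, period):
    """Toy rule: long when price is above its trailing `period`-bar mean."""
    pnl = 0.0
    for t in range(period, len(prices) - 1):
        side = 1 if prices[t] > prices[t - period:t].mean() else -1
        pnl += side * (prices[t + 1] - prices[t])
    return pnl

def optimize(prices, grid):
    """Stand-in optimizer: pick the period with the best in-sample PnL."""
    return max(grid, key=lambda p: backtest(prices, p))

def walk_forward(prices, is_len=500, os_len=100, grid=(5, 10, 20, 50)):
    os_results = []
    for start in range(0, len(prices) - is_len - os_len + 1, os_len):
        is_win = prices[start:start + is_len]
        os_win = prices[start + is_len:start + is_len + os_len]
        best = optimize(is_win, grid)              # fit on the IS window
        os_results.append(backtest(os_win, best))  # test on the next window
    return os_results   # the concatenated "out-of-sample" performance
```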

Algorithmic strategy creation exercises show these dangers even more clearly. Using a system generator you can create and test different strategies until you find one that works under WFO conditions. Given any symbol and system there is a sufficiently complicated set of rules that will enable a successful WFO (even more so when you take into account that the WFO process itself has rules that can be optimized, such as window sizes, anchoring, etc.). This does not mean that your system is better able to survive or adapt to future market conditions; it just means that it was able to fit a more elaborate testing scheme on past historical data. I could not find a single piece of quantitative evidence showing that a system developed to give a profitable historical WFO is better able to generate profitable live trading results than a system developed to be profitable above data-mining bias across the whole back-testing period. If you do know of such evidence, please post a comment with it so that I can include and comment on it. I can tell you that I have personally seen systems with successful 20-year WFO tests fail bluntly under live trading conditions.

No historical data can be an out-of-sample, because all historical data is known and can be used an infinite number of times. You can always modify and mine systems or testing processes for long enough to obtain the results you want; the only factor limiting the profitability/complexity you obtain is the power of your data-mining tools and the complexity of the testing schemes they can accommodate. However, every scheme you can come up with suffers from bias problems inherent to the use of historical data that cannot be removed by doing more complex mining exercises. In the end you are dealing with the inevitable premise that the future is unknown: no matter how much you develop on historical data, you cannot generate certainty against a future that – as a true out-of-sample – is really unknown and can only be traded once.

However, this does not mean that everything is lost ;o) It simply means that you should consider that OS exercises do not remove the biases inherent to historical testing, and that IS/OS tests can fool you into deeper complexity that might not relate in any way to the future or your ability to profit from it. My advice is to develop systems that perform well above your data-mining bias within your historical data (a real historical inefficiency), then make sure that your strategies give back as little profit as possible (take advantage of luck) and are easy to fail-detect (it is easy to know, from a statistical point of view, when they stop working). Highly profitable and linear historical strategies with good position management techniques often fulfill these criteria. What you want is to know that your strategy exploits a real historical inefficiency, that it won't give back profits when it gets favored by luck, and that it will be easy to know when it fails.
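
To illustrate the "easy to fail-detect" part, here is a minimal sketch of one possible rule (my illustration, not a prescription): bootstrap the backtest's trade returns to estimate the worst drawdown a still-working system should plausibly produce, then flag failure when live trading exceeds that depth.

```python
import numpy as np

def failure_threshold(backtest_trades, n_trades=100, n_sims=10_000,
                      pct=99, seed=0):
    """Drawdown depth (in return units) that a still-working system should
    exceed over `n_trades` only about (100 - pct)% of the time."""
    rng = np.random.default_rng(seed)
    worst = np.empty(n_sims)
    for i in range(n_sims):
        sample = rng.choice(backtest_trades, size=n_trades, replace=True)
        equity = np.cumsum(sample)
        worst[i] = np.max(np.maximum.accumulate(equity) - equity)
    return float(np.percentile(worst, pct))

# stand-in trade returns; in practice, load the system's backtest trades
trades = np.random.default_rng(1).normal(0.002, 0.01, 500)
print(f"flag failure beyond a drawdown of {failure_threshold(trades):.3f}")
```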

If you would like to learn more about strategy development and how you too can develop your own trading systems with algorithmic generation methods, please consider joining Asirikuy.com, a website filled with educational videos, trading systems, development and a sound, honest and transparent approach towards automated trading in general. I hope you enjoyed this article! :o)

30 Responses to “Reality Check: Why there is no such thing as a true “out-of-sample test” when using historical data”

  1. Fd says:

    Hi Daniel,

    I agree that it is most desirable to have unlimited data available; however, this does not happen very often in the real world. In a data analysis course I had the task of inferring the activity of a user (sit, walk, run, etc.) with a mobile phone from the built-in accelerometer data. There were more than 100 explanatory variables to sort out as to whether they should be included in a suitable model or not. Although the amount of data available for analysis was limited (I had no phone of the same make to produce more data as needed), a good model could be found. This was judged by dividing the data into training, test and validation sets – the usual procedure. As long as your sets are sufficiently large, it does not matter whether data is limited, and you can build valid models.
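
    (As a sketch of that usual procedure, with stand-in data in place of the real sensor features:)

    ```python
    import numpy as np
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 100))    # stand-in: 100 explanatory variables
    y = rng.integers(0, 4, size=1000)   # stand-in: sit/walk/run/stand labels

    # 60% training, 20% test (model selection), 20% validation (final check)
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, test_size=0.4, random_state=0)
    X_test, X_val, y_test, y_val = train_test_split(
        X_rest, y_rest, test_size=0.5, random_state=0)
    # fit on the training set, compare models on the test set, and report
    # the chosen model's accuracy once on the validation set
    ```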

    The main question to be answered is: is there a good reason to believe that the variables used to explain the data are suitable to build a model? In case of using the acceleration data to infer user activity this reason is evident – in case of indicator values derived from past price movement to predict future price movement doubts are prudent.

    If we knew for sure that at least some of the indicators or measurements of previous price movement we are using for constructing a prediction model are indicative of future price movement (i.e. they have predictive power), we would have no issue with limited data: we would find a suitable model (read: trading strategy) using IS/OS cross-validation.

    Obviously, in trading no one can know all the influencing factors and their ever-changing weighting. Even if we had unlimited data available there would be no guarantee that we could come up with a valid model that would provide us with a never-ending revenue stream. From my point of view, our problem is neither a bias nor a data-quantity issue.

    It all boils down to the questions:
    Why do you think that the market you want to exploit is predictable?
    Why do you think that the market will be predictable in the future?
    Why do you think that you are able to build a suitable model?

    The answer to these questions will influence your selection of inputs for model building and the type of model you will eventually come up with.

    Best regards,
    Fd

    • admin says:

      Hi Fd,

      Thanks for your post :o) The difference between the market and your example is that your example is of a clear and deterministic nature. You are not playing with an evolving time series; you are simply trying to find something that can accurately describe a real-world phenomenon of a deterministic nature. When you're describing a deterministic phenomenon that does not change, things are easy. Using inference from an in-sample set and doing cross-validation is good enough. If you're above your data-mining bias the relationship you find is almost certainly true, because the underlying nature of the phenomenon never changes.

      In the case of financial time series – as you say – you cannot know if any relationships are ever-lasting because the relationships are ever-changing, so any inference from IS/OS analysis can indeed be flawed. Moreover, the fact that you have freedom in your manipulation of data and variable generation allows you to find any number of desired correlations. The problem here is also that the system generation problem has potentially infinite degrees of freedom and you can always find solutions that fit your IS/OS historical samples that may never work under live trading conditions. What I’m trying to say in this post is that you cannot have an out-of-sample proxy for live trading using historical data because you can always fit something upon repetitive testing and process changes. Therefore the value of any IS/OS division/testing technique is very limited because the procedure always has bias related to the known nature of the financial time series and the ability to tweak-and-repeat as much as needed.

      Obviously, the questions you ask are very important and they fundamentally determine how you tackle the problem. However – regardless of how you build systems – it is obvious that you will always fit your selection until you reach a desirable OS result, across a single OS or several IS/OS pairs (because you want to have tests that are as good as possible before live trading). Therefore there is no true out-of-sample, no true way of live testing before actually trading live. So use all the data as in-sample in system development; you'll always be doing so implicitly anyway.

      Thanks again for your comment Fd :o) It’s always good to see you around!

      Best Regards,

      Daniel

  2. Carmine says:

    Hi Daniel,

    What has worked in the past may not work in the future, maybe because we employ a deterministic system that "does not understand the markets".
    How do manual traders operate without having enough statistics from past years on their manual system, while meanwhile trading with confidence and success?
    Simple: they're using a system that "understands markets".
    I am reminded of a simple but powerful manual system based on the crossover of moving averages that also takes into account the breaking of counter-trendlines (something used by advanced traders like Livermore and Donchian). This system works well and will continue to work over time because it "understands markets". The question arises: can any EA tell when a counter-trendline is broken and open a position, taking into account that there is a key support or resistance that needs to be monitored?
    I agree that statistical data is an important tool for evaluating any system, but we should aim to create a system that "understands markets" as much as possible and then has a favorable risk:reward. I am convinced that good results then come over time, because there are basically some aspects of markets that will never change (pull-backs, counter-trendline breaks, divergences, etc.). The statistical aspect can be very useful to decide whether a system is more or less prone to fail with a certain type of money/trade management with respect to another.

    Thanks for your efforts, kind regards

    • admin says:

      Hi Carmine,

      Thanks for posting :o) Your view suffers from survivorship bias, because you're not taking into account all the discretionary traders that have failed, even though they were successful for a while using a method that worked in the past. As I have said before, there is no such thing as a higher chance of success for a system that "understands the market". I've seen many discretionary traders fail after years of success because their method stopped working; it worked very well for years, and then it just didn't anymore. How do you know if a system "understands the market"? Past success of any kind is no guarantee of future results. There is no "ever-present" market characteristic that cannot change. Something can make a lot of sense (like Donchian channel breakouts did in the 70s and 80s) and then things can change drastically.

      I would like to stress that there are no such things as "ever-present inefficiencies". All systems do fail, regardless of their origin. Discretionary traders fail, algorithmic systems fail. All systems can and will fail. Your biggest hope is to detect failure quickly and to be able to exploit an edge while you have it. However it is very dangerous to think that you just need a method that "understands the market"; such a method – whichever you think that is – is also prone to failure. The market has no characteristic that is immune to change. Thanks again for commenting :o)

      Best Regards,

      Daniel

  3. Carmine says:

    Hi Daniel, my humble opinion is that your point of view regarding market changes is too drastic.
    So, basically you're doubting that even the areas of supply and demand will continue to be respected? Are you questioning one of the BASIC rules of any market in the world, even outside the "trading game"?
    Personally I firmly believe that concepts like supply/demand (SUP/RES), momentum and the formation of indecision candles (which the Japanese used for rice trading already centuries ago) at key levels are simply the basic nature of any market and cannot stop working.
    If you think that these concepts will become obsolete then I guess you'll be contradicted by the majority of serious traders and economists out there.
    So Wolfe waves (naturally occurring waves of the market) will stop working? I believe that your vision is too "algorithmic" .. lol

    kind regards,

    • admin says:

      Hi Carmine,

      Thanks for posting :o) I don't think that these concepts are going to "stop working" as concepts. You will always find some interpretation of these concepts that works under any market. However, hard algorithmic rules are another matter. Do you have any statistical proof that some of these "basic rules" have been turned into algorithmic rules that have been consistently profitable within the market? There is none that I know of (correct me if I'm wrong!).

      The reason why these concepts "always work" is that they are never put into some hard definition but are always labile and subject to interpretation. As soon as you turn them into something practical that can be traded algorithmically (as the turtle traders did with breakouts) you find yourself in a failure-prone situation. Anything you put into algorithmic terms can and does fail eventually; concepts never fail, because they never have clear and concise definitions. Some discretionary traders adapt their definitions of these concepts to new markets and survive, others don't and fail. The same goes for algorithmic traders: some develop new systems to tackle new markets when their old systems fail, some don't. I hope this better explains my point :o) Thanks again for commenting,

      Best Regards,

      Daniel

      PS: Try doing a statistical analysis of these “indecision candles”, I’ve done so in the past and the results are really interesting ;o)

  4. Gary says:

    You make some good points. So many have the wrong idea about OS validation, if they bother with it at all. Here are some things I like to see to have more confidence in a back test:
    a) Does the method make sense? Is it in tune with what I know about the markets?
    b) Does the method work on unrelated (as much as that is possible) markets?
    c) Is there a long back-test history so I can see how the method holds up under various economic conditions?
    d) Can I get hold of data that wasn't available to me or anyone else previously? Usually this means waiting for fresh real-time data to build up, then looking at the method again later.

    • admin says:

      Hi Gary,

      Thanks a lot for your comment :o) Some really good points indeed!

      Best Regards,

      Daniel

    • Ryan says:

      As a systems developer I am well aware of the issue that you raised in your blog post. I would like to reiterate the point of testing your system across multiple markets as a means to reduce the possible OOS bias.

      If I design an equities-based system on the ES, I want to see it work with the NQ, YM, TF and the S&P ETF sub-sectors. Additionally, I will look at foreign equity markets for confirmation, e.g. DAX, FESX, ASX200 etc.

      Finally, I will use statistical tests, e.g. a t-test, Monte Carlo analysis etc., as validation that my system is working 'live' in comparison to the back test.

      • admin says:

        Hi Ryan,

        Thanks a lot for your comment :o) Sure, but if something doesn't work across all these markets you will go back to the drawing board, modify your systems/process and try again until you find a system that works in the OS across all the symbols you are testing. It is a more complex fitting exercise, but it doesn't remove the problem that the OS eventually becomes an in-sample period, because you will attack it until you find something that works on it. Don't get me wrong – it's great to have something that works across many symbols – but this doesn't make your OS any more like a real OS, because you will have the same problems (you can use it as many times as you want and you know it in advance).

        I would be interested in hearing more about your validations. How do you use Monte Carlo analysis to validate your system? Thanks again for your comment,

        Best Regards,

        Daniel

        • Ryan says:

          If you build a system based on 'sound' logic with a 'limited' number of parameters, you're going to be hard-pressed to find an over-optimised system that also works using a multi-market approach. Unfortunately for newer traders, there is a part of the development process that relies on experience and having an understanding of the market that you're trading.

          SIM trading or small-position-size trading, and carefully tracking and 'watching' the system's performance in a live trading environment, can add a lot of value. This assumes that you will have at least 40-50 trades a year and that you're willing to maintain discipline and track at least 20-30 trades, while gaining a comfort level with the system, prior to increasing size.

          I use a number of metrics to analyse ongoing performance as a means of system validation. I had previously mentioned the simple-to-use t-test. In addition I use a Monte Carlo product called Market System Analyzer which will create a 'prediction envelope' over the X most recent trades and assign a % significance as a means to assess ongoing system performance. I would also recommend Howard Bandy's website and books for those interested in the system creation process. Bandy does discuss the issue that you have raised in your blog post.
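
          For illustration, a generic bootstrap version of such a 'prediction envelope' might look like the sketch below (the general idea only; MSA's actual algorithm may differ):

          ```python
          import numpy as np

          def prediction_envelope(trade_returns, n_future=30, n_sims=5000,
                                  lo=2.5, hi=97.5, seed=0):
              """Bootstrap equity paths over the next n_future trades and
              return lower/upper percentile bands of cumulative return."""
              rng = np.random.default_rng(seed)
              paths = np.cumsum(rng.choice(trade_returns,
                                           size=(n_sims, n_future),
                                           replace=True), axis=1)
              return (np.percentile(paths, lo, axis=0),
                      np.percentile(paths, hi, axis=0))

          # live cumulative equity falling below the lower band over the
          # most recent trades counts as significant underperformance
          ```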

          • Bob says:

            If your system is fitted, Monte Carlo of the type you suggested will do nothing for you but confirm the fact that your system is fitted. Those products do not perform a true MC. A true MC would involve the returns of every bar, including flat periods. An optimized system will by definition show high significance under such an analysis, and this is more misleading and dangerous than anything Daniel has mentioned here. Not only does this type of analysis not deal with curve-fitting, but it often misleads traders into abandoning good systems and accepting bad ones.

          • admin says:

            Couldn’t have put it better Bob, good insight :o)

    • Carmine says:

      Hi Gary,

      you're absolutely right on your points, but you are even more right on point a)

      The hard conclusion I have reached is as follows:
      1) With the computer resources available today, there is certainly a way to build a RELIABLE automated system that "understands markets" and will not fail over time under some type of human supervision.
      2) An EA that really understands the market (maybe under human supervision, like a hybrid system, for example) will never be released in any public place, even for huge money!
      3) There is only one way to accomplish this: build it yourself by combining all your manual trading experience (if any) with algorithmic building experience (which unfortunately I do not have, so I will continue with my "stressful" way of manual trading).
      4) In any public place, even for a fee, you will never find a system that "understands the markets", because those who have developed one keep it tight and would not sell it even for $100,000 (perhaps for €1 million, yes lol).
      If an honest trader offered this for an honest fee, with all the right explanations and implications, and it worked too well over time, he would not sell any more memberships or other products and his business would stop, as everyone would stick with the product that works (there are also agreements between several system sellers who are affiliated with each other... milking the poor traders who hope for an honest product).
      This is a bit like the oil industry that does not want alternative energy to advance because it would lose its value and meaning, or like a pharmaceutical lobby that, despite having found the ultimate cure for a disease, will not EVER release the drug because it could no longer sell the drugs with milder effects that pay off forever.
      Unfortunately, the hard reality is... we are in a monetary system and there is no charity :(

      Regards,

  5. Carmine says:

    Hi Daniel, correct me if I'm wrong, but in the last part of your answer I notice a subtle irony.
    Obviously it all depends on how you conducted this statistical analysis of yours, so your irony makes no sense; in fact, I could say the same about you, that you have expressed in this case a labile and vague discourse on indecision candles.

    Let’s be serious :
    – the indecision candle must be chosen at a KEY level (all imaginable types of Sup/Res: trendlines, counter-trendlines, relevant Fibonacci levels, horizontal sup/res levels themselves, pitchforks, etc.)
    – the indecision candle should be considered only on the H4 timeframe and higher (at most H1)
    – indecision candles should be considered only in the direction of the dominant market (wave analysis: higher highs, higher lows), and this analysis must be done in a top-down approach from the weekly chart
    – indecision candles must be confirmed by possible hidden/regular divergences

    Let me give you an example:
    Mister Market forms a butterfly pattern on the daily chart that shows a probable reversal of a bull market; we now see a strong doji/pin/engulfing/etc. appear and we realize that there is also a regular divergence on MACD/OSMA/RSI/STOCHASTIC/etc.; we realize that everything happens at the 61.8 Fibonacci level of the last bearish wave on the higher TF, where the wave study confirms the general direction of the market. Do you think this setup is something that has no relevance or that can disappear over time?

    I hope this gets the idea across, cheers

    • admin says:

      Hi Carmine,

      Thanks for your answer :o) Sorry if it sounded ironic, I didn't mean it that way. What I'm trying to say is that I invite you to put whichever criteria you have for indecision candles into algorithmic code that can be traded, then see how well it fared historically. You'll see that your success depends on how you frame these concepts (how you define "dominant market", how you mathematically define "key levels", how you define a "strong doji/pin/engulfing", etc.). There will always be an indecision-candle concept that "works" but it will never be the same across different markets. The exact mathematics will change and this exact mathematics will be subject to failure; it will always eventually fail because the market's precise definition of what constitutes an "indecision candle" will change.

      There will always be someone successfully trading "indecision candles", but who is successful will depend on which interpretation of this concept is currently working. What exactly works – which precise mathematical definitions – can and always does change. I hope this clears it up a bit :o)

      Best Regards,

      Daniel

  6. Michael says:

    So, you have a testing sample of 14 years. You take the first 7 as a “training” set.
    You can then create, say, 100 variations of the next 7 years as an OS set by performing permutations of the OHLC and saving those off for OS/WF tests.

    Can’t find a good link right now, but Jaffray Woodriff (interview in Hedge Fund Wizards) discusses this a bit. Of course, there will be better and worse ways to construct the artificial series on top of a known one, but a bit of common sense (don’t create lots of back to back gaps in price) and a bit of creativity (create wide scale of volatility in the variants – perhaps classifying them by degree of volatility) should be quite useful.

    It is also a potentially excellent basis for more ML exercises to derive the algo patterns from a basket of series more grounded in reality (past actual series) rather than completely artificial random walks.

    Best,

    • admin says:

      Hi Michael,

      Thanks for your comment :o) The question of using artificial series is a good one. Does using artificial series provide any guarantee of a higher probability of profit in live trading? Is this justified? An artificial price series might be something completely alien to the real price series. By shuffling the OHLC you can create series where the autocorrelation features of the range are lost or where hourly seasonality changes completely. You cannot shuffle the OHLC and preserve all price characteristics, because you also cannot know what future series will look like. Is being unprofitable on artificial series that lose some characteristics inherent to the real price series a guarantee of a higher probability of failure? These are all interesting questions! Thanks again for commenting Michael,

      Best Regards,

      Daniel

    • Bob says:

      It would be good to realize that his method has not helped his fund perform well in the last few years. Check it out: the last two years showed negative returns in a row.

  7. Hans says:

    Exactly my thoughts Daniel.

    OS is useless and I was never a fan of it. What I am doing is this: I have created a EURUSD system that was initially developed on data from 1999 to 2012 (5M timeframe) and I then steadily adapt it to new market data weekly as it arrives at the weekend, always optimizing the system as a whole, always from 1999 to date.

    This is pure curve-fitting, no question, but the system has created great profits last year and this year so far too. This is the only approach that has really worked for me after much IS/OS crap in the past that never worked live. It is profitable in live trading for the first time now, through the steady adaptation while always keeping the complete data in the optimization.

    Just offering some real trading experience here….

    • admin says:

      Hi Hans,

      Thanks a lot for your comment :o) What you say makes sense; it matches similar experiences I've had with live trading. Systems that are stable and perform well above data-mining bias through a significant period of time tend to do well in live trading. High stability, low market exposure and position management that tries to avoid giving back profits are some of the most important characteristics I've found to be relevant. Thanks a lot for sharing your experience and for reading my blog. Keep the comments flowing :o)

      Best Regards,

      Daniel

  8. Rodolfo says:

    Hi Daniel,

    after years spent in Asirikuy swinging from basic IS/OS, rank analysis, walk-forward and a lot of statistics to evaluate past results, we come back to the starting point. It reminds me of a piece of literature from school:

    “We shall not cease from exploration, and the end of all our exploring will be to arrive where we started and know the place for the first time.” – T. S. Eliot.

    So, the conclusion we can draw from all the evidence we collected through our journey in trading is that the best way to backtest strategies to obtain their parameters is to use all the historical data available.

    In light of this discovery (i.e. that the OS is meaningless), wouldn't it be right to recalculate the parameters for all the Asirikuy systems we're trading live?

    Then, after this exercise, it would be interesting to evaluate how our EAs would have behaved in 2013.

    Best,
    Rodolfo

    • admin says:

      Hi Rodolfo,

      Thanks a lot for your comment :o) Well, you do say some truths here (a beautiful quote, by the way ;o)). I feel that after exploring many different options we are now back at a place similar to where we started, but with a much greater understanding of why it makes sense. Additionally, we now know about data-mining bias and other problems, so we can now do this in a more proper manner. You do pose an interesting question regarding our systems and it might be worth reoptimizing them to see if there is a better set of parameters covering our whole data. Now that we have the NST we can do this quickly and we can also optimize for things such as linearity, which we couldn't do in the past (with the limitations of the MT4 tester). Thanks again for commenting, it's always nice to read comments from long-term Asirikuy members :o)

      Best Regards,

      Daniel

  9. Bob says:

    Daniel,

    BTW, did you delete a post of mine or did some server error happen? Regardless, this post is a major accomplishment of yours (and ours).

    • admin says:

      Hi Bob,

      I haven’t deleted any posts from you, perhaps it was a server issue? If the spam filter deletes a post for some reason always feel free to re-post (although your username and email are cleared for posting without screening so spam filters shouldn’t be an issue).

      By the way, thanks a lot for all your comments, they really helped me come up with the way of thinking that finally concluded with this post. I agree, this is a major breakthrough for all of us. Thanks a lot for reading and sharing your thoughts with us :o)

      Best Regards,

      Daniel

  10. Tony says:

    Hi Daniel,

    Market data changes every several months, making 12 years of back-testing unreliable. However, have you considered very short back-tests? For instance, a 2-month back-test, then trading for a month. Or maybe even testing for 3 weeks, then trading for 2 weeks!

    If one were to find the average length of change, this could be done. Or even the minimum time the market is likely to remain in the same gear. So, if we know that the market is very likely to stay the same for at least 4 months, then we test for 3 months and trade the 4th.

    Bit cheeky, I know!

    • admin says:

      Hi Tony,

      Thanks for writing :o) Sadly, things are not that easy. The market does not change like a sine function with a fixed period. This means that the market does not change "every 2 months" or at any regular pace. The market can behave in one manner for 2 years and then change completely in a single month, or it can behave in some manner for two months and then change completely during the next two. When you optimize using a small period of time you introduce a very heavy curve-fitting bias: your system becomes extremely well adapted to a very narrow set of market conditions. By making sure that a system works across a long period of time (12-25 years) you reduce curve-fitting bias and allow your system to work under a much more varied set of market conditions. Granted, the market can still change and render your system useless (change to something unseen during the past 25 years) but you usually get enough productive time before this is the case – at least in my experience. If you believe that current market conditions are more important then you can simply give them more weight within the optimization process while still taking long-term conditions into account (think of an exponential weighting function). Thanks again for commenting :o)
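
      As a sketch of that weighting idea (my illustration only, not an existing Asirikuy implementation), the objective being optimized could weight each bar's PnL by recency, so that current conditions count more while the whole 12-25 year history still matters:

      ```python
      import numpy as np

      def weighted_objective(pnl_per_bar, half_life=5000):
          """Exponentially recency-weighted PnL: a bar `half_life` bars in
          the past counts half as much as the most recent bar."""
          age = np.arange(len(pnl_per_bar))[::-1]   # 0 = most recent bar
          return float(np.sum(0.5 ** (age / half_life) * pnl_per_bar))
      ```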

      Best Regards,

      Daniel

  11. Rafael Del Rey says:

    It’s been almost 3 years since you pubished this article. Have you evolved into some kind of better way to avoid data mining bias? Maybe System Parameter Randomization (by Dave Walton). Btw, I have just joined Asirikuy, but didnt have time yet to explore its forums.

    • admin says:

      Hi Rafael,

      Thanks for writing and joining our community. This type of approach (parameter randomization) only works when parameters represent very correlated variations of the same system, meaning that changing a parameter does not produce an entirely new system but a variation of the same system. When studying simple cases like MA crosses, varying things like periods, this generally works, but it loses theoretical ground when you put it against a truly general case.

      For example, imagine that you had an MA cross system that worked very well under parameter randomization, but you now consider the indicator used as a parameter and vary it across 20 different indicators. You will certainly see failure across most of the parameter space, because these parameter variations now imply complete system variations (variations in parameters create completely different systems), yet this does not mean that your previously found MA system is less robust. When you go into system mining using unrestricted spaces, the idea of parameter randomization no longer applies.
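
      To make the idea concrete, the kind of check being discussed looks roughly like this sketch (a generic perturbation test of my own; see Walton's paper for the actual SPR procedure):

      ```python
      import numpy as np

      def randomization_check(backtest, best_params, n_draws=200,
                              scale=0.2, seed=0):
          """Perturb each optimized parameter by up to +/- `scale` and
          return the median performance: a broad optimum degrades
          gracefully, a spiked (over-fitted) one collapses."""
          rng = np.random.default_rng(seed)
          results = []
          for _ in range(n_draws):
              perturbed = {k: v * (1 + rng.uniform(-scale, scale))
                           for k, v in best_params.items()}
              results.append(backtest(perturbed))
          return float(np.median(results))
      ```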

      Thanks again for writing, feel free to comment on the forums as well :o)

      Best Regards,

      Daniel
