Long term profitable back-testing: No guarantee of future profitability

When we want to trade the financial markets we need to design a strategy that gives us a better-than-random probability of success: a trading edge. The first impulse is to look at trading mechanisms that have succeeded in the past and then trade these exact same techniques going forward in order to generate similar profits. We commonly refer to this simulation process as back-testing: we evaluate how a trading strategy behaved in the past and then “tweak” and modify our system in order to increase its historical profits. Today I am going to talk about some truths behind back-testing and why there are several important pitfalls surrounding this trading premise.

Back-testing is a tool that allows you to evaluate the historical performance of a trading strategy. Supposing that your strategy is not significantly vulnerable to historical spread variations or slippage (or that you can evaluate these accurately in some way), we can say that the representation you get is a good proxy for what you would have obtained if you had traded the strategy for real in the past. The idea therefore seems pretty straightforward: generate a system that performed well for as long as possible into the past and this system should work in the same way going forward. In doing so we make several important assumptions, such as that the market will behave in the future as it has in the past and that the longer we back-test into the past the more market conditions we cover. However, there are some important fallacies here.

Does the market really behave as it has in the past? Does having a strategy with long term profitable back-testing guarantee profitability to any degree, or even increase our chances beyond random chance? Does having a 13 year back-test give you a higher chance of profit than a 2 year back-test?

[Image: 13-09-2013 19-01-10]

Let us suppose we are designing a strategy in the year 2001 to trade the EUR/USD. We use DEM/USD data from 1991 to 2001 (10 years) and we are going to trade the strategy for the next few years. We design our system to be extremely stable (a very linear equity curve without compounding) and we also ensure that drawdown periods are limited and that the Ulcer Index has a reasonable value. We expect to make at least a 10% yearly profit for each 10% of maximum drawdown. We want something that has worked really well in the past, and we put it on a live account with the expectation of making money going forward.
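To make these design criteria concrete, here is a minimal sketch of how they could be measured on an equity curve. It assumes the usual definition of the Ulcer Index (the root mean square of the percentage drawdowns from the running peak) and a non-compounded yearly profit expressed as a percentage of the starting balance. The function names and the sample equity values are hypothetical and only serve as an illustration.

```python
import math

def drawdowns_pct(equity):
    """Percentage drop from the running peak at each point of the equity curve."""
    peak = equity[0]
    result = []
    for value in equity:
        peak = max(peak, value)
        result.append(100.0 * (peak - value) / peak)
    return result

def ulcer_index(equity):
    """Root mean square of the percentage drawdowns (assumed standard definition)."""
    dd = drawdowns_pct(equity)
    return math.sqrt(sum(d * d for d in dd) / len(dd))

def max_drawdown_pct(equity):
    """Deepest percentage drop from a previous peak."""
    return max(drawdowns_pct(equity))

def profit_to_drawdown(equity, years):
    """Average yearly profit (non-compounded, % of starting balance)
    divided by the maximum drawdown (%)."""
    yearly_profit_pct = 100.0 * (equity[-1] - equity[0]) / equity[0] / years
    mdd = max_drawdown_pct(equity)
    return float("inf") if mdd == 0 else yearly_profit_pct / mdd

# Hypothetical equity curve sampled over 2 years (illustrative numbers only)
equity = [100, 110, 99, 120]
print(max_drawdown_pct(equity))       # 10.0
print(ulcer_index(equity))            # 5.0
print(profit_to_drawdown(equity, 2))  # 1.0
```

A ratio of 1.0 or more corresponds to the “10% yearly profit for each 10% of maximum drawdown” target. As the rest of this article argues, meeting it in a back-test says nothing certain about live trading.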

The above picture shows you what could have happened. You have a system that looks very good on paper and then goes into a straight, deep drawdown right after you start trading it. However, the picture below shows you the exact opposite scenario. This system is, for all intents and purposes, statistically similar to the first one within the 1991-2001 design period, and it then shows rather similar performance during the next 13 years, up to the year 2013. Both systems have 10 year long profitable back-tests, yet one collapses immediately after we start trading it while the other is profitable all the way to the present.

What is the difference between them? Many of you would be tempted to perform elaborate statistical analysis to find differences between them, so that you can say: “system A failed because of this and this while system B didn’t fail because of this and this”. In reality you can find no definitive answer (at least I haven’t); there is always a probability that a system will fail outright, regardless of its past statistical characteristics. You therefore cannot say something along the lines of “use a system with an Ulcer Index below X and you will surely be profitable”. There is always a chance that a system will fail, and this chance isn’t small at all.

[Image: 13-09-2013 19-31-32]

I would also like to make it clear that the “out of sample” testing some people use before live trading (for example, designing a system using 5 years of data, testing it out of sample on 1 more year and only trading it if the results are similar) is a complete fallacy. By making the decision to trade or not based on the system’s characteristics during this “out of sample” period, you have already turned it into an in-sample evaluation period. A similar note applies to the walk forward analysis crowd: a system whose parameters are varied systematically still has a fixed logic that was designed in hindsight to have worked in the past. The only thing this establishes is whether a trading technique varies significantly through its parameter space; if the market changes outside of this space, the strategy is doomed regardless of any attempts to adjust its parameters. Varying a strategy’s parameters does not constitute a more realistic approach to profits under real market conditions, because the strategy’s logic is known through the whole test.
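The selection effect described above can be illustrated with a small simulation. This is only a sketch under strong assumptions: every “strategy” is pure noise with zero edge, each trade result is drawn from a normal distribution, and the acceptance threshold of 0.1 is an arbitrary, hypothetical filter. None of this comes from real trading data.

```python
import random
import statistics

def simulate_selection(n_strategies=2000, n_trades=100, threshold=0.1, seed=42):
    """Filter zero-edge strategies on their 'out of sample' results and
    compare that period with a later, truly unseen 'live' period."""
    rng = random.Random(seed)
    selected_oos, selected_live = [], []
    for _ in range(n_strategies):
        # Zero-edge strategy: every trade result is noise with mean 0
        oos_mean = statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n_trades))
        live_mean = statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n_trades))
        # The decision to trade is based on the "out of sample" period,
        # which is exactly what turns it into an in-sample period
        if oos_mean > threshold:
            selected_oos.append(oos_mean)
            selected_live.append(live_mean)
    return {
        "n_selected": len(selected_oos),
        "avg_oos_result": statistics.fmean(selected_oos),
        "avg_live_result": statistics.fmean(selected_live),
    }

summary = simulate_selection()
print(summary)
```

Every strategy that passes the filter looks profitable in its “out of sample” period by construction, while the average live result of the same strategies stays close to zero, which is all a strategy without an edge can deliver.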

So if there is always a chance for a strategy to fail, despite brilliant back-tests and even walk forward parameter analysis, and this chance isn’t small, how do we trade profitably under unknown market conditions? How do you know what performance to expect and how do you choose the system(s) to trade? This is in fact what I have been working on inside Asirikuy during the past few months: attempting to develop a quantitative idea of the chances of success under unknown market conditions with absolutely no hindsight, and trying to design a methodology that minimizes the chances of failure under new market conditions.

What I am working on is a method that goes systematically through a mechanical system generation process in the past, repeating this process in order to generate new strategies based on different desired in-sample characteristics, and then gathering statistical information about their performance under unknown market conditions. This research has generated significant data about the likelihood of success for trading strategies on different symbols/timeframes and the period lengths that have produced the best results. It is a type of X-ray of the market: it doesn’t merely give us information about how a system would have performed in the past (which is back-testing), it tells us the chances of success we would have had in the past if we had mechanically generated systems using given sets of criteria. It goes beyond walk forward analysis and other such techniques. It tells us what the best number of systems to pick is, how best to pick them and what sort of real performance we should expect (not the delusional back-testing profits we’re used to seeing).
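The general shape of such a study can be sketched in a few lines. This is not the actual Asirikuy implementation: the synthetic noise returns, the toy momentum rule, the candidate lookbacks and all function names are assumptions made only for illustration. The point is the structure: generate a system mechanically on an in-sample window, record how it does on the following unseen window, slide forward and repeat.

```python
import random

def make_returns(n, seed=7):
    """Synthetic per-bar returns (pure noise) standing in for real price data."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def rule_pnl(returns, lookback, start, end):
    """Profit of a toy momentum rule over bars [start, end): long if the sum
    of the previous `lookback` returns is positive, short otherwise."""
    pnl = 0.0
    for t in range(start, end):
        signal = 1.0 if sum(returns[t - lookback:t]) > 0 else -1.0
        pnl += signal * returns[t]
    return pnl

def generate_system(returns, start, end, lookbacks):
    """Mechanical 'system generation': keep the lookback with the best
    in-sample profit (a stand-in for any in-sample selection criterion)."""
    return max(lookbacks, key=lambda lb: rule_pnl(returns, lb, start, end))

def generation_study(returns, in_sample, out_sample, lookbacks):
    """Repeat the generation process on rolling windows and record the
    in-sample and unseen-period profit of each generated system."""
    results = []
    start = max(lookbacks)  # leave room for the longest lookback
    while start + in_sample + out_sample <= len(returns):
        split = start + in_sample
        best = generate_system(returns, start, split, lookbacks)
        is_pnl = rule_pnl(returns, best, start, split)
        oos_pnl = rule_pnl(returns, best, split, split + out_sample)
        results.append((best, is_pnl, oos_pnl))
        start += out_sample
    return results

def success_rate(results):
    """Fraction of generated systems that were profitable on unseen data."""
    return sum(1 for _, _, oos in results if oos > 0) / len(results)

returns = make_returns(3000)
results = generation_study(returns, in_sample=500, out_sample=100,
                           lookbacks=[5, 10, 20, 50])
print(len(results), "generated systems, success rate:", success_rate(results))
```

Running the same study with different in-sample lengths, selection criteria or symbols is what would allow the historical success rates of different generation methods to be compared. On pure noise, as here, no method should be expected to succeed consistently.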

With this type of analysis you can learn, for example, whether it’s better to generate a system using a 1 year back-test or a 10 year back-test and whether it’s better to choose a system with a low Ulcer Index or with high profitability. You can also see how frequently these types of systems failed in the past (what percentage failed and how deep the failures were) and whether there are any evident correlations that might help you reduce the chances of failure. You can also calculate the chances of failure for portfolios and see how different symbols and time frames differ. Is it better to generate systems on the daily or the 1H? What are the problems/benefits of each time frame regarding performance under unseen market conditions? Was higher trading frequency better or worse? You can answer all of these questions!

This is computationally intensive, massive ongoing work, but I believe it will take us to a whole new level of understanding within our community. If you would like to learn more about algorithmic trading and how you too can use mechanical system generation to develop trading methodologies, please consider joining Asirikuy.com, a website filled with educational videos, trading systems, development and a sound, honest and transparent approach towards automated trading in general. I hope you enjoyed this article! :o)

 


16 Responses to “Long term profitable back-testing: No guarantee of future profitability”

  1. Umberto says:

    Hi, I have read your articles for 4 years and have been a subscriber to Asirikuy for 2 years. At the beginning all your work was oriented towards building trading systems for the long term, because only an EA that had proved profitable over the past 20 years could go live with the hope of making new profits.
    Then, obviously, you had a drop in the performance of your trading systems and embraced Walk Forward Analysis, which until then you did not think useful because it works on intervals that are not long term.
    And today I read that long term profitable back-testing of a trading system does not guarantee future profitability!
    Also, until a few months ago almost all the work was on MT4; today MT4 is abandoned because it does not help the developer.
    This is definitely a long journey of knowledge for Asirikuy, but in my opinion this path varies and evolves over time without fixed points … When will there be a time of stability, with fixed points that do not evolve, change over time or get overturned?

    • admin says:

      Hi Umberto,

      Thanks for your comment :o) It’s a constantly evolving journey: our approach evolves as new things are discovered and some assumptions are proved to be wrong or inaccurate. There are some things we learn that we never unlearn (we learn, for example, that some approaches definitively don’t work), and there are some things we learn which we incorporate into a trading approach. It’s not that things are “constantly changing”, but that new findings are added to our knowledge base and to the way in which we approach trading.

      Trading is an ever-evolving journey and our methods need to improve as we uncover new things. We will always change things as our understanding improves, this is just the way in which knowledge-building works. Thanks again for posting :o)

      Best Regards,

      Daniel

      • BL says:

        What puzzles me most is that every day, right now and for years and maybe decades, *some people* (a.k.a. banks, hedge funds, managers, etc.) have known how to make such systems work and are using them.

        Hopefully one day we will be able to compete with them!

        • admin says:

          Hi BL,

          Thanks for your post :o) Several hedge funds use tactics very similar to ours; in fact many CTAs, hedge funds and banks are taking losses right now in their systematic trading divisions (see for example the Barclay systematic trader index http://www.barclayhedge.com/research/indices/cta/sub/sys.html). The fact is that traditional alpha generation demands periods of extended drawdown (years), which are simply too difficult to bear for most traditional investors under a market without a positive bias.

          Any trading tactic we come up with will also have these shortcomings (there is no holy grail). My goal is to be able to reduce the probability of net losses as much as possible and come up with realistic measures of performance based on historical simulations of a whole trading methodology (system design, selection, trading, etc). Thanks again for posting!

          Best Regards,

          Daniel

  2. Bob says:

    “What I am working on is a method that goes systematically through a mechanical system generation process in the past…”

    Is this just introducing more data-mining bias since you analyze systems generated already?

    “With this type of analysis you can learn for example if it’s better to generate a system using a 1 year back-test or a 10 year back-test and if it’s better to choose a system with… Was higher trading frequency better or worse? You can answer all of these questions!”

    You can only answer questions about systems already generated but this will provide no clues about future systems. This is like hindsight.

    I think you, like most, have hit the wall: “past performance is no guarantee of future success.” It is as simple as that.

    • admin says:

      Hi Bob,

      We agree, past performance does not guarantee future results :o)

      The most you can do, which is what I am doing right now, is to analyze how true this has actually been for different symbols and time frames. For example, suppose you run a system generation analysis and find that on a Forex pair the chance of success of a strategy generated using 10 years of back-testing data, under many different scenarios during the past 25 years, has historically been only 5%. Would you trade this symbol using a 10 year back-testing method?

      Sure, this assures you nothing about the future (no one knows the future), but it does tell you that at least this has not worked historically. You can find strategies that are very profitable on any instrument/time frame, but only a few setups have been able to generate consistent results historically from a system generation process under new market conditions. Does this mean they will continue to do so in the future? Obviously I don’t know if this is the case (I don’t know the future), but at least I know for which setups this hasn’t been true, and in all likelihood it will also be false for them going forward.

      There is clearly no way in which you can guarantee that you know something about the future, but at least you can know how successful the exact method you are using has been in the past. This is unlike simply designing a system with data for the past 10 years, for example, because by curve-fitting you can always generate profitable systems of N degree complexity for any testing period; however, you cannot make a methodology successful when it just isn’t. Either market conditions have been “stable” during the past or they haven’t.

      Thanks again for posting :o)

      Best Regards,

      Daniel

  3. JojoW says:

    Hi Daniel,

    I think the reason why your first strategy wasn’t successful is that, even if you have a profitable strategy that was back-tested on a large chunk of historical data, the strategy still needs to manage risk, i.e. if possible move the stop to breakeven or use a trailing stop. A simple look at the equity chart tells me that you are not cutting losses quickly and do not let the profits run.

    My question would be whether you have traded this strategy live with real money, and at which point you would give up on it.

    One positive message is that your strategy Atinalla FE is still profitable, although I don’t trade it live yet.

    Best regards,
    JojoW

  4. Daniele says:

    hello all!

    I’m Daniele from Italy. I’m new to forex trading and I opened a demo account with Dukascopy. I’m trying to test the EA named WATUKUSHAY_FE_VF1 but JForex does not recognize it. Is it because of the file extension? Why is it different from the classic mq4? Is there any hope for me to run it on the JForex platform? Thanks all for your help!
    bye!

    • admin says:

      Hi Daniele,

      Thanks for your post :o) This EA is meant to be run only on MT4. Our F4 framework and systems, which are capable of running on JForex, are available only to Asirikuy members. I hope this answers your question.

      Best Regards,

      Daniel

  5. Daniele says:

    ah ok thanks! i will check Asirikuy site to learn more.
    thanks again, bye!

  6. David says:

    Daniel,

    A very thought provoking article, as always! I’ve been a silent follower of your blog since I was introduced to it a few months ago.

    Software development has been my career since 1979. I started studying FX in 2006, and since then have written more than 400 MQL4 indicators and EAs. I am ‘hanover’ on the Forex Factory (and other) forums.

    I acknowledge that, regardless whether one is a system or discretionary trader, there are no guarantees that the past will repeat itself in the future. That is a risk we must embrace. On the other hand, either we believe that we can gain some kind of ‘on balance’ edge from studying past behavior and patterns; or we don’t, in which case both analysis and trading are effectively futile. Of course all of this is merely stating the obvious.

    I’m not a member of asirikuy.com, and I’m not familiar with the specifics of your trading systems. They may already embody some of the concepts that I’m about to describe.

    I have reservations about the notion of building automated traders through the process of repetitive data mining and optimization. Instead, I believe that it’s better to first devise abstract trading ideas that seem plausibly robust, and only then use the available data to test them.

    Rather than applying more complex statistical math, or digging deeper into ways of crunching existing historical data, it’s my view that the best probability of obtaining system robustness is to begin with an understanding of the drivers that underlie price movement (more info here — http://www.forexfactory.com/showthread.php?p=4287632#post4287632 —). Some examples: macroeconomics, money flows, levels of supply and demand, orderflow, breakouts during times of high liquidity, central bank/heavyweight agendas, self-fulfilling prophecy, effects of high impact news, fading extreme overboughtness/oversoldness, trapped trader behavior, key levels observed by heavyweight players, session/time-of-day idiosyncrasies, the need to maintain triangular (and across-the-board) equilibrium (e.g. EURJPY = EURUSD x USDJPY). These are somewhat different to the conventional TA-based ideas of indicators, line studies, candle and chart patterns, etc; and of course the obvious questions are (i) how can we possibly gain access to some of this info from mere price charts — or, conversely, what types of discernible price patterns, if any, are created by these drivers — and (ii) in the case of an automated trader, how can these patterns be nailed down mechanically to allow an algorithmic solution?

    Anyway, for better or worse, that summarizes the current direction of my research. Forex Factory member ‘mim2005’ and I have been collaborating for almost 12 months now, and we have 3 EAs that have backtested profitably using a couple of the principles that I described. If you’re interested in obtaining more info and/or following their performance, here’s a link — http://www.forexfactory.com/showthread.php?t=448350

    Best wishes, and many thanks for your amazingly comprehensive contribution to the field of automated forex trading.

    David

    • admin says:

      Hi David,

      Thanks for posting :o) I’ve seen both types of strategies fail in live trading: strategies developed by data mining and strategies developed with in-depth analysis relying on fundamental characteristics of price action (supply/demand, volatility, etc). I believe that the method you use for development is irrelevant; in the end you are simply finding some edge that either survives or fails under live trading conditions. The data used is in the end the exact same price series, from which the same information can be derived. What I am trying to figure out is what actually causes edges to fail or succeed in the future, as I am convinced that there are some clear characteristics that allow you to know when you have high or low chances of success under unknown conditions. By this I mean quantitative characteristics (for example, related to mathematical expectancy) and not merely qualitative characteristics (like the edge finding mechanism, which I believe might be largely irrelevant). Thanks for sharing your opinion and approach :o)

      Best Regards,

      Daniel

  7. Andy says:

    I think we tend to take for granted that our systems are vulnerable to changing markets. These examples serve as an excellent reminder that we cannot count on markets to perform as they have in the past.

  8. Krystian says:

    Hello Daniel,

    I like your blog and your professional approach – not only in automated system field, but also in your helpful and mind stimulating articles :)

    I am a beginner in the stock market field. I’m not a Forex trader yet. However, I think some concepts from this article apply to any market, and those are the concepts I want to focus on.

    1. Out-of-sample testing fallacy.
    In one of your comments you wrote:

    “The most you can do – which is what I am doing right now – is to analyze how true this actually has been for different symbols and time frames. For example, if you run a system generation analysis and you find out that on a Forex pair the chances of success of a strategy generated using 10 year back-testing data – under many different scenarios during the past 25 years – has been only 5% historically, would you trade this symbol using a 10 year back-testing method?”

    My question is: how does this differ from out-of-sample testing? I think that out-of-sample testing is not a guarantee of success, but rather a “cut test” that eliminates strategies that were adapted only to the selected historical period.

    I think that the major goal of an automated trading system trader is to find a system that uses signals (not noise) that correspond to fundamentals in some way. Of course, when fundamentals change in the future, the system will fail.

    2. Automated systems in general
    a) Do you think that there are universally (very) profitable systems out there (discovered or not), which worked well at any time in history and will work well in the future?

    b) If you think that there are such systems, do you think they can still be profitable in the future if a large group of people knows about them?

    My theory is this: there is no universally very profitable system (at least not a simple one, but rather something that uses a LOT of data to predict prices). People’s behavior can change if a profitable system is discovered (a lot of people would use it, and it would stop working from that moment). Having said that, a good system IMO can recover if a lot of people abandon it to seek other alternatives.

    The most we can hope for is a system that is quite challenging to use (not many people would use it) or not very profitable (people are always looking for amazing alphas and fast profit).

    I hope you understand what I mean. English is not my first language and I’m still a beginner :)

    Regards,
    Krystian

  9. Joachim says:

    Hello Daniel:
    Maybe the key to good performance of a system in the future is its SIMPLICITY.
    Perhaps a good idea is to maximize the Profit Factor and minimize the Ulcer Index, and only on systems with no more than 4-5 parameters.

    Regards
    Joachim

    • admin says:

      Hi Joachim,

      Thanks for your post :o) That is an intuitive first approach, but when you run historical tests you realize that there is little link between out-of-sample success and “system complexity” (more complex systems do not have a higher or lower probability of performing better under out-of-sample conditions). Some systems with 1 parameter fail and some with 20 parameters fail. Having X number of parameters does not mean that your system is less or more robust; it only means that you can understand it better. What matters more is how similar variations of a given idea work out, so total parameter space evaluations might be more important. In any case, it’s definitely not that simple ;o) Thanks a lot for posting,

      Best Regards,

      Daniel
