Trying to Measure a Trading System’s Robustness : The Hardness Index

In yesterday’s post I talked about the Ulcer Index and how this measurement, along with the Pain Index, allows us to see how easy or difficult a strategy is to trade from a psychological point of view. Both of these indexes use draw down period lengths and depths to tell the trader whether a strategy will be very “stressful” to trade on an account. Thinking about these measurements and how they pertain to a non-profit related aspect of system quality, I realized that we have indexes to measure almost every aspect of a trading system except its robustness. Over the following few paragraphs I will introduce you to an idea I had, the Hardness Index, which allows us to measure how robust a strategy might be when compared to another one.

When we think about how good or bad a trading strategy is we usually think in terms of draw downs and profits. The only thing that most people, especially new traders, tend to value is the ability of a trading system to bring profits with limited draw downs. The problem with this is that most, if not all, of the statistical information used to derive these values comes from simulations. It therefore measures expected rather than real profit and risk, and as a consequence we may fool ourselves into believing that a system is “very good” when in reality it is simply not.

A very important characteristic of a strategy, beyond how much profit or draw down it brings, is its robustness, which can be defined as the likelihood that a strategy will continue to work under future market conditions. Certainly there is no “crystal ball” to tell us if a system will fail or continue to work, but there are several things we know which we can use to mathematically measure the likelihood of one system failing relative to another. For example, we intuitively know that a strategy which has reliable profitable simulations on 10 different instruments is bound to be less prone to failure than a system which trades a single currency pair. We also know that a strategy which has 20 variables optimized using a 0.01% step has a high probability of being curve fitted (and therefore of failing under new conditions).
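To make the curve-fitting intuition concrete, a rough sketch can count how many parameter combinations an optimization run gets to choose from. The parameter ranges and step sizes below are hypothetical, chosen only to contrast a coarse search with a fine one, and do not come from any actual system.

```python
from math import prod

def grid_points(low: float, high: float, step: float) -> int:
    """Number of values tested for one parameter on an inclusive grid."""
    return int(round((high - low) / step)) + 1

def search_space_size(params: list[tuple[float, float, float]]) -> int:
    """Total parameter combinations: product of the per-parameter grid sizes."""
    return prod(grid_points(low, high, step) for low, high, step in params)

# Hypothetical setups (illustrative values only, not real system settings).
coarse = [(10, 100, 10)] * 3   # 3 variables, 10 tested values each
fine = [(10, 100, 1)] * 20     # 20 variables, 91 tested values each

coarse_size = search_space_size(coarse)
fine_size = search_space_size(fine)

print(coarse_size)             # 1000
print(fine_size > 10**39)      # True
```

The larger the set of combinations the optimizer can pick from, the easier it is for one of them to match past data by chance, which is the sense in which finer steps and more variables raise the risk of curve fitting.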

Using this information I have devised an equation, the Hardness Index, which tells us how one strategy compares to another in terms of robustness. The equation takes into account all the aspects I consider relevant to system robustness, in particular when such strategies are derived from simulations. Note that this index does NOT contain any profit or draw down information but measures trading system quality simply through the likelihood of the strategy being able to withstand future conditions. This is how the index is calculated:

The higher the value of the hardness index, the more robust a strategy is and the better its chances of surviving future changes in market conditions. As you can see, a system that trades more instruments with the same parameters, has more out of sample years and uses coarser optimizations has a much better chance of surviving in the future. The index puts a lot of weight on out-of-sample testing, as this shows whether the strategy actually survived market conditions it was not optimized for. The hardness index rewards systems which have fewer degrees of freedom and longer testing periods while it “punishes” strategies which have a higher likelihood of being curve fitted (finer optimizations, shorter testing periods, more degrees of freedom, etc).
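The directional behaviour described above can be sketched with a toy score. This is a stand-in and not the Hardness Index equation: the function name, the inputs and the functional form (including the squared out-of-sample term) are all assumptions, picked only so that the score rises with more instruments, longer tests, more out-of-sample years and coarser steps, and falls with more optimized variables.

```python
def toy_robustness_score(instruments: int,
                         in_sample_years: float,
                         out_of_sample_years: float,
                         optimized_variables: int,
                         values_per_variable: int) -> float:
    """Illustrative stand-in only, NOT the Hardness Index formula.

    Rewards breadth and test length, weights out-of-sample years more
    heavily (assumed squared term), and penalizes degrees of freedom.
    """
    if optimized_variables < 1 or values_per_variable < 1:
        raise ValueError("need at least one variable and one tested value")
    reward = instruments * in_sample_years * (1 + out_of_sample_years) ** 2
    penalty = optimized_variables * values_per_variable
    return reward / penalty

# Two hypothetical systems (made-up inputs, not real strategy data).
broad = toy_robustness_score(7, 10, 2, 3, 10)      # many pairs, coarse search
narrow = toy_robustness_score(1, 10, 0, 10, 100)   # one pair, fine search

print(broad > narrow)  # True
```

Only the ordering of the two scores is meaningful here. The absolute numbers depend entirely on the assumed form and should not be compared with the index values quoted in this article.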

The value of the index can vary significantly between systems. For example, let us consider the current Quimichi strategy, a very robust system based on 7 different currency pairs, and calculate the hardness index for it. The solved equation is shown below:

The result above shows that the hardness index for Quimichi, taking into account its development process, is 466, while a system like Watukushay FE, which trades only on the EUR/USD and had many more variables optimized in a much finer manner, has a hardness index of merely 8. This shows that the difference in robustness between the two systems is very large. Quimichi may easily be considered far more robust as it trades more currency pairs, had fewer variables optimized, and had those variables optimized in a much coarser fashion than Watukushay FE.

Although there may be several ways to improve the calculation proposed above, this Hardness Index gives us a good idea of where different strategies stand in terms of how robust they might be to changes in future market conditions. The index is my first attempt to measure robustness in a mathematical way. This is necessary because robustness is often neglected in favor of other statistics such as profit to draw down ratios, even though it might be one of the most important characteristics of a trading strategy. Of course this is just a proposal, and any suggestions or contributions to the above equation would certainly be welcome, so feel free to leave any comment, question or suggestion you may have!

If you would like to learn more about my work in automated trading and how you too can design trading strategies with robustness in mind, please consider joining Asirikuy.com, a website filled with educational videos, trading systems, development and a sound, honest and transparent approach towards automated trading in general. I hope you enjoyed this article! :o)


12 Responses to “Trying to Measure a Trading System’s Robustness : The Hardness Index”

  1. Maxim says:

    Daniel,

    A really original idea!
    I’d like to know whether it is possible to calculate an index for a portfolio.
    Will it be just a sum of indices of the underlying systems or something more complex?

    Maxim

  2. Explorer says:

    Daniel,

    How about incorporating multiple time frames above 1hr into the formula?

    Explorer

  3. fd says:

    Hi Daniel,

    Due to my profession, as you know, I am used to looking at the risks of an approach ;)

    While your proposed formula might give a reasonable indication of how likely it is that a strategy has been overfitted, it should make no one believe that a strategy with a high “Hardness Index” can’t fail or will definitely lead to profitable results. And that’s the risk.

    It’s difficult to assign practical relevance to this index number. Shouldn’t you trade a strategy with a low number? From which index number is it too risky to trade? What is the exact risk? What are the consequences of having a certain number? I think the only reasonable thing you could do is to check if there are chances to improve this score through amendments. But how far can/should you optimize it? How far do market conditions need to change before a strategy scored very robust fails compared to a strategy with a low score?

    From my point of view, a robust strategy is one which does not open a single position if market conditions don’t fit (and if it finds no way to adapt to them). The ultimate test would be to drop a strategy on all pairs. On pairs the strategy has not been written for, the entry logic needs to be good enough to detect when market behaviour will not allow for profits. I mean, not through simple detection of the Symbol() name but through observation of the market, and not after a long losing streak.

    In this sense I think none of our current strategies would pass this test (btw. I do not know any commercial strategy which would). At present we are able to detect non-adaptable market conditions only after we have lost money, and we should work on decreasing this time.

    Regards,
    fd

    • admin says:

      Hi Fd,

      Thank you very much for your comment :o) Of course, the hardness index does not determine if a strategy will be profitable in the future (as no index does) and it merely indicates which strategies might have a higher chance of continuing to work under future market conditions. The index does not give a “scale comparison” like the pain index but merely serves as a way to compare the robustness of one system against another. For example Quimichi Vs Watukushay FE as shown on the article.

      Certainly it would be great to have a system that could trade everything and simply stay out when the market does not fit what it deems “profitable trading” but sadly this cannot be the case because every trade carries with it a market exposure and a probability of ending up as a loser. Of course, I understand you might not imply that the perfect system “always wins” but even the notion of maintaining an edge under all market conditions (with the draw down periods implied) is flawed as market conditions can always change to something the system does not know about. Having a system which would maintain an edge on every instrument is most likely not possible since being able to know when a system stops working before it stops working is predicting the future (something which cannot be done with 100% certainty by definition).

      So there is definitely potential for improvement and for more dynamic adaptation of strategies but this will never reduce market exposure to the point where we can stop trading a system before it shows that it cannot continue to sustain its edge . Profitable trading is in essence reactive (not predictive) and therefore we need to see a system fail before we say it has failed because in order for the probability of a system to have “stopped working” to be high we need to take those trades. In essence since the future is not certain we cannot be certain that a system has failed until it has.

      Thank you very much again for your comment Fd,

      Best Regards,

      Daniel

    • erick says:

      “On pairs the strategy has not been written for, the entry logic needs to be good enough to detect when market behaviour will not allow for profits.”

      i think this statement should be changed to: “entry logic needs to be good enough to detect when market behaviour will not allow for profits.”

      this second statement does not ask for a prediction; it just asks for an estimate of possible profitability, of trade potential.

      a successful strategy uses its indicators to measure the potential for a profitable trade prior to entry. if the currency is showing an “aberrant” personality – i.e. the currency’s normal volatility per timeframe is outside a specified range – the entry conditions would not be met.

      in theory, you should be able to put a long-term profitable euro strategy on the pound and it would only trade when the pound acted like the euro. but this euro strategy on the pound would only be long-term successful if the pound continued to act like the euro during the trade.

      note that the above has nothing to do with the hardness index – which only looks at the robustness of the strategy’s statistics, not at the strategy itself.

  4. Maurizio says:

    Dear Daniel,
    thank you for this very interesting article. It seems that this index is in some way more important than the pain or ulcer index. While those indices focus on how difficult a system or portfolio is to trade from a psychological point of view, the hardness index focuses on the probability of being profitable in the long run, which is more important. Quimichi is in this sense the most robust of our strategies, as you already said in your first system presentation video. What surprises me is that Quimichi has not been given much consideration in our most used portfolios, maybe because nobody likes its long and quite deep DD. I think it would be worth working on it a little in order to pair it with other systems which could somewhat hedge these DD, allowing better use of a Quimichi portfolio by smoothing the equity curve.
    About Coatl: when we optimise the settings (on the D1 timeframe and using 20 years of data) we generate a huge number of likely profitable EAs. Maybe among all these profitable strategies we could find one which is suitable for all or almost all of the major and minor pairs using the same settings, but how would it be possible to find it among the tons of strategies generated without losing our whole lives? I think this would be the most robust of all the strategies so far. What is your opinion?
    Thank you.
    Best regards.
    Maurizio

    • admin says:

      Hello Maurizio,

      Thank you very much for your comment :o) This index is not more or less important than other tools which measure performance but it is simply an additional source of information which evaluates a non-profit related aspect of trading strategies. I have always believed that people put too much weight on profit and draw down statistics without ever considering robustness as much as they should when it should be at least as important as performance based statistics.

      As you say, Quimichi is not used within any portfolios, but this is mainly because it doesn’t complement other strategies (which are mostly intra-day systems) that well. Certainly it would be interesting to have a portfolio including Quimichi, but up until now I haven’t found a portfolio combination which gives results good enough to consider. Regarding your question about Coatl, this is certainly a great idea and something we are implementing within our C++ genetic framework. With this framework we will be able to search for a system that gives profitable results across X number of instruments, something which will give us the power to create far more robust trading strategies than we currently can with Coatl (which is limited in its evaluation powers due to the limitations of MT4; as you so clearly put it, we would “lose our lives” attempting to cross-test all systems on all pairs in MT4). Certainly as we evolve and develop more powerful evaluation tools our possibilities, especially for the development of more robust systems, become much larger :o) Thank you very much again for your comment Maurizio,

      Best Regards,

      Daniel

  5. erick says:

    meant to say: “detect when market behaviour will Probably not allow for profits.”

  6. Stefan says:

    Hello Daniel,

    I think your formula misses the number of trades per year. Without this, a strategy with, for example, 3 parameters and some 3 trades per year could generate a perfect index even though it is a curve-fitted strategy and the 3 trades in the out of sample period have just been lucky.
    In any case the actual sample size is a key parameter to be evaluated for any statistically meaningful analysis.

    Best regards

    Stefan

    • admin says:

      Hello Stefan,

      Thank you for your comment :o) You’re right in that the sample size is significant but trade number is a misleading statistical measurement. What if I made a system which takes 10 trades per entry signal with 1/10th the lot size ? As you see it is as easy to generate a system that scores better simply because it has traded more regardless of whether or not each trade corresponds to a different “entry event”. The adequate statistical measurement would most likely be the number of distinct entry signals rather than the number of trades. However you are definitely right in that including sample size is key :o) Thanks again for your comment,

      Best Regards,

      Daniel

  7. erick says:

    is AOS as potent as NYO?

    how could you determine the relative weights of the terms?

    • admin says:

      Hello Erick,

      Thank you for your comment :o) Definitely a tricky question since there is no clear way in which the different “strengths” of parameters can be determined. The reason why I gave the out of sample testing a higher rating was because obviously doing trading outside an optimized region should be exponentially more important (as the system is “walking the walk”) but the different relevance of parameters against each other has no “correct” way of being calculated so it depends on the judgement of the person building the index. Certainly I attempted to put my best judgement into the proposed index but that does not mean that a better measurement could not be devised :o) Thanks again for your comment,

      Best Regards,

      Daniel
