A look into feed dependency in FX trading: Quantifying OHLC data differences between brokers

The key characteristic that makes the Forex market so different from other markets (such as bonds, futures or equities) is that it is decentralized: no “final and true” price series exists. Since the Forex market lacks a central exchange, there is no consolidated data feed, and each broker must work with the feed that it obtains from its liquidity providers. This means that an FX trader using broker A might be looking at a slightly different version of the data than a trader using broker B. When you design algorithmic strategies these differences become a nightmare because they can generate different algorithmic responses merely due to differences in the underlying data feeds. In today’s post I want to share with you an initial quantitative exploration of these feed differences, along with their time frame dependence and the problems that this can generate. At the end I will also give you some conclusions regarding how I think system development should be done to attenuate these issues.

Graph2

First of all we should ask ourselves how we can compare data between two different brokers. Data feeds in FX trading are a nightmare because they come in a variety of time stamps and configurations. For example, broker A might be GMT +1/+2 while broker B might be GMT zero all year round. In addition, each broker might open on Mondays at 0:00 in its own server time, which corresponds to a different absolute moment for each broker. This means that daily and 4H candles will have dramatically different structures and the feeds will be difficult to compare. Comparing different data feeds is therefore not as easy as exporting them and performing some simple arithmetic, as the process requires an involved data refactoring procedure to bring all feeds to the same timestamps and weekly starting/ending times. Thankfully all of this is taken care of automatically by our Asirikuy F4 programming framework, allowing us to save the data feeds to a common time stamp with little effort.
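As a rough illustration of the timestamp part of this refactoring (this is not the F4 framework's implementation, just a minimal sketch), the snippet below shifts each broker's bars to UTC and keeps only the timestamps both feeds share. The function names, the sample bars and the fixed GMT offsets are assumptions made for the example; a broker that switches between GMT +1 and GMT +2 would need a date-dependent offset, and daily or 4H candles would have to be rebuilt from lower time frame data after the shift.

```python
from datetime import datetime, timedelta


def to_utc(timestamp, gmt_offset_hours):
    """Convert a broker-local naive timestamp to UTC using a fixed GMT offset."""
    return timestamp - timedelta(hours=gmt_offset_hours)


def normalize_feed(bars, gmt_offset_hours):
    """Map (local_timestamp, open, high, low, close) tuples to {utc_timestamp: (o, h, l, c)}."""
    return {to_utc(ts, gmt_offset_hours): (o, h, l, c) for ts, o, h, l, c in bars}


# Hypothetical hourly bars: broker A stamps in GMT+2, broker B in GMT+0.
broker_a = [(datetime(2014, 1, 6, 2, 0), 1.3590, 1.3600, 1.3585, 1.3595)]
broker_b = [(datetime(2014, 1, 6, 0, 0), 1.3591, 1.3601, 1.3584, 1.3596)]

feed_a = normalize_feed(broker_a, gmt_offset_hours=2)
feed_b = normalize_feed(broker_b, gmt_offset_hours=0)

# Only timestamps present in both feeds can be compared candle by candle.
common = sorted(set(feed_a) & set(feed_b))
```

After the shift both sample bars land on the same UTC hour, so they can be compared directly.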

Once the price series have been refactored we are still not home free because we must also face the problem of missing candles. Two FX brokers might have different numbers of candles depending on the feed frequency, which becomes a very important aspect when going to the lower time frames. Starting with the 15 minute time frame, a broker might have up to 0.2% fewer candles (in my experiments at least) simply because its feed volume is lower and therefore some 15 minute candles during low liquidity times go undrawn. In the end you have two feeds where one has some candles that the other lacks, simply because one broker received no ticks during that period while the other did. The difference in the actual tick feed is very small, but on the charted time series the difference is a whole 15 minute candle, which can cause significant problems if you’re using systems that rely on shifted candle comparisons (like Close[1] > Close[2]). This means that signal differences can be significant on these time frames simply due to missing candles. This problem becomes bigger as you move into lower and lower time frames.
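To make the shift problem concrete, here is a small sketch with made-up closes, where broker B never drew one 15 minute bar. The helper names and the data are hypothetical; the last element of each list plays the role of Close[1] and the one before it the role of Close[2].

```python
def missing_timestamps(feed_a, feed_b):
    """Timestamps present in feed_a but absent from feed_b."""
    return sorted(set(feed_a) - set(feed_b))


def signal(closes):
    """Close[1] > Close[2] on a list of completed bars, oldest first."""
    return closes[-1] > closes[-2]


# Hypothetical closes keyed by 15 minute bar index.
feed_a = {0: 1.3600, 1: 1.3590, 2: 1.3595}
feed_b = {0: 1.3600, 2: 1.3595}  # bar 1 was never drawn on broker B

closes_a = [feed_a[t] for t in sorted(feed_a)]
closes_b = [feed_b[t] for t in sorted(feed_b)]

signal_a = signal(closes_a)  # compares 1.3595 with 1.3590
signal_b = signal(closes_b)  # compares 1.3595 with 1.3600, a different bar
```

Both brokers agree on every bar they share, yet the signal differs because Close[2] points at a different candle on each feed.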

Graph1

Quantitatively the differences are also quite important. Within this post you can see the comparisons between two brokers (A and B) for the 15 minute, 60 minute and daily time frames for the period between 2012 and 2014. All time stamps have been refactored to make sure that all candles match, and candles that didn’t have a counterpart on the other broker were removed. The comparisons within these graphs show that the largest deviations become much larger as the time frame becomes lower. While the maximum deviation between daily candles constitutes 32.09% of the candle’s range, the maximum difference within the 15 minute candles is as large as 599%. This means that you will find 15 minute candles that are roughly 6 times bigger on broker A compared to the same candle on broker B.
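One possible way to express a per-candle difference as a percentage of the candle's range is sketched below. The exact formula behind the figures above is not spelled out in this post, so the definition here (largest absolute gap among open, high, low and close, divided by the average range of the two candles) is an assumption chosen for illustration, as are the function name and the sample bars.

```python
def candle_difference_pct(bar_a, bar_b):
    """Largest O/H/L/C gap between two matched candles, as a percent of their average range.

    Each bar is an (open, high, low, close) tuple. Returns None when both
    candles have zero range, since the ratio is undefined in that case.
    """
    range_a = bar_a[1] - bar_a[2]
    range_b = bar_b[1] - bar_b[2]
    reference_range = (range_a + range_b) / 2.0
    if reference_range <= 0:
        return None
    largest_gap = max(abs(x - y) for x, y in zip(bar_a, bar_b))
    return 100.0 * largest_gap / reference_range


# Hypothetical matched candles: the brokers disagree only on the high.
bar_a = (1.3590, 1.3600, 1.3580, 1.3595)
bar_b = (1.3590, 1.3602, 1.3580, 1.3595)
difference = candle_difference_pct(bar_a, bar_b)  # about 9.5 under this definition
```

A different normalization (for example using only broker B's range) would give different numbers, so percentages should only be compared when they come from the same definition.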

The average difference also increases significantly as you go into lower time frames. The average difference for the daily candles is only 1.49%, while for the 60 minute time frame it is 8.4% and for the 15 minute time frame it is 12.02%. The standard deviation also increases as you go to lower time frames: it is 2.47% for the daily candles, 17.80% for the 1H and 18.88% for the 15 minute time frame. It is also interesting to note how the structure of the distributions changes as you go into lower time frames. Although all time frames show the highest frequency at bar zero (almost all bars are identical), on the daily time frame you see an exponential decay towards higher differences, while the 1H and 15M distributions show pronounced peaks at specific difference levels (for example at 5% for the 1H and about 20% for the 15M). This means that dependency issues are particular to certain events and are not homogeneously distributed within the time series. It is not that all bars are a bit different between brokers but that some bars are very different.
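The summary statistics and the distribution shape can be obtained from a list of per-candle differences along these lines. This is only a sketch: the sample values are invented, the 5% bin width is arbitrary, and whether a population or sample standard deviation was used for the figures above is not stated, so the population version here is an assumption.

```python
import statistics
from collections import Counter


def summarize(differences_pct, bin_width=5.0):
    """Return (mean, population stdev, histogram) for per-candle differences in percent.

    The histogram maps the lower edge of each bin to the number of candles in it.
    """
    mean = statistics.mean(differences_pct)
    stdev = statistics.pstdev(differences_pct)
    bins = Counter(int(d // bin_width) * bin_width for d in differences_pct)
    return mean, stdev, dict(sorted(bins.items()))


# Hypothetical differences: most bars identical, a cluster near 20%, one outlier.
sample = [0.0, 0.0, 0.0, 1.0, 2.0, 21.0, 22.0, 150.0]
mean, stdev, histogram = summarize(sample)
```

A histogram with most of its mass in the first bin plus isolated peaks further out is the pattern described above: few bars differ, but those that do differ a lot.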

Graph0

This analysis supports the idea that higher time frames suffer from less drastic but more frequent dependency problems, because these “single events” get diluted as you look at higher time frames while they appear much larger when you go into the lower time frames. Nonetheless, since the differences in the lower time frames are centered around specific events, we could come up with some mechanism to mitigate these divergences. A good idea might be to use median calculations over several bars instead of pure OHLC values when dealing with lower time frames. Since the median is somewhat immune to extreme excursions, using the median of the OHLC values might eliminate a big part of this dependency.
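A minimal sketch of the median idea, assuming a rolling median of closes over the last few bars (the window length, the function name and the sample closes are all hypothetical), shows how a single divergent bar on one broker stops affecting the value a system would read:

```python
import statistics


def rolling_median(values, window=5):
    """Median of the last `window` values at each position; None until enough bars exist."""
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)
        else:
            result.append(statistics.median(values[i + 1 - window:i + 1]))
    return result


# Hypothetical closes: broker A has one spike that broker B does not show.
closes_a = [1.3600, 1.3602, 1.3650, 1.3601, 1.3603]
closes_b = [1.3600, 1.3602, 1.3604, 1.3601, 1.3603]

median_a = rolling_median(closes_a, window=5)[-1]
median_b = rolling_median(closes_b, window=5)[-1]
```

The raw closes differ by 46 pips on the third bar, while the two medians are identical. The price of this robustness is lag, since the median reacts more slowly to genuine moves.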

Another important point is that the lower time frames show an overall lower probability of facing dependency, but when you do face it, it is more likely to be large. Since the dependency is centered around specific events you could analyse which bars contain the highest dependency and avoid trading or using data from those times altogether. For example, if dependency happens fundamentally during low liquidity, one could introduce a refactoring mechanism that filters bars by volume (something like a normalized volume oscillator filter), which could eliminate a significant portion of the problem. Of course, this requires a significant degree of additional analysis.
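One way such a volume filter could look is sketched below. It is an untested idea rather than a validated method: the normalization (bar volume divided by the average volume of the preceding bars), the lookback, the threshold and the sample bars are all assumptions made for the example.

```python
def filter_by_relative_volume(bars, lookback=3, min_ratio=0.5):
    """Keep bars whose volume is at least `min_ratio` times the average
    volume of the preceding `lookback` bars.

    Each bar is a (timestamp, close, volume) tuple. Bars without enough
    history to compute the average are skipped.
    """
    kept = []
    for i, (timestamp, close, volume) in enumerate(bars):
        if i < lookback:
            continue  # not enough history to normalize the volume
        average = sum(bar[2] for bar in bars[i - lookback:i]) / lookback
        if average > 0 and volume / average >= min_ratio:
            kept.append((timestamp, close, volume))
    return kept


# Hypothetical bars: bar 3 prints during a low liquidity period.
bars = [
    (0, 1.3600, 100),
    (1, 1.3601, 120),
    (2, 1.3602, 110),
    (3, 1.3603, 20),
    (4, 1.3604, 90),
]
kept = filter_by_relative_volume(bars)
```

With these numbers the thin bar 3 is discarded and bar 4 is kept. Whether the discarded bars are really the ones that carry the largest feed differences is exactly the additional analysis mentioned above.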

In the end your time frame choice in the Forex market is a choice between a small but fairly constant feed difference (higher time frames) or a rarer but more intense dependency problem (lower time frames). During March I will be publishing a more detailed study of broker dependency (with some broker names and additional comparisons, including seasonality) within Currency Trader Magazine. If you’re interested in my work and would like to learn more about refactoring and comparing data from different brokers please consider joining Asirikuy.com, a website filled with educational videos, trading systems, development and a sound, honest and transparent approach towards automated trading in general. I hope you enjoyed this article! :o)

 


2 Responses to “A look into feed dependency in FX trading: Quantifying OHLC data differences between brokers”

  1. Hans says:

    Feed differences are a real pity indeed. The difference between brokers is smaller for the major pairs, but can be a lot higher for non-major pairs. And even with the rather small difference for major pairs, any system will still trade differently on those feeds sometimes, just natural… (and a real problem).

    My solution is that I first had to find a broker that is really good and serious, can show me the tickets proving that the order is really going to the real market, and has good relations with its LPs (so they don't reject too much and slippage stays as low as possible). Once that was achieved, I am ONLY trading my system at that broker and only optimize it with the data obtained from that broker, with a weekly 1M chart refresh and then creating the upper timeframes from that 1 minute data.

    With this method, so far, I have achieved a 100% match between live trading / backtesting for the last year of trading this system live. Of course with some small differences in the entry price because of market slippage, but at least it was the exact same amount of trades and exact same time of entry / exit between the backtest / live trading.

    The reason behind this is also that my system only trades on bar open (entries and exits). When letting it trade intrabar this won't work so well, at least not with simulated ticks from MT4, and real tick data backtests take too long and there is no data prior to 2007, so not good.

    But anyhow, that's the working solution for me for feed differences.

    • admin says:

      Hi Hans,

      Thanks for your comment :o) Yes, I approach this problem in almost the same manner. Having good brokers and trading on bar open seems to be one of the only ways to achieve good back/live testing consistency. Making time corrections and refactoring time data so that timestamps between brokers match is also fundamental if you want to reproduce results across brokers with different GMT shifts and weekly opening/closing times. Trading majors is also important in this regard, as you also mentioned. Thanks again for stopping by,

      Best Regards,

      Daniel
