Understanding financial time series data: Why a 1D bar is not the same as a 1M bar

December 5th, 2015

A very common argument in retail algorithmic trading for using only recent data from a low timeframe is that you don’t have to worry about how far back your data goes, provided you can generate a large enough number of trades to validate your model. The argument goes that if you need 5000 bars to generate 100 trades, then you can simply use 5000 one-minute (1M) bars from the past 5-10 days instead of daily data from the past 20 or so years. Today I am going to explain why this argument is fallacious and why you cannot draw a direct equivalence between bars from different timeframes. I will walk you through the fundamental composition of market data and explain why the number of market conditions captured by bars of different lengths is fundamentally different.

–

When we represent market data as Japanese candlesticks in an open/high/low/close (OHLC) format, we are indeed generating 4 data points that describe an instrument’s trading over a given span of time. The fact that every bar has these same 4 points, regardless of the length of the period it describes, is what leads people to conclude that daily, hourly and 1M bars all hold the same amount of data. Each of these bars has 4 points, so they supposedly contain the same amount of data, and therefore 100 one-minute bars could be used to draw the same statistical conclusions as 100 daily bars. Why is this view wrong? Why is it that 100 1M bars cannot be used to draw the same conclusions as 100 daily bars?

To understand this we must first understand what the fundamental unit of market movement is and how candlesticks are actually constructed. In the market the fundamental unit is not any OHLC bar structure but the tick. A tick is basically a Bid/Ask quote that forms whenever the prices offered to buy or sell a given instrument change. Ticks do not have to happen with any given frequency and are therefore time-independent. We then use candlesticks to encapsulate the movement of ticks within a defined time span, building each candlestick from the OHLC values that formed within that period. Since the tick is the real fundamental unit of a financial time series, the amount of information in a candlestick is described by the number of ticks it took to form it.

–

The above describes precisely why a candlestick formed within one minute is totally different from a candlestick formed within a 24-hour period. The first might have been formed from just 3 or 4 ticks, while the daily candlestick might contain thousands. The amount of information in the daily candlestick is much greater than the amount of information in the 1M candle. Even though both candles contain just 4 data points, the number of fundamental units of price (ticks) it took to build each candle is completely different.
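To make this concrete, here is a minimal sketch of how ticks can be grouped into candles while keeping track of how many ticks went into each one. It is only an illustration under a few assumptions of mine: each tick is reduced to a single price (for example the Bid), timestamps are plain seconds, and the function name `ticks_to_bars` and the sample prices are hypothetical.

```python
def ticks_to_bars(ticks, bar_seconds):
    """Group (timestamp_seconds, price) ticks into fixed-length OHLC bars.

    Each bar also records how many ticks were used to build it.
    """
    bars = {}
    # Sort by timestamp only, so ticks sharing a timestamp keep their order.
    for ts, price in sorted(ticks, key=lambda t: t[0]):
        start = ts - ts % bar_seconds  # start time of the bar this tick falls in
        bar = bars.get(start)
        if bar is None:
            # The first tick of the period sets all four OHLC values.
            bars[start] = {"start": start, "open": price, "high": price,
                           "low": price, "close": price, "ticks": 1}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price  # the last tick seen so far is the close
            bar["ticks"] += 1
    return [bars[start] for start in sorted(bars)]


# Hypothetical ticks: (seconds since the start of the hour, price)
ticks = [(0, 1.1000), (12, 1.1003), (47, 1.0998),
         (65, 1.1001), (200, 1.1005), (3500, 1.0990)]

one_minute = ticks_to_bars(ticks, 60)    # 1M bars
hourly = ticks_to_bars(ticks, 3600)      # 1H bars

for bar in one_minute + hourly:
    print(bar)
```

Every bar that comes out has the same four OHLC values, but the `ticks` field differs: with these sample ticks the single hourly bar is built from all six ticks, while no 1M bar holds more than three. Minutes with no ticks produce no bar at all in this sketch.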

Think about this as if you were using candlesticks to record the temperature in your city. Imagine that you record the temperature at random moments with a digital thermometer, taking about 1 to 10 measurements per minute. You then collect data for a year and create both 1M and daily candlestick charts. If you take the 365 daily candles that make up the year, you get a very good picture of all the different temperature conditions (the seasons, the highs, the lows, the records, etc.), while if you take the last 365 1M candles (just the past several hours) you get a very limited picture of what is actually going on. If you attempted to devise a model to predict temperature, what data would you use: 365 daily candles or 365 1M candles? In this example it is obvious that the two sets of candlesticks are not equivalent, and exactly the same happens with financial data. The more information that goes into building the candles, the better they are for making predictions.
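The temperature example can be simulated in a few lines. This is a rough sketch, not a weather model: the `temperature` function below is a made-up combination of a seasonal cycle and a daily cycle, and I take exactly one measurement per minute instead of a random 1 to 10, just to keep the example deterministic.

```python
import math

MINUTES_PER_DAY = 24 * 60


def temperature(minute):
    """Hypothetical temperature in degrees Celsius at a given minute of the year."""
    day = minute / MINUTES_PER_DAY
    seasonal = 10.0 * math.sin(2 * math.pi * day / 365)  # slow yearly swing
    daily = 5.0 * math.sin(
        2 * math.pi * (minute % MINUTES_PER_DAY) / MINUTES_PER_DAY
    )  # fast day/night swing
    return 15.0 + seasonal + daily


def observed_range(values):
    """Distance between the highest and the lowest value seen."""
    return max(values) - min(values)


# One measurement per minute for a whole year (a simplifying assumption).
year = [temperature(m) for m in range(365 * MINUTES_PER_DAY)]

# 365 daily candles cover every measurement of the year,
# while the last 365 1M candles cover only the final 365 minutes.
range_daily_candles = observed_range(year)
range_minute_candles = observed_range(year[-365:])

print("Range seen by 365 daily candles:", round(range_daily_candles, 1))
print("Range seen by 365 1M candles:", round(range_minute_candles, 1))
```

Under these assumptions the daily candles see almost the full swing between the coldest and the warmest moment of the year, while the 1M candles only see part of a single night.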

–

The consequences of the above are similar to what you would expect for a temperature model. If you use just the most recent lower timeframe data to build trading systems, you will find that your strategies have a terribly low survival rate going forward, while if you use long-term, high timeframe data you will find that your survival rate increases tremendously. Even if you used 1M candles for the past 2 years, your models would be much worse than models built using 20 years of daily data. This is simply because 20 years of daily data is built from far more ticks, and therefore contains much more information, than 2 years of 1M data. Of course, this does not mean that you cannot use low timeframe data to build systems, only that to reach the same level of market information you need to use data covering the same time span. Obviously, if you have 1 year of daily data and 1 year of 1M data, you will be able to build much more precise and complex models using the year of 1M data. The point is that the amount of market information you have depends on the number of ticks implicit in the data you are using, not on the number of bars and OHLC points used within your models. When in doubt, always go with the data that has the highest amount of implicit market information.
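A quick back-of-the-envelope calculation shows the difference between counting bars and counting ticks. The numbers here are assumptions for illustration only: a market that trades 24 hours a day for about 260 days a year, and a constant average of 5 ticks per minute. Real tick activity varies a lot, so treat the output as orders of magnitude, not as measurements.

```python
MINUTES_PER_TRADING_DAY = 24 * 60  # assumption: the market trades around the clock
TRADING_DAYS_PER_YEAR = 260        # assumption: about five trading days per week


def bar_count(years, bar_minutes):
    """Number of bars of a given length needed to cover `years` of trading."""
    total_minutes = years * TRADING_DAYS_PER_YEAR * MINUTES_PER_TRADING_DAY
    return total_minutes // bar_minutes


def implied_ticks(years, ticks_per_minute):
    """Rough number of ticks behind `years` of data.

    The bar length does not appear here: under a constant tick rate,
    only the time span covered matters.
    """
    total_minutes = years * TRADING_DAYS_PER_YEAR * MINUTES_PER_TRADING_DAY
    return total_minutes * ticks_per_minute


TICKS_PER_MINUTE = 5  # hypothetical average tick rate

daily_bars = bar_count(20, MINUTES_PER_TRADING_DAY)  # 20 years of daily bars
minute_bars = bar_count(2, 1)                        # 2 years of 1M bars

print("20y daily:", daily_bars, "bars,", implied_ticks(20, TICKS_PER_MINUTE), "ticks")
print("2y 1M:    ", minute_bars, "bars,", implied_ticks(2, TICKS_PER_MINUTE), "ticks")
```

With these assumptions the 2 years of 1M data have 144 times more bars than the 20 years of daily data (748,800 against 5,200), yet they are built from 10 times fewer ticks.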

If you would like to learn more about algorithmic trading, model building and how you too can construct your own systems using long term historical data please consider joining Asirikuy.com, a website filled with educational videos, trading systems, development and a sound, honest and transparent approach towards automated trading.