Financial Time Series: Can you generate new valid data?

The reason why we cannot have trading systems that indefinitely withstand the test of time is simple: we do not have access to infinite amounts of valid market data that we could use to build systems that tackle the real, fundamental market inefficiencies present within the data instead of merely transitory ones. Since this problem would be solved by some unending source of market data, it is worth asking whether we can indeed generate additional market data – valid eventual future market data – that we could use to create systems far more robust than the current data allows for. In today's post I am going to talk about this issue, the potential advantages it could bring, why it is so difficult – probably impossible at this point – and what would be required to make it a reality.


Extending market data is not a novel idea. Many people have attempted to extend market data in order to run simulations beyond the available real history. These attempts have generally produced even less robust systems rather than more robust ones, which has discouraged further efforts of this kind. The attempts have ranged from Monte Carlo simulations using price data to more elaborate methods that try to preserve some property of the series as a function of time. Therein lies the fundamental problem with extending financial time series: there must be some rules for creating additional market data, and those rules are not easy to discover because they are themselves part of the underlying mechanisms that would allow you to predict market data. Understanding these factors would imply a fundamental understanding of the market itself, which would immediately imply that you can make money – you would not need any additional systems if you reached this point.
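The simplest of these attempts, a Monte Carlo extension via an i.i.d. bootstrap of historical returns, can be sketched in a few lines. This is a minimal illustration, not any specific published method, and the return values below are hypothetical; note that resampling with replacement preserves the marginal distribution of returns while destroying any serial structure in the series:

```python
import random

def monte_carlo_path(returns, length, seed=None):
    """Generate a synthetic return path by resampling historical
    returns with replacement (i.i.d. bootstrap)."""
    rng = random.Random(seed)
    return [rng.choice(returns) for _ in range(length)]

def to_prices(start, path):
    """Compound a return path into a price series."""
    prices = [start]
    for r in path:
        prices.append(prices[-1] * (1.0 + r))
    return prices

# Hypothetical daily returns; real values would come from a price feed.
historical = [0.002, -0.001, 0.0005, -0.003, 0.004, -0.002, 0.001]
synthetic = to_prices(100.0, monte_carlo_path(historical, 5, seed=42))
```

Because each draw is independent, any autocorrelation, volatility clustering, or calendar structure in the original series is lost, which is one reason systems built on such extended data tend to be less robust rather than more.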

The reason why it is so complex to generate future data is that market data contains some degree of inefficient behavior that must be preserved, but these inefficiencies are buried both within a huge amount of noise and within some very particular conditions. To generate new data in which the inefficiencies remain intact, you would need to know their very nature, or at the very least you would need some measure you can use to preserve them; there must be something that remains constant within the financial time series. If you have read my posts this year you will have noticed that I have studied inefficiencies through chaos-related properties, but the fact is that such properties are not conserved across financial time series; they change across all time series in one way or another.
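The chaos-related properties mentioned above are more elaborate than this, but even a very simple series property such as lag-1 autocorrelation illustrates the point that properties are not conserved: it can differ sharply between windows of the same market. A sketch with hypothetical return windows:

```python
def lag1_autocorr(xs):
    """Lag-1 autocorrelation of a series: one simple, measurable property."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        return 0.0
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var

# Two hypothetical windows of returns from the "same" market:
window_a = [0.001, 0.002, 0.004, 0.008, 0.012, 0.020]   # trending window
window_b = [0.010, -0.012, 0.008, -0.011, 0.009, -0.013]  # mean-reverting window
# The property flips sign between windows -- it is not conserved,
# so it cannot serve as the invariant a data generator would need.
```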


Even if the exact values are not preserved, you could go deeper and say that you will preserve not the exact values but their distribution, or even the distribution of their gradient across periods, but you are still bound to produce new data that does not conform to all the properties of your real market data. For example, real market data has periodicity generated both by market opening/closing times and by news cycles. The NFP release is a good example: the first Friday of every month follows rather predictable volatility patterns that you would need to reproduce within your candidate data. Focusing on a single property discards other properties that are critical to the markets and – worse still – can create apparent inefficiencies within your artificial data that are not, and probably never will be, present within the real data. Systems traded on such inefficiencies inevitably fail even faster.
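One such periodicity check can be sketched as follows: group returns by weekday and compare the volatility profile of candidate data against the real series. The function and the tagged returns below are hypothetical illustrations, with Fridays made more volatile in the spirit of the NFP example:

```python
from statistics import pstdev

def weekday_volatility(returns, weekdays):
    """Volatility (std dev of returns) grouped by weekday, 0=Mon .. 4=Fri.
    Candidate synthetic data should reproduce the real profile."""
    buckets = {}
    for r, d in zip(returns, weekdays):
        buckets.setdefault(d, []).append(r)
    return {d: pstdev(rs) for d, rs in buckets.items() if len(rs) > 1}

# Hypothetical returns tagged with weekdays; Fridays (4) are more
# volatile, as one would expect around NFP releases.
rets = [0.001, -0.001, 0.002, -0.002, 0.009,
        0.001, -0.002, 0.001, -0.001, -0.011]
days = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
profile = weekday_volatility(rets, days)
```

A generator that matches the overall return distribution but produces a flat weekday profile would fail this check immediately, and this is only one of many such calendar properties.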

The fact of the matter is that such generation efforts have usually involved simplifications, and since the idea is to perform very large data expansions, these simplifications end up being much more representative of the generated data than the real data you use for trading. In the end you create data that has additional quirks, that does not exactly match the real data's properties across all possible measurements, and those quirks doom these approaches to failure. For these approaches to succeed it would be necessary to find a time series property that is unchangeable – it could even be the distribution of some property, as explained above – and whose preservation automatically reproduces all other expected sets of market behavior. There is also a risk – even if you find such a property – that it would be just noise, because your base data is not infinite either.
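A minimal sketch of why matching one property across all measurements is not enough: a reordered copy of a return series has exactly the same distribution (a two-sample Kolmogorov-Smirnov statistic of zero) while its temporal structure is completely different. The plain-Python KS statistic and the series below are illustrative assumptions, not any particular library's implementation:

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the two empirical CDFs (0 means identical empirical distributions)."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in a_sorted + b_sorted:
        cdf_a = sum(1 for v in a_sorted if v <= x) / len(a_sorted)
        cdf_b = sum(1 for v in b_sorted if v <= x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d

# The same return values in a different order: identical distribution,
# entirely different ordering in time.
real = [0.001, 0.002, 0.003, -0.001, 0.004, -0.002]
reordered = [0.004, -0.001, 0.002, -0.002, 0.001, 0.003]
# ks_statistic(real, reordered) == 0.0 -- the "property" is matched
# perfectly, yet every serial property of the data has changed.
```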


Of course, if you find such an immovable factor that also preserves all other sets of market behavior, you have found the "real signal" within the data. That means you can indeed create as much data as you want, but – even further – it means you can profit from that factor directly instead of relying on systems created from the additionally generated data. Interestingly, the discovery of such a factor may itself cause it to change – as the market reacts to new information – so it is safe to say that the generation of potentially infinite amounts of potentially valid future data for trading is never going to happen, or at least will never remain possible for long if it ever does. If you want to learn more about trading and how you too can generate trading systems using more than 28 years of market data please consider joining, a website filled with educational videos, trading systems, development and a sound, honest and transparent approach towards automated trading strategies.
