In my recent research on reinforcement learning I have been asking myself questions about system development, especially about how the ideal trading system should work. After a lot of analysis I have come to the conclusion that orders like stops and limits (stoploss and takeprofit orders) would never be needed by an ideal automated trading algorithm, and that the main reason we use them is that regular trading algorithms are by nature sub-optimal. Today I want to talk about how an ideal trading algorithm should work and why additional order closing mechanics are simply not needed once you develop such an algorithm.
To understand how an ideal algorithm would work it is first important to ask ourselves what decisions can be made in the market and what this means for the development of a trading system. In almost all markets you can only do three things at each point in time: go long, go short or stay on the sidelines. Even a perfect algorithm might sometimes decide to stay on the sidelines, as there might be cases where it sees no point in entering the market because it cannot extract profit beyond trading costs. However, these are the only three decisions you can make when trading, and any other decision can be translated into this array of possibilities.
An ideal trading system would look at the market as often as possible and simply decide whether, according to current market conditions, it is best to be short, long or on the sidelines. Introducing a stoploss or takeprofit would not improve such an algorithm, as it would already be making the ideal decision at each point in time. The algorithm acts like a human trader in the sense that it constantly evaluates the market and decides what course of action might be best. Since this evaluation is constant, there is no need for the algorithm to use stoploss or takeprofit mechanics. These only become necessary when an algorithm is very limited in scope and must therefore establish boundaries for the very limited set of market conditions it can evaluate.
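The constant re-evaluation described above can be sketched in a few lines of Python. This is only an illustration, not a real system: `run_policy`, `toy_momentum_policy` and the `cost_per_unit_change` parameter are hypothetical names made up for this example, and the toy policy is just a placeholder for whatever logic actually makes the long/short/out decision.

```python
from typing import Callable, Sequence

LONG, SHORT, OUT = 1, -1, 0  # the only three decisions available at each step


def run_policy(prices: Sequence[float],
               policy: Callable[[Sequence[float]], int],
               cost_per_unit_change: float = 0.0) -> float:
    """Re-evaluate the market on every bar and hold whatever the policy says."""
    position = OUT
    pnl = 0.0
    for t in range(1, len(prices)):
        # Mark the position held over the last bar to market.
        pnl += position * (prices[t] - prices[t - 1])
        # Ask the policy again, using only data up to and including bar t.
        target = policy(prices[: t + 1])
        # Pay costs in proportion to how much the position changes.
        pnl -= cost_per_unit_change * abs(target - position)
        position = target
    return pnl


def toy_momentum_policy(history: Sequence[float]) -> int:
    """Hypothetical placeholder: long after an up bar, short after a down bar."""
    if len(history) < 2 or history[-1] == history[-2]:
        return OUT
    return LONG if history[-1] > history[-2] else SHORT
```

Note that there is no stop or limit anywhere in the loop: a position is closed or reversed only because the policy, asked again on the next bar, returns a different answer.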
Regular trading strategies that use traditional trading signals suffer from the problem of “market blindness”. They do not have the tools to constantly evaluate the market, so they need stop and limit orders to control the profit or loss they obtain from each signal. This happens because there is no constant logic-based evaluation of the market, only a simple BUY/SELL logic driven by some signal. There is no information beyond this signal, so the system has to rely on sub-optimal solutions to capture market gains. It’s like being in a dark room with limited information about your surroundings and some simple rules to navigate. This might be enough to avoid bumping into every wall, but it’s certainly not the same as having a flashlight.
What I really like about reinforcement learning is that it approaches trading in a manner that emulates what an ideal algorithm would do. Instead of having simple signals and relying on SL/TP mechanics, you have a constant picture of the market (you know the state of the market at each point in time) and you use this information to make trading decisions without ever needing an SL and/or TP. Since reinforcement learning aims to learn “what to do” under a wide variety of circumstances, you create algorithms that always know what to do based on their past experience. You will always get a BUY/SELL/OUT signal from the market, which eliminates the need to implement these types of exits.
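To make the idea of learning a BUY/SELL/OUT decision for every market state a bit more concrete, here is a minimal tabular Q-learning sketch. It is a toy under stated assumptions and not the implementation used in my own systems: the state encoding (the up/down pattern of the last few bars), the reward (the next bar's return times the position) and all names such as `encode_state`, `train_q_table` and `act` are hypothetical choices made for illustration only.

```python
import random

ACTIONS = ("BUY", "SELL", "OUT")
POSITION = {"BUY": 1, "SELL": -1, "OUT": 0}


def encode_state(returns, t, lookback=2):
    """Hypothetical state: the up/down pattern of the `lookback` bars before t."""
    return tuple(1 if r > 0 else 0 for r in returns[t - lookback:t])


def train_q_table(returns, lookback=2, alpha=0.1, gamma=0.9,
                  epsilon=0.1, epochs=50, seed=0):
    """Tabular Q-learning over a series of bar returns (illustrative only)."""
    rng = random.Random(seed)
    q = {}
    for _ in range(epochs):
        for t in range(lookback, len(returns) - 1):
            state = encode_state(returns, t, lookback)
            values = q.setdefault(state, {a: 0.0 for a in ACTIONS})
            # Epsilon-greedy: mostly exploit, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=values.get)
            # Assumed reward: the return of the bar that follows the decision.
            reward = POSITION[action] * returns[t]
            next_state = encode_state(returns, t + 1, lookback)
            next_values = q.setdefault(next_state, {a: 0.0 for a in ACTIONS})
            # Standard Q-learning update toward reward plus discounted best next value.
            target = reward + gamma * max(next_values.values())
            values[action] += alpha * (target - values[action])
    return q


def act(q, state):
    """Always returns a decision; states never seen in training default to OUT."""
    values = q.get(state)
    return "OUT" if values is None else max(ACTIONS, key=values.get)
```

Once the table is trained, `act` gives a BUY/SELL/OUT answer for any state it is asked about, which is the property discussed above. Whether what was learned holds up on unseen data is a separate question that this sketch does not address.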
The above is one of the reasons why I am so excited about the development of trading strategies that tackle the market as a finite state machine. Learning to trade the markets as if they were a computer game makes intuitive sense and should lead to algorithms that are more adaptive and less likely to fail under changing market conditions. If you would like to learn more about reinforcement learning and how we are implementing systems for trading and learning using this approach please consider joining Asirikuy.com, a website filled with educational videos, trading systems, development and a sound, honest and transparent approach towards automated trading strategies.
Hi Daniel,
First of all can I just say congratulations on all the work you’re doing on reinforcement learning; I’m following your developments with keen interest!
There are a couple of things that confuse me in this article though.
Please correct me if I’m wrong, but wouldn’t systems based on supervised learning models that use a long/short/neutral decision as the dependent variable(s) also fall into this category of systems? In other words, this is not something particular to systems based on reinforcement learning, but simply depends on the type of signals generated. Surely any model that yields either a buy, a sell or a neutral decision would satisfy your criterion of ‘an ideal algorithm’?
When you say “an ideal trading system would look at the market as often as possible”, you imply that these kinds of systems (those that trade without a SL/TP and can yield neutral signals) are better suited to the lower timeframes (ignoring such issues as simulation inaccuracy, computational cost etc). Of course this makes intuitive sense; a system that can adjust its position every minute will be inherently less susceptible to large, adverse inter-bar moves than, say, a system that trades the daily timeframe. Doesn’t this constitute something of a conflict of interest with regard to an RL GPU mining methodology, since mining becomes a lot more difficult as you move to lower and lower timeframes?
In a similar vein, wouldn’t the current breed of RL systems that trade the daily timeframe benefit from the inclusion of an SL and/or TP for the reasons outlined above?
Best regards,
James
Hi James,
Thanks for writing :o) I’m happy to hear you’re enjoying this series! Let me now answer your questions:
Yes, in the strictest sense any algorithm that can yield a BUY/SELL/OUT opinion at any given moment in time could be considered an “ideal algorithm”. However, I would envision an ideal algorithm not only as one that can output these choices but also as one that chooses from a complex enough array of options, where the choices have the potential to change with time. Supervised learning approaches could indeed fit this scope, but they have some important problems that I have never been able to overcome. Whenever I have tried to use supervised learning to learn a large array of market states I have never achieved something that isn’t hopelessly curve-fitted. This does not mean it cannot be done, just that I haven’t been able to do it. Another important thing is computational efficiency in both back-testing and live trading. With Q-learning I can perform thousands of simulations at the same time on a GPU, which you cannot easily do with supervised learning methods, and in live trading Q-learning is extremely cheap to update, while retraining a supervised learning approach is not as easy or as fast.
Ideally you would want to work with tick data so that you could have an algorithm that decides just as often as the market moves. However, this is obviously not practical, so we’ll start with larger timeframes and then go to lower timeframes as time goes on. Reinforcement learning on the GPU is around 100,000x cheaper than on the CPU (I already have it coded!), so we might be able to go to 1-5M timeframes without facing insurmountable issues (provided we use cards with enough memory, etc). However, we’ll start on the daily and we’ll learn and move from there, as we have with most other trading approaches. I would also like to stay above the 15M TF if possible, as I would start to doubt simulation accuracy below this point given the inherent variability of the FX market and the lack of real bid/ask spreads in the historical data we use. The number of states needed for something like 1M or tick simulations is also too big, so at that point you’re probably looking at a reinforcement learning approach that uses some sort of function approximation (something like a neural network) for the definition of market states. However, we’re just getting our feet wet, so we’ll go slowly :o)
I don’t want to introduce SL/TP mechanisms because these inherently increase mining bias and curve-fitting within the approach (plus they increase simulation complexity as well). My tests with GPU mining so far show that you can do really well with Q-learning in the daily TF without having to introduce these sorts of exit mechanisms. I would prefer not to introduce them, as leaving them out will probably yield results that are more likely to match future real out-of-sample testing. I would however include a very large SL at something like 5x ATR in order to have an emergency exit in place, just as a safety precaution.
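As a rough illustration of what an emergency exit at 5x ATR could look like, here is a small Python sketch. The ATR here is a simple average of the last true ranges (Wilder smoothing is also common), the 14-bar period is an assumed default, and the function names are hypothetical, so take it as an example of the idea rather than the exact calculation we use.

```python
def true_ranges(highs, lows, closes):
    """True range per bar, starting from the second bar (needs a previous close)."""
    return [
        max(highs[i] - lows[i],
            abs(highs[i] - closes[i - 1]),
            abs(lows[i] - closes[i - 1]))
        for i in range(1, len(closes))
    ]


def atr(highs, lows, closes, period=14):
    """Simple average of the last `period` true ranges (an assumed ATR variant)."""
    tr = true_ranges(highs, lows, closes)
    if len(tr) < period:
        raise ValueError("not enough bars for the requested ATR period")
    return sum(tr[-period:]) / period


def emergency_stop(entry_price, direction, atr_value, multiple=5.0):
    """Stop price `multiple` ATRs from entry; direction is +1 for long, -1 for short."""
    return entry_price - direction * multiple * atr_value
```

A stop this far away is not meant to be part of the trading logic; it should only ever be hit when something goes badly wrong.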
Of course do post if you have any other questions and/or observations, your insights are always very welcome James :o)
Best Regards,
Daniel
Hi Daniel,
This is really exciting! It sounds like a new dimension. One remark though: you hinted in your past blogs that past experiences are not a reflection of future performance. Here you are stating “create algorithms that simply always know what to do based on their past experience”. I have a feeling that you mean something different, but I just wanted to get your opinion on this :) Thank you.
Chris
Hi Chris,
Thanks for writing. Past experiences never have a guaranteed correlation with future performance. An algorithm might learn from its past experience but whether or not that learning and experience will yield the same or better results in the future is unknown. Let me know if you have other questions,
Best Regards,
Daniel