Reinforcement learning and trading: Approaching the markets like a game

Those of you who are familiar with my blog and work know that I am always striving to find new ways to profit from the markets and to build novel, adaptive trading strategies. For the past several years I have focused on automated system mining using price action and supervised machine learning techniques, but today I want to talk about another area of machine learning that has the potential to yield very powerful trading strategies: reinforcement learning. In this post I will discuss how we might approach the trading problem from a reinforcement learning perspective, what this type of approach can do in practice and how we can improve on it to arrive at practical trading applications. This post serves as an introduction to a new journey in trading system development, one that seeks to find out whether we can approach trading as a game and find suitable solutions.


Reinforcement learning is an area of machine learning focused on solving constantly evolving problems in which an agent cannot possibly have all the information, but can make decisions and gather information about its environment. This type of problem can usually be described as a “game”: a situation where an environment rewards or punishes you and gives you information, so that you can take actions that influence future rewards and punishments. If you learn how the environment reacts to your actions and how these actions lead to different outcomes, you can take actions that maximize your potential for rewards within the game. We can look at trading as this sort of problem: you have access to past and current market information, which defines your environment, and based on this you must make decisions that maximize the rewards (profits) you get from the market.

In the simplest incarnations of reinforcement learning we can assume that the market is a finite state machine (there is only a fixed number of possible environment states) and we can try to learn from these states and the rewards that they entail. One of the simplest ways to do this is Q-learning, where we keep a table of numerical values, which we call Q-values, with one value for each combination of state and action, and these values are constantly updated as we experience the game. We then take the action that has the highest Q-value in our present state. As the game progresses we gain more experience and should become much better at collecting rewards.
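
To make the table idea concrete, here is a minimal sketch of a generic tabular Q-learning agent. The number of states and actions, the learning rate `ALPHA` and the discount factor `GAMMA` are illustrative assumptions, not values taken from the tests discussed in this post.

```python
import numpy as np

N_STATES = 4    # hypothetical number of discrete environment states
N_ACTIONS = 3   # hypothetical number of available actions
ALPHA = 0.1     # learning rate (assumed value)
GAMMA = 0.9     # discount factor (assumed value)

# Q-table: one row per state, one column per action, all zeros at the start.
q_table = np.zeros((N_STATES, N_ACTIONS))

def choose_action(q_table, state):
    """Greedy policy: pick the action with the highest Q-value in this state."""
    return int(np.argmax(q_table[state]))

def update(q_table, state, action, reward, next_state):
    """Standard one-step Q-learning update for a single (state, action) pair."""
    best_next = np.max(q_table[next_state])
    target = reward + GAMMA * best_next
    q_table[state, action] += ALPHA * (target - q_table[state, action])

# One experience: in state 0 we took action 2, received reward 1.0, landed in state 1.
update(q_table, state=0, action=2, reward=1.0, next_state=1)
print(choose_action(q_table, 0))  # action 2 now has the highest Q-value in state 0
```

Note that `np.argmax` returns the first index on ties, so an untrained agent always picks action 0; which action sits in that slot is a design choice.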


If we approach the market using Q-learning we need to decide how to define the environment, or game “states”. To implement a very simple Q-learning approach on the daily charts I defined states as binary numbers built from the bullish/bearish quality of the past 8 bars, which gives 256 possible states, and initialized all Q-values to zero at the beginning of the game. Since a player can only hold three positions within the game (long exposure, short exposure or no exposure), I defined the reward for each action as the gain or loss that would have been collected by being long, short or neutral over the next bar. In this example the agent always looks at the past 8 bars to define its state, so for example if the last two bars were bearish and the other 6 were bullish the state would be 11000000 (192); in other words, a 1 marks a bearish bar and the most recent bars occupy the leading bits. When the bar finishes, the agent looks at its reward and updates its Q-table to reflect the experience. Luckily, in trading we do not need to take every action to see its effect: once a bar finishes we immediately know what the result would have been had we been long, short or neutral, so we can update the Q-values of all three actions for that state even though we only took one of them.
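
The state encoding and the "update every action at once" idea can be sketched as follows. This is only an illustration under stated assumptions: the function names, the ordering of the positions, the learning rate and the simple running-average update (with no discounted future term) are my own choices for the example, not the exact implementation behind the tests.

```python
import numpy as np

N_BARS = 8
POSITIONS = np.array([0.0, 1.0, -1.0])  # neutral, long, short (ordering is an assumption)
ALPHA = 0.1                             # assumed learning rate

def encode_state(opens, closes):
    """Encode the last N_BARS bars as an integer: 1 = bearish, newest bar in the highest bit."""
    state = 0
    for i in range(N_BARS):                      # i = 0 is the most recent bar
        if closes[-1 - i] < opens[-1 - i]:       # bar closed below its open: bearish
            state |= 1 << (N_BARS - 1 - i)
    return state

def update_all_actions(q_table, state, price_change, alpha=ALPHA):
    """Move the Q-values of all three positions toward the reward each would have earned."""
    rewards = POSITIONS * price_change           # neutral earns 0, long earns +change, short earns -change
    q_table[state] += alpha * (rewards - q_table[state])

q_table = np.zeros((2 ** N_BARS, len(POSITIONS)))

# Six bullish bars followed by two bearish bars (made-up prices).
opens  = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.5]
closes = [1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.5, 1.4]
state = encode_state(opens, closes)
print(state)  # 192, i.e. binary 11000000

# Suppose the next bar falls by 0.1: the short position is the one that gets rewarded.
update_all_actions(q_table, state, price_change=-0.1)
```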

The first image above shows you how powerful Q-learning can be (these are all GBPUSD 1D tests). If you run a back-test once on the data the agent performs poorly, since it starts with no knowledge of the environment, but after playing just one game the agent has become extremely skilled at playing it, since it has already seen the outcomes. Play a few more times and the system reinforces its learning and gains an even better understanding of how to play this particular game. The agent converges very quickly and reaches a point where further runs on the same back-testing data bring no improvement: it has already learned all it can from the states within the environment. The second image above shows how the historical balance curves change as we perform Q-learning. The first curve is losing, but the agent learns quickly and then trades the data with the efficacy you would expect from the hindsight acquired during the learning process. The agent is learning to play the game we are showing it very well, but we should ask whether it is learning anything relevant.
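
A rough sketch of this "play the same game several times" experiment is shown below. It is not the code behind the images: the price series is a synthetic random walk used as a stand-in for real data, and the learning rate, reward definition (next close-to-close change) and function names are assumptions for illustration. The Q-table is kept between passes, so each new pass starts with the hindsight of the previous ones.

```python
import numpy as np

N_BARS = 8
POSITIONS = np.array([0.0, 1.0, -1.0])  # flat, long, short; flat first so an untrained agent stays out

def bar_states(opens, closes):
    """State at each bar t >= N_BARS - 1 (1 = bearish, newest bar in the highest bit)."""
    bearish = np.asarray(closes) < np.asarray(opens)
    states = np.zeros(len(closes), dtype=int)
    for t in range(N_BARS - 1, len(closes)):
        for i in range(N_BARS):                  # i = 0 is the bar at t (the newest)
            if bearish[t - i]:
                states[t] |= 1 << (N_BARS - 1 - i)
    return states

def run_pass(opens, closes, q_table, alpha=0.1):
    """Trade greedily through the data once, updating the Q-table after every bar."""
    states = bar_states(opens, closes)
    profit = 0.0
    for t in range(N_BARS - 1, len(closes) - 1):
        s = states[t]
        action = int(np.argmax(q_table[s]))      # position actually taken
        move = closes[t + 1] - closes[t]         # what the next bar did
        rewards = POSITIONS * move               # outcome of every possible position
        profit += rewards[action]
        q_table[s] += alpha * (rewards - q_table[s])
    return profit

# Synthetic stand-in data (NOT real market data): a Gaussian random walk.
rng = np.random.default_rng(0)
closes = 1.5 + np.cumsum(rng.normal(0.0, 0.005, size=2000))
opens = np.concatenate(([1.5], closes[:-1]))     # each bar opens at the previous close

q_table = np.zeros((2 ** N_BARS, len(POSITIONS)))
profits = [run_pass(opens, closes, q_table) for _ in range(5)]
print(profits)  # profit of each successive pass over the same data
```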


The above shows you why reinforcement learning is very dangerous in trading. Reinforcement learning provides a very easy mechanism for N-degree curve-fitting exercises and can generate beautiful back-testing curves with almost no computational effort. While it would take a lot of trial and error to find a specific algorithm able to generate highly linear historical equity curves, Q-learning does this in a few minutes. You can do this on any trading symbol on any timeframe; you will always find some embodiment of Q-learning that creates extremely enticing equity curves. The point, however, is that Q-learning applied blindly can be expected to fail extremely rapidly, because what it has learned may offer no insight into market inefficiencies and merely reflect the noise within the historical data. This is a general problem when learning algorithms are applied to problems that are at least partially stochastic.
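
One hedged way to probe whether the agent has learned noise is to train the Q-table on one stretch of data and then trade a later stretch with the table frozen. The sketch below does this on a Gaussian random walk, which contains no inefficiency by construction. Everything here is an assumption made for illustration: the function names, the close-to-close definition of a bearish bar, the learning rate and the number of passes.

```python
import numpy as np

N_BARS = 8
POSITIONS = np.array([0.0, 1.0, -1.0])  # flat, long, short

def states_and_moves(closes):
    """Pair each 8-bar state with the move of the following bar.

    A bar counts as bearish when it closes below the previous close (an assumption).
    """
    diffs = np.diff(closes)                       # diffs[i] = closes[i + 1] - closes[i]
    bearish = (diffs < 0).astype(int)
    weights = 2 ** np.arange(N_BARS)              # oldest bar weighs 1, newest weighs 128
    windows = np.lib.stride_tricks.sliding_window_view(bearish, N_BARS)
    states = windows @ weights
    return states[:-1], diffs[N_BARS:]            # drop the last state: it has no next bar

def train(states, moves, passes=5, alpha=0.1):
    """Repeatedly sweep the same data, updating all three positions at every bar."""
    q = np.zeros((2 ** N_BARS, len(POSITIONS)))
    for _ in range(passes):
        for s, m in zip(states, moves):
            q[s] += alpha * (POSITIONS * m - q[s])
    return q

def frozen_profit(q, states, moves):
    """Profit from trading greedily with a fixed Q-table (no further learning)."""
    actions = np.argmax(q[states], axis=1)
    return float(np.sum(POSITIONS[actions] * moves))

rng = np.random.default_rng(1)
closes = 1.5 + np.cumsum(rng.normal(0.0, 0.005, size=4000))  # pure noise
half = len(closes) // 2

q = train(*states_and_moves(closes[:half]))
print("in-sample:    ", frozen_profit(q, *states_and_moves(closes[:half])))
print("out-of-sample:", frozen_profit(q, *states_and_moves(closes[half:])))
```

Comparing the two printed numbers gives a quick sense of how much of the in-sample result comes from hindsight rather than from anything that carries over to unseen data.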

In my next post about reinforcement learning we will see what happens as we change the richness of states within a Q-learning algorithm, and we will also see what happens when we apply Q-learning to randomly generated data. If you would like to learn more about machine learning and automated trading system mining, and how you too can create strategies while properly accounting for problems such as data-mining bias, please consider joining, a website filled with educational videos, trading systems, development and a sound, honest and transparent approach towards automated trading.

