When solving problems using machine learning there is typically a relationship between errors within training sets, errors within testing sets and model complexity. This relationship and the arrival at an optimum trade off between model variance and bias is commonly denoted the bias-variance trade off. Today we are going to talk about the bias-variance tradeoff in trading and why it is a much more complex problem than in other deterministic machine learning problems. Furthermore we are going to talk about the irreducible error, an error component that is usually small or not present in many modeling problems and that poses the most fundamentally important error source when building trading strategies. Note that I used some images from this post which contains a very good explanation of the bias/variance tradeoff.

When you build a model to describe some behavior – for example a model to predict the movies a person might like – you can usually see that the error of your predictions within your training set is reduced as model complexity increases. A 10th degree polynomial will usually have much less error in the training set than a linear model and so on. This is a simple consequence of the degrees of freedom within the model that allow a much better adjustment to the data at hand. However this adjustment – or fitting – usually means that results improve up to a point on unseen data but after that point the error increases in the testing set. This means that bias decreases with model complexity up to a point – your predictions in unseen data become better – and then variance starts increasing as you gain improvement in training but not in testing data.

The above is the reason why curve-fitting bias is often considered an important source of system failure in algorithmic trading. If a model worked in a back-test and failed within real data testing then it is easy to assume that – as in most other model building efforts – the problem is that you did not arrive at a proper bias-variance trade off and you created something that was too complex for the amount of data that you had at hand. In traditional model building this means that you should either reduce your model complexity or use more data so that your model can behave properly under unseen data. This in fact works to reduce curve-fitting bias in trading as larger data for “training” – in this case meaning any generic model building effort – should in all cases lead to less curve-fitting bias and lower complexity should help – up to a point – if the model is very complicated.

That said, the fact of the matter is that model-building related sources of bias, like curve-fitting bias and data-mining bias, are not the only sources of error that generate failure in trading strategies. There is another fun error class which is called irreducible error which points to a source of error that is brought by the stochastic nature of the problem being solved and the lack of enough data to properly describe the problem. It is the valley in the bias-variance trade off graph, the minimum error you could get to. In the trading case this is an error related with the market simply changing to something else that cannot be reduced because of the unpredictable nature of these changes. No matter how much you attempt to reduce curve-fitting and data-mining bias, irreducible errors will always be present and they cannot be reduced — as the name implies.