During the last few weeks I have been talking about the implementation of a random forest-based classifier for the prediction of out-of-sample success at Asirikuy. After several weeks of testing and improving the classifier I have now added the code to our mining process in order to obtain predictions for new price action based strategies that are added to our repository. This week was the first time we tested this classifier on newly added systems. Today I am going to talk about the output given by the classifier, how these predictions actually achieve success within testing sets and what the outcomes will most probably look like according to classic Bayesian reasoning. With this in mind we'll discuss why the classifier works in testing sets and how we would expect it to work when making live trading selections. For some basic information about this classifier I suggest you read this post.
[Image: confusion matrix from running the classifier on a testing set]
The results from the classifier in testing sets are generally successful in the sense that the classifier manages to significantly improve the average total return from testing sets, sometimes by more than 100%. This means that when trading only what the classifier marks as a system that will be positive in the future we expect a significant improvement in returns relative to what we would have obtained if we had traded every system. To better understand how the classifier achieves this effect it is interesting to look at the data from running the classifier on a testing set, as shown above. By analyzing this confusion matrix we can understand how the classifier acts and why it is able to improve returns successfully.
If we look at the data we can see that the classifier does not like to classify things positively. Given that the prevalence of positive results within this testing set was 35% (only 35% of systems were profitable in their first 6 months of OS testing) and that the classifier correctly identifies only around 8% of those profitable systems as positive, we can say that the method has a rather low sensitivity and tends to classify things as losing systems for the most part. The method also has a very high specificity, as 93% of the unprofitable systems are indeed classified as such by the classifier. The classifier's main weakness is therefore misclassifying profitable systems as unprofitable: around 92% of the profitable systems are discarded because of this.
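To make the relationship between these percentages easier to follow, here is a minimal Python sketch of how prevalence, sensitivity, specificity and precision fall out of the four confusion matrix counts. The counts themselves are hypothetical, chosen only so that the resulting rates roughly match the ones quoted above; they are not the actual repository numbers.

```python
# Hypothetical confusion-matrix counts for 1000 mined systems, chosen so the
# rates roughly match those quoted in the post (prevalence ~35%,
# sensitivity ~8%, specificity ~93%). Not real repository data.
tp = 28   # classified profitable, actually profitable
fn = 322  # classified unprofitable, actually profitable
fp = 45   # classified profitable, actually unprofitable
tn = 605  # classified unprofitable, actually unprofitable

total = tp + fn + fp + tn

prevalence  = (tp + fn) / total   # fraction of systems that were profitable
sensitivity = tp / (tp + fn)      # profitable systems the classifier catches
specificity = tn / (tn + fp)      # unprofitable systems correctly rejected
precision   = tp / (tp + fp)      # profitable fraction among "trade it" calls

print(f"prevalence  = {prevalence:.2%}")   # ~35%
print(f"sensitivity = {sensitivity:.2%}")  # ~8%
print(f"specificity = {specificity:.2%}")  # ~93%
print(f"precision   = {precision:.2%}")    # ~38-39%
```

Note how the precision, around 38-39% with these counts, already hints at the improvement over the 35% prevalence that is discussed further below.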
[Image: average daily loss for systems in the different classification classes]
But given all of this, how can the method be successful in both K-fold cross validation and completely separate testing sets? The answer comes from the increase in the probability that a system classified as profitable actually is profitable, the increase in the average profit of those systems and the decrease in the average loss of systems that are incorrectly classified as profitable. The second image in this post shows the average daily loss for systems in the different classes. We can see that the general population of losing systems has an average loss of 0.145%, while for the systems that are correctly classified as unprofitable this value is slightly higher at 0.148%. Most importantly, in the class of systems that are classified as profitable but end up being unprofitable the average loss is much lower, at 0.107%. This means that even when we classify something as profitable and it turns out not to be profitable, it tends to at least out-perform the average losing system.
A similar observation can be made in the case of profitable strategies, as shown below. Systems that are correctly classified as profitable have a larger average profitability than the average profitable system. Furthermore, the average profit of systems that are incorrectly classified as unprofitable is in fact below the average for the population of profitable systems. This means that our classifier is identifying a set of profitable systems that tend to have much higher than average profitability as well as losing systems that have lower than average losses. Couple this with the fact that the probability that a system classified as profitable is indeed profitable (the precision) increases from 35% to 39% and we have a very substantial chance to significantly improve our overall profitability even though the classifier's sensitivity is low.
[Image: average daily profit for profitable systems in the different classification classes]
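The per-class averages shown in these images can be reproduced by grouping systems on their predicted and actual labels. The sketch below is only an illustration of that grouping: it assumes a pandas DataFrame with made-up column names (`predicted`, `actual`, `avg_daily_return`) and a handful of invented rows, not the actual layout of our mining output.

```python
import pandas as pd

# Illustrative rows only; in practice there is one row per mined system with
# the classifier's call, the realised 6-month outcome and the average daily return (%).
systems = pd.DataFrame({
    "predicted":        ["profitable", "profitable", "unprofitable", "unprofitable"],
    "actual":           ["profitable", "unprofitable", "profitable", "unprofitable"],
    "avg_daily_return": [0.09, -0.107, 0.04, -0.148],
})

# Average daily return for each (predicted, actual) class, as plotted in the
# second and third images of the post.
per_class = systems.groupby(["predicted", "actual"])["avg_daily_return"].mean()
print(per_class)

# Baseline averages for the whole population of winners and losers, regardless
# of what the classifier said.
baseline = systems.groupby("actual")["avg_daily_return"].mean()
print(baseline)
```

It is the comparison between `per_class` and `baseline` that reveals the effect described above: the losers that slip through tend to be milder than the average loser, and the winners that are kept tend to be stronger than the average winner.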
The data above shows how a classifier with an apparently low sensitivity, one that ends up classifying systems only slightly better than what is expected from randomness, can derive a significant edge from the distribution of returns within the classified systems. Since in trading we are not only affected by whether an algorithm is correctly classified as profitable or unprofitable but also by how profitable or unprofitable these algorithms are, a classifier can prove very valuable if it improves the distribution of losing/profitable systems, even if the classification itself is not dramatically successful (as in the above case). Reducing expected average losses and increasing expected average returns might play a much bigger role than the mere number of correct classifications. If you would like to learn more about trading portfolios and how you too can create your own trading portfolios using several different selection criteria please consider joining Asirikuy.com, a website filled with educational videos, trading systems, development and a sound, honest and transparent approach towards automated trading.
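As a rough back-of-the-envelope check of this argument, the sketch below compares the expected daily return of trading every mined system against trading only the classifier's picks. The prevalence, precision and average-loss figures are the ones quoted in this post; the average-profit figures are assumed placeholders, since the post does not quote them, so only the direction of the comparison matters, not the exact numbers.

```python
# Figures quoted in the post (daily returns, in percent).
prevalence      = 0.35    # fraction of profitable systems in the population
precision       = 0.39    # fraction of profitable systems among positive calls
avg_loss_all    = 0.145   # average daily loss of a losing system
avg_loss_picked = 0.107   # average daily loss of a losing system the classifier accepted

# Assumed placeholders: the post quotes no profit figures, only that picked
# winners earn more than the average winner.
avg_win_all    = 0.15
avg_win_picked = 0.18

# Expected daily return per system: trade everything vs. trade only the picks.
expected_all    = prevalence * avg_win_all - (1 - prevalence) * avg_loss_all
expected_picked = precision * avg_win_picked - (1 - precision) * avg_loss_picked

print(f"trade every system:     {expected_all:+.4f}% per day")
print(f"trade classifier picks: {expected_picked:+.4f}% per day")
# With precision above prevalence, milder losses on the mistakes and (per the
# post) stronger winners among the picks, the picked expectation comes out
# ahead of trading everything under these assumptions.
```

The exact output depends entirely on the assumed profit placeholders; the point is only that the edge comes from the shift in the class-conditional averages rather than from the raw hit rate.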
Hi Daniel,
sorry, but I would say that the results you've obtained are not that encouraging, because sensitivity is much too low compared to specificity to be meaningful. There is a measure to calculate how probable it is that a classifier leads to an informed decision: specificity + sensitivity – 1. In your example it is 0.08 + 0.93 – 1 = 0.01. 0 means random chance, anything above 0 is usable; see the Wikipedia article on sensitivity and specificity, topic "Misconceptions".
Nevertheless, if you could come up with something which increases sensitivity that would be a huge step forward!
Kind regards,
Fd
A link I gave in my previous post has been dropped automatically by the CMS of this site. I was referring to the Wikipedia article on sensitivity and specificity, where the topic "Misconceptions" can be found.
https://en.wikipedia.org/wiki/Sensitivity_and_specificity
Hi Fd,
Thanks for writing, always a pleasure to read your comments. Yes, I agree with you here, there is a significant probability that there is no meaningfulness here. The reason why I have been encouraged, despite the relationship between specificity and sensitivity, is that the positive improvement in the average return does hold both in 10-fold cross validation and in completely independent testing sets that were produced after I constructed the initial models (as new systems have reached the 6 month trading threshold). Note that I am not only interested in achieving good classification here (in which case this model would be useless) but also in how the average return is distributed within the classified samples.
Of course I also strive to increase the sensitivity of the method, while preserving the gains in average return, but this is in fact the most difficult thing to do! Thanks again for writing, Fd,
Best Regards,
Daniel