IS/OS variable correlations: How do things change for higher/lower in-sample PF values?

In last week’s post we talked about how the in-sample profit factor (PF) and Sharpe ratio (SR) statistics in our PA system repository correlate with their out-of-sample values. Those findings made it clear that the correlation between in-sample and out-of-sample variables increases as a function of trade number, corroborating earlier evidence from pseudo out-of-sample periods that showed the same behavior. However, it is also interesting to consider whether this correlation is concentrated in a certain portion of the in-sample variable values or whether it is evenly distributed across the board. In today’s post we are going to look into this matter to find out whether the correlation is weaker or stronger for higher or lower in-sample PF values. This should help us understand how evenly distributed the data is and whether the relationship between in-sample and out-of-sample values depends on the range of the in-sample statistic we study.

If the in-sample (IS) period has anything to do with the out-of-sample (OS) period, then the correlation between PF/SR statistics across both periods must increase as a function of trade number, since the statistical significance of the OS measurement increases to better match that of the IS period. A higher PF in the IS will not correspond to a higher PF in the OS unless the OS is long enough to reflect at least a part of the edge that the system obtained in the IS. We now know that this indeed happens in our PA repository. However, it is also a common observation that systems that perform particularly well in the IS tend to show the inverse relationship in the OS, mainly because large performance values might tend to mean revert more strongly than the values of lower performing IS strategies. With this in mind, we would expect the largest PF and SR values to show some tendency towards a negative correlation in the OS.

The image above shows the OS and IS PF values for systems with more than 100 trades in the OS, with a green line located at the 50th percentile of the IS PF (1.27). There are also two linear regression lines, one for the bottom 50% of IS PF values (red) and one for the top 50% (blue). The Pearson correlation coefficient (R) for the entire data set is 0.148, while the R values for the two halves are 0.27 (red) and 0.04 (blue). This shows that the correlation is in fact greater for the lower half of IS PF values and drops significantly in the top half. This implies that the tendency for a higher IS PF to translate into a higher OS PF decays as the IS PF gets larger. We look into this in more detail by examining how R changes for groups above/below different percentile values of the IS PF, from 10% to 95%.
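For readers who want to reproduce this kind of split on their own data, here is a minimal sketch of the calculation, assuming the IS and OS PF values are available as two equal-length arrays. The function names (`pearson_r`, `median_split_stats`) and the sample numbers are hypothetical and are not taken from our repository; NumPy is used only for the percentile, correlation and line fit.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient (R) between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def median_split_stats(is_pf, os_pf):
    """R and linear fit for all systems and for each half of the IS PF."""
    is_pf = np.asarray(is_pf, dtype=float)
    os_pf = np.asarray(os_pf, dtype=float)

    median = np.percentile(is_pf, 50)   # position of the green line
    bottom = is_pf <= median            # group fitted by the red line
    top = ~bottom                       # group fitted by the blue line
    everything = np.ones_like(bottom)   # boolean mask selecting every system

    stats = {"median_is_pf": float(median)}
    for name, mask in (("all", everything), ("bottom", bottom), ("top", top)):
        # Degree-1 polyfit returns (slope, intercept) of the regression line.
        slope, intercept = np.polyfit(is_pf[mask], os_pf[mask], 1)
        stats[name] = {
            "r": pearson_r(is_pf[mask], os_pf[mask]),
            "slope": float(slope),
            "intercept": float(intercept),
        }
    return stats


if __name__ == "__main__":
    # Made-up values for illustration only, not repository data.
    demo_is = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7]
    demo_os = [0.9, 1.0, 1.1, 1.2, 1.2, 1.1, 1.2, 1.1]
    for key, value in median_split_stats(demo_is, demo_os).items():
        print(key, value)
```

The split uses `<=` for the bottom group, so a system sitting exactly on the median is counted in the lower half; that choice is an assumption and makes little difference with a large sample.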

In this case what we see is fairly telling. When we look only at values above a certain percentile, R is always low. This means that always including the top IS PF values while cutting the lower parts heavily penalizes the correlation, while going the other way and removing top performers (by only looking at groups below a certain percentile) actually increases it. The largest contribution to the correlation comes from the bottom IS PF performers, showing that increases in the IS PF are most likely to become increases in the OS PF when the IS PF is low. This benefit is lost rather quickly as the IS PF becomes larger, and it is evident that demanding ever larger increases in the IS PF will not lead to larger increases in the OS PF. It is most likely true that increasing the IS PF above 1.3 brings little benefit in terms of the expected increase in the OS PF, at least for systems that have at least 100 trades in the OS.
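The percentile sweep described above can be sketched roughly as follows. This is only an illustration under assumptions: the helper names (`safe_r`, `percentile_sweep`) are hypothetical, the cuts run from 10% to 95% in steps of 5 (the step size is my choice for the sketch), and groups that are too small or have no variation are reported as NaN rather than given a correlation.

```python
import math

import numpy as np


def safe_r(x, y, min_points=3):
    """Pearson R, or NaN when the group is too small or has no variation."""
    if len(x) < min_points or x.max() == x.min() or y.max() == y.min():
        return math.nan
    return float(np.corrcoef(x, y)[0, 1])


def percentile_sweep(is_pf, os_pf, percentiles=range(10, 100, 5)):
    """For each IS PF percentile cut, compute R below and above the cut.

    Returns a list of (percentile, cut_value, r_below, r_above) tuples.
    """
    is_pf = np.asarray(is_pf, dtype=float)
    os_pf = np.asarray(os_pf, dtype=float)

    rows = []
    for p in percentiles:
        cut = np.percentile(is_pf, p)
        below = is_pf <= cut   # systems at or under the cut
        above = ~below         # systems over the cut
        rows.append((
            p,
            float(cut),
            safe_r(is_pf[below], os_pf[below]),
            safe_r(is_pf[above], os_pf[above]),
        ))
    return rows
```

Plotting `r_below` and `r_above` against the percentile gives the two curves to compare: if the pattern discussed here holds, the "below" curve should sit clearly over the "above" curve.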

As the number of trades grows larger, the overall correlation of the entire data set becomes larger, as shown in last week’s post, and the above relationship between the different percentiles remains similar. For example, when looking at systems with more than 200 trades in the OS, the global IS/OS PF correlation is 0.258, while for systems below the 50th percentile it is 0.568 and for systems above the 50th percentile it is 0.046. This means that most of the gains in correlation as trade numbers increase occur for the lower performing ranks of the IS PF, while for higher rankings the correlation remains close to 0 the entire time. This further supports the notion that demanding an IS PF of 1.3 is logical, since increases in the IS PF tend to lead to increases in the OS PF up to this point, but forcing systems to perform better in the IS might not lead to better gains in the OS, at least in terms of the PF for our PA system repository.
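To repeat the comparison at different OS trade counts, a sketch along these lines could be used. It assumes a third array holding the number of OS trades per system; the function name `correlations_by_min_trades` and the dictionary layout are hypothetical, and the thresholds of 100 and 200 are simply the two values discussed in this post.

```python
import math

import numpy as np


def safe_r(x, y, min_points=3):
    """Pearson R, or NaN when the group is too small or has no variation."""
    if len(x) < min_points or x.max() == x.min() or y.max() == y.min():
        return math.nan
    return float(np.corrcoef(x, y)[0, 1])


def correlations_by_min_trades(is_pf, os_pf, os_trades, thresholds=(100, 200)):
    """Global, bottom-half and top-half R for systems with more than N OS trades."""
    is_pf = np.asarray(is_pf, dtype=float)
    os_pf = np.asarray(os_pf, dtype=float)
    os_trades = np.asarray(os_trades)

    results = {}
    for n in thresholds:
        keep = os_trades > n                 # "more than n trades in the OS"
        x, y = is_pf[keep], os_pf[keep]
        if len(x) == 0:
            results[n] = {"n": 0, "r_all": math.nan,
                          "r_bottom": math.nan, "r_top": math.nan}
            continue
        median = np.percentile(x, 50)        # median recomputed on the filtered set
        bottom = x <= median
        results[n] = {
            "n": int(keep.sum()),
            "r_all": safe_r(x, y),
            "r_bottom": safe_r(x[bottom], y[bottom]),
            "r_top": safe_r(x[~bottom], y[~bottom]),
        }
    return results
```

Note that the median is recomputed after each filter, so the "bottom" and "top" halves refer to the systems that survive the trade-count cut, which is an assumption about how the split is made.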

Stay tuned next week as I present a similar analysis for the SR, which will show us how the SR correlations behave across our PA repository using different percentile cuts. If you would like to learn more about our work and how you too can trade a repository of more than 10,000 price action Forex strategies without having to use a VPS please consider joining, a website filled with educational videos, trading systems, development and a sound, honest and transparent approach towards automated trading.
