The Kantu Project: Learning to Filter Price Patterns Through a High School Trip

As you may have read in my last post, I have started a software project that seeks to automatically generate algorithmic trading systems through the use of price patterns. Following and expanding on the concepts highlighted in Michael Harris’ work, this program searches through millions of potential pattern combinations and selects those that fit certain user-defined criteria. The systems generated are entirely parameter-less, meaning that they cannot be subject to any optimization because they lack any degrees of freedom. The creation of this software had some tricky steps, particularly the development of a filtering algorithm that could discard patterns that don’t make any sense. I wanted to write a small blog post so that people interested in this idea have some reference for how the problem can be solved. In today’s post I will talk about the problem of filtering price patterns – when they are generated algorithmically – and how my thought process led me to the solution of this conundrum.

First of all we should talk about the problem of price pattern generation. When you generate a set of rules to trade based on price action you must create a series of comparisons that will constitute the trading logic. A genetic programming approach puts these “logic blocks” together and generates an overall set of conditions that must be true in order to enter or exit trades. For example a genetic programming engine might generate a set of two rules that say: enter a long when the last candle’s high is higher than the high of the candle before and its close is below the close of the candle before. Here we have two main logic rules, High[1] > High[2] and Close[1] < Close[2], which constitute the main entry logic. When a genetic framework puts these rules together it has no idea of how they should “match”, so you can end up with things that are bizarre and will never yield any trading result. For example a trading engine can come up with something like High[2] > High[1] and High[2] < High[1], a clear conflict between the rules because the two of them simply cannot be true at the same time. The logic space – the set of possible combinations of all trading rules – is made up largely of such combinations, so being able to adequately filter them out before back-testing becomes fundamental to achieving an implementation that doesn’t waste computational time and reaches results more effectively.
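To make the discussion concrete, here is one simple way such machine-generated rules could be represented – each rule as a (main property, operator, secondary property) triple. This is only an illustrative Python sketch of my own; it is not Kantu’s internal format.

# Each rule compares a "main" candle property against a "secondary" one.
# The triple layout and the property names here are illustrative only.

pattern = [
    ("High[1]", ">", "High[2]"),    # last candle's high above the previous high
    ("Close[1]", "<", "Close[2]"),  # last candle's close below the previous close
]

# The kind of contradictory pattern a naive generator can produce:
bad_pattern = [
    ("High[2]", ">", "High[1]"),
    ("High[2]", "<", "High[1]"),    # can never be true together with the rule above
]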


How do we solve this problem? It is not very simple when you start to think about it. How do you build an algorithmic implementation that efficiently filters these problematic sets of rules? In order to work out how to organize the rule generation procedure it is convenient to think about a real-life analogy that can help us understand the problem better. Imagine that you are a high school principal and you need to organize a high school trip where you want to avoid both fights and relationships between your students. Out of 500 students at your high school you need to generate random lists of 10 and then decide whether they are appropriate for the trip or not. There are many possible combinations and potentially most of them won’t work out. It’s time to put your student management skills to the test :o)

Think about each student as a candle property (for example High[1]) and whether they like or dislike someone as the “>” and “<” operators. When you generate a list of students you know who each student likes or dislikes (the trading rules) and through this you can decide whether you accept the list or not (whether the pattern is valid). If a student A likes a student B and B also likes A then the list cannot work because A and B would be in a relationship (they like each other!), and by analogy you cannot have a list where B dislikes A and A dislikes B because they would get into a fight (they repel each other). This means that if you have a property that compares to another one, and that other property compares in the same way to the first one, the pattern isn’t valid (for example Close[1] < Close[2] and Close[2] < Close[1] is invalid). This is a logical impossibility. It doesn’t matter if all your students like or dislike someone else, as long as that sentiment isn’t reciprocated. In the same manner it is OK for your trading properties to compare to any other one, provided that the property they compare to doesn’t compare to them in the same way.

Another case where things don’t work out is if you find out that one of your students is ambiguous about who he/she likes or dislikes. Your list cannot be valid if one of your students dislikes and likes another one at the same time; by analogy a pattern cannot be valid if a property has different comparisons with the same secondary property (for example High[1] < High[2] and High[1] > High[2]). In essence you want all of your students to know who they like or dislike so that you can be sure when you make your list. By analogy your price pattern must not contain rules that are antagonistic: you cannot have a property that compares in different ways to the same value. When you finally have a set of students with clear likes and dislikes for which there are no conflicts, you can be sure that your list is going to be good.

This problem teaches us how we need to organize our data and what we need to do to create a validation function for price patterns which is both simple and efficient. In essence you must have a list of the properties you are using (the main properties) and the properties you are comparing to (the secondary properties), as well as the operators used for each comparison. Once you do this, validating patterns is as easy as looping through the rules to find whether a main property appears as the secondary property of another rule where the positions are swapped and the operator is the same (the first case we discussed), and then checking whether a main property has opposite comparisons to the same secondary property (the second case). This logic can be applied to any set of rules and will always generate patterns that are – at least – not logical impossibilities. The above implementation is also computationally very efficient, taking up less than 0.1% of the back-testing time when doing a 25 year test on daily bars using Kantu.
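In code, the two checks could look roughly like the sketch below. This is a minimal Python illustration of the idea, reusing the triple representation from the earlier example – the function name is my own and this is not Kantu’s actual routine.

def is_valid(pattern):
    # pattern is a list of (main, operator, secondary) triples with "<" or ">".
    rules = set(pattern)
    for main, op, secondary in pattern:
        # First case: the secondary property compares back to the main one
        # with the same operator (e.g. Close[1] < Close[2] and Close[2] < Close[1]).
        if (secondary, op, main) in rules:
            return False
        # Second case: the same pair is compared with opposite operators
        # (e.g. High[1] < High[2] and High[1] > High[2]).
        opposite = ">" if op == "<" else "<"
        if (main, opposite, secondary) in rules:
            return False
    return True

# A quick check with the contradictory pattern from before:
print(is_valid([("High[2]", ">", "High[1]"),
                ("High[2]", "<", "High[1]")]))  # prints False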


Implementing a pattern filtering algorithm as described above is not only important but fundamental to the automatic generation of trading systems, because otherwise you spend many CPU cycles looking at patterns that are simply a waste of time (because they never generate trades). Kantu experiences a speed boost of about 5x in finding suitable patterns whenever the above filtering algorithm is implemented; otherwise much computing time is spent running empty simulations. It is also important to note that the above example and algorithm suggestion are valid when constructing trading logic sets using any type of building blocks that are compared through “greater than”, “lower than” and “equal” operators, not simply those derived from price action (although I used these as an example as they are the ones used in Kantu).

I hope you enjoyed this small post about Kantu and the way in which I created its pattern filtering algorithm. Stay tuned for more results from this program as well as new ideas and developments. If you would like to learn more about algorithmic trading and how you too can improve your understanding in this field please consider joining Asirikuy.com, a website filled with educational videos, trading systems, development and a sound, honest and transparent approach towards automated trading in general :o)


7 Responses to “The Kantu Project: Learning to Filter Price Patterns Through a High School Trip”

  1. Samkronin says:

    Hi Daniel, thanks for the article – so that’s how you did it. A neat solution, although does it also work for cases where there are one or more degrees of separation between the properties you are comparing? For example, take the set of three relationships: High[1]>High[2], High[2]>High[3], High[3]>High[1]. Take any two relationships in isolation and they each seem logical. It’s only when all three are compared that the inherent illogic becomes clear. Of course, this is just one degree of separation but depending on how many relationships a pattern is allowed to define there could be 2, 3, or even more degrees of separation. Does your code check for this sort of complex illogical relationship?

    Best wishes,
    Sam

    • admin says:

      Hi Sam,

      Thank you for your reply :o) Yes, as a matter of fact the code can deal with N degrees of separation. The way in which I did this wasn’t explained in the article (it’s a bit more complex). What I did was simply to look for each secondary property as the main property of another rule and then follow this chain until no new secondary property can be found; if the chain implies a comparison of a property with itself you have an invalid pattern (obviously in the search you don’t take into account properties you have already found). In your example:

      High[1] > High[2] —> look for a property with main property as the current’s secondary (High[2])
      High[2] > High[3] —> we find this property, now look for a property that has High[3] as its main property
      High[3] > High[1] —> we find this property, now look for a property that has High[1] as its main property
      none found —> since the chain ends here we can now look at the first rule’s main property and compare with the last rule’s secondary
      High[1] > High[1] —> This is a comparison of a property with itself, pattern is invalid.
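      In code the chain walk looks roughly like the following Python sketch (an illustration of the idea only, not Kantu’s actual implementation; the function name is made up, “<” rules are flipped into “>” edges and only these two operators are handled):

      def chain_is_invalid(rules):
          # Build a directed graph where an edge a -> b means "a > b";
          # "<" rules are flipped so everything becomes a ">" edge.
          edges = {}
          for main, op, secondary in rules:
              a, b = (main, secondary) if op == ">" else (secondary, main)
              edges.setdefault(a, set()).add(b)

          def walk(start, current, visited):
              # Follow the chain of secondary properties, skipping ones already found.
              for nxt in edges.get(current, set()):
                  if nxt == start:
                      return True  # the chain closed: start > ... > start
                  if nxt not in visited and walk(start, nxt, visited | {nxt}):
                      return True
              return False

          return any(walk(prop, prop, {prop}) for prop in edges)

      # Sam's example: High[1] > High[2], High[2] > High[3], High[3] > High[1]
      rules = [("High[1]", ">", "High[2]"),
               ("High[2]", ">", "High[3]"),
               ("High[3]", ">", "High[1]")]
      print(chain_is_invalid(rules))  # prints True (the pattern is impossible)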

      In the second case you mentioned (in your reply to Franco below) it is also worth noting that this pattern is NOT impossible, because gaps in candle openings/closings can actually generate this type of structure. I hope this solves all your doubts :o) Let me know if you have any more questions,

      Best Regards,

      Daniel

      • Samkronin says:

        Hi Daniel,

        Thanks for your reply and patience with my questioning. I now understand the chain-walking approach you have used, which I can see would work nicely. I’m not sure it would work for some of the more subtle relationships that do not involve the same property being mentioned twice (like the one I described to Franco) but I assume you have thought of this and your code deals with these situations too.

        Taking a longer look at the second example I provided to Franco, I do see how there are patterns where these rules could be true. I wrote it quickly and didn’t check it thoroughly. However, I’m fairly certain that when looking into this problem last year I did come across similar sets of relationships that didn’t include the same property twice but were illogical. I look forward to next month when I can take a look at Kantu and answer these questions for myself :)

        Thanks again Daniel for all your hard work on this!

        Sam

  2. Franco says:

    Hey,

    Interesting article Daniel!

    I would have done the following (let me know if this makes sense :O):

    Determine the number of “variables” you have in your logical comparison. For example in the following case you have two variables (High[1] and High[2]):

    High[1] > High[2] AND High[1] < High[2]

    Because you have two variables you would assign two numbers to a “pool”. In our case we would just use the numbers “1” and “2”. They can be any numbers as long as they are different.

    Now run the logical comparison for all combinations of the numbers in the pool. The total combinations are as follows:

    Combination 1: High[1] = 1 ; High[2] = 2
    Combination 2: High[1] = 2 ; High[2] = 1

    If after running all the combinations none of them are true, you will know that the logic you are using can never be true and should be discarded.

    In the case Sam queried about, the same technique will also work. The only difference is that now you have three variables, thus you assign “1”, “2” and “3” to the pool. The combinations are then as follows:

    Combination 1: High[1] = 1; High[2] = 2 ; High[3] = 3
    Combination 2: High[1] = 2; High[2] = 3 ; High[3] = 1
    Combination 3: High[1] = 3; High[2] = 1 ; High[3] = 2

    If all combinations test untrue then the logic can be discarded.
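    A rough Python sketch of what I mean (taking “all combinations” to mean every permutation of the pool, so six assignments for three variables rather than just the three listed above – the function name is made up and this is only an illustration):

    from itertools import permutations

    def satisfiable(rules, variables):
        # rules is a list of (main, operator, secondary) triples with "<" or ">".
        ops = {"<": lambda a, b: a < b, ">": lambda a, b: a > b}
        # Try every assignment of the numbers 1..N to the variables.
        for values in permutations(range(1, len(variables) + 1)):
            assignment = dict(zip(variables, values))
            if all(ops[op](assignment[m], assignment[s]) for m, op, s in rules):
                return True  # at least one combination makes the logic true
        return False  # never true: the pattern can be discarded

    # The case Sam asked about:
    rules = [("High[1]", ">", "High[2]"),
             ("High[2]", ">", "High[3]"),
             ("High[3]", ">", "High[1]")]
    print(satisfiable(rules, ["High[1]", "High[2]", "High[3]"]))  # prints False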

    This is only an idea by the way, I might be missing something here :)

    • Samkronin says:

      Hi Franco, I think I considered something similar, but it gets into trouble dealing with multiple open and close relationships that can be interpreted in different ways. For example, a close on a candle will sometimes be higher than its open and other times it can legally be below its open. Also consider:

      Close[1]>Open[2]
      Close[2]>Open[3]
      Close[3]>Open[1]

      This is actually an impossible pattern, but since no variables are mentioned more than once assigning them numerical values doesn’t help. Hopefully, I too am missing something, because last year this stuff really made my head hurt!

      • Franco says:

        Hey Sam,

        Yeah I see what you mean. Assigning numbers to the logic variables and testing different combinations will still work I believe, but since you are now dealing with four numbers per candle the number of combinations you have to test will be quite large, depending on the number of candles you are using in your price action logic. Essentially what you want to do is run a very small “pseudo back-test” with every relative combination of the data available and see if something happened. If the logic is never true then you scrap the pattern.

        But Daniel’s solution looks to be the better and more versatile one, so we don’t have to break our heads any longer hehe…

        • admin says:

          Hi Franco,

          Thank you for your post :o) It is also worth remembering that a pseudo back-test has the disadvantage that you always need to run the full test to get an answer, while with the explicit evaluation of rule matching you can exit loops whenever you find an inconsistency between the rules. Although a full explicit evaluation takes longer to carry out – because you have to loop – it often finishes in less time because you exit loops early. Your solution – a pseudo test – is ideal when you expect most results to be positive (because the full explicit evaluation of the logic takes longer for each instance), but if you expect most results to be negative (our case) then the explicit logic comparisons do better. Thanks however for sharing your ideas, they might come in handy some day :o)

          Best Regards,

          Daniel
