The next frontier: Automating machine learning

September 27th, 2016

Right now the machine learning world is at a really primitive stage. We have the computational power to run machine learning experiments and solve complicated problems, but building a machine learning solution is still a very manual process: the data scientist has to put a significant amount of time and experience into figuring out which preprocessing to use and which machine learning models to test. Today I want to talk about the next frontier in machine learning: automating the entire process, that is, building solutions that carry out everything from data processing to final model selection with no user intervention. I’ll talk about some of the libraries currently available for this and how they can improve your chances of success when solving machine learning problems.

Let’s say you want to solve a typical machine learning problem where you have a set of features that are somehow related to an outcome you want to predict. We usually solve this by drawing on experience to decide which feature preprocessing techniques and models might work best, then moving on to a time-consuming loop in which we iterate through tons of preprocessing ideas, models and hyperparameter combinations until we arrive at a solution that gives satisfactory results. This process is tedious, and it’s the main reason why Kaggle competitions exist. It requires a lot of insight, experience and computational power.
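
To make that loop concrete, here is a minimal sketch of what the manual process often looks like in practice. It assumes scikit-learn and uses the iris toy dataset as a stand-in for a real problem; the candidate scalers, models and parameter values are arbitrary choices of mine for illustration, which is exactly the part that normally depends on experience.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Toy dataset standing in for "features related to an outcome".
X, y = load_iris(return_X_y=True)

# A two-step pipeline: one preprocessing step followed by one model.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Hand-picked candidates (illustrative assumptions, not recommendations).
param_grid = [
    {
        "scale": [StandardScaler(), MinMaxScaler(), "passthrough"],
        "model": [LogisticRegression(max_iter=1000)],
        "model__C": [0.1, 1.0, 10.0],
    },
    {
        "scale": ["passthrough"],
        "model": [DecisionTreeClassifier(random_state=0)],
        "model__max_depth": [2, 4, 8],
    },
]

# Every combination is fitted and scored with 5-fold cross-validation.
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Even this tiny grid already means 12 candidate pipelines, each fitted five times. The search itself is mechanical; deciding what goes into `param_grid` is where the human time goes.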

But isn’t the above process ripe for automation? Well, yes and no. Without insight into which model might work best and which variations make sense, a computer is left with dumb optimization techniques to figure out what to use. The problem quickly becomes very difficult given the number of feature engineering techniques and models available to test, plus the parameters that can be varied within each of them. A computer trying to test every combination of all possible feature preprocessing algorithms and machine learning models will take a lifetime figuring out what to use. The oldest library attempting this sort of thing is probably AutoML, which attempts to automate the entire machine learning process.
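
A quick back-of-the-envelope calculation shows how fast this blows up. All of the counts below are made-up assumptions for illustration (they do not come from AutoML or any other library), and the one-second-per-fit figure is optimistic for most real datasets.

```python
from math import perm

# Hypothetical search space; every number here is an illustrative assumption.
preprocessing_options = 10     # scalers, encoders, feature selectors, ...
max_preprocessing_steps = 3    # ordered chain, each technique used at most once
models = 15
hyperparameters_per_model = 5
values_per_hyperparameter = 10
seconds_per_fit = 1.0          # assumed cost of training and scoring one candidate

# Ordered chains of 0 to 3 distinct preprocessing steps.
preprocessing_chains = sum(
    perm(preprocessing_options, k) for k in range(max_preprocessing_steps + 1)
)

# Each model has its own grid of hyperparameter settings.
model_configs = models * values_per_hyperparameter ** hyperparameters_per_model

total_candidates = preprocessing_chains * model_configs
years = total_candidates * seconds_per_fit / (3600 * 24 * 365)

print(f"{preprocessing_chains} preprocessing chains")
print(f"{model_configs:,} model configurations")
print(f"{total_candidates:,} candidate pipelines")
print(f"about {years:.0f} years at {seconds_per_fit} s per fit")
```

With these modest assumptions the exhaustive search already covers more than a billion candidate pipelines, which works out to decades of compute on a single machine.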

The problem with libraries like AutoML is that they cannot really replace human experts in figuring out machine learning solutions, because the problem is too complex. They lack the insight needed to perform machine learning and are therefore probably condemned to suboptimal solutions, given the computational complexity of the optimization problem. Granted, if you have nothing and want a machine learning solution, AutoML will probably still be a whole lot better than nothing. But if you are a machine learning expert, a library like AutoML will most probably not be able to take your job, because with experience and the ability to understand the data you will be able to be much more creative. You will be able to try targeted, computationally more expensive techniques that are inaccessible to something like AutoML simply because of the computational cost of exploring them in a “dumb” way.

More recently, however, there have been some advances in attempting to do the above in a much smarter manner. The TPOT library brings genetic optimization techniques to the automatic ML optimization problem in order to figure out what to use in a less “dumb” manner. This means that TPOT can explore more complexity and reach better solutions than something like AutoML, but it is still a significant distance away from human machine learning experts. Evolution is good at getting you out of immediate problems, but it does not necessarily produce good solutions for future circumstances; evolutionary techniques can often lead you into corners that are difficult to get out of (think about a giraffe’s neck). I am, however, finding TPOT increasingly valuable for a “first crack” at a machine learning problem, as it can provide some insights that would be difficult to arrive at manually. It is able to “think outside the box”, and in that way it has proven to be very valuable.
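
To give a feel for what genetic optimization over pipelines means, here is a toy sketch in plain Python. This is not TPOT’s actual algorithm or API: the search space, the `fitness` function and every name below are hypothetical, and the score is a stand-in for what would really be a cross-validated model score.

```python
import random

# Hypothetical search space: each "gene" is one pipeline decision.
SEARCH_SPACE = {
    "scaler": ["none", "standard", "minmax"],
    "selector": ["none", "variance", "k_best"],
    "model": ["logistic", "tree", "forest"],
    "depth": [2, 4, 8, 16],
}

def random_pipeline(rng):
    """Pick one option per gene at random."""
    return {gene: rng.choice(options) for gene, options in SEARCH_SPACE.items()}

def fitness(pipeline):
    """Stand-in score: a real system would return a cross-validated score here."""
    target = {"scaler": "standard", "selector": "k_best", "model": "forest", "depth": 8}
    return sum(pipeline[gene] == target[gene] for gene in target)

def crossover(a, b, rng):
    """Build a child by taking each gene from one of the two parents."""
    return {gene: rng.choice([a[gene], b[gene]]) for gene in SEARCH_SPACE}

def mutate(pipeline, rng, rate=0.2):
    """Re-draw each gene at random with probability `rate`."""
    return {
        gene: rng.choice(SEARCH_SPACE[gene]) if rng.random() < rate else value
        for gene, value in pipeline.items()
    }

def evolve(generations=20, population_size=20, seed=0):
    rng = random.Random(seed)
    population = [random_pipeline(rng) for _ in range(population_size)]
    for _ in range(generations):
        # Keep the fitter half unchanged and refill with mutated offspring.
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)

best = evolve()
print(best, fitness(best))
```

The stand-in score here is deliberately simple, with each gene contributing independently, so the search has no corners to get stuck in. With a real cross-validated score the choices interact, and a population that has converged on one family of pipelines can find it hard to escape, which is the giraffe’s neck problem mentioned above.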

Right now the automated machine learning world is like the phone world before the iPhone. We have several solutions that attempt to tackle the problem, but no one has been able to arrive at an elegant and efficient solution that emulates well what a data scientist can do. My guess is that we need something that tackles the problem in a way similar to a human: a framework that is somehow able to read the data and guess solutions intelligently, in addition to using genetic or brute-force optimization techniques. If you would like to learn more about machine learning and how you too can create your own automatically retraining systems using powerful machine learning libraries like Shark, please consider joining Asirikuy.com, a website filled with educational videos, trading systems, development and a sound, honest and transparent approach towards automated trading strategies.