Are you into Machine Learning OR are you “just” a Statistician? Have you been asked this question yet? If you are in, or looking to get into, a career that has anything to do with deriving insights from data, you probably know what I am talking about.
The year 2016 saw over three dozen machine learning startups acquired by tech giants; several dozen more raked in aggregate funding to the tune of $4 billion worldwide. Is it a blip or a bubble? Definitely not. In times when automation is key, it was but imperative that we figure out methods of data analysis and model building that automate data analysis and model building. Sounds tautological? It is. And in a way that is what machine learning is … err … rather does. It picks up right where traditional statistical models stop. It’s all about building algorithms that learn iteratively from data: the more data you feed them, the better results they churn out.
While conceptually machine learning has been around for roughly 80 years (recent history dates it back to World War II and Turing), the recent frenzy around it can be attributed to the overall advances in, and affordability of, computing power. While manually getting these models to improve themselves through numerous iterations may seem tedious, if not impossible, a modern computer fed with the algorithm can get these models to learn, grow, change, and develop by themselves in a matter of seconds … and we are already talking “real time!” What’s more, they can look for insights without being told exactly where to look, a.k.a. dealing with unstructured data (think social media, web searches). An algorithm iterates, learns new things, adapts, and then starts the whole process all over again, learning from new data every time. It really embodies the adage that practice makes perfect.
Now if you put this in the context of self-driving cars, or the recommender engines at Netflix or Amazon, you can see why algorithms that generate decisions out of data in real time, without human intervention, are key to where we are headed both in terms of technology and user experience. It is machine learning that has turned the “hype” around the importance of “big data” into a reality. When the sheer availability of more data could have raised concerns about its usability for deriving meaningful insights, it was machine learning that came to the rescue. Let’s just say that compared to traditional statistical methods, which dealt with static models, machine learning is more in tune with the current times and their needs.
The discussion becomes a bit more exciting, and a little more tangible, when we start considering some problems where machine learning is a clear improvement over traditional statistical methods (although a strong caveat here: a lot of machine learning techniques are really enhancements or extensions of their “statistical” counterparts).
Let’s start with pattern recognition: it is the essence of solving a lot of business problems that rely on regularities in the gathered data to make predictions. It is also what is called “supervised learning” in machine learning parlance. While a traditional classification or regression model can give you a prediction, it is a “closed-form” solution, which of course is static. A machine learning technique called gradient boosting takes the same approach but iteratively and continuously searches for a “local minimum” of the loss, adapting as it learns. Ever had your credit card declined when using it at a gas station that’s not your “usual” one? In most cases that is supervised (or semi-supervised) learning models at work for you.
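To make the idea concrete, here is a minimal pure-Python sketch of gradient boosting for squared-error regression (a toy illustration, not the credit-card model — the data and parameters are made up): each round fits a decision stump to the current residuals, which are the negative gradient of the squared loss, and adds a shrunken copy of that stump to the ensemble.

```python
def fit_stump(x, r):
    """Find the single-threshold split of x that best fits residuals r."""
    best = None
    for t in sorted(set(x)):
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - (lmean if xi <= t else rmean)) ** 2
                  for xi, ri in zip(x, r))
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda v: lmean if v <= t else rmean

def gradient_boost(x, y, rounds=50, lr=0.1):
    base = sum(y) / len(y)                 # start from the mean prediction
    pred = [base] * len(x)
    stumps = []
    for _ in range(rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]   # negative gradient
        stump = fit_stump(x, resid)
        stumps.append(stump)
        pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]
    return lambda v: base + lr * sum(s(v) for s in stumps)

x = [1, 2, 3, 4, 5, 6]
y = [1.1, 0.9, 1.0, 3.0, 3.1, 2.9]        # a noisy step function
model = gradient_boost(x, y)
```

Each successive stump corrects what the ensemble so far gets wrong — that is the “iteratively searches and adapts” behavior in miniature.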
But given that supervised learning deals prominently with “historical data” or a “training set,” it has to live with its own limitation of being unable to predict in situations where you have no past data. While traditional k-means or hierarchical clustering can in theory be applied, when you are dealing with a deluge of transactional data you need to apply unsupervised learning techniques like artificial neural networks (ANN) or Gaussian mixture models (GMM) to explore the amassed data and find structure within it. So the next time you see “Also recommended for you” while shopping on Amazon.com, know that every new item you searched, every new item you saved, and every new item you bought was factored into those recommendations by those unsupervised machine learning algorithms within a matter of seconds.
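For a flavor of “finding structure without labels,” here is a minimal sketch of the classic k-means loop (Lloyd’s algorithm) on one-dimensional data — purely illustrative toy numbers, not Amazon’s pipeline: alternate between assigning each point to its nearest center and moving each center to the mean of its assigned points.

```python
def kmeans(points, k=2, iters=20):
    # naive initialization: the first k distinct values
    centers = sorted(set(points))[:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assignment step: nearest current center
            idx = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[idx].append(p)
        # update step: each center moves to its cluster mean
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

pts = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]      # two obvious groups
centers = sorted(kmeans(pts))
```

No labels were supplied, yet the loop recovers the two groups on its own — which is exactly what makes this family of methods useful when there is no training set to learn from.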
If you are dealing with discrete outcome variables, logistic regression techniques naturally come to mind, but they fall short when it comes to dealing with complex data sets – hence the need to look into support vector machines or hierarchical Bayesian models. If you are looking at a large but well-behaved data set, the good ol’ logit will still work great; but with the big bad ugly ones you probably have to resort to the much-talked-about bagged regression technique, which, by the way, is nothing more than a thousand logits estimated on bootstrap sample draws from the mother data set and then averaged to reduce variance.
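The bagging recipe above can be sketched in a few lines of pure Python — hedged heavily: a handful of logits instead of a thousand, a tiny toy data set, and plain gradient ascent instead of a proper maximum-likelihood solver, all illustrative choices. Each member logit is fit on a bootstrap draw, and the ensemble averages the member probabilities.

```python
import math
import random

def fit_logit(X, y, lr=0.5, epochs=200):
    """Single-feature logistic regression via gradient ascent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-(w * xi + b)))
            w += lr * (yi - p) * xi       # gradient of the log-likelihood
            b += lr * (yi - p)
    return w, b

def bagged_logit(X, y, n_models=25, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # bootstrap draw: sample n indices with replacement
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        models.append(fit_logit([X[i] for i in idx], [y[i] for i in idx]))
    def predict(x):
        probs = [1.0 / (1.0 + math.exp(-(w * x + b))) for w, b in models]
        return sum(probs) / len(probs)    # average the member probabilities
    return predict

X = [-3, -2, -1, 1, 2, 3]
y = [0, 0, 0, 1, 1, 1]
predict = bagged_logit(X, y)
```

Each member sees a slightly different resampled data set, so their individual quirks wash out in the average — that averaging is the whole trick.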
Of course, we are often looking at blurred lines when applying supervised, semi-supervised, and unsupervised learning techniques to business problems. When you first open a YouTube account, the recommended content served on your home page has to be based on in-session, click-stream-based unsupervised models. With repeat visits, as the system learns the depth and spread of your viewing preferences, semi-supervised dynamic models with temporal aspects akin to hidden Markov models probably kick in. (I am not commenting on which algorithms YouTube actually applies to these problems – rather, on what could potentially be applied.)
One way to view machine learning is as statistics on steroids: keep building thousands of trees with simpler assumptions and then average them up to derive predictions with lower variance. Machine learning is concerned more with the accuracy of the final predictions than with the laundry list of underlying distributions and asymptotic tests in statistical methods. That doesn’t necessarily mean the math is not complex – it just means the intent is much simpler to understand. Contrary to the common myth, not all machine learning techniques are adaptive (actually, only the generative algorithms are). And no, you will not be expected to dive into the deep end of writing “real-time” algorithms on Day 1.
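The “average them up” claim can be checked with a toy simulation (the numbers here are illustrative, not from any real model): if each ensemble member predicts the truth plus independent noise, the average of many members has a far smaller spread than any single member.

```python
import random
import statistics

rng = random.Random(42)
truth = 5.0

# spread of a single noisy "model" prediction, repeated 1000 times
single = [truth + rng.gauss(0, 1) for _ in range(1000)]

# spread of an ensemble that averages 100 independent noisy predictions
averaged = [statistics.mean(truth + rng.gauss(0, 1) for _ in range(100))
            for _ in range(1000)]
```

With 100 members, the standard deviation drops by roughly a factor of ten (1/√100) — the variance-reduction arithmetic behind bagged trees, in miniature.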
Originally appeared in InsideBIGDATA