Data scientists are expected to know a lot — machine learning, computer science, statistics, mathematics, data visualization, communication, and deep learning. Within those areas there are dozens of languages, frameworks, and technologies data scientists could learn. How should data scientists who want to be in demand by employers spend their learning budget? Which skills are most in demand for data scientists?
Deep learning continues to be the hottest thing in data science, and deep learning frameworks are changing rapidly. Just five years ago, none of the current leaders other than Theano were even around. I wanted to find evidence for which frameworks merit attention, so I developed this power ranking. I used 11 data sources across 7 distinct categories to gauge framework usage, interest, and popularity. Without further ado, here are the Deep Learning Framework Power Scores.
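The article doesn't spell out the exact scoring formula, but the general idea of combining several sources into one power score can be sketched as follows. This is a minimal sketch assuming a normalize-then-weight approach; the source names, raw values, and weights are illustrative, not the article's actual data.

```python
# Illustrative "power score": combine usage signals from several sources
# into one weighted number. The sources, values, and weights below are
# made up for demonstration -- they are NOT the article's actual data.

def power_score(signals, weights):
    """Normalize each source's raw values to a 0-100 scale (leader = 100),
    then return each framework's weighted average across sources."""
    scores = {}
    for framework in next(iter(signals.values())):
        total = 0.0
        for source, values in signals.items():
            top = max(values.values())
            normalized = 100 * values[framework] / top  # scale leader to 100
            total += weights[source] * normalized
        scores[framework] = round(total / sum(weights.values()), 1)
    return scores

# Hypothetical raw counts per source (e.g., job listings, search interest).
signals = {
    "job_listings": {"TensorFlow": 3000, "Keras": 1200, "PyTorch": 900},
    "search_interest": {"TensorFlow": 80, "Keras": 40, "PyTorch": 35},
}
weights = {"job_listings": 2.0, "search_interest": 1.0}

print(power_score(signals, weights))
# -> {'TensorFlow': 100.0, 'Keras': 43.3, 'PyTorch': 34.6}
```

Normalizing each source before weighting keeps a high-volume source (thousands of job listings) from drowning out a low-volume one (search-interest indices).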
A more refined framework is needed to provide a richer common lexicon for thinking and communicating about data in machine learning. A framework along the lines of the one in this article should help practitioners, especially newer practitioners, develop better models faster. With 7 Data Types to reference, we should all be able to more quickly evaluate and discuss the encoding options and imputation strategies available. I hope this article provides a useful taxonomy that leads to more actionable steps for data scientists.
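To make the encoding-and-imputation discussion concrete, here is a minimal stdlib-only sketch of how two different data types call for different handling: a categorical column gets one-hot encoded, while a numeric column with missing values gets mean-imputed. The column names and data are illustrative only.

```python
# Different data types call for different strategies: categoricals are
# encoded, numerics with gaps are imputed. Data below is illustrative.

def one_hot(values):
    """One-hot encode a categorical column (assumes no missing values)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

def impute_mean(values):
    """Replace None in a numeric column with the mean of observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

color = ["red", "blue", "red"]   # categorical -> one-hot encode
age = [30.0, None, 50.0]         # numeric with a gap -> mean-impute

print(one_hot(color))      # [[0, 1], [1, 0], [0, 1]]  (columns: blue, red)
print(impute_mean(age))    # [30.0, 40.0, 50.0]
```

In practice you would reach for library implementations (e.g., scikit-learn's encoders and imputers), but the decision of which strategy fits which data type is exactly what a shared taxonomy speeds up.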
Let’s briefly look at the types of chips available for deep learning. I’ll simplify the major offerings by comparing them to Ford cars. CPUs alone are really slow for deep learning. You do not want to use them. They are fine for many machine learning tasks, just not deep learning. The CPU is the horse and buggy of deep learning. GPUs are much faster than CPUs for most deep learning computations.
Automated machine learning doesn’t replace the data scientist, but it might be able to help you find good models faster. TPOT bills itself as your Data Science Assistant. TPOT is meant to be an assistant that gives you ideas on how to solve a particular machine learning problem by exploring pipeline configurations that you might have never considered, and then leaves the fine-tuning to more constrained parameter tuning techniques such as grid search.
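The "more constrained parameter tuning" step mentioned above, grid search, can be sketched with the standard library alone. This is a toy illustration of the technique, not TPOT's API; the parameter grid and scoring function are made up for demonstration.

```python
from itertools import product

# Illustrative grid search: exhaustively evaluate every combination in a
# small parameter grid and keep the best-scoring one. TPOT explores whole
# pipeline configurations first, then hands off to tuning like this.

def grid_search(param_grid, score_fn):
    """Return (best_params, best_score) over the cartesian product of the grid."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, combo))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy objective standing in for cross-validated model accuracy,
# peaking at max_depth=5, min_samples_leaf=2.
def toy_score(max_depth, min_samples_leaf):
    return -((max_depth - 5) ** 2) - ((min_samples_leaf - 2) ** 2)

best, score = grid_search(
    {"max_depth": [3, 5, 7], "min_samples_leaf": [1, 2, 4]},
    toy_score,
)
print(best, score)  # {'max_depth': 5, 'min_samples_leaf': 2} 0
```

Grid search is "constrained" in exactly this sense: it only ever tries combinations you enumerate up front, which is why a tool like TPOT is useful for discovering pipeline structures you wouldn't have put in the grid at all.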