Despite R’s apparent success, MongoDB’s Matt Asay has argued that while R was once the language of choice for data scientists, it is quickly ceding ground to Python. One reason cited for the perceived decline in R’s popularity is its complex programming environment, which requires special training. According to Robert Muenchen at the University of Tennessee, R remains a tough language to master even for data scientists with expertise in statistical tools such as SAS, SPSS and Stata. This is largely because R uses misleading function and parameter names. SAS, SPSS and Stata use the sort command to sort data sets. R has a command with the same name, but it does not sort data sets; it sorts individual variables. To sort a data set in R, one must use the order function, and even that works in a rather convoluted manner. In addition, R suffers from sparse, non-standard output, and it has too many commands to master. R also provides only sloppy control over variables, and naming or renaming variables is an overly complex exercise, at least for the novice.
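The distinction between sorting a single variable and sorting a whole data set can be sketched in a few lines. The example below is plain Python rather than R, and only mimics the behavior described above: the data, the column names, and the variable names are all made up for illustration, and positions are 0-based here whereas R counts from 1.

```python
# A tiny "data set" stored as columns, loosely like an R data frame.
# The names and values are hypothetical, chosen only for illustration.
data = {
    "name": ["Cara", "Abe", "Bea"],
    "age": [41, 35, 29],
}

# Rough analogue of R's sort(x): sorts ONE variable in isolation.
# The rows of the data set are left untouched.
sorted_ages = sorted(data["age"])

# Rough analogue of R's order(x): not sorted values, but the row
# positions that WOULD put the column in sorted order.
row_order = sorted(range(len(data["age"])), key=lambda i: data["age"][i])

# Sorting the data set means applying that permutation to every column,
# which is roughly what indexing a data frame by order(...) does in R.
sorted_data = {
    column: [values[i] for i in row_order]
    for column, values in data.items()
}

print(sorted_ages)   # the age column alone, sorted
print(row_order)     # positions of the rows in sorted order
print(sorted_data)   # every column rearranged consistently
```

The two-step pattern (compute a permutation, then index every column with it) is what makes sorting a data set feel indirect to someone used to a single sort command.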
The increasing homogenization (Pythonification?) of the tools I use on a regular basis primarily reflects the spectacular recent growth of the Python ecosystem. A few years ago, you couldn’t really do statistics in Python unless you wanted to spend most of your time pulling your hair out and wishing Python were more like R (which is a pretty remarkable confession, considering what R is like). Neuroimaging data could be analyzed in SPM (MATLAB-based), FSL, or a variety of other packages, but there was no viable full-featured, free, open-source Python alternative. Packages for machine learning, natural language processing, and web application development were only just starting to emerge.
The growing popularity of Python is not surprising given its versatility. To be sure, R is still far more powerful when it comes to data analytics. Python is catching up, but does that really mean its large following is going to supplant R? The chart above needs to be read with care because it compares apples and oranges. Charts like these are often used to make misguided arguments about R’s impending demise. So, how does demand for R compare with other statistical tools such as SAS?