If machine learning and statistics are synonymous with one another, why are we not seeing every statistics department in every university closing down or transitioning to being a ‘machine learning’ department? Because they are not the same! A major difference between machine learning and statistics is indeed their purpose. However, saying machine learning is all about accurate predictions whereas statistical models are designed for inference is almost a meaningless statement unless you are well versed in these concepts.
The importance of visualization is a topic taught to almost every data scientist in an entry-level course at university but is mastered by very few individuals. This article focuses on the importance of visualization with data. The amount and complexity of information produced in science, engineering, business, and everyday human activity is increasing at staggering rates. Good visualizations not only present a visual interpretation of data, but do so by improving comprehension, communication, and decision making.
The future of computation looks like it will involve speeding up computations to handle the relentless and exponential increase in data production. However, speeding up individual processors is difficult for various reasons, and Moore’s law cannot last forever — it is becoming increasingly constrained by the limits of heat transfer and quantum mechanics. A greater push will continue to be seen towards parallel computing, especially with more specialized hardware such as GPUs and TPUs, as well as towards more energy-efficient computing which may become possible as we enter into the realm of neuromorphic computing.
This article takes you to a deep dive into data privacy and urges you to keep up to date on what is happening in the privacy world — knowing where and how your data is used and protected by companies and governments is likely to become an important topic in the data-driven societies of the future. Learn how little privacy you have and how differential privacy aims to help.
Dataset shift is a topic that is extremely important and yet undervalued by people in the field of data science and machine learning. Given the impact, it can have on the performance of our algorithms, spend some time working out how to handle data properly in order to give you more confidence in your models, and, hopefully, superior performance. The key theme of this article can be summarized in a single sentence: Dataset shift is when the training and test distributions are different.
Machine learning in science does present problems in academia due to the lack of reproducibility of results. However, scientists are aware of these problems and a push toward more reproducible and interpretable machine learning models is underway. The real breakthrough will be once this has been completed for neural networks. The scientific community must make a concerted effort in order to understand how these algorithms work and how best to use them to ensure reliable, reproducible, and scientifically valid conclusions are made using data-driven methods.