In this article, you will learn that a properly drawn sample can represent the whole population with statistical significance. This helps in machine learning because a smaller dataset that carries nearly the same information as the full one lets us train models more quickly. However, the result depends strongly on the significance level we choose. For certain kinds of problems, it can be useful to raise the confidence level or to discard the variables that don’t show a suitable p-value.
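As a rough illustration of the idea, the sketch below draws a random sample from a synthetic population and builds a confidence interval for the mean. Everything in it is an assumption made for illustration: the Gaussian population, the sample size of 1,000, the helper name mean_confidence_interval, and the normal-approximation critical value z = 1.96 (about 95% confidence). It is not the article's own procedure.

```python
import math
import random
import statistics


def mean_confidence_interval(sample, z=1.96):
    """Normal-approximation confidence interval for the mean of a sample.

    z = 1.96 corresponds to roughly 95% confidence; a larger z
    (for example 2.576 for about 99%) gives a wider interval.
    """
    mean = statistics.fmean(sample)
    # Standard error of the mean, using the sample standard deviation.
    std_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - z * std_error, mean + z * std_error


# Synthetic population (an assumption, standing in for a large dataset).
random.seed(0)
population = [random.gauss(50, 10) for _ in range(100_000)]

# A much smaller random sample drawn without replacement.
sample = random.sample(population, 1_000)

low, high = mean_confidence_interval(sample)
print(f"population mean:      {statistics.fmean(population):.2f}")
print(f"sample mean interval: [{low:.2f}, {high:.2f}]")
```

Raising the confidence level (a larger z) widens the interval, which is the trade-off the summary refers to: more confidence that the sample reflects the population, at the cost of a less precise estimate for the same sample size.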
Now we are facing a new character on the stage of evolution: Artificial Intelligence. Where should we place this piece in the puzzle of human history? Artificial Intelligence is not a tool at all. It’s more like a synthetic partner in our lives, something able to use cognitive capabilities to perform certain tasks faster than we can. So it’s not a tool; it’s an artificial extension of our brain. AI, with its role of boosting human capabilities, can actually be the key to saving us from self-extinction. And this has nothing to do with technology.
This short article shows a simple example of how to use Azure ML Studio. It’s a very useful tool in the machine learning industry and, although it has some limits (a limited number of records and a limited choice of models), even the most code-oriented data scientist will love this simple tool. It’s worth mentioning that, for the appropriate fee, ML Studio can be used for real-time training and prediction thanks to its REST API. This enables many possible machine learning scenarios.
One of the main tasks a data scientist faces when building a machine learning model is selecting the most predictive variables. Including predictors with low predictive power can lead to overfitting or poor model performance. This article shows you how to better select the predictors of a dataset for a binary classification model, using two simple techniques in R that measure the importance of numerical and categorical variables against a binary target.
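The article's own R techniques are not reproduced in this summary. To make the idea concrete, the sketch below is written in Python rather than R and swaps in two common measures as assumptions: the absolute standardized mean difference between the two classes for a numerical variable, and the chi-square statistic of the contingency table for a categorical variable. The function names numeric_importance and categorical_importance are hypothetical, and the target is assumed to be coded as 0 and 1.

```python
import math
from collections import Counter


def numeric_importance(values, target):
    """Absolute standardized mean difference between the two target classes.

    Assumes the target is coded 0/1 and each class has at least two rows.
    """
    group0 = [v for v, t in zip(values, target) if t == 0]
    group1 = [v for v, t in zip(values, target) if t == 1]
    mean0 = sum(group0) / len(group0)
    mean1 = sum(group1) / len(group1)
    # Pooled standard deviation of the two groups.
    ss0 = sum((v - mean0) ** 2 for v in group0)
    ss1 = sum((v - mean1) ** 2 for v in group1)
    pooled = math.sqrt((ss0 + ss1) / (len(group0) + len(group1) - 2))
    if pooled == 0:
        return 0.0 if mean0 == mean1 else float("inf")
    return abs(mean1 - mean0) / pooled


def categorical_importance(categories, target):
    """Chi-square statistic of the category-by-target contingency table."""
    n = len(target)
    observed = Counter(zip(categories, target))
    row_totals = Counter(categories)
    col_totals = Counter(target)
    chi2 = 0.0
    for category in row_totals:
        for label in col_totals:
            expected = row_totals[category] * col_totals[label] / n
            chi2 += (observed[(category, label)] - expected) ** 2 / expected
    return chi2


# Toy data (made up for illustration only).
target = [0, 0, 0, 0, 1, 1, 1, 1]
age = [1, 2, 3, 4, 5, 6, 7, 8]
color = ["a", "a", "a", "a", "b", "b", "b", "b"]

print("numeric importance:    ", numeric_importance(age, target))
print("categorical importance:", categorical_importance(color, target))
```

In both cases a larger value suggests a stronger association with the target, so variables can be ranked by score and the weakest ones considered for removal.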
Random search is a really useful tool in a data scientist’s toolbox. It’s a very simple technique, and it can be a powerful way to perform feature selection. It’s not meant to explain why some features are more useful than others (as opposed to other feature selection procedures like Recursive Feature Elimination), but it can help you reach good results in less time. Learn how to use a simple random search in Python for feature selection.
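A minimal sketch of the idea, not the article's own code: draw random subsets of features, score each one, and keep the best. The function name random_search_features, its parameters, and the toy scoring function are all hypothetical; in practice score_fn would typically train a model on the chosen features and return a validation metric.

```python
import random


def random_search_features(features, score_fn, n_iter=100, seed=0):
    """Try n_iter random feature subsets and return the best one found.

    score_fn takes a tuple of feature names and returns a number,
    where higher is better.
    """
    rng = random.Random(seed)
    best_subset, best_score = None, float("-inf")
    for _ in range(n_iter):
        # Pick a random subset size, then a random subset of that size.
        size = rng.randint(1, len(features))
        subset = tuple(sorted(rng.sample(features, size)))
        score = score_fn(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


def toy_score(subset):
    """Stand-in for a validation metric (an assumption for this sketch).

    Rewards the two 'informative' features and penalizes subset size.
    """
    informative = {"age", "income"}
    return len(informative & set(subset)) - 0.1 * len(subset)


features = ["age", "income", "zip_code", "favorite_color"]
best_subset, best_score = random_search_features(features, toy_score, n_iter=200)
print(best_subset, best_score)
```

The search says nothing about why the winning subset works. It only reports which of the sampled subsets scored best, so the result depends on the number of iterations and on the random seed.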
Data Science is an exciting job, but it can be very difficult when you have to speak to a non-technical audience. Data and business are intimately related, and you must remember this when you work with business-oriented people. The only way to survive is to find a middle ground between a data-driven bottom-up approach and a business-driven top-down approach. Finally, because Data Science is hard and time-consuming, delivering small results at a constant rate is the only way to keep your customers engaged.