If you are getting started on your data science journey and don’t come from a technical background, then you definitely understand the struggle of keeping up with the terminology of data pre-processing. That is a real concern, considering that, by some estimates, data scientists spend about 60% of their time cleaning and organizing data! This is the FIRST article, so we will focus only on key terms. Make sure to follow me so you can read the next posts, which focus more on feature engineering, model selection, etc. Keep in mind that some of these terms differ depending on the language or platform you are using, but I hope this gives you a nice overview.
We covered the basic terms and definitions for data types and structures in my previous post. Now let’s dive into the creative and most time-consuming side of data science: cleaning and feature engineering. What are some of the basic strategies that data scientists use to clean their data AND increase the amount of information they get from it? The cleaning and engineering strategies used usually depend on the business problem and the type of target variable, since these influence the choice of algorithm and the data preparation requirements.