{"id":1928,"date":"2019-09-04T03:18:06","date_gmt":"2019-09-04T00:18:06","guid":{"rendered":"http:\/\/kusuaks7\/?p=1533"},"modified":"2024-04-22T10:23:18","modified_gmt":"2024-04-22T10:23:18","slug":"a-laymans-guide-to-data-science-how-to-become-a-good-data-scientist","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/bigdata-cloud\/a-laymans-guide-to-data-science-how-to-become-a-good-data-scientist\/","title":{"rendered":"A Layman\u2019s Guide to Data Science: How to Become a (Good) Data Scientist"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"1928\" class=\"elementor elementor-1928\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-4768e4b8 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"4768e4b8\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-1bf6c7e2\" data-id=\"1bf6c7e2\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-36a2295 elementor-widget elementor-widget-heading\" data-id=\"36a2295\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h1 class=\"elementor-heading-title elementor-size-default\"><h1 id=\"4aff\" data-selectable-paragraph=\"\">How simple is Data Science?<\/h1><\/h1>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-26e416a elementor-widget elementor-widget-text-editor\" data-id=\"26e416a\" data-element_type=\"widget\" 
data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"39f8\" data-selectable-paragraph=\"\">Sometimes, when you hear data scientists rattle off a dozen algorithms while discussing their experiments or go into the details of TensorFlow usage, you might think there is no way a layman can master Data Science. Big Data looks like yet another mystery of the Universe, locked away in an ivory tower with a handful of present-day alchemists and magicians. At the same time, you hear from everywhere about the urgent need to become data-driven.<\/p>\n<p id=\"3d5e\" data-selectable-paragraph=\"\">The trick is that we used to have only limited, well-structured data. Now, with the global Internet, we are swimming in never-ending flows of structured, unstructured and semi-structured data. This gives us more power to understand industrial, commercial or social processes, but it also requires new tools and technologies.<\/p>\n<p id=\"b0f0\" data-selectable-paragraph=\"\">Data Science is merely a 21st-century extension of the mathematics that people have been doing for centuries. In essence, it is the same skill of using the available information to gain insight and improve processes. 
Whether it\u2019s a small Excel spreadsheet or 100 million records in a database, the goal is always the same: to find value.\u00a0<mark>What makes Data Science different from traditional statistics is that it tries not only to explain past values but also to predict future trends.<\/mark><\/p>\n<p id=\"484d\" data-selectable-paragraph=\"\">In other words, we use Data Science for:<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-1e2d11f elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"1e2d11f\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-641c338\" data-id=\"641c338\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-ec139e7 elementor-widget elementor-widget-image\" data-id=\"ec139e7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/800\/0*qoucFhHZD7x4adNW\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-86d5f18 elementor-widget elementor-widget-text-editor\" data-id=\"86d5f18\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"d2f9\" data-selectable-paragraph=\"\">Data Science is a newly developed blend of 
machine learning algorithms, statistics, business intelligence, and programming. This blend helps us reveal hidden patterns in raw data, which in turn provide insights into business and manufacturing processes.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2aac95b elementor-widget elementor-widget-image\" data-id=\"2aac95b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/800\/0*3JE-J_wocesAi4qM\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ef185c0 elementor-widget elementor-widget-heading\" data-id=\"ef185c0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h1 class=\"elementor-heading-title elementor-size-default\"><h1 id=\"c5e5\" data-selectable-paragraph=\"\">What should a data scientist know?<\/h1>\n<\/h1>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b9528bd elementor-widget elementor-widget-text-editor\" data-id=\"b9528bd\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"a2f4\" data-selectable-paragraph=\"\">To get into Data Science, you need the skills of a business analyst, a statistician, a programmer, and a Machine Learning developer. Luckily, for your first dive into the world of data, you do not need to be an expert in any of these fields. 
Let\u2019s see what you need and how you can teach yourself the necessary minimum.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-639e486 elementor-widget elementor-widget-heading\" data-id=\"639e486\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h1 class=\"elementor-heading-title elementor-size-default\">\n<h1 id=\"3bf7\" data-selectable-paragraph=\"\">Business Intelligence<\/h1><\/h1>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4eb6392 elementor-widget elementor-widget-text-editor\" data-id=\"4eb6392\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"449b\" data-selectable-paragraph=\"\">When we first look at Data Science and Business Intelligence, we see similarities: both focus on \u201cdata\u201d to provide favorable outcomes, and both offer reliable decision-support systems. The difference is that while BI works with static and structured data, Data Science can handle high-speed and complex, multi-structured data from a wide variety of data sources. 
From a practical perspective, BI helps interpret past data for reporting, or\u00a0<a href=\"http:\/\/www.dataversity.net\/fundamentals-descriptive-analytics\/\" target=\"_blank\" rel=\"noopener noreferrer\">Descriptive Analytics<\/a>, while Data Science analyzes past data to make predictions about the future in\u00a0<a href=\"http:\/\/www.dataversity.net\/fundamentals-predictive-analytics\/\" target=\"_blank\" rel=\"noopener noreferrer\">Predictive Analytics<\/a>\u00a0or to recommend actions in\u00a0<a href=\"http:\/\/www.dataversity.net\/fundamentals-prescriptive-analytics\/\" target=\"_blank\" rel=\"noopener noreferrer\">Prescriptive Analytics<\/a>.<\/p>\n<p id=\"1c67\" data-selectable-paragraph=\"\">Theories aside, to start a simple Data Science project, you do not need to be an expert Business Analyst. What you need is a clear idea of the following steps:<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f2937f5 elementor-widget elementor-widget-text-editor\" data-id=\"f2937f5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul>\n \t<li id=\"474e\" data-selectable-paragraph=\"\">have a question or something you\u2019re curious about;<\/li>\n \t<li id=\"4dfd\" data-selectable-paragraph=\"\">find and collect relevant data that exists for your area of interest and might answer your question;<\/li>\n \t<li id=\"23f1\" data-selectable-paragraph=\"\">analyze your data with selected tools;<\/li>\n \t<li id=\"8173\" data-selectable-paragraph=\"\">look at your analysis and try to interpret the findings.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-73c9dbe elementor-widget elementor-widget-text-editor\" data-id=\"73c9dbe\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div 
class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"949a\" data-selectable-paragraph=\"\">As you can see, at the very beginning of your journey, curiosity and common sense may be sufficient from the BI point of view. In a more complex production environment, there will probably be dedicated Business Analysts to interpret the results. However, it is important to have at least a basic understanding of BI tasks and strategies.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d7937cd elementor-widget elementor-widget-heading\" data-id=\"d7937cd\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2 id=\"b9b8\" data-selectable-paragraph=\"\"><strong>Resources<\/strong><\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b7fbd77 elementor-widget elementor-widget-text-editor\" data-id=\"b7fbd77\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"3d34\" data-selectable-paragraph=\"\">We recommend looking at the following introductory resources to feel more confident in analytics:<\/p>\n<p id=\"a226\" data-selectable-paragraph=\"\"><a href=\"https:\/\/www.datapine.com\/blog\/business-intelligence-concepts-and-bi-basics\/\" target=\"_blank\" rel=\"noopener noreferrer\">Introduction To The Basic Business Intelligence Concepts<\/a>\u00a0\u2014 an insightful article giving an overview of the basic concepts in BI;<\/p>\n<p id=\"9ef6\" data-selectable-paragraph=\"\"><a href=\"https:\/\/www.dummies.com\/store\/product\/Business-Intelligence-For-Dummies.productCd-0470127236.html\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"broken_link\">Business Intelligence for 
Dummies<\/a>\u00a0\u2014 step-by-step guidance through BI technologies;<\/p>\n<p id=\"13b0\" data-selectable-paragraph=\"\"><a href=\"https:\/\/www.udemy.com\/big-data-business-intelligence\/\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"broken_link\">Big Data &amp; Business Intelligence<\/a>\u00a0\u2014 an online course for beginners;<\/p>\n<p id=\"b9b2\" data-selectable-paragraph=\"\">Business Analytics Fundamentals\u00a0\u2014 another introductory course teaching the basic concepts of BI.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-297e884 elementor-widget elementor-widget-heading\" data-id=\"297e884\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h1 class=\"elementor-heading-title elementor-size-default\"><h1 id=\"bf01\" data-selectable-paragraph=\"\">Statistics and probability<\/h1>\n<\/h1>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5663313 elementor-widget elementor-widget-text-editor\" data-id=\"5663313\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"4b01\" data-selectable-paragraph=\"\">Probability and statistics are the basis of Data Science. Statistics is, in simple terms, the use of mathematics to perform technical analysis of data. With the help of statistical methods, we make estimates for further analysis. Statistical methods themselves depend on probability theory, which allows us to make predictions. 
Both statistics and probability are separate and complex fields of mathematics; however, as a beginner data scientist, you can start with five basic statistical concepts:<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4e5dd06 elementor-widget elementor-widget-text-editor\" data-id=\"4e5dd06\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul>\n \t<li id=\"f4e0\" data-selectable-paragraph=\"\"><strong><em>Statistical features<\/em><\/strong>. Things like bias, variance, mean, median, percentiles and many others are the first statistical techniques you would apply when exploring a dataset. They are all fairly easy to understand and to implement in code, even at the novice level.<\/li>\n \t<li id=\"dcff\" data-selectable-paragraph=\"\"><strong><em>Probability Distributions<\/em><\/strong>\u00a0represent the probabilities of all possible values in an experiment. The most common ones in Data Science are the\u00a0<a href=\"https:\/\/courses.lumenlearning.com\/odessa-introstats1-1\/chapter\/the-uniform-distribution\/\" target=\"_blank\" rel=\"noopener noreferrer\">Uniform Distribution<\/a>, which describes events that are equally likely to occur; the Gaussian, or\u00a0<a href=\"https:\/\/courses.lumenlearning.com\/odessa-introstats1-1\/chapter\/introduction-to-the-normal-distribution\/\" target=\"_blank\" rel=\"noopener noreferrer\">Normal Distribution<\/a>, where most observations cluster around the central peak (the mean) and the probabilities for values further away taper off equally in both directions, forming a bell curve; and the Poisson Distribution, a discrete distribution of the number of events that occur in a fixed interval, which is skewed to the right when the average rate is small and looks increasingly like a Gaussian as the rate grows.<\/li>\n \t<li id=\"d97e\" data-selectable-paragraph=\"\"><strong><em>Over and Under Sampling<\/em><\/strong>\u00a0are techniques that help balance datasets. 
If the majority class is overrepresented, undersampling keeps only a subset of its examples so that it is balanced with the minority class. When the minority class has too few examples, oversampling duplicates them until it has as many examples as the majority class.<\/li>\n \t<li id=\"3f49\" data-selectable-paragraph=\"\"><strong><em>Dimensionality Reduction<\/em>.<\/strong>\u00a0The most common technique used for dimensionality reduction is Principal Component Analysis (PCA), which projects the data onto a smaller number of new, uncorrelated features (principal components) that retain as much of the variance of the original features as possible.<\/li>\n \t<li id=\"be10\" data-selectable-paragraph=\"\"><strong><em>Bayesian Statistics.<\/em>\u00a0<\/strong>Finally, Bayesian statistics is an approach that applies probability to statistical problems.\u00a0It provides us with mathematical tools to update our beliefs about random events in light of new data or evidence about those events.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c375d9b elementor-widget elementor-widget-image\" data-id=\"c375d9b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/800\/0*KPKQLWn6bJmPKCnp\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-fc61302 elementor-widget elementor-widget-text-editor\" data-id=\"fc61302\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p style=\"text-align: center;\" data-selectable-paragraph=\"\">Image credit: unsplash.com<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section 
class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-96d6292 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"96d6292\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-8b25b98\" data-id=\"8b25b98\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-ad4cedf elementor-widget elementor-widget-heading\" data-id=\"ad4cedf\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2 id=\"230f\" data-selectable-paragraph=\"\"><strong>Resources<\/strong><\/h2>\n<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-35e886e elementor-widget elementor-widget-text-editor\" data-id=\"35e886e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"54f5\" data-selectable-paragraph=\"\">We have selected just a few practice-oriented books and courses that can give you a taste of statistical concepts from the very beginning:<\/p>\n<p id=\"52b5\" data-selectable-paragraph=\"\">Practical Statistics for Data Scientists: 50 Essential Concepts\u00a0\u2014 a solid practical book that introduces essential tools specifically for data science;<\/p>\n<p id=\"0d79\" data-selectable-paragraph=\"\"><a href=\"https:\/\/www.amazon.com\/Naked-Statistics-Stripping-Dread-Data-ebook\/dp\/B007Q6XLF2\" target=\"_blank\" rel=\"noopener 
noreferrer\">Naked Statistics: Stripping the Dread from the Data<\/a>\u00a0\u2014 an introduction to statistics in simple words;<\/p>\n<p id=\"6646\" data-selectable-paragraph=\"\"><a href=\"https:\/\/www.khanacademy.org\/math\/statistics-probability\" target=\"_blank\" rel=\"noopener noreferrer\">Statistics and probability<\/a>\u00a0\u2014 an introductory online course;<\/p>\n<p id=\"055f\" data-selectable-paragraph=\"\"><a href=\"https:\/\/www.udemy.com\/statistics-for-data-science\/\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"broken_link\">Statistics for data science<\/a>\u00a0\u2014 a special course on statistics developed for data scientists.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7605bfd elementor-widget elementor-widget-heading\" data-id=\"7605bfd\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h1 id=\"01da\" data-selectable-paragraph=\"\">Programming<\/h1>\n<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-329c58e elementor-widget elementor-widget-text-editor\" data-id=\"329c58e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"c4c6\" data-selectable-paragraph=\"\">Data Science is an exciting field to work in, as it combines advanced statistical and quantitative skills with real-world programming ability. Depending on your background, you are free to choose a programming language to your liking. 
The most popular languages in the Data Science community, however, are R, Python and SQL.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ded4493 elementor-widget elementor-widget-text-editor\" data-id=\"ded4493\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul>\n \t<li id=\"8492\" data-selectable-paragraph=\"\"><strong><em>R<\/em><\/strong>\u00a0is a powerful language specifically designed for Data Science needs. It excels at a huge variety of statistical and data visualization applications and, being open source, has an active community of contributors. In fact, 43 percent of data scientists use R to solve statistical problems. However, it has a steep learning curve, especially if you have already mastered another programming language.<\/li>\n \t<li id=\"3d95\" data-selectable-paragraph=\"\"><strong><em>Python<\/em><\/strong>\u00a0is another common language in Data Science.\u00a0<a href=\"http:\/\/www.oreilly.com\/data\/free\/files\/stratasurvey.pdf\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"broken_link\">40 percent of respondents surveyed<\/a>\u00a0by O\u2019Reilly use Python as their main programming language. Because of its versatility, you can use Python for almost every step of data analysis. It allows you to create datasets, and you can find almost any type of dataset you need with a quick Google search. Ideal for beginners and easy to learn, Python also remains exciting for Data Science and Machine Learning experts thanks to more sophisticated libraries such as Google\u2019s TensorFlow.<\/li>\n \t<li id=\"712c\" data-selectable-paragraph=\"\"><strong><em>SQL<\/em><\/strong>\u00a0<strong><em>(structured query language)<\/em><\/strong>\u00a0is more useful as a data processing language than as an advanced analytical tool. 
It can help you carry out operations such as adding, deleting and extracting data from a database, running analytical functions and transforming database structures. Even though NoSQL and Hadoop have become a large component of Data Science, a data scientist is still expected to be able to write and execute complex SQL queries.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1ada53e elementor-widget elementor-widget-heading\" data-id=\"1ada53e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2 id=\"7288\" data-selectable-paragraph=\"\"><strong>Resources<\/strong><\/h2>\n<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a1a5fcc elementor-widget elementor-widget-text-editor\" data-id=\"a1a5fcc\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"aecf\" data-selectable-paragraph=\"\">There are plenty of resources for any programming language and every level of proficiency. 
We\u2019d suggest visiting\u00a0<a href=\"https:\/\/www.datacamp.com\/\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"broken_link\">DataCamp<\/a>\u00a0to explore the basic programming skills needed for Data Science.<\/p>\n<p id=\"4f31\" data-selectable-paragraph=\"\">If you feel more comfortable with books, the\u00a0<a href=\"https:\/\/www.oreilly.com\/programming\/free\/\" target=\"_blank\" rel=\"noopener noreferrer\">vast collection of O\u2019Reilly\u2019s free programming ebooks<\/a>\u00a0will help you choose the language to master.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-eb30f50 elementor-widget elementor-widget-heading\" data-id=\"eb30f50\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h1 class=\"elementor-heading-title elementor-size-default\"><h1 id=\"90d5\" data-selectable-paragraph=\"\">Machine Learning and AI<\/h1>\n<\/h1>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2ea02f1 elementor-widget elementor-widget-text-editor\" data-id=\"2ea02f1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"c48b\" data-selectable-paragraph=\"\">Although AI and Data Science usually go hand-in-hand, a large number of data scientists are\u00a0<a href=\"https:\/\/www.kaggle.com\/kaggle\/kaggle-survey-2017\" target=\"_blank\" rel=\"noopener noreferrer\">not proficient in Machine Learning areas and techniques<\/a>. However, Data Science often involves working with large data sets that call for Machine Learning techniques such as supervised machine learning, decision trees, logistic regression, etc. 
These skills will help you solve various data science problems that depend on predicting major organizational outcomes.<\/p>\n<p id=\"3efd\" data-selectable-paragraph=\"\">At the entry level, Machine Learning does not require much knowledge of math or programming, just interest and motivation. The basic thing you should know about ML is that its algorithms fall into three main categories: supervised learning, unsupervised learning and reinforcement learning.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3007fde elementor-widget elementor-widget-text-editor\" data-id=\"3007fde\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul>\n \t<li id=\"492e\" data-selectable-paragraph=\"\"><strong><em>Supervised Learning<\/em><\/strong>\u00a0is a branch of ML that works on labeled data; in other words, each example you feed to the model comes with a ready answer. Your software learns by making predictions about the output and then comparing them with the actual answers.<\/li>\n \t<li id=\"e6e1\" data-selectable-paragraph=\"\">In\u00a0<strong><em>unsupervised learning<\/em><\/strong>, data is not labeled, and the objective of the model is to find some structure in it. Unsupervised learning can be further divided into clustering and association. It is used to find patterns in data, which is especially useful in business intelligence for analyzing customer behavior.<\/li>\n \t<li id=\"ca51\" data-selectable-paragraph=\"\"><strong><em>Reinforcement learning<\/em><\/strong>\u00a0is the closest to the way humans learn, i.e. by trial and error. Here, a performance function tells the model whether its last action brought it closer to its goal or moved it further away. 
Based on this feedback, the model learns and makes another guess; the cycle repeats, and ideally each new guess is better than the last.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5ded20f elementor-widget elementor-widget-text-editor\" data-id=\"5ded20f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"1694\" data-selectable-paragraph=\"\">With these broad approaches in mind, you have a backbone for analyzing your data and can explore the specific algorithms and techniques that suit you best.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-07e1534 elementor-widget elementor-widget-heading\" data-id=\"07e1534\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2 id=\"a130\" data-selectable-paragraph=\"\"><strong>Resources<\/strong><\/h2>\n<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c597cd3 elementor-widget elementor-widget-text-editor\" data-id=\"c597cd3\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"9e13\" data-selectable-paragraph=\"\">As with programming, there are numerous books and courses on Machine Learning. 
Here are just a few of them:<\/p>\n<p id=\"b6f0\" data-selectable-paragraph=\"\">The Deep Learning textbook by Ian Goodfellow, Yoshua Bengio and Aaron Courville is a classic resource recommended for all students who want to master machine and deep learning.<\/p>\n<p id=\"edd7\" data-selectable-paragraph=\"\">The\u00a0<a href=\"https:\/\/www.coursera.org\/learn\/machine-learning\" target=\"_blank\" rel=\"noopener noreferrer\">Machine Learning<\/a>\u00a0course by Andrew Ng is an absolute classic that leads you through the most popular algorithms in ML.<\/p>\n<p id=\"aeb3\" data-selectable-paragraph=\"\"><a href=\"https:\/\/www.udemy.com\/machinelearning\/\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"broken_link\">Machine Learning A-Z\u2122: Hands-On Python &amp; R In Data Science\u00a0<\/a>\u2014 a Udemy course specifically for novice data scientists that introduces basic ML concepts in both R and Python.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-23d1c9c elementor-widget elementor-widget-heading\" data-id=\"23d1c9c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h1 class=\"elementor-heading-title elementor-size-default\"><h1 id=\"6234\" data-selectable-paragraph=\"\">What skills should a data scientist possess?<\/h1><\/h1>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-fe14de3 elementor-widget elementor-widget-text-editor\" data-id=\"fe14de3\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"028f\" data-selectable-paragraph=\"\">Now you know the main prerequisites for Data Science. Does that make you a good data scientist? 
While there is no single correct answer, there are several things to take into consideration:<\/p>\n<p id=\"e2a8\" data-selectable-paragraph=\"\"><strong>Analytical Mindset<\/strong>: this is a general requirement for anyone working with data. However, while common sense may suffice at the entry level, your analytical thinking should be backed up by a background in statistics and knowledge of data structures and machine learning algorithms.<\/p>\n<p id=\"a713\" data-selectable-paragraph=\"\"><strong>Focus on Problem Solving<\/strong>: when you master a new technology, it is tempting to use it everywhere. While it is important to know recent trends and tools, the goal of Data Science is to solve specific problems by extracting knowledge from data. A good data scientist first understands the problem, then defines the requirements for its solution, and\u00a0<strong>only then<\/strong>\u00a0decides which tools and techniques best fit the task. Don\u2019t forget that stakeholders will never be captivated by the impressive tools you use, only by the effectiveness of your solution.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-db57760 elementor-widget elementor-widget-text-editor\" data-id=\"db57760\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"94de\" data-selectable-paragraph=\"\"><strong>Domain Knowledge<\/strong>: data scientists need to understand the\u00a0<a href=\"https:\/\/www.analyticsindiamag.com\/6-ways-in-which-artificial-intelligence-is-revolutionising-the-real-estate-business\/\" target=\"_blank\" rel=\"noopener noreferrer\">business<\/a>\u00a0problem and choose an appropriate model for it. They should be able to interpret the results of their models and iterate quickly to arrive at the final model. 
They need to have an eye for detail.<\/p>\n<p id=\"a6d6\" data-selectable-paragraph=\"\"><strong>Communication Skills<\/strong>: there\u2019s a lot of communication involved in understanding the problem and in giving stakeholders constant feedback in simple language. But this is only the surface: an even more important part of communication is asking the right questions. Data scientists should also be able to document their approach clearly, so that others can easily build on their work, and, conversely, to understand research published in their area.<\/p>\n<p id=\"8380\" data-selectable-paragraph=\"\">As you can see, it is the combination of various technical and soft skills that makes a good data scientist.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>Data Science is a newly developed blend of&nbsp;machine learning&nbsp;algorithms, statistics, business intelligence, and programming. This blend helps us reveal hidden patterns in raw data, which in turn provide insights into business and&nbsp;manufacturing&nbsp;processes. To get into Data Science, you need the skills of a business analyst, a statistician, a programmer, and a&nbsp;Machine Learning&nbsp;developer. You do not need to be an expert in any of these fields. 
Let&rsquo;s see what you need and how you can teach yourself the necessary minimum.<\/p>\n","protected":false},"author":570,"featured_media":3827,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[187],"tags":[94],"ppma_author":[3261],"class_list":["post-1928","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bigdata-cloud","tag-data-science"],"authors":[{"term_id":3261,"user_id":570,"is_guest":0,"slug":"max-ved","display_name":"Max Ved","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/04\/medium_cbaf23d5-a78a-4ceb-8f6e-343134811364-150x150.jpg","user_url":"https:\/\/sciforce.solutions\/","last_name":"Ved","first_name":"Max","job_title":"","description":"Max Ved, a Scientist Entrepreneur, is Co-Founder &amp; CTO at SciForce, an IT company specialized in the development of software solutions.\u00a0"}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1928","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/570"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=1928"}],"version-history":[{"count":8,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1928\/revisions"}],"predecessor-version":[{"id":36678,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1928\/revisions\/36678"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/3827"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=1928"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/w
ww.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=1928"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=1928"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=1928"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}