{"id":9449,"date":"2020-08-26T08:48:18","date_gmt":"2020-08-26T08:48:18","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/?p=9449"},"modified":"2023-11-15T14:41:14","modified_gmt":"2023-11-15T14:41:14","slug":"the-data-science-process","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/bigdata-cloud\/the-data-science-process\/","title":{"rendered":"The Data Science Process"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"9449\" class=\"elementor elementor-9449\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-751de9fc elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"751de9fc\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-700ee44e\" data-id=\"700ee44e\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-15aefbb elementor-widget elementor-widget-heading\" data-id=\"15aefbb\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">A Visual Guide to Standard Procedures in Data Science<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-292f121 elementor-widget elementor-widget-image\" data-id=\"292f121\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" 
src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/08\/1_PzzcJA-cwXQ8hwlpM4DwbA@2x.jpeg\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1a48b03 elementor-widget elementor-widget-text-editor\" data-id=\"1a48b03\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Suppose you\u2019ve been given a data problem to solve and you\u2019re expected to produce unique insights from the data given to you. So the question is: what exactly do you do to take a data problem through to completion and generate data-driven insights? And most importantly of all,\u00a0<em>where do you start?<\/em><\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>Let\u2019s use an analogy here. In the construction of a house or building, the guiding piece of information is the blueprint. So what sorts of information are contained within these blueprints? 
Information pertaining to the building infrastructure, the layout and exact dimensions of each room, the location of water pipes and electrical wires, etc.<\/p>\n<!-- \/wp:paragraph -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b5bc27f elementor-widget elementor-widget-image\" data-id=\"b5bc27f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/2182\/1*EiiCmWzXBVfhkq1NsZYjWw@2x.jpeg\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-72ef5e1 elementor-widget elementor-widget-text-editor\" data-id=\"72ef5e1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Returning to the earlier question: where do we start when given a data problem? That is where the\u00a0<em>Data Science Process<\/em>\u00a0comes in. As will be discussed in the forthcoming sections of this article, the data science process provides a systematic approach for tackling a data problem. By following these recommended guidelines, you can apply a tried-and-true workflow to your data science projects. 
So without further ado, let\u2019s get started!<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:separator --><hr class=\"wp-block-separator\" \/><!-- \/wp:separator -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e141b17 elementor-widget elementor-widget-heading\" data-id=\"e141b17\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h1 class=\"elementor-heading-title elementor-size-default\">Data Science Life Cycle<\/h1>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8a234a0 elementor-widget elementor-widget-text-editor\" data-id=\"8a234a0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>The\u00a0<em>data science life cycle<\/em>\u00a0is essentially comprised of data collection, data cleaning, exploratory data analysis, model building and model deployment. For more information, please check out the excellent video by\u00a0<a href=\"https:\/\/www.youtube.com\/channel\/UCiT9RITQ9PW6BhXK0y2jaeg\" target=\"_blank\" rel=\"noreferrer noopener\">Ken Jee\u00a0<\/a>on the\u00a0<a href=\"https:\/\/www.youtube.com\/watch?v=BZFfNwj7JhE\" target=\"_blank\" rel=\"noreferrer noopener\"><em>Different Data Science Roles Explained (by a Data Scientist)<\/em><\/a>. 
A summary infographic of this life cycle is shown below:<\/p>\n<!-- \/wp:paragraph -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f0cc873 elementor-widget elementor-widget-image\" data-id=\"f0cc873\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/2182\/1*1oVjIRY3Bnmbw-idCtg4BQ@2x.jpeg\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-4ca7864 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"4ca7864\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-7b75a59\" data-id=\"7b75a59\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-d661d48 elementor-widget elementor-widget-text-editor\" data-id=\"d661d48\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<!-- wp:html -->\n<figure><iframe 
src=\"https:\/\/cdn.embedly.com\/widgets\/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FBZFfNwj7JhE%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DBZFfNwj7JhE&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FBZFfNwj7JhE%2Fhqdefault.jpg&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=youtube\" width=\"654\" height=\"430\" allowfullscreen=\"allowfullscreen\"><\/iframe><\/figure>\n<!-- \/wp:html -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a025cc8 elementor-widget elementor-widget-text-editor\" data-id=\"a025cc8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Such a process or workflow of drawing insights from data is best described by CRISP-DM and OSEMN. Both are built on essentially the same core concepts, but the two frameworks were released at different times. CRISP-DM was released in 1996, at a time when data mining had started to gain traction but lacked a standard protocol for carrying out data mining tasks in a robust manner. Fourteen years later, in 2010, the OSEMN framework was introduced to summarize the key tasks of a data scientist.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>Personally, I started my own journey into the world of data in 2004, when the field was known as\u00a0<em>Data Mining<\/em>. 
Much of the emphasis at the time was placed on translating data into knowledge; indeed, another common term used to refer to data mining is\u00a0<em>Knowledge Discovery in Data<\/em>.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>Over the years, the field has matured and evolved, leading to the eventual coining of the term\u00a0<em>Data Science<\/em>, which goes beyond merely building models to encompass other skillsets, both technical and soft. I previously drew an infographic that summarizes these 8 essential skillsets of data science, as shown below. Also check out the accompanying YouTube video on\u00a0<a href=\"https:\/\/www.youtube.com\/watch?v=jhImgx8I8oI\" target=\"_blank\" rel=\"noreferrer noopener\"><em>How to Become a Data Scientist (Learning Path and Skill Sets Needed)<\/em><\/a>.<\/p>\n<!-- \/wp:paragraph -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-05ba702 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"05ba702\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-a20d269\" data-id=\"a20d269\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-2ca8c9a elementor-widget elementor-widget-image\" data-id=\"2ca8c9a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" 
src=\"https:\/\/miro.medium.com\/max\/2183\/1*o7DTEHgBknK2iBE9c-2Ing.jpeg\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-36790bf elementor-widget elementor-widget-text-editor\" data-id=\"36790bf\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<!-- wp:html -->\n<figure><iframe src=\"https:\/\/cdn.embedly.com\/widgets\/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FjhImgx8I8oI%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DjhImgx8I8oI&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FjhImgx8I8oI%2Fhqdefault.jpg&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=youtube\" width=\"654\" height=\"430\" allowfullscreen=\"allowfullscreen\"><\/iframe><\/figure>\n<!-- \/wp:html -->\n\n<!-- wp:separator --><hr class=\"wp-block-separator\" \/><!-- \/wp:separator -->\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9da9399 elementor-widget elementor-widget-heading\" data-id=\"9da9399\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h1 class=\"elementor-heading-title elementor-size-default\">CRISP-DM<\/h1>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4dead62 elementor-widget elementor-widget-text-editor\" data-id=\"4dead62\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>The acronym CRISP-DM stands for Cross Industry Standard Process for Data Mining and CRISP-DM was introduced in 1996 in efforts to standardize the process of data mining (also referred to as knowledge discovery in data) such 
that it can serve as a standard and reliable workflow that can be adopted and applied across industries. Such a standard process would serve as a\u00a0<em>\u201cbest practice\u201d<\/em>\u00a0that offers several benefits.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>Aside from providing a reliable and consistent process to follow in carrying out data mining projects, it would also instill confidence in customers and stakeholders who are looking to adopt data mining in their organizations.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>It should be noted that back in 1996, data mining had just started to gain mainstream attention and was still in its early phases, so the formulation of a standard process would help lay a solid foundation for early adopters. A more in-depth historical look at CRISP-DM is provided in the article by\u00a0<a href=\"http:\/\/citeseerx.ist.psu.edu\/viewdoc\/summary?doi=10.1.1.198.5133\" target=\"_blank\" rel=\"noreferrer noopener\">Wirth and Hipp (2000)<\/a>.<\/p>\n<!-- \/wp:paragraph -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-108bfb3 elementor-widget elementor-widget-image\" data-id=\"108bfb3\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/2182\/1*zKTXJKxpEUbN11kMfg3LLQ@2x.jpeg\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0f79397 elementor-widget elementor-widget-text-editor\" data-id=\"0f79397\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>The\u00a0<strong>CRISP-DM framework<\/strong>\u00a0is comprised of 6 major steps:<\/p>\n<!-- 
\/wp:paragraph -->\n\n<!-- wp:list {\"ordered\":true} -->\n<ol>\n<li><strong><em>Business understanding<\/em><\/strong>\u00a0\u2014 This entails understanding a project\u2019s objectives and requirements from the business viewpoint. These business perspectives are used to figure out which business problems to solve via data mining.<\/li>\n<li><strong><em>Data understanding<\/em><\/strong>\u00a0\u2014 This phase allows us to become familiar with the data and involves performing exploratory data analysis. Such initial data exploration may allow us to figure out which subsets of data to use for further modeling, as well as aid in the generation of hypotheses to explore.<\/li>\n<li><strong><em>Data preparation<\/em><\/strong>\u00a0\u2014 This can be considered the most time-consuming phase of the data mining process, as it involves rigorous data cleaning and pre-processing as well as the handling of missing data.<\/li>\n<li><strong><em>Modeling<\/em><\/strong>\u00a0\u2014 The pre-processed data are used for model building, in which learning algorithms are used to perform multivariate analysis.<\/li>\n<li><strong><em>Evaluation<\/em><\/strong>\u00a0\u2014 Having performed the 4 aforementioned steps, it is important to evaluate the accrued results and review the process performed thus far to determine whether the originally set business objectives have been met. If deemed appropriate, some steps may need to be performed again. Rinse and repeat. Once the results and process are deemed satisfactory, we are ready to move to deployment. 
Additionally, in this evaluation phase, some findings may spark new project ideas to explore.<\/li>\n<li><strong><em>Deployment<\/em><\/strong>\u00a0\u2014 Once the model is of satisfactory quality, it is deployed in a form that may range from a simple report to an API that can be accessed via programmatic calls, a web application, etc.<\/li>\n<\/ol>\n<!-- \/wp:list -->\n\n<!-- wp:separator --><hr class=\"wp-block-separator\" \/><!-- \/wp:separator -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2c11aa0 elementor-widget elementor-widget-heading\" data-id=\"2c11aa0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h1 class=\"elementor-heading-title elementor-size-default\"><!-- wp:heading {\"level\":1} -->\n<h1 id=\"0db8\">OSEMN<\/h1>\n<!-- \/wp:heading --><\/h1>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c3782fa elementor-widget elementor-widget-text-editor\" data-id=\"c3782fa\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>In a 2010 post,\u00a0<a href=\"http:\/\/www.dataists.com\/2010\/09\/a-taxonomy-of-data-science\/\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\"><strong><em>\u201cA Taxonomy of Data Science\u201d<\/em><\/strong><\/a>\u00a0on the dataists blog, Hilary Mason and Chris Wiggins introduced the OSEMN framework, which essentially constitutes a taxonomy of the general workflow that data scientists typically perform, as shown in the diagram below. 
Shortly after, in 2012, Davenport and Patil published their landmark article\u00a0<a href=\"https:\/\/hbr.org\/2012\/10\/data-scientist-the-sexiest-job-of-the-21st-century\" target=\"_blank\" rel=\"noreferrer noopener\"><em>\u201cData Scientist: The Sexiest Job of the 21st Century\u201d<\/em><\/a>\u00a0in the Harvard Business Review, which attracted even more attention to the burgeoning field of data science.<\/p>\n<!-- \/wp:paragraph -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-47719b5 elementor-widget elementor-widget-image\" data-id=\"47719b5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1705\/1*Q7VCgPKQAI7XVKEvU4fxXA@2x.jpeg\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6c71045 elementor-widget elementor-widget-text-editor\" data-id=\"6c71045\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<!-- wp:paragraph -->\n<p>The\u00a0<strong>OSEMN framework<\/strong>\u00a0is comprised of 5 major steps and can be summarized as follows:<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:list {\"ordered\":true} -->\n<ol>\n<li><strong><em>Obtain Data<\/em><\/strong>\u00a0\u2014 Data is the prerequisite of the data science process. It can come from pre-existing datasets, from newly acquired data (from surveys), from newly queried data (from databases or APIs), downloaded from the internet (e.g. 
from repositories available on the cloud such as GitHub) or extracted<\/li>\n<li><strong><em>Scrub Data<\/em><\/strong>\u00a0\u2014 Scrubbing the data is essentially data cleaning. This phase is considered the most time-consuming, as it involves handling missing data as well as pre-processing the data to be as error-free and uniform as possible.<\/li>\n<li><strong><em>Explore Data<\/em><\/strong>\u00a0\u2014 This is essentially exploratory data analysis. This phase allows us to gain an understanding of the data so that we can figure out the course of action and the areas to explore in the modeling phase. This entails the use of descriptive statistics and data visualizations.<\/li>\n<li><strong><em>Model Data<\/em>\u00a0<\/strong>\u2014 Here, we make use of machine learning algorithms in an effort to make sense of the data and gain useful insights that are essential for data-driven decision-making.<\/li>\n<li><strong><em>Interpret Results<\/em><\/strong>\u00a0\u2014 This is perhaps one of the most important phases and yet the least technical, as it pertains to actually making sense of the data by figuring out how to simplify and summarize the results from all the models built. This entails drawing meaningful conclusions and rationalizing actionable insights that allow us to figure out what the next course of action is. 
For example, what are the most important features that influence the class labels (<strong>Y<\/strong>\u00a0variables)?<\/li>\n<\/ol>\n<!-- \/wp:list -->\n\n<!-- wp:separator --><hr class=\"wp-block-separator\" \/><!-- \/wp:separator -->\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-33f0b11 elementor-widget elementor-widget-heading\" data-id=\"33f0b11\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h1 class=\"elementor-heading-title elementor-size-default\"><!-- wp:heading {\"level\":1} -->\n<h1 id=\"6066\">Conclusion<\/h1>\n<!-- \/wp:heading --><\/h1>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-fe229d4 elementor-widget elementor-widget-text-editor\" data-id=\"fe229d4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<!-- wp:paragraph -->\n<p>In summary, we have covered the data science process by walking through the highly simplified data science life cycle along with the widely popular CRISP-DM and OSEMN frameworks. These frameworks provide high-level guidance on handling a data science project from end to end, and all of them encompass the same core concepts of data compilation, pre-processing, exploration, modeling, evaluation, interpretation and deployment. It should be noted that the flow among these steps is not linear: in practice, steps are revisited and repeated until a satisfactory result is reached.<\/p>\n<!-- \/wp:paragraph -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>What do you exactly do to transform a data problem through to completion and generate data-driven insights?  
And most importantly of all, where do you start? <\/p>\n","protected":false},"author":886,"featured_media":9454,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[187],"tags":[94,580,579],"ppma_author":[3736],"class_list":["post-9449","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bigdata-cloud","tag-data-science","tag-data-driven-insights","tag-transform-data"],"authors":[{"term_id":3736,"user_id":886,"is_guest":0,"slug":"chanin-nantasenamat","display_name":"Chanin Nantasenamat","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/08\/Chanin-Nantasenamat-150x150.jpg","user_url":"http:\/\/www.mahidol.ac.th\/mueng\/","last_name":"Nantasenamat","first_name":"Chanin","job_title":"","description":"Chanin Nantasenamat is Associate Professor and Head, Center of Data Mining and Biomedical Informatics at Mahidol University, Thailand. He is also Founder of Data Professor YouTube Channel and Associate Editor at Frontiers in Pharmacology. 
Thought Leader on AI and ML Education, he was a Visiting Professor at Uppsala University, Lund University, University of California at Los Angeles as well as the California State University at Fullerton."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/9449","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/886"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=9449"}],"version-history":[{"count":7,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/9449\/revisions"}],"predecessor-version":[{"id":34101,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/9449\/revisions\/34101"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/9454"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=9449"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=9449"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=9449"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=9449"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}