{"id":943,"date":"2018-10-25T03:25:09","date_gmt":"2018-10-25T00:25:09","guid":{"rendered":"http:\/\/kusuaks7\/?p=548"},"modified":"2021-05-11T14:00:41","modified_gmt":"2021-05-11T14:00:41","slug":"the-machine-learning-workflow","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/the-machine-learning-workflow\/","title":{"rendered":"The Machine Learning Workflow"},"content":{"rendered":"<h4>What&#39;s different about machine learning projects? How do you reduce risks and build a good solution quickly?<\/h4>\n<p style=\"text-align: center;\"><img decoding=\"async\" alt=\"image\" src=\"https:\/\/images.ctfassets.net\/2yr4wv2jga4w\/6M1CIOPuxOGUGm2cye82i6\/b60ce5aa245acc5420d0b14f36f2f356\/Machine_Learning_Project_Workflow.png\" style=\"width: 700px; height: 305px;\" \/><\/p>\n<h4>Machine Learning Project Workflow<\/h4>\n<p>In standard software development, you simply answer the question:<\/p>\n<blockquote>\n<p><em>What do you want to implement?<\/em><\/p>\n<\/blockquote>\n<p>And then you, well, implement.<\/p>\n<p>But in machine learning projects, you first need to&nbsp;<strong>explore<\/strong>&nbsp;what&rsquo;s possible &ndash; with the data you have. 
So the first question is:<\/p>\n<blockquote>\n<p><em>What&nbsp;<strong>can you<\/strong>&nbsp;implement?<\/em><\/p>\n<\/blockquote>\n<p style=\"text-align: center;\"><img decoding=\"async\" alt=\"Machine Learning vs Normal Software\" src=\"https:\/\/images.ctfassets.net\/2yr4wv2jga4w\/5lJGT7OAVOEmg0qsmEMyU6\/c2b01bb2840815043f80d4181b8699f8\/Machine_Learning_vs_Normal_Software.png\" \/><\/p>\n<p>Here&rsquo;s what we learned works to keep a machine learning project on track from start to finish:<\/p>\n<h2 id=\"1-define-the-task\">1. Define the task<\/h2>\n<p style=\"text-align: center;\"><img decoding=\"async\" alt=\"Step 1 Define the Task\" src=\"https:\/\/images.ctfassets.net\/2yr4wv2jga4w\/2oR9Y015rG0CQU2S6Aom8G\/ec826687bf19b1322b88441f96658f87\/Step_1_Define_the_Task.png\" style=\"width: 693px; height: 255px;\" \/><\/p>\n<p>It&rsquo;s easy to get drawn into AI projects that don&rsquo;t go anywhere. A proper machine learning project definition drastically reduces this risk.<\/p>\n<p>These are the questions you need to answer to define a project:<\/p>\n<h3 id=\"understand-the-current-process\">Understand the current process<\/h3>\n<p>What is your current process? Your machine learning solution will replace a process that already exists. How are decisions currently made in this process? Considering the current process will give you a lot of&nbsp;<em>domain knowledge<\/em>&nbsp;and help you define how your machine learning system has to look.<\/p>\n<h3 id=\"define-what-you-want-to-predict\">Define what you want to predict<\/h3>\n<p>What exact variable do you want to predict? Define the output of your machine learning system &mdash; in as much detail as possible.<\/p>\n<h3 id=\"list-the-useful-data-sources\">List the useful data sources<\/h3>\n<p>What data do you have that&rsquo;s&nbsp;<strong>useful<\/strong>&nbsp;to predict this output correctly? Start by listing the data sources the current process relies on. 
One way to list useful data sources is by asking yourself: &ldquo;<em>If I &mdash; as a human &mdash; needed to make this prediction, what data points would I want to know about?<\/em>&rdquo;<\/p>\n<p>If you understand the current process, know what you want to predict, and have identified all the useful data sources, then you&rsquo;re in a good position to decide whether it makes sense to proceed to the next stage.<\/p>\n<h2 id=\"2-find-an-approach-that-works\">2. Find an approach that works<\/h2>\n<p style=\"text-align: center;\"><img decoding=\"async\" alt=\"Step 2 Show It Works\" src=\"https:\/\/images.ctfassets.net\/2yr4wv2jga4w\/5ig2kLskSk8G4IOACqYOcq\/b463e16a0afa9702b9be1e2007a7551d\/Step_2_Show_It_Works.png\" style=\"width: 700px; height: 261px;\" \/><\/p>\n<p>Even if you have a good problem definition, you can&rsquo;t know yet how accurate your machine learning model will be in the end &ndash; or whether it will be worth replacing the current process.<\/p>\n<p>A proof of concept is the cheapest way to find out what ROI you can expect from your final solution. These are the steps:<\/p>\n<h3 id=\"research\">Research<\/h3>\n<p>Research all the ways other teams have resolved similar tasks &mdash; whether they used machine learning or not. Then&nbsp;<strong>make a plan<\/strong>, using what you&rsquo;ve learned from both your research and the existing process you want to replace.<\/p>\n<h3 id=\"build-a-dataset\">Build a dataset<\/h3>\n<p>The central part of any machine learning project is the sample dataset! This includes&nbsp;<em>realistic&nbsp;<\/em>examples of exactly those cases for which you want your machine learning system to make correct predictions. Think of it as an Excel table, with:<\/p>\n<ul>\n<li>One row per example, and<\/li>\n<li>A number of columns of useful input data, plus<\/li>\n<li>One column containing the output (aka the target).<\/li>\n<\/ul>\n<p>The model then has to learn to&nbsp;<em>predict the output from the input<\/em>. 
For example, predicting a customer&rsquo;s credit rating (output) from their payment history (input).<\/p>\n<p>This dataset is like the&nbsp;<em>requirements document<\/em>&nbsp;in a normal software project &mdash; the point of reference against which you check whether you&rsquo;re on track.<\/p>\n<h3 id=\"experiment\">Experiment<\/h3>\n<p>Start with the most promising approach, evaluate it, and then improve from there. Repeat &ndash; until you&rsquo;ve found an approach that is&nbsp;<strong>good enough<\/strong>.<\/p>\n<h2 id=\"3-build-a-full-scale-solution\">3. Build a full-scale solution<\/h2>\n<p style=\"text-align: center;\"><img decoding=\"async\" alt=\"Step 3 Scale\" src=\"https:\/\/images.ctfassets.net\/2yr4wv2jga4w\/4df4JQlIvCsKyK0CQmYgoU\/638116ade4144f73d16a1c6dca3c75eb\/Step_3_Scale.png\" style=\"width: 700px; height: 260px;\" \/><\/p>\n<blockquote>\n<p>Working software is the primary measure of progress. &#8211;&nbsp;<em><a href=\"http:\/\/agilemanifesto.org\/principles.html\" rel=\"noopener\">Agile Manifesto<\/a><\/em><\/p>\n<\/blockquote>\n<p>A proof-of-concept doesn&rsquo;t make you any money. So here are the steps to take you to a stable, full-scale solution.<\/p>\n<h3 id=\"improve-accuracy\">Improve accuracy<\/h3>\n<p>A proof-of-concept is a 20\/80 implementation. 
Now it&rsquo;s time to make the critical improvements you left out in your first iteration:<\/p>\n<ul>\n<li>Add more data;<\/li>\n<li>Build new features;<\/li>\n<li>Try other algorithms;<\/li>\n<li>Fine-tune the model parameters.<\/li>\n<\/ul>\n<h3 id=\"scale\">Scale<\/h3>\n<p>It&rsquo;s a big step from a proof-of-concept script to a production-ready solution.<\/p>\n<ul>\n<li><strong>Scalability &amp; Stability<\/strong>: Rewrite data processing steps into separate, scalable tasks within a data pipeline.<\/li>\n<li><strong>Tests<\/strong>: Write additional unit and integration tests that also cover possible errors in the data.<\/li>\n<li><strong>Deployment<\/strong>: Build a flexible, repeatable, easy deployment process that can handle the throughput and processing speed you need (including automated provisioning of your infrastructure).<\/li>\n<\/ul>\n<h3 id=\"ab-test\">A\/B Test<\/h3>\n<p>As with other software updates, the final test for your newly automated process is to compare it with the current process. With an A\/B test, you can measure the improvement you&rsquo;ve achieved, as well as the ROI of your project.<\/p>\n<h3 id=\"api\">API<\/h3>\n<p>Your machine learning service needs a way to speak to the rest of your infrastructure. That&rsquo;s done either by continually saving the results to a database or by making the algorithm available through an API.<\/p>\n<h3 id=\"documentation\">Documentation<\/h3>\n<p>Beyond documentation for the code, you should consider writing a user guide that explains how the solution works. 
It&rsquo;s important to clarify the ideas behind the implementation: in data science, it can be hard to understand your reasoning from your code alone.<\/p>\n<h2 id=\"optional-add-ons\">Optional Add-ons<\/h2>\n<ul>\n<li><strong>Versioning.<\/strong>&nbsp;Maybe you need to A\/B test against an older model, or switch to a previous version of your pipeline on short notice &mdash; correct versioning makes this easy.<\/li>\n<li><strong>Automated retraining.<\/strong>&nbsp;Models get outdated &mdash; and eventually, you&rsquo;ll have to retrain yours on new data. In some cases, it makes sense to automate model updating.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>What&#8217;s different about machine learning projects? How do you reduce risks and build a good solution quickly? In standard software development, you simply answer the question: What do you want to implement? And then you, well, implement. But in machine learning projects, you first need to&nbsp;explore&nbsp;what&rsquo;s possible &ndash; with the data you have. So the first question is: What&nbsp;can you&nbsp;implement? 
Here&rsquo;s what we learned works to keep a machine learning project on track from start to finish.<\/p>\n","protected":false},"author":314,"featured_media":3279,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[97],"ppma_author":[2069],"class_list":["post-943","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-artificial-intelligence"],"authors":[{"term_id":2069,"user_id":314,"is_guest":0,"slug":"markus-schmitt","display_name":"Markus Schmitt","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","user_url":"","last_name":"Schmitt","first_name":"Markus","job_title":"","description":"Markus Schmitt is the founder and head of data science at Data Revenue, a Machine Learning Agency based in Berlin, Germany, where he builds custom end-to-end machine learning systems for Medical, Finance and Marketing clients. Before Data Revenue he developed new ventures for the company builder Team Europe and studied Mathematics &amp; Economics at 
Warwick."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/943","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/314"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=943"}],"version-history":[{"count":1,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/943\/revisions"}],"predecessor-version":[{"id":6050,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/943\/revisions\/6050"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/3279"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=943"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=943"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=943"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=943"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}