{"id":22450,"date":"2020-11-20T07:14:52","date_gmt":"2020-11-20T07:14:52","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/data-science-your-machine-learning-model-likely-fail\/"},"modified":"2021-05-21T03:33:35","modified_gmt":"2021-05-21T03:33:35","slug":"data-science-your-machine-learning-model-likely-fail","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/data-science-your-machine-learning-model-likely-fail\/","title":{"rendered":"Is Your Machine Learning Model Likely to Fail?"},"content":{"rendered":"\n<p class=\"has-medium-font-size wp-block-paragraph\"><strong>5 missteps to avoid in your planning process<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"9b5c\"><strong>TL;DR&nbsp;<\/strong>\u2014 Amidst intentions of generating brilliant statistical analyses and breakthroughs in machine learning, don\u2019t get tripped up by these&nbsp;<strong>five common mistakes<\/strong>&nbsp;in the Data Science planning process.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"4d4d\"><a href=\"https:\/\/towardsdatascience.com\/6-months-data-science-e875e69aab0a\" target=\"_blank\" rel=\"noreferrer noopener\">As a Federal consultant<\/a>, I work with U.S. government agencies that conduct scientific research, support veterans, offer medical services, and maintain healthcare supply chains. Data Science can be a very important tool to help these teams advance their mission-driven work. 
I\u2019m deeply invested in making sure we\u00a0<strong>don\u2019t waste time and energy<\/strong>\u00a0on Data Science models that:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Don\u2019t go into production<\/li><li>Don\u2019t deliver actionable insights<\/li><li>Don\u2019t make someone\u2019s life easier<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"49ed\">Based on my experience, I\u2019m sharing hard-won lessons about&nbsp;<strong>five missteps&nbsp;<\/strong>in the Data Science planning process \u2014 shortfalls that you can avoid if you follow these recommendations.<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li><a href=\"https:\/\/towardsdatascience.com\/data-science-planning-c0649c52f867#0abd\" target=\"_blank\" rel=\"noreferrer noopener\">Focus on Agility and Diversity<\/a><\/li><li><a href=\"https:\/\/towardsdatascience.com\/data-science-planning-c0649c52f867#bcfd\" target=\"_blank\" rel=\"noreferrer noopener\">Design for End Users\u2019 Needs<\/a><\/li><li><a href=\"https:\/\/towardsdatascience.com\/data-science-planning-c0649c52f867#dfc4\" target=\"_blank\" rel=\"noreferrer noopener\">Plan for Productionization<\/a><\/li><li><a href=\"https:\/\/towardsdatascience.com\/data-science-planning-c0649c52f867#4575\" target=\"_blank\" rel=\"noreferrer noopener\">Understand Data Debt<\/a><\/li><li><a href=\"https:\/\/towardsdatascience.com\/data-science-planning-c0649c52f867#fb79\" target=\"_blank\" rel=\"noreferrer noopener\">Pursue Options Beyond Machine Learning<\/a><\/li><\/ol>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"97eb\">Motivation<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"6154\">Just like the visible light spectrum, the work we do as Data Scientists constitutes a small portion of a broader range. 
A Data Scientist\u2019s&nbsp;<strong>blindness to the data lifecycle can cause their machine learning project to fail<\/strong>.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1rgXAeQtCmethjchde3sFnQ.png\" alt=\"Data Science Planning: Is Your Machine Learning Model Likely to Fail?\"\/><figcaption>Data Science is only a fraction of the work that needs to be done to support Data Science activities in an organization.<\/figcaption><\/figure><\/div>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"333c\">The data lifecycle spans the full journey from\u00a0<em>planning<\/em>\u00a0to\u00a0<em>archiving<\/em>. In large organizations, there may be a separate team that\u2019s responsible for the\u00a0<a href=\"https:\/\/medium.com\/atlas-research\/data-science-team-eae84b1af65d#155d\" target=\"_blank\" rel=\"noreferrer noopener\">DataOps<\/a>\u00a0work required to wrangle data into a workable shape for advanced analytics. There may be another team,\u00a0<a href=\"https:\/\/medium.com\/atlas-research\/data-science-team-eae84b1af65d#a352\" target=\"_blank\" rel=\"noreferrer noopener\">DevSecOps<\/a>, that conducts the work of putting models into production. In small organizations, the Data Scientists may be responsible for managing the end-to-end data pipeline.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"09d6\">Wherever your organization falls on this spectrum, it\u2019s beneficial for Data Scientists to possess a clear-eyed view of the data lifecycle during the project planning process. In this article, I share five recommendations to support advanced analytics, machine learning, and model deployment across all the stages of project planning.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"cc68\">To start off, it\u2019s helpful if data owners, consumers, and stakeholders share a basic level of\u00a0<strong>data literacy<\/strong>. 
The\u00a0<a href=\"https:\/\/amzn.to\/32oK8hH\" target=\"_blank\" rel=\"noreferrer noopener\"><em>Data Management Body of Knowledge<\/em><\/a>\u00a0is good reference material.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"1855\">This 600+ pager continues to be exceedingly useful in my Data Science consulting work, and also useful in case anyone tries to fight me about the importance of data management.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"0abd\">#1 \u2014 Focus on Agility and Diversity<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"cd37\">Agile methodology refers to a\u00a0<strong>set of values and practices\u00a0<\/strong>that enhance flexibility and accountability in software development. The four values of\u00a0<a href=\"https:\/\/agilemanifesto.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">The Agile Manifesto<\/a>:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li><strong>Individuals and Interactions&nbsp;<\/strong><em>over processes and tools<\/em><\/li><li><strong>Working Software<\/strong><em>&nbsp;over comprehensive documentation<\/em><\/li><li><strong>Customer Collaboration<\/strong><em>&nbsp;over contract negotiation<\/em><\/li><li><strong>Responding to Change<\/strong><em>&nbsp;over following a plan<\/em><\/li><\/ol>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"4203\">What does a project planning discipline invented by 17 white men have to do with advancing diversity? 
Agile encourages&nbsp;<strong>frequent interaction&nbsp;<\/strong>with stakeholders that represent a&nbsp;<strong>diversity of functional roles<\/strong>&nbsp;across the organization.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"f128\">Just as diversity from a business perspective leads to the production of better software \u2014 diversity in gender, race, ethnicity, and other personal characteristics can&nbsp;<strong>enhance creative problem solving<\/strong>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"6c19\">Agile methodology values\u00a0<strong>responsiveness to change<\/strong>. It requires\u00a0<strong>fast iteration cycles<\/strong>\u00a0that rely on\u00a0<strong>trust and openness<\/strong>\u00a0within a team. Meeting structures such as\u00a0<strong>reviews and retrospectives<\/strong>\u00a0can help teams to pause and reflect on whether they are promoting effective ways of working for all team members. In summary, Agile teams should leverage <a href=\"https:\/\/www.experfy.com\/blog\/futureofwork\/best-diversity-hiring-practices-and-its-challenges\/\" target=\"_blank\" rel=\"noreferrer noopener\">these practices <\/a>and values with the aim of fully embracing the benefits of diversity.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"ffb4\"><a href=\"https:\/\/www.youtube.com\/watch?v=h9ejBuHH99I&amp;feature=youtu.be&amp;t=225\" target=\"_blank\" rel=\"noreferrer noopener\">In the words<\/a>\u00a0of\u00a0<a href=\"https:\/\/www.sianlewis.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">Sian Lewis<\/a>, a Lead Data Scientist at Booz Allen Hamilton and chair of the Professional Development Committee of the organization\u2019s African American Network:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>Countless studies have shown the more diverse your team, the more successful they\u2019re going to be. I equate diversity with success. 
And I\u2019m a human being \u2014 I enjoy success.<\/p><\/blockquote>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"49ac\">Read more about the&nbsp;<strong>impact of diversity on business outcomes<\/strong>: <a href=\"https:\/\/www.mckinsey.com\/featured-insights\/diversity-and-inclusion\/diversity-wins-how-inclusion-matters\" target=\"_blank\" rel=\"noreferrer noopener\">Diversity wins: How inclusion matters<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"5f2e\">How to&nbsp;<strong>focus on agility and diversity<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Design&nbsp;<strong>short iterations<\/strong>&nbsp;in your workflow and&nbsp;<strong>get stakeholder feedback<\/strong>&nbsp;with each iteration<\/li><li><strong>Actively seek diversity<\/strong>&nbsp;to promote different ways of thinking<\/li><li>Create an environment of&nbsp;<strong>openness and trust<\/strong>&nbsp;via frequent conversations with your team about ways of working<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"bcfd\">#2 \u2014 Design for End Users\u2019 Needs<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"0023\">The culture of Data Science encourages\u00a0<a href=\"https:\/\/towardsdatascience.com\/build-full-stack-ml-12-hours-50c310fedd51\" target=\"_blank\" rel=\"noreferrer noopener\">toy projects<\/a>\u00a0and\u00a0<a href=\"https:\/\/towardsdatascience.com\/10-data-science-competitions-for-you-to-hone-your-skills-for-2020-32d87ee19cc9\" target=\"_blank\" rel=\"noreferrer noopener\">Kaggle competitions<\/a>\u00a0to explore new ideas, sharpen skillsets, and hone dance battle skills. While these capabilities are important, Data Scientists also need to practice the crucial skill of project scoping. 
Struggle #2 refers to thinking that a Data Science initiative can succeed without a thorough investigation of end users\u2019 needs at the outset.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"611a\">The Data Scientist should take the approach of\u00a0<a href=\"https:\/\/medium.com\/atlas-research\/model-selection-d190fb8bbdda#9d9b\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>Human Centered Design<\/strong><\/a>\u00a0to learn directly from the end users via interviews, then iteratively test and design a solution that takes their feedback into account. While the HCD work could be done by a separate team in a large organization, it\u2019s best if the Data Scientist works directly with the users if at all possible.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0AwXr7AIKTy16fFN.png\" alt=\"Data Science Planning: Is Your Machine Learning Model Likely to Fail?\"\/><figcaption><em>HCD is also the key to unlocking vast new markets for technological problem solving tools. 
<\/em>via\u00a0<a href=\"https:\/\/99percentinvisible.org\/episode\/the-next-billion-users\/\" target=\"_blank\" rel=\"noreferrer noopener\">99% Invisible<\/a>.<\/figcaption><\/figure><\/div>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"5d0b\">In partnership with Google\u2019s\u00a0<a href=\"https:\/\/nextbillionusers.google\/\" target=\"_blank\" rel=\"noreferrer noopener\">Next Billion Users initiative<\/a>, the\u00a0<a href=\"https:\/\/99percentinvisible.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">99% Invisible<\/a>\u00a0team explored potential pitfalls of software development for new audiences around the world:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><em>Tech gone wrong \u2014&nbsp;<\/em>attempting to digitize the transaction ledger of small shop owners without first consulting on whether this was a problem the shop owners needed solved<\/li><li><em>Tech done right \u2014&nbsp;<\/em>offering web-based education in concert with entertainment via movie vans<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"9647\">HCD principles can help produce\u00a0<strong>joyous technology experiences<\/strong>, ensure a\u00a0<strong>seamless handoff<\/strong>\u00a0between machine learning application and human users, and\u00a0<a href=\"https:\/\/medium.com\/atlas-research\/model-selection-d190fb8bbdda#1ac9\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>mitigate negative feedback loops<\/strong><\/a>\u00a0to the greatest extent possible.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"99e7\">Read more about the\u00a0<a href=\"https:\/\/amzn.to\/3eQvPYe\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>impact of HCD on global access to technology<\/strong><\/a><strong>.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"dd8e\">How to&nbsp;<strong>design for the end users\u2019 needs<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Create an\u00a0<strong>interview schedule<\/strong>\u00a0early in the project<\/li><li>Make sure the data 
product\u00a0<strong>solves a problem<\/strong>\u00a0for the end user<\/li><li><strong>Seek feedback<\/strong>\u00a0on each iteration cycle<\/li><\/ul>\n\n\n\n<figure class=\"wp-block-embed aligncenter is-type-photo is-provider-giphy\"><div class=\"wp-block-embed__wrapper\">\nhttps:\/\/giphy.com\/gifs\/fallontonight-jimmy-fallon-tonight-show-jojo-siwa-1zJEcsB4MQ8DOAXCg6\n<\/div><figcaption>I can only imagine this is how Kaggle decides tiebreakers. via\u00a0<a href=\"https:\/\/giphy.com\/gifs\/fallontonight-jimmy-fallon-tonight-show-jojo-siwa-1zJEcsB4MQ8DOAXCg6\/links\" target=\"_blank\" rel=\"noreferrer noopener\">giphy<\/a><\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"dfc4\">#3 \u2014 Plan for Productionization<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"8a2c\"><a href=\"https:\/\/venturebeat.com\/2019\/07\/19\/why-do-87-of-data-science-projects-never-make-it-into-production\/\" target=\"_blank\" rel=\"noreferrer noopener\">87% of Data Science products\u00a0<strong>never make it into production<\/strong><\/a>. 
The problems include:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Unrealistic stakeholder expectations<\/li><li>Failure to plan using agile methodology and HCD<\/li><li>Lack of DevOps deployment skills like pruning and containerization<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"430a\">Add in a dash of Data Science tinkering (<em>\u201cI think I ran my Jupyter Notebook cells out of order\u201d<\/em>) \u2014 and it\u2019s no surprise 9 out of 10 Data Science projects never see the light of day. <a target=\"_blank\" rel=\"noreferrer noopener\" href=\"https:\/\/towardsdatascience.com\/the-last-defense-against-another-ai-winter-c589b48c561\">Another AI Winter? How to deploy more ML solutions<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"a5b5\">Given that the only data product I\u2019ve deployed\u00a0<a href=\"https:\/\/towardsdatascience.com\/6-months-data-science-e875e69aab0a\" target=\"_blank\" rel=\"noreferrer noopener\">thus far<\/a>\u00a0is this\u00a0<a href=\"https:\/\/towardsdatascience.com\/walkthrough-mapping-gis-data-in-python-92c77cd2b87a\" target=\"_blank\" rel=\"noreferrer noopener\">clustering-based Neighborhood Explorer dashboard<\/a>, I\u2019ll leave it to the more experienced to walk you through the deployment process.\u00a0<a href=\"https:\/\/rebeccabilbro.github.io\/kubernetes-ml-unboxing\/\" target=\"_blank\" rel=\"noreferrer noopener\">Rebecca Bilbro<\/a>, Machine Learning Consultant at Unisys and co-creator of the\u00a0<a href=\"https:\/\/www.scikit-yb.org\/en\/latest\/index.html\" target=\"_blank\" rel=\"noreferrer noopener\">Yellowbrick<\/a>\u00a0package, writes that:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>Data Scientists should understand how to deploy and scale their own models\u2026Overspecialization is generally a mistake.<\/p><\/blockquote>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"76b9\">She recommends\u00a0<a 
href=\"https:\/\/www.oreilly.com\/content\/kubernetes-a-simple-overview\/\" target=\"_blank\" rel=\"noreferrer noopener\">courses<\/a>\u00a0and\u00a0<a href=\"https:\/\/twimlai.com\/kubernetes-ebook\/\" target=\"_blank\" rel=\"noreferrer noopener\">reading material<\/a>\u00a0on\u00a0<strong>Kubernetes<\/strong>\u00a0for Data Science. This container management tool represents the dominant force in cloud deployment.\u00a0<strong>Containerization<\/strong>\u00a0allows for app components to run in a variety of environments \u2014 e.g. other computers, servers, the cloud \u2014 and offers a lightweight alternative to virtual machines given that containers share resources with the host operating system rather than requiring a guest operating system.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"c36d\">In development, containers help overcome the challenges posed by multiple environments used by Data Scientists, Software Engineers, and DevOps practitioners. In production, containers deliver&nbsp;<strong>microservices<\/strong>&nbsp;to customers in response to network requests.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"5f28\">Of companies\u00a0<a href=\"https:\/\/www.cncf.io\/blog\/2020\/03\/04\/2019-cncf-survey-results-are-here-deployments-are-growing-in-size-and-speed-as-cloud-native-adoption-becomes-mainstream\/\" target=\"_blank\" rel=\"noreferrer noopener\">surveyed<\/a>\u00a0by the Cloud Native Computing Foundation, 84% used containers in production in 2019 and 78% employ Kubernetes to manage their deployments. Kubernetes provides provisioning, networking, load-balancing, security, and scaling \u2014 all the elements necessary to make container deployment simple, consistent, and scalable.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"ac66\">As containerization becomes increasingly mainstream, the approach opens new frontiers for data science model deployment. 
Scaling resources allows a model to better&nbsp;<strong>serve<\/strong>&nbsp;<strong>spikes in demand<\/strong>. Moreover, Kubernetes enables&nbsp;<strong>deployment on the edge<\/strong>&nbsp;\u2014 reducing latency, conserving bandwidth, improving privacy, and generally empowering smarter machine learning. Plus, it can manage deployment strategies such as&nbsp;<strong>A\/B testing, blue-green deployments, and canary releases<\/strong>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"b4ba\">Read more from\u00a0<a rel=\"noreferrer noopener\" href=\"https:\/\/medium.com\/u\/ae1782e046c3?source=post_page-----c0649c52f867--------------------------------\" target=\"_blank\">Caleb Kaiser<\/a>\u00a0on how to\u00a0<strong>optimize your models for production<\/strong>: <a rel=\"noreferrer noopener\" target=\"_blank\" href=\"https:\/\/towardsdatascience.com\/how-to-reduce-the-cost-of-machine-learning-inference-4b466be90ba4\">How to reduce the cost of machine learning inference: A complete checklist for optimizing inference costs<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"a482\">How to&nbsp;<strong>plan for productionization<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Familiarize yourself with\u00a0<strong>deployment tools<\/strong>\u00a0such as\u00a0<a href=\"https:\/\/www.heroku.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Heroku<\/a>,\u00a0<a href=\"https:\/\/www.streamlit.io\/\" target=\"_blank\" rel=\"noreferrer noopener\">Streamlit<\/a>,\u00a0<a href=\"https:\/\/docs.docker.com\/get-started\/\" target=\"_blank\" rel=\"noreferrer noopener\">Docker<\/a>, and\u00a0<a href=\"https:\/\/kubernetes.io\/docs\/home\/\" target=\"_blank\" rel=\"noreferrer noopener\">Kubernetes<\/a><\/li><li><strong>Designate resources\u00a0<\/strong>for deployment during the planning phase, engaging\u00a0<a href=\"https:\/\/medium.com\/atlas-research\/data-science-team-eae84b1af65d#a352\" target=\"_blank\" rel=\"noreferrer noopener\">DevOps 
expertise<\/a>\u00a0early<\/li><li>Include specifications for\u00a0<strong>monitoring and model redeployment<\/strong>\u00a0in your deployment strategy<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"4575\">#4 \u2014 Understand Data Debt<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"c28e\">If you discover data quality issues while\u00a0<a href=\"https:\/\/towardsdatascience.com\/10-underrated-python-skills-dfdff5741fdf\" target=\"_blank\" rel=\"noreferrer noopener\">conducting EDA<\/a>\u00a0and then silently proceed with your Data Science project, you are perpetuating your organization\u2019s data debt. Per data strategist,\u00a0<a href=\"https:\/\/johnladley.com\/a-bit-more-on-data-debt\/\" target=\"_blank\" rel=\"noreferrer noopener\">John Ladley<\/a>:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>Enormous costs are incurred the longer you delay even the simplest and most basic levels of data management.<\/p><\/blockquote>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"c272\">The term&nbsp;<strong>data debt<\/strong>&nbsp;provides quantification and rationale for weighing the costs associated with poor quality data. It\u2019s based on the concept of tech debt, which refers to the impact of choosing the quick-and-dirty solution over a more thoughtful long-term fix.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"71ea\">Formal&nbsp;<strong>Data Governance documentation<\/strong>&nbsp;should address the problem of data debt by putting forward a plan to report data issues. This way, they can be rectified upstream. 
To quote John Ladley again:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>Like all debts, data quality issues must be paid for eventually \u2014 either slowly over time (and with interest) or in a big chunk that pays off the debt.<\/p><\/blockquote>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"684a\">Far better to resolve issues at their source than to apply a temporary fix at the analytics stage. An effective&nbsp;<strong>Data Quality Reporting Process<\/strong>&nbsp;could take the form of an intranet portal where issues can be sent to a specified Data Quality team (or at the very least, to the data source owner).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"c7fd\">Insufficient action to remedy data debt represents Struggle #4 because Data Science practitioners typically don\u2019t receive sufficient training in effective data management. In contrast to more established fields like Software Development and Data Engineering, there are few certification programs that are considered standard for data scientists. 
I\u2019d really like to see more emphasis on\u00a0<a href=\"https:\/\/towardsdatascience.com\/best-data-science-certification-4f221ac3dbe3\" target=\"_blank\" rel=\"noreferrer noopener\">formal training in\u00a0<strong>data strategy<\/strong><\/a>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"4da8\">Read more about how the\u00a0<strong>best practices of software engineering<\/strong>\u00a0can be applied to machine learning: <a rel=\"noreferrer noopener\" target=\"_blank\" href=\"https:\/\/towardsdatascience.com\/must-read-data-science-papers-487cce9a2020\">5 Must-Read Data Science Papers (and How to Use Them): Foundational ideas to keep you on top of the machine learning game.<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"460d\">How to&nbsp;<strong>avoid data debt<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Connect the benefits of high-quality data and the risks of low-quality data to your&nbsp;<strong>organization\u2019s strategic objectives<\/strong><\/li><li>Empower end users to&nbsp;<strong>report data quality issues<\/strong>&nbsp;to a designated team or to data source owners via a formal Data Quality Reporting Process<\/li><li>Conduct a&nbsp;<strong>Data Quality Assessment<\/strong>&nbsp;that includes evaluating the quality of metadata<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"fb79\">#5 \u2014 Pursue Options Beyond Machine Learning<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"6e05\">We\u2019ve discussed three key processes (<strong>Agile Methodology<\/strong>,&nbsp;<strong>Human Centered Design<\/strong>, and&nbsp;<strong>Data Quality Reporting<\/strong>) and two key teams (<strong>DataOps<\/strong>&nbsp;and&nbsp;<strong>DevSecOps<\/strong>) to support Data Science.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"c2d0\">At the risk of beating you over the head with my 3-pound copy of the\u00a0<a href=\"https:\/\/amzn.to\/32oK8hH\" target=\"_blank\" rel=\"noreferrer noopener\"><em>Data Management Body of 
Knowledge<\/em><\/a><em>,\u00a0<\/em>there\u2019s a lot of behind-the-scenes work that goes into setting up advanced analytics.\u00a0<a href=\"https:\/\/www.mckinsey.com\/~\/media\/McKinsey\/Business%20Functions\/McKinsey%20Analytics\/Our%20Insights\/Why%20data%20culture%20matters\/Why-data-culture-matters.ashx\" target=\"_blank\" rel=\"noreferrer noopener\">The Data Science consultants at McKinsey agree<\/a>, and caution that initiatives shouldn\u2019t be approached with the mentality of a science project.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1vIP6I7wOf0UfPt6tmuj43g.png\" alt=\"Data Science Planning: Is Your Machine Learning Model Likely to Fail?\"\/><figcaption>Aiken Data Framework via\u00a0<a href=\"https:\/\/amzn.to\/2F0MGcM\" target=\"_blank\" rel=\"noreferrer noopener\"><em>Data Management Body of Knowledge<\/em><\/a><\/figcaption><\/figure><\/div>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"627b\">Like Maslow\u2019s hierarchy of needs, Data Science actualization cannot be attained without first achieving the physiological and safety needs of Data Governance, Data Architecture, Data Quality, Metadata, etc. at the foundational levels of the&nbsp;<strong>Aiken Pyramid<\/strong>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"9354\">Before assuming that an organizational challenge needs to be solved with Data Science, it\u2019s worth investigating whether that energy is better invested in a data quality improvement project followed by straightforward analytics.\u00a0<a href=\"https:\/\/www.dataengineeringpodcast.com\/power-bi-business-intelligence-episode-154\/\" target=\"_blank\" rel=\"noreferrer noopener\">According to business intelligence expert, Rob Collie<\/a>:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>The most shocking thing is how far we have to go. 
Close to everyone we work with will say \u2018we\u2019re really primitive here. We\u2019re way behind the curve.\u2019<\/p><p>But everyone\u2019s saying that. When I work at company X, I assume that every other company in the world has the basics done right. And they don\u2019t.<\/p><p>Everyone is in the dark ages still.<\/p><\/blockquote>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"9a6e\">Not all problems are Data Science problems. The organization may not have reached sufficient data maturity for advanced analytics. And that\u2019s okay \u2014 as Rob puts it:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>The next big thing: doing the basics right for the first time ever.<\/p><\/blockquote>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"15db\">Data Science is part of the bigger ecosystem \u2014 a crucial component of business improvement and a core element of data-driven action. It represents the pinnacle of data-related activities. As such, it needs to be supported by robust data management practices.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"9bc3\">How to&nbsp;<strong>pursue options beyond machine learning<\/strong>:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Invest in your Data Engineering skills by\u00a0<a href=\"https:\/\/mystery.knightlab.com\/\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>learning SQL<\/strong><\/a><strong>\u00a0and understanding\u00a0<\/strong><a href=\"https:\/\/amzn.to\/3eQvPYe\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>data modeling<\/strong><\/a><\/li><li>Invest in your ability to perform\u00a0<strong>straightforward analytics and communicate\u00a0<\/strong>the results for effective decision making<\/li><li>Ensure Data Science proofs of concept are\u00a0<strong>designed for deployment<\/strong>, not learning<\/li><\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"d097\">Summary<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"4356\">Working in Data Science 
is as&nbsp;<strong>awesome as it sounds<\/strong>. There\u2019s so much potential to make the world a better place by helping organizations&nbsp;<strong>leverage their data as a strategic asset<\/strong>. Avoiding these 5 potential struggles will help advance your problem-solving capabilities to deliver effective data products.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A Data Scientist\u2019s blindness to the data lifecycle can cause their machine learning project to fail. This article shares five recommendations to support advanced analytics, machine learning, and model deployment across all the stages of project planning.<\/p>\n","protected":false},"author":940,"featured_media":16868,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[183],"tags":[863,1028,92,1029,1030],"ppma_author":[3860],"class_list":["post-22450","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-advanced-analytics","tag-data-lifecycle","tag-machine-learning","tag-model-deployment","tag-project-planning"],"authors":[{"term_id":3860,"user_id":940,"is_guest":0,"slug":"nicole-janeway-bills","display_name":"Nicole Janeway","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/Nicole-Janeway-Bills-150x150.jpg","author_category":"","user_url":"https:\/\/page.co\/ahje9p","last_name":"Janeway","first_name":"Nicole","job_title":"","description":"Nicole Janeway, a Data Scientist at Atlas Research, has  experience in commercial and federal consulting.  She helps organizations leverage their top asset:  a simple and robust Data Strategy. 
<a href=\"https:\/\/page.co\/ahje9p\/\"> Sign up<\/a> for more of her writing."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22450","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/940"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=22450"}],"version-history":[{"count":0,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22450\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/16868"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=22450"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=22450"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=22450"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=22450"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}