{"id":22724,"date":"2021-04-06T06:30:00","date_gmt":"2021-04-06T06:30:00","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/avoiding-major-pitfalls-data-science-projects\/"},"modified":"2023-08-28T07:10:55","modified_gmt":"2023-08-28T07:10:55","slug":"avoiding-major-pitfalls-data-science-projects","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/avoiding-major-pitfalls-data-science-projects\/","title":{"rendered":"Avoiding The 4 Major Pitfalls Of Data Science Projects"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"22724\" class=\"elementor elementor-22724\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-fc1e885 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"fc1e885\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-c8021bb\" data-id=\"c8021bb\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-08bd546 elementor-widget elementor-widget-text-editor\" data-id=\"08bd546\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p class=\"has-medium-font-size\"><em>Working on a data science project, especially with a new stakeholder, can be challenging. Learn how to avoid the main pitfalls.<\/em><\/p>\n<p id=\"f948\">As data science, machine learning, and AI continue to grow in popularity across business domains, companies are trying to leverage these technologies for new problems. 
As a <a href=\"https:\/\/www.experfy.com\/blog\/ai-ml\/master-data-analytics-first-before-becoming-a-data-scientist\/\" target=\"_blank\" rel=\"noreferrer noopener\">data scientist<\/a>, you know that there\u2019s a high likelihood that your upcoming project will fail. This is especially true if machine learning is only a small part of your company\u2019s operations \u2014 a feature in the main product or a method for generating valuable business insight. At&nbsp;<a href=\"https:\/\/medium.com\/riskified-technology\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\">Riskified<\/a>, machine learning is at the core of our business, but keeping these pitfalls in mind is always important.<\/p>\n<p id=\"e5ec\">There are countless ways a project can fail because of mistakes on the data scientist\u2019s side: selection bias, target leakage, data drift, p-hacking, overfitting, and more. In this article, however, we\u2019ll focus on some fundamental issues pertaining to collaboration with the business.<\/p>\n<p id=\"cb0c\">Incorporating data science is like any other change management process \u2014 it requires careful consideration and planning to get right. Any significant change process needs a strong coalition of members, including champions who will create a vision, promote it, and remove obstacles. One of the greatest classic change management models is&nbsp;<a href=\"https:\/\/cio-wiki.org\/wiki\/Kotter%27s_8-Step_Change_Model\" target=\"_blank\" rel=\"noreferrer noopener\">John Kotter\u2019s 8-step process<\/a>, which emphasizes the importance of planning the change process well in advance. Let\u2019s see what can happen when that critical preparation work hasn\u2019t been performed. 
Hopefully, the tips below can help you save your data science project or avoid starting one that\u2019s destined to fail.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-40a1245 elementor-widget elementor-widget-heading\" data-id=\"40a1245\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Bad data<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4f4b45a elementor-widget elementor-widget-text-editor\" data-id=\"4f4b45a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"1ae1\">If you\u2019re the first data scientist in your organization (or in a business unit that hasn\u2019t used data science before), there\u2019s a good chance you\u2019re either missing crucial data or the data quality isn\u2019t good enough to get started. Talking with data engineers or DBAs about the existing data, getting permission to access it, and exploring it can take a tremendous amount of time. Exploring means checking the available features, null rates, and consistency; validating how far back the data goes; confirming that the label you want to predict exists and is accurate; and investigating other quality issues. If your company has been doing pretty well so far, those data providers won\u2019t necessarily understand why they suddenly have to meet stringent new data quality demands (\u201ceverything worked great so far, no?\u201d).<\/p>\n<p id=\"9c33\">To business people outside your line of work, this long stretch of seemingly never-ending data cleaning can look as if no progress is being made at all. 
If senior management doesn\u2019t understand the significant prep work required to get the data into minimal working condition, they won\u2019t have the necessary patience and will expect results much too soon. In companies that take on a lot of technical debt to move quickly, the data quality will likely be even worse, and patience even lower, since they\u2019re accustomed to fast deliverables.<\/p>\n<p id=\"d825\">If you\u2019re in this tricky situation, it\u2019s vital to get access to the data and start some minimal exploration before making any assumptions about how fast you might be able to progress on your project (or whether the project is even feasible). If you\u2019re a stakeholder who wants to incorporate data science for the first time, it\u2019s important to start preparing the data well in advance; otherwise, you risk an extremely frustrated hire who might leave quickly.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5b711bf elementor-widget elementor-widget-heading\" data-id=\"5b711bf\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Lack of buy-in and trust<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9baf29f elementor-widget elementor-widget-text-editor\" data-id=\"9baf29f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"6fa5\">Your company\u2019s senior management is likely being bombarded with messages promising miracles from implementing \u201cAI.\u201d You may find that a lone senior executive has decided to take a chance on introducing machine learning into their organization. 
The specifics around how AI will actually be leveraged, the business problems it will solve, and the value it can bring are&nbsp;<a href=\"https:\/\/medium.com\/riskified-technology\/how-we-choose-what-to-research-57acb835fdd7\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\">left as details for you to figure out<\/a>. While this can work, the organization has to understand both the amount of prep work required to leverage machine learning and the types of problems it excels at solving.<\/p>\n<p id=\"4091\">If the organization isn\u2019t ready, you\u2019ll typically see limited buy-in for machine learning across the company (i.e., it\u2019s driven top-down by executives), which means you might hit roadblocks with middle-management stakeholders. From the data scientist\u2019s perspective, it might look a little like this: you\u2019ve just completed a project for a new stakeholder who isn\u2019t accustomed to working with machine learning. During one of the final meetings, you get few questions and little feedback about the work. As any salesperson knows, it\u2019s easier to work with a \u2018no\u2019 than with no feedback at all. Less experienced data scientists might try to prove the value of their solution. However, by the time you\u2019ve hit this point, it\u2019s probably too late. Your stakeholders just aren\u2019t engaged and won\u2019t end up using your project.<\/p>\n<p id=\"5b26\">Additionally, many companies aren\u2019t truly data-driven. While every company says its decisions are driven by data (in today\u2019s business environment, it\u2019s practically blasphemy to say otherwise), many companies don\u2019t actually operate this way. In many cases, this may stem from not having sufficient data. As a result, they haven\u2019t improved their analytical capabilities, and the company culture may not value the methodical testing of new processes. 
If a data scientist (or specifically a decision scientist) delivers&nbsp;<a href=\"https:\/\/medium.com\/riskified-technology\/data-vs-insight-the-thin-line-between-good-and-bad-reports-91997d5e9cd\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\">unique and critical insight<\/a>&nbsp;that counters the company\u2019s tribal knowledge, there\u2019s a good chance that intuition will prevail. Employees won\u2019t feel comfortable enough to push back on or challenge this type of work, and if it differs enough from what they know, they simply won\u2019t accept it.<\/p>\n<p id=\"1a6e\">To avoid this, it\u2019s crucial to understand who your stakeholders are and engage them throughout the process. Don\u2019t show up after several weeks of work with a solution and expect them to simply accept it. By involving your stakeholders frequently, leveraging their domain expertise, and allowing them to shape your project\u2019s solution, you can harness the&nbsp;<a href=\"https:\/\/en.wikipedia.org\/wiki\/IKEA_effect\" target=\"_blank\" rel=\"noreferrer noopener\">IKEA effect<\/a>&nbsp;\u2014 the stakeholder will value the project more once they\u2019ve personally invested time in it.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-879fea6 elementor-widget elementor-widget-heading\" data-id=\"879fea6\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Stakeholders who don\u2019t know<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5f82c53 elementor-widget elementor-widget-text-editor\" data-id=\"5f82c53\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p 
id=\"8eae\">Sometimes the stakeholders are interested in leveraging machine learning but don\u2019t know how to use it. You might genuinely have enough engagement and interest at the relevant levels (both executive buy-in and middle management), but the stakeholders aren\u2019t sure exactly what to expect and lack a clear understanding of the process behind machine learning. They may expect immediate results and grow impatient when data scientists keep raising data quality issues, a lack of the required tools and infrastructure, or difficulties deploying their first models into the existing production system.<\/p>\n<p id=\"0fdf\">From the data scientist\u2019s perspective, it could look like this: you\u2019ve started a project with a new stakeholder who has little experience working with data science. They want to tap into that \u2018ML magic\u2019 but have a hard time defining their specific problem. You might feel pressured to start exploring the data and \u2018see what you can come up with.\u2019 While this can work, it should be avoided whenever possible.<\/p>\n<p id=\"5972\">First, pinpoint the specific metric you\u2019d like to predict \u2014 every moment spent detailing the metric calculation and the definition of done is time well spent. This can also help you set expectations from the get-go \u2014 some stakeholders might expect the impossible from ML, and it\u2019s much better to be upfront about this before starting the project. Feel free to explore the data first, but don\u2019t be tempted to just start and see how well you can do. It\u2019s also crucial to reiterate the inherent unpredictability of any research project. Issues you didn\u2019t consider will pop up, and you may not reach the performance you were hoping for. Be very cautious about committing to specific performance goals or delivery dates.<\/p>\n<p id=\"1c30\">As the project progresses, continuously run what-if analyses before advancing too far. 
For example, if you\u2019ve started with a simple benchmark model, discuss the accuracy metrics with the stakeholders. Is the model already good enough to be valuable? If not, is it even in the right ballpark? Is its complexity feasible to implement in production? What other constraints need to be taken into account?<\/p>\n<p id=\"6ad7\">It\u2019s good to discuss these issues frequently, especially when the project has advanced and new information needs to be considered.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1878c9c elementor-widget elementor-widget-heading\" data-id=\"1878c9c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">When stakeholders change their minds<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-24c3792 elementor-widget elementor-widget-text-editor\" data-id=\"24c3792\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"4cd0\">In some cases, the stakeholders are very clear on what they\u2019re asking for, and you\u2019ve agreed on all the expectations before going ahead. Even so, because data science projects can take several weeks (or even months), stakeholders may change their request. This can happen because:<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7549bbc elementor-widget elementor-widget-text-editor\" data-id=\"7549bbc\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul><li>The original request is no longer relevant. 
Whenever possible, working in shorter, incremental deliverables (agile rather than waterfall) can help reduce these types of issues.<\/li><li>Success on the original task pushes the project in new directions. For example, the stakeholder may have originally requested a very accurate sales forecast. While this sounds straightforward, the solution might involve an ensemble of multiple models, each with its own parameters and seasonality. Once you have finally improved the model to meet the original threshold and the business is relying on it, the stakeholder asks for the ability to explain the rationale behind each prediction. This can be a much larger request than the stakeholder imagines; such a requirement is best discussed at the outset of the project, since it could affect the complexity of the solution\u2019s design.<\/li><li>Stakeholders change \u2014 don\u2019t assume that everyone will treat your work the same way. A new stakeholder with less buy-in, limited experience working with machine learning, and a generally higher level of skepticism might severely impact your project\u2019s progress. Again, it\u2019s important to strive for incremental quick wins to showcase the value to the business. It\u2019s much easier to greenlight a track that\u2019s demonstrating value than to continue investing in a black box with no clear end date.<\/li><\/ul>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d7c9040 elementor-widget elementor-widget-text-editor\" data-id=\"d7c9040\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"dfd5\">As a true professional, you need to consider how your stakeholder views your work and weigh the constraints and limitations that affect it. 
Whether you\u2019re working on a project with a new stakeholder or just starting up data science at your company, always plan several steps ahead \u2014 manage the expectations, focus on incremental value, and generate the buy-in required for your project to become a success. There\u2019s just too little time and too much work to let an otherwise well-run data science project go to waste. Good luck!<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>Working on a data science project, especially with a new stakeholder, can be challenging. Learn how to avoid the main pitfalls.<\/p>\n","protected":false},"author":1098,"featured_media":19087,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[94,92,1475,266,352],"ppma_author":[3894],"class_list":["post-22724","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-data-science","tag-machine-learning","tag-pitfalls","tag-project-management","tag-strategy"],"authors":[{"term_id":3894,"user_id":1098,"is_guest":0,"slug":"elad-cohen","display_name":"Elad Cohen","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Elad-Cohen-150x150.jpeg","user_url":"https:\/\/www.riskified.com","last_name":"Cohen","first_name":"Elad","job_title":"","description":"Elad Cohen is VP Data Science &amp; Research at Riskified,  the AI platform powering the eCommerce 
revolution."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22724","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/1098"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=22724"}],"version-history":[{"count":5,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22724\/revisions"}],"predecessor-version":[{"id":31656,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22724\/revisions\/31656"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/19087"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=22724"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=22724"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=22724"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=22724"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}