{"id":22440,"date":"2020-11-13T15:00:37","date_gmt":"2020-11-13T15:00:37","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/how-to-future-proof-your-data-science-project\/"},"modified":"2021-11-25T09:08:05","modified_gmt":"2021-11-25T09:08:05","slug":"how-to-future-proof-your-data-science-project","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/how-to-future-proof-your-data-science-project\/","title":{"rendered":"How to Future-Proof Your Data Science Project"},"content":{"rendered":"\n<p class=\"has-medium-font-size wp-block-paragraph\"><em>5 critical elements of ML model selection &amp; deployment<\/em><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"c1b4\"><a href=\"https:\/\/venturebeat.com\/2019\/07\/19\/why-do-87-of-data-science-projects-never-make-it-into-production\/\" target=\"_blank\" rel=\"noreferrer noopener\">87% of Data Science projects&nbsp;<strong>never make it into production<\/strong><\/a><strong>.<\/strong>&nbsp;That statistic is shocking. Yet if you\u2019re like most Data Scientists, it probably doesn\u2019t surprise you.&nbsp;<em>Nontechnical stakeholders struggle to define business requirements. Crossfunctional teams face an uphill battle to set up robust pipelines for replicable data delivery. Deployment is hard. Machine learning models can take on a life of their own.<\/em><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"8fba\">Here\u2019s a list of&nbsp;<strong>five practical steps<\/strong>&nbsp;for future-proofing your model against these challenges of model selection and deployment. If you\u2019ve been ignoring these critical elements in the past, you may find your deployment rate skyrockets. Your data products may depend on correctly deploying the tips from this article.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"ed8f\">1.0 Don\u2019t Underestimate Interpretability<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"0e47\">An interpretable model is one that is inherently explainable. 
For example, Decision Tree based methods \u2014 Random Forest, Adaboost, Gradient Tree Boosting \u2014 offer up a&nbsp;<strong>clear view of their underlying decision logic<\/strong>.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1sKo3xaYZ1azmK0iZHSmg8w.jpeg\" alt=\"How to Future-Proof Your Data Science Project\"\/><figcaption>Photo by&nbsp;<a href=\"https:\/\/unsplash.com\/@andreasdress?utm_source=medium&amp;utm_medium=referral\" rel=\"noopener\">Andreas Dress<\/a>&nbsp;on&nbsp;<a href=\"https:\/\/unsplash.com\/?utm_source=medium&amp;utm_medium=referral\" rel=\"noopener\">Unsplash<\/a><\/figcaption><\/figure><\/div>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"5c3d\">Interpretability may be mandatory in the heavily regulated fields of&nbsp;<a href=\"https:\/\/medium.com\/atlas-research\/ethical-ai-tools-b9d276a49fea#23ee\" rel=\"noopener\">criminal justice<\/a>&nbsp;and&nbsp;<a href=\"https:\/\/medium.com\/atlas-research\/ethical-ai-tools-b9d276a49fea#e53a\" rel=\"noopener\">finance<\/a>. 
It also tends to be an underrated element of a strong data science project.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"86ee\"><strong>Along with inherent interpretability, a Decision Tree model&nbsp;<\/strong>has the following helpful properties:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Easily depicted in a visual format<\/li><li>Able to capture non-linear relationships between features and the target<\/li><li>Good predictive power across a wide variety of use cases<\/li><li>Provides ranked feature importance<\/li><li>Low requirements for feature preprocessing<\/li><li>Works with categorical features using&nbsp;<code>sklearn.preprocessing.OneHotEncoder<\/code><\/li><li>Handles outliers well and, in ensemble form such as Random Forest, resists overfitting<\/li><li>Can be used for either classification or regression<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"de41\">For these reasons, Decision Trees are a solid initial model to explore many typical business problems.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"806c\">At the point of making a decision, are stakeholders more likely to trust an uninterpretable black box Neural Network or a Random Forest? Consider that a very detail-oriented (or very bored) business person could clearly trace the logic in every single underlying Decision Tree if they so chose. 
If the job of a Chief Data Officer is to keep the CEO out of jail, then this level of interpretability is clearly a win.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><a href=\"https:\/\/medium.com\/atlas-research\/ethical-ai-tools-b9d276a49fea#c918\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1xmAVSr5fDl_xXEb1DYuYEQ-scaled.jpeg\" alt=\"How to Future-Proof Your Data Science Project\"\/><\/a><figcaption>Uninterpretable models run the risk of&nbsp;<a href=\"https:\/\/medium.com\/atlas-research\/ethical-ai-tools-b9d276a49fea#c918\" rel=\"noopener\">perpetuating societal inequalities<\/a>, such as the systematic \u201credlining\u201d of Black families by human and AI-based mortgage lending systems \u2014 unless concrete steps are taken to mitigate bias against vulnerable groups. Photo by&nbsp;<a href=\"https:\/\/www.pexels.com\/@august-de-richelieu?utm_content=attributionCopyText&amp;utm_medium=referral&amp;utm_source=pexels\" rel=\"noopener\">August de Richelieu<\/a>&nbsp;on&nbsp;<a href=\"https:\/\/www.pexels.com\/photo\/family-making-breakfast-in-the-kitchen-4259140\/?utm_content=attributionCopyText&amp;utm_medium=referral&amp;utm_source=pexels\" rel=\"noopener\">Pexels<\/a>.<\/figcaption><\/figure><\/div>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"f28f\">Beyond the Decision Tree, the family of interpretable models includes&nbsp;<a href=\"https:\/\/scikit-learn.org\/stable\/modules\/naive_bayes.html\" rel=\"noopener\">Naive Bayes Classifier<\/a>,&nbsp;<a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.linear_model.LinearRegression.html\" rel=\"noopener\">Linear<\/a>&nbsp;and&nbsp;<a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.linear_model.LogisticRegression.html\" rel=\"noopener\">Logistic Regression<\/a>, and&nbsp;<a 
href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.neighbors.KNeighborsClassifier.html\" rel=\"noopener\">K-Nearest Neighbors<\/a>&nbsp;(for classification and regression). These intrinsically interpretable models have the added benefit that they&nbsp;<strong>save significant time and resources in training and serving&nbsp;<\/strong>often at little cost to predictive performance relative to black box Neural Networks.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"e0c5\">1.1 How to&nbsp;<strong>select the right model<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"c351\">Whether aiming for interpretability or not, use this resource (<em>Decision Trees everywhere!<\/em>) to guide your model selection:<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><a href=\"https:\/\/scikit-learn.org\/stable\/tutorial\/machine_learning_map\/index.html\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/02TnDlCN1PwpA-j0o.png\" alt=\"How to Future-Proof Your Data Science Project\"\/><\/a><figcaption>via&nbsp;<a href=\"https:\/\/scikit-learn.org\/stable\/tutorial\/machine_learning_map\/index.html\" rel=\"noopener\">sklearn<\/a><\/figcaption><\/figure><\/div>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"f3ff\">1.2 Read more about Model Selection<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/medium.com\/atlas-research\/model-selection-d190fb8bbdda\" target=\"_blank\" rel=\"noreferrer noopener\">Comprehensive Guide to Model Selection: A systematic approach to picking the right algorithm.<\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"5afc\">2.0 Prune for Productionization<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"d85c\">Of course, sometimes going with a Neural Network may be your best option. Perhaps you\u2019re doing image recognition or natural language processing (NLP). Perhaps you\u2019re working with a very complicated dataset. 
If you\u2019re using a Neural Net, you should consider how to pare back the model before&nbsp;<a href=\"https:\/\/towardsdatascience.com\/data-science-planning-c0649c52f867\" rel=\"noopener\">putting it into production<\/a>.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1iFpx-xb7c26mecyOSatyzw-scaled.jpeg\" alt=\"How to Future-Proof Your Data Science Project\"\/><figcaption>Photo by&nbsp;<a href=\"https:\/\/www.pexels.com\/@pixabay\" rel=\"noopener\">Pixabay<\/a>&nbsp;on&nbsp;<a href=\"https:\/\/www.pexels.com\/@pixabay\" rel=\"noopener\">Pexels<\/a><\/figcaption><\/figure><\/div>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"b544\">In the words of&nbsp;<a rel=\"noreferrer noopener\" href=\"https:\/\/medium.com\/u\/bcb31cd617bf?source=post_page-----cf754459f7ca--------------------------------\" target=\"_blank\">Mark Kurtz<\/a>, <a href=\"https:\/\/www.experfy.com\/blog\/ai-ml\/guide-to-interpretable-machine-learning\/\" target=\"_blank\" rel=\"noreferrer noopener\">Machine Learning<\/a> Lead at&nbsp;<a href=\"https:\/\/neuralmagic.com\/\" rel=\"noopener\">Neural Magic<\/a>:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>Most weights in a neural network are actually useless.<\/p><\/blockquote>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"2b84\">After training,&nbsp;<strong>60\u201390% of weights can be removed with no impact on performance<\/strong>. The result is faster inference time, reduced model size, and lower cost to serve users. 
In fact, the Neural Magic team argues that this sparsification could enable a&nbsp;<a href=\"https:\/\/medium.com\/limitlessai\/why-am-i-using-gpus-for-deep-learning-inference-7d0b10fb7624\" rel=\"noopener\">renaissance in CPU-based architectures and \u201cno hardware\u201d AI<\/a>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">ICML Paper: Inducing and Exploiting Activation Sparsity for Fast Neural Network Inference. In July 2020, at the International Conference on Machine Learning, we presented a paper on methods for maximizing the\u2026 (neuralmagic.com)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"76a5\">Pruning involves removing the unused pathways in the Neural Network, keeping the necessary ones.&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/1506.02626\" rel=\"noopener\"><strong>Gradual magnitude pruning<\/strong>&nbsp;(GMP)<\/a>&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/2003.03033\" rel=\"noopener\">has emerged as a favorite technique<\/a>. In general, unstructured pruning \u2014 i.e. the&nbsp;<strong>removal of specific weights<\/strong>&nbsp;rather than entire neurons \u2014 allows for greater control over the sparsification process, resulting in better performance.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"21b8\">2.1 How to prune your model before productionizing<\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li>Retrain the network at a slightly higher learning rate than the final one used in training<\/li><li>At the start of epoch 1, set the sparsity of all layers to be pruned to 5%<\/li><li>Iteratively remove the weights closest to zero once per epoch until the designated sparsity is reached<\/li><li>Hold sparsity constant for the remainder of retraining while reducing the learning rate<\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"9edc\">2.2 Read more about the Lottery Ticket Hypothesis<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><a target=\"_blank\" rel=\"noreferrer noopener\" href=\"https:\/\/towardsdatascience.com\/must-read-data-science-papers-487cce9a2020\">5 Must-Read Data Science Papers (and How to 
Use Them): Foundational ideas to keep you on top of the data science game.<\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"1c0d\">3.0 Prevent Data and Model Drift<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"fce1\">After deployment, the forces of drift will inevitably buffet your model and cause its performance to degrade over time. Data drift occurs when the model\u2019s&nbsp;<strong>underlying input changes<\/strong>&nbsp;so that one or more features no longer measure what they originally measured. Model drift occurs when&nbsp;<strong>environmental conditions change<\/strong>, and the model is no longer reliably representing the real world.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0WOpUfz5xYVd_32o5-scaled.jpeg\" alt=\"How to Future-Proof Your Data Science Project\"\/><figcaption>Photo by&nbsp;<a href=\"https:\/\/unsplash.com\/@ellenaalice?utm_source=medium&amp;utm_medium=referral\" rel=\"noopener\">Ellena McGuinness<\/a>&nbsp;on&nbsp;<a href=\"https:\/\/unsplash.com\/?utm_source=medium&amp;utm_medium=referral\" rel=\"noopener\">Unsplash<\/a><\/figcaption><\/figure><\/div>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"26b3\">3.0a Data Drift<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"491a\">Data drift is typically the result of changes in the data collection process. For example, a sensor at a manufacturing plant could break, recording several hours of zero temperatures before the problem can be corrected. Then the new sensor may record temperatures in Celsius, rather than the previous measurement in Fahrenheit. Without context on these changes, the zero values and the switch to a new standard of measurement will have an adverse effect on the downstream model.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"85ed\">The same can be said for changes to qualitative information. Changes in survey data collection methodology \u2014 e.g. 
switching from mailing questionnaires to polling landlines \u2014 will have an impact on the demographics of respondents. Even slight changes to the way a question is worded can adversely impact a model\u2019s capability to draw longitudinal inferences from the dataset.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"dacc\">Data drift could also result from changes to the definitions of the fields in the dataset. For example, the data owner at the manufacturing plant could decide that the term \u201cscrap\u201d should refer not just to unusable material, but also to material that will eventually be reprocessed into recycled products. This change in terminology will also impact model performance.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"ff35\">3.0b Model Drift<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"04ea\">Changes in the real-world environment may degrade a model\u2019s predictive power.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"a789\">Given the cataclysm of a year that 2020 has been, models of consumer behavior generally need to be kicked to the curb.&nbsp;<a href=\"https:\/\/medium.com\/u\/fbfc6b1ae37a?source=post_page-----cf754459f7ca--------------------------------\" target=\"_blank\" rel=\"noreferrer noopener\">Carl Gold<\/a>&nbsp;is the Chief Data Scientist at&nbsp;<a href=\"https:\/\/www.zuora.com\/\" rel=\"noopener\">Zuora<\/a>, a services provider for subscription businesses that helps them move beyond analytics with advanced data products.&nbsp;<a href=\"https:\/\/theartistsofdatascience.fireside.fm\/carl-gold\" rel=\"noopener\">In a recent interview<\/a>, he shared his perspective on the impact of the pandemic:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>I\u2019m telling everyone to update their model. Now, if you do a new churn model, you should really only use data since COVID if possible.<\/p><p>That will only be possible for a consumer company that has a lot of observations. 
Generally, business-to-business companies have a small data challenge. So there\u2019s so many competing concerns with refitting your model.<\/p><p><strong>The job doesn\u2019t stop once you\u2019ve deployed.<\/strong><\/p><p>You should continuously monitor your model\u2019s predictions for accuracy because that\u2019ll actually give you the warning sign if it\u2019s been too long since retraining.<\/p><\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"8682\">3.1 How to make your model robust to drift<\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li>Set up a&nbsp;<strong>Data Sharing Agreement<\/strong>&nbsp;with data source owners to receive advance warning of data drift<\/li><li>Monitor the distribution of incoming data against the original training data \u2014 you can do this using the&nbsp;<a href=\"https:\/\/www.statisticshowto.com\/kolmogorov-smirnov-test\/\" rel=\"noopener\"><strong>Kolmogorov-Smirnov (K-S) test<\/strong><\/a>&nbsp;or by simply comparing&nbsp;<a href=\"https:\/\/www.statisticshowto.com\/probability-and-statistics\/z-score\/\" rel=\"noopener\"><strong>z-scores<\/strong><\/a><\/li><li>Monitor a time series dataset for drift from the previous time period \u2014 you may want to deploy the&nbsp;<a href=\"https:\/\/www.listendata.com\/2015\/05\/population-stability-index.html\" rel=\"noopener\"><strong>Population Stability Index (PSI)<\/strong><\/a>&nbsp;metric to do so<\/li><li>Retrain your model on a scheduled basis \u2014 e.g. every five months \u2014 or through&nbsp;<strong>online learning<\/strong>, where the model continuously ingests new training data and new versions are released in a continuous integration \/ continuous deployment process.<\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"10e9\">3.2 Read more about model retraining<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/www.researchgate.net\/publication\/321627304_Online_Ensemble_Learning_with_Abstaining_Classifiers_for_Drifting_and_Noisy_Data_Streams\" 
target=\"_blank\" rel=\"noreferrer noopener\">Online Ensemble Learning with Abstaining Classifiers for Drifting and Noisy Data Streams: Mining data streams is among most vital contemporary topics in machine learning. Such scenario requires adaptive\u2026 (www.researchgate.net)<\/a><\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/15GTwbE5HZ4F4Cykjrd_q6Q.png\" alt=\"Read more about model retraining\"\/><figcaption>via&nbsp;<a href=\"https:\/\/www.linkedin.com\/posts\/eric-weber-060397b7_data-datascience-activity-6725400328046567424-pO4f\" rel=\"noopener\">LinkedIn<\/a><\/figcaption><\/figure><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"4d19\">4.0 Take Advantage of Positive Feedback Loops<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"a592\">Algorithms are a powerful tool for empowering data-driven action. Through retraining on paired predicted and actual data, the model\u2019s results can become increasingly accurate over time.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"c0e1\">The output of the&nbsp;<a href=\"https:\/\/districtdatalabs.silvrback.com\/the-age-of-the-data-product\" rel=\"noopener\"><strong>data product<\/strong><\/a>&nbsp;provides high-quality signals when integrated back into the data lifecycle.&nbsp;<a href=\"https:\/\/medium.com\/u\/592ce2a67248?source=post_page-----cf754459f7ca--------------------------------\" target=\"_blank\" rel=\"noreferrer noopener\">Andrew Ng<\/a>&nbsp;referred to this concept as the&nbsp;<a href=\"https:\/\/www.youtube.com\/watch?v=21EiKfQYZXc\" rel=\"noopener\"><strong>virtuous cycle of AI<\/strong><\/a>.&nbsp;<em>Harvard Business Review<\/em>&nbsp;called it the&nbsp;<a href=\"https:\/\/hbr.org\/2016\/09\/building-an-insights-engine\" rel=\"noopener\"><strong>insights engine<\/strong><\/a>.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img decoding=\"async\" 
src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1oGDrCcJG2M4fR4cCG2wYfA-scaled.jpeg\" alt=\"Take Advantage of Positive Feedback Loops\"\/><figcaption>Photo by&nbsp;<a href=\"https:\/\/unsplash.com\/@noemieke?utm_source=medium&amp;utm_medium=referral\" rel=\"noopener\">No\u00e9mi Macavei-Kat\u00f3cz<\/a>&nbsp;on&nbsp;<a href=\"https:\/\/unsplash.com\/?utm_source=medium&amp;utm_medium=referral\" rel=\"noopener\">Unsplash<\/a><\/figcaption><\/figure><\/div>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"c64c\">Robust capture of data-driven decisions and their outcomes could further enrich the data collection process. Hopefully soon,&nbsp;<strong>more<\/strong>&nbsp;<strong>feedback collection opportunities<\/strong>&nbsp;will be built into dashboards, web interfaces, and other data products. Feedback collection can empower the end user and improve the insight engine.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"6c9e\">4.1 How to take advantage of positive cycles<\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>Communicate with stakeholders<\/strong>&nbsp;at the beginning of the&nbsp;<a href=\"https:\/\/towardsdatascience.com\/data-science-planning-c0649c52f867\" rel=\"noopener\">planning process<\/a>&nbsp;about the outsized benefits of effective machine learning models<\/li><li>Create<strong>&nbsp;data collection pipelines<\/strong>&nbsp;from the deployed model<\/li><li>Ensure&nbsp;<strong>accuracy of metadata<\/strong><\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"8ab7\">4.2 Read more about what makes for an effective data product<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/districtdatalabs.silvrback.com\/the-age-of-the-data-product\" target=\"_blank\" rel=\"noreferrer noopener\">The Age of the Data ProductWe are living through an information revolution. 
Like any economic revolution, it has had a transformative effect on\u2026districtdatalabs.silvrback.com<\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"911a\">5.0 Prevent Negative Feedback Loops<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"63f9\">A word of caution: far from being a self-sustaining system, a data product requires consistent monitoring. While the algorithmic feedback loop can create an insight-enriched dataset, it can also generate a&nbsp;<strong>bias-perpetuating cycle<\/strong>. There are many examples where the deployment of machine learning tools, particularly those with limited interpretability and explainability, accidentally deepened societal biases.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1gS2QQ7boRM_-pRBBaMwNsw.jpeg\" alt=\"Prevent Negative Feedback Loops\"\/><figcaption>Photo by&nbsp;<a href=\"https:\/\/unsplash.com\/@kevin_lee?utm_source=medium&amp;utm_medium=referral\" rel=\"noopener\">Kevin Lee<\/a>&nbsp;on&nbsp;<a href=\"https:\/\/unsplash.com\/?utm_source=medium&amp;utm_medium=referral\" rel=\"noopener\">Unsplash<\/a><\/figcaption><\/figure><\/div>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"3501\">For example, a data science contracting firm created an algorithm to predict recidivism that was deployed in New York, Wisconsin, California, Florida, and other jurisdictions.&nbsp;<a href=\"https:\/\/www.propublica.org\/article\/how-we-analyzed-the-compas-recidivism-algorithm\" rel=\"noopener\">ProPublica<\/a>&nbsp;found that&nbsp;<strong>the algorithm perpetuated existing inequalities into a well-trodden feedback loop<\/strong>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"0f4a\">Although the defendant\u2019s race was explicitly left out of the feature set, the algorithm used features highly correlated to race that informed inadvertently biased judgments. 
These features should also have been eliminated in order to reduce disparities in the judgment of the machine learning system.&nbsp;<a href=\"https:\/\/medium.com\/atlas-research\/model-selection-d190fb8bbdda#1ac9\" rel=\"noopener\"><em>Read more about these risks in this article<\/em><\/a><em>.<\/em><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"b7ba\">As a lighthearted solution to the stagnation of a negative feedback loop, a computer scientist invented a randomness generator to shake up his social life: Randomized Living. Starting in 2015, I let a computer decide where I lived and what I did for over two years. It sent me all over the world\u2026 (maxhawkins.me)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"486f\">5.1 How to avoid a downward spiral<\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li>Start with a&nbsp;<a href=\"https:\/\/deon.drivendata.org\/\" rel=\"noopener\"><strong>checklist<\/strong><\/a>&nbsp;that helps you think through the ethical implications of your model<\/li><li>Thoroughly&nbsp;<a href=\"https:\/\/github.com\/Trusted-AI\/AIF360\" rel=\"noopener\"><strong>investigate potential sources<\/strong><\/a>&nbsp;of bias in the pre-processing, processing, and post-processing phases of model training \u2014 and then remediate sources of bias<\/li><li><a href=\"https:\/\/modelcards.withgoogle.com\/\" rel=\"noopener\"><strong>Communicate model performance<\/strong><\/a>&nbsp;across protected classes in documentation<\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"faa9\">5.2 Read more about anti-bias tools<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/medium.com\/atlas-research\/ethical-ai-tools-b9d276a49fea\" target=\"_blank\" rel=\"noreferrer noopener\">3 Open Source Tools for Ethical AI: Before integrating artificial intelligence into your organization\u2019s workflow, consider these tools to prevent machine\u2026<\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"60ef\">Summary<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\" 
id=\"3c46\"><strong><em>Failing to plan is planning to fail.<\/em><\/strong>&nbsp;So said Benjamin Franklin, immediately before getting struck by lightning while flying a kite out his bedroom window during a thunderstorm.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"84b2\">I like to think that if he were alive today, the Founding Father of the $100 bill would have been building a GPU-powered deep learning box, regularly PR\u2019ing open source projects, and&nbsp;<strong>selecting and deploying models<\/strong>&nbsp;like a boss.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"5a7d\">By starting off your next data science project with a robust planning process, you can ensure your model has better than&nbsp;<a href=\"https:\/\/venturebeat.com\/2019\/07\/19\/why-do-87-of-data-science-projects-never-make-it-into-production\/\" rel=\"noopener\"><strong>1:9 odds<\/strong>&nbsp;of making it into production<\/a>. Use these tips for better model selection and deployment:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li><a href=\"https:\/\/towardsdatascience.com\/model-selection-and-deployment-cf754459f7ca#ed8f\" rel=\"noopener\">Don\u2019t Underestimate Interpretability<\/a><\/li><li><a href=\"https:\/\/towardsdatascience.com\/model-selection-and-deployment-cf754459f7ca#5afc\" rel=\"noopener\">Prune for Productionization<\/a><\/li><li><a href=\"https:\/\/towardsdatascience.com\/model-selection-and-deployment-cf754459f7ca#1c0d\" rel=\"noopener\">Prevent Data and Model Drift<\/a><\/li><li><a href=\"https:\/\/towardsdatascience.com\/model-selection-and-deployment-cf754459f7ca#4d19\" rel=\"noopener\">Take Advantage of Positive Feedback Loops<\/a><\/li><li><a href=\"https:\/\/towardsdatascience.com\/model-selection-and-deployment-cf754459f7ca#911a\" rel=\"noopener\">Prevent Negative Feedback Loops<\/a><\/li><\/ol>\n\n\n\n<p class=\"wp-block-paragraph\">Only fly a kite out your bedroom window during a thunderstorm if you want to get&nbsp;<a 
href=\"https:\/\/quoteinvestigator.com\/2018\/07\/08\/plan\/\" rel=\"noopener\">misquoted<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Deployment is hard. Machine learning models can take on a life of their own.Here\u2019s a list of five practical steps for future-proofing your model against these challenges of model selection and deployment.<\/p>\n","protected":false},"author":940,"featured_media":16856,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[183],"tags":[1003,1004,850,1005],"ppma_author":[3860],"class_list":["post-22440","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-data-science-project","tag-deployment","tag-machine-learning-models","tag-model-selection"],"authors":[{"term_id":3860,"user_id":940,"is_guest":0,"slug":"nicole-janeway-bills","display_name":"Nicole Janeway","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/Nicole-Janeway-Bills-150x150.jpg","author_category":"","user_url":"https:\/\/page.co\/ahje9p","last_name":"Janeway","first_name":"Nicole","job_title":"","description":"Nicole Janeway, a Data Scientist at Atlas Research, has  experience in commercial and federal consulting.  She helps organizations leverage their top asset:  a simple and robust Data Strategy. 
<a href=\"https:\/\/page.co\/ahje9p\/\"> Sign up<\/a> for more of her writing."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22440","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/940"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=22440"}],"version-history":[{"count":0,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22440\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/16856"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=22440"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=22440"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=22440"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=22440"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}