{"id":2243,"date":"2020-02-07T01:38:04","date_gmt":"2020-02-06T22:38:04","guid":{"rendered":"http:\/\/kusuaks7\/?p=1848"},"modified":"2024-01-16T11:42:56","modified_gmt":"2024-01-16T11:42:56","slug":"experiment-management-how-to-organize-your-model-development-process","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/experiment-management-how-to-organize-your-model-development-process\/","title":{"rendered":"Experiment Management: How to Organize Your Model Development Process"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"2243\" class=\"elementor elementor-2243\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-5c045c68 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"5c045c68\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-75bb946c\" data-id=\"75bb946c\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-350f36d9 elementor-widget elementor-widget-text-editor\" data-id=\"350f36d9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<section data-element_type=\"section\" data-id=\"d484f9c\">Let me share a story that I\u2019ve heard too many times. <em>\u201cSo I was developing a machine learning model with my team, and within a few weeks of extensive experimentation we\u00a0<strong>got promising results<\/strong>\u2026<\/em><em>\u2026unfortunately, we couldn\u2019t tell exactly 
what performed best because\u00a0<strong>we didn\u2019t track<\/strong>\u00a0feature versions, didn\u2019t record the parameters, and used different environments to run our models\u2026<\/em>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-69eb58f elementor-widget elementor-widget-text-editor\" data-id=\"69eb58f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<em>\u2026after a few weeks,\u00a0<strong>we weren\u2019t even sure what we had actually tried<\/strong>, so we needed to rerun pretty much everything.\u201d<\/em>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6ecd773 elementor-widget elementor-widget-text-editor\" data-id=\"6ecd773\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tSound familiar?\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8ed78b1 elementor-widget elementor-widget-text-editor\" data-id=\"8ed78b1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tIn this article, I will show you how to keep track of your machine learning experiments and organize your model development efforts so that a story like that never happens to you.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-dfaac30 elementor-widget elementor-widget-heading\" data-id=\"dfaac30\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2>You will learn 
about<\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2a08903 elementor-widget elementor-widget-text-editor\" data-id=\"2a08903\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<\/section><section data-element_type=\"section\" data-id=\"3a3a440e\"><a href=\"https:\/\/neptune.ai\/blog\/experiment-management#1\" class=\"broken_link\" rel=\"noopener\">What is experiment management?<\/a><\/section><section data-element_type=\"section\" data-id=\"3dffd26\"><a href=\"https:\/\/neptune.ai\/blog\/experiment-management#2\" class=\"broken_link\" rel=\"noopener\">Tracking ML experiments<\/a><\/section><section data-element_type=\"section\" data-id=\"2f8491a\"><a href=\"https:\/\/neptune.ai\/blog\/experiment-management#3\" class=\"broken_link\" rel=\"noopener\">Code version control for data science<\/a><\/section><section data-element_type=\"section\" data-id=\"3fc92c5\"><a href=\"https:\/\/neptune.ai\/blog\/experiment-management#4\" class=\"broken_link\" rel=\"noopener\">Tracking hyperparameters<\/a><\/section><section data-element_type=\"section\" data-id=\"19ffc8a\"><a href=\"https:\/\/neptune.ai\/blog\/experiment-management#5\" class=\"broken_link\" rel=\"noopener\">Data versioning<\/a><\/section><section data-element_type=\"section\" data-id=\"af54bfc\"><a href=\"https:\/\/neptune.ai\/blog\/experiment-management#6\" class=\"broken_link\" rel=\"noopener\">Tracking machine learning metrics<\/a><\/section><section data-element_type=\"section\" data-id=\"ac37d98\"><a href=\"https:\/\/neptune.ai\/blog\/experiment-management#7\" class=\"broken_link\" rel=\"noopener\">Versioning data science environment<\/a><\/section><section data-element_type=\"section\" data-id=\"55672ca\"><a href=\"https:\/\/neptune.ai\/blog\/experiment-management#8\" class=\"broken_link\" rel=\"noopener\">Experiment organization<\/a><\/section><section data-element_type=\"section\" data-id=\"755fac3\"><a href=\"https:\/\/neptune.ai\/blog\/experiment-management#9\" class=\"broken_link\" rel=\"noopener\">Working in creative iterations<\/a><\/section><section data-element_type=\"section\" data-id=\"51b680f\"><a href=\"https:\/\/neptune.ai\/blog\/experiment-management#10\" class=\"broken_link\" rel=\"noopener\">Model results exploration<\/a><\/section><section data-element_type=\"section\" 
data-id=\"37d6c38\">\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6e4845d elementor-widget elementor-widget-heading\" data-id=\"6e4845d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2>What is experiment management?<\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4f43937 elementor-widget elementor-widget-text-editor\" data-id=\"4f43937\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<\/section><section data-element_type=\"section\" data-id=\"a4989fc\">Experiment management in the context of <a href=\"https:\/\/www.experfy.com\/blog\/ai-ml\/what-machine-learning-data-poisoning\/\">machine learning<\/a> is the process of\u00a0<strong>tracking experiment metadata<\/strong>\u00a0such as:\n<ul>\n \t<li>code versions<\/li>\n \t<li>data versions<\/li>\n \t<li>hyperparameters<\/li>\n \t<li>environment<\/li>\n \t<li>metrics<\/li>\n<\/ul>\nIt also means\u00a0<strong>organizing that metadata<\/strong>\u00a0in a meaningful way and making it\u00a0<strong>available to access and collaborate on<\/strong>\u00a0within your organization.\n\nIn the next sections, you will see exactly what that means, with examples and implementations.\n\n<\/section><section data-element_type=\"section\" data-id=\"327ba6e\">\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-99f7553 elementor-widget elementor-widget-heading\" data-id=\"99f7553\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2>Tracking ML 
experiments<\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-43aa84e elementor-widget elementor-widget-text-editor\" data-id=\"43aa84e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<\/section><section data-element_type=\"section\" data-id=\"7105476\">What I mean by\u00a0<strong>tracking is collecting all the metadata<\/strong>\u00a0about your machine learning experiments that you need to:\n<ul>\n \t<li>share your results and insights with the team (and with your future self),<\/li>\n \t<li>reproduce the results of your machine learning experiments,<\/li>\n \t<li>keep your results safe, since they can take a long time to generate.<\/li>\n<\/ul>\nLet\u2019s go through all the pieces of an experiment that I believe should be recorded, one by one.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-67e8e17 elementor-widget elementor-widget-heading\" data-id=\"67e8e17\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\"><h3>Code version control for data science<\/h3><\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-021ab64 elementor-widget elementor-widget-text-editor\" data-id=\"021ab64\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<\/section><section data-element_type=\"section\" data-id=\"fd10019\">Okay, in 2019 I think pretty much everyone working with code knows about version control. Failing to keep track of your code is a big problem, but an obvious and easy-to-fix one. Should we just proceed to the next section? 
Not so fast.<\/section><section data-element_type=\"section\" data-id=\"88ca07c\">\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5c06dc1 elementor-widget elementor-widget-heading\" data-id=\"5c06dc1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\"><h4>Problem 1: Jupyter notebook version control<\/h4><\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-59d8b27 elementor-widget elementor-widget-text-editor\" data-id=\"59d8b27\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<\/section><section data-element_type=\"section\" data-id=\"55c4217\">A large part of\u00a0<strong>data science development happens in Jupyter notebooks<\/strong>, which contain more than just code. Fortunately, there are tools that help with notebook versioning and diffing. Some tools that I know of:\n<ul>\n \t<li>nbconvert (.ipynb -&gt; .py conversion)<\/li>\n \t<li><a href=\"https:\/\/github.com\/jupyter\/nbdime\" target=\"_blank\" rel=\"noopener noreferrer\">nbdime<\/a>\u00a0(diffing)<\/li>\n \t<li><a href=\"https:\/\/github.com\/mwouts\/jupytext\" target=\"_blank\" rel=\"noopener noreferrer\">jupytext<\/a>\u00a0(conversion+versioning)<\/li>\n \t<li><a href=\"https:\/\/docs.neptune.ml\/notebooks\/introduction.html#quick-start\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"broken_link\">neptune-notebooks<\/a>\u00a0(versioning+diffing+sharing)<\/li>\n<\/ul>\nOnce you have your notebook versioned, I suggest going the extra mile and making sure that it runs top to bottom. 
To check that, you can convert the notebook to a script with jupytext or nbconvert and run it:\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7867dcf elementor-widget elementor-widget-text-editor\" data-id=\"7867dcf\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<\/section><section data-element_type=\"section\" data-id=\"6d8170e\">\n<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">jupyter nbconvert --to script train_model.ipynb;\npython train_model.py<\/span><\/div>\n<\/section><section data-element_type=\"section\" data-id=\"9213f8b\">\u00a0<\/section><section data-element_type=\"section\" data-id=\"7e7da42\">\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-af395f1 elementor-widget elementor-widget-heading\" data-id=\"af395f1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\"><h4>Problem 2: Experiments on dirty commits<\/h4><\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d34d746 elementor-widget elementor-widget-text-editor\" data-id=\"d34d746\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<\/section><section data-element_type=\"section\" data-id=\"cab450e\">\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-d598ca0 elementor-section-boxed elementor-section-height-default 
elementor-section-height-default\" data-id=\"d598ca0\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-d8e93ad\" data-id=\"d8e93ad\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-073cacc elementor-widget elementor-widget-text-editor\" data-id=\"073cacc\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<\/section><section data-element_type=\"section\" data-id=\"7e661bc\">Data scientists often don\u2019t follow software development best practices. You can always find someone (me included) who would ask:\n<blockquote>\n<h4 style=\"text-align: center;\"><em>\u201cBut how about tracking code in-between commits? What if someone runs an experiment without committing\u00a0the code?\u201d<\/em><\/h4>\n<\/blockquote>\nOne option is to explicitly forbid running experiments on dirty commits, that is, with uncommitted changes in the working tree. Another option is to give users an additional safety net and snapshot the code whenever they run an experiment. 
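A minimal sketch of the first option, assuming the experiment is launched from inside a Git working tree with Python 3.9+ and `git` on the PATH. The helper names (`git_status_porcelain`, `changed_files`, `ensure_clean_commit`) are hypothetical. The check relies only on `git status --porcelain`, which prints one line per modified or untracked file and nothing at all when the tree is clean.

```python
import subprocess


def git_status_porcelain(repo_dir: str = '.') -> str:
    # Run `git status --porcelain` and return its raw stdout.
    # check=True raises CalledProcessError if repo_dir is not a Git repository.
    result = subprocess.run(
        ['git', 'status', '--porcelain'],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def changed_files(porcelain_output: str) -> list[str]:
    # Each porcelain line is 'XY path': a two-character status code,
    # a space, then the path (renames appear as 'old -> new').
    return [line[3:] for line in porcelain_output.splitlines() if line.strip()]


def ensure_clean_commit(porcelain_output: str) -> None:
    # Refuse to start an experiment while there are uncommitted changes.
    dirty = changed_files(porcelain_output)
    if dirty:
        raise RuntimeError(
            'Uncommitted changes, commit them before running: ' + ', '.join(dirty)
        )
```

Calling `ensure_clean_commit(git_status_porcelain())` at the top of a training script stops a run before any compute is spent on uncommitted code. This version also treats untracked files (lines starting with `??`) as dirty. If that is too strict for your workflow, adding `--untracked-files=no` to the `git status` call ignores them.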
Each option has its pros and cons, and it is up to you to decide.\n\n<\/section><section data-element_type=\"section\" data-id=\"259ee3e\">\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2e75607 elementor-widget elementor-widget-heading\" data-id=\"2e75607\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\"><h3>Tracking hyperparameters<\/h3><\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-53809ce elementor-widget elementor-widget-text-editor\" data-id=\"53809ce\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<\/section><section data-element_type=\"section\" data-id=\"e8f356e\">Every machine learning model or pipeline needs hyperparameters. Those could be the learning rate, the number of trees, or a missing-value imputation method. Failing to keep track of hyperparameters can result in weeks of wasted time spent looking for them or retraining models. The good thing is that\u00a0<strong>keeping track of hyperparameters can be really simple<\/strong>. 
Let\u2019s start with the ways people tend to define them, and then we\u2019ll move on to tracking them:<\/section><section data-element_type=\"section\" data-id=\"25b5027\">\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e278615 elementor-widget elementor-widget-heading\" data-id=\"e278615\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\"><h4>Config files<\/h4><\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0a7ebf7 elementor-widget elementor-widget-text-editor\" data-id=\"0a7ebf7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<\/section><section data-element_type=\"section\" data-id=\"e371d39\">This is typically a\u00a0<em>.yaml<\/em>\u00a0file that contains all the information your script needs to run. 
For example:<\/section><section data-element_type=\"section\" data-id=\"5f3bdf1\">\n<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">data:\n  train_path: '\/path\/to\/my\/train.csv'\n  valid_path: '\/path\/to\/my\/valid.csv'<\/span><\/div>\n<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">model:\n  objective: 'binary'\n  metric: 'auc'\n  learning_rate: 0.1\n  num_boost_round: 200\n  num_leaves: 60\n  feature_fraction: 0.2<\/span><\/div>\n<\/section><section data-element_type=\"section\" data-id=\"94c0ecd\">\u00a0<\/section><section data-element_type=\"section\" data-id=\"07728c8\">\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f4d5643 elementor-widget elementor-widget-heading\" data-id=\"f4d5643\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\"><h4>Command line + argparse<\/h4><\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0844f80 elementor-widget elementor-widget-text-editor\" data-id=\"0844f80\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<\/section><section data-element_type=\"section\" data-id=\"0a030e0\">You simply pass your parameters to your script as arguments:<\/section><section data-element_type=\"section\" data-id=\"571049a\">\n<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">python train_evaluate.py \\\n--train_path '\/path\/to\/my\/train.csv' \\\n--valid_path '\/path\/to\/my\/valid.csv' \\\n--objective 'binary' \\\n--metric 'auc' \\\n--learning_rate 0.1 \\\n--num_boost_round 200 \\\n--num_leaves 60 \\\n--feature_fraction 0.2<\/span><\/div>\n<\/section><section data-element_type=\"section\" data-id=\"e43cdd5\">\u00a0<\/section><section data-element_type=\"section\" data-id=\"1230423\">\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d9dff5d elementor-widget elementor-widget-heading\" data-id=\"d9dff5d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\"><h4>Parameters dictionary in main.py<\/h4><\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4733752 elementor-widget elementor-widget-text-editor\" data-id=\"4733752\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<\/section><section data-element_type=\"section\" data-id=\"72d0154\">You put all of your parameters in a dictionary inside your script:<\/section><section data-element_type=\"section\" data-id=\"3cb2291\">\n<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">TRAIN_PATH = '\/path\/to\/my\/train.csv'\nVALID_PATH = '\/path\/to\/my\/valid.csv'<\/span><\/div>\n<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">PARAMS = {'objective': 'binary',\n          'metric': 'auc',\n          'learning_rate': 0.1,\n          'num_boost_round': 200,\n          'num_leaves': 60,\n          'feature_fraction': 
0.2}<\/span><\/div>\n<\/section><section data-element_type=\"section\" data-id=\"7cc7459\">\u00a0<\/section><section data-element_type=\"section\" data-id=\"8a82c6b\">\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a3451a7 elementor-widget elementor-widget-heading\" data-id=\"a3451a7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\"><h4>Magic numbers all over the place<\/h4><\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5114e41 elementor-widget elementor-widget-text-editor\" data-id=\"5114e41\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<\/section><section data-element_type=\"section\" data-id=\"c4aed0f\">Whenever you need a parameter, you simply hard-code its value right where it is used.<\/section><section data-element_type=\"section\" data-id=\"71810eb\">\n<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">&#8230;\ntrain = pd.read_csv('\/path\/to\/my\/train.csv')<\/span><\/div>\n<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">model = Model(objective='binary',\n              metric='auc',\n              learning_rate=0.1,\n              num_boost_round=200,\n              num_leaves=60,\n              feature_fraction=0.2)\nmodel.fit(train)<\/span><\/div>\n<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">valid = pd.read_csv('\/path\/to\/my\/valid.csv')\nmodel.evaluate(valid)<\/span><\/div>\n<\/section><section data-element_type=\"section\" 
data-id=\"e4c800b\">\u00a0<\/section><section data-element_type=\"section\" data-id=\"4fa4127\">We all do that sometimes, but it is not a great idea, especially if someone else will need to take over your work. OK, so I do like\u00a0<em>.yaml<\/em>\u00a0configs and passing arguments from the command line (options 1 and 2), but anything other than magic numbers is fine. What is important is that you\u00a0<strong>log those parameters for every experiment<\/strong>. If you decide to pass all parameters as script arguments,\u00a0<strong>make sure to log them somewhere<\/strong>. It is easy to forget, so an experiment management tool that does this automatically can save you here.\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ebca3f2 elementor-widget elementor-widget-text-editor\" data-id=\"ebca3f2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<\/section><section data-element_type=\"section\" data-id=\"0fd396e\">\n<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">import argparse\n\nparser = argparse.ArgumentParser()\nparser.add_argument('--number_trees', type=int)\nparser.add_argument('--learning_rate', type=float)\nargs = parser.parse_args()<\/span><\/div>\n<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\"># experiment_manager stands in for whichever experiment tracking tool you use\nexperiment_manager.create_experiment(params=vars(args))\n&#8230;\n# experiment logic\n&#8230;<\/span><\/div>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-6c203e3 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" 
data-id=\"6c203e3\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-37fb3bc\" data-id=\"37fb3bc\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-f16bb60 elementor-widget elementor-widget-text-editor\" data-id=\"f16bb60\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<\/section><section data-element_type=\"section\" data-id=\"826de91\">There is\u00a0<strong>nothing as painful<\/strong>\u00a0as having a perfect script run on a perfect data version and produce perfect metrics, only to\u00a0<strong>discover that you don\u2019t remember which hyperparameters<\/strong>\u00a0were passed as arguments.<\/section><section data-element_type=\"section\" data-id=\"5bcec413\" data-settings=\"{&quot;background_background&quot;:&quot;classic&quot;}\"><strong>Note:<\/strong>\u00a0A bonus of having your hyperparameters abstracted away entirely (options 1 and 2) is that you implicitly turn your training and evaluation scripts into an\u00a0<strong>objective function<\/strong>\u00a0that you can optimize automatically.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-fe6aa83 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"fe6aa83\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container 
elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-249444c\" data-id=\"249444c\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-dc595f4 elementor-widget elementor-widget-text-editor\" data-id=\"dc595f4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThat means you can use readily available libraries and\u00a0<strong>run hyperparameter optimization algorithms with virtually no additional work<\/strong>! If you are interested in the subject, please check out my blog post series about hyperparameter optimization libraries in Python.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4d004e6 elementor-widget elementor-widget-heading\" data-id=\"4d004e6\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\"><h3>Data versioning<\/h3><\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4dc59db elementor-widget elementor-widget-text-editor\" data-id=\"4dc59db\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<\/section><section data-element_type=\"section\" data-id=\"68a2800\">In real-life projects, data changes over time. 
Some typical situations include:\n<ul>\n \t<li>new images are added,<\/li>\n \t<li>labels are improved,<\/li>\n \t<li>mislabeled\/wrong data is removed,<\/li>\n \t<li>new data tables are discovered,<\/li>\n \t<li>new features are engineered and processed,<\/li>\n \t<li>validation and testing datasets change to reflect the production environment.<\/li>\n<\/ul>\nWhenever your\u00a0<strong>data changes<\/strong>, the output of your analysis, your report, or your\u00a0<strong>experiment results will likely change<\/strong>, even though the code and environment did not. That is why, to make sure you are comparing apples to apples, you need to\u00a0<strong>keep track of your data versions<\/strong>.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3530fed elementor-widget elementor-widget-text-editor\" data-id=\"3530fed\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<strong>Having almost everything versioned<\/strong>\u00a0and still getting different results can be extremely frustrating, and it\u00a0<strong>can mean a lot of time (and money) in wasted effort<\/strong>. The sad part is that you can do little about it afterward. So again, keep your experiment data versioned.\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8b5568a elementor-widget elementor-widget-text-editor\" data-id=\"8b5568a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tFor the vast majority of use cases, whenever new data comes in, you can\u00a0<strong>save it in a new location and log this location and a hash<\/strong>\u00a0of the data. 
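Computing that hash takes only a few lines of standard-library Python. The sketch below is one possible implementation of the `md5_hash` helper used in the logging snippet further down this section. It assumes the data, or a smaller metadata file describing it, is a single file on local disk. It is not the API of any particular tool. MD5 is used here purely to detect changes, not for security.

```python
import hashlib


def md5_hash(path: str, chunk_size: int = 1 << 20) -> str:
    # Hypothetical helper: returns the hex MD5 digest of the file at `path`.
    # Reading in fixed-size chunks keeps memory use flat for large files.
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

Logging the returned hex string next to the data path is usually enough to tell whether two runs used byte-identical data.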
Even if the data is very large, for example when dealing with images, you can create a smaller metadata file with image paths and labels and track changes of that file.\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4e0d79a elementor-widget elementor-widget-text-editor\" data-id=\"4e0d79a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\nA wise man once told me:\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b76a77c elementor-widget elementor-widget-heading\" data-id=\"b76a77c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\"><blockquote>\n<h4 style=\"text-align: center\"><em>\u201cStorage is cheap, training a model for 2 weeks on an 8-GPU node is not.\u201d<\/em><\/h4>\n<\/blockquote><\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a3eb880 elementor-widget elementor-widget-text-editor\" data-id=\"a3eb880\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tAnd if you think about it, logging this information doesn\u2019t have to be rocket science.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-94293f0 elementor-widget elementor-widget-text-editor\" data-id=\"94293f0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<\/section><section data-element_type=\"section\" data-id=\"aaf5dae\">\n<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 
10px;\"><span style=\"font-family: courier new,courier,monospace;\"># DATASET_PATH is your dataset location; md5_hash stands for your own hashing helper\nexp.set_property('data_path', DATASET_PATH)\nexp.set_property('data_version', md5_hash(DATASET_PATH))<\/span><\/div>\n<\/section><section data-element_type=\"section\" data-id=\"21af1b5\">\u00a0<\/section><section data-element_type=\"section\" data-id=\"e6da899\">You can calculate the hash yourself, use a simple\u00a0data versioning extension,\u00a0or outsource hashing to a full-blown data versioning tool like\u00a0<a href=\"https:\/\/dvc.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">DVC<\/a>. Whichever option you decide is best for your project,\u00a0<strong>please version your data<\/strong>.<\/section><section data-element_type=\"section\" data-id=\"0902884\" data-settings=\"{&quot;background_background&quot;:&quot;classic&quot;}\"><strong>Note:<\/strong> I know that 10x data scientists can read a data hash and know exactly what it is, but you may also want to log something a bit more readable for us mere mortals. 
For example, I wrote a simple function that lets you\u00a0<a href=\"https:\/\/neptune-contrib.readthedocs.io\/examples\/image_dir_snapshots.html#\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"broken_link\">log a snapshot of your image directory<\/a>\u00a0to Neptune:\n<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">from neptunecontrib.versioning.data import log_image_dir_snapshots\nlog_image_dir_snapshots('path\/to\/my\/image_dir\/')<\/span><\/div>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-cc2336c elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"cc2336c\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-c85f03e\" data-id=\"c85f03e\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-bedf610 elementor-widget elementor-widget-heading\" data-id=\"bedf610\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\"><h3>Tracking machine learning metrics<\/h3><\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a3f9a77 elementor-widget elementor-widget-text-editor\" data-id=\"a3f9a77\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div 
class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<\/section><section data-element_type=\"section\" data-id=\"734b1e6\">I have never found myself in a situation where I thought I had logged too many metrics for my experiment, have you? <strong>In a real-world project, the metrics you care about can change<\/strong>\u00a0due to new discoveries or changing specifications, so logging more metrics can actually save you some time and trouble in the future. In any case, my suggestion is:\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6fa567d elementor-widget elementor-widget-heading\" data-id=\"6fa567d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\"><blockquote>\n<h4 style=\"text-align: center\"><em>\u201cLog metrics, log them all\u201d<\/em><\/h4>\n<\/blockquote><\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1b8a6ab elementor-widget elementor-widget-text-editor\" data-id=\"1b8a6ab\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tTypically, metrics are as simple as a single number\n\n<\/section><section data-element_type=\"section\" data-id=\"3bf9e15\">\n<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">exp.send_metric('train_auc', train_auc)\nexp.send_metric('valid_auc', valid_auc)<\/span><\/div>\n<\/section><section data-element_type=\"section\" data-id=\"baadf52\">\u00a0<\/section><section data-element_type=\"section\" data-id=\"09e624d\">but I like to think of metrics as something a bit broader. 
To understand if your model has improved, you may want to take a look at a chart, a confusion matrix, or the distribution of predictions. Those, in my view, are still metrics because they help you measure the performance of your experiment.<\/section><section data-element_type=\"section\" data-id=\"39f8b44\">\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-38c0837 elementor-widget elementor-widget-text-editor\" data-id=\"38c0837\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">exp.send_image('diagnostics', 'confusion_matrix.png')<br \/>exp.send_image('diagnostics', 'roc_auc.png')<br \/>exp.send_image('diagnostics', 'prediction_dist.png')<\/span><\/div><section data-element_type=\"section\" data-id=\"4f60819\">\u00a0<\/section><section data-element_type=\"section\" data-id=\"bad538f\"><p style=\"text-align: center;\">\u00a0<\/p><\/section>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-8fa45c9 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"8fa45c9\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-bd497c4\" data-id=\"bd497c4\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap 
elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-e494958 elementor-widget elementor-widget-text-editor\" data-id=\"e494958\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<strong>Note:<\/strong> Tracking metrics<strong>\u00a0both on training and validation<\/strong>\u00a0datasets can help you assess the risk of the model not performing well in production. The smaller the gap between them, the lower the risk. A great resource is this Kaggle Days talk by Jean-Fran\u00e7ois Puget.\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-15c8425 elementor-widget elementor-widget-video\" data-id=\"15c8425\" data-element_type=\"widget\" data-e-type=\"widget\" data-settings=\"{&quot;youtube_url&quot;:&quot;https:\\\/\\\/www.youtube.com\\\/embed\\\/VC8Jc9_lNoY?feature=oembed&amp;start=706&amp;end&amp;wmode=opaque&amp;loop=0&amp;controls=1&amp;mute=0&amp;rel=0&amp;modestbranding=0\\&quot;&quot;,&quot;video_type&quot;:&quot;youtube&quot;,&quot;controls&quot;:&quot;yes&quot;}\" data-widget_type=\"video.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-wrapper elementor-open-inline\">\n\t\t\t<div class=\"elementor-video\"><\/div>\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7b32c11 elementor-widget elementor-widget-text-editor\" data-id=\"7b32c11\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<\/section><section data-element_type=\"section\" data-id=\"8b394de\">Moreover, if you are working with data collected at different timestamps, you can assess model performance decay and\u00a0<strong>suggest a proper model retraining schedule<\/strong>. 
Simply track metrics at different timeframes of your validation data and see how the performance drops.<\/section><section data-element_type=\"section\" data-id=\"54042db\">\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7621163 elementor-widget elementor-widget-heading\" data-id=\"7621163\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\"><h3>Versioning data science environment<\/h3><\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ae98d34 elementor-widget elementor-widget-text-editor\" data-id=\"ae98d34\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<\/section><section data-element_type=\"section\" data-id=\"b1191fe\">The majority of problems with environment versioning can be summarized by the infamous quote:\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-20b753f elementor-widget elementor-widget-heading\" data-id=\"20b753f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\"><blockquote>\n<h4 style=\"text-align: center\"><em>\u201cI don\u2019t understand, it worked on my machine.\u201d<\/em><\/h4>\n<\/blockquote><\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-6e9cf86 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"6e9cf86\" 
data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-b0db90e\" data-id=\"b0db90e\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-8a320ff elementor-widget elementor-widget-text-editor\" data-id=\"8a320ff\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tOne approach that helps solve this issue can be called\u00a0<strong><em>\u201cenvironment as code\u201d<\/em><\/strong>, where the environment is created by executing instructions (<em>bash\/yaml\/docker<\/em>) step by step. By embracing this approach, you can<strong>\u00a0switch from versioning the environment to versioning the environment set-up code<\/strong>, which we already know how to do.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-69d6c44 elementor-widget elementor-widget-text-editor\" data-id=\"69d6c44\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThere are a few options that I know are used in practice (this is by no means a full list of approaches).\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5b7dc17 elementor-widget elementor-widget-heading\" data-id=\"5b7dc17\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\"><\/section><section 
data-element_type=\"section\" data-id=\"055e527\">\n<h4>Docker images<\/h4><\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9cc15c4 elementor-widget elementor-widget-text-editor\" data-id=\"9cc15c4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<\/section><section data-element_type=\"section\" data-id=\"1e415a7\">This is the preferred option, and there are a lot of resources on the subject. One that I particularly like is the<a href=\"https:\/\/towardsdatascience.com\/learn-enough-docker-to-be-useful-b7ba70caeb4b\" target=\"_blank\" rel=\"noopener noreferrer\">\u00a0\u201cLearn Enough Docker to be useful\u201d series<\/a>\u00a0by Jeff Hale.\nIn a nutshell, you define the Dockerfile with some instructions.<\/section><section data-element_type=\"section\" data-id=\"5072974\">\n<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\"># Use miniconda3 as base image\nFROM continuumio\/miniconda3<\/span><\/div>\n<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\"># Installation of jupyterlab (-y keeps conda from prompting during the build)\nRUN pip install jupyterlab==0.35.6 &amp;&amp; \\\n    pip install jupyterlab-server==0.2.0 &amp;&amp; \\\n    conda install -y -c conda-forge nodejs<\/span><\/div>\n<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\"># Installation of Neptune and enabling neptune extension\nRUN pip install neptune-client &amp;&amp; \\\n    pip install neptune-notebooks &amp;&amp; \\\n    jupyter labextension install neptune-notebooks<\/span><\/div>\n<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\"># Setting up Neptune API token as env 
variable\nARG NEPTUNE_API_TOKEN\nENV NEPTUNE_API_TOKEN=$NEPTUNE_API_TOKEN<\/span><\/div>\n<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\"># Adding current directory to container\nADD . \/mnt\/workdir\nWORKDIR \/mnt\/workdir<\/span><\/div>\n<\/section><section data-element_type=\"section\" data-id=\"36d801f\">\u00a0<\/section><section data-element_type=\"section\" data-id=\"7b47646\">You build your environment from those instructions:<\/section><section data-element_type=\"section\" data-id=\"789e32d\">\n<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">docker build -t jupyterlab \\\n    --build-arg NEPTUNE_API_TOKEN=$NEPTUNE_API_TOKEN .<\/span><\/div>\n<\/section><section data-element_type=\"section\" data-id=\"396a318\">\u00a0<\/section><section data-element_type=\"section\" data-id=\"a181a0e\">And you can run commands in the environment, for example, start a JupyterLab server:<\/section><section data-element_type=\"section\" data-id=\"1e90276\">\n<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">docker run \\\n    -p 8888:8888 \\\n    jupyterlab:latest \\\n    \/opt\/conda\/bin\/jupyter lab \\\n    --allow-root \\\n    --ip=0.0.0.0 \\\n    --port=8888<\/span><\/div>\n<\/section><section data-element_type=\"section\" data-id=\"ab7d9f3\">\u00a0<\/section><section data-element_type=\"section\" data-id=\"75043fab\" data-settings=\"{&quot;background_background&quot;:&quot;classic&quot;}\"><strong>Note:<\/strong> The example I showed was used to run a\u00a0<a href=\"https:\/\/docs.neptune.ml\/#how-to-setup-neptune-enabled-jupyterlab-on-aws\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"broken_link\">Neptune-enabled JupyterLab server on AWS<\/a>. 
Check it out if you are interested.<\/section><section data-element_type=\"section\" data-id=\"1e61b04\">\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e1f8e04 elementor-widget elementor-widget-heading\" data-id=\"e1f8e04\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\"><h4>Conda environments<\/h4><\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-fc95b50 elementor-widget elementor-widget-text-editor\" data-id=\"fc95b50\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<\/section><section data-element_type=\"section\" data-id=\"c9a31ad\">It\u2019s a simpler option and, in many cases, it is enough to manage your environments without problems. 
It doesn\u2019t give you as many options or guarantees as Docker does, but it can be enough for your use case.\nThe environment can be defined as a\u00a0<em>.yaml<\/em>\u00a0configuration file just like this one:<\/section><section data-element_type=\"section\" data-id=\"1e6c4c1\">\n<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">name: salt<\/span><\/div>\n<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">dependencies:\n  - pip=19.1.1\n  - python=3.6.8\n  - psutil\n  - matplotlib\n  - scikit-image<\/span><\/div>\n<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">  - pip:\n    - neptune-client==0.3.0\n    - neptune-contrib==0.9.2\n    - imgaug==0.2.5\n    - opencv_python==3.4.0.12\n    - torch==0.3.1\n    - torchvision==0.2.0\n    - pretrainedmodels==0.7.0\n    - pandas==0.24.2\n    - numpy==1.16.4\n    - cython==0.28.2\n    - pycocotools==2.0.0<\/span><\/div>\n<\/section><section data-element_type=\"section\" data-id=\"18f7c10\">\u00a0<\/section><section data-element_type=\"section\" data-id=\"b9596b6\">You can create the conda environment by running:<\/section><section data-element_type=\"section\" data-id=\"db8056d\">\n<pre>conda env create -f environment.yaml<\/pre>\n<\/section><section data-element_type=\"section\" data-id=\"35dc0f0\">\u00a0<\/section><section data-element_type=\"section\" data-id=\"d407239\">What is pretty cool is that you can always dump the state of your environment to such a config file by running:<\/section><section data-element_type=\"section\" data-id=\"60ef0dc\">\n<pre>conda env export &gt; environment.yaml<\/pre>\n<\/section><section data-element_type=\"section\" data-id=\"b2dd2b8\">\u00a0<\/section><section data-element_type=\"section\" 
data-id=\"ade9435\">Simple and gets the job done.<\/section><section data-element_type=\"section\" data-id=\"7ea216c\">\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4eac076 elementor-widget elementor-widget-heading\" data-id=\"4eac076\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h4 class=\"elementor-heading-title elementor-size-default\"><h4>Makefile<\/h4><\/h4>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b92ce61 elementor-widget elementor-widget-text-editor\" data-id=\"b92ce61\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<section data-element_type=\"section\" data-id=\"02f571e\">You can always define all your bash instructions explicitly in a Makefile. For example:<\/section><section data-element_type=\"section\" data-id=\"2d7370a\"><div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">git clone git@github.com:neptune-ml\/open-solution-mapping-challenge.git<br \/>cd open-solution-mapping-challenge<\/span><\/div><div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">pip install -r requirements.txt<\/span><\/div><div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">mkdir data<br \/>cd data<br \/>curl -O https:\/\/www.kaggle.com\/c\/imagenet-object-localization-challenge\/data\/LOC_synset_mapping.txt<\/span><\/div><\/section><section data-element_type=\"section\" data-id=\"5d2c4c1\">\u00a0<\/section><section data-element_type=\"section\" data-id=\"f54dd28\">and set it up by 
running:<\/section><section data-element_type=\"section\" data-id=\"118fb47\"><pre>source Makefile<\/pre><\/section><section data-element_type=\"section\" data-id=\"612d163\">\u00a0<\/section><section data-element_type=\"section\" data-id=\"4ea3d49\">It is often difficult to read those files, and you are giving up a ton of additional features of conda and\/or Docker, but it doesn\u2019t get much simpler than this. Now that you have your environment defined as code, make sure to\u00a0<strong>log the environment file for every experiment<\/strong>. Again, if you are using an experiment manager, you can snapshot your code whenever you create a new experiment, even if you forget to git commit:<p>\u00a0<\/p><\/section><section data-element_type=\"section\" data-id=\"045ddf8\"><div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">experiment_manager.create_experiment(upload_source_files=['environment.yml'])<br \/>...<br \/># machine learning magic<br \/>...<\/span><\/div><\/section><section data-element_type=\"section\" data-id=\"e9dc050\">\u00a0<\/section><section data-element_type=\"section\" data-id=\"34bb825\">and have it safely stored in the app:<\/section><section data-element_type=\"section\" data-id=\"53739aa\"><p style=\"text-align: center;\">\u00a0<\/p><\/section>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c457c34 elementor-widget elementor-widget-text-editor\" data-id=\"c457c34\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<\/span><\/div>\n<\/section><section data-element_type=\"section\" data-id=\"e9dc050\">\u00a0<\/section><section data-element_type=\"section\" data-id=\"34bb825\"><\/section><section 
data-element_type=\"section\" data-id=\"53739aa\">\n<p style=\"text-align: center;\"><img fetchpriority=\"high\" decoding=\"async\" class=\"alignnone\" style=\"width: 700px; height: 746px;\" src=\"https:\/\/i1.wp.com\/neptune.ai\/wp-content\/uploads\/env_versioning.png?fit=350%2C373&amp;ssl=1\" sizes=\"(max-width: 350px) 100vw, 350px\" srcset=\"https:\/\/i1.wp.com\/neptune.ai\/wp-content\/uploads\/env_versioning.png?w=350&amp;ssl=1 350w, https:\/\/i1.wp.com\/neptune.ai\/wp-content\/uploads\/env_versioning.png?resize=282%2C300&amp;ssl=1 282w\" alt=\"environment versioning Experiment Management: How to Organize Your Model Development Process\" width=\"350\" height=\"373\" data-attachment-id=\"11595\" data-comments-opened=\"1\" data-image-description=\"\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" \/><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8567d74 elementor-widget elementor-widget-heading\" data-id=\"8567d74\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2>How to organize your model development process?<\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-62d8904 elementor-widget elementor-widget-text-editor\" data-id=\"62d8904\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<\/section><section 
data-element_type=\"section\" data-id=\"6e0fc83\">Tracking experiments and ensuring the reproducibility of your work are important, but they are just one part of the puzzle. Once you have tracked hundreds of experiment runs, you will quickly face new problems:\n<ul>\n \t<li>how to search through and visualize all of those experiments,<\/li>\n \t<li>how to organize them into something that you and your colleagues can digest,<\/li>\n \t<li>how to make this data shareable and accessible inside your team\/organization.<\/li>\n<\/ul>\nThis is where experiment management tools really come in handy. They let you:\n<ul>\n \t<li>filter\/sort\/tag\/group experiments,<\/li>\n \t<li>visualize\/compare experiment runs,<\/li>\n \t<li>share experiment results and metadata (through the app and a programmatic query API).<\/li>\n<\/ul>\nFor example, by sending a link, I can share a\u00a0<a href=\"https:\/\/ui.neptune.ai\/o\/neptune-ml\/org\/credit-default-prediction\/compare?shortId=%5B%22CRED-93%22%2C%22CRED-92%22%2C%22CRED-91%22%2C%22CRED-89%22%2C%22CRED-85%22%2C%22CRED-83%22%2C%22CRED-80%22%2C%22CRED-70%22%5D\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"broken_link\">comparison of machine learning experiments<\/a>\u00a0with all the additional information available.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-98ded2d elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"98ded2d\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-0364ac4\" data-id=\"0364ac4\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div 
class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-460f928 elementor-widget elementor-widget-text-editor\" data-id=\"460f928\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<\/section><section data-element_type=\"section\" data-id=\"2200731\">With that, you and everyone on your team know exactly what is happening when it comes to model development. It makes it easy to track progress, discuss problems, and discover new improvement ideas.<\/section><section data-element_type=\"section\" data-id=\"8b52982\">\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7dc3436 elementor-widget elementor-widget-heading\" data-id=\"7dc3436\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\"><h3>Working in creative iterations<\/h3>\n<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-cf96f9b elementor-widget elementor-widget-text-editor\" data-id=\"cf96f9b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<\/section><section data-element_type=\"section\" data-id=\"0903aed\">Tools like that are a big help and a huge improvement over spreadsheets and notes. 
However, what I believe can take your machine learning projects to the next level is a focused experimentation methodology that I call creative iterations. I\u2019d like to start with some pseudocode and explain it later:<\/section><section data-element_type=\"section\" data-id=\"5e17a0d\">\n<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">time, budget, business_goal = business_specification()<\/span><\/div>\n<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">creative_idea = initial_research(business_goal)<\/span><\/div>\n<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">while time and budget and not business_goal:\n    solution = develop(creative_idea)\n    metrics = evaluate(solution, validation_data)\n    if metrics &gt; best_metrics:\n        best_metrics = metrics\n        best_solution = solution\n    creative_idea = explore_results(best_solution)<\/span><\/div>\n<div style=\"background: #eee; border: 1px solid #ccc; padding: 5px 10px;\"><span style=\"font-family: courier new,courier,monospace;\">    time.update()\n    budget.update()<\/span><\/div>\n<\/section><section data-element_type=\"section\" data-id=\"ac18b29\">\u00a0<\/section><section data-element_type=\"section\" data-id=\"479e4d7\">In every project, there is a phase where the\u00a0<strong>business_specification<\/strong>\u00a0is created, which usually entails a\u00a0<strong>timeframe, budget, and goal<\/strong>\u00a0for the machine learning project. When I say goal, I mean a set of KPIs, business metrics, or, if you are super lucky, machine learning metrics. At this stage, it is very important to manage business expectations, but that\u2019s a story for another day. 
If you are interested in this topic, I suggest taking a look at some articles by Cassie Kozyrkov, for instance,\u00a0<a href=\"https:\/\/medium.com\/hackernoon\/ai-reality-checklist-be34e2fdab9\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"broken_link\">this one<\/a>. Assuming that you and your team know what the business goal is, you can do\u00a0<strong>initial_research<\/strong>\u00a0and cook up a baseline approach, a first\u00a0<strong>creative_idea<\/strong>. Then you\u00a0<strong>develop<\/strong>\u00a0it and come up with a\u00a0<strong>solution<\/strong>, which you need to\u00a0<strong>evaluate<\/strong>\u00a0to get your first set of\u00a0<strong>metrics<\/strong>. Those, as mentioned before, don\u2019t have to be simple numbers (and often are not); they could be charts, reports, or user study results. Now you should\u00a0<strong>explore_results<\/strong>: study your\u00a0<strong>solution<\/strong>\u00a0and its\u00a0<strong>metrics<\/strong>. Your project may end here because:\n<ul>\n \t<li>your first solution\u00a0<strong>is good enough<\/strong>\u00a0to satisfy business needs,<\/li>\n \t<li>you can reasonably expect that there is\u00a0<strong>no way to reach business goals<\/strong>\u00a0within the previously assumed time and budget,<\/li>\n \t<li>you discover a\u00a0<strong>low-hanging-fruit problem nearby<\/strong>\u00a0and your team should focus their efforts there.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9562f44 elementor-widget elementor-widget-text-editor\" data-id=\"9562f44\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tIf none of the above apply, you list all the underperforming parts of your\u00a0<strong>solution<\/strong>\u00a0and figure out which ones could be improved and what\u00a0<strong>creative_ideas<\/strong>\u00a0can get you there. 
Once you have that list, you need to prioritize the ideas based on expected\u00a0<strong>goal<\/strong>\u00a0improvements and\u00a0<strong>budget<\/strong>. If you are wondering how you can estimate those improvements, the answer is simple:\u00a0<strong>results exploration<\/strong>.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3c21d0c elementor-widget elementor-widget-text-editor\" data-id=\"3c21d0c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tYou have probably noticed that results exploration comes up a lot. That\u2019s because it is so important that it deserves its own section.\n\n<\/section><section data-element_type=\"section\" data-id=\"1aab263\">\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-cf48bd1 elementor-widget elementor-widget-heading\" data-id=\"cf48bd1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Model results exploration<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d55fd1f elementor-widget elementor-widget-text-editor\" data-id=\"d55fd1f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<\/section><section data-element_type=\"section\" data-id=\"236732d\">This is an extremely important part of the process. You need to\u00a0<strong>understand thoroughly where the current approach fails<\/strong>, how far you are from your goal in terms of time and budget, and what risks are associated with using your approach in production. 
In reality, this part is far from easy, but mastering it is extremely valuable because:\n<ul>\n \t<li>it leads to a better understanding of the business problem,<\/li>\n \t<li>it focuses the team on the problems that matter and saves a lot of time and effort for the team and the organization,<\/li>\n \t<li>it leads to discovering new business insights and project ideas.<\/li>\n<\/ul>\nSome good resources I found on the subject are:\n<ul>\n \t<li>\u201cUnderstanding and diagnosing your machine-learning models\u201d PyData talk by Ga\u00ebl Varoquaux<\/li>\n<\/ul>\n<\/section><section data-element_type=\"section\" data-id=\"3e6d8d1\">\n<p style=\"text-align: center;\"><iframe title=\"youtube Video Player\" src=\"https:\/\/www.youtube.com\/embed\/kbj3llSbaVA?feature=oembed&amp;start&amp;end&amp;wmode=opaque&amp;loop=0&amp;controls=1&amp;mute=0&amp;rel=0&amp;modestbranding=0\" width=\"700\" height=\"500\" frameborder=\"0\" scrolling=\"no\" allowfullscreen=\"allowfullscreen\"><\/iframe><\/p>\n\n<\/section><section data-element_type=\"section\" data-id=\"1c2a6ff\">\n<ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-9af833a elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"9af833a\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-92462e4\" data-id=\"92462e4\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-e1778a2 elementor-widget elementor-widget-text-editor\" data-id=\"e1778a2\" data-element_type=\"widget\" 
data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul>\n \t<li>\u201cCreating correct and capable classifiers\u201d PyData talk by Ian Ozsvald<\/li>\n<\/ul>\n<section data-element_type=\"section\" data-id=\"37e7abe\"><\/section><section data-element_type=\"section\" data-id=\"1008e11\">\n<ul>\n \t<li><a href=\"https:\/\/towardsdatascience.com\/using-what-if-tool-to-investigate-machine-learning-models-913c7d4118f\" target=\"_blank\" rel=\"noopener noreferrer\">\u201cUsing the \u2018What-If Tool\u2019 to investigate Machine Learning models\u201d<\/a>\u00a0article by Parul Pandey<\/li>\n<\/ul>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-89c8818 elementor-widget elementor-widget-video\" data-id=\"89c8818\" data-element_type=\"widget\" data-e-type=\"widget\" data-settings=\"{&quot;youtube_url&quot;:&quot;https:\\\/\\\/www.youtube.com\\\/embed\\\/DkLPYccEJ8Y?feature=oembed&amp;start&amp;end&amp;wmode=opaque&amp;loop=0&amp;controls=1&amp;mute=0&amp;rel=0&amp;modestbranding=0&quot;,&quot;video_type&quot;:&quot;youtube&quot;,&quot;controls&quot;:&quot;yes&quot;}\" data-widget_type=\"video.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-wrapper elementor-open-inline\">\n\t\t\t<div class=\"elementor-video\"><\/div>\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-23ef8ac elementor-widget elementor-widget-text-editor\" data-id=\"23ef8ac\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tDiving deeply into results exploration is a story for another day and another blog post, but the key takeaway is that investing your time in\u00a0<strong>understanding your current solution can be extremely 
beneficial<\/strong>\u00a0for your business.\n\n<\/section><section data-element_type=\"section\" data-id=\"86aea8c\"><\/section>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f82cc8f elementor-widget elementor-widget-heading\" data-id=\"f82cc8f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Final thoughts<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b08fb3a elementor-widget elementor-widget-text-editor\" data-id=\"b08fb3a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<\/section><section data-element_type=\"section\" data-id=\"b090bf4\">In this article, I explained:\n<ul>\n \t<li>what experiment management is,<\/li>\n \t<li>how organizing your model development process improves your workflow.<\/li>\n<\/ul>\nFor me, adding\u00a0<strong>experiment management tools<\/strong>\u00a0to my \u201cstandard\u201d software development best practices was an\u00a0<strong>aha moment<\/strong>\u00a0that made my machine learning projects more likely to succeed. I think that if you give it a go, you will feel the same.\n\n<\/section>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>This article explains what experiment management is and how organizing your model development process improves your workflow. Adding experiment management tools&nbsp;to standard software development best practices can make machine learning projects more likely to succeed. 
You will learn about tracking ML experiments, code version control for data science, and tracking hyperparameters. You will also learn about data versioning, tracking machine learning metrics, experiment organization, working in creative iterations, and model results exploration.<\/p>\n","protected":false},"author":712,"featured_media":3606,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[92],"ppma_author":[3528],"class_list":["post-2243","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-machine-learning"],"authors":[{"term_id":3528,"user_id":712,"is_guest":0,"slug":"jakub-czakon","display_name":"Jakub Czakon","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","user_url":"","last_name":"Czakon","first_name":"Jakub","job_title":"","description":"Jakub Czakon is Senior Data Scientist at neptune.ai."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/2243","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/712"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=2243"}],"version-history":[{"count":7,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/2243\/revisions"}],"predecessor-version":[{"id":35514,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/2243\/revisions\/35514"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/3606"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=2243"}],"wp:term":[{"taxonomy":"category","embeddable":tru
e,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=2243"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=2243"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=2243"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}