{"id":2248,"date":"2020-02-10T04:37:43","date_gmt":"2020-02-10T01:37:43","guid":{"rendered":"http:\/\/kusuaks7\/?p=1853"},"modified":"2024-01-12T19:34:28","modified_gmt":"2024-01-12T19:34:28","slug":"the-5-step-recipe-to-make-your-deep-learning-models-bug-free","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/the-5-step-recipe-to-make-your-deep-learning-models-bug-free\/","title":{"rendered":"The 5-Step Recipe to Make Your Deep Learning Models Bug-Free"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"2248\" class=\"elementor elementor-2248\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-31fc3a57 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"31fc3a57\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-367520a0\" data-id=\"367520a0\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-5cb99b9 elementor-widget elementor-widget-text-editor\" data-id=\"5cb99b9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tIn traditional software engineering, a bug usually leads to the program crashing. While this is annoying for the user, it is valuable for the developer, who can inspect the error to understand what went wrong. With deep learning, we sometimes encounter such errors, but all too often the program crashes without a clear reason why. 
While these issues can be debugged manually, deep learning models most often fail because of poor output predictions. What\u2019s worse is that when model performance is low, there is usually no signal about why or when the model failed.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7e28cbf elementor-widget elementor-widget-text-editor\" data-id=\"7e28cbf\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tIn fact, a common sentiment among practitioners is that they spend 80\u201390% of their time debugging and tuning models, and only 10\u201320% deriving math equations and implementing things.\u00a0Suppose that you are trying to reproduce a result from a research paper for your work, but your results are worse. You might wonder why the performance of your model is significantly worse than that of the paper you\u2019re trying to reproduce.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9228c61 elementor-widget elementor-widget-text-editor\" data-id=\"9228c61\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThere are many different things that can cause this:\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-20ec24c elementor-widget elementor-widget-image\" data-id=\"20ec24c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" 
src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1579132842152-WTJA4M8695OBHEDYJ4Y7\/ke17ZwdGBToddI8pDm48kJx6aZWuoqt_bAK_mphlbUgUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKceE2rCUDOFSEdNpxkgA0fY5i8h8xtF9mSbAF65BiPN_L0AZiGBqZN3lbp_vSXEoQa\/worse_performance.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a388a87 elementor-widget elementor-widget-text-editor\" data-id=\"a388a87\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul data-rte-list=\"default\">\n \t<li>There may be <strong>implementation bugs<\/strong>. Most bugs in deep learning are actually invisible: the code runs, but the model quietly underperforms.<\/li>\n \t<li><strong>Hyper-parameter choices<\/strong> can also cause your performance to degrade. Deep learning models are very sensitive to hyper-parameters. Even very subtle choices of learning rate and weight initialization can make a big difference.<\/li>\n \t<li>Performance can also be worse just because of <strong>data\/model fit<\/strong>. For example, you pre-train your model on ImageNet data and fit it on self-driving car images, which are harder to learn.<\/li>\n \t<li>Finally, poor model performance could be caused not by your model but by your <strong>dataset construction<\/strong>. 
Common issues here include not having enough examples, dealing with noisy labels and imbalanced classes, and splitting the train and test sets so that they have different distributions.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b87f1c9 elementor-widget elementor-widget-text-editor\" data-id=\"b87f1c9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tI recently attended the Full-Stack Deep Learning Bootcamp on the UC Berkeley campus, a wonderful course that teaches full-stack production deep learning. <a href=\"http:\/\/josh-tobin.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">Josh Tobin<\/a> delivered a great lecture on troubleshooting deep neural networks. Drawing on Josh\u2019s lecture, this article provides a mental recipe for improving your deep learning model\u2019s performance, assuming that you already have an initial test dataset, a single metric to improve, and a target performance based on human-level performance, published results, previous baselines, etc.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-bd295f4 elementor-widget elementor-widget-image\" data-id=\"bd295f4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1579133220860-ZWZ8H3M2KNALDB81AGES\/ke17ZwdGBToddI8pDm48kILP7KOvppJC-gqRj0K6sFIUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKcRMFN34FFW34-0pTMNI-dB3j60-lF70LEIrSh_7IX-PjtAoWI78hWEe10vUReD3oU\/debugging_overview.jpg\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element 
elementor-element-5f55317 elementor-widget elementor-widget-text-editor\" data-id=\"5f55317\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<strong>Note: <\/strong><em>You can also watch the version of<\/em> <a href=\"https:\/\/www.youtube.com\/watch?v=XtCNNwDi9xg\" target=\"_blank\" rel=\"noopener noreferrer\"><em>Josh\u2019s talk at Reinforceconf 2019<\/em><\/a> <em>and go through<\/em> <a href=\"http:\/\/josh-tobin.com\/troubleshooting-deep-neural-networks.html\" target=\"_blank\" rel=\"noopener noreferrer\"><em>the full guide on Josh\u2019s website<\/em><\/a><em>.<\/em>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-fec0338 elementor-widget elementor-widget-heading\" data-id=\"fec0338\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title 
elementor-size-default\"><strong>1\u200a\u2014\u200aStart\u00a0Simple<\/strong><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3df3828 elementor-widget elementor-widget-text-editor\" data-id=\"3df3828\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThere are a few things to consider when you want to start simple. The first is how to <strong>choose a simple architecture<\/strong>. These are architectures that are easy to implement and are likely to get you part of the way towards solving your problem without introducing as many bugs.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-301655e elementor-widget elementor-widget-text-editor\" data-id=\"301655e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tArchitecture selection is one of the many intimidating parts of getting into deep learning because there are tons of papers coming out all the time claiming to be state-of-the-art on some problem, and they get very complicated fast. In the limit, if you\u2019re trying to get to maximal performance, then architecture selection is challenging. 
But when starting on a new problem, you can actually just follow a simple set of rules that will allow you to pick an architecture that does a decent job on the problem you\u2019re working on.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a5c3c75 elementor-widget elementor-widget-text-editor\" data-id=\"a5c3c75\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul data-rte-list=\"default\">\n \t<li>If your data looks like <strong>images<\/strong>, start with a LeNet-like architecture and consider using something like ResNet as your codebase gets more mature.<\/li>\n \t<li>If your data looks like <strong>sequences<\/strong>, start with an LSTM with one hidden layer and\/or temporal\/classical convolutions. Then, when your problem gets more mature, you can move to an Attention-based model or a WaveNet-like model.<\/li>\n \t<li>For <strong>all other tasks<\/strong>, start with a fully-connected neural network with one hidden layer and use more advanced networks later depending on the problem.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b45bf66 elementor-widget elementor-widget-image\" data-id=\"b45bf66\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1579133344603-734QYDDYKYWWTOSFR6W1\/ke17ZwdGBToddI8pDm48kEFSSsfX6hw4Yzeo5Ki4ELIUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKcw3uqtvieW1KlpUYsSKN9NOS7-HVOhcD_nxOtut1hyQYW-_ypRuKInG5glPDCQ3lp\/input_modalities.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div 
class=\"elementor-element elementor-element-dbb9bb2 elementor-widget elementor-widget-text-editor\" data-id=\"dbb9bb2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tIn reality, the input data often contains several of these modalities at once. So how do you feed multiple input modalities into a neural network? Here is the 3-step strategy that Josh recommended:\n<ul data-rte-list=\"default\">\n \t<li>First, map each of these modalities into a lower-dimensional feature space. In the example above, the images are passed through a ConvNet and the words are passed through an LSTM.<\/li>\n \t<li>Then we flatten the outputs of those networks to get a single vector for each input and concatenate those vectors.<\/li>\n \t<li>Finally, we pass the concatenated vector through some fully-connected layers to an output.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3097edf elementor-widget elementor-widget-text-editor\" data-id=\"3097edf\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tAfter choosing a simple architecture, the next thing to do is to <strong>select sensible hyper-parameter defaults<\/strong> to start out with. 
Here are the defaults that Josh recommended:\n<ul data-rte-list=\"default\">\n \t<li><a href=\"https:\/\/twitter.com\/karpathy\/status\/801621764144971776?lang=en\" target=\"_blank\" rel=\"noopener noreferrer\">Adam optimizer with a \u201cmagic\u201d learning rate value of 3e-4<\/a>.<\/li>\n \t<li><a href=\"https:\/\/stats.stackexchange.com\/questions\/226923\/why-do-we-use-relu-in-neural-networks-and-how-do-we-use-it\" target=\"_blank\" rel=\"noopener noreferrer\">ReLU<\/a> activation for fully-connected and convolutional models and <a href=\"https:\/\/stats.stackexchange.com\/questions\/330559\/why-is-tanh-almost-always-better-than-sigmoid-as-an-activation-function\" target=\"_blank\" rel=\"noopener noreferrer\">Tanh<\/a> activation for LSTM models.<\/li>\n \t<li><a href=\"https:\/\/datascience.stackexchange.com\/questions\/13061\/when-to-use-he-or-glorot-normal-initialization-over-uniform-init-and-what-are\" target=\"_blank\" rel=\"noopener noreferrer\">He initialization for the ReLU activation function and Glorot initialization for the Tanh activation function<\/a>.<\/li>\n \t<li>No regularization and no data normalization.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-f1dc4e7 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"f1dc4e7\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-51d5367\" data-id=\"51d5367\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-ae03866 
elementor-widget elementor-widget-image\" data-id=\"ae03866\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c75708a elementor-widget elementor-widget-text-editor\" data-id=\"c75708a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThe next step is to <strong>normalize the input data<\/strong>, which means subtracting the mean and dividing by the standard deviation. 
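As a minimal sketch of this normalization step (pure NumPy; the array name and shapes here are made up for illustration), you compute the statistics on the training set and apply them to every input:

```python
import numpy as np

# Hypothetical training batch: 256 examples with 16 features each.
x_train = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(256, 16))

# Compute the per-feature statistics on the training set only.
mean = x_train.mean(axis=0)
std = x_train.std(axis=0)

# Normalize: subtract the mean and divide by the standard deviation.
x_norm = (x_train - mean) / std

# Each feature now has zero mean and unit standard deviation.
```

The same `mean` and `std` computed on the training set should also be applied to validation and test inputs, so that all splits see the same transformation.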
Note that for images, it\u2019s fine to scale values to [0, 1] or [-0.5, 0.5] (for example, by dividing by 255).\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-cb3bda4 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"cb3bda4\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-df9ae16\" data-id=\"df9ae16\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-605e71f elementor-widget elementor-widget-text-editor\" data-id=\"605e71f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThe final thing you should do is to consider <strong>simplifying the problem<\/strong> itself. 
If you have a complicated problem with massive data and tons of classes to deal with, then you should consider:\n<ul data-rte-list=\"default\">\n \t<li>Working with a small training set of around 10,000 examples.<\/li>\n \t<li>Using a fixed number of objects, classes, input size\u2026<\/li>\n \t<li>Creating a simpler synthetic training set like in research labs.<\/li>\n<\/ul>\nThis is important because (1) you will have reasonable confidence that your model should be able to solve the simplified problem, and (2) your iteration speed will increase.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-816066b elementor-widget elementor-widget-text-editor\" data-id=\"816066b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThe diagram below neatly summarizes how to start simple:\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9762ace elementor-widget elementor-widget-image\" data-id=\"9762ace\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1579133439681-7P79SVIBA4RQQ92WU8YT\/ke17ZwdGBToddI8pDm48kAojIkCOkRzn2YirFlaf6V4UqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKcdlX9EcKWqR9ql8qPL8LXCGRDT7PNzJtHvWRk0af7M-tzhuE8t0xyYovWl3jO13Hd\/starting_simple.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-28696cb elementor-widget elementor-widget-heading\" data-id=\"28696cb\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 
class=\"elementor-heading-title elementor-size-default\"><strong>2\u200a\u2014\u200aImplement and\u00a0Debug<\/strong><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4e5e0c6 elementor-widget elementor-widget-text-editor\" data-id=\"4e5e0c6\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tTo give you a preview, below are the 5 most common bugs in deep learning models that Josh identified:\n<ul data-rte-list=\"default\">\n \t<li><strong>Incorrect shapes for the network tensors: <\/strong>This bug is a common one and can fail silently. Often this happens because the automatic differentiation systems in deep learning frameworks do silent broadcasting. Tensors take on different shapes than intended and can cause a lot of problems down the line.<\/li>\n \t<li><strong>Pre-processing inputs incorrectly: <\/strong>For example, you forget to normalize your inputs or apply too much input pre-processing (over-normalization and excessive data augmentation).<\/li>\n \t<li><strong>Incorrect input to the model\u2019s loss function: <\/strong>For example, you pass softmax outputs to a loss that expects logits.<\/li>\n \t<li><strong>Forgetting to set up train mode for the network correctly: <\/strong>For example, forgetting to toggle train\/evaluation mode or to control batch norm dependencies.<\/li>\n \t<li><strong>Numerical instability:<\/strong> For example, you get `inf` or `NaN` as outputs. 
This bug often stems from using an exponent, a log, or a division operation somewhere in the code.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-60eb900 elementor-widget elementor-widget-text-editor\" data-id=\"60eb900\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tHere are 3 pieces of general advice for implementing your model:\n<ul data-rte-list=\"default\">\n \t<li>Start with a <strong>lightweight implementation<\/strong>. You want the minimum possible number of new lines of code for the 1st version of your model. The rule of thumb is less than 200 lines. This doesn\u2019t count tested infrastructure components or TensorFlow\/PyTorch code.<\/li>\n \t<li>Use <strong>off-the-shelf components <\/strong>such as Keras if possible, since most of the stuff in Keras works well out-of-the-box. If you have to use TensorFlow, then use the built-in functions and don\u2019t do the math yourself. This will help you avoid a lot of the numerical instability issues.<\/li>\n \t<li>Build <strong>complicated data pipelines later<\/strong>. These are important for large-scale ML systems, but you should not start with them because data pipelines themselves can be a big source of bugs. 
Just start with a dataset that you can load into memory.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0898286 elementor-widget elementor-widget-image\" data-id=\"0898286\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1579133489386-FQEURR51BIJHW6B4838C\/ke17ZwdGBToddI8pDm48kEPf3fgCD00hC7jhC1ciHtYUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKcXNqYV3MshsujdPU57vEDmGwji22PkrsojXvPvfT6laCdex84os5iBDkPuiha9EVf\/image-asset.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-94971a9 elementor-widget elementor-widget-text-editor\" data-id=\"94971a9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThe first step of implementing bug-free deep learning models is <strong>getting your model to run<\/strong> at all. There are a few things that can prevent this from happening:\n<ul data-rte-list=\"default\">\n \t<li><strong>Shape mismatch \/<\/strong> <strong>Casting issue: <\/strong>To address this type of problem, you should step through your model creation and inference step-by-step in a debugger, checking for correct shapes and data types of your tensors.<\/li>\n \t<li><strong>Out-Of-Memory-Issues:<\/strong> This can be very difficult to debug. You can scale back your memory-intensive operations one-by-one. For example, if you create large matrices anywhere in your code, you can reduce the size of their dimensions or cut your batch size in half.<\/li>\n \t<li><strong>Other Issues:<\/strong> You can simply Google it. 
Stack Overflow will usually have the answer.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e5589b3 elementor-widget elementor-widget-text-editor\" data-id=\"e5589b3\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tLet\u2019s zoom in on the process of stepping through model creation in a debugger and talk about debuggers for deep learning code:\n<ul data-rte-list=\"default\">\n \t<li>In PyTorch, you can use <a href=\"https:\/\/pypi.org\/project\/ipdb\/\" target=\"_blank\" rel=\"noopener noreferrer\">ipdb<\/a>\u200a\u2014\u200awhich exports functions to access the interactive <a href=\"http:\/\/ipython.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">IPython<\/a> debugger.<\/li>\n \t<li>In TensorFlow, it\u2019s trickier. TensorFlow separates the process of creating the graph from executing operations in the graph. There are 3 options you can try: (1) step through the graph creation itself and inspect each tensor layer, (2) step into the training loop and evaluate the tensor layers, or (3) use the <a href=\"https:\/\/mullikine.github.io\/posts\/tensorflow-debugger-tfdb-and-emacs\/\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"broken_link\">TensorFlow Debugger<\/a> (tfdbg), which does options 1 and 2 automatically.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ebb4133 elementor-widget elementor-widget-text-editor\" data-id=\"ebb4133\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tAfter getting your model to run, the next thing you need to do is to <strong>overfit a single batch of data.<\/strong> This is a heuristic that can catch an absurd number of bugs. 
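To make the overfit-a-single-batch check concrete, here is a minimal sketch (pure NumPy, with a toy linear model standing in for your network; all names and sizes are made up): train on one fixed batch until the training loss collapses toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# One small, fixed batch -- the only data we train on.
X = rng.normal(size=(16, 4))
y = X @ rng.normal(size=(4, 1)) + 0.01 * rng.normal(size=(16, 1))

# A deliberately tiny model: linear regression trained by gradient descent.
w = np.zeros((4, 1))
lr = 0.1
for _ in range(2000):
    grad = X.T @ (X @ w - y) / len(X)  # gradient of the mean squared error
    w -= lr * grad

train_loss = float(np.mean((X @ w - y) ** 2))
# With a correct implementation, the loss on this single batch can be
# driven very close to zero; if it rises, explodes, oscillates, or
# plateaus instead, something in the pipeline is broken.
assert train_loss < 1e-3
```

The same check applies unchanged to a real network: fix one batch, loop the optimizer over it, and watch the training loss.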
This really means that you want to drive your training error arbitrarily close to 0.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-46b7f89 elementor-widget elementor-widget-image\" data-id=\"46b7f89\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1579133530471-4ZK4ZXKRT6OUBKO9355F\/ke17ZwdGBToddI8pDm48kHmvazCOjmZYizfSkWZKe4AUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKcdrleTAvzOMaxyUBni2xjGxuqQAvKiT2HPiD0kHQNf10JPLoa1TQMD75U1XzYOEdM\/overfit_single_batch.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a09fc80 elementor-widget elementor-widget-text-editor\" data-id=\"a09fc80\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThere are a few things that can happen when you try to overfit a single batch and it fails:\n<ul data-rte-list=\"default\">\n \t<li><strong>Error goes up:<\/strong> Commonly this is due to a flipped sign somewhere in the loss function\/gradient.<\/li>\n \t<li><strong>Error explodes:<\/strong> This is usually a numerical issue, but can also be caused by a high learning rate.<\/li>\n \t<li><strong>Error oscillates:<\/strong> You can lower the learning rate and inspect the data for shuffled labels or incorrect data augmentation.<\/li>\n \t<li><strong>Error plateaus:<\/strong> You can increase the learning rate and get rid of regularization. 
Then you can inspect the loss function and the data pipeline for correctness.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-30695f2 elementor-widget elementor-widget-image\" data-id=\"30695f2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1579133590822-Z00QVC721XKKTO7YX02E\/ke17ZwdGBToddI8pDm48kOCYKSCBvS7VuxJ-3YU6PsYUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKcI2j9I8z9Wee0pZWWUWVY41gH8sqJl2I0CMMam34QMJYepE5OZ5V8rCBuqPsR9GBw\/compare_known_results.png?format=750w\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-87661f4 elementor-widget elementor-widget-text-editor\" data-id=\"87661f4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\nOnce your model overfits in a single batch, there can still be some other issues that cause bugs. The last step here is to <strong>compare your results to a known result.<\/strong> So what sort of known results are useful?\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e0c385f elementor-widget elementor-widget-text-editor\" data-id=\"e0c385f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul data-rte-list=\"default\">\n \t<li>The most useful results come from an <em>official model implementation evaluated on a similar dataset to yours<\/em>. You can step through the code in both models line-by-line and ensure your model has the same output. 
You want to ensure that your model performance is up to par with expectations.<\/li>\n \t<li>If you can\u2019t find an official implementation on a similar dataset, you can compare your approach to results from an <em>official model implementation evaluated on a benchmark dataset<\/em>. You most definitely want to walk through the code line-by-line and ensure you have the same output.<\/li>\n \t<li>If there is no official implementation of your approach, you can compare it to results from an <em>unofficial model implementation<\/em>. You can review the code the same as before, but with lower confidence because almost all the unofficial implementations on GitHub have bugs.<\/li>\n \t<li>Then, you can compare to results from a <em>paper with no code<\/em> (to ensure that your performance is up to par with expectations), results from <em>your model on a benchmark dataset<\/em> (to make sure your model performs well in a simpler setting), and results from <em>a similar model on a similar dataset<\/em> (to help you get a general sense of what kind of performance can be expected).<\/li>\n \t<li>An under-rated source of results comes from <em>simple baselines<\/em> (for example, the average of outputs or linear regression), which can help make sure that your model is learning anything at all.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5c34327 elementor-widget elementor-widget-text-editor\" data-id=\"5c34327\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThe diagram below neatly summarizes how to implement and debug deep neural networks:\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5b2c03c elementor-widget elementor-widget-image\" data-id=\"5b2c03c\" data-element_type=\"widget\" 
data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1579133631547-O2Z6FIYEVHO562Q3THVN\/ke17ZwdGBToddI8pDm48kLAxFtCQsitaiu26pOqBCRAUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKcAeUrhzZ9cS_0PtH4_0d5A3LFxY0PtezP1UTQayni2F5fUXFOAkM-QfQIJoiEffRq\/implement_and_debug.png?format=750w\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b8e8850 elementor-widget elementor-widget-heading\" data-id=\"b8e8850\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2><strong>3\u200a\u2014\u200aEvaluate<\/strong><\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-48cc246 elementor-widget elementor-widget-text-editor\" data-id=\"48cc246\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThe next step is to evaluate your model performance and use that evaluation to prioritize what you are going to do to improve it. 
You want to apply the <a href=\"https:\/\/stats.stackexchange.com\/questions\/192286\/bias-variance-decomposition\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>bias-variance decomposition<\/strong><\/a> concept here.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e488622 elementor-widget elementor-widget-image\" data-id=\"e488622\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1579133665313-FBJC05TD0JNP51UEIX9H\/ke17ZwdGBToddI8pDm48kKp_k0HaIfhu1dVNDMQyRXUUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKc5bNQafLfUQfSXvIG50513YZkuXAeeSAhV2aVOKxWFNeT3Tcv5G7dgbZFrGNeoTJy\/bias_variance_1.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-feaab64 elementor-widget elementor-widget-text-editor\" data-id=\"feaab64\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul data-rte-list=\"default\">\n \t<li>In the left plot, the blue line is <strong>the human-level performance<\/strong>, the green line is <strong>the training error curve<\/strong> which decreases toward the blue line, the red line is <strong>the validation error curve<\/strong> which is typically a little bit higher than the training curve, and the purple line is the <strong>test error curve<\/strong> which is typically a little bit higher than the validation curve.<\/li>\n \t<li>As shown in the right plot, the bias-variance decomposition decomposes the final test error of your model into its component parts. 
Those include (1) <strong>irreducible error<\/strong> that comes from your baseline performance, (2) <strong>avoidable bias<\/strong> (also known as <strong>under-fitting<\/strong>), which is measured by the gap between the irreducible error and the training error, (3) <strong>variance<\/strong> (also known as <strong>over-fitting<\/strong>), which is measured by the gap between the training error and the validation error, and (4) <strong>validation set overfitting<\/strong> (how much your model overfits the validation set), which is the gap between the validation error and the test error.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e09757d elementor-widget elementor-widget-text-editor\" data-id=\"e09757d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThis assumes that the training, validation, and test set all come from the same data distribution. What if that\u2019s not the case? 
For example, suppose you are training an object detection model for autonomous vehicles, but your training data are images taken during the day while your test data are images taken at night.\n<ul data-rte-list=\"default\">\n \t<li>The strategy here is to use two validation sets: (1) one set sampled from the training distribution, and (2) the other set sampled from the test distribution.<\/li>\n \t<li>As seen in the left plot below, in addition to the training error and the test error, we now have two validation errors: one measured on the validation set drawn from the training distribution and one measured on the validation set drawn from the test distribution.<\/li>\n \t<li>Our bias-variance decomposition formula now gets one more term: a measure of <strong>distribution shift<\/strong>, which is the difference between your training validation error and your test validation error.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-30c4354 elementor-widget elementor-widget-image\" data-id=\"30c4354\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1579133692863-D0ZCFWZDZ7IBD1LZLOYW\/ke17ZwdGBToddI8pDm48kFmbiPaaBEKwzctsz_zBwZAUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKcxvqTsUU9PWy-U5CGZfk75Yxfj9IGz7IxLdI4YJ2fharEDoKK1q3rbNihlEPVA7FU\/image-asset.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1c6b094 elementor-widget elementor-widget-text-editor\" data-id=\"1c6b094\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tAs a quick summary, the strategy for evaluating model performance is quite simple:\n<pre>Test Error = Irreducible Error + Bias + Variance + 
Distribution Shift + Validation Overfitting<\/pre>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e08cd0b elementor-widget elementor-widget-heading\" data-id=\"e08cd0b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2><strong>4\u200a\u2014\u200aImprove The Models and\u00a0Data<\/strong><\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f5078d4 elementor-widget elementor-widget-text-editor\" data-id=\"f5078d4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tIn the order of prioritizing model improvements, you should start by <strong>addressing under-fitting<\/strong> (aka, reducing model\u2019s bias).\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-69c924a elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"69c924a\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-81a730e\" data-id=\"81a730e\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-0f0a083 elementor-widget elementor-widget-image\" data-id=\"0f0a083\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div 
class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1579133776535-N5XY7GOONY4T3KQFI2PW\/ke17ZwdGBToddI8pDm48kOkO4yIo1Xs-bFx0uYsYHJsUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKcFwmY0yjJwfq6MMJpbK2VybjudSAX7oX6C1rP-5dt06q1VkDZe1Mb3MAKjTleThVw\/image-asset.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ec9b072 elementor-widget elementor-widget-text-editor\" data-id=\"ec9b072\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThere are a number of strategies that you can use to address under-fitting:\n<ul data-rte-list=\"default\">\n \t<li>The simplest and most often best strategy to do is to make your neural network bigger by adding layers or using more units per layer.<\/li>\n \t<li>You can also try to reduce regularization.<\/li>\n \t<li>Do an error analysis.<\/li>\n \t<li>Move to a different neural network architecture that is closer to the state-of-the-art.<\/li>\n \t<li>Tune model\u2019s hyper-parameters.<\/li>\n \t<li>Or add more features.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-105ce4f elementor-widget elementor-widget-text-editor\" data-id=\"105ce4f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThe second step to improve your model performance is to <strong>address over-fitting<\/strong> (aka, reducing the model\u2019s variance).\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section 
elementor-top-section elementor-element elementor-element-d91d21a elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"d91d21a\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-8dec0dd\" data-id=\"8dec0dd\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-6a69733 elementor-widget elementor-widget-image\" data-id=\"6a69733\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1579133823277-WZ7ESDVRX4PI6908K6MS\/ke17ZwdGBToddI8pDm48kJ_qm8JhfMhP1RizFcSqoxsUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKcmyhx6NeDvIfwyvlKwhr3L593Tp1uDRFTeSueIRqeuJcoxmfIv_Y3xWVscJbL65--\/address_overfitting.png?format=750w\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5a7be0e elementor-widget elementor-widget-text-editor\" data-id=\"5a7be0e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThere are a number of strategies that you can use to address over-fitting:\n<ul data-rte-list=\"default\">\n \t<li>The simplest and often the best strategy is to add more training data if possible.<\/li>\n \t<li>If not, you can add normalization (batch norm or layer norm).<\/li>\n \t<li>Augment your data.<\/li>\n \t<li>Increase regularization (dropout, 
L2, weight decay).<\/li>\n \t<li>Do an error analysis.<\/li>\n \t<li>Choose a different model architecture that is closer to the state-of-the-art.<\/li>\n \t<li>Tune the model\u2019s hyper-parameters.<\/li>\n \t<li>Other strategies include using early stopping, removing features, and reducing model size.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-19a1e56 elementor-widget elementor-widget-text-editor\" data-id=\"19a1e56\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tOnce the training error and training-validation error are in the region that you expect them to be, the next step is to <strong>address the distribution shift<\/strong> present in your data.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e5ddf14 elementor-widget elementor-widget-image\" data-id=\"e5ddf14\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1579133972542-DN9JXUA3Y7Q7595TM51Z\/ke17ZwdGBToddI8pDm48kIZiwcAZ75XaL-ueJTr7FRkUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKcyIriRmP8WabvCIo5RdCK5NYjXQ_Pk3a9EbqfUL2aCmANlQc7jtAXenF5e1n9y2pd\/address_distribution_shift.png?format=750w\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3c67b52 elementor-widget elementor-widget-text-editor\" data-id=\"3c67b52\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<strong>There are fewer strategies for addressing this:<\/strong>\n<ul 
data-rte-list=\"default\">\n \t<li>You can look manually at the errors that your model makes on the test-validation set for generalizable mistakes and go collect more training data for your model to handle those cases.<\/li>\n \t<li>You can do a similar process; but instead of collecting more training data, you can synthesize more training data to compensate for that.<\/li>\n \t<li>Lastly, you can apply some <a href=\"https:\/\/en.wikipedia.org\/wiki\/Domain_adaptation\" target=\"_blank\" rel=\"noopener noreferrer\">domain adaptation<\/a> techniques to training and test distributions. These techniques are still more in the research realm than in a production-ready environment. In particular, they can be trained on \u201csource\u201d distribution and generalize to another \u201ctarget\u201d using only unlabeled data or limited labeled data. You should consider using it when access to labeled data from test distribution is limited and\/or access to relatively similar data is plentiful. Broadly speaking, there are 2 types of domain adaptation: (1) <strong>Supervised<\/strong>\u200a\u2014\u200ayou have limited data from the target domain. Examples include fine-tuning a pre-trained model and adding target data to the train set; (2) <strong>Unsupervised<\/strong>\u200a\u2014\u200ayou have lots of unlabeled data from the target domain. 
Examples include <a href=\"https:\/\/arxiv.org\/abs\/1607.01719\" target=\"_blank\" rel=\"noopener noreferrer\">correlation alignment<\/a>, <a href=\"https:\/\/arxiv.org\/abs\/1412.3474\" target=\"_blank\" rel=\"noopener noreferrer\">domain confusion<\/a>, and <a href=\"https:\/\/machinelearningmastery.com\/what-is-cyclegan\/\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"broken_link\">CycleGAN<\/a>.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4d85cab elementor-widget elementor-widget-text-editor\" data-id=\"4d85cab\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThe final step, if applicable, to improve your model is to <strong>rebalance your datasets<\/strong>. Periodically during training, you should check the error on the actual hold-out test set. If the model performance on the test-validation set is significantly better than the performance on the test set, you have over-fit to the validation set. This can happen with small validation sets or lots of hyper-parameter tuning. 
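This check, together with the other gaps in the bias-variance decomposition discussed earlier, can be tracked with a small bookkeeping helper. A minimal sketch, using hypothetical error values:

```python
def decompose_test_error(irreducible, train, train_val, test_val, test):
    """Split the final test error into the components of the decomposition:
    irreducible error, avoidable bias, variance, distribution shift,
    and validation set overfitting."""
    return {
        "irreducible_error": irreducible,            # baseline (e.g. human-level) error
        "avoidable_bias": train - irreducible,       # under-fitting
        "variance": train_val - train,               # over-fitting
        "distribution_shift": test_val - train_val,  # train vs. test distribution gap
        "validation_overfitting": test - test_val,   # over-fitting to the validation set
    }

# Hypothetical error rates measured with the two-validation-set setup:
parts = decompose_test_error(
    irreducible=0.01, train=0.03, train_val=0.05, test_val=0.10, test=0.12
)
```

The terms telescope back to the test error, so whichever term dominates tells you which problem (under-fitting, over-fitting, shift, or validation overfitting) to prioritize.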
When it does happen, you can recollect the validation data by re-shuffling the test\/validation split ratio.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-546f9ed elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"546f9ed\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-38a84d1\" data-id=\"38a84d1\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-c68d659 elementor-widget elementor-widget-image\" data-id=\"c68d659\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/images.squarespace-cdn.com\/content\/v1\/59d9b2749f8dce3ebe4e676d\/1579134066358-3W8MSYKST3VHLOA6MT3C\/ke17ZwdGBToddI8pDm48kJcVGmGkAahqLMYTjXXLXwNZw-zPPgdn4jUwVcJE1ZvWQUxwkmyExglNqGp0IvTJZUJFbgE-7XRK3dMEBRBhUpxzD15V0nafjknaHfjZxvWKaWK1mkOBhG07EwVnqr7_0SGTuWVQPCHkxzDmR_Xl2H4\/hyperparams-sensitivity.png?format=750w\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-14fd514 elementor-widget elementor-widget-text-editor\" data-id=\"14fd514\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tGiven this complexity, it\u2019s clear that finding the optimal configuration for these variables 
in a high-dimensional space is challenging. This is because searching for hyper-parameters is an iterative process that is constrained by computing power, time, and money. There are several methods available for doing this:\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2d8d9e3 elementor-widget elementor-widget-heading\" data-id=\"2d8d9e3\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\"><h3><strong>METHOD 1\u200a\u2014\u200aMANUAL OPTIMIZATION<\/strong><\/h3><\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e7ca908 elementor-widget elementor-widget-text-editor\" data-id=\"e7ca908\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul data-rte-list=\"default\">\n \t<li>This method is 100% manual. You must thoroughly understand the algorithm in use to train and evaluate the model, then guess a better hyper-parameter value and re-evaluate the model\u2019s performance. 
You can combine it with other methods, for example, manually selecting parameter ranges to optimize over.<\/li>\n \t<li>For a skilled practitioner, this may require the least amount of computation to get good results.<\/li>\n \t<li>However, the method is time-consuming and requires a detailed understanding of the algorithm.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b6d2a3f elementor-widget elementor-widget-heading\" data-id=\"b6d2a3f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\"><h3><strong>METHOD 2\u200a\u2014\u200aGRID\u00a0SEARCH<\/strong><\/h3><\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6d70495 elementor-widget elementor-widget-text-editor\" data-id=\"6d70495\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul data-rte-list=\"default\">\n \t<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Hyperparameter_optimization#Grid_search\" target=\"_blank\" rel=\"noopener noreferrer\">Grid search<\/a> is a naive approach of simply trying every possible configuration. It\u2019s super simple to implement (<a href=\"http:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.model_selection.GridSearchCV.html\" target=\"_blank\" rel=\"noopener noreferrer\">GridSearchCV<\/a>) and can produce good results.<\/li>\n \t<li>Unfortunately, it\u2019s not very efficient since we need to train the model on every combination of the hyper-parameter values. 
It also requires prior knowledge about the parameters to get good results.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-eb7468a elementor-widget elementor-widget-heading\" data-id=\"eb7468a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\"><h3><strong>METHOD 3\u200a\u2014\u200aRANDOM\u00a0SEARCH<\/strong><\/h3><\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8dc9d21 elementor-widget elementor-widget-text-editor\" data-id=\"8dc9d21\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul data-rte-list=\"default\">\n \t<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Hyperparameter_optimization#Random_search\" target=\"_blank\" rel=\"noopener noreferrer\">Random search<\/a> differs from grid search in that we pick points at random from the configuration space instead of trying every possible combination. 
It\u2019s also easy to implement (<a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.model_selection.RandomizedSearchCV.html\" target=\"_blank\" rel=\"noopener noreferrer\">RandomizedSearchCV<\/a>) and often produces better results than grid search.<\/li>\n \t<li>But random search is not very interpretable and may also require prior knowledge about the parameters to get good results.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-527ea06 elementor-widget elementor-widget-heading\" data-id=\"527ea06\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\"><h3><strong>METHOD 4\u200a\u2014\u200aCOARSE-TO-FINE<\/strong><\/h3><\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9f087c8 elementor-widget elementor-widget-text-editor\" data-id=\"9f087c8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul data-rte-list=\"default\">\n \t<li>Here you discretize the available value range of each parameter into a \u201ccoarse\u201d grid of values to estimate the effect of increasing or decreasing that parameter. After selecting the value that seems most promising, you perform a \u201cfiner\u201d search around it to optimize even further.<\/li>\n \t<li>This helps you narrow in on high-performing hyper-parameter values and is a common practice in the industry. 
The only drawback is that it is a somewhat manual process.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f8a8206 elementor-widget elementor-widget-heading\" data-id=\"f8a8206\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\"><h3><strong>METHOD 5\u200a\u2014\u200aBAYESIAN OPTIMIZATION<\/strong><\/h3><\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7073baa elementor-widget elementor-widget-text-editor\" data-id=\"7073baa\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul data-rte-list=\"default\">\n \t<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Hyperparameter_optimization#Bayesian_optimization\" target=\"_blank\" rel=\"noopener noreferrer\">This search strategy<\/a> builds a surrogate model that tries to predict the metrics we care about from the hyper-parameter configuration. At a high level, we start with a prior estimate of parameter distributions. Then we maintain a probabilistic model of the relationship between hyper-parameter values and model performance. We can alternate between (1) training with the hyper-parameter values that maximize the expected improvement and (2) using training results to update our probabilistic model. 
<a href=\"https:\/\/towardsdatascience.com\/a-conceptual-explanation-of-bayesian-model-based-hyperparameter-optimization-for-machine-learning-b8172278050f\" target=\"_blank\" rel=\"noopener noreferrer\">This post<\/a> from Will Koehrsen will give you a more detailed conceptual explanation of hyper-parameter tuning using Bayesian optimization.<\/li>\n \t<li>The big advantage is that Bayesian optimization is generally the most efficient hands-off way to choose hyper-parameters. But it\u2019s difficult to implement from scratch and can be hard to integrate with off-the-shelf tools.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5c1c806 elementor-widget elementor-widget-text-editor\" data-id=\"5c1c806\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tSo in brief, you should start by trying out <strong>coarse-to-fine random searches<\/strong> first and consider moving to <strong>Bayesian hyper-parameter optimization<\/strong> solutions as your codebase matures.\n<h2><strong>Conclusion<\/strong><\/h2>\nTo wrap up this post, deep learning troubleshooting and debugging is really hard. It\u2019s difficult to tell if you have a bug because there are lots of possible sources for the same degradation in performance. Furthermore, the results can be sensitive to small changes in hyper-parameters and dataset makeup.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e0a38b9 elementor-widget elementor-widget-text-editor\" data-id=\"e0a38b9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tTo train bug-free Deep Learning models, we really need to treat building them as an iterative process. 
<strong><em>If you skipped to the end<\/em><\/strong>, the following steps can make this process easier and catch errors as early as possible:\n<ul data-rte-list=\"default\">\n \t<li>Choose the simplest model and data possible.<\/li>\n \t<li>Once the model runs, overfit a single batch and reproduce a known result.<\/li>\n \t<li>Apply the bias-variance decomposition to decide what to do next.<\/li>\n \t<li>Use coarse-to-fine random searches to tune the model\u2019s hyper-parameters.<\/li>\n \t<li>Make your model bigger if your model under-fits and add more data and\/or regularization if your model over-fits.<\/li>\n<\/ul>\nHopefully, this post has presented helpful information for you to debug deep learning models.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>Deep learning troubleshooting and debugging is really hard. It&rsquo;s difficult to tell if you have a bug because there are lots of possible sources for the same degradation in performance. Furthermore, the results can be sensitive to small changes in hyper-parameters and dataset makeup. To train bug-free Deep Learning models, we really need to treat building them as an iterative process. 
To make this process easier and catch errors as early as possible, this article suggests steps you can follow.<\/p>\n","protected":false},"author":86,"featured_media":3630,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[92],"ppma_author":[1842],"class_list":["post-2248","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-machine-learning"],"authors":[{"term_id":1842,"user_id":86,"is_guest":0,"slug":"james-le","display_name":"James Le","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","user_url":"","last_name":"Le","first_name":"James","job_title":"","description":"James Le is a Software Developer with experiences in Product Management and Data Analytics. He played a pivotal role in the operation of a start-up organization at Denison University."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/2248","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/86"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=2248"}],"version-history":[{"count":6,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/2248\/revisions"}],"predecessor-version":[{"id":35492,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/2248\/revisions\/35492"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/3630"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=2248"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/
v2\/categories?post=2248"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=2248"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=2248"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}