{"id":22493,"date":"2020-12-11T08:16:17","date_gmt":"2020-12-11T08:16:17","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/shortcut-learning-reason-ml-models-fail-practice\/"},"modified":"2023-09-21T17:57:35","modified_gmt":"2023-09-21T17:57:35","slug":"shortcut-learning-reason-ml-models-fail-practice","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/shortcut-learning-reason-ml-models-fail-practice\/","title":{"rendered":"Shortcut Learning, The Reason ML Models Often Fail in Practice"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"22493\" class=\"elementor elementor-22493\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-8b1304a elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"8b1304a\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-352451b\" data-id=\"352451b\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-065636d elementor-widget elementor-widget-text-editor\" data-id=\"065636d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p class=\"has-medium-font-size\">TLDR, models always take the route of least effort.<\/p>\n\n<p id=\"53ec\">Training machine learning models is far from easy. In fact, the unaware data scientist might trip and fall in as many pitfalls as there are living AWS instances. 
The list is endless but divides nicely into two broad categories: underfitting (your model is bad) and overfitting (your model is still bad,&nbsp;<em>but you think it isn\u2019t<\/em>). While overfitting can manifest itself in various ways, shortcut learning is a recurring flavor when dealing with custom datasets and novel problems. It affected me; it might be affecting you.<\/p>\n\n<p id=\"790d\"><strong>Informally,\u00a0<em>shortcut learning occurs whenever a model solves a problem by relying on features that are not expected to be relevant, or even present, in general<\/em>.<\/strong><\/p>\n\n<p id=\"4f7d\">A practical example is a dog\/cat classifier that, instead of properly recognizing dog- and cat-features, specializes in detecting leashes. Assuming that a leash means a dog will likely work most of the time, but leashes are not a general descriptor of dogness. That\u2019s lazy work!<\/p>\n\n<p id=\"dfb8\">In other words,&nbsp;<em>the model took a shortcut to solve the problem.&nbsp;<\/em>It cheated.<\/p>\n\n<p id=\"a8fd\">Shortcut learning typically arises when&nbsp;<em>there isn\u2019t enough data to force algorithms into learning the task properly<\/em>. In our dog\/cat example, most dog pictures likely included a leash while cat pictures didn\u2019t. Learning to detect leashes is a far simpler task than recognizing pets: leashes are simple shapes, they contrast with the dog, and they are usually vibrantly colored.<\/p>\n\n<p id=\"c0f2\">The catch is that&nbsp;<em>machine learning algorithms favor the route of least effort.<\/em><\/p>\n\n<p id=\"a5eb\">Put another way, a machine learning algorithm will only learn what you want if that is the easiest thing it can do to maximize its metrics. 
As long as there are leash-like shortcuts, models will cheat.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-bb3bf0f elementor-widget elementor-widget-image\" data-id=\"bb3bf0f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"683\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0QCBCzFjZlxUZXNdE-1024x683.jpeg\" class=\"attachment-large size-large wp-image-18142\" alt=\"Shortcut Learning, The Reason ML Models Often Fail in Practice\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0QCBCzFjZlxUZXNdE-1024x683.jpeg 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0QCBCzFjZlxUZXNdE-300x200.jpeg 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0QCBCzFjZlxUZXNdE-768x512.jpeg 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0QCBCzFjZlxUZXNdE-1536x1024.jpeg 1536w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0QCBCzFjZlxUZXNdE-2048x1365.jpeg 2048w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0QCBCzFjZlxUZXNdE-610x407.jpeg 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0QCBCzFjZlxUZXNdE-750x500.jpeg 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0QCBCzFjZlxUZXNdE-1140x760.jpeg 1140w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Photo by\u00a0<a href=\"https:\/\/unsplash.com\/@mirkosajkov?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"noopener\">Mirko Sajkov<\/a>\u00a0on\u00a0<a href=\"https:\/\/unsplash.com\/?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" 
rel=\"noopener\">Unsplash<\/a><\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6909672 elementor-widget elementor-widget-text-editor\" data-id=\"6909672\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"e0df\">Shortcut learning helps us realize that&nbsp;<em>we don\u2019t give algorithms any real, meaningful instruction on what to learn<\/em>. We can\u2019t blame our tools for slacking \u2014 they just fit data, after all.<\/p>\n\n<p id=\"85fd\">From the above, it follows that<em>&nbsp;we don\u2019t have any reliable way to teach algorithms how to tell the background from the foreground<\/em>. If all dog pictures were taken outdoors and all cat ones indoors, it wouldn\u2019t surprise me to see classifiers becoming experts at detecting sky and grass \u2014 not dogs and cats.<\/p>\n\n<p id=\"e300\">These small observations carry a profound insight into machine learning&#8217;s current nature:&nbsp;<em>it is still a highly random process<\/em>.<\/p>\n\n<p id=\"a515\">To date, the only sure-fire way to reduce the likelihood of shortcuts in your data is to\u2026 add more data!<\/p>\n\n<p id=\"7c65\">If we increase our dog and cat dataset to include outdoor and indoor pictures of each kind, leash-less dogs, and leashed cats, we might get a step closer to a true dog and cat classifier.&nbsp;<em>We might<\/em>. There is no guarantee there aren\u2019t any other shortcuts lingering around \u2014 you never know.<\/p>\n\n<p id=\"fd49\">Removing shortcuts is much harder than it seems. It is quite challenging to disentangle concepts by simply adding contrasting data. For instance, most cars are found outdoors; however, a car is a car regardless of its location. 
If we blindly search for car images, we are bound to get shortcuts such as road = car, asphalt = car, tire = car, etc.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-aef1e3b elementor-widget elementor-widget-heading\" data-id=\"aef1e3b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">To better illustrate this, consider this simple custom dataset:<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a5f040c elementor-widget elementor-widget-image\" data-id=\"a5f040c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"1024\" height=\"396\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1cGju63Ft-slr6GkkXkh5Tw-1024x396.png\" class=\"attachment-large size-large wp-image-18143\" alt=\"Shortcut Learning, The Reason ML Models Often Fail in Practice\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1cGju63Ft-slr6GkkXkh5Tw-1024x396.png 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1cGju63Ft-slr6GkkXkh5Tw-300x116.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1cGju63Ft-slr6GkkXkh5Tw-768x297.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1cGju63Ft-slr6GkkXkh5Tw-610x236.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1cGju63Ft-slr6GkkXkh5Tw-750x290.png 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1cGju63Ft-slr6GkkXkh5Tw-1140x441.png 1140w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1cGju63Ft-slr6GkkXkh5Tw.png 1325w\" 
sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Dataset provided by the Author<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b311a7a elementor-widget elementor-widget-text-editor\" data-id=\"b311a7a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"78e8\">This eight-image dataset shows a single book and a single headset. While no one in their right mind would consider this a proper dataset, let\u2019s give it the benefit of the doubt and wonder how a model would fit it.<\/p>\n\n<p id=\"5eec\">The book images have many straight lines, lots of white, lots of brown, and text, while the headset images have lots of blue, stripes, black, and curves. A model trained on such data would promptly classify any black book as a headset and any brown thing as a book. A CNN, in particular, would quickly fit local patterns such as text = book and stripes = headset.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e2253c5 elementor-widget elementor-widget-heading\" data-id=\"e2253c5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">How can we solve this? 
More data!<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-719dbc4 elementor-widget elementor-widget-text-editor\" data-id=\"719dbc4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"d6de\">After some coding, we got ourselves a Google Images crawler and downloaded a thousand book images and a thousand headset images. Let\u2019s inspect what Google has to offer regarding <a href=\"https:\/\/www.experfy.com\/blog\/bigdata-cloud\/top-books-kickstart-machine-learning-journey\/\" target=\"_blank\" rel=\"noreferrer noopener\">books <\/a>and headsets:<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d448d53 elementor-widget elementor-widget-image\" data-id=\"d448d53\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"1024\" height=\"299\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1_t1xDcOk1AlmXbUph_mgPA-1024x299.png\" class=\"attachment-large size-large wp-image-18144\" alt=\"Shortcut Learning, The Reason ML Models Often Fail in Practice\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1_t1xDcOk1AlmXbUph_mgPA-1024x299.png 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1_t1xDcOk1AlmXbUph_mgPA-300x88.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1_t1xDcOk1AlmXbUph_mgPA-768x225.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1_t1xDcOk1AlmXbUph_mgPA-1536x449.png 1536w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1_t1xDcOk1AlmXbUph_mgPA-2048x599.png 2048w, 
https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1_t1xDcOk1AlmXbUph_mgPA-610x178.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1_t1xDcOk1AlmXbUph_mgPA-750x219.png 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1_t1xDcOk1AlmXbUph_mgPA-1140x333.png 1140w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">First two rows of searching \u201cBook\u201d at Google Images<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-84c2e6e elementor-widget elementor-widget-image\" data-id=\"84c2e6e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"311\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1XC8lIogrCYVAkWW3JKbMWQ-1024x311.png\" class=\"attachment-large size-large wp-image-18145\" alt=\"First two rows of searching \u201cHeadset\u201d at Google Images\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1XC8lIogrCYVAkWW3JKbMWQ-1024x311.png 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1XC8lIogrCYVAkWW3JKbMWQ-300x91.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1XC8lIogrCYVAkWW3JKbMWQ-768x234.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1XC8lIogrCYVAkWW3JKbMWQ-1536x467.png 1536w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1XC8lIogrCYVAkWW3JKbMWQ-2048x623.png 2048w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1XC8lIogrCYVAkWW3JKbMWQ-610x185.png 610w, 
https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1XC8lIogrCYVAkWW3JKbMWQ-750x228.png 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1XC8lIogrCYVAkWW3JKbMWQ-1140x347.png 1140w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">First two rows of searching \u201cHeadset\u201d at Google Images<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-757a378 elementor-widget elementor-widget-text-editor\" data-id=\"757a378\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"9e7b\">It turns out that most book images show book stacks or bookshelves (which, by the way, are quite likely to be made of wood), while all headset images come from shopping sites; they are product illustrations, not real photographs. If we scroll down (a lot), we get pictures of people wearing headsets, opening an avenue for the algorithm to mistake faces for headsets.<\/p>\n\n<p id=\"d4a8\"><strong>In other words,\u00a0<em>blindly pouring in data only made our problem worse<\/em>.<\/strong><\/p>\n\n<p id=\"d2f7\">While all datasets mentioned so far are toy problems, these shortcut issues are more common than we would like to acknowledge. Shortcuts are especially frequent when constructing custom datasets, which is what many people do. Such datasets are typically small, drawn from only a handful of sources, or generated by just a couple of hardworking people.<\/p>\n\n<p id=\"3705\">A practical example is AI applied to medicine. Building a medical imaging dataset is quite expensive, as you need everyone\u2019s permission, only doctors can label it, and some conditions are naturally rare. 
On top of that, images are of high resolution, and their content can be quite complex. It is rather easy for models to fit arbitrary structures instead of learning their intended task. Some images also contain metadata, such as the patient name, hospital logo, and time of day, all of which can become nasty shortcuts.<\/p>\n\n<p id=\"fd62\">Custom datasets involving humans are particularly prone to such issues. In many, there is a limited budget for actors and scenes, leading to poor variety. This setup leaves algorithms free to overfit to specific actors or clothing elements rather than to what they are doing.<\/p>\n\n<p id=\"6152\">To better understand this phenomenon, we need a bit of formality\u2026<\/p>\n\n<p id=\"201c\">Classification is a mapping from some data&nbsp;<em>x&nbsp;<\/em>to a set of labels<em>&nbsp;y,&nbsp;<\/em>or&nbsp;<em>x \u2192 y<\/em>. For instance, if we want to detect persons in pictures,&nbsp;<em>x&nbsp;<\/em>are pictures, and y is whether they contain a person or not. In this framework, a classifier is a function that, given an&nbsp;<em>x,&nbsp;<\/em>answers its&nbsp;<em>y<\/em>. A good classifier will answer the right label most of the time; a bad classifier won\u2019t.<\/p>\n\n<p id=\"9ca2\">Without any loss of generality, we might break down our&nbsp;<em>x&nbsp;<\/em>elements into a \u201csignal,\u201d which is the real thing we want to track, and a set of \u201cnoises\u201d that tarnish it. Thus,&nbsp;<em>x&nbsp;<\/em>can be rewritten as<em>&nbsp;s + n\u2081 + n\u2082 + n\u2083 + \u2026<\/em><\/p>\n\n<p id=\"22b5\">In our original example, the signals are the dogs and cats, the things we really want to track, and the noises are everything else: the leashes, the background elements, the pet color, their pose, etc. 
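To make the decomposition concrete, here is a minimal, hypothetical sketch (scikit-learn on synthetic data; the feature names and numbers are made up) of a model latching onto a "leash" noise term that correlates with the label during training but not in deployment:

```python
# Hypothetical sketch of x = s + n1 + n2 + ...: one "noise" feature (a leash
# indicator) happens to correlate with y at training time.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, n)                  # 1 = dog, 0 = cat

signal = y + rng.normal(0, 2.0, n)         # true "dogness": informative but noisy
leash = y + rng.normal(0, 0.1, n)          # shortcut: nearly every dog is leashed
X_train = np.column_stack([signal, leash])

model = LogisticRegression(max_iter=1000).fit(X_train, y)

# In deployment, leashes no longer correlate with the label.
y_new = rng.integers(0, 2, n)
X_new = np.column_stack([y_new + rng.normal(0, 2.0, n),
                         rng.normal(0.5, 0.1, n)])

print("train accuracy :", model.score(X_train, y))     # near perfect
print("deploy accuracy:", model.score(X_new, y_new))   # drops toward chance
print("weights (signal, leash):", model.coef_[0])      # leash weight dominates
```

The model scores almost perfectly on data where the shortcut holds, then degrades sharply once the leash feature stops carrying information, exactly the failure mode described above.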
By themselves, these noisy elements are harmless.&nbsp;<em>Shortcut learning arises when these are simultaneously easily detectable and correlated with y.&nbsp;<\/em>Thus, detecting the noise becomes the route of least effort \u2014 the shortcut.<\/p>\n\n<p id=\"05b4\">This gets worse with powerful models, such as neural networks and support vector machines. These models\u2019 flexibility might backfire into learning non-trivial couplings of noises, such as \u201cleash or outdoor equals dog\u201d and \u201craised tail equals cat.\u201d&nbsp;<em>Never underestimate a model&#8217;s ability to overfit.<\/em><\/p>\n\n<p id=\"a74e\">It is important to mention that our signal itself is also not a single beacon. Many pet breeds vary in color, and owners might style their pets&nbsp;<em>to their hearts\u2019 content<\/em>. A classifier might fit some of these signals, but not all \u2014 or jointly fit both noise and signal elements.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-31c639c elementor-widget elementor-widget-image\" data-id=\"31c639c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"704\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0MMgeNOptMnQzdU6t-1024x704.jpeg\" class=\"attachment-large size-large wp-image-18146\" alt=\"Shortcut Learning, The Reason ML Models Often Fail in Practice\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0MMgeNOptMnQzdU6t-1024x704.jpeg 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0MMgeNOptMnQzdU6t-300x206.jpeg 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0MMgeNOptMnQzdU6t-768x528.jpeg 768w, 
https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0MMgeNOptMnQzdU6t-1536x1056.jpeg 1536w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0MMgeNOptMnQzdU6t-2048x1407.jpeg 2048w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0MMgeNOptMnQzdU6t-610x419.jpeg 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0MMgeNOptMnQzdU6t-750x515.jpeg 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0MMgeNOptMnQzdU6t-1140x783.jpeg 1140w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Photo by\u00a0<a href=\"https:\/\/unsplash.com\/@ralphkayden?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"noopener\">Ralph (Ravi) Kayden<\/a>\u00a0on\u00a0<a href=\"https:\/\/unsplash.com\/?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"noopener\">Unsplash<\/a><\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a9d0186 elementor-widget elementor-widget-text-editor\" data-id=\"a9d0186\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"090a\">A powerful concept in this analysis is the&nbsp;<em>signal-to-noise ratio<\/em>. As we add more images, some noises cancel out. For instance, as we added unleashed dogs to our data, detecting leashes became increasingly less rewarding. 
The more data we add, the less noise we get, and, gradually, our signal will become the only reliable source for classification.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e813f57 elementor-widget elementor-widget-heading\" data-id=\"e813f57\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">At this point, we are ready to talk about&nbsp;<em>addressing shortcut learning<\/em>. Without changing the problem, there are a couple of ways in which we can alleviate it:<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-689f285 elementor-widget elementor-widget-text-editor\" data-id=\"689f285\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ol><li><strong>Add lots of data<\/strong>: as discussed, one effective way of raising the signal-to-noise ratio is pouring in more data, evening out the noise. For simple domains, however, blindly adding data might work against you. A safer alternative is to\u2026<\/li><li><strong>Add the right data<\/strong>: when it&#8217;s expensive to find quality data, it helps to locate current shortcuts and aim for opposing data. In our example, that means searching for leashed cats and unleashed dogs.<\/li><li><strong>Include regularization<\/strong>: techniques such as dropout and weight decay make the learning task artificially harder, forcing the algorithm to rely on robust signals rather than occasional hints. 
Similarly, you might\u2026<\/li><li><strong>Use smaller, dumber models<\/strong>: tiny models cannot focus on as many properties as bigger ones; therefore, they are less prone to relying on spurious correlations and more likely to pay attention to the true signal.<\/li><\/ol>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-209b3d4 elementor-widget elementor-widget-text-editor\" data-id=\"209b3d4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"823e\">In this regard, it is&nbsp;<em>imperative<\/em>&nbsp;to highlight that&nbsp;<strong>data augmentation is not an effective measure to address shortcut learning<\/strong>. The reason is that most augmentation operations either preserve or amplify pre-existing shortcuts. A leash is still a leash even if flipped, rotated, zoomed, sheared, and hue-shifted. Cutout might help to some extent, but I wouldn\u2019t rely on it.<\/p>\n\n<p id=\"c588\"><em>If you are able to change the problem formulation<\/em>, an effective way of avoiding shortcuts&nbsp;<em>almost entirely<\/em>&nbsp;is recasting classification as detection or segmentation. In these more advanced formulations, you are explicitly telling the algorithm what to look for and requiring it to show its findings. 
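Returning to point 3 of the list above: this is a hypothetical toy sketch (scikit-learn; weight decay corresponds to the L2 penalty, whose inverse strength is the `C` parameter). Shrinking `C` caps how much weight the model can pile onto one convenient feature:

```python
# Hypothetical sketch of regularization as a shortcut dampener: a stronger L2
# penalty (smaller C in scikit-learn) limits the weight placed on a single
# easy, nearly separating "shortcut" feature.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, n)
signal = y + rng.normal(0, 2.0, n)      # weak but general feature
shortcut = y + rng.normal(0, 0.1, n)    # easy, spuriously clean feature
X = np.column_stack([signal, shortcut])

for C in (100.0, 1.0, 0.01):
    w = LogisticRegression(C=C, max_iter=1000).fit(X, y).coef_[0]
    print(f"C={C}: signal weight={w[0]:+.2f}, shortcut weight={w[1]:+.2f}")
# The shortcut weight shrinks sharply as regularization strengthens.
```

This does not make the shortcut disappear, of course; it only makes leaning on it costlier, which matches the hedged claim above that regularization alleviates rather than solves the problem.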
While this approach effectively solves shortcut learning, annotating your existing data might prove to be prohibitive in most cases.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-cb36de2 elementor-widget elementor-widget-image\" data-id=\"cb36de2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"682\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0TJCVQ7Y8YUbs1COu-1024x682.jpeg\" class=\"attachment-large size-large wp-image-18147\" alt=\"The Reason ML Models Often Fail in Practice\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0TJCVQ7Y8YUbs1COu-1024x682.jpeg 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0TJCVQ7Y8YUbs1COu-300x200.jpeg 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0TJCVQ7Y8YUbs1COu-768x512.jpeg 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0TJCVQ7Y8YUbs1COu-1536x1024.jpeg 1536w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0TJCVQ7Y8YUbs1COu-2048x1365.jpeg 2048w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0TJCVQ7Y8YUbs1COu-610x407.jpeg 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0TJCVQ7Y8YUbs1COu-750x500.jpeg 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0TJCVQ7Y8YUbs1COu-1140x760.jpeg 1140w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Photo by\u00a0<a href=\"https:\/\/unsplash.com\/@jdk4lyfe?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"noopener\">Jeanie de Klerk<\/a>\u00a0on\u00a0<a 
href=\"https:\/\/unsplash.com\/?utm_source=medium&amp;utm_medium=referral\" target=\"_blank\" rel=\"noopener\">Unsplash<\/a><\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0ec187f elementor-widget elementor-widget-text-editor\" data-id=\"0ec187f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"b69c\">One aspect we haven\u2019t discussed so far is&nbsp;<em>how to spot shortcut learning<\/em>. This is a tricky subject, as with any form of overfitting; however, we can take some specific actions to uncover shortcuts or at least find evidence that something is wrong.<\/p>\n\n<p id=\"e646\">The direct approach is to run\u00a0<a href=\"https:\/\/christophm.github.io\/interpretable-ml-book\/agnostic.html\" target=\"_blank\" rel=\"noreferrer noopener\">interpretability models<\/a>. These algorithms try to explain which features of your data contribute the most to the models\u2019 outputs. Many of these techniques, such as\u00a0<a href=\"https:\/\/christophm.github.io\/interpretable-ml-book\/shap.html\" target=\"_blank\" rel=\"noreferrer noopener\">SHAP<\/a>, are model agnostic, meaning that they can be applied regardless of the model used.<\/p>\n\n<p id=\"e6d8\">In the case of vision models, most interpretation tools highlight the specific areas that contributed the most to a specific answer. It takes some time to inspect all images, but it certainly pays off. You should look for the obvious: the area that contributes to \u201cdog\u201d should be the dog itself or some of its dog-features, such as its face, ears, and feet.<\/p>\n\n<p id=\"a552\">Occasional exceptions are OK, but you should move them to a separate folder for a second look. 
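Alongside attribution tools like SHAP, one cheap model-agnostic probe is permutation importance: shuffle one feature at a time and watch the score drop. The sketch below is hypothetical (scikit-learn on synthetic tabular stand-ins for "dog shape" and "leash"):

```python
# Hypothetical model-agnostic probe: permutation importance reports how much
# the score drops when each feature is shuffled. A supposedly irrelevant
# feature with huge importance is a shortcut red flag.
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, n)
names = ["dog_shape", "leash"]
X = np.column_stack([y + rng.normal(0, 2.0, n),    # intended signal (weak)
                     y + rng.normal(0, 0.1, n)])   # spurious shortcut (strong)

model = LogisticRegression(max_iter=1000).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for name, imp in zip(names, result.importances_mean):
    print(f"{name}: mean importance = {imp:.3f}")
# Here the "leash" column dwarfs the intended signal: investigate before shipping.
```

For image models the same idea appears as occlusion-style checks, but the tabular version above already conveys the diagnosis: the model's score should not hinge on features you consider irrelevant.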
Inspecting all odd detections together might reveal interesting patterns that you wouldn\u2019t otherwise spot. I also recommend sorting your data by loss. You are bound to find odd entries and badly labeled data at both the low-loss and high-loss ends.<\/p>\n\n<p id=\"bb60\">If you don\u2019t want to go through all this trouble, there are some automated ways of checking if something is odd.&nbsp;<em>These methods won\u2019t generally tell you what is wrong but are sure to raise some eyebrows if something is awry<\/em>.<\/p>\n\n<p id=\"ea86\">A quick assessment is swapping your training\/test split. If you train-on-test and test-on-train, not much should change, apart from somewhat inferior results. If results are significantly lower (or higher!) or the training progresses at a particularly different pace, something might be off.<\/p>\n\n<p id=\"6705\">A more robust assessment is\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Cross-validation_(statistics)\" target=\"_blank\" rel=\"noreferrer noopener\">cross-validation<\/a>. The general idea is to create several training-test splits out of the same dataset and train a model for each. In normal circumstances, all models should reach similar performance levels. If that\u2019s not the case, your model is not truly generalizing to unseen data.<\/p>\n\n<p id=\"6a87\">In this article, we overviewed what shortcut learning is, why it is so hard to get rid of, a couple of measures we can take to handle it, and, finally, how we can search for shortcuts directly and indirectly. If you are working with custom, in-house datasets, I highly encourage you to go through some of the outlined steps and see if your models are really fitting the intended signals and, if they aren\u2019t, uncover what they are really fitting.<\/p>\n\n<p id=\"1fe2\">Throughout this piece, I focused on image classification. However, this is not strictly a vision issue. There can be shortcuts in text, graphs, and audio as well. 
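The cross-validation check described above can be sketched in a few lines (hypothetical, scikit-learn; the dataset is a synthetic placeholder):

```python
# Hypothetical sketch of the cross-validation sanity check: one model per
# split, then compare scores. Wildly uneven fold scores hint that the model
# is fitting split-specific quirks rather than a general signal.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)

print("fold scores:", np.round(scores, 3))
print("mean, std  :", round(scores.mean(), 3), round(scores.std(), 3))
# Rule of thumb (an assumption, not a standard): take a closer look whenever
# the fold-to-fold spread is large relative to the mean.
```

The same `cross_val_score` call works for the train-on-test swap intuition too: each fold is, in effect, a different train/test arrangement of the same data.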
Similar issues also arise in regression and reinforcement learning scenarios.<\/p>\n\n<p id=\"ef78\">I hope this has been as pleasant a read for you as it was for me to write. If you would like to read more on the topic, I highly recommend the recent work of Geirhos\u00a0<em>et al.\u00a0<\/em><a href=\"https:\/\/arxiv.org\/abs\/2004.07780\" target=\"_blank\" rel=\"noreferrer noopener\">You can find it here.<\/a><\/p>\n\n<p id=\"bdc1\">Thanks for reading!<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>Training machine learning models is far from easy. Shortcut learning typically arises when there isn\u2019t enough data to force algorithms into learning the task properly.<\/p>\n","protected":false},"author":996,"featured_media":18148,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[563,92,1110,1111],"ppma_author":[3859],"class_list":["post-22493","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-algorithms","tag-machine-learning","tag-ml-models","tag-shortcut-learning"],"authors":[{"term_id":3859,"user_id":996,"is_guest":0,"slug":"serpa","display_name":"Ygor Rebou\u00e7as Serpa","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/10c04385a4cc7e1aba3fc8d1561f4778f1cd025cd5a15a2fc39f55e0da7d24da?s=96&d=mm&r=g","user_url":"https:\/\/medium.com\/@ygorrebouasserpa%20","last_name":"Rebou\u00e7as Serpa","first_name":"Ygor","job_title":"","description":"Ygor Rebou\u00e7as Serpa is a Data Scientist and Game Developer. 
He is currently working on explainable AI models applied to healthcare."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22493","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/996"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=22493"}],"version-history":[{"count":4,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22493\/revisions"}],"predecessor-version":[{"id":33125,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22493\/revisions\/33125"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/18148"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=22493"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=22493"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=22493"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=22493"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}