{"id":2313,"date":"2020-03-12T05:03:45","date_gmt":"2020-03-12T05:03:45","guid":{"rendered":"http:\/\/kusuaks7\/?p=1918"},"modified":"2024-01-01T10:11:17","modified_gmt":"2024-01-01T10:11:17","slug":"understanding-limits-of-cnns-ai-greatest-achievements","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/understanding-limits-of-cnns-ai-greatest-achievements\/","title":{"rendered":"Understanding The Limits Of CNNs, One Of AI\u2019s Greatest Achievements"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"2313\" class=\"elementor elementor-2313\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-44e72e39 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"44e72e39\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-80a0eda\" data-id=\"80a0eda\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-1af7fdea elementor-widget elementor-widget-text-editor\" data-id=\"1af7fdea\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tAfter a prolonged winter, artificial intelligence is experiencing a scorching summer mainly thanks to advances in deep learning and\u00a0<a href=\"https:\/\/bdtechtalks.com\/2019\/08\/05\/what-is-artificial-neural-network-ann\/\" rel=\"noopener\">artificial neural networks<\/a>. 
To be more precise, the renewed interest in deep learning is largely due to the success of\u00a0<a href=\"https:\/\/bdtechtalks.com\/2020\/01\/06\/convolutional-neural-networks-cnn-convnets\/\" rel=\"noopener\">convolutional neural networks (CNNs)<\/a>, a neural network structure that is especially good at dealing with visual data.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-65a0186 elementor-widget elementor-widget-text-editor\" data-id=\"65a0186\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tBut what if I told you that CNNs are fundamentally flawed? That was what Geoffrey Hinton, one of the pioneers of\u00a0<a href=\"https:\/\/bdtechtalks.com\/2019\/02\/15\/what-is-deep-learning-neural-networks\/\" rel=\"noopener\">deep learning<\/a>, talked about in his keynote speech at the AAAI conference, one of the main yearly AI conferences.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-30e4b98 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"30e4b98\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-e744e20\" data-id=\"e744e20\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-cf4d4f8 elementor-widget elementor-widget-video\" data-id=\"cf4d4f8\" data-element_type=\"widget\" data-e-type=\"widget\" 
data-settings=\"{&quot;youtube_url&quot;:&quot;https:\\\/\\\/www.youtube.com\\\/embed\\\/UX8OubxsY8w?version=3&amp;rel=1&amp;fs=1&amp;autohide=2&amp;showsearch=0&amp;showinfo=1&amp;iv_load_policy=1&amp;wmode=transparent&quot;,&quot;video_type&quot;:&quot;youtube&quot;,&quot;controls&quot;:&quot;yes&quot;}\" data-widget_type=\"video.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-wrapper elementor-open-inline\">\n\t\t\t<div class=\"elementor-video\"><\/div>\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-355aeb8 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"355aeb8\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-a132445\" data-id=\"a132445\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-99719f7 elementor-widget elementor-widget-text-editor\" data-id=\"99719f7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tHinton, who attended the conference with Yann LeCun and Yoshua Bengio, with whom he forms the Turing Award\u2013winning \u201c<a href=\"https:\/\/www.bloomberg.com\/news\/articles\/2019-03-27\/three-godfathers-of-deep-learning-selected-for-turing-award\" class=\"broken_link\" rel=\"noopener\">godfathers of deep learning<\/a>\u201d trio, spoke about the limits of CNNs as well as capsule networks, his 
master plan for the next breakthrough in AI.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-1283d7e elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"1283d7e\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-5ebdf93\" data-id=\"5ebdf93\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-a82b184 elementor-widget elementor-widget-text-editor\" data-id=\"a82b184\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tAs in all his speeches, Hinton went into a lot of technical detail about what makes convnets inefficient (or different) compared to the human visual system. Following are some of the key points he raised. 
But first, as is our habit, some background on how we got here and why CNNs have become such a big deal for the AI community.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-db770eb elementor-widget elementor-widget-heading\" data-id=\"db770eb\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2>Solving computer vision<\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-56a4904 elementor-widget elementor-widget-text-editor\" data-id=\"56a4904\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tSince the early days of artificial intelligence, scientists have sought to create computers that could see the world like humans. These efforts have led to a dedicated field of research known as\u00a0<a href=\"https:\/\/bdtechtalks.com\/2019\/01\/14\/what-is-computer-vision\/\" rel=\"noopener\">computer vision<\/a>.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-54525db elementor-widget elementor-widget-text-editor\" data-id=\"54525db\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\nEarly work in computer vision involved the use of\u00a0<a href=\"https:\/\/bdtechtalks.com\/2019\/11\/18\/what-is-symbolic-artificial-intelligence\/\" rel=\"noopener\">symbolic artificial intelligence<\/a>, software in which every single rule must be specified by human programmers. The problem is that not every function of the human visual apparatus can be broken down into explicit computer program rules. 
The approach ended up having very limited success and use.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-08ee181 elementor-widget elementor-widget-text-editor\" data-id=\"08ee181\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tA different approach was the use of\u00a0<a href=\"https:\/\/bdtechtalks.com\/2017\/08\/28\/artificial-intelligence-machine-learning-deep-learning\/\" rel=\"noopener\">machine learning<\/a>. Unlike symbolic AI, machine learning algorithms are given a general structure and left to develop their own behavior by examining training examples. However, most early machine learning algorithms still required a lot of manual effort to engineer the parts that detect relevant features in images.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-980400a elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"980400a\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-25bc57b\" data-id=\"25bc57b\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-d968002 elementor-widget elementor-widget-image\" data-id=\"d968002\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" 
src=\"https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2018\/12\/classic-machine-learning-breast-cancer-detection.png?resize=696%2C393&#038;ssl=1\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-193c4a4 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"193c4a4\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-4f10023\" data-id=\"4f10023\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-fd8d369 elementor-widget elementor-widget-text-editor\" data-id=\"fd8d369\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p style=\"text-align: center;\"><span style=\"text-align: center; background-color: rgba(0, 0, 0, 0.05);\">Classic machine learning approaches involved lots of complicated steps and required the collaboration of dozens of domain experts, mathematicians, and programmers<\/span><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-2c3ea33 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"2c3ea33\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container 
elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-8c9ce9d\" data-id=\"8c9ce9d\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-469ccac elementor-widget elementor-widget-text-editor\" data-id=\"469ccac\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<a href=\"https:\/\/www.experfy.com\/blog\/ai-ml\/what-are-convolutional-neural-networks-cnn\/\">Convolutional neural networks<\/a>, on the other hand, are end-to-end AI models that develop their own feature-detection mechanisms. A well-trained CNN with multiple layers automatically recognizes features in a hierarchical way, starting with simple edges and corners and building up to complex objects such as faces, chairs, cars, and dogs.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2a28b07 elementor-widget elementor-widget-text-editor\" data-id=\"2a28b07\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tCNNs were first introduced in the 1980s by LeCun, then a postdoctoral research associate in Hinton\u2019s lab at the University of Toronto. But because of their immense compute and data requirements, they fell by the wayside and gained very limited adoption. 
It took three decades and advances in computation hardware and data storage technology for CNNs to manifest their full potential.\n<p style=\"text-align: center;\">Today, thanks to the availability of large computation clusters, specialized hardware, and vast amounts of data, convnets have found\u00a0<a href=\"https:\/\/bdtechtalks.com\/2019\/12\/30\/computer-vision-applications-deep-learning\/\" rel=\"noopener\">many useful applications<\/a>\u00a0in image classification and object recognition.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2cd85bc elementor-widget elementor-widget-heading\" data-id=\"2cd85bc\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2>The difference between CNNs and human vision<\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-6f497b7 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"6f497b7\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-8db0ce6\" data-id=\"8db0ce6\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-2b0253d elementor-widget elementor-widget-text-editor\" data-id=\"2b0253d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div 
class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\u201cCNNs learn everything end to end. They get a huge win by wiring in the fact that if a feature is good in one place, it\u2019s good somewhere else. This allows them to combine evidence and generalize nicely across position,\u201d Hinton said in his AAAI speech. \u201cBut they\u2019re very different from human perception.\u201d\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-4ed0484 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"4ed0484\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-3541f31\" data-id=\"3541f31\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-2b3adb2 elementor-widget elementor-widget-text-editor\" data-id=\"2b3adb2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tOne of the key challenges of computer vision is to deal with the variance of data in the real world. Our visual system can recognize objects from different angles, against different backgrounds, and under different lighting conditions. 
When objects are partially obscured by other objects or colored in eccentric ways, our vision system uses cues and other pieces of knowledge to fill in the missing information and reason about what we\u2019re seeing.\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-3842576 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"3842576\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-3b22a23\" data-id=\"3b22a23\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-69b1a14 elementor-widget elementor-widget-text-editor\" data-id=\"69b1a14\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tCreating AI that can replicate the same object recognition capabilities has proven to be very difficult.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f33f5fb elementor-widget elementor-widget-text-editor\" data-id=\"f33f5fb\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\u201cCNNs are designed to cope with translations,\u201d Hinton said. This means that a well-trained convnet can identify an object regardless of where it appears in an image. 
But they\u2019re not so good at dealing with other effects of changing viewpoints such as rotation and scaling.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-85b96c3 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"85b96c3\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-8d8a555\" data-id=\"8d8a555\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-d43045c elementor-widget elementor-widget-text-editor\" data-id=\"d43045c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tOne approach to solving this problem, according to Hinton, is to use 4D or 6D maps to train the AI and later perform object detection. \u201cBut that just gets hopelessly expensive,\u201d he added.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-fa039a1 elementor-widget elementor-widget-text-editor\" data-id=\"fa039a1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\nFor the moment, the best solution we have is to gather massive amounts of images that display each object in various positions. 
Then we train our CNNs on this huge dataset, hoping that they will see enough examples of the object to generalize and be able to detect the object with reliable accuracy in the real world. Datasets such as ImageNet, which contains more than 14 million annotated images, aim to achieve just that.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-bd70272 elementor-widget elementor-widget-text-editor\" data-id=\"bd70272\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\u201cThat\u2019s not very efficient,\u201d Hinton said. \u201cWe\u2019d like neural nets that generalize to new viewpoints effortlessly. If they learned to recognize something, and you make it 10 times as big and you rotate it 60 degrees, it shouldn\u2019t cause them any problem at all. We know computer graphics is like that and we\u2019d like to make neural nets more like that.\u201d\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-f4b10cd elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"f4b10cd\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-5edef22\" data-id=\"5edef22\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-b1a8437 elementor-widget elementor-widget-text-editor\" data-id=\"b1a8437\" data-element_type=\"widget\" data-e-type=\"widget\" 
data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tIn fact, ImageNet, which is currently the go-to benchmark for evaluating computer vision systems, has proven to be flawed. Despite its huge size, the dataset fails to capture all the possible angles and positions of objects. It is mostly composed of images that have been taken under ideal lighting conditions and from known angles.\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-2ec8cc2 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"2ec8cc2\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-d0b989f\" data-id=\"d0b989f\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-c480e79 elementor-widget elementor-widget-text-editor\" data-id=\"c480e79\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThis is acceptable for the human vision system, which can easily generalize its knowledge. 
In fact, after we see a certain object from a few angles, we can usually imagine what it would look like in new positions and under different visual conditions.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-8f3a6d9 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"8f3a6d9\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-389b63a\" data-id=\"389b63a\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-2a4e7d9 elementor-widget elementor-widget-text-editor\" data-id=\"2a4e7d9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\nBut CNNs need detailed examples of the cases they need to handle, and they don\u2019t have the creativity of the human mind. Deep learning developers usually try to solve this problem by applying a process called \u201cdata augmentation,\u201d in which they flip the image or rotate it by small amounts before training their neural networks. In effect, the CNN will be trained on multiple copies of every image, each being slightly different. This will help the AI better generalize over variations of the same object. 
Data augmentation, to some degree, makes the AI model more robust.\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-9a71e0d elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"9a71e0d\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-d23d465\" data-id=\"d23d465\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-cd9f901 elementor-widget elementor-widget-text-editor\" data-id=\"cd9f901\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tBut data augmentation won\u2019t cover corner cases that CNNs and other neural networks can\u2019t handle, such as an upturned chair, or a crumpled t-shirt lying on a bed. 
These are real-life situations that can\u2019t be reproduced with pixel manipulation.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-f998496 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"f998496\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-b23c9a0\" data-id=\"b23c9a0\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-ff0be05 elementor-widget elementor-widget-image\" data-id=\"ff0be05\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/i0.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2019\/12\/objectnet_controls_table.png?fit=296%2C300&#038;ssl=1\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ef2cec2 elementor-widget elementor-widget-text-editor\" data-id=\"ef2cec2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p style=\"text-align: center;\"><span style=\"background-color: rgba(0, 0, 0, 0.05);\">\u00a0ImageNet vs. reality: In ImageNet (left column), objects are neatly positioned against ideal backgrounds and under ideal lighting conditions. 
In the real world, things are messier (source: objectnet.dev)<\/span><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-74185c3 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"74185c3\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-324841b\" data-id=\"324841b\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-9bbc458 elementor-widget elementor-widget-text-editor\" data-id=\"9bbc458\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThere have been efforts to solve this generalization problem by creating\u00a0<a href=\"https:\/\/bdtechtalks.com\/2019\/12\/16\/objectnet-dataset-ai-computer-vision\/\" rel=\"noopener\">computer vision benchmarks and training datasets<\/a>\u00a0that better represent the messy reality of the real world. But while they will improve the results of current AI systems, they don\u2019t solve the fundamental problem of generalizing across viewpoints. There will always be new angles, new lighting conditions, new colorings, and poses that these new datasets don\u2019t contain. 
And those new situations will befuddle even the largest and most advanced AI system.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-5730c13 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"5730c13\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-043961a\" data-id=\"043961a\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-44895fb elementor-widget elementor-widget-heading\" data-id=\"44895fb\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2>Differences can be dangerous<\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d133f3a elementor-widget elementor-widget-text-editor\" data-id=\"d133f3a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tFrom the points raised above, it is obvious that CNNs recognize objects in a way that is very different from humans. But these differences are not limited to weak generalization and the need for many more examples to learn an object. 
The internal representations that CNNs develop of objects are also very different from those of the biological neural network of the human brain.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f252fac elementor-widget elementor-widget-text-editor\" data-id=\"f252fac\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tHow does this manifest itself? \u201cI can take an image and a tiny bit of noise and CNNs will recognize it as something completely different and I can hardly see that it\u2019s changed. That seems really bizarre and I take that as evidence that CNNs are actually using very different information from us to recognize images,\u201d Hinton said in his keynote speech at the AAAI Conference.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-dc9673f elementor-widget elementor-widget-text-editor\" data-id=\"dc9673f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThese slightly modified images are known as \u201c<a href=\"https:\/\/bdtechtalks.com\/2018\/12\/27\/deep-learning-adversarial-attacks-ai-malware\/\" rel=\"noopener\">adversarial examples<\/a>,\u201d and are a hot area of research in the AI community.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-66a3855 elementor-widget elementor-widget-image\" data-id=\"66a3855\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/i2.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2019\/04\/artificial-intelligence-adversarial-example-panda.png?fit=573%2C227&#038;ssl=1\" alt=\"\" 
\/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-376d212 elementor-widget elementor-widget-text-editor\" data-id=\"376d212\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p style=\"text-align: center;\"><span style=\"background-color: rgba(0, 0, 0, 0.05);\">Adversarial examples can cause neural networks to misclassify images while appearing unchanged to the human eye<\/span><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-bdc7eb7 elementor-widget elementor-widget-text-editor\" data-id=\"bdc7eb7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\u201cIt\u2019s not that it\u2019s wrong, they\u2019re just doing it in a very different way, and their very different way has some differences in how it generalizes,\u201d Hinton says.\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-dce487e elementor-widget elementor-widget-text-editor\" data-id=\"dce487e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tBut many examples show that adversarial perturbations can be extremely dangerous. It\u2019s all cute and funny when your image classifier mistakenly tags a panda as a gibbon. 
But when it\u2019s the computer vision system of a self-driving car missing a stop sign, an evil hacker bypassing a facial recognition security system, or Google Photos\u00a0<a href=\"https:\/\/mashable.com\/2015\/07\/01\/google-photos-black-people-gorillas\/?europe=true\" target=\"_blank\" rel=\"noopener noreferrer\">tagging humans as gorillas<\/a>, then you have a problem.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-aa5ff39 elementor-widget elementor-widget-text-editor\" data-id=\"aa5ff39\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThere have been a lot of studies around\u00a0<a href=\"https:\/\/bdtechtalks.com\/2019\/08\/20\/ai-adversarial-examples-hierarchical-random-switching\/\" rel=\"noopener\">detecting adversarial vulnerabilities<\/a>\u00a0and creating\u00a0<a href=\"https:\/\/bdtechtalks.com\/2019\/02\/20\/mit-ibm-ai-robustness-adversarial-examples\/\" rel=\"noopener\">robust AI systems<\/a>\u00a0that are resilient against adversarial perturbations. But adversarial examples also bear a reminder: Our visual system has evolved over generations to process the world around us, and we have also created our world to accommodate our visual system. 
Therefore, as long as our computer vision systems work in ways that are fundamentally different from human vision, they will be unpredictable and unreliable, unless they\u2019re supported by complementary technologies such as lidar and radar mapping.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-22fe28a elementor-widget elementor-widget-heading\" data-id=\"22fe28a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2>Coordinate frames and part-whole relationships are important<\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-70564bf elementor-widget elementor-widget-text-editor\" data-id=\"70564bf\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tAnother problem that Geoffrey Hinton pointed to in his AAAI keynote speech is that convolutional neural networks can\u2019t understand images in terms of objects and their parts. They recognize them as blobs of pixels arranged in distinct patterns. They do not have explicit internal representations of entities and their relationships.\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5148340 elementor-widget elementor-widget-text-editor\" data-id=\"5148340\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\u201cYou can think of CNNs as you center of various pixel locations and you get richer and richer descriptions of what is happening at that pixel location that depends on more and more context. 
And in the end, you get such a rich description that you know what objects are in the image. But they don\u2019t explicitly parse images,\u201d Hinton said.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3e26583 elementor-widget elementor-widget-text-editor\" data-id=\"3e26583\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tOur understanding of the composition of objects helps us understand the world and make sense of things we haven\u2019t seen before, such as this bizarre teapot.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ac4c798 elementor-widget elementor-widget-image\" data-id=\"ac4c798\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/03\/Toilet-Teapot.jpg?fit=500%2C413&#038;ssl=1\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-bb21645 elementor-widget elementor-widget-text-editor\" data-id=\"bb21645\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tAlso missing from CNNs are coordinate frames, a fundamental component of human vision. Basically, when we see an object, we develop a mental model of its orientation, and this helps us to parse its different features. For instance, in the following picture, consider the face on the right. If you turn it upside down, you\u2019ll get the face on the left. But in reality, you don\u2019t need to physically flip the image to see the face on the left. 
Merely mentally adjusting your coordinate frame will enable you to see both faces, regardless of the picture\u2019s orientation.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-68091ed elementor-widget elementor-widget-image\" data-id=\"68091ed\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/i1.wp.com\/bdtechtalks.com\/wp-content\/uploads\/2020\/03\/two-way-head-optical-illusion.jpg?fit=500%2C313&#038;ssl=1\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-fde3188 elementor-widget elementor-widget-text-editor\" data-id=\"fde3188\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\u201cYou have a completely different internal percept depending on what coordinate frame you impose. Convolutional neural nets really can\u2019t explain that. You give them an input, they have one percept, and the percept doesn\u2019t depend on imposing coordinate frames. 
I would like to think that that is linked to adversarial examples and linked to the fact that convolutional nets are doing perception in a very different way from people,\u201d Hinton says.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c1befb7 elementor-widget elementor-widget-heading\" data-id=\"c1befb7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2>Taking lessons from computer graphics<\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f9d89ac elementor-widget elementor-widget-text-editor\" data-id=\"f9d89ac\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tOne very handy approach to solving computer vision, Hinton argued in his speech at the AAAI Conference, is to do inverse graphics. 3D computer graphics models are composed of hierarchies of objects. Each object has a transformation matrix that defines its translation, rotation, and scale in comparison to its parent. The transformation matrix of the top object in each hierarchy defines its coordinates and orientation relative to the world origin.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0064107 elementor-widget elementor-widget-text-editor\" data-id=\"0064107\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tFor instance, consider the 3D model of a car. The base object has a 4\u00d74 transformation matrix that says the car\u2019s center is located at, say, coordinates (X=10, Y=10, Z=0) with rotation (X=0, Y=0, Z=90). 
The car itself is composed of many objects, such as wheels, chassis, steering wheel, windshield, gearbox, engine, etc. Each of these objects has its own transformation matrix that defines its location and orientation relative to its parent (the center of the car). For instance, the center of the front-left wheel is located at (X=-1.5, Y=2, Z=-0.3) relative to the car\u2019s center. The world coordinates of the front-left wheel can be obtained by multiplying its transformation matrix by that of its parent.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d4f7ea5 elementor-widget elementor-widget-text-editor\" data-id=\"d4f7ea5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tSome of these objects might have their own set of children. For instance, the wheel is composed of a tire, a rim, a hub, nuts, etc. Each of these children has its own transformation matrix.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6faf897 elementor-widget elementor-widget-text-editor\" data-id=\"6faf897\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tUsing this hierarchy of coordinate frames makes it very easy to locate and visualize objects regardless of their pose, orientation, or viewpoint. When you want to render an object, each triangle in the 3D object is multiplied by its transformation matrix and those of its parents. 
It is then oriented with the viewpoint (another matrix multiplication) and then transformed to screen coordinates before being rasterized into pixels.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8147cd7 elementor-widget elementor-widget-text-editor\" data-id=\"8147cd7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\u201cIf you say [to someone working in computer graphics], \u2018Could you show me that from another angle,\u2019 they won\u2019t say, \u2018Oh, well, I\u2019d like to, but we didn\u2019t train from that angle so we can\u2019t show it to you from that angle.\u2019 They just show it to you from another angle because they have a 3D model and they model a spatial structure as the relations between parts and wholes and those relationships don\u2019t depend on viewpoint at all,\u201d Hinton says. \u201cI think it\u2019s crazy not to make use of that beautiful structure when dealing with images of 3D objects.\u201d\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-fd70317 elementor-widget elementor-widget-text-editor\" data-id=\"fd70317\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tCapsule networks, Hinton\u2019s ambitious new project, try to do inverse computer graphics. 
While capsules deserve their own separate set of articles, the basic idea behind them is to take an image, extract its objects and their parts, define their coordinate frames, and create a modular structure of the image.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a602942 elementor-widget elementor-widget-text-editor\" data-id=\"a602942\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tCapsule networks are still in the works, and since their introduction in 2017, they have undergone several iterations. But if Hinton and his colleagues succeed in making them work, we will be much closer to replicating human vision.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>Artificial intelligence is experiencing a scorching summer mainly thanks to advances in deep learning and&nbsp;artificial neural networks. To be more precise, the renewed interest in deep learning is largely due to the success of&nbsp;convolutional neural networks (CNNs), a neural network structure that is especially good at dealing with visual data. 
Early work in computer vision involved the use of&nbsp;symbolic artificial intelligence, software in which every single rule must be specified by human programmers.<\/p>\n","protected":false},"author":109,"featured_media":8200,"comment_status":"open","ping_status":"open","sticky":false,"template":"single-post-2.php","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[226,97,542,918,487,919,920],"ppma_author":[1946],"class_list":["post-2313","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-ai","tag-artificial-intelligence","tag-cnn","tag-cnns","tag-computer-vision","tag-convolutional-neural-networks","tag-human-vision"],"authors":[{"term_id":1946,"user_id":109,"is_guest":0,"slug":"ben-dickson","display_name":"Ben Dickson","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/04\/medium_8aaf6bea-c4c1-455f-8156-8007d70910f8-150x150.jpg","user_url":"https:\/\/bdtechtalks.com\/","last_name":"Dickson","first_name":"Ben","job_title":"","description":"Ben Dickson is an experienced software engineer and tech blogger. 
He contributes regularly to major tech websites such as the Next Web, the Daily Dot, PCMag.com, Cointelegraph, VentureBeat, International Business Times UK, and The Huffington Post."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/2313","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/109"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=2313"}],"version-history":[{"count":7,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/2313\/revisions"}],"predecessor-version":[{"id":35260,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/2313\/revisions\/35260"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/8200"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=2313"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=2313"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=2313"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=2313"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}