{"id":22746,"date":"2021-04-16T07:46:00","date_gmt":"2021-04-16T07:46:00","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/neural-network-anatomical-structures\/"},"modified":"2023-08-26T06:52:48","modified_gmt":"2023-08-26T06:52:48","slug":"neural-network-anatomical-structures","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/neural-network-anatomical-structures\/","title":{"rendered":"Two Fundamental Neural Network Anatomical Structures"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"22746\" class=\"elementor elementor-22746\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-fc66bf6 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"fc66bf6\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-0c7ac16\" data-id=\"0c7ac16\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-51d6e4a elementor-widget elementor-widget-text-editor\" data-id=\"51d6e4a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p class=\"has-medium-font-size\">Fully-connected versus Convolutional<\/p>\n<p id=\"bf50\">Like other organisms, artificial neural networks have evolved through the ages. In this post, we cover two key anatomies that have emerged: fully-connected versus convolutional. 
The second one is better suited to image-processing problems, in which features are local within a spatial geometry. The first one is generally appropriate for problems in which there isn\u2019t such a geometry and spatial locality of features is not paramount.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-fd45c8a elementor-widget elementor-widget-heading\" data-id=\"fd45c8a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Single Neurons<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-adcb7ef elementor-widget elementor-widget-text-editor\" data-id=\"adcb7ef\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"370b\">Let\u2019s start with models of single artificial neurons, the \u201cLego bricks\u201d of neural networks. A neuron takes a vector&nbsp;<strong>x<\/strong>&nbsp;as input and derives a scalar output y from it. Most neuron models conform to y = f(<strong>wx<\/strong> + b). Here&nbsp;<strong>w<\/strong>&nbsp;is the vector of weights of the same dimensionality as&nbsp;<strong>x<\/strong>, and b is a scalar called the neuron\u2019s bias. 
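The model y = f(wx + b) translates directly into code. A minimal sketch in Python with NumPy; the example numbers are made up purely for illustration:

```python
import numpy as np

def neuron(x, w, b, f):
    # A single artificial neuron: scalar output y = f(w . x + b).
    return f(np.dot(w, x) + b)

# Made-up example: 2-dimensional input, linear activation f(u) = u.
x = np.array([1.0, 2.0])
w = np.array([0.5, -0.25])
b = 0.1
y = neuron(x, w, b, f=lambda u: u)
# y = 0.5*1.0 + (-0.25)*2.0 + 0.1 = 0.1
```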
This is graphically depicted below.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f7e0dcd elementor-widget elementor-widget-image\" data-id=\"f7e0dcd\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img fetchpriority=\"high\" decoding=\"async\" width=\"635\" height=\"245\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/064uKvuiIx-m8fFZp.png\" class=\"attachment-large size-large wp-image-19159\" alt=\"Two Fundamental Neural Network Anatomical Structures\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/064uKvuiIx-m8fFZp.png 635w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/064uKvuiIx-m8fFZp-300x116.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/064uKvuiIx-m8fFZp-610x235.png 610w\" sizes=\"(max-width: 635px) 100vw, 635px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Multiply the input vector with a weight vector, then feed through an activation function<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6dfcaf0 elementor-widget elementor-widget-text-editor\" data-id=\"6dfcaf0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"160a\">The configurable part of this is the neuron\u2019s activation function f. 
Different choices of&nbsp;<em>f<\/em>&nbsp;lead to neurons with quite different capabilities.<\/p>\n<p id=\"8f08\">The most common choices are linear:&nbsp;<em>f<\/em>(<em>u<\/em>) =&nbsp;<em>u<\/em>, step:&nbsp;<em>f<\/em>(<em>u<\/em>) = sign(<em>u<\/em>), and sigmoid: f(u) =&nbsp;<em>1<\/em>\/(1+e^-<em>u<\/em>). These are depicted below.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3dabc3f elementor-widget elementor-widget-image\" data-id=\"3dabc3f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"452\" height=\"258\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0nmqR9QtxOUUKHjhM.png\" class=\"attachment-large size-large wp-image-19160\" alt=\"Activation Functions\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0nmqR9QtxOUUKHjhM.png 452w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0nmqR9QtxOUUKHjhM-300x171.png 300w\" sizes=\"(max-width: 452px) 100vw, 452px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Activation functions: sigmoid, linear, step<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3a88f06 elementor-widget elementor-widget-text-editor\" data-id=\"3a88f06\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"6a63\">When the target output y is expected to be (approximately) a linear function of&nbsp;<strong>x<\/strong>, a linear activation function is called for. 
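The three activation functions just listed can be written down directly; a quick sketch:

```python
import numpy as np

def linear(u):
    return u                          # f(u) = u

def step(u):
    return np.sign(u)                 # f(u) = sign(u)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))  # f(u) = 1/(1+e^-u)

# The sigmoid squashes any real input into (0, 1); sigmoid(0) = 0.5.
```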
This is the setting of linear regression.<\/p>\n<p id=\"9ce3\">When the target output y is binary, we have a choice: a sigmoidal activation function or a step activation function. Both theory and practice favor the sigmoid.<\/p>\n<p id=\"44bf\">There are several reasons for this.<\/p>\n<p id=\"466e\">One is that the sigmoid is differentiable whereas the step function is not. This allows the neuron\u2019s parameters (the weights\u00a0<strong>w<\/strong>\u00a0and the bias b) to be trained via gradient descent on a data set of input-output pairs.<\/p>\n<p id=\"725d\">A second one is that using a sigmoid captures more information, which can often be used profitably. For example, consider two situations, one in which the neuron\u2019s output is 0.9 and one in which it is 0.7. We might be inclined to classify both as 1s. (Remember we seek a binary output.) Should we do so, it makes sense to attach higher confidence to the first one since the neuron\u2019s output was higher.<\/p>\n<p id=\"c141\">Another way we can use the additional information in the neuron\u2019s output is by adjusting the binary classification threshold. This lets us become more (or less) conservative in our decision-making.<\/p>\n<p id=\"7509\">The key point here is that the classification threshold can be adjusted post-training. In fact, any time we wish to. (This is especially useful after the neuron starts making decisions in the \u201cfield\u201d and we realize we\u2019d like to tweak its behavior.) This threshold is not a parameter during training. Only the weight vector&nbsp;<strong>w<\/strong>&nbsp;and bias b are.<\/p>\n<p id=\"abb0\">Simply put, if we observe the neuron is overly sensitive (i.e., its outputs are sometimes towards 1 when the target is 0), we can increase the classification threshold. 
Similarly, if the neuron is not sensitive enough, we can decrease the classification threshold.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-47ade55 elementor-widget elementor-widget-heading\" data-id=\"47ade55\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Networks of Neurons<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1b2363a elementor-widget elementor-widget-text-editor\" data-id=\"1b2363a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"4799\">A single neuron is already an example of a (minimal) neural network, and a useful one at that. It can be used to map a vector of inputs to a numeric or binary output. That is, to solve regression and binary classification problems.<\/p>\n<p id=\"ccb8\">Often, though, not very well.<\/p>\n<p id=\"f769\">The first breakthrough is an intermediate layer of neurons between the input and the output. 
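A forward pass through such a network, with a sigmoidal hidden layer feeding a single sigmoid output neuron for binary classification, might be sketched as follows. The shapes and the all-zero weights are illustrative assumptions chosen so the result is easy to check by hand:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def forward(x, W_h, b_h, w_o, b_o):
    # x: (d,) input -> h: (m,) sigmoidal hidden features -> scalar sigmoid output.
    h = sigmoid(W_h @ x + b_h)            # hidden feature vector
    return sigmoid(np.dot(w_o, h) + b_o)  # binary-classification output

# Illustrative shapes: 3 inputs, 2 hidden neurons, all-zero parameters.
x = np.array([1.0, 0.0, -1.0])
W_h = np.zeros((2, 3))
b_h = np.zeros(2)
y = forward(x, W_h, b_h, w_o=np.zeros(2), b_o=0.0)
# With all-zero weights every sigmoid sees input 0, so y = 0.5.
```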
This is graphically depicted below.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7086612 elementor-widget elementor-widget-image\" data-id=\"7086612\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"392\" height=\"219\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0Z6FKpQoea5i1GeD3.png\" class=\"attachment-large size-large wp-image-19161\" alt=\"Two Fundamental Neural Network Anatomical Structures\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0Z6FKpQoea5i1GeD3.png 392w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0Z6FKpQoea5i1GeD3-300x168.png 300w\" sizes=\"(max-width: 392px) 100vw, 392px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">A hidden layer between X and y. Sigmoidal hidden neurons are especially attractive<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-dfa3d71 elementor-widget elementor-widget-text-editor\" data-id=\"dfa3d71\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"56fb\">The intermediate layer is called a hidden layer. In the schematic, we have used sigmoidal neurons in the internal layer. Linear hidden neurons are less useful and step neurons have issues we discussed earlier.<\/p>\n<p id=\"8763\">A sigmoidal hidden neuron may be viewed as representing some binary feature of the input. The neuron\u2019s value is derived from the input vector&nbsp;<strong>x<\/strong>. 
This value may be viewed as the probability that the associated feature is present in the input.<\/p>\n<p id=\"3d39\">A neural network with a hidden layer maps an input vector to a vector in a space of features. This mapping is nonlinear. The feature vector is then mapped to the output. This indirect approach results in an architecture that is in principle more powerful than one without the hidden layer.<\/p>\n<p id=\"13e9\">In practice, there are some issues. How many neurons should go into the hidden layer? This depends on the complexity of the input-to-output mapping problem. This complexity may not be known. For linear problems, we may not need any hidden neurons. In fact, having them might hurt.<\/p>\n<p id=\"e1ae\">So how do we know how many hidden neurons to use? The short answer is we don\u2019t. That said, we could try different numbers of hidden neurons and pick the one that works best, or at least adequately.<\/p>\n<p id=\"4154\">On to the next question. For a fixed number of hidden neurons, how do the features they represent get learned? The short answer is via the learning process, typically a form of gradient descent called back-propagation. 
We won\u2019t go into the details here.<\/p>\n<p id=\"dd94\">That said, we will depict the roles the various weights touching a hidden neuron play.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7bc40b0 elementor-widget elementor-widget-image\" data-id=\"7bc40b0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"452\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0ukpZZEEL9t-nhVM4-1024x452.png\" class=\"attachment-large size-large wp-image-19162\" alt=\"Feature Function\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0ukpZZEEL9t-nhVM4-1024x452.png 1024w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0ukpZZEEL9t-nhVM4-300x132.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0ukpZZEEL9t-nhVM4-768x339.png 768w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0ukpZZEEL9t-nhVM4-610x269.png 610w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0ukpZZEEL9t-nhVM4-750x331.png 750w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0ukpZZEEL9t-nhVM4-1140x503.png 1140w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0ukpZZEEL9t-nhVM4.png 1205w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">The input-to-feature weights learn the feature function. 
The weights from the various features to the output control the relative influence of the various features on the final output.<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a2c8b83 elementor-widget elementor-widget-text-editor\" data-id=\"a2c8b83\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"fab0\">Finally, let\u2019s mention that, as before, the output neuron is sigmoidal for a binary classification problem and linear for a regression problem.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4a3ad84 elementor-widget elementor-widget-heading\" data-id=\"4a3ad84\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Image Classification Problems<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7b69c24 elementor-widget elementor-widget-text-editor\" data-id=\"7b69c24\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"cf8f\">Say we have a large set of images, some containing cats, others not. We\u2019d like to learn a classifier that can tell whether or not an image has a cat in it.<\/p>\n<p id=\"a574\">Say each of our images contains lots of pixels. 
100 X 100 = 10,000 for concreteness.<\/p>\n<p id=\"ddb6\">In principle, we could map this problem to a neural network with one hidden layer, as depicted below.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1ce25c1 elementor-widget elementor-widget-image\" data-id=\"1ce25c1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"690\" height=\"549\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0l_0miISOjClPpBkn.png\" class=\"attachment-large size-large wp-image-19163\" alt=\"Two Fundamental Neural Network Anatomical Structures\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0l_0miISOjClPpBkn.png 690w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0l_0miISOjClPpBkn-300x239.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0l_0miISOjClPpBkn-610x485.png 610w\" sizes=\"(max-width: 690px) 100vw, 690px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Lots of input neurons. With even a moderate number of hidden neurons, this means lots and lots of input-to-hidden weights!<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d49a504 elementor-widget elementor-widget-text-editor\" data-id=\"d49a504\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"da8d\">In practice this approach has difficulties. We have 10,000 input neurons. With a hidden layer of&nbsp;<em>m<\/em>&nbsp;neurons, this means we have 10000*<em>m<\/em>&nbsp;input-to-hidden weights. 
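The weight counts are easy to verify. A quick sketch, taking m = 20 for illustration and, for later contrast, a hypothetical 3x3 convolutional kernel:

```python
# Weight counts for the 100x100 image example (biases ignored).
input_neurons = 100 * 100            # 10,000 input pixels
m = 20                               # hidden neurons, fully-connected case
fc_weights = input_neurons * m       # every input connects to every hidden neuron

# For contrast: one convolutional feature with a (hypothetical) 3x3 kernel
# reuses the same 9 weights at every location in the image.
conv_weights = 3 * 3

print(fc_weights, conv_weights)      # 200000 9
```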
It&#8217;s hard to imagine that we could use fewer than 20 hidden neurons to adequately learn a cat-or-not classifier.<\/p>\n<p id=\"7ab5\">Twenty hidden neurons means 200,000 input-to-hidden weights. That&#8217;s a lot of weights to learn! Even with a rich training set, overfitting is a significant risk.<\/p>\n<p id=\"61d0\">Let\u2019s think differently. Is there structure in this domain (images) that we might be able to exploit? It turns out the answer is yes. First, some observations.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ec064e4 elementor-widget elementor-widget-text-editor\" data-id=\"ec064e4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ol><li>The input neurons are in a spatial grid.<\/li><li>Features in an image are often local.<\/li><li>The same feature may occur at different locations in the input.<\/li><\/ol>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-26a9cb4 elementor-widget elementor-widget-text-editor\" data-id=\"26a9cb4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"12dc\">Let\u2019s expand on 2 and 3. Consider the picture below. It shows horizontal edges at various locations. Each edge is the same feature but at a different location. Each of these feature occurrences is also local. 
Local just means that to detect the edge at any one location, one only needs to look at pixel values in the proximity of the edge.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-03745b3 elementor-widget elementor-widget-image\" data-id=\"03745b3\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"440\" height=\"296\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0gFyQPgQcJrh0osr2.png\" class=\"attachment-large size-large wp-image-19164\" alt=\"Local Feature\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0gFyQPgQcJrh0osr2.png 440w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0gFyQPgQcJrh0osr2-300x202.png 300w\" sizes=\"(max-width: 440px) 100vw, 440px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">A local feature (horizontal edge) that occurs at many locations in the image<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e3aaac7 elementor-widget elementor-widget-text-editor\" data-id=\"e3aaac7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"7767\">Okay, we now see that each feature is also on the same spatial grid as the input pixels. The implication of this is that for any one feature, there isn\u2019t a single value, rather a grid of values. For each location (i,j), feature&nbsp;<em>f<\/em>&nbsp;has a value that indicates whether f is present or absent at that location.<\/p>\n<p id=\"eab2\">Great. Seems like we have gone in the opposite direction. 
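To make the grid of feature values concrete: sliding one local feature function over the image produces one value per location. A minimal sketch, assuming a hand-picked 2x3 horizontal-edge kernel (the kernel, image, and sizes are illustrative, not from the original post):

```python
import numpy as np

def convolve2d(image, kernel):
    # Slide the kernel over the image; one output value per valid location.
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# Hand-picked horizontal-edge kernel: responds where a bright row sits above a dark row.
kernel = np.array([[ 1.0,  1.0,  1.0],
                   [-1.0, -1.0, -1.0]])

image = np.zeros((5, 5))
image[:2, :] = 1.0                # bright top band over a dark bottom
fmap = convolve2d(image, kernel)  # the feature map: a grid of feature values
# fmap peaks (value 3.0) exactly on the rows where bright meets dark.
```

Note that the same 6 kernel weights are reused at every location, which is precisely the economy the fully-connected network lacks.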
Seems like we have made the problem more complex. Instead of a feature having a single value, it now has a spatial grid of values, one per location.<\/p>\n<p id=\"5b77\">Not really. In an MLNN, how would we represent a local feature so as to separate the feature\u2019s function from the locations where it applies, while also exploiting its locality? We can\u2019t.<\/p>\n<p id=\"8122\">The MLNN\u2019s inability to (i) exploit feature locality and (ii) evaluate the same feature at many different locations means that we need a lot of hidden neurons to model the combination of the functional and spatial aspects of a feature. (Functional meaning what the feature is, spatial meaning where it is evaluated.) On top of that, because we are unable to exploit locality, the number of input-to-hidden weights explodes.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c211833 elementor-widget elementor-widget-text-editor\" data-id=\"c211833\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<pre class=\"wp-block-preformatted\">lots of input neurons \u00d7 lots of hidden neurons \u00d7 fully-connected \u2192 lots and lots of input-to-hidden weights \u2192 network way overly complex<\/pre>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-34b77bf elementor-widget elementor-widget-text-editor\" data-id=\"34b77bf\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"4326\">The picture below depicts the alternative that leverages the input geometry, the locality of features, and the need to evaluate a feature at many different locations.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-bea449a 
elementor-widget elementor-widget-image\" data-id=\"bea449a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"716\" height=\"579\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0JK3uk3slPVO7P3YG.png\" class=\"attachment-large size-large wp-image-19165\" alt=\"Two Fundamental Neural Network Anatomical Structures\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0JK3uk3slPVO7P3YG.png 716w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0JK3uk3slPVO7P3YG-300x243.png 300w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0JK3uk3slPVO7P3YG-610x493.png 610w\" sizes=\"(max-width: 716px) 100vw, 716px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">The value of a feature is itself a map. 
The feature\u2019s function is scanned over the entire input image to produce its feature map.<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3fe4f61 elementor-widget elementor-widget-text-editor\" data-id=\"3fe4f61\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"8c15\">Let\u2019s zoom into the feature\u2019s neuron at location (i,j).<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2842637 elementor-widget elementor-widget-image\" data-id=\"2842637\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"467\" height=\"410\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0aPNMac7dn2byS4cS.png\" class=\"attachment-large size-large wp-image-19166\" alt=\"Locality made explicit with the receptive field. The feature\u2019s weights depend only on its function, not on the location it is evaluated.\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0aPNMac7dn2byS4cS.png 467w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/0aPNMac7dn2byS4cS-300x263.png 300w\" sizes=\"(max-width: 467px) 100vw, 467px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Locality made explicit with the receptive field. 
The feature\u2019s weights depend only on its function, not on the location at which it is evaluated.<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5657e74 elementor-widget elementor-widget-text-editor\" data-id=\"5657e74\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"c9b2\">Okay, so any single (local) feature is represented by the same small set of weights. We just slide this feature\u2019s function (called a kernel) over the various locations (<em>i<\/em>,&nbsp;<em>j<\/em>) to get a reading on the feature\u2019s values over the entire grid. This sliding process is called convolution.<\/p>\n<p id=\"fb47\">By contrast, the MLNN has no explicit mechanisms for exploiting either locality or sharing of weights.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ce084c6 elementor-widget elementor-widget-heading\" data-id=\"ce084c6\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Summary<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-826776e elementor-widget elementor-widget-text-editor\" data-id=\"826776e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"c0ae\">We have covered the anatomical structures of the two most important (feedforward) neural network architectures: fully-connected multi-layer neural networks and convolutional neural networks. 
We have discussed why convolutional neural networks are better suited to image processing than multi-layer neural networks. MLNNs, on the other hand, are well-suited to problems in which locality and convolutions don\u2019t come into play in obvious ways.<\/p>\n<p id=\"b1df\"><strong>Further Reading<\/strong><\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9edba6f elementor-widget elementor-widget-text-editor\" data-id=\"9edba6f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ol><li><a href=\"http:\/\/deeplearning.stanford.edu\/tutorial\/supervised\/ConvolutionalNeuralNetwork\/\" rel=\"noopener\">http:\/\/deeplearning.stanford.edu\/tutorial\/supervised\/ConvolutionalNeuralNetwork\/<\/a><\/li><li><a href=\"http:\/\/deeplearning.stanford.edu\/tutorial\/supervised\/MultiLayerNeuralNetworks\/\" rel=\"noopener\">http:\/\/deeplearning.stanford.edu\/tutorial\/supervised\/MultiLayerNeuralNetworks\/<\/a><\/li><\/ol>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>Artificial neural networks have evolved through the ages. 
In this post, we cover two key anatomies that have emerged: fully-connected versus convolutional.<\/p>\n","protected":false},"author":1044,"featured_media":19167,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[97,297,1500,1501,1502],"ppma_author":[3691],"class_list":["post-22746","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-artificial-intelligence","tag-artificial-neural-networks","tag-convolutional-network","tag-multilayer-perceptron","tag-sigmoid-function"],"authors":[{"term_id":3691,"user_id":1044,"is_guest":0,"slug":"arun-jagota","display_name":"Arun Jagota","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Arun-Jagota-150x150.jpeg","user_url":"https:\/\/www.salesforce.com\/in\/?ir=1","last_name":"Jagota","first_name":"Arun","job_title":"","description":"Arun Jagota is Director of Data Science at Salesforce.com. A PhD in computer science, he has taught undergraduate, graduate, and continuing education courses in Computer Science at many US Universities from 1992 through 2006. 
He has written a number of books, most available at Amazon.com, 50 academic publications and has 17+ patents issued."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22746","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/1044"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=22746"}],"version-history":[{"count":4,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22746\/revisions"}],"predecessor-version":[{"id":31559,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22746\/revisions\/31559"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/19167"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=22746"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=22746"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=22746"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=22746"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}