{"id":9299,"date":"2020-08-12T08:58:27","date_gmt":"2020-08-12T08:58:27","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/?p=9299"},"modified":"2023-11-21T11:13:49","modified_gmt":"2023-11-21T11:13:49","slug":"deriving-convolution-from-first-principles","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/deriving-convolution-from-first-principles\/","title":{"rendered":"Deriving Convolution From First Principles"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"9299\" class=\"elementor elementor-9299\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-6f639b33 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"6f639b33\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-3ea6ce32\" data-id=\"3ea6ce32\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-cee8a2e elementor-widget elementor-widget-heading\" data-id=\"cee8a2e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2><em>Basics Of Deep Learning<\/em><\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-073ac83 elementor-widget elementor-widget-text-editor\" data-id=\"073ac83\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p><strong>TL;DR:<\/strong><em>\u00a0Have you even wondered what is so special about convolution? In this post, I derive the convolution from first principles and show that it naturally emerges from translational symmetry.<\/em><\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8e6059d elementor-widget elementor-widget-image\" data-id=\"8e6059d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1281\/1*u8YBOP6arEINUnJWX6ih0A.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-eeab52b elementor-widget elementor-widget-text-editor\" data-id=\"eeab52b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>La connoissance de certains principes suppl\u00e9e facilement \u00e0 la connoissance de certains faits. (Claude Adrien Helv\u00e9tius)<\/p>\n<\/blockquote>\n\n\n\n<p>During my undergraduate studies, which I did in Electrical Engineering at the Technion in Israel, I was always appalled that such an important concept as convolution [1] just landed out of nowhere. This seemingly arbitrary definition disturbed the otherwise beautiful picture of the signal processing world like a grain of sand in one\u2019s eye. How nice would it be to have the convolution emerge from first principles rather than have it postulated! As I will show in this post, such first principles are the notion of translational invariance or symmetry.<\/p>\n\n\n\n<p>Let me start with the formula taught in basic signal processing courses defining the discrete convolution [2] of two\u00a0<em>n<\/em>-dimensional vectors\u00a0<strong>x<\/strong>\u00a0and\u00a0<strong>w<\/strong>:<\/p>\n\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f479f94 elementor-widget elementor-widget-image\" data-id=\"f479f94\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/4356\/1*LSOz7E7Oa6-WwHJFgCHWQA.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-84dea3f elementor-widget elementor-widget-text-editor\" data-id=\"84dea3f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p>Here, for convenience, I assume that all the indices run from zero to\u00a0<em>n<\/em>\u22121 and are modulo\u00a0<em>n<\/em>; it is convenient to think of vectors as defined on a circle. Writing the above formula as a matrix-vector multiplication leads to a very special matrix that is called\u00a0<em>circulant<\/em>:<\/p>\n<!-- \/wp:paragraph --\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3817b3c elementor-widget elementor-widget-image\" data-id=\"3817b3c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/4088\/1*g6V-4zS8i_nWXz6U9vhRDQ.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2fda371 elementor-widget elementor-widget-text-editor\" data-id=\"2fda371\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>A circulant matrix has multi-diagonal structure, with elements on each diagonal having the same value. It can be formed by stacking together shifted (modulo\u00a0<em>n<\/em>) versions of a vector\u00a0<strong>w\u00a0<\/strong>[3]; for this reason, I use the notation\u00a0<strong>C<\/strong>(<strong>w<\/strong>) referring to a circulant matrix formed by the vector\u00a0<strong>w<\/strong>. Since any convolution\u00a0<strong>x<\/strong>\u2217<strong>w<\/strong>can beequivalently represented as a multiplication by the circulant matrix\u00a0<strong>C<\/strong>(<strong>w<\/strong>)<strong>x<\/strong>, I will use the two terms interchangeably.<\/p>\n\n\n\n<p>One of the first things we are taught in linear algebra is that matrix multiplication is non-commutative, i.e., in general,\u00a0<strong>AB<\/strong>\u2260<strong>BA<\/strong>. However, circulant matrices are very special exception:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Circulant matrices commute,<\/p>\n<\/blockquote>\n\n\n\n<p>or in other words,\u00a0<strong>C<\/strong>(<strong>w<\/strong>)<strong>C<\/strong>(<strong>u<\/strong>)=<strong>C<\/strong>(<strong>u<\/strong>)<strong>C<\/strong>(<strong>w<\/strong>). This is true for any circulant matrix, or any choice of\u00a0<strong>u<\/strong>\u00a0and\u00a0<strong>w<\/strong>. Equivalently, we can say that the convolution is a commutative operation,\u00a0<strong>x<\/strong>\u2217<strong>w<\/strong>=<strong>w<\/strong>\u2217<strong>x<\/strong>.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d0b2391 elementor-widget elementor-widget-image\" data-id=\"d0b2391\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/4410\/1*K0fmvr-Z5ZuquUPfHecumg.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8f65cc5 elementor-widget elementor-widget-text-editor\" data-id=\"8f65cc5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p>A particular choice of\u00a0<strong>w<\/strong>=[0,1,0\u2026,0] yields a special circulant matrix that shifts vectors to the right by one position. This matrix is called the (right)\u00a0<em>shift operator\u00a0<\/em>[4] and denoted by\u00a0<strong>S<\/strong>. The transpose of the right shift operator is the left shift operator. Obviously, shifting left and then right (or vice versa) does not do anything, which means\u00a0<strong>S<\/strong>\u00a0is an orthogonal matrix:<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-e10e6b5 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"e10e6b5\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-023b009\" data-id=\"023b009\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-fb4d3aa elementor-widget elementor-widget-image\" data-id=\"fb4d3aa\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/4410\/1*jF98Q5QwbIgbjotHQnDUDQ.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-00ddd31 elementor-widget elementor-widget-text-editor\" data-id=\"00ddd31\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t&#8220;Deriving Convolution From First Principles&#8221;\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-00f6314 elementor-widget elementor-widget-text-editor\" data-id=\"00f6314\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p>Circulant matrices can be characterised by their commutativity property. It appears to be sufficient to show only commutativity with shift (Lemma 3.1 in [5]):<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>A matrix is circulant if and only if it commutes with shift.<\/p>\n<\/blockquote>\n\n\n\n<p>The first direction of this \u201cif and only if\u201d statement leads to a very important property called\u00a0<em>translation<\/em>\u00a0or\u00a0<em>shift equivariance\u00a0<\/em>[6]: the convolution\u2019s commutativity with shift implies that it does not matter whether we first shift a vector and then convolve it, or first convolve and then shift \u2014 the result will be the same.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f134cde elementor-widget elementor-widget-image\" data-id=\"f134cde\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/4410\/1*XeIGa8__ibj4zW6QdUcaWg.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0fefa94 elementor-widget elementor-widget-text-editor\" data-id=\"0fefa94\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t&#8220;Deriving Convolution From First Principles&#8221;\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e065448 elementor-widget elementor-widget-text-editor\" data-id=\"e065448\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>The second direction allows us to\u00a0<em>define<\/em>\u00a0convolution as the shift-equivariant linear operation: in order to commute with shift, a matrix must have the circulant structure. This is exactly what we aspired to from the beginning, to have the convolution emerge from the first principles of translational symmetry [7]. Instead of being given a formula of the convolution and proving its shift equivariance property, as it is typically done in signal processing books, we can start from the requirement of shift equivariance and arrive at the formula of the convolution as the only possible linear operation satisfying it.<\/p>\n<!-- \/wp:paragraph -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-083486e elementor-widget elementor-widget-image\" data-id=\"083486e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/4011\/1*qmguqQXWr2ACluPh8N1Vcw.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a79e5f0 elementor-widget elementor-widget-text-editor\" data-id=\"a79e5f0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t&#8220;Deriving Convolution From First Principles&#8221; \n<figcaption>Illustration of shift equivariance as the interchangeability of shift and blur operations.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f83217b elementor-widget elementor-widget-text-editor\" data-id=\"f83217b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Another important fact taught in signal processing courses is the connection between the convolution and the Fourier transform [8]. Here as well, the Fourier transform lands out of the blue, and then one is shown that it diagonalises the convolution operation, allowing to perform convolution of two vectors in the frequency domain as element-wise product of their Fourier transforms. Nobody ever explains where these sines and cosines come from and what is so special about them.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>In order to get to the bottom of it, recall a fact from linear algebra:<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:quote -->\n<blockquote class=\"wp-block-quote\">\n<p>Commuting matrices are jointly diagonalisable.<\/p>\n<\/blockquote>\n<!-- \/wp:quote -->\n\n<!-- wp:paragraph -->\n<p>In other words, two matrices satisfying\u00a0<strong>AB<\/strong>=<strong>BA<\/strong>\u00a0will have the same eigenvectors (but possibly different eigenvalues) [9]. Since all circulant matrices commute, we can pick one of them and compute its eigenvectors \u2014 the above theorem assures that these will be the eigenvectors of all circulant matrices as well.<\/p>\n<!-- \/wp:paragraph -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7f2e944 elementor-widget elementor-widget-text-editor\" data-id=\"7f2e944\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<!-- wp:paragraph -->\n<p>It is convenient to pick the shift operator\u00a0<strong>S<\/strong>. Since\u00a0<strong>S<\/strong>\u00a0is orthogonal matrix, we expect its eigenvectors to be orthogonal [10]. A simple calculation (see Section 4.1 in [5]) leads to the conclusions that<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:quote -->\n<blockquote class=\"wp-block-quote\">\n<p>Shift operator is diagonalised by the Fourier transform.<\/p>\n<\/blockquote>\n<!-- \/wp:quote -->\n\n<!-- wp:paragraph -->\n<p>I hope that at this point you have your second \u201caha\u201d moment for today: this is where the sines and cosines come from! They are the eigenvectors of the shift operator; I denote them as columns of the matrix\u00a0<strong>\u03a6<\/strong>. Note that the eigenvectors are complex, so we need to take complex conjugation when transposing\u00a0<strong>\u03a6<\/strong>. Multiplication (from the left) by\u00a0<strong>\u03a6*<\/strong>\u00a0is called the Fourier transform, and by\u00a0<strong>\u03a6<\/strong>\u00a0the inverse Fourier transform.<\/p>\n<!-- \/wp:paragraph -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-48cd0c8 elementor-widget elementor-widget-image\" data-id=\"48cd0c8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/4410\/1*teSN3OBvVjVJtxMg5-32Og.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-daac239 elementor-widget elementor-widget-text-editor\" data-id=\"daac239\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t&#8220;Deriving Convolution From First Principles&#8221;\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4013f0f elementor-widget elementor-widget-text-editor\" data-id=\"4013f0f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<!-- wp:paragraph -->\n<p>Since all circulant matrices are jointly diagonalisable, they are also diagonalised by the Fourier transform [11]. They differ only in their eigenvalues. The last missing bit is the realisation that<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:quote -->\n<blockquote class=\"wp-block-quote\">\n<p>The eigenvalues of C(<strong><em>w<\/em><\/strong>) are the Fourier transform of w.<\/p>\n<\/blockquote>\n<!-- \/wp:quote -->\n\n<!-- wp:paragraph -->\n<p>We can now put all the pieces of the puzzle into a statement known as the\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Convolution_theorem\" target=\"_blank\" rel=\"noreferrer noopener\">Convolution Theorem<\/a>: the convolution\u00a0<strong>x<\/strong>\u2217<strong>w<\/strong>\u00a0can be computed either as a circulant matrix\u00a0<strong>C<\/strong>(<strong>w<\/strong>) applied to\u00a0<strong>x<\/strong>\u00a0in the original system of coordinate (sometimes this is called \u201cspatial domain\u201d convolution), or in the Fourier basis (\u201cspectral domain\u201d) by first computing the Fourier transform of\u00a0<strong>\u03a6*x<\/strong>, multiplying it by the Fourier transform of\u00a0<strong>w<\/strong>\u00a0[12], and then computing the inverse Fourier transform.<\/p>\n<!-- \/wp:paragraph -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-d2e25e5 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"d2e25e5\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-4a43b77\" data-id=\"4a43b77\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-609de66 elementor-widget elementor-widget-image\" data-id=\"609de66\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/4410\/1*Xx9cyDX8k1mestra0gYeaA.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-72a7bfc elementor-widget elementor-widget-text-editor\" data-id=\"72a7bfc\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t&#8220;Deriving Convolution From First Principles&#8221;\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2e22804 elementor-widget elementor-widget-text-editor\" data-id=\"2e22804\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<!-- wp:paragraph -->\n<p>Because\u00a0<strong>\u03a6\u00a0<\/strong>has a special redundant structure, the products\u00a0<strong>\u03a6*x<\/strong>\u00a0and\u00a0<strong>\u03a6x\u00a0<\/strong>can be computed with \ud835\udcaa(<em>n\u00a0<\/em>log\u00a0<em>n<\/em>) complexity using a\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Fast_Fourier_transform#:~:text=A%20fast%20Fourier%20transform%20(FFT,frequency%20domain%20and%20vice%20versa.\" target=\"_blank\" rel=\"noreferrer noopener\">Fast Fourier Transform<\/a>\u00a0(FFT) algorithm.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>Why is such a definition of convolution a big deal and should be taught this way? I will repeat here the quote from Helv\u00e9tius I opened this post with: \u201cThe knowledge of certain principles easily compensates the lack of knowledge of certain facts\u201d. In the case of convolution, its derivation from first principles allow easy generalisation to other domains. In a next post, I will show how to define convolution on graphs, in order to produce a key building block of <a href=\"https:\/\/www.experfy.com\/blog\/deep-learning-on-graphs-successes-challenges-and-next-steps\/\" target=\"_blank\" rel=\"noreferrer noopener\">graph deep learning<\/a> architectures.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:separator --><hr class=\"wp-block-separator\" \/><!-- \/wp:separator -->\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d137229 elementor-widget elementor-widget-text-editor\" data-id=\"d137229\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<!-- wp:paragraph -->\n<p>[1] A. Dominquez-Torres,\u00a0<a href=\"https:\/\/www.slideshare.net\/Alexdfar\/origin-adn-history-of-convolution\" target=\"_blank\" rel=\"noreferrer noopener\">The history and origins of convolution<\/a>\u00a0provides a fascinating excursion into the\u00a0<a href=\"https:\/\/www.embs.org\/pulse\/articles\/history-convolution-operation\/\" target=\"_blank\" rel=\"noreferrer noopener\">historical development<\/a>\u00a0of the concept and notation of the convolution operation. The convolution integral shows up for the first time in a derivation of Taylor\u2019s theorem in J. B. D\u2019Alembert, Recherches sur diff\u00e9rents points importants du syst\u00e8me du monde (1754). The priority is often mistakenly attributed to P.-S. de Laplace, M\u00e9moire sur l\u2019inclinaison moyenne des orbites des com\u00e8tes, sur la figure de la terre et sur les fonctions (1773). M\u00e9moires de l\u2019Acad\u00e9mie Royale des Sciences de Paris, Savants \u00c9trangers 7:503\u2013540, though this publication contains no trace of convolution.\u00a0<a href=\"https:\/\/mathshistory.st-andrews.ac.uk\/DSB\/Laplace.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Laplace<\/a>\u00a0indeed used the convolution in his later memoir on probability, written in 1778 and published in 1781. Early naming attempts included\u00a0<em>r\u00e9sultante<\/em>\u00a0(French for \u201cresultant\u201d, first used by Charles Cailler in 1899),\u00a0<em>composizione<\/em>\u00a0(Italian for \u201ccomposition\u201d, by Vito Volterra in 1910), and\u00a0<em>faltung<\/em>\u00a0(literally meaning \u201cfolding\u201d in German, by Gustav Doetsch in 1923); the latter term dominated the German literature in the beginning of the 20th century. The English name\u00a0<em>convolution<\/em>\u00a0comes from the Latin\u00a0<em>con<\/em>\u00a0(\u201ctogether\u201d) and\u00a0<em>volvere<\/em>\u00a0(\u201croll up\u201d) and is a literal translation of the German\u00a0<em>faltung,\u00a0<\/em>and so is the Russian variant\u00a0<em>\u0441\u0432\u0451\u0440\u0442\u043a\u0430<\/em>. The first use of the English term can be traced to the 1934 paper of Aurel Friedrich Wintner; it was later cemented in the literature by authoritative works of Doetsch (1937) and Gardner and Barnes (1942). The star symbol was first used by Volterra in 1910, though in a different form. Percy John Daniel used a dot notation. The first modern notation of convolution as\u00a0<em>f<\/em>\u2217<em>g<\/em>, a combination of the two, is due to Doetsch (1923).<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>[2] Technically speaking, what I define here is\u00a0<em>circular convolution<\/em>.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>[3] Note that the rows of\u00a0<strong>C<\/strong>(<strong>w<\/strong>) have the vector\u00a0<strong>w<\/strong>\u00a0transposed, resulting in the reflection that appears in the convolution formula and distinguished it from a related notion of\u00a0<a href=\"https:\/\/towardsdatascience.com\/convolution-vs-correlation-af868b6b4fb5\" target=\"_blank\" rel=\"noreferrer noopener\">correlation<\/a>. Pay attention to the boundary conditions (the elements of\u00a0<strong>C<\/strong>\u00a0in the top right and bottom left corners).<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>[4] I use the terms\u00a0<em>operator<\/em>\u00a0and\u00a0<em>matrix<\/em>\u00a0interchangeably.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>[5] B. Bamieh,\u00a0<a href=\"https:\/\/arxiv.org\/pdf\/1805.05533\" target=\"_blank\" rel=\"noreferrer noopener\">Discovering transforms: a tutorial on circulant matrices, circular convolution, and the discrete Fourier transform<\/a>\u00a0(2018). arXiv:1805.05533 provides the details of the derivations I discuss in this post.<\/p>\n<!-- \/wp:paragraph -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-04a483f elementor-widget elementor-widget-text-editor\" data-id=\"04a483f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<!-- wp:paragraph -->\n<p>[6] Some often confuse\u00a0<em>invariance<\/em>\u00a0(meaning \u201cunchanged\u201d in Latin) and\u00a0<em>equivariance<\/em>\u00a0(\u201cchanging in the same way\u201d), with many signal processing books referring to the property I discuss here as \u201cshift invariance\u201d. A function is\u00a0<em>shift equivariant\u00a0<\/em>if<em>\u00a0f<\/em>(<strong>Sx<\/strong>)=<strong>S<\/strong><em>f<\/em>(<strong>x<\/strong>)<em>.<\/em>\u00a0In other words, it does not matter whether we first apply the shift and then\u00a0<em>f<\/em>\u00a0or vice versa. Differently, shift invariance is the property of being unaffected by shift: the function\u00a0<em>f<\/em>(<strong>Sx<\/strong>)=<em>f<\/em>(<strong>x<\/strong>) is\u00a0<em>shift invariant<\/em>. Shift invariance is a fundamental concept in physics (where it often appears under the name of \u201ctranslational symmetry\u201d), stating that the laws of physics do not depend on the location in space. In variational formulation of classical mechanics, such a fundamental law as the conservation of momentum follows from shift invariance by virtue of\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Noether%27s_theorem\" target=\"_blank\" rel=\"noreferrer noopener\">Noether\u2019s theorem<\/a>.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>[7] The concept of equivariance is more general and can be extended using the group theory formalism. This framework was used in T. Cohen and M. Welling,\u00a0<a href=\"http:\/\/proceedings.mlr.press\/v48\/cohenc16.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Group equivariant convolutional networks\u00a0<\/a>(2016). Proc. ICML, to extend the shift equivariance of convolutional layers of CNNs to more general operations such as rotations. Assuming\u00a0<em>f\u00a0<\/em>: X\u2192Y, where X and Y are some different spaces with corresponding groups \ud835\udca2 and \ud835\udca2\u2019 of operations defined on the elements of X and Y respectively, group equivariance is expressed as\u00a0<em>f<\/em>(\ud835\udd24(<strong>x<\/strong>))=\ud835\udd24\u2019(<em>f<\/em>(<strong>x<\/strong>)) where \ud835\udd24\u2208\ud835\udca2 and \ud835\udd24\u2019\u2208\ud835\udca2\u2019. Note that \ud835\udd24\u2019 is not necessarily equal to \ud835\udd24 as the structure and even dimension of the output space Y can be different from that of the input X. The standard convolution discussed in this post is a particular case with X=Y being the space of\u00a0<em>n<\/em>-dimensional vectors, \ud835\udca2=\ud835\udca2\u2019 the translation group, and \ud835\udd24=\ud835\udd24\u2019 the shift operator.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>[8] Since we deal with finite-dimensional vectors, the term \u201cFourier transform\u201d refers here to the\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Discrete_Fourier_transform\" target=\"_blank\" rel=\"noreferrer noopener\">Discrete Fourier Transform<\/a>\u00a0(DFT).<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>[9] More precisely, joint diagonalisation implies that two commuting matrices have the same<em>\u00a0eigenspaces<\/em>, as in the general case the eigenvalues can have non-trivial multiplicity. Since in the case I discuss here all the eigenvalues are simple, we can talk about a common eigenbasis.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>[10] However, since\u00a0<strong>S<\/strong>\u00a0is non-symmetric, it does not have real eigenvalues (symmetric real matrices have real eigenvalues). The eigenvalues of\u00a0<strong>S<\/strong>\u00a0happen to be the complex\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Root_of_unity\" target=\"_blank\" rel=\"noreferrer noopener\">roots of unity<\/a>.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>[11] When I say that matrix\u00a0<strong>C<\/strong>\u00a0is \u201cdiagonalised\u201d by the Fourier transform, I mean that the matrix\u00a0<strong>\u03a6*C\u03a6\u00a0<\/strong>is diagonal. Since Fourier transform is an orthogonal matrix (<strong>\u03a6*\u03a6<\/strong>=<strong>I<\/strong>), geometrically it acts as a change of the system of coordinates that amounts to an\u00a0<em>n<\/em>-dimensional rotation. In this system of coordinates, the action of\u00a0<strong>C<\/strong>\u00a0becomes element-wise product.<\/p>\n<!-- \/wp:paragraph -->\n\n<!-- wp:paragraph -->\n<p>[12] In signal processing, one typically designs the filter in the frequency domain, so the Fourier transform of\u00a0<strong>w<\/strong>\u00a0is never explicitly computed.<\/p>\n<!-- \/wp:paragraph -->\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>Have you even wondered what is so special about convolution? Know how to derive the convolution from first principles and show that it naturally emerges from translational symmetry.<\/p>\n","protected":false},"author":874,"featured_media":9300,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[544,543,206],"ppma_author":[3686],"class_list":["post-9299","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-convolution","tag-convolutional-newral-net","tag-deep-learning"],"authors":[{"term_id":3686,"user_id":874,"is_guest":0,"slug":"michael-bronstein","display_name":"Michael Bronstein","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/08\/Michael-Bronstein-150x150.jpg","user_url":"https:\/\/www.imperial.ac.uk\/people\/m.bronstein","last_name":"Bronstein","first_name":"Michael","job_title":"","description":"Michael Bronstein is Professor, Chair in Machine Learning and Pattern Recognition at Imperial College, London, besides Head of Graph ML at Twitter \/ ML Lead at ProjectCETI\/ ex Founder &amp; Chief Scientist at Fabula_ai\/ ex at Intel #AI #ML #graphs. His main expertise is in theoretical and computational geometric methods for data analysis, and his research encompasses a broad spectrum of applications ranging from machine learning, computer vision, and pattern recognition to geometry processing, computer graphics, and imaging. He has authored over 150 papers, the book Numerical geometry of non-rigid shapes (Springer 2008), and holds over 30 granted patents."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/9299","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/874"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=9299"}],"version-history":[{"count":5,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/9299\/revisions"}],"predecessor-version":[{"id":34215,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/9299\/revisions\/34215"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/9300"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=9299"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=9299"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=9299"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=9299"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}