{"id":22787,"date":"2021-05-06T06:47:00","date_gmt":"2021-05-06T06:47:00","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/interesting-useful-recurrent-neural-network\/"},"modified":"2023-06-26T09:47:50","modified_gmt":"2023-06-26T09:47:50","slug":"interesting-useful-recurrent-neural-network","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/interesting-useful-recurrent-neural-network\/","title":{"rendered":"The Simplest Interesting (And Useful) Recurrent Neural Network"},"content":{"rendered":"\n<p id=\"c5b3\">A recurrent neural network (RNN) processes an input sequence arriving as a stream. It maintains state, i.e. memory. This captures whatever it has seen in the input to this point that it deems relevant for predicting the output (see below).<\/p>\n\n\n\n<p id=\"56db\">At each step, the RNN first derives a new state from the current state combined with the new input value. This becomes the new current state. It then outputs a value derived from its current state.<\/p>\n\n\n\n<p id=\"351d\">Thus, an RNN may be viewed as a transformer of an input sequence to an output sequence, with the state capturing whatever features it thinks will help it produce the desired output sequence. Learning happens when the network\u2019s output does not match its target.<\/p>\n\n\n\n<p id=\"0763\">RNNs have many uses. One notable one is in <a href=\"https:\/\/www.experfy.com\/blog\/ai-ml\/how-to-build-a-machine-learning-model\/\" target=\"_blank\" rel=\"noreferrer noopener\">machine translation<\/a>. The input might be a sequence of words in English, the desired output might be its good translation into French, another sequence of words, albeit on a different lexicon (French).<\/p>\n\n\n\n<p id=\"986c\">In this post, we focus our attention on the simplest RNN that is interesting and useful. 
\u2018Simplest\u2019, \u2018interesting\u2019, and \u2018useful\u2019 are not falsifiable so you be the judge.<\/p>\n\n\n\n<p id=\"aa2d\">This <a href=\"https:\/\/www.experfy.com\/blog\/ai-ml\/an-introduction-to-recurrent-neural-networks\/\" target=\"_blank\" rel=\"noreferrer noopener\">RNN<\/a> has one input neuron, one hidden neuron that is sigmoidal, and one output neuron that is also sigmoidal.<\/p>\n\n\n\n<p id=\"55bf\">This RNN evolves as follows:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>h<\/em>(<em>t<\/em>) = <em>f<\/em>(<em>hh<\/em>*<em>h<\/em>(<em>t<\/em>-1) + <em>ih<\/em>*<em>x<\/em>(<em>t<\/em>))<br><em>y<\/em>(<em>t<\/em>) = <em>g<\/em>(<em>ho<\/em>*<em>h<\/em>(<em>t<\/em>))<\/pre>\n\n\n\n<p id=\"412b\">Here&nbsp;<em>f<\/em>&nbsp;and&nbsp;<em>g<\/em>&nbsp;are sigmoids. The quantities&nbsp;<em>ih<\/em>,&nbsp;<em>hh<\/em>, and&nbsp;<em>ho<\/em>&nbsp;are the input-to-hidden weight, the hidden-to-hidden weight, and the hidden-to-output weight respectively. These weights are what change when learning happens.<\/p>\n\n\n\n<p id=\"c6bc\">Can this thing do anything interesting? Let\u2019s find out.<\/p>\n\n\n\n<p id=\"2b64\">Consider a binary sequence, i.e. a sequence of 0s and 1s, that runs forever. We\u2019d like to predict&nbsp;<em>x<\/em>(<em>t<\/em>+1) from&nbsp;<em>x<\/em>(1) through&nbsp;<em>x<\/em>(<em>t<\/em>). At time<em>&nbsp;t<\/em>, the target is&nbsp;<em>y<\/em>(<em>t<\/em>) equal to&nbsp;<em>x<\/em>(<em>t<\/em>+1).<\/p>\n\n\n\n<p id=\"3a91\">This problem has a deceptively simple formulation. It has its uses though. To give one a sense of this, imagine trying to predict whether tomorrow will be sunny (1) or cloudy (0) from the binary sequence of daily outcomes (sunny or cloudy) to this point. Think of doing this at a per-city level. Now imagine running this over all the cities on Earth continually (each will have its own RNN). 
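<\/p>

<p>As an aside, the two evolution equations above are easy to sketch in code. A minimal sketch (the weight values here are arbitrary, not learned):<\/p>

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Arbitrary illustrative weights (not learned values).
ih, hh, ho = 1.0, 0.5, 2.0

h = 0.0      # hidden state h(0)
ys = []
for x in [1, 0, 1, 1]:             # a short binary input stream
    h = sigmoid(hh * h + ih * x)   # h(t) = f(hh*h(t-1) + ih*x(t))
    y = sigmoid(ho * h)            # y(t) = g(ho*h(t))
    ys.append(y)
print([round(v, 3) for v in ys])
```

<p>Each output lies strictly between 0 and 1, as a sigmoid output must.<\/p>

<p>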
If you created a web service out of it, you might get some visitors.<\/p>\n\n\n\n<p id=\"7cd8\">This imagined use case was put in front of you to get you thinking. No doubt there are many use cases of predicting the next value of a binary sequence.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"fd31\"><strong>Learning<\/strong><\/h2>\n\n\n\n<p id=\"2652\">Let\u2019s first write out the learning equations, as derived from first principles. These equations will provide the scaffolding upon which we will reveal qualitative insights as to what the various weights are learning.<\/p>\n\n\n\n<p id=\"4437\">We are given the network\u2019s output at time&nbsp;<em>t<\/em>. Let\u2019s call it&nbsp;<em>y<\/em>^(<em>t<\/em>). The target output is&nbsp;<em>y<\/em>(<em>t<\/em>). We will define the error of the network as (\u00bd)(<em>y<\/em>(<em>t<\/em>)-<em>y<\/em>^(<em>t<\/em>))\u00b2. We could use some other error function. This one is familiar and it suffices for our purpose in this post.<\/p>\n\n\n\n<p id=\"17eb\">The aim is to change the weights&nbsp;<em>ih<\/em>,&nbsp;<em>hh<\/em>, and&nbsp;<em>ho<\/em>&nbsp;in ways that reduce the error.<\/p>\n\n\n\n<p id=\"5972\">We will use the principle of gradient descent in error space, made famous in the multilayer neural network setting as the back-propagation algorithm.<\/p>\n\n\n\n<p id=\"b179\">First, let\u2019s write out the negated gradient of the error with respect to the weight&nbsp;<em>ho<\/em>. 
(Negated because we want to reduce the error.)<\/p>\n\n\n\n<p id=\"d52d\">(<em>y<\/em>-<em>y<\/em>^)*<em>g<\/em>*(1-<em>g<\/em>)*<em>h<\/em><\/p>\n\n\n\n<p id=\"452b\">where&nbsp;<em>y<\/em>&nbsp;==&nbsp;<em>y<\/em>(<em>t<\/em>),&nbsp;<em>g<\/em>&nbsp;==&nbsp;<em>g<\/em>(<em>ho<\/em>*<em>h<\/em>(<em>t<\/em>)) ==&nbsp;<em>y<\/em>^(<em>t<\/em>), and&nbsp;<em>h<\/em>&nbsp;==&nbsp;<em>h<\/em>(<em>t<\/em>).<\/p>\n\n\n\n<p id=\"be84\">Our update rule for this weight will be<\/p>\n\n\n\n<p id=\"811a\">delta&nbsp;<em>ho<\/em>&nbsp;= eta*(<em>y<\/em>-<em>y<\/em>^)*<em>g<\/em>*(1-<em>g<\/em>)*<em>h<\/em><\/p>\n\n\n\n<p id=\"c40e\">Here eta is the learning rate, a small positive value.<\/p>\n\n\n\n<p id=\"3b5f\">First, we note that&nbsp;<em>g<\/em>*(1-<em>g<\/em>) is always positive. From this, we see that<\/p>\n\n\n\n<p id=\"09be\">sign(delta&nbsp;<em>ho<\/em>) = sign(<em>y<\/em>-<em>y<\/em>^)<\/p>\n\n\n\n<p id=\"229e\">Here sign(<em>a<\/em>) is 1 if&nbsp;<em>a<\/em>&nbsp;is positive, 0 if&nbsp;<em>a<\/em>&nbsp;is 0, and -1 if&nbsp;<em>a<\/em>&nbsp;is negative.<\/p>\n\n\n\n<p id=\"ac4c\">This means that the weight&nbsp;<em>ho<\/em>&nbsp;should increase (decrease) when the predicted output is less than (greater than) the target output.<\/p>\n\n\n\n<p id=\"905c\"><em>In short, ho learns to chase y(t)<\/em>.<\/p>\n\n\n\n<p id=\"e7c7\">Next, consider the negated gradient of the error with respect to the weight&nbsp;<em>ih<\/em>. 
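<\/p>

<p>The sign relationship for delta ho is easy to confirm numerically. A sketch (the sampled weights and states are arbitrary; since the target y is a bit and y^ lies strictly between 0 and 1, delta ho is never 0 here):<\/p>

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

random.seed(0)
for _ in range(1000):
    ho = random.uniform(-3.0, 3.0)
    h = random.uniform(0.01, 0.99)   # h(t) is a sigmoid output, so it lies in (0, 1)
    y = random.choice([0.0, 1.0])    # the target is the next bit
    yhat = sigmoid(ho * h)           # y^(t) = g(ho*h(t))
    delta_ho = (y - yhat) * yhat * (1 - yhat) * h   # eta omitted; it is positive anyway
    # The update chases the target: its sign matches sign(y - y^).
    assert (delta_ho > 0) == (y > yhat)
    assert (delta_ho < 0) == (y < yhat)
print("sign(delta ho) == sign(y - y^) held for all samples")
```

<p>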
It is<\/p>\n\n\n\n<p id=\"ccac\">(<em>y<\/em>-<em>y<\/em>^)*<em>g<\/em>*(1-<em>g<\/em>)*<em>ho<\/em>*<em>f<\/em>*(1-<em>f<\/em>)*<em>x<\/em><\/p>\n\n\n\n<p id=\"8f33\">where&nbsp;<em>f<\/em>&nbsp;==&nbsp;<em>f<\/em>(<em>hh<\/em>*<em>h<\/em>(<em>t<\/em>-1) +&nbsp;<em>ih<\/em>*<em>x<\/em>(<em>t<\/em>)) ==&nbsp;<em>h<\/em>(<em>t<\/em>) and&nbsp;<em>x<\/em>&nbsp;==&nbsp;<em>x<\/em>(<em>t<\/em>).<\/p>\n\n\n\n<p id=\"51fe\">So<\/p>\n\n\n\n<p id=\"6a97\">delta&nbsp;<em>ih<\/em>&nbsp;= eta*(<em>y<\/em>-<em>y<\/em>^)*<em>g<\/em>*(1-<em>g<\/em>)*<em>ho<\/em>*<em>f<\/em>*(1-<em>f<\/em>)*<em>x<\/em><\/p>\n\n\n\n<p id=\"7bb2\">Just like&nbsp;<em>g<\/em>*(1-<em>g<\/em>),&nbsp;<em>f<\/em>*(1-<em>f<\/em>) is always positive. Plus, when&nbsp;<em>x<\/em>(<em>t<\/em>) is 0, delta&nbsp;<em>ih<\/em>&nbsp;is also 0.<\/p>\n\n\n\n<p id=\"04e7\">So, when&nbsp;<em>x<\/em>(<em>t<\/em>) is 1<\/p>\n\n\n\n<p id=\"d306\">sign(delta&nbsp;<em>ih<\/em>) = sign((<em>y<\/em>-<em>y<\/em>^)*<em>ho<\/em>) = sign(<em>y<\/em>-<em>y<\/em>^)*sign(<em>ho<\/em>)<\/p>\n\n\n\n<p id=\"9151\">Similarly, the negated gradient of the error with respect to the weight&nbsp;<em>hh<\/em>&nbsp;is<\/p>\n\n\n\n<p id=\"6762\">(<em>y<\/em>-<em>y<\/em>^)*<em>g<\/em>*(1-<em>g<\/em>)*<em>ho<\/em>*<em>f<\/em>*(1-<em>f<\/em>)*<em>h<\/em>(<em>t<\/em>-1)<\/p>\n\n\n\n<p id=\"d539\">from which, noting that&nbsp;<em>g<\/em>*(1-<em>g<\/em>),&nbsp;<em>f<\/em>*(1-<em>f<\/em>), and&nbsp;<em>h<\/em>(<em>t<\/em>-1) are all positive, we get<\/p>\n\n\n\n<p id=\"ecb8\">sign(delta&nbsp;<em>hh<\/em>) = sign(<em>y<\/em>-<em>y<\/em>^)*sign(<em>ho<\/em>)<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"ec38\"><strong>How the weights evolve<\/strong><\/h2>\n\n\n\n<p id=\"551b\">Let\u2019s start by tracking how&nbsp;<em>ho<\/em>&nbsp;changes over time on particular input sequences that are illuminating.<\/p>\n\n\n\n<p id=\"1033\">All the discussion below is grounded in empirical analysis. The Python code for this is included at the end of this post. 
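<\/p>

<p>The three negated-gradient formulas above can also be sanity-checked against finite differences of the error. A sketch, with arbitrary illustrative values (not taken from the experiments):<\/p>

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Arbitrary illustrative values (not taken from the experiments).
h_prev, x, y = 0.6, 1.0, 1.0
ih, hh, ho = 0.3, -0.2, 0.7

def forward(ih_, hh_, ho_):
    h = sigmoid(hh_ * h_prev + ih_ * x)   # h(t)
    return sigmoid(ho_ * h), h            # y^(t), h(t)

def error(ih_, hh_, ho_):
    yhat, _ = forward(ih_, hh_, ho_)
    return 0.5 * (y - yhat) ** 2

yhat, h = forward(ih, hh, ho)
g, f = yhat, h   # the shorthands used in the text

# The three negated gradients, exactly as written in the text.
d_ho = (y - yhat) * g * (1 - g) * h
d_ih = (y - yhat) * g * (1 - g) * ho * f * (1 - f) * x
d_hh = (y - yhat) * g * (1 - g) * ho * f * (1 - f) * h_prev

# Central finite differences of the negated error.
eps = 1e-6
n_ho = -(error(ih, hh, ho + eps) - error(ih, hh, ho - eps)) / (2 * eps)
n_ih = -(error(ih + eps, hh, ho) - error(ih - eps, hh, ho)) / (2 * eps)
n_hh = -(error(ih, hh + eps, ho) - error(ih, hh - eps, ho)) / (2 * eps)
print(d_ho - n_ho, d_ih - n_ih, d_hh - n_hh)
```

<p>All three differences come out numerically negligible.<\/p>

<p>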
The experimental conditions are also described there in case you wish to repeat the experiments.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"c462\"><strong>On a long streak of the same value<\/strong><\/h2>\n\n\n\n<p id=\"23c8\">First, let\u2019s see what happens on the input sequence 1\u00b9\u2070. (Ten straight 1s.) The weight&nbsp;<em>ho<\/em>&nbsp;increases monotonically, settling at 1.98. The monotonic increase makes sense. As&nbsp;<em>ho<\/em>&nbsp;sees more and more 1s, the vigor with which it chases a 1 increases.<\/p>\n\n\n\n<p id=\"dc23\">The weight&nbsp;<em>hh<\/em>&nbsp;also increases monotonically, settling at 0.43. This increase also makes sense. As the observed streak gets longer,&nbsp;<em>hh<\/em>\u2019s confidence that the streak will continue increases. Why 0.43 here vs 1.98 for&nbsp;<em>ho<\/em>? Because the error is propagated back to&nbsp;<em>hh<\/em>&nbsp;through two sigmoids, making it less than the error that&nbsp;<em>ho<\/em>&nbsp;sees.<\/p>\n\n\n\n<p id=\"11b8\">In the previous paragraph, the expression \u201cobserved streak gets longer\u201d holds for streaks of 0 as well. Let\u2019s elaborate on this by considering the input sequence 0\u00b9\u2070.<\/p>\n\n\n\n<p id=\"edaf\">The weight&nbsp;<em>ho<\/em>&nbsp;decreases monotonically, settling at -1.98. This makes sense. As&nbsp;<em>ho<\/em>&nbsp;sees more and more 0s, the vigor with which it chases a 0 increases.<\/p>\n\n\n\n<p id=\"1871\">The weight&nbsp;<em>hh<\/em>, on the other hand, increases monotonically as before, settling this time at 0.5. To understand this better, consider the values of the weights after training on the first few 0s. After training on the first 0,&nbsp;<em>ho<\/em>&nbsp;is negative. After training on the second 0,&nbsp;<em>hh<\/em>&nbsp;has increased. This is because&nbsp;<em>ho<\/em>&nbsp;is negative. As is&nbsp;<em>y<\/em>-<em>y<\/em>^ since&nbsp;<em>y<\/em>&nbsp;is 0. 
So their product is positive.<\/p>\n\n\n\n<p id=\"7dda\">Why does&nbsp;<em>hh<\/em>&nbsp;settle at 0.5 on 0\u00b9\u2070 whereas it settled at 0.43 on 1\u00b9\u2070? This is an artifact of our choices. The weight&nbsp;<em>ih<\/em>&nbsp;does not learn at all while processing 0\u00b9\u2070 since all the inputs are 0. So&nbsp;<em>hh<\/em>&nbsp;learns to compensate a bit.<\/p>\n\n\n\n<p id=\"daed\"><em>hh is learning that long streaks predict that the streak will continue.<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"0a16\"><strong>On a long streak of 1s followed by a long streak of 0s<\/strong><\/h2>\n\n\n\n<p id=\"832b\">Next, let\u2019s see how the weights evolve while training on the input 1\u00b9\u2070 0\u00b9\u2070. Below we will use + to denote that the weight increases and - to denote that it decreases. A zero-crossing will be shown by a comma.<\/p>\n\n\n\n<p id=\"430c\"><em>ho<\/em>\u2019s sequence is +\u00b9\u2070 -\u2075 , -\u2075. This is easy to explain. As the first streak (comprising 1s) unfolds,&nbsp;<em>ho<\/em>&nbsp;increases monotonically. As the second streak (comprising 0s) unfolds,&nbsp;<em>ho<\/em>&nbsp;decreases monotonically. In the middle of the second streak,&nbsp;<em>ho<\/em>&nbsp;crosses 0 from above.<\/p>\n\n\n\n<p id=\"3b90\"><em>hh<\/em>\u2019s sequence is more interesting. It is +\u00b9\u2070 -\u2075 , +\u2075. The explanation goes like this. When the first few 0s are seen in the second streak,&nbsp;<em>ho<\/em>&nbsp;is still positive (although it has started decreasing). Since&nbsp;<em>y<\/em>-<em>y<\/em>^ is negative,&nbsp;<em>hh<\/em>&nbsp;must decrease. After the 5th 0 in the second streak,&nbsp;<em>ho<\/em>&nbsp;becomes negative. From then on both&nbsp;<em>y<\/em>-<em>y<\/em>^ and&nbsp;<em>ho<\/em>&nbsp;are negative, so&nbsp;<em>hh<\/em>&nbsp;increases again.<\/p>\n\n\n\n<p id=\"f2cd\">How does this play out in the predictions?&nbsp;<em>ho<\/em>&nbsp;is still positive after training on the 4th 0 in the second streak. 
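<\/p>

<p>These trajectories can be replicated compactly. The sketch below mirrors the rnn() training loop given at the end of the post (a, b, c correspond to hh, ih, ho; eta = 5, as in the experiments). It checks that c, i.e. ho, rises throughout the first streak, falls throughout the second, and ends up negative; the exact counts of updates depend on indexing conventions:<\/p>

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dsig(v):
    # derivative of a sigmoid, expressed in terms of its output value v
    return v * (1 - v)

def train(X, eta=5.0):
    # mirrors rnn() from the end of the post: a == hh, b == ih, c == ho
    h, a, b, c = 0.0, 0.0, 0.0, 0.0
    cs = []
    for t in range(len(X) - 1):
        h_prev = h
        h = sigmoid(a * h + b * X[t])
        y = sigmoid(c * h)
        err = X[t + 1] - y
        c += eta * err * dsig(sigmoid(c * h)) * h
        b += eta * err * dsig(sigmoid(c * h)) * c * dsig(h) * X[t]
        a += eta * err * dsig(sigmoid(c * h)) * c * dsig(h) * h_prev
        cs.append(c)   # record ho after each update
    return cs

cs = train([1] * 10 + [0] * 10)
print([round(v, 2) for v in cs])
```

<p>Note that, as in the original code, b and a are updated using the already-updated c.<\/p>

<p>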
So it still predicts that the next value will be 1, albeit with less confidence than before. Fortunately,&nbsp;<em>hh<\/em>&nbsp;is also still positive (albeit decreasing) after the 4th 0, so it (mildly) predicts that the current streak of 0s will continue, i.e. that the next value is 0. This divergence further tempers&nbsp;<em>ho<\/em>\u2019s mild enthusiasm for 1.<\/p>\n\n\n\n<p id=\"3506\">After the 5th 0 is seen, both&nbsp;<em>hh<\/em>&nbsp;and&nbsp;<em>ho<\/em>&nbsp;are on the same page.&nbsp;<em>ho<\/em>&nbsp;has switched to chasing 0s.&nbsp;<em>hh<\/em>&nbsp;helps it along by reinforcing this prediction since it extends the streak of 0s. This harmony drives the prediction towards 0 even faster.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"4ffe\"><strong>On alternating 1s and 0s<\/strong><\/h2>\n\n\n\n<p id=\"46e4\">Let\u2019s see what happens on the sequence (10)\u00b9\u2070. After training on this sequence completes, the&nbsp;<em>hh<\/em>,&nbsp;<em>ih<\/em>, and&nbsp;<em>ho<\/em>&nbsp;weights are 0.25, 0.51, and -0.43 respectively. Hmm.<\/p>\n\n\n\n<p id=\"b758\">Let\u2019s step back and check what the predicted&nbsp;<em>y<\/em>^ is after each training step. It\u2019s in the range from 0.44 to 0.5. This suggests that training made the network conservative: it hedges its predictions near 0.5. The training apparently was unable to capture the alternating pattern.<\/p>\n\n\n\n<p id=\"51d8\">Intuition suggests we should be able to rig together an RNN of the same structure to generate an alternating binary sequence. Here is one. Let\u2019s replace the sigmoids by&nbsp;<em>tanh<\/em>&nbsp;functions. Next, we will set&nbsp;<em>ih<\/em>&nbsp;to -1,&nbsp;<em>hh<\/em>&nbsp;to 0, and&nbsp;<em>ho<\/em>&nbsp;to 1. Then we will set the gain of the&nbsp;<em>tanh<\/em>&nbsp;g to 10. We will then initialize&nbsp;<em>x<\/em>(0) to 1 and run the network forward. 
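<\/p>

<p>This construction can be sketched directly. The sketch below mirrors the run() function given at the end of the post, which uses the rescaled sigmoid 2*sigmoid(x)-1 (equal to tanh(x\/2)) as its tanh:<\/p>

```python
import math

def f(x):
    # the post's rescaled sigmoid: 2*sigmoid(x) - 1, which equals tanh(x/2)
    return 2.0 / (1.0 + math.exp(-x)) - 1.0

ih, hh, ho, gain = -1.0, 0.0, 1.0, 10.0
h, x = 0.5, 1.0   # h starts at 0.5 and x(0) = 1, as in the post's run()
ys = []
for _ in range(8):
    h = f(hh * h + ih * x)
    y = f(gain * ho * h)
    ys.append(y)
    x = y          # feed the output back in as the next input
print([round(v, 3) for v in ys])
```

<p>The printed values alternate in sign and stay close to 1 in magnitude.<\/p>

<p>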
It produces an alternating sequence of values close to 1 and close to -1.<\/p>\n\n\n\n<p id=\"b1fa\">The point of this exercise was just to demonstrate that this structure can produce this behavior. We won\u2019t cover whether or not this behavior can be learned.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"9cf4\"><strong>Python code<\/strong><\/h2>\n\n\n\n<p id=\"1ecd\">The learning rate eta was set to 5 in the experiments above.<\/p>\n\n\n\n<p id=\"ae03\">The function&nbsp;<em>run<\/em>&nbsp;runs the network forward for&nbsp;<em>n<\/em>&nbsp;steps for a given initial value of x, and for specified weights&nbsp;<em>ih<\/em>,&nbsp;<em>hh<\/em>, and&nbsp;<em>ho<\/em>. This implementation is specific to generating the alternating sequence. It uses a tanh-like function (2*sigmoid(x)-1, which equals tanh(x\/2)), plus a large slope (10) for&nbsp;<em>g<\/em>.<\/p>\n\n\n\n<p id=\"e918\">The function&nbsp;<em>rnn<\/em>&nbsp;trains the network on an input sequence X. Its weights a, b, and c correspond to&nbsp;<em>hh<\/em>,&nbsp;<em>ih<\/em>, and&nbsp;<em>ho<\/em>&nbsp;respectively.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">import math<br><br>def <strong>derivative<\/strong>(f,x):<br>  # valid when f is a logistic sigmoid<br>  return f(x)*(1-f(x))<br><br>def <strong>run<\/strong>(X0, ih, hh, ho, n = 10):<br>  f = lambda x: 2*(1.0\/(1.0+math.exp(-x)))-1<br>  g = f<br>  h = 0.5<br>  X = X0<br>  for t in range(n):<br>    o = hh*h + ih*X<br>    h = f(o)<br>    y = g(10*ho*h)<br>    print(y)<br>    X = y  # feed the output back in as the next input<br><br>def <strong>rnn<\/strong>(X, eta = 1):<br>  f = lambda x: 1.0\/(1.0+math.exp(-x))<br>  g = f<br>  h = 0.0<br>  a, b, c = 0.0,0.0,0.0  # a == hh, b == ih, c == ho<br>  res = [['y(t)','a','b','c','y^(t+1)','y^(t+1) c only']]<br>  for t in range(len(X)-1):<br>     htminus1 = h<br>     o = a*h + b*X[t]<br>     h = f(o)<br>     y = g(c*h)<br>     err = X[t+1] - y<br>     c += eta*err*derivative(g,c*h)*h<br>     b += eta*err*derivative(g,c*h)*c*derivative(f,o)*X[t]<br>     a += eta*err*derivative(g,c*h)*c*derivative(f,o)*htminus1<br>     yhattplus1 = g(c*f(a*h + b*X[t+1]))<br>     yhattplus1conly = g(c*X[t+1])<br>     
res.append([X[t+1],a,b,c,yhattplus1,yhattplus1conly])<br>  <br>  return res<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"0f6c\"><strong>Summary<\/strong><\/h2>\n\n\n\n<p id=\"d7d6\">In this post, we described one of the simplest recurrent neural networks that exhibit interesting behaviors. It meets the minimum requirements for such behavior: a hidden layer whose neuron computes a nonlinear activation function.<\/p>\n\n\n\n<p id=\"ffde\">On the task of learning to predict the next value in a binary sequence, the output weight learns to chase the output. The hidden neuron behaves as if it is tracking the length of the current streak. The hidden-to-hidden weight learns that the longer the streak, the more likely it is to continue. This has the effect of tempering the output weight\u2019s enthusiasm for chasing the output when a streak is interrupted by sporadic noise.<\/p>\n\n\n\n<p id=\"d57e\">For readers who like looking at code, and maybe even running it, we have included the Python code.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"b0ec\"><strong>Further Reading<\/strong><\/h2>\n\n\n\n<ol class=\"wp-block-list\"><li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Recurrent_neural_network\" rel=\"noopener\">Recurrent neural network<\/a><\/li><\/ol>\n","protected":false},"excerpt":{"rendered":"<p>This post describes one of the simplest recurrent neural networks that exhibit interesting 
behaviors.<\/p>\n","protected":false},"author":1044,"featured_media":19318,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[97,92,1561,1411],"ppma_author":[3691],"class_list":["post-22787","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-artificial-intelligence","tag-machine-learning","tag-recurrent-neural-network","tag-sequence-learning"],"authors":[{"term_id":3691,"user_id":1044,"is_guest":0,"slug":"arun-jagota","display_name":"Arun Jagota","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Arun-Jagota-150x150.jpeg","user_url":"https:\/\/www.salesforce.com\/in\/?ir=1","last_name":"Jagota","first_name":"Arun","job_title":"","description":"Arun Jagota is Director of Data Science at Salesforce.com. A PhD in computer science, he has taught undergraduate, graduate, and continuing education courses in Computer Science at many US Universities from 1992 through 2006. 
He has written a number of books (most available at Amazon.com) and 50 academic publications, and has 17+ patents issued."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22787","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/1044"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=22787"}],"version-history":[{"count":2,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22787\/revisions"}],"predecessor-version":[{"id":25413,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22787\/revisions\/25413"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/19318"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=22787"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=22787"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=22787"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=22787"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}