{"id":1860,"date":"2019-08-02T09:33:29","date_gmt":"2019-08-02T09:33:29","guid":{"rendered":"http:\/\/kusuaks7\/?p=1465"},"modified":"2024-07-23T07:23:17","modified_gmt":"2024-07-23T07:23:17","slug":"activation-functions-within-neural-networks","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/activation-functions-within-neural-networks\/","title":{"rendered":"Activation Functions within Neural Networks"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"1860\" class=\"elementor elementor-1860\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-3d7efbdb elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"3d7efbdb\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-620c835b\" data-id=\"620c835b\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-6ad83bdd elementor-widget elementor-widget-text-editor\" data-id=\"6ad83bdd\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<strong>In this post you will learn the most common Activation Functions within Deep Learning and when you should use them. You will also discover why you mostly need to use non-linear activation functions.<\/strong>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-dc1d9af elementor-widget elementor-widget-text-editor\" data-id=\"dc1d9af\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tIt is important to know which activation functions to use within your neural network. Be aware of the fact that you can use different activation functions at different layers. In my previous posts I only used the sigmoid function but often other functions can work much better.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-aa1830d elementor-widget elementor-widget-heading\" data-id=\"aa1830d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h3><b>tanh\u00a0<\/b><\/h3><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7bff480 elementor-widget elementor-widget-text-editor\" data-id=\"7bff480\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tA activation function that nearly almost works better than the sigmoid function is the tanh activation function.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3f78bb7 elementor-widget elementor-widget-text-editor\" data-id=\"3f78bb7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThe tanh function is actually mathematically a shifted version of the sigmoid function. The sigmoid function only maps values between 0 and 1 but the tanh function maps them between -1 and 1.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-bfa9dbf elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"bfa9dbf\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-6fe0a2e\" data-id=\"6fe0a2e\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-873889e elementor-widget elementor-widget-text-editor\" data-id=\"873889e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tUsing it within the units of a neural network almost always works a lot better than using the sigmoid function.\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5843e1d elementor-widget elementor-widget-text-editor\" data-id=\"5843e1d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tBecause of the values between -1 and +1 the mean of the activations that come out of the hidden layer are close to having a zero mean, which makes learning for the next layer a little bit easier.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-767c2b5 elementor-widget elementor-widget-text-editor\" data-id=\"767c2b5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThe only exception for using the sigmoid function is using it at the output layer at binary classification problems while using the Relu function at the hidden layers. Because when you want to predict either 0 or 1 it makes sense that y-hat should be between 0 and 1 and not between -1 and +1.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e4b2279 elementor-widget elementor-widget-heading\" data-id=\"e4b2279\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\"><h3><b>rectified linear unit (relu)<\/b><\/h3><\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6e3371f elementor-widget elementor-widget-text-editor\" data-id=\"6e3371f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tAnother very popular activation function within machine learning is the Rectified Linear Unit function which is also just called relu.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-cd633e8 elementor-widget elementor-widget-text-editor\" data-id=\"cd633e8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThe derivative is 1 as long as z (a point at the x-axes) is positive and the derivative is 0 when z is negative.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6fe70b5 elementor-widget elementor-widget-text-editor\" data-id=\"6fe70b5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tIf your not sure which function to use for your hidden layer then the rely function is a good choice but be aware of the fact that there are no perfect guidelines about which function to use because your data and your problems will always be very unique. Choosing the right one is more of an art than a science. Consequently you should try things out if your not very sure.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-086ba84 elementor-widget elementor-widget-heading\" data-id=\"086ba84\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\"><h3><b>leaky rectified linear unit<\/b><\/h3><\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-49bdee5 elementor-widget elementor-widget-text-editor\" data-id=\"49bdee5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThe leaky relu function is a slightly changed version of the relu function. Instead of the slope being zero when z is equal to 1 the function has a slight slope.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-bcaaaff elementor-widget elementor-widget-text-editor\" data-id=\"bcaaaff\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThis works a bit better most of the time but isn\u2019t used that much in practice.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-050bd9e elementor-widget elementor-widget-text-editor\" data-id=\"050bd9e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tAn advantage of both functions is that for a lot of the space of z the slope of the activation function is very different to zero which let\u2019s you neural network work much faster.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-da2ac8c elementor-widget elementor-widget-heading\" data-id=\"da2ac8c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\"><h3><b>Why do you need non-linear activation functions?<\/b><\/h3><\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-47a9473 elementor-widget elementor-widget-text-editor\" data-id=\"47a9473\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tIf we use a linear activation function at the hidden layers our neural networks just outputs a linear function of the input. That will happen no matter how many layers a neural network has. This then makes a neural network no more better than logistic regression.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-623f3bf elementor-widget elementor-widget-text-editor\" data-id=\"623f3bf\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThe key takeaway for you should be that linear activation functions within hidden layers are more or less useless except some very special cases.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9a1e0d7 elementor-widget elementor-widget-text-editor\" data-id=\"9a1e0d7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tOne case where you could use it, is if you are working at a regression problem where y is a real number, like predicting the prices of houses. But only at the output layer, the hidden layers should use non-linear functions. You can see an example at the picture below.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-60527f3 elementor-widget elementor-widget-text-editor\" data-id=\"60527f3\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tNevertheless even then you could use a relu instead of a linear function at the output layer with the same result. This is one of the reasons why a sigmoid function is rarely used nowadays.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>It is important to know which activation functions to use within your neural network. Be aware of the fact that you can use different activation functions at different layers. Most often the sigmoid function is used but often other functions can work much better. In this post you will learn the most common Activation Functions within Deep Learning and when you should use them. You will also discover why you mostly need to use non-linear activation functions.<\/p>\n","protected":false},"author":413,"featured_media":21951,"comment_status":"open","ping_status":"open","sticky":false,"template":"single-post-2.php","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[92],"ppma_author":[2327],"class_list":["post-1860","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-machine-learning"],"authors":[{"term_id":2327,"user_id":413,"is_guest":0,"slug":"niklas-donges","display_name":"Niklas Donges","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","user_url":"","last_name":"Donges","first_name":"Niklas","job_title":"","description":"<a href=\"https:\/\/www.linkedin.com\/in\/niklas-donges\/\">Niklas Donges<\/a>&nbsp;is Machine Learning Engineer at SAP. He is a Technical Blogger for the &#039;Towards Data Science&#039; publication"}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1860","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/413"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=1860"}],"version-history":[{"count":5,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1860\/revisions"}],"predecessor-version":[{"id":36929,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1860\/revisions\/36929"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/21951"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=1860"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=1860"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=1860"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=1860"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}