{"id":1394,"date":"2019-02-15T10:32:07","date_gmt":"2019-02-15T10:32:07","guid":{"rendered":"http:\/\/kusuaks7\/?p=999"},"modified":"2023-08-07T12:36:47","modified_gmt":"2023-08-07T12:36:47","slug":"coding-deep-learning-for-beginners-linear-regression-part-2-cost-function","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/coding-deep-learning-for-beginners-linear-regression-part-2-cost-function\/","title":{"rendered":"Coding Deep Learning for Beginners\u200a\u2014\u200aLinear Regression (Part 2): Cost Function"},"content":{"rendered":"<p><strong><em>Ready to learn Machine Learning? Browse<\/em><\/strong> <strong><em><a href=\"https:\/\/www.experfy.com\/training\/tracks\/machine-learning-training-certification\">Machine Learning Training and Certification courses<\/a> developed by industry thought leaders and Experfy in Harvard Innovation Lab.<\/em><\/strong><\/p>\n<blockquote><p>This is the 4th article of series \u201cCoding Deep Learning for Beginners\u201d. Here, you will find\u00a0links to the\u00a0<a href=\"https:\/\/www.experfy.com\/blog\/coding-deep-learning-for-beginners-start\">1st article<\/a>, the\u00a0<a href=\"https:\/\/www.experfy.com\/blog\/coding-deep-learning-for-beginners-types-of-machine-learning\">2nd\u00a0article<\/a>, and the <a href=\"https:\/\/www.experfy.com\/blog\/coding-deep-learning-for-beginners-linear-regression-part-1-initialization-and-prediction\">3rd article<\/a>.<\/p><\/blockquote>\n<section>\n<h3 id=\"729f\"><strong>Recap<\/strong><\/h3>\n<p id=\"a926\">The last article has introduced the problem which will be solved after Linear Regression implementation is finished. The goal is to\u00a0<strong>predict the prices of Cracow apartments<\/strong>. The dataset consists of samples described by three features:\u00a0<strong>distance_to_city_center<\/strong>,\u00a0<strong>room<\/strong>, and\u00a0<strong>size<\/strong>. 
To simplify visualizations and make learning more efficient \u2014 only the size feature will be used.<\/p>\n<p id=\"0757\">Additionally, the mathematical formula behind the Linear Regression model was presented and explained. For the equation to be complete, its parameters need to have assigned values. Then, the formula is ready to return a numerical prediction for any given input sample.<\/p>\n<p id=\"d325\">The two steps described here are called\u00a0<strong>Initialization<\/strong>\u00a0and\u00a0<strong>Prediction<\/strong>. Both were\u00a0<strong>turned into separate Python functions and used to create a Linear Regression model<\/strong>\u00a0with all parameters initialized to zeros, which was then used to predict prices of apartments based on the size parameter.<\/p>\n<figure id=\"ecdc\"><canvas width=\"75\" height=\"50\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*mjAJg9YZGVh3b7-sorXHsg.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*mjAJg9YZGVh3b7-sorXHsg.png\" \/><\/figure>\n<p style=\"text-align: center;\">Code used to prepare the graph is available under this\u00a0<a href=\"https:\/\/gist.github.com\/FisherKK\/942fa9aaaa95be04f75a316a5824343c\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/gist.github.com\/FisherKK\/942fa9aaaa95be04f75a316a5824343c\" data->link<\/a>.<\/p>\n<h3 id=\"0ea4\"><strong>Next problem to\u00a0solve<\/strong><\/h3>\n<p id=\"c8b1\">The model with current parameters will return zero for every value of the size parameter because all weights of the model and the bias are equal to zero. 
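The exact code lives in the gists linked throughout this series; purely as a reminder, a minimal sketch of what the previous article's `init` and `predict` helpers do, assuming the dict-based parameter format used later in this article, could look like this:

```python
# Hypothetical re-implementation of the helpers from the previous article:
# init(n) builds a parameter dictionary with n weights and a bias, all zeros;
# predict(x, parameters) returns w . x + b for a single sample x.

def init(n):
    return {"w": [0.0] * n, "b": 0.0}

def predict(x, parameters):
    return sum(w_i * x_i for w_i, x_i in zip(parameters["w"], x)) + parameters["b"]

# With zero-initialized parameters every prediction is 0.0,
# regardless of the apartment size:
parameters = init(1)
print(predict([50.0], parameters))  # 0.0
```

Because every weight and the bias are zero, the weighted sum collapses to zero for any input, which is exactly why the model's line lies flat on the x-axis.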
Now let\u2019s modify the parameters and see how the projection of the model changes.<\/p>\n<figure id=\"aec6\" data-scroll=\"native\"><canvas width=\"75\" height=\"22\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 219px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/1*1Ba62SxY84fIxF6U-FiZ-g.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/1*1Ba62SxY84fIxF6U-FiZ-g.png\" \/><\/figure>\n<p id=\"d673\" style=\"text-align: center;\">Code used to prepare these graphs is available under this\u00a0<a href=\"https:\/\/gist.github.com\/FisherKK\/73eb3ff38fccc865b730ac74cb692b5a\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/gist.github.com\/FisherKK\/73eb3ff38fccc865b730ac74cb692b5a\" data->link<\/a>.<\/p>\n<p>There are two sets of parameters that cause a Linear Regression model to return a different apartment price for each value of the size feature. Because the data has a linear pattern, the model could become an accurate approximator of the price after proper calibration of the parameters.<\/p>\n<h4 id=\"5ace\"><strong>Question to\u00a0answer<\/strong><\/h4>\n<p id=\"f6c1\">For which set of parameters does the model return better results?<\/p>\n<ul>\n<li id=\"7f30\">Orange:\u00a0<code>w = 3<\/code>\u00a0,\u00a0<code>b = 200<\/code><\/li>\n<li id=\"bd98\">Lime:\u00a0<code>w = 12<\/code>\u00a0,\u00a0<code>b = -160<\/code><\/li>\n<\/ul>\n<p id=\"09ce\">Even though it might be possible to guess the answer correctly by visual judgment,\u00a0<strong>the computer doesn\u2019t imagine \u2014 it compares the values<\/strong>. This is where the Cost Function comes to help.<\/p>\n<\/section>\n<section>\n<hr \/>\n<h3 id=\"5861\"><strong>Cost<\/strong><strong> Function<\/strong><\/h3>\n<p id=\"b6d8\">It is a function that\u00a0<strong>measures the performance of a Machine Learning model<\/strong>\u00a0for given data. 
Cost Function quantifies the error between predicted values and expected values and\u00a0<strong>presents it in the form of a single real number<\/strong>. Depending on the problem, the Cost Function can be formed in many different ways. The purpose of the Cost Function is to be either:<\/p>\n<ul>\n<li id=\"1da1\"><strong>Minimized\u00a0<\/strong>&#8211; then the returned value is usually called\u00a0<strong>cost<\/strong>,\u00a0<strong>loss<\/strong>\u00a0or\u00a0<strong>error<\/strong>. The goal is to find the values of the model parameters for which the Cost Function returns as small a number as possible.<\/li>\n<li id=\"2520\"><strong>Maximized\u00a0<\/strong>&#8211; then the value it yields is named a\u00a0<strong>reward<\/strong>. The goal is to find values of the model parameters for which the returned number is as large as possible.<\/li>\n<\/ul>\n<p id=\"f63f\"><strong>For algorithms relying on Gradient Descent to optimize model parameters, the Cost Function has to be differentiable.<\/strong><\/p>\n<h3 id=\"ceb6\"><strong>Tailoring Cost\u00a0Function<\/strong><\/h3>\n<p id=\"cb20\">Given a model using the following formula:<\/p>\n<figure id=\"dfcc\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*yR6IbZa9gCfiQG58E7nA1Q.png\" data-height=\"20\" data-image-id=\"1*yR6IbZa9gCfiQG58E7nA1Q.png\" data-width=\"72\" \/><\/figure>\n<p id=\"130f\">where:<\/p>\n<ul>\n<li id=\"9df1\">\u0177 &#8211; predicted value,<\/li>\n<li id=\"1acd\">x &#8211; vector of data used for prediction or training,<\/li>\n<li id=\"6aba\">w &#8211; weight.<\/li>\n<\/ul>\n<p id=\"e2c9\">Notice that\u00a0<em>the bias parameter is omitted on purpose<\/em>. 
Let\u2019s try to find a value of the weight parameter such that, for the following data samples:<\/p>\n<figure id=\"a565\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*1ul5Qj5VdvzjGnEd7s3VAg.png\" data-height=\"20\" data-image-id=\"1*1ul5Qj5VdvzjGnEd7s3VAg.png\" data-width=\"307\" \/><\/figure>\n<p id=\"8f3c\">the outputs of the model are as close as possible to:<\/p>\n<figure id=\"91d8\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*tpV8a-jbnKVKIISx-hzI6w.png\" data-height=\"18\" data-image-id=\"1*tpV8a-jbnKVKIISx-hzI6w.png\" data-width=\"257\" \/><\/figure>\n<p id=\"b09f\">Now it\u2019s time to assign a random value to the weight parameter and visualize the results of the model. Let\u2019s pick\u00a0<code>w = 5.0<\/code>\u00a0for now.<\/p>\n<figure id=\"0ac4\"><canvas width=\"75\" height=\"50\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*aYjrye1-wgLiH2hNYz87yg.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*aYjrye1-wgLiH2hNYz87yg.png\" \/><\/figure>\n<p id=\"f98d\" style=\"text-align: center;\">Code used to prepare the graph is available under this\u00a0<a href=\"https:\/\/gist.github.com\/FisherKK\/86f400f6d88facbf5375286db7029ca2\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/gist.github.com\/FisherKK\/86f400f6d88facbf5375286db7029ca2\" data->link<\/a>.<\/p>\n<p>It can be observed that the model predictions differ from the expected values. How can this be expressed mathematically? The most straightforward idea is to subtract the two values from each other and see if the result equals zero. Any other result means that the values differ. The size of the\u00a0<strong>resulting number provides information about how significant the error is<\/strong>. From the geometrical perspective, it is possible to state that the\u00a0<strong>error is the distance between two points in the coordinate system<\/strong>. 
Let\u2019s define the distance as:<\/p>\n<figure id=\"5928\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*H0g8syS5kQUGHkGIfxizng.png\" data-height=\"20\" data-image-id=\"1*H0g8syS5kQUGHkGIfxizng.png\" data-width=\"166\" \/><\/figure>\n<p id=\"da35\">According to the formula, calculate the errors between the predictions and expected values:<\/p>\n<figure id=\"d11a\"><canvas width=\"75\" height=\"72\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*Z4NmpyEbxKKWkEPvlDdJpQ.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*Z4NmpyEbxKKWkEPvlDdJpQ.png\" \/><\/figure>\n<p id=\"e610\">As stated before, the Cost Function is a single number describing model performance. Therefore, let\u2019s sum up the errors.<\/p>\n<figure id=\"9f8c\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*-dY6DuvUKFfhjrPaHwQrqw.png\" data-height=\"73\" data-image-id=\"1*-dY6DuvUKFfhjrPaHwQrqw.png\" data-width=\"281\" \/><\/figure>\n<p id=\"6a77\">However, now imagine there are a million points instead of four. The accumulated error would become a bigger number for a model making predictions on a larger dataset than for one on a smaller dataset. Consequently, such models could not be compared. That\u2019s why the sum has to be scaled in some way. The right idea is to\u00a0<strong>divide the accumulated errors by the number of points<\/strong>. The cost stated this way is the mean of the errors the model has made on the given dataset.<\/p>\n<figure id=\"6d3e\"><canvas width=\"75\" height=\"22\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*pmPfnUGVYOMranuuOS4faw.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*pmPfnUGVYOMranuuOS4faw.png\" \/><\/figure>\n<p id=\"38cc\">Unfortunately, the formula is not finished yet. 
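As a quick sanity check, the formula built so far (sum of signed differences divided by the number of points) can be sketched in Python. The sample values below are assumed, chosen so that the target relation is y = 2x, which matches the weight w = 2.0 found later in the article to give zero error:

```python
def naive_cost(predictions, targets):
    # The cost built so far: mean of the *signed* differences (not final yet).
    accumulated_error = 0.0
    for prediction, target in zip(predictions, targets):
        accumulated_error += prediction - target
    return accumulated_error / len(predictions)

# Assumed toy data following y = 2x; predictions come from a model with w = 5.0.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
predictions = [5.0 * x_i for x_i in x]
print(naive_cost(predictions, y))  # 7.5
```

With w = 5.0 every prediction overshoots its target, so all signed differences are positive and the mean comes out positive.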
Before finishing it,\u00a0<strong>all cases have to be considered<\/strong>, so let\u2019s try picking a smaller weight now and see if the created Cost Function still works. This time, the weight is set to\u00a0<code>w = 0.5<\/code>.<\/p>\n<figure id=\"43dd\"><canvas width=\"75\" height=\"50\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*CIa-faf3AmTwjIby1FQNsQ.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*CIa-faf3AmTwjIby1FQNsQ.png\" \/><figcaption>\u00a0<\/figcaption><\/figure>\n<p style=\"text-align: center;\">Code used to prepare the graph is available under this\u00a0<a href=\"https:\/\/gist.github.com\/FisherKK\/15eb3f36444fb3dd4ed64c21ab300bfc\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/gist.github.com\/FisherKK\/15eb3f36444fb3dd4ed64c21ab300bfc\" data->link<\/a>.<\/p>\n<p id=\"8734\">The predictions are off again. However, what\u2019s different in comparison to the previous case is that the predicted points are below the expected points. Numerically, the predictions are smaller. The cost formula is going to malfunction because the calculated distances have negative values.<\/p>\n<figure id=\"790b\"><canvas width=\"75\" height=\"55\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*W32nyKK_9SOUrwu0coociA.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*W32nyKK_9SOUrwu0coociA.png\" \/><\/figure>\n<p id=\"2e31\">The cost value is also negative:<\/p>\n<figure id=\"b1d4\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*flNlLYOhp_gBWyurSQuUnQ.png\" data-height=\"47\" data-image-id=\"1*flNlLYOhp_gBWyurSQuUnQ.png\" data-width=\"570\" \/><\/figure>\n<p id=\"f49c\">It is incorrect to say that a distance can have a negative value. It is possible to attach a more substantial penalty to predictions that are located above or below the expected results (some cost functions do so, e.g. 
RMSLE), but the\u00a0<strong>value shouldn\u2019t be negative, as negative errors would cancel out positive ones<\/strong>. It would then become impossible to properly minimize or maximize the Cost Function.<\/p>\n<p id=\"f12a\">So how about fixing the problem by using the absolute value of the distance? After stating the distance as:<\/p>\n<figure id=\"8a79\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*rDI80s9exPrEFN489rgiPA.png\" data-height=\"23\" data-image-id=\"1*rDI80s9exPrEFN489rgiPA.png\" data-width=\"177\" \/><\/figure>\n<p id=\"a4df\">the costs for each value of the weight are:<\/p>\n<figure id=\"6ea5\"><canvas width=\"75\" height=\"13\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*otJXYZrRbGn6-SsAu6L7eA.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*otJXYZrRbGn6-SsAu6L7eA.png\" \/><\/figure>\n<p id=\"926f\">Now the costs for both weights\u00a0<code>w = 5.0<\/code>\u00a0and\u00a0<code>w = 0.5<\/code>\u00a0are correctly calculated, and it is possible to compare the parameters. The model achieves better results for\u00a0<code>w = 0.5<\/code>\u00a0as the cost value is smaller.<\/p>\n<p id=\"a0a5\">The function that was created is called\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Mean_absolute_error\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/en.wikipedia.org\/wiki\/Mean_absolute_error\" data-><strong>Mean Absolute Error<\/strong><\/a><strong>.<\/strong><\/p>\n<h3 id=\"2b66\"><strong>Mean Absolute\u00a0Error<\/strong><\/h3>\n<p id=\"b8d5\">A regression metric which measures the<strong>\u00a0average magnitude of errors in a group of predictions<\/strong>, without considering their directions. 
In other words, it\u2019s a<strong>\u00a0mean of absolute differences between predictions and expected results, where all individual deviations have equal importance<\/strong>.<\/p>\n<figure id=\"912b\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*7IaDJ6omDsdV4yxk2zYXRA.png\" data-height=\"28\" data-image-id=\"1*7IaDJ6omDsdV4yxk2zYXRA.png\" data-width=\"250\" \/><\/figure>\n<p id=\"906b\">where:<\/p>\n<ul>\n<li id=\"953d\">i &#8211; index of sample,<\/li>\n<li id=\"7863\">\u0177 &#8211; predicted value,<\/li>\n<li id=\"0e97\">y &#8211; expected value,<\/li>\n<li id=\"4048\">m &#8211; number of samples in dataset.<\/li>\n<\/ul>\n<p id=\"8650\">Sometimes the formula can be seen with the predicted and expected values swapped, but it works the same.<\/p>\n<p id=\"6d78\">Let\u2019s turn the math into code:<\/p>\n<p id=\"b01a\">The function takes as input two arrays of the same size:\u00a0<code>predictions<\/code>\u00a0and\u00a0<code>targets<\/code>. The parameter\u00a0<code>m<\/code>\u00a0of the formula, which is the number of samples, equals the length of the arrays. Because the arrays have the same length, it is possible to iterate over both of them at the same time. The absolute value of the difference between each\u00a0<code>prediction<\/code>\u00a0and\u00a0<code>target<\/code>\u00a0is calculated and added to the\u00a0<code>accumulated_error<\/code>\u00a0variable. 
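The implementation gist is not embedded in this page, so a version matching the description might look like this; the example values assume the y = 2x toy data used earlier:

```python
def mae(predictions, targets):
    # Mean Absolute Error: average of |prediction - target| over all samples.
    m = len(predictions)
    accumulated_error = 0.0
    for prediction, target in zip(predictions, targets):
        accumulated_error += abs(prediction - target)
    return accumulated_error / m

# Predictions for w = 5.0 and w = 0.5 on the assumed toy data (y = 2x):
targets = [2.0, 4.0, 6.0, 8.0]
print(mae([5.0, 10.0, 15.0, 20.0], targets))  # 7.5
print(mae([0.5, 1.0, 1.5, 2.0], targets))     # 3.75
```

Both costs are now positive, so the two weights can be compared directly.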
After gathering errors from all pairs, the accumulated result is divided by the parameter\u00a0<code>m<\/code>, which yields the MAE for the given data.<\/p>\n<h3 id=\"437c\"><strong>Mean Squared\u00a0Error<\/strong><\/h3>\n<p id=\"1f6e\">One of the most commonly used, and usually first explained,\u00a0<strong>regression metrics<\/strong>:\u00a0<strong>the average squared difference between the predictions and expected results.<\/strong>\u00a0In other words, an alteration of MAE where, instead of taking the absolute value of the differences, they are squared.<\/p>\n<p id=\"d5c3\">In MAE, the partial error values were equal to the distances between points in the coordinate system.\u00a0<strong>In MSE, each partial error is equivalent to the area of the square whose side is the geometrical distance between the measured points.<\/strong>\u00a0All square areas are summed up and averaged.<\/p>\n<figure id=\"8a8c\"><canvas width=\"75\" height=\"75\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*iuffLsSzJK7-ZBa8K95p9A.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*iuffLsSzJK7-ZBa8K95p9A.png\" \/><\/figure>\n<p style=\"text-align: center;\">Code used to prepare the graph is available under this\u00a0<a href=\"https:\/\/gist.github.com\/FisherKK\/fcd05b0eb3a3d12a680f03c68c5fdb40\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/gist.github.com\/FisherKK\/fcd05b0eb3a3d12a680f03c68c5fdb40\" data->link<\/a>.<\/p>\n<p id=\"c7eb\">The MSE formula can be written like this:<\/p>\n<figure id=\"ab13\"><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*tbQtS4iwkbcTZy6SJSdOgA.png\" data-height=\"28\" data-image-id=\"1*tbQtS4iwkbcTZy6SJSdOgA.png\" data-width=\"272\" \/><\/figure>\n<ul>\n<li id=\"b17f\">i &#8211; index of sample,<\/li>\n<li id=\"2662\">\u0177 &#8211; predicted value,<\/li>\n<li id=\"aff8\">y &#8211; expected value,<\/li>\n<li id=\"eba7\">m &#8211; number of samples in 
dataset.<\/li>\n<\/ul>\n<p id=\"2ff2\">There are other forms of the MSE formula without the division by two in the denominator. Its presence makes the calculus of the MSE derivative cleaner.<\/p>\n<p id=\"0a7e\">Calculating the derivative of equations using the absolute value is problematic. MSE uses exponentiation instead and consequently has good mathematical properties which make the computation of its derivative easier in comparison to MAE. This is relevant when training a model with the Gradient Descent algorithm.<\/p>\n<p id=\"d1c0\">MSE can be written in Python as follows:<\/p>\n<p id=\"bd60\">The only distinctions from the\u00a0<code>mae(predictions, targets)<\/code>\u00a0function introduced in the previous paragraph are:<\/p>\n<ul>\n<li id=\"849d\">the difference between\u00a0<code>prediction<\/code>\u00a0and\u00a0<code>target<\/code>\u00a0is squared,<\/li>\n<li id=\"5072\"><code>2<\/code>\u00a0in the averaging denominator.<\/li>\n<\/ul>\n<h3 id=\"7e90\"><strong>Differences between MAE and\u00a0MSE<\/strong><\/h3>\n<p id=\"0a40\">There are many more regression metrics that can be used as a Cost Function for measuring the performance of models that try to solve regression problems (estimating the value). MAE and MSE are relatively simple and very popular.<\/p>\n<h4 id=\"4342\"><strong>Why are there so many\u00a0metrics?<\/strong><\/h4>\n<p id=\"5366\">Each metric treats the differences between observations and expected results in a unique way.<strong>\u00a0<\/strong>The distance between the ideal result and a prediction receives a penalty from the metric based on its magnitude and direction in the coordinate system. For example, a metric such as RMSLE penalizes predictions that are lower than expected more aggressively than those that are higher. Its usage might lead to a model which returns inflated estimations.<\/p>\n<p id=\"1cbc\">How do MAE and MSE treat the differences between the points? 
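The MSE gist is likewise not embedded here; a version with the two listed distinctions from `mae` (squared difference, extra `2` in the denominator) might look like this, evaluated on the assumed y = 2x toy data for a few weights:

```python
def mse(predictions, targets):
    # Mean Squared Error with the 1/(2m) scaling used in this article.
    m = len(predictions)
    accumulated_error = 0.0
    for prediction, target in zip(predictions, targets):
        accumulated_error += (prediction - target) ** 2
    return accumulated_error / (2 * m)

# Sweep a few weights on the assumed toy data (y = 2x) to see how the cost behaves:
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
for w in [0.5, 2.0, 5.0]:
    predictions = [w * x_i for x_i in x]
    print(w, mse(predictions, y))
```

Because each difference is squared before averaging, moving the weight further from 2.0 inflates the cost much faster than it would under MAE.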
To check it, let\u2019s calculate the cost for different weight values:<\/p>\n<figure id=\"f59a\" data-scroll=\"native\"><canvas width=\"75\" height=\"8\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/1*EbV2VrPiKSG2NVw2cF7xAQ.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/1*EbV2VrPiKSG2NVw2cF7xAQ.png\" \/><\/figure>\n<p style=\"text-align: center;\">The table presents the errors of several models created with different values of the weight parameter. The cost of each model was calculated with both the MAE and MSE\u00a0metrics.<\/p>\n<p id=\"18e9\">And display them on graphs:<\/p>\n<figure id=\"53e7\" data-scroll=\"native\"><canvas width=\"75\" height=\"22\"><\/canvas><img decoding=\"async\" style=\"width: 700px; height: 219px;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/1*_CqqBEbpdnb0MIqS3kMTEQ.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/900\/1*_CqqBEbpdnb0MIqS3kMTEQ.png\" \/><\/figure>\n<p style=\"text-align: center;\">The graphs show how the metric value changes for different values of the parameter w. Code used to prepare these graphs is available under this\u00a0<a href=\"https:\/\/gist.github.com\/FisherKK\/ca707f8af758917dd38bc978aab37169\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/gist.github.com\/FisherKK\/ca707f8af758917dd38bc978aab37169\" data->link<\/a>.<\/p>\n<p id=\"ab0d\">It is possible to observe that:<\/p>\n<ul>\n<li id=\"4809\">MAE doesn\u2019t add any additional weight to the distance between points \u2014 the\u00a0<strong>error growth is linear<\/strong>.<\/li>\n<li id=\"6b17\">MSE\u00a0<strong>errors grow quadratically with the distance<\/strong>. It\u2019s a metric that\u00a0<strong>adds a massive penalty to points which are far away and a minimal penalty to points which are close<\/strong>\u00a0to the expected result. 
The error curve has a parabolic shape.<\/li>\n<\/ul>\n<p id=\"e55a\">Additionally, by checking various weight values, it was possible to find the parameter for which the error is equal to zero. If\u00a0<code>w = 2.0<\/code>\u00a0is used to build the model, then the predictions look as follows:<\/p>\n<figure id=\"196c\"><canvas width=\"75\" height=\"50\"><\/canvas><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*gu9a_UBmG2-mW5qCXWAsZw.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/720\/1*gu9a_UBmG2-mW5qCXWAsZw.png\" \/><\/figure>\n<p style=\"text-align: center;\">Code used to prepare the graph is available under this\u00a0<a href=\"https:\/\/gist.github.com\/FisherKK\/ece7aa7a6d15a04e2d07293c45c1bd84\" target=\"_blank\" rel=\"noopener noreferrer\" data-href=\"https:\/\/gist.github.com\/FisherKK\/ece7aa7a6d15a04e2d07293c45c1bd84\" data->link<\/a>.<\/p>\n<p id=\"8781\"><strong>When predictions and expected results overlap<\/strong>, then the value of every reasonable\u00a0<strong>Cost Function is equal to zero<\/strong>.<\/p>\n<h3 id=\"8663\"><strong>Answer<\/strong><\/h3>\n<p id=\"5350\">It\u2019s high time to answer the question of which set of parameters,\u00a0<strong>orange <\/strong>or\u00a0<strong>lime<\/strong>, creates a better approximator for the prices of Cracow apartments. Let\u2019s\u00a0<strong>use MSE to calculate the error of both models<\/strong>\u00a0and see which one is lower.<\/p>\n<p id=\"e6e6\">The majority of the code was explained in the previous article. Instead of calling the\u00a0<code>init(n)<\/code>\u00a0function, parameter dictionaries were created manually for testing purposes. Notice that both models use a bias this time. The function\u00a0<code>predict(x, parameters)<\/code>\u00a0was used on the same data with a different\u00a0<code>parameters<\/code>\u00a0argument. 
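The end-to-end comparison can be sketched as below. The apartment data here is illustrative (made up for the example) rather than the actual Cracow dataset, so the printed costs will not match the article's numbers; `predict` and `mse` follow the shapes used in this series:

```python
def predict(x, parameters):
    # Linear model: dot(w, x) + b.
    return sum(w_i * x_i for w_i, x_i in zip(parameters["w"], x)) + parameters["b"]

def mse(predictions, targets):
    # Mean Squared Error with the 1/(2m) scaling used in this article.
    m = len(predictions)
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / (2 * m)

# Illustrative (made-up) apartment sizes and prices -- NOT the real dataset:
sizes = [[30.0], [50.0], [70.0], [90.0]]
prices = [300.0, 340.0, 420.0, 460.0]

# Parameter dictionaries created manually, as in the article:
orange_parameters = {"w": [3.0], "b": 200.0}
lime_parameters = {"w": [12.0], "b": -160.0}

orange_pred = [predict(x, orange_parameters) for x in sizes]
lime_pred = [predict(x, lime_parameters) for x in sizes]

print(mse(orange_pred, prices))
print(mse(lime_pred, prices))
```

On this illustrative data the orange parameters also yield the lower cost, mirroring the conclusion drawn below for the real dataset.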
The resulting predictions, named\u00a0<code>orange_pred<\/code>\u00a0and\u00a0<code>lime_pred<\/code>, then became arguments for the\u00a0<code>mse(predictions, targets)<\/code>\u00a0function, which returned the error value for each model separately.<\/p>\n<p id=\"58a0\">The results are as follows:<\/p>\n<ul>\n<li id=\"f8d5\">orange:\u00a0<strong>4909.18<\/strong><\/li>\n<li id=\"b0a4\">lime:\u00a0<strong>10409.77<\/strong><\/li>\n<\/ul>\n<p id=\"6ab3\">which means that the\u00a0<strong>orange parameters create the better model\u00a0<\/strong>as the cost is smaller.<\/p>\n<\/section>\n<section>\n<hr \/>\n<h3 id=\"cd91\"><strong>Summary<\/strong><\/h3>\n<p id=\"8cfb\">In this article, I have explained the idea of the Cost Function \u2014 a tool which allows us to evaluate model parameters. I have introduced the two most often used regression metrics: MAE and MSE.<\/p>\n<p id=\"196d\">In the next article, I am going to show you how to train model parameters with the Gradient Descent algorithm.<\/p>\n<\/section>\n","protected":false},"excerpt":{"rendered":"<p>It is a function that&nbsp;measures the performance of a Machine Learning model&nbsp;for given data. Cost Function quantifies the error between predicted values and expected values and&nbsp;presents it in the form of a single real number. Depending on the problem Cost Function can be formed in many different ways. 
The&nbsp;purpose of&nbsp;Cost Function&nbsp;is to be either: Minimized &#8211; then returned value is usually called cost, loss or error.&nbsp;The&nbsp;goal is to find&nbsp;the&nbsp;values of&nbsp;model&nbsp;parameters for which&nbsp;Cost Function&nbsp;return as small number as possible.<\/p>\n","protected":false},"author":321,"featured_media":3261,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[97],"ppma_author":[2905],"class_list":["post-1394","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-artificial-intelligence"],"authors":[{"term_id":2905,"user_id":321,"is_guest":0,"slug":"kamil-krzyk","display_name":"Kamil Krzyk","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","user_url":"","last_name":"Krzyk","first_name":"Kamil","job_title":"","description":"Kamil Krzyk is Data Scientist at <a href=\"http:\/\/www.azimo.com\/\">Azimo<\/a>,&nbsp; In the past, he was a Full Stack Engineer on the mobile team. 
Passionate about Machine Learning technology, he focuses on building software components which use data and math as its core."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1394","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/321"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=1394"}],"version-history":[{"count":6,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1394\/revisions"}],"predecessor-version":[{"id":29961,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1394\/revisions\/29961"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/3261"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=1394"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=1394"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=1394"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=1394"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}