{"id":22451,"date":"2020-11-20T07:51:30","date_gmt":"2020-11-20T07:51:30","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/debugging-error-resolving-data-scientists-essential-guide\/"},"modified":"2021-05-21T03:33:28","modified_gmt":"2021-05-21T03:33:28","slug":"debugging-error-resolving-data-scientists-essential-guide","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/debugging-error-resolving-data-scientists-essential-guide\/","title":{"rendered":"The Essential Guide To Debugging And Error Resolving For Data Scientists"},"content":{"rendered":"\n<p>Debugging is a funny-sounding word. The word is popularly traced to an actual bug getting into one of the first computers and impeding its function. Since then it has taken on a new meaning. Now, it means finding the source of a problem in your code and resolving it.<\/p>\n\n\n\n<p>When you&#8217;re first starting out with coding, debugging your code or resolving errors can be one of the hardest things to do. After all, many courses that teach how to code do not provide you with the tools you need to find the source of a problem and fix it. So <a href=\"https:\/\/www.experfy.com\/blog\/bigdata-cloud\/advice-for-aspiring-data-scientists\/\" target=\"_blank\" rel=\"noreferrer noopener\">aspiring data scientists<\/a> feel lost and confused when they encounter an issue in their code.<\/p>\n\n\n\n<p>It&#8217;s actually not all that complicated. You only need to follow some simple procedures to determine the source of your problem. Let&#8217;s see what they are.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Read the error<\/h2>\n\n\n\n<p>It sounds obvious, I know, but it\u2019s very common for someone who recently started coding to think of the error message as gibberish. 
As a result, they don&#8217;t use the error message to its full potential and end up aimlessly poking around in the code while trying to figure out the problem. That&#8217;s why the first thing you should do is read, actually read, the error message. Very likely, it will tell you exactly what went wrong.<\/p>\n\n\n\n<p>Let\u2019s look at some examples:<\/p>\n\n\n\n<div id=\"w-node-e87154e3136b-4e978b8b\" class=\"wp-block-image\"><figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/5f9f0e37e13c7e3f94d8e565_Screenshot-2020-10-31-at-19.53.14.png\" alt=\"The Essential Guide To Debugging And Error Resolving For Data Scientists\"\/><\/figure><\/div>\n\n\n\n<p>In this case, the error tells us that it couldn\u2019t find the file because it does not exist. I know your first reaction would be to say, &#8220;no, the file is there, I\u2019m looking at it&#8221;. Instead of getting flustered though, it\u2019s important to understand what the message means. And no worries, in time you will get better at recognising the meaning behind certain errors, as you will see them a lot.<\/p>\n\n\n\n<p>Your code is not calling you a liar; it\u2019s merely saying that it could not find the file named iris-dataset.csv in the place you said it would find it. 
This could mean that:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>you have a typo in the name of the file or<\/li><li>your file is not in the same folder as your notebook.<\/li><\/ul>\n\n\n\n<p>So you might need to:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>make sure the name is written correctly (in this case it was iris_dataset.csv &#8211; with an underscore and not a dash) or<\/li><li>add the name of the folder where this piece of data is.<\/li><\/ul>\n\n\n\n<p>Another example:<\/p>\n\n\n\n<div id=\"w-node-404b2ae45c87-4e978b8b\" class=\"wp-block-image\"><figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/5f9f0f8ba964720cf7fc8233_Screenshot-2020-10-31-at-19.54.48.png\" alt=\"The Essential Guide To Debugging And Error Resolving For Data Scientists\"\/><\/figure><\/div>\n\n\n\n<p>Looks scary but all it is telling you is that there is no such column named \u201csepal_widt\u201d in your data. This could mean that:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>you forgot to include all the columns you wanted in your data frame or<\/li><li>you have a typo.&nbsp;<\/li><\/ul>\n\n\n\n<p>I\u2019ll tell you how to understand which one is the case later in this article. I chose the two examples above because they are fairly common in data science work. But there is yet another trick to reading the error message: following the trace of the error. 
For instance, in the example below, we get a TypeError.&nbsp;<\/p>\n\n\n\n<div id=\"w-node-7e0e6f48f08b-4e978b8b\" class=\"wp-block-image\"><figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/5f9f103c81c62ea6b7881836_Screenshot-2020-10-31-at-20.00.03.png\" alt=\"The Essential Guide To Debugging And Error Resolving For Data Scientists\"\/><\/figure><\/div>\n\n\n\n<p>TypeError sounds very abstract, but thanks to the Traceback we can trace the error and understand its cause. The upper arrow points to the line in our code that caused this error. But it doesn\u2019t end there, because the error it points to happens inside another function, one that this piece of code calls. So the lower arrow shows you where things went wrong inside that function. This way, you know where to go to fix your issue.<\/p>\n\n\n\n<p>A small note here: this error might be caused not by the exact line of code the Traceback arrow points to but by a value you are setting in a different part of this code. We\u2019ll see how to follow this back too in this article.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Google the error no matter how unusual it seems<\/h2>\n\n\n\n<p>When you get an error for the first time, it might feel like you\u2019re the first person on the face of the earth to get that error. But in the vast majority of cases, unless you are working with a framework or a library that very few people use, Google will have an answer for you.<\/p>\n\n\n\n<p>All you have to do is copy and paste the exact error you are seeing. In the previous image, for example, do not copy and paste the whole message but only the part where it says \u201cTypeError: \u2018list\u2019 object cannot be interpreted as an integer.\u201d<\/p>\n\n\n\n<p>Here is what I found Googling this error. 
And yes, that\u2019s exactly the problem with this code.<\/p>\n\n\n\n<div id=\"w-node-ea8cb6e50d90-4e978b8b\" class=\"wp-block-image\"><figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/5f9f1192bd7f50bf3d4bca0f_Screenshot-2020-10-31-at-20.22.48.png\" alt=\"The Essential Guide To Debugging And Error Resolving For Data Scientists\"\/><\/figure><\/div>\n\n\n\n<p>Googling errors is good for understanding error messages and sometimes also for getting a solution. But apart from resolving errors, you will learn a ton just by reading other people\u2019s takes on these errors. The more you code, get errors and fix them, the more you will understand what they mean by just reading the error messages, and eventually the less you will need to consult Google.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">On Jupyter notebooks, work backwards<\/h2>\n\n\n\n<p>Working in a notebook environment means that it is easy to run parts of the code and see results. We don\u2019t divide our code into different cells of the notebook only because it\u2019s fun. It is a way to observe the flow of the data and check that everything runs smoothly and as expected.<\/p>\n\n\n\n<p>So when you run into an error (or a piece of code doesn\u2019t output what you expected it to), your next step should be tracing back the problem.<\/p>\n\n\n\n<p>What I do is print the variables\/dataframes I used in a certain cell and compare them to what I expected to see. This includes checking the values with df.head() or checking the types of columns with df.dtypes.<\/p>\n\n\n\n<p>Let\u2019s see an example. Let\u2019s say I have a data frame with the sepal_length and sepal_width of flowers, and I created a new column taking the average of these two values. But when I plot them, I see that the value of the average column is unexpectedly not between the sepal_length and sepal_width values. 
Obviously, something is wrong.<\/p>\n\n\n\n<div id=\"w-node-8a9ec4e20755-4e978b8b\" class=\"wp-block-image\"><figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/5f9f12d5e5a1816d66941124_Screenshot-2020-10-31-at-20.33.50.png\" alt=\"Blue bar: sepal_length, orange bar: sepal_width and green bar: sepal values\u2019 average.\"\/><figcaption>Blue bar: sepal_length, orange bar: sepal_width and green bar: sepal values\u2019 average.<\/figcaption><\/figure><\/div>\n\n\n\n<p>What I do then is to print out the data frame I use in this plot to see what the data looks like. Looking at the data I see that the sepal_average value does not reflect the average of sepal length and sepal width. Most likely something is wrong with the calculation of this new column.<\/p>\n\n\n\n<div id=\"w-node-c971f6f8f416-4e978b8b\" class=\"wp-block-image\"><figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/5f9f1319428c5845475d866d_Screenshot-2020-10-31-at-20.33.57.png\" alt=\"calculation of this new column\"\/><\/figure><\/div>\n\n\n\n<p>And yes, checking the calculation of the new average column, I see that I divided the sum by 3 and not 2.<\/p>\n\n\n\n<div id=\"w-node-1daaf790b1ef-4e978b8b\" class=\"wp-block-image\"><figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/5f9f1339be2154fde3e73665_Screenshot-2020-10-31-at-20.34.05.png\" alt=\"checking the calculation of the new average column\"\/><\/figure><\/div>\n\n\n\n<p>Of course, problems are not always this obvious. The same problem might have been caused by me accidentally changing the value of sepal_average column after I set its value. 
But working backwards from the issue and inspecting where I altered the values of the problematic column will always help me find the source of the issue.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Write prints for more complex code<\/h2>\n\n\n\n<p>Sometimes your code might have functions, loops and other complex structures, making it hard for you to manually trace errors back. In that case, one workaround is to use the print command to your advantage. Let&#8217;s look at an example:<\/p>\n\n\n\n<div id=\"w-node-62ed46c41487-4e978b8b\" class=\"wp-block-image\"><figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/5f9f144d9cb4090c3d0f1d4e_Screenshot-2020-10-31-at-19.59.23.png\" alt=\"The Essential Guide To Debugging And Error Resolving For Data Scientists\"\/><\/figure><\/div>\n\n\n\n<p>Let\u2019s say when you run this code, the accumulated_list value does not look like what you expected, or something in this code is giving you an error and you don\u2019t understand it. In that case, you can put print commands in places where some sort of value transformation is happening to see what is going wrong.<\/p>\n\n\n\n<div id=\"w-node-378d013fe20d-4e978b8b\" class=\"wp-block-image\"><figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/5f9f14a243de7e2e3f32b849_Screenshot-2020-10-31-at-20.01.03.png\" alt=\"accumulated_list value\"\/><\/figure><\/div>\n\n\n\n<p>With the new additions, you will print the value your code is working with. You will be able to observe what happens to that value over time and what the intermediate list (new_list) looks like. It will help you pin down the exact line where things start looking unexpected, or what the values of your variables are right before an error is triggered. 
This gives you more insight into how to resolve the issue.<\/p>\n\n\n\n<p>And that&#8217;s it. Just by using these four simple ways of inspecting your errors, you can become much more efficient in debugging your code. And&nbsp;<a href=\"https:\/\/www.soyouwanttobeadatascientist.com\/post\/what-every-aspiring-data-scientist-needs-to-know-about-coding\" target=\"_blank\" rel=\"noreferrer noopener\">as I said in another article<\/a>, no worries if it takes you some time to resolve issues or it takes a while before you get used to dealing with errors. By working hands-on, you\u2019re training your brain to see patterns and resolve issues more and more quickly every time. Personally, I found that debugging and hunting for answers online are responsible for a lot of my current knowledge. So don&#8217;t let errors and issues discourage you, and keep at it!<\/p>\n\n\n\n<p>Originally published in <a href=\"https:\/\/disqus.com\/home\/forums\/soyouwanttobeadatascientist-com\/\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>soyouwanttobeadatascientist<\/strong><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Debugging means finding the source of a problem in your code and resolving it. Debugging your code or resolving errors can be one of the hardest things to do. 
Follow some simple procedures to determine the source of your problem.<\/p>\n","protected":false},"author":945,"featured_media":16869,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[343,394,1013,1031],"ppma_author":[3790],"class_list":["post-22451","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-code","tag-data-scientist","tag-debugging","tag-errors"],"authors":[{"term_id":3790,"user_id":945,"is_guest":0,"slug":"turp","display_name":"M\u0131sra Turp","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/bd5cd791453bddbbb39b1123508b76cd531012ef2c493f3907e89cabe484fcc5?s=96&d=mm&r=g","user_url":"https:\/\/www.soyouwanttobeadatascientist.com\/%20","last_name":"Turp","first_name":"M\u0131sra","job_title":"","description":"M\u0131sra Turp is Data Scientist at <a href=\"https:\/\/mytomorrows.com\/en\/\"> myTomorrows <\/a>, an independent, globally operating organization in Hospital &amp; Health 
Care."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22451","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/945"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=22451"}],"version-history":[{"count":1,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22451\/revisions"}],"predecessor-version":[{"id":23190,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22451\/revisions\/23190"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/16869"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=22451"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=22451"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=22451"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=22451"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}