{"id":22619,"date":"2021-02-12T10:30:24","date_gmt":"2021-02-12T10:30:24","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/mathematical-laws-know-data-scientist\/"},"modified":"2023-09-05T06:35:10","modified_gmt":"2023-09-05T06:35:10","slug":"mathematical-laws-know-data-scientist","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/mathematical-laws-know-data-scientist\/","title":{"rendered":"3 Mathematical Laws To Know As A Data Scientist"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"22619\" class=\"elementor elementor-22619\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-6909228 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"6909228\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-41be781\" data-id=\"41be781\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-a87e9e8 elementor-widget elementor-widget-text-editor\" data-id=\"a87e9e8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p class=\"has-medium-font-size\"><em>Some interesting laws that help you as a Data Scientist<\/em><\/p>\n<p id=\"8ed5\">While data scientists work with data as their main activity, that doesn&#8217;t mean mathematical knowledge is something we do not need. 
Data scientists need to learn and understand the mathematical theory behind machine learning to solve business problems efficiently.<\/p>\n<p id=\"d7bf\">The mathematics behind machine learning is not just random notation thrown here and there; it consists of many theories and ideas. These ideas gave rise to many of the mathematical laws that contributed to the machine learning we are able to use today. Mathematical laws are not limited to machine learning, after all; you can use the mathematics in whatever way helps you solve the problem.<\/p>\n<p id=\"08df\">In this article, I want to outline some of the interesting mathematical laws that could help you as a Data Scientist. Let\u2019s get into it.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9aa3272 elementor-widget elementor-widget-heading\" data-id=\"9aa3272\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Benford\u2019s Law<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4a1bf7c elementor-widget elementor-widget-text-editor\" data-id=\"4a1bf7c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"7716\"><strong>Benford\u2019s law<\/strong>,&nbsp;also called the&nbsp;<strong>Newcomb\u2013Benford law<\/strong>, the&nbsp;<strong>law of anomalous numbers<\/strong>, or the&nbsp;<strong>first-digit law,&nbsp;<\/strong>is a mathematical law about the leading digits of numbers in real-world datasets.<\/p>\n<p id=\"c4ac\">When we pick a number at random, we might expect its first digit to be uniformly distributed. 
Intuitively, a leading digit of 1 should then be as likely as a leading digit of 9, each with a probability of ~11.1%. Surprisingly, this is not what happens.<\/p>\n<p id=\"e6e7\"><strong>Benford\u2019s law states that the leading digit is likely to be small in many naturally occurring collections of numbers<\/strong>. Leading digit 1 happens more often than 2, leading digit 2 occurs more often than 3, and so on.<\/p>\n<p id=\"a532\">Let\u2019s use a real-world dataset to see how this law applies. For this article, I use the Spotify tracks dataset from\u00a0Kaggle, covering songs from 1921\u20132020. From the data, I take the leading digit of each song\u2019s duration.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-465ee5a elementor-widget elementor-widget-image\" data-id=\"465ee5a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img fetchpriority=\"high\" decoding=\"async\" width=\"546\" height=\"377\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/12vsbH4YGKc8L7u-HYWnidQ.png\" class=\"attachment-large size-large wp-image-18700\" alt=\"3 Mathematical Laws To Know As A Data Scientist\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/12vsbH4YGKc8L7u-HYWnidQ.png 546w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/12vsbH4YGKc8L7u-HYWnidQ-300x207.png 300w\" sizes=\"(max-width: 546px) 100vw, 546px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Image created by Author<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-169bcec elementor-widget elementor-widget-text-editor\" data-id=\"169bcec\" data-element_type=\"widget\" 
data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"7ace\">From the image above, we can see that the leading digit 1 occurs the most, and the frequency decreases as the digit gets larger. This is what Benford\u2019s law states.<\/p>\n<p id=\"4b05\">More formally, a set of numbers is said to satisfy Benford\u2019s law if the leading digit d (d \u2208 {1, \u2026, 9}) occurs with probability P(d) = log<sub>10<\/sub>(1 + 1\/d), as in the equation below.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7ff85ad elementor-widget elementor-widget-image\" data-id=\"7ff85ad\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"581\" height=\"45\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1rgsR4tCDiEUqcMXcWMj4fw.png\" class=\"attachment-large size-large wp-image-18701\" alt=\"3 Mathematical Laws To Know As A Data Scientist\" srcset=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1rgsR4tCDiEUqcMXcWMj4fw.png 581w, https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1rgsR4tCDiEUqcMXcWMj4fw-300x23.png 300w\" sizes=\"(max-width: 581px) 100vw, 581px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Image created by Author<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-fab4a03 elementor-widget elementor-widget-text-editor\" data-id=\"fab4a03\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"918f\">From this equation, we obtain the leading digit 
with the following distribution.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-bc6ced4 elementor-widget elementor-widget-image\" data-id=\"bc6ced4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"162\" height=\"258\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/1D1N8wG23Mo8z_eqPHRIRpw.png\" class=\"attachment-large size-large wp-image-18702\" alt=\"3 Mathematical Laws To Know As A Data Scientist\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Image created by Author<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e6668b8 elementor-widget elementor-widget-text-editor\" data-id=\"e6668b8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"cc23\">With this distribution, we can expect 1 to be the leading digit about 30% of the time, more often than any other digit.<\/p>\n<p id=\"a6b0\">This law has many applications, for example, fraud detection on tax forms, election results, economic numbers, and accounting figures.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c6ad777 elementor-widget elementor-widget-heading\" data-id=\"c6ad777\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Law of Large Numbers (LLN)<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element 
elementor-element-0e5ab1b elementor-widget elementor-widget-text-editor\" data-id=\"0e5ab1b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p class=\"has-medium-font-size\" id=\"2118\"><mark>The Law of Large Numbers states that&nbsp;<\/mark><mark><strong>as the number of trials of a random process increases, the average of the results gets closer to the expected, or theoretical, value.<\/strong><\/mark><\/p>\n<p id=\"4267\">For example, consider rolling a die. The possible outcomes of a six-sided die are 1, 2, 3, 4, 5, and 6, so the expected value is 3.5. Each roll gives a random number from 1 to 6, but as we keep rolling, the average of the results gets closer to the expected value of 3.5. This is what the Law of Large Numbers describes.<\/p>\n<p id=\"ef12\">While it is useful, the tricky part is that you need many experiments or occurrences. On the other hand, this reliance on large numbers makes the law well suited to predicting long-term stability.<\/p>\n<p id=\"48ac\">The Law of Large Numbers differs from the so-called Law of Averages, which expresses the belief that outcomes of a random event will \u201ceven out\u201d within a small sample. 
This belief is known as the&nbsp;<strong>\u201cGambler\u2019s Fallacy,\u201d&nbsp;<\/strong>where we expect the expected value to show up in a small sample.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-82a588c elementor-widget elementor-widget-heading\" data-id=\"82a588c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Zipf\u2019s Law<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ed324b0 elementor-widget elementor-widget-text-editor\" data-id=\"ed324b0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"6bae\">Zipf\u2019s law originated in quantitative linguistics. It states that, given a natural language corpus, any word&#8217;s frequency<strong>&nbsp;is inversely proportional to its rank in the frequency table<\/strong>. Thus the most frequent word will occur approximately twice as often as the second most frequent word, three times as often as the third most frequent word, and so on.<\/p>\n<p id=\"3528\">For example, using the previous Spotify dataset, I split the text into words and punctuation and counted them. 
Below are the 12 most common words and their frequencies.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f121e9b elementor-widget elementor-widget-image\" data-id=\"f121e9b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"164\" height=\"284\" src=\"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/14NQFIze8By8GKReOtzAYfQ.png\" class=\"attachment-large size-large wp-image-18703\" alt=\"3 Mathematical Laws To Know As A Data Scientist\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Image Created by Author<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ba26006 elementor-widget elementor-widget-text-editor\" data-id=\"ba26006\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"dc2b\">When I sum all the words in the Spotify corpus, the total is 759389. We can check whether Zipf\u2019s law applies to this dataset by computing the probability of each word. The most frequent token is the punctuation mark \u2018-\u2019 with 32258 occurrences, a probability of ~4%, followed by \u2018the,\u2019 with a probability of ~2%.<\/p>\n<p id=\"7550\">Faithful to the law, the probability keeps decreasing for the words that follow. 
Of course, there is a little deviation, but most of the time the probability goes down as the frequency rank increases.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6d370ee elementor-widget elementor-widget-heading\" data-id=\"6d370ee\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Conclusion<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-77afaab elementor-widget elementor-widget-text-editor\" data-id=\"77afaab\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"9a6a\">These are some interesting mathematical laws to know as a Data Scientist, and they can definitely help you in your Data Science work. The laws are:<\/p>\n<ul><li>Benford\u2019s Law<\/li><li>Law of Large Numbers<\/li><li>Zipf\u2019s Law<\/li><\/ul>\n<p id=\"c496\">I hope it helps!<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>Some interesting laws that help you as a Data Scientist While data scientists work with data as their main activity, that doesn&#8217;t mean mathematical knowledge is something we do not need. Data scientists need to learn and understand the mathematical theory behind machine learning to solve business problems efficiently. 
The mathematical behind machine<\/p>\n","protected":false},"author":1052,"featured_media":18704,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[394,92,1331],"ppma_author":[3873],"class_list":["post-22619","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-data-scientist","tag-machine-learning","tag-mathematical-laws"],"authors":[{"term_id":3873,"user_id":1052,"is_guest":0,"slug":"cornellius-yudha-wijaya","display_name":"Cornellius Yudha Wijaya","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2021\/05\/Cornellius-Yudha-Wijaya-150x150.jpeg","user_url":"https:\/\/careers.allianz.com","last_name":"Yudha Wijaya","first_name":"Cornellius","job_title":"","description":"Cornellius Yudha Wijaya is Data Scientist at Allianz Life, a global leader in insurance and financial services."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22619","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/1052"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=22619"}],"version-history":[{"count":5,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22619\/revisions"}],"predecessor-version":[{"id":32229,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/22619\/revisions\/32229"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/18704"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=22619"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/
\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=22619"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=22619"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=22619"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}