{"id":2106,"date":"2019-12-02T02:51:16","date_gmt":"2019-12-01T23:51:16","guid":{"rendered":"http:\/\/kusuaks7\/?p=1711"},"modified":"2024-02-19T07:31:27","modified_gmt":"2024-02-19T07:31:27","slug":"p-value-explained-simply-for-data-scientists","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/bigdata-cloud\/p-value-explained-simply-for-data-scientists\/","title":{"rendered":"P-value Explained Simply for Data Scientists"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"2106\" class=\"elementor elementor-2106\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-5e1d3836 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"5e1d3836\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-6049ad2c\" data-id=\"6049ad2c\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-74c62ebf elementor-widget elementor-widget-text-editor\" data-id=\"74c62ebf\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tP-Values are always a headache to explain even to someone who knows about them let alone someone who doesn\u2019t understand statistics.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-376234c elementor-widget elementor-widget-text-editor\" data-id=\"376234c\" data-element_type=\"widget\" data-e-type=\"widget\" 
data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tI went to Wikipedia to look for a definition, and here is what I found: <blockquote>In statistical hypothesis testing, the p-value or probability value is, for a given statistical model, the probability that, when the null hypothesis is true, the statistical summary (such as the sample mean difference between two groups) would be equal to, or more extreme than, the actual observed results.<\/blockquote>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d106edf elementor-widget elementor-widget-text-editor\" data-id=\"d106edf\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tMy first thought was that they might have written it like this so that nobody could understand it. The problem lies in the amount of terminology and jargon that statisticians enjoy employing.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-09d157d elementor-widget elementor-widget-text-editor\" data-id=\"09d157d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<strong><em>This post is about explaining p-values in an easy-to-understand way, without all the pretentiousness of statisticians<\/em><\/strong>.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9b4a428 elementor-widget elementor-widget-heading\" data-id=\"9b4a428\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2 id=\"a-real-life-problem\">A Real-Life 
problem<\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5517f18 elementor-widget elementor-widget-text-editor\" data-id=\"5517f18\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tIn our lives, we certainly believe one thing over another.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-03be5f9 elementor-widget elementor-widget-text-editor\" data-id=\"03be5f9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tFrom the\u00a0<strong><em>obvious ones<\/em><\/strong>: the earth is round, the earth revolves around the Sun, the Sun rises in the east.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1b3cf89 elementor-widget elementor-widget-text-editor\" data-id=\"1b3cf89\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tTo the more\u00a0<strong><em>non-obvious ones<\/em><\/strong>, with varying levels of uncertainty: Exercising reduces weight? Or that Trump is going to win\/lose in his next election? Or that a particular drug works? 
Or that sleeping for 8 hours is good for your health?\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e5ac13f elementor-widget elementor-widget-text-editor\" data-id=\"e5ac13f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tWhile the former category is facts, the latter category differs from person to person.\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-70b7171 elementor-widget elementor-widget-text-editor\" data-id=\"70b7171\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tSo, what if I come to you and say that exercising does not affect weight?\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f49b0ac elementor-widget elementor-widget-text-editor\" data-id=\"f49b0ac\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<strong><em>All the gym-goers may call me not so kind words. 
But is there a mathematical and logical framework in which someone can prove me wrong?<\/em><\/strong>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-86a9040 elementor-widget elementor-widget-text-editor\" data-id=\"86a9040\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\nThis brings us to the notion of Hypothesis testing.\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b8a5779 elementor-widget elementor-widget-heading\" data-id=\"b8a5779\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2 id=\"hypothesis-testing\">Hypothesis Testing<\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d14977d elementor-widget elementor-widget-image\" data-id=\"d14977d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/mlwhiz.com\/images\/pval\/0.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6c3dd29 elementor-widget elementor-widget-text-editor\" data-id=\"6c3dd29\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tTake the statement I made in the example above: exercising does not affect weight. This statement is my hypothesis. Let\u2019s call it the\u00a0<strong><em>Null hypothesis<\/em><\/strong>. 
For now, it is the status quo, in that we consider it to be true.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a63f518 elementor-widget elementor-widget-text-editor\" data-id=\"a63f518\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThe\u00a0<strong><em>Alternative Hypothesis<\/em><\/strong>\u00a0from people who swear by exercising is that exercising does reduce weight.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-bfd4dee elementor-widget elementor-widget-text-editor\" data-id=\"bfd4dee\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tBut how do we test these hypotheses? We collect data. We collect weight loss data for a sample of 10 people who have exercised regularly for over 3 months.\n<pre><code>WeightLoss Sample Mean = 2 kg\nSample Standard Deviation = 1 kg\n<\/code><\/pre>\nDoes this prove that exercise reduces weight? At a cursory look, it seems that exercising does have its benefits, as the people who exercised lost 2 kg on average.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b1e5a75 elementor-widget elementor-widget-text-editor\" data-id=\"b1e5a75\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tBut you will find that findings are not always this clear-cut when you do hypothesis testing. What if the mean weight loss for people who exercise was just 0.2 kg? 
Would you still be so sure that exercise does reduce weight?\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c873358 elementor-widget elementor-widget-text-editor\" data-id=\"c873358\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<strong><em>So how can we quantify this and put some maths behind it all?<\/em><\/strong>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7ac7ae0 elementor-widget elementor-widget-text-editor\" data-id=\"7ac7ae0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tLet us set up our experiment to do this.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7b274f9 elementor-widget elementor-widget-heading\" data-id=\"7b274f9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2 id=\"experiment\">Experiment<\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-9d921b3 elementor-widget elementor-widget-text-editor\" data-id=\"9d921b3\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tLet\u2019s go back to our Hypotheses again:\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f795a62 elementor-widget elementor-widget-text-editor\" data-id=\"f795a62\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div 
class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<strong><em>H\u00ba:<\/em><\/strong>\u00a0Exercising does not affect weight. Or equivalently \ud835\udf07 = 0\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-51064e7 elementor-widget elementor-widget-text-editor\" data-id=\"51064e7\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<strong><em>H\u1d2c:<\/em><\/strong>\u00a0Exercise does reduce weight. Or equivalently \ud835\udf07&gt;0\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2db6b71 elementor-widget elementor-widget-text-editor\" data-id=\"2db6b71\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tWe see our data sample of 10 people, and we try to find out the value of\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-da0ad74 elementor-widget elementor-widget-text-editor\" data-id=\"da0ad74\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tObserved Mean(Weightloss of People who exercise) = 2 kg\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1c4af1d elementor-widget elementor-widget-text-editor\" data-id=\"1c4af1d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tObserved Sample Standard Deviation = 1 kg\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-bdef0ef elementor-widget elementor-widget-text-editor\" data-id=\"bdef0ef\" 
data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tNow a good question to ask ourselves is:\u00a0<strong><em>Assuming that the null hypothesis is true, what is the probability of observing a sample mean of 2 kg or something more extreme?<\/em><\/strong>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-94e7d86 elementor-widget elementor-widget-text-editor\" data-id=\"94e7d86\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tAssuming we can calculate this probability: if it is very small (less than a threshold value), we reject our null hypothesis. Otherwise, we fail to reject our null hypothesis.\u00a0<strong><em>Why fail to reject and not accept?<\/em><\/strong>\u00a0I will answer this later.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5f058ae elementor-widget elementor-widget-text-editor\" data-id=\"5f058ae\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThis probability value is actually the p-value. 
Simply put, it is the probability of observing what we observed, or more extreme results, if we assume our null hypothesis to be true.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6e15a14 elementor-widget elementor-widget-text-editor\" data-id=\"6e15a14\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<strong><em>Statisticians call this threshold the significance level (\ud835\udf36), and in most cases \ud835\udf36 is taken to be 0.05.<\/em><\/strong>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4f2a54f elementor-widget elementor-widget-text-editor\" data-id=\"4f2a54f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<strong><em>So how do we answer:<\/em><\/strong>\u00a0Assuming that the null hypothesis is true, what is the probability of getting a sample mean of 2 kg or more?\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-768f10a elementor-widget elementor-widget-text-editor\" data-id=\"768f10a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tAnd here our favourite distribution, the Normal distribution, comes into the picture.\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d0640fd elementor-widget elementor-widget-heading\" data-id=\"d0640fd\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2 id=\"the-normal-distribution\">The Normal 
Distribution<\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-835b604 elementor-widget elementor-widget-text-editor\" data-id=\"835b604\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tWe create a sampling distribution of the mean of the WeightLoss samples, assuming our null hypothesis is true.\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2a7758f elementor-widget elementor-widget-text-editor\" data-id=\"2a7758f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<strong><em>Central Limit Theorem:<\/em><\/strong>\u00a0The\u00a0<strong>central limit theorem<\/strong>\u00a0simply states that if you have a population with mean \u03bc and standard deviation \u03c3, and take sufficiently large random samples of size n from the population, then the\u00a0<strong>distribution<\/strong>\u00a0of the\u00a0<strong>sample<\/strong>\u00a0means will be approximately normally\u00a0<strong>distributed with mean equal to the population mean<\/strong>\u00a0and\u00a0<strong>standard deviation \u03c3\/\u221an<\/strong>. Here \u03c3 is the standard deviation of the population, which in practice we estimate with the sample standard deviation, and n is the number of observations in the sample.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ce413cc elementor-widget elementor-widget-text-editor\" data-id=\"ce413cc\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tNow we already know the mean of our population, as given by our null hypothesis. So, we use that and have a normal distribution whose mean is 0. 
And whose standard deviation is given by 1\/\u221a10\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a7cb7a8 elementor-widget elementor-widget-image\" data-id=\"a7cb7a8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/mlwhiz.com\/images\/pval\/1.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7fc2fd1 elementor-widget elementor-widget-text-editor\" data-id=\"7fc2fd1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThis is, in fact, the distribution of the mean of the samples from the population. We observed a particular value of the mean, namely Xobserved = 2 kg.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-baeba1f elementor-widget elementor-widget-text-editor\" data-id=\"baeba1f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\nNow we can use some statistical software to find the area under this curve to the right of 2:\n<pre><code data-lang=\"py\">from scipy.stats import norm\nimport numpy as np\n\n# Area under the null curve (mean 0, std 1\/sqrt(10)) to the right of the observed mean 2\np = 1 - norm.cdf(2, loc=0, scale=1\/np.sqrt(10))\nprint(p)<\/code><\/pre>\n<pre><code>1.269814253745949e-10\n<\/code><\/pre>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ec26005 elementor-widget elementor-widget-text-editor\" data-id=\"ec26005\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tAs such, this p-value is far below the 
significance level of 0.05: if the null hypothesis were true, a sample mean of 2 kg or more would almost never occur.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a7de9fb elementor-widget elementor-widget-text-editor\" data-id=\"a7de9fb\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tAnd so we can reject our null hypothesis. We can call our results statistically significant, meaning they would be very unlikely to occur by chance alone if the null hypothesis were true.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-17a894c elementor-widget elementor-widget-heading\" data-id=\"17a894c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2 id=\"the-z-statistic\">The Z statistic<\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e804830 elementor-widget elementor-widget-text-editor\" data-id=\"e804830\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tYou might also have heard about the Z statistic when reading about hypothesis testing. 
Again, as I said, terminology.\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ddde4dc elementor-widget elementor-widget-text-editor\" data-id=\"ddde4dc\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tIt is basically an extension of the same idea: we use a standard normal with mean 0 and variance 1 as our sampling distribution, after transforming our observed value x using:\n<p style=\"text-align: center;\"><img decoding=\"async\" src=\"https:\/\/mlwhiz.com\/images\/pval\/2.png\" alt=\"\" \/><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8129716 elementor-widget elementor-widget-text-editor\" data-id=\"8129716\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThis makes it easier to use statistical tables. In our running example, our z statistic is:\n<pre><code data-lang=\"py\"># z = (observed mean - null mean) \/ standard error\nz = (2 - 0) \/ (1 \/ np.sqrt(10))\nprint(z)<\/code><\/pre>\n<pre><code>6.324555320336758\n<\/code><\/pre>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-caae8ec elementor-widget elementor-widget-text-editor\" data-id=\"caae8ec\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tA Z statistic greater than 6 tells you that the observed value is more than six standard errors away from the mean under the null hypothesis, so the p-value should be very small. 
We can still find the p-value using:\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-25ac9fd elementor-widget elementor-widget-text-editor\" data-id=\"25ac9fd\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<pre><code data-lang=\"py\">from scipy.stats import norm\nimport numpy as np\n\np = 1-norm.cdf(z, loc=0, scale=1)\nprint(p)<\/code><\/pre>\n<pre><code>1.269814253745949e-10\n<\/code><\/pre>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-868d4ed elementor-widget elementor-widget-text-editor\" data-id=\"868d4ed\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tAs you can see,\u00a0<strong><em>we get the same answer using the Z statistic.<\/em><\/strong>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-01761d4 elementor-widget elementor-widget-heading\" data-id=\"01761d4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2 id=\"an-important-distinction\">An Important Distinction<\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4145882 elementor-widget elementor-widget-image\" data-id=\"4145882\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/mlwhiz.com\/images\/pval\/3.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f411fce 
elementor-widget elementor-widget-text-editor\" data-id=\"f411fce\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\nWe said before that we reject our null hypothesis, meaning that we have sufficient evidence against it.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-873838b elementor-widget elementor-widget-text-editor\" data-id=\"873838b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tBut what if the p-value was higher than the significance level? Then we say that we fail to reject the null hypothesis. Why don\u2019t we say that we accept the null hypothesis?\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ecb182f elementor-widget elementor-widget-text-editor\" data-id=\"ecb182f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tThe best intuitive example of this is the trial court. In a trial court, the null hypothesis is that the accused is not guilty. 
Then we examine the evidence to try to disprove the null hypothesis.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-47863c5 elementor-widget elementor-widget-text-editor\" data-id=\"47863c5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tIf we are not able to disprove the null hypothesis, the judge doesn\u2019t say that the accused hasn\u2019t committed the crime.\u00a0<strong><em>The judge only says that based on the given evidence, we are not able to convict the accused.<\/em><\/strong>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7a4755a elementor-widget elementor-widget-text-editor\" data-id=\"7a4755a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tAnother example to drive this point home: suppose we are exploring life on an alien planet, and our null hypothesis (<strong><em>H\u00ba<\/em><\/strong>) is that there is no life on the planet. We roam around a few miles for some time and look for people\/aliens on that planet. If we see any alien, we can reject the null hypothesis in favour of the alternative.\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-cb6c51e elementor-widget elementor-widget-text-editor\" data-id=\"cb6c51e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tBut if we don\u2019t see any alien, can we definitively say that there is no alien life on the planet, or accept our null hypothesis? Maybe we needed to explore more, or perhaps we needed more time, and we might then have found an alien. 
So, in this case, we cannot accept the null hypothesis; we can only fail to reject it. Or, in the words of\u00a0<a href=\"https:\/\/medium.com\/hackernoon\/statistical-inference-in-one-sentence-33a4683a6424\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" class=\"broken_link\">Cassie Kozyrkov<\/a>, from whom the example comes, we can say that\u00a0<strong><em>\u201cwe learned nothing interesting\u201d.<\/em><\/strong>\n<blockquote>In STAT101 class, they teach you to write a convoluted paragraph when that happens. (\u201cWe fail to reject the null hypothesis and conclude that there is insufficient statistical evidence to support the existence of alien life on this planet.\u201d) I\u2019m convinced that the only purpose of this expression is to strain students\u2019 wrists. I\u2019ve always allowed my undergraduate students to write it like it is: we learned nothing interesting.<\/blockquote>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-f110b48 elementor-widget elementor-widget-image\" data-id=\"f110b48\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/mlwhiz.com\/images\/pval\/4.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ce32031 elementor-widget elementor-widget-text-editor\" data-id=\"ce32031\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<strong><em>In essence, hypothesis testing is just about checking if our observed values make the null hypothesis look ridiculous<\/em><\/strong>. If yes, we reject the null hypothesis and call our results statistically significant. 
And otherwise we have learned nothing interesting, and we continue with our status quo.\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>P-Values are always a headache to explain even to someone who knows about them let alone someone who doesn&rsquo;t understand statistics. In statistical hypothesis testing, the p-value or probability value is, for a given statistical model, the probability that, when the null hypothesis is true, the statistical summary such as the sample mean difference between two groups would be equal to, or more extreme than, the actual observed results. This post is about explaining p-values in an easy to understand way without all that pretentiousness of statisticians.<\/p>\n","protected":false},"author":653,"featured_media":2926,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[187],"tags":[94],"ppma_author":[3409],"class_list":["post-2106","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bigdata-cloud","tag-data-science"],"authors":[{"term_id":3409,"user_id":653,"is_guest":0,"slug":"rahul-agarwal","display_name":"Rahul Agarwal","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/04\/medium_cc5785b8-8195-44e6-a0de-2e33be05d7cb-150x150.png","user_url":"http:\/\/bit.ly\/384SBYb","last_name":"Agarwal","first_name":"Rahul","job_title":"","description":"Rahul Agarwal is a Data Scientist at Walmart 
Labs."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/2106","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/653"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=2106"}],"version-history":[{"count":4,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/2106\/revisions"}],"predecessor-version":[{"id":36029,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/2106\/revisions\/36029"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/2926"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=2106"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=2106"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=2106"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=2106"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}