{"id":10095,"date":"2020-10-06T10:14:17","date_gmt":"2020-10-06T10:14:17","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/?p=10095"},"modified":"2023-10-24T18:12:30","modified_gmt":"2023-10-24T18:12:30","slug":"misleading-with-data-statistics","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/bigdata-cloud\/misleading-with-data-statistics\/","title":{"rendered":"Misleading With Data &#038; Statistics"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"10095\" class=\"elementor elementor-10095\" data-elementor-post-type=\"post\">\n\t\t\t\t\t\t<section class=\"has_eae_slider elementor-section elementor-top-section elementor-element elementor-element-5caf4340 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"5caf4340\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"has_eae_slider elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-1953e550\" data-id=\"1953e550\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-5d92807f elementor-widget elementor-widget-text-editor\" data-id=\"5d92807f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<em>Statistics play a vital role in our lives. We use them every day \u2014 consciously or unconsciously. Nowadays data is everywhere, and making the right decisions becomes increasingly difficult because of information overload. Applied correctly, statistics allow us to better process and understand the world around us. We should be able to make better decisions based on more complete information. 
But what if statistics are misleading?<\/em>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4a4af3d elementor-widget elementor-widget-image\" data-id=\"4a4af3d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/5414\/1*uiT-maK2TzZFKVjzGqIwYQ.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-842de36 elementor-widget elementor-widget-text-editor\" data-id=\"842de36\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<em>What if somebody wants to trick you into believing something you shouldn\u2019t? Which levers can be pulled to adjust data and its statistical results? This article walks you through the pitfalls that arise when data &amp; statistics are used. 
The objective is to understand the most common ways data is manipulated and to avoid making bad decisions based on misleading information.<\/em>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d6c4a02 elementor-widget elementor-widget-heading\" data-id=\"d6c4a02\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">The Problem With Data &amp; Statistics.<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d6d3bb0 elementor-widget elementor-widget-text-editor\" data-id=\"d6d3bb0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"0c79\">Data carries a lot of power and authority in an argument. And this is the challenge: too often a made-up number props up a bullshit story. Because of the authority that number carries, we are much less likely to question whether it is really true. A real\u00a0<a href=\"https:\/\/www.ftc.gov\/enforcement\/cases-proceedings\/102-3070\/reebok-international-ltd\" target=\"_blank\" rel=\"noreferrer noopener\">example from an old Reebok campaign<\/a>\u00a0demonstrates this effect quite well:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote\">\n<p>\u201cIt\u2019s the shoe proven to work your hamstrings and calves up to 11% harder and tone your butt up to 28% more than regular sneakers \u2026 just by walking!\u201d<\/p>\n<\/blockquote>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-875b570 elementor-widget elementor-widget-text-editor\" data-id=\"875b570\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container is-layout-flow wp-block-quote-is-layout-flow\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"b0f5\">This sounds so authoritative, as though it were based on data Reebok had collected in rigorous studies. Compare it to the following version, formulated without concrete numbers:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>\u201cThis shoe will make your hamstrings and calves harder and tone your butt just by walking.\u201d<\/p>\n<\/blockquote>\n\n\n<p id=\"7142\">The truth is, the numbers were not backed by sound evidence, and the brand had to pay a significant penalty for the first statement. 
But how can you keep from being trapped yourself?<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b4a90bc elementor-widget elementor-widget-heading\" data-id=\"b4a90bc\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Understanding The Root Cause<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-87c8a36 elementor-widget elementor-widget-image\" data-id=\"87c8a36\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1840\/1*0XODT6qgOMf8e-qiaicacg.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-bf47410 elementor-widget elementor-widget-text-editor\" data-id=\"bf47410\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"6f58\">Whether in business, science or your private life \u2014 you can only avoid being misled by \u201cwrong\u201d data if you understand its root cause. Pitfalls can emerge at every step of working with data. 
This article therefore sheds light on the most common pitfalls of working with data.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ac29203 elementor-widget elementor-widget-heading\" data-id=\"ac29203\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Pitfalls When Generating Data<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5446048 elementor-widget elementor-widget-text-editor\" data-id=\"5446048\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tOne of the most challenging tasks for any data scientist is getting the right data to work with. By the right data we mean a sample that reflects the population, or the \u201creal world\u201d, as closely as possible. Several manipulations and pitfalls, explored below, can occur during data acquisition. 
Let\u2019s have a look at three common examples:\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-dc19f44 elementor-widget elementor-widget-image\" data-id=\"dc19f44\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/849\/1*M7kxf1vxI48hSqjNQVAZnA.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0c17dd0 elementor-widget elementor-widget-heading\" data-id=\"0c17dd0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Bad \/ Biased Sampling<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b0b51aa elementor-widget elementor-widget-text-editor\" data-id=\"b0b51aa\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"86db\">Working with data from a non-random sample has happened many times in history. It is where bad predictions and misleading with data begin. What happens is that a significant mistake in sample selection excludes a factor the sample needs in order to be truly representative.<\/p>\n\n\n\n<p id=\"5b49\">A classic example is bad sampling in polls on presidential elections. In 1948 the Chicago Tribune printed \u201cDewey Defeats Truman\u201d, which was based on a phone survey conducted prior to the actual election. At that time phones were not yet standard; they were predominantly available to the upper class. By the time it became clear that the sample was not random, several thousand copies had already been printed. 
The picture below shows Truman, as president, laughing with a copy of the fake news in his hands.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ef642df elementor-widget elementor-widget-image\" data-id=\"ef642df\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1377\/1*KM6zWDCCntkOnnPRqUL6Gw.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d04c040 elementor-widget elementor-widget-heading\" data-id=\"d04c040\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Loaded Questions<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-80f2834 elementor-widget elementor-widget-text-editor\" data-id=\"80f2834\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"8588\">Whenever data is generated, it has to come from a neutral source. Loaded questions are exactly the opposite. They already contain an unjustified or controversial assumption, with the objective of manipulating the output. A good example is product managers asking their users for feedback:\u00a0<em>\u201cWhat do you love about the product we have built for you?\u201d\u00a0<\/em>The question already implies that the user loves the product. 
It is great if you want to collect some feedback and flatter yourself \u2014 but it has nothing to do with generating useful data or statistics.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c6e6912 elementor-widget elementor-widget-heading\" data-id=\"c6e6912\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Faulty Polling<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-fdcfd57 elementor-widget elementor-widget-text-editor\" data-id=\"fdcfd57\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p id=\"f328\">Closely related to loaded questions is faulty polling. It is often used to influence the answers of the respective sample.<\/p>\n<p><\/p>\n<p><\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>\u201cDo you think we should help unstable countries to keep freedom and have a democracy?\u201d<\/p>\n<p>vs.<\/p>\n<p>\u201cDo you support our military fighting in other countries against their prevailing government?\u201d<\/p>\n<\/blockquote>\n<p><\/p>\n<p><\/p>\n<p id=\"56e2\">You can clearly see the difference, and the slant, in both questions. The first tries to build support for interventions in other countries; the second tries to do the opposite. 
In many cases it makes a lot of sense to look at the poll questions themselves \u2014 you often gain more insight from them than from the answers.<\/p>\n<p>\u00a0<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4cc539d elementor-widget elementor-widget-heading\" data-id=\"4cc539d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Pitfalls In Processing Data<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-73fdde1 elementor-widget elementor-widget-text-editor\" data-id=\"73fdde1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"5213\">Once data generation and acquisition are complete, the data is processed. Transforming, cleansing, slicing and dicing data can produce many misleading results. 
Below you will find the three most frequent ones:<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8d1c276 elementor-widget elementor-widget-image\" data-id=\"8d1c276\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/849\/1*aGYYZUAeY8NhZFBiaon4tQ.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3f5984f elementor-widget elementor-widget-heading\" data-id=\"3f5984f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Cherry Picking \/ Discarding Unfavourable Data<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-de06397 elementor-widget elementor-widget-text-editor\" data-id=\"de06397\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"b4eb\">It is quite common in research to be handed a big dataset and to have to work with it. In the best case the research question is already defined, and the researcher is under a certain pressure to produce results. But what if the results found are not very interesting? And what if you have so much data that interesting results can be produced even though they are simply a coincidence? 
The temptation to use interesting results created by discarding unfavourable data, or by picking bits and pieces of the dataset, is high.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2cfde79 elementor-widget elementor-widget-image\" data-id=\"2cfde79\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/648\/1*Muyw8BnNfwM_6ikU6iurCQ.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-786bc0a elementor-widget elementor-widget-text-editor\" data-id=\"786bc0a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"2b61\">A good example is climate change: to support the claim that the climate has always been the same and is not changing, the time frame of the data is very often limited to the years 2000 to 2013. 
Looking only at this specific extract, it may seem as if temperature anomalies have always been the same.<\/p>\n\n\n\n<p id=\"f8a1\">Only in the big picture, which includes the complete time frame, does the intense and rapid rise of positive temperature anomalies become visible.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-eefcc3e elementor-widget elementor-widget-image\" data-id=\"eefcc3e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1296\/1*L5-hAupbYE04uO6yRXaxag.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b9bb21d elementor-widget elementor-widget-heading\" data-id=\"b9bb21d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Significance &amp; Quality<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d8fcb17 elementor-widget elementor-widget-text-editor\" data-id=\"d8fcb17\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"69f5\">Quite often, experiments are conducted and results presented without statistical significance. In a business context this happens frequently in so-called A\/B tests.<\/p>\n\n\n\n<p id=\"bd50\">Statistical significance in the context of A\/B testing is how likely it is that the difference between your experiment\u2019s original version and test version isn\u2019t due to an error or random chance. 
For example, a conversion rate of 7% in a test version with a green button does not necessarily mean it performs better than the alternative with a conversion rate of 5% if your sample size is only 200 visitors. There are several ways to check the significance of your result, either with standard software or by calculating it yourself.\u00a0<a href=\"https:\/\/www.surveymonkey.com\/mp\/ab-testing-significance-calculator\/\" rel=\"noopener\">Here<\/a>\u00a0you can find a calculator to check the significance of your last A\/B test.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5658d06 elementor-widget elementor-widget-image\" data-id=\"5658d06\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/579\/1*bYUJHu1xafnPghXvY6nR4g.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e95a257 elementor-widget elementor-widget-heading\" data-id=\"e95a257\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Data Dredging<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b809554 elementor-widget elementor-widget-text-editor\" data-id=\"b809554\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"7829\">Data dredging, also called p-hacking, describes the process of analyzing a dataset without any focus or prior hypothesis. 
By searching for and testing different combinations of variables for significance, a huge number of hypotheses are tested. If you test a lot of hypotheses with the help of the p-value, it is very likely that you will make a type 1 (or type 2) error, simply because there are so many tests: the probability of incorrectly rejecting or accepting a hypothesis at least once increases with every test. But let\u2019s have a look at a concrete example you may have experienced in business life.<\/p>\n\n\n\n<p id=\"7509\"><a href=\"https:\/\/towardsdatascience.com\/hands-on-predict-customer-churn-5c2a42806266\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\">In a previous article we predicted, with statistical significance, whether a customer is likely to churn.<\/a>\u00a0But let\u2019s assume that you have a bad dataset and your test shows no statistical significance. You then start creating a hypothesis for each and every customer feature. And boom \u2014 finally your testing says that one of your hypotheses is significant: \u201cCustomers with blonde hair are more likely to churn after two years.\u201d<\/p>\n\n\n\n<p id=\"62c8\">When so many tests are run, a random hypothesis may appear significant purely by chance. 
But in fact the conclusion is a \u201cfalse positive\u201d \u2014 which means the test falls into the 5% chance of being wrong (at a confidence level of 95%).<\/p>\n\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-816883a elementor-widget elementor-widget-heading\" data-id=\"816883a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Pitfalls In Communicating Data<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-45daaaf elementor-widget elementor-widget-text-editor\" data-id=\"45daaaf\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"561c\">Whenever statistics and data are used to back up the creator\u2019s intention, it is worth taking a critical look at them.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-045e218 elementor-widget elementor-widget-image\" data-id=\"045e218\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/849\/1*FuaxU7DmKTY25ctJgAmdcA.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1f37c43 elementor-widget elementor-widget-heading\" data-id=\"1f37c43\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Data 
Visualisation<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-0a9d9b1 elementor-widget elementor-widget-text-editor\" data-id=\"0a9d9b1\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"fd90\">Especially when it comes to data visualisation, people love to apply small adjustments that have great consequences for what the information consumer sees. There are tons of methods to mislead with graphs, charts and diagrams. Below you will find a selection of the most common ones.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Truncated y-axis: when data is compared without a baseline at zero, a visualisation like the one below can tell a much more dramatic story than the data warrants.<\/li>\n<\/ul>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-cb2199b elementor-widget elementor-widget-image\" data-id=\"cb2199b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/688\/1*Fqe20ZFGXld10gAzFciG5Q.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a0e997e elementor-widget elementor-widget-text-editor\" data-id=\"a0e997e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<ul class=\"wp-block-list\">\n<li>Leaving out data, often referred to as \u201comitting data\u201d, describes the absence of several data points in the chart, for example displaying only the data point of every second year. 
Interpretation and decision-making are then based on half the information.<\/li>\n<li>Inappropriate scaling: this method is often used for time series or changes over time and works in both directions: either the range of the y-axis is inappropriately increased, so that the change seems less intense, or it is inappropriately reduced in order to make the effects look more dramatic than they are.<\/li>\n<\/ul>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1841605 elementor-widget elementor-widget-heading\" data-id=\"1841605\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Wrong Causality \/ Conclusions<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-2ba3512 elementor-widget elementor-widget-text-editor\" data-id=\"2ba3512\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"0ecc\">Finding statistically significant insights in data is the beginning of adding value, but you also have to be able to interpret the results correctly. A very good example of how not to interpret results is the story of Abraham Wald and the missing bullet holes. Long story short, he was part of the Statistical Research Group during World War II. The Air Force approached Wald with a problem: too many planes were being shot down, but simply increasing the armour everywhere would make them too heavy. 
They asked how they could add armour in an efficient way.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-dd8201d elementor-widget elementor-widget-image\" data-id=\"dd8201d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1215\/1*kCdrgy-x_xWc5dzotQPBdg.png\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d04e38f elementor-widget elementor-widget-text-editor\" data-id=\"d04e38f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"a0ac\">Wald investigated planes returning from combat, collecting statistics and analysing them. He found areas with many bullet holes and areas with fewer. The intuitive assumption at first was to place more armour where there were more bullet holes. But that was the wrong conclusion. 
Because the statistical sample consisted only of planes that returned, the areas with fewer bullet holes were the important ones: those were probably the areas hit on the planes that did not make it back.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>\u201cGentlemen, you need to put more armour-plate where the holes aren\u2019t, because that\u2019s where the holes were on the planes that didn\u2019t return.\u201d \u2014 Abraham Wald<\/p>\n<\/blockquote>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8b76fad elementor-widget elementor-widget-heading\" data-id=\"8b76fad\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Hiding Context<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-49b9ea2 elementor-widget elementor-widget-text-editor\" data-id=\"49b9ea2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p id=\"8fb1\">Useful data visualisations make large amounts of data easy to understand and interpret. The audience should be able to look at the presented data and quickly find the important information. If too much or irrelevant data is presented, the audience may not see the relevant information. The more data is displayed at once, the harder it gets to detect specific trends. 
Presenting too much data is often used to distract the audience from the small but relevant insights.<\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6ba51d5 elementor-widget elementor-widget-heading\" data-id=\"6ba51d5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Conclusion<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1a97f47 elementor-widget elementor-widget-text-editor\" data-id=\"1a97f47\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\tStatistical data has great potential to be misused. Especially dangerous is its power to lend any argument a certain authority. \u201cIf there is clear evidence and even a number says so, then it must be correct\u201d is quite often the audience\u2019s reaction. But although wrong decisions based on misleading data &amp; statistics do a great deal of harm, data also has the ability to enable deep insights, drive better decisions and allow predictions. So what should we do?\n\n\n<p id=\"2a19\">By understanding some of the most common ways in which misleading data and statistics are produced, we drastically reduce the chance of being trapped ourselves. 
By using data &amp; statistics in a responsible, comprehensible and ethical way, we actively contribute to a better informed world. Doesn\u2019t that sound like an amazing mission for the future?\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>Data is everywhere and making the right decisions becomes increasingly difficult due to an information overload of our system. Statistics allow us to better process and understand data if applied correctly. But what if statistics are misleading? <\/p>\n","protected":false},"author":932,"featured_media":10097,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[187],"tags":[205,696,510],"ppma_author":[3710],"class_list":["post-10095","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bigdata-cloud","tag-data","tag-nformationprocess","tag-statistics"],"authors":[{"term_id":3710,"user_id":932,"is_guest":0,"slug":"florian-tausend","display_name":"Florian Tausend","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/10\/Florian-Tausend-150x150.jpg","user_url":"https:\/\/landingdata.de\/%20%20","last_name":"Tausend","first_name":"Florian","job_title":"","description":"Florian Tausend is Founder &amp; Analytics Expert at 
landingdata.de"}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/10095","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/932"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=10095"}],"version-history":[{"count":6,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/10095\/revisions"}],"predecessor-version":[{"id":33708,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/10095\/revisions\/33708"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/10097"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=10095"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=10095"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=10095"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=10095"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}