{"id":11100,"date":"2020-11-09T10:55:33","date_gmt":"2020-11-09T10:55:33","guid":{"rendered":"https:\/\/www.experfy.com\/blog\/?p=11100"},"modified":"2023-10-09T10:27:52","modified_gmt":"2023-10-09T10:27:52","slug":"your-dataset-is-giant-inkblot-test","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/your-dataset-is-giant-inkblot-test\/","title":{"rendered":"Your Dataset Is A Giant Inkblot Test"},"content":{"rendered":"\n<p class=\"has-normal-font-size\"><em>The danger of apophenia in analytics and what you can do about it<\/em><\/p>\n\n\n\n<p>There\u2019s a fine line between telling stories with&nbsp;<a href=\"http:\/\/bit.ly\/quaesita_hist\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\">data<\/a>&nbsp;and telling lies. Before I tell you how to spot a top-notch&nbsp;<a href=\"http:\/\/bit.ly\/quaesita_analysts\" target=\"_blank\" rel=\"noreferrer noopener\">data analyst<\/a>&nbsp;and boost your analytical excellence, let me scare you a little.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"e10b\">The psychological trap in data analytics<\/h2>\n\n\n\n<p id=\"0c9a\">Human brains are pattern-finding powerhouses\u2026 but those patterns don\u2019t always have much to do with reality. We are the sort of species that finds&nbsp;<a href=\"http:\/\/bit.ly\/potatoelvis\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\">rabbits in clouds and Elvis\u2019s face in a potato chip<\/a>.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/740\/1*gsTdZ-huBDToZs5mwDkkyg.png\" alt=\"Your Dataset Is A Giant Inkblot Test\"\/><figcaption>Do these look like a rabbit and a portrait of Elvis to you? 
Image:&nbsp;<a href=\"http:\/\/bit.ly\/potatoelvis\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\">SOURCE<\/a>.<\/figcaption><\/figure><\/div>\n\n\n\n<p id=\"2cbc\">Take a moment to consider the&nbsp;<a href=\"https:\/\/en.wikipedia.org\/wiki\/Rorschach_test\" target=\"_blank\" rel=\"noreferrer noopener\">Rorschach test<\/a>&nbsp;\u2014 the one where people are shown random inkblots and asked what they see \u2014 and you\u2019ll appreciate just how eagerly the mind injects&nbsp;<a href=\"http:\/\/bit.ly\/spurious_correlations\" target=\"_blank\" rel=\"noreferrer noopener\">spurious<\/a>&nbsp;interpretations into randomness.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/662\/0*_DIoC_BACguZV4s7.jpg\" alt=\"Your Dataset Is A Giant Inkblot Test\"\/><figcaption>Bat? Butterfly? Or just an ink blot? This is the first of the ten cards in the&nbsp;<a href=\"https:\/\/en.wikipedia.org\/wiki\/Rorschach_test\" target=\"_blank\" rel=\"noreferrer noopener\">Rorschach test<\/a>, created in 1921.<\/figcaption><\/figure><\/div>\n\n\n\n<p id=\"d9a4\">Psychologists have a pretty name for this tendency to conjure false meaning out of nothing:&nbsp;<a href=\"http:\/\/bit.ly\/wikiapophenia\" target=\"_blank\" rel=\"noreferrer noopener\">apophenia<\/a>. 
Give humans a vague stimulus and we\u2019ll find faces, butterflies, and a reason to allocate budget to our favorite project or launch an&nbsp;<a href=\"http:\/\/bit.ly\/quaesita_donttrust\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\">AI system<\/a>.<\/p>\n\n\n\n<p id=\"540d\"><em>Uh-oh.<\/em><\/p>\n\n\n\n<p id=\"d3f7\">There\u2019s plenty of random noise in most datasets, so what are the chances there\u2019s no&nbsp;<a href=\"http:\/\/bit.ly\/wikiapophenia\" target=\"_blank\" rel=\"noreferrer noopener\">apophenia<\/a>&nbsp;going on with your&nbsp;<a href=\"http:\/\/bit.ly\/quaesita_analysts\" target=\"_blank\" rel=\"noreferrer noopener\">analytics<\/a>? Can you really trust your interpretation of the data?<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>What the mind does with inkblots it also does with data.<\/p><\/blockquote>\n\n\n\n<p id=\"22ea\">To make matters worse, the more complex a dataset is and the more ways there are to slice and dice it, the vaguer it becomes as a stimulus. That means it\u2019s practically begging you to see patterns that aren\u2019t there.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>Complex datasets practically beg you to find false meaning in them.<\/p><\/blockquote>\n\n\n\n<p id=\"26ac\">Are you sure your latest data epiphany isn\u2019t an&nbsp;<a href=\"http:\/\/bit.ly\/wikiapophany\" target=\"_blank\" rel=\"noreferrer noopener\">apophany<\/a>&nbsp;in disguise?<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/612\/0*DacvRYxcvml5UYI6.jpg\" alt=\"Your Dataset Is A Giant Inkblot Test\"\/><figcaption>Another great word is&nbsp;<a href=\"http:\/\/bit.ly\/wikipareidolia\" target=\"_blank\" rel=\"noreferrer noopener\">pareidolia<\/a>, which is a kind of apophenia (finding familiar things in vague sensory stimuli). 
In Japan, they even have a&nbsp;<a href=\"http:\/\/bit.ly\/japanfacerocks\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\">museum<\/a>&nbsp;of rocks that look like faces. It\u2019s a beautiful world.<\/figcaption><\/figure><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"5bdc\">Lies, damned lies, and analytics<\/h2>\n\n\n\n<p id=\"e298\">If that sounds dismal, I\u2019m not done yet. Taking data analysis courses can pour fuel on that psychological fire. Students are conditioned to expect that looking at data yields real meaning, because every exploratory analysis homework exercise has buried treasure in it. Very few professors have the heart to send you on wild goose chases (for your own good!), and open-ended assignments are hard to grade, so students rarely get enough exposure to them.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>Students grow up believing that every dataset is ready to cough up a nugget of solid truth.<\/p><\/blockquote>\n\n\n\n<p id=\"45b7\">Data storytelling is just a hop, skip, and jump away from outright lying with data. Setting aside the issue of whether the patterns are real, let\u2019s talk about multiple interpretations. Just because you see a bat shape in that inkblot doesn\u2019t mean that there isn\u2019t also a butterfly, a pelvis, or a pair of foxes in it. If I hadn\u2019t mentioned the foxes, would you have seen them? Probably not. Psychological mechanisms related to motivation and attention have stacked the deck against you. 
It takes a special sort of skill to let go of the bat interpretation and force yourself to see a superposition of meanings.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>Once people glom on to their favorite \u201cinsight\u201d, they\u2019ll struggle to unsee it.<\/p><\/blockquote>\n\n\n\n<p id=\"1ea0\">The trouble is that once people glom on to their favorite \u201cinsight\u201d, they\u2019ll struggle to unsee it in favor of others. People tend to believe most strongly in whichever interpretation captured their attention first, and each additional meaning they find reduces their motivation to keep searching. Juggling multiple potential stories without overweighting your favorite is a mental muscle that takes hard work to build. Alas, not every analyst has the discipline for it. In fact, many are incentivized to \u201cprove\u201d one side of a story through data exploration. Why grow skills that only get in the way of fattening your data science paycheck?<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"1f99\">What color is your lightsaber?<\/h2>\n\n\n\n<p id=\"ed96\">There are ways to prove things with data honestly and rigorously (my&nbsp;<a href=\"http:\/\/bit.ly\/quaesita_sydd\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\">data-splitting article<\/a> will tell you more), but exploratory data analysis (EDA) is not one of them. Open-ended data exploration is always a fishing expedition. What determines the color of your lightsaber is what you\u2019re fishing for.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/1080\/0*hlPaNABw_jx-9FMV.jpg\" alt=\"Your Dataset Is A Giant Inkblot Test\"\/><\/figure><\/div>\n\n\n\n<p id=\"81f7\">If you join the dark side, you\u2019re fishing for evidence to support a theory you already \u201cknow\u201d to be true (so you can sell it to some naive victim). 
You might not even realize that your lightsaber is red if you genuinely believe in data objectivity and your own&nbsp;<a href=\"http:\/\/bit.ly\/cogbias_list\" target=\"_blank\" rel=\"noreferrer noopener\">unbiasedness<\/a>.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>Open-ended data exploration is always a fishing expedition.<\/p><\/blockquote>\n\n\n\n<p id=\"fb00\">With a sufficiently complex (vague) dataset, you\u2019ll find a pattern you can spin as support for your favorite story. That\u2019s the beauty of the&nbsp;<a href=\"https:\/\/en.wikipedia.org\/wiki\/Rorschach_test\" target=\"_blank\" rel=\"noreferrer noopener\">Rorschach test<\/a>, after all. Unfortunately, it\u2019s worse with&nbsp;<a href=\"http:\/\/bit.ly\/quaesita_hist\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\">data<\/a>&nbsp;than with inkblots, because the more <a href=\"https:\/\/www.experfy.com\/blog\/ai-ml\/understanding-mathematical-symbols-with-code\/\" target=\"_blank\" rel=\"noreferrer noopener\">mathemagical<\/a> your method (<a href=\"http:\/\/bit.ly\/quaesita_needles\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\">p-hacking<\/a>, anyone?), the more legitimate and convincing you\u2019ll sound to those who don\u2019t know any better.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/576\/0*jItlM7--CZvUSWT5.jpg\" alt=\"Your Dataset Is A Giant Inkblot Test\"\/><figcaption>Satellite photo of the \u201c<a href=\"https:\/\/en.wikipedia.org\/wiki\/Cydonia_(region_of_Mars)\" target=\"_blank\" rel=\"noreferrer noopener\">Face on Mars<\/a>,\u201d which many people took as evidence of extraterrestrial habitation.<\/figcaption><\/figure><\/div>\n\n\n\n<p id=\"8f2b\">Those who reject the dark side also go fishing, but they\u2019re after something else: inspiration. 
They\u2019re looking for patterns that might be interesting or compelling, but they know better than to take them as&nbsp;<em>evidence<\/em>. Instead, they practice a sort of open-minded analytics zen with the discipline to be mindful of as many interpretations as possible.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p><mark>The best analysts challenge themselves to find as many interpretations as possible.<\/mark><\/p><\/blockquote>\n\n\n\n<p id=\"0563\">This takes a sharp eye and a humble, unsticky mind. Rather than tricking their stakeholders into seeing only one side of a story, they challenge themselves to do the creative thinking required to digest the same data into as many stories as possible. They present their findings in a way that inspires rigorous follow-up without causing their leadership team to run overconfidently off a cliff.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>Open-mindedness gives data analysis a chance to be worthwhile.<\/p><\/blockquote>\n\n\n\n<p id=\"92fd\">As an added bonus, the discipline to look for multiple interpretations is an analyst\u2019s secret weapon for not snoozing past the real treasures buried in the data. If you\u2019re distracted by a falsehood you believe in,&nbsp;<a href=\"http:\/\/bit.ly\/quaesita_default\" target=\"_blank\" rel=\"noreferrer noopener\">confirmation bias<\/a>&nbsp;makes it hard to notice evidence that points in the opposite direction. Why bother analyzing anything if your conclusions are determined in advance? 
Open-mindedness gives the whole endeavor a chance to be worthwhile.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/615\/0*yNlc3FaKaJ1Of0Yr.jpg\" alt=\"Your Dataset Is A Giant Inkblot Test\"\/><figcaption>This grilled cheese sandwich fetched&nbsp;<a href=\"http:\/\/bit.ly\/virgintoast\" target=\"_blank\" rel=\"noreferrer noopener\">$28,000 at auction<\/a>&nbsp;because it features the Virgin Mary. Alternative interpretations of what we\u2019re seeing, anyone?<\/figcaption><\/figure><\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"d8e5\">Hiring a great analyst<\/h2>\n\n\n\n<p id=\"5b36\">If you\u2019ve read my other articles about analytics, you already know to look for these traits in a great analyst:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>They don\u2019t make inferences that reach beyond the data they\u2019re exploring.&nbsp;<a href=\"http:\/\/bit.ly\/quaesita_datasci\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\">[1]<\/a><\/li><li>They\u2019re handy with data science tools and have the skills to sift through vast datasets quickly.&nbsp;<a href=\"http:\/\/bit.ly\/quaesita_roles\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\">[2]<\/a><\/li><li>They have relevant domain knowledge, so they\u2019re less likely to waste stakeholders\u2019 time with trivia.&nbsp;<a href=\"http:\/\/bit.ly\/quaesita_analysts\" target=\"_blank\" rel=\"noreferrer noopener\">[3]<\/a><\/li><li>They understand that their work is about prospecting for inspiration.&nbsp;<a href=\"http:\/\/bit.ly\/quaesita_analysts\" target=\"_blank\" rel=\"noreferrer noopener\">[3]<\/a>&nbsp;<a href=\"http:\/\/bit.ly\/quaesita_history\" target=\"_blank\" rel=\"noreferrer noopener\">[4]<\/a><\/li><li>They visualize data in a brain-friendly way so that time-to-inspiration is kept as short as possible.&nbsp;<a href=\"http:\/\/bit.ly\/quaesita_analysts\" 
target=\"_blank\" rel=\"noreferrer noopener\">[3]<\/a><\/li><li>They know what it takes to follow up rigorously on any potential insights they find (and&nbsp;<a href=\"http:\/\/bit.ly\/quaesita_sydd\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\">whom<\/a>&nbsp;to call for help with that).&nbsp;<a href=\"http:\/\/bit.ly\/quaesita_history\" target=\"_blank\" rel=\"noreferrer noopener\">[4]<\/a>&nbsp;<a href=\"http:\/\/bit.ly\/quaesita_bsides\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\">[5]<\/a>&nbsp;<a href=\"http:\/\/bit.ly\/quaesita_default\" target=\"_blank\" rel=\"noreferrer noopener\">[6]<\/a>&nbsp;<a href=\"http:\/\/bit.ly\/quaesita_inspired\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\">[7]<\/a><\/li><\/ul>\n\n\n\n<p id=\"7ed6\">In addition to all that,&nbsp;<em>this<\/em>&nbsp;article suggests you look for analysts with three more traits:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>They\u2019re aware that the mind finds meaning where it doesn\u2019t exist, so they stay humble and avoid jumping to conclusions.<\/li><li>They don\u2019t try to sell you a story found by torturing data until it confesses. Instead, they use hedging\/softening language when talking about data.<\/li><li>They have the discipline to come up with multiple interpretations for&nbsp;<em>everything<\/em>. The faster they produce multiple explanations and the more alternatives they generate, the more the force is with them. Try interviewing for this skill next time you\u2019re hiring an analytics Jedi.<\/li><\/ul>\n\n\n\n<p id=\"c3d4\">Finally, if you\u2019re a leader, turn a critical eye inward and make sure that you\u2019re giving your people the right incentives. Are you looking for a data analyst or a data&nbsp;<a href=\"http:\/\/bit.ly\/quaesita_charlatan\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\">spin doctor<\/a>? 
These take different mindsets (and skillsets!), so choose wisely and reward the right behaviors.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img decoding=\"async\" src=\"https:\/\/miro.medium.com\/max\/640\/0*yC6NocO4e-qqy0-1.jpg\" alt=\"Forget potato chips! That Japanese museum of rocks that look like faces takes the cake\"\/><figcaption>Forget potato chips! That Japanese&nbsp;<a href=\"http:\/\/bit.ly\/japanfacerocks\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"broken_link\">museum<\/a>&nbsp;of rocks that look like faces takes the cake.<\/figcaption><\/figure><\/div>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The danger of apophenia in analytics and what you can do about it There\u2019s a fine line between telling stories with&nbsp;data&nbsp;and telling lies. Before I tell you how to spot a top-notch&nbsp;data analyst&nbsp;and boost your analytical excellence, let me scare you a little. The psychological trap in data analytics Human brains are pattern-finding powerhouses\u2026 but<\/p>\n","protected":false},"author":335,"featured_media":11101,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[120,97,135,941],"ppma_author":[2050],"class_list":["post-11100","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-analytics","tag-artificial-intelligence","tag-data-analyst","tag-dataset"],"authors":[{"term_id":2050,"user_id":335,"is_guest":0,"slug":"cassie-kozyrkov","display_name":"Cassie Kozyrkov","avatar_url":"https:\/\/www.experfy.com\/blog\/wp-content\/uploads\/2020\/04\/medium_df35f80d-2bff-4fe3-b741-a94d51320e00-150x150.jpg","user_url":"https:\/\/careers.google.com\/?src=Online\/LinkedIn\/linkedin_profilepage&amp;utm_source","last_name":"Kozyrkov","first_name":"Cassie","job_title":"","description":"Cassie Kozyrkov is Chief Decision Scientist 
at Google, Inc. With a unique combination of deep technical expertise and world-class public-speaking skills, she has provided guidance on more than 100 projects and designed Google's analytics program, personally training over 15,000 Googlers in statistics, decision-making, and machine learning."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/11100","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/335"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=11100"}],"version-history":[{"count":5,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/11100\/revisions"}],"predecessor-version":[{"id":33319,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/11100\/revisions\/33319"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/11101"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=11100"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=11100"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=11100"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=11100"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}