{"id":1509,"date":"2019-02-18T04:02:55","date_gmt":"2019-02-18T04:02:55","guid":{"rendered":"http:\/\/kusuaks7\/?p=1114"},"modified":"2023-07-07T12:37:00","modified_gmt":"2023-07-07T12:37:00","slug":"big-data-will-be-biased-if-we-let-it","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/bigdata-cloud\/big-data-will-be-biased-if-we-let-it\/","title":{"rendered":"Big Data Will Be Biased, If We Let It"},"content":{"rendered":"<section><strong><em>Ready to learn more about Big Data &amp; Data Science?\u00a0<a href=\"https:\/\/www.experfy.com\/training\/courses\">Browse courses<\/a>\u00a0developed by industry thought leaders and Experfy in Harvard Innovation Lab.<\/em><\/strong><\/p>\n<p id=\"fb3e\">If I had a penny for every time I\u2019ve heard \u201cdata doesn\u2019t lie\u201d\u2026<\/p>\n<p id=\"b519\">For those of us who have the ever exciting and growing task of working with Big Data to help solve some of organization\u2019s biggest inefficiencies, questions, or problems, perpetuating bias is a way too easy-to-make mistake, and we should all be familiarized with it by now.<\/p>\n<p id=\"52d0\"><strong><em>For everyone else, here\u2019s what going on:<\/em><\/strong><\/p>\n<h2 id=\"c4a5\"><strong>Quick recap\u00a0for those who feel lost when data jargon gets thrown at\u00a0you<\/strong><\/h2>\n<p>&gt;<\/p>\n<p>(If\u00a0<em>this isn\u2019t the information you\u2019re looking for, move along\u00a0<\/em>to the next section)<\/p>\n<ul>\n<li id=\"9c1c\"><a href=\"https:\/\/www.experfy.com\/training\/tracks\/big-data-training-certification\"><strong>Big Data<\/strong><\/a>\u00a0is, well, a lot of data. Quantitative and qualitative indicators, that at a large amount get used to identify patterns, trends, and relationships.<\/li>\n<li id=\"a2b7\"><strong>Algorithms<\/strong>\u00a0are\u00a0\u2018a process or set of rules to be followed in calculations or other problem-solving operations\u2019. 
For example, if I\u2019m deciding what to wear in the morning, I\u2019m mentally using an algorithm that considers the weather, my mood, where I\u2019m going, and that Ben &amp; Jerry\u2019s I shouldn\u2019t have eaten last night, which leads me to pick my outfit.<\/li>\n<li id=\"4477\"><a href=\"https:\/\/www.experfy.com\/jobs\/ai-machine-learning\"><strong>Machine Learning<\/strong><\/a> \u2018provides systems the ability to automatically learn and improve from experience without being explicitly programmed.\u2019 If I were a machine learning algorithm, I wouldn\u2019t have had that Ben &amp; Jerry\u2019s last night, because I would\u2019ve learned from the last time that I\u2019d live to regret it.<\/li>\n<\/ul>\n<h2 id=\"82c7\"><strong>How bias has been introduced into our \u2018smart\u2019\u00a0world<\/strong><\/h2>\n<p>My first encounter with the concept of data-driven bias blew my mind, and made me wonder how I hadn\u2019t\u00a0<em>seen<\/em>\u00a0this before. It was\u00a0<a href=\"https:\/\/www.propublica.org\/article\/machine-bias-risk-assessments-in-criminal-sentencing\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" data-href=\"https:\/\/www.propublica.org\/article\/machine-bias-risk-assessments-in-criminal-sentencing\" data->ProPublica\u2019s essay titled Machine Bias<\/a>. Right after the title, it stated:<\/p>\n<blockquote id=\"ce7c\"><p>There\u2019s software used across the country to predict future criminals. And it\u2019s biased against blacks.<\/p><\/blockquote>\n<p id=\"4500\">The\u00a0<em>tl;dr<\/em>\u00a0story here is that several states in the US implemented an algorithm to predict the risk of defendants in court reoffending, and used this value as a factor during sentencing. Interestingly enough, race and ethnicity were claimed not to be variables in the algorithm, yet it somehow fails blacks the most. 
Only 20% of defendants who were identified as high risk of committing a violent crime in the future actually did, and it mislabeled black people at almost twice the rate of whites.<\/p>\n<p id=\"c5ee\">The whole essay and\u00a0<a href=\"https:\/\/www.propublica.org\/article\/how-we-analyzed-the-compas-recidivism-algorithm\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" data-href=\"https:\/\/www.propublica.org\/article\/how-we-analyzed-the-compas-recidivism-algorithm\" data->their analysis<\/a>\u00a0of the data are definitely worth a read, but most importantly, this isn\u2019t the only place algorithms are failing minorities and other groups.<\/p>\n<p id=\"4d67\">It\u2019s important to note, before we go any further down this rabbit hole, that I don\u2019t think the intention of the people building these tools is to deliberately discriminate or create any sort of bias. We could argue the exact opposite: Tools like these are meant to limit the individual bias of whoever is conducting a risk assessment by providing \u2018undeniable\u2019, quantifiable, trustworthy data.<\/p>\n<p id=\"6c0f\">For a long time now my motto has been \u201cIf nothing changes, nothing changes\u201d, and it rings most true in this case. Bias is nothing new, and it requires specific action to be overcome, in and out of the data science world. By feeding our algorithms history through data, we\u2019re implicitly telling them to discriminate against everyone who\u2019s been historically discriminated against.<\/p>\n<p id=\"d373\">Some of these examples are embedded in our culture and we accept them as the norm (although not always happily): Your gender, income, level of education, and other factors determine how much you will pay for healthcare. Some health-related indicators do as well, for example whether you smoke or not. 
It\u2019s not unusual, however, for a healthy non-smoking woman to pay more for insurance than a sedentary man who smokes 2 packs a day, even though many doctors would agree that, based on that data, the latter is more likely to fall ill.<\/p>\n<p id=\"3394\">On the opposite side of gender bias: Men do pay more for car insurance, for similar reasons. \u2018For instance, an 18-year-old male living in Nevada would pay an average of $6,268 a year to insure his sedan if he had the misfortune to grow up there. That\u2019s 51 percent higher than what his twin sister would pay (assuming they have the same grades and driving records), who would fork out just $4,152 to insure an identical car,\u00a0<a href=\"https:\/\/coverhound.com\/press\/coverhound-on-cbs-news-how-men-can-beat-gender-bias-in-car-insurance\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" data-href=\"https:\/\/coverhound.com\/press\/coverhound-on-cbs-news-how-men-can-beat-gender-bias-in-car-insurance\" data->according to a CoverHound analysis<\/a>.\u2019 
And since we\u2019re on the car insurance subject,\u00a0<a href=\"https:\/\/www.propublica.org\/article\/minority-neighborhoods-higher-car-insurance-premiums-white-areas-same-risk\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" data-href=\"https:\/\/www.propublica.org\/article\/minority-neighborhoods-higher-car-insurance-premiums-white-areas-same-risk\" data->minorities pay more<\/a>\u00a0for car insurance than white people in similarly risky neighborhoods.<\/p>\n<h2 id=\"fd02\"><strong>This matters now more than\u00a0ever<\/strong><\/h2>\n<p>Algorithms of this kind have existed for decades and are often used by organizations to scale their operations, through repeatable patterns that can be applied to everyone.<\/p>\n<p id=\"83e4\">The reason we need to look at this now more than ever is that, through a growing and thriving tech industry, these models are being applied pretty much everywhere: From court sentencing,\u00a0<a href=\"https:\/\/hbr.org\/2016\/12\/hiring-algorithms-are-not-neutral\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" data-href=\"https:\/\/hbr.org\/2016\/12\/hiring-algorithms-are-not-neutral\" data->job searching<\/a>, credit card, college, and mortgage applications,\u00a0<a href=\"http:\/\/business.time.com\/2012\/05\/18\/when-consumers-pay-more-due-to-race-or-gender\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" data-href=\"http:\/\/business.time.com\/2012\/05\/18\/when-consumers-pay-more-due-to-race-or-gender\/\" data->consumer goods<\/a>, etc., to\u00a0<a href=\"https:\/\/www.technologyreview.com\/s\/602950\/how-to-fix-silicon-valleys-sexist-algorithms\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" data-href=\"https:\/\/www.technologyreview.com\/s\/602950\/how-to-fix-silicon-valleys-sexist-algorithms\/\" data->AI speaking bots<\/a>,\u00a0<a href=\"https:\/\/qz.com\/819245\/data-scientist-cathy-oneil-on-the-cold-destructiveness-of-big-data\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" data-href=\"https:\/\/qz.com\/819245\/data-scientist-cathy-oneil-on-the-cold-destructiveness-of-big-data\/\" data->evaluating teachers\u2019 performance<\/a>,\u00a0<a href=\"https:\/\/www.cmu.edu\/news\/stories\/archives\/2015\/july\/online-ads-research.html\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" data-href=\"https:\/\/www.cmu.edu\/news\/stories\/archives\/2015\/july\/online-ads-research.html\" data->targeted social media ads<\/a>, and more.<\/p>\n<p id=\"2d40\">This means that, whether or not you\u2019re interested in big data, algorithms, and tech, you\u2019re a part of this today, and it will affect you more and more.<\/p>\n<p id=\"9e0e\">If we don\u2019t put in place reliable, actionable, and accessible solutions to approach bias in data science, this type of usually unintentional discrimination will become more and more normal, standing in opposition to a society and institutions that, on the human side, are trying their best to evolve past bias and move forward in history as a global community.<\/p>\n<h2 id=\"5752\"><strong>What\u2019s being done\u00a0today<\/strong><\/h2>\n<p id=\"1e95\">The solution to this issue isn\u2019t to stop innovation around big data algorithms and machine learning. 
Luckily, progress is being made on several fronts.<\/p>\n<p id=\"1dcd\"><strong>The algorithm heroes<\/strong><\/p>\n<p id=\"36e7\">Organizations like\u00a0<a href=\"https:\/\/algorithmwatch.org\/en\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" data-href=\"https:\/\/algorithmwatch.org\/en\/\" data->AlgorithmWatch<\/a>\u00a0and\u00a0<a href=\"https:\/\/www.ajlunited.org\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" data-href=\"https:\/\/www.ajlunited.org\/\" data->The Algorithmic Justice League<\/a>, founded by Joy Buolamwini (her TED talk is well worth watching), are striving to help evaluate and identify bias in existing algorithms by providing education and training materials, as well as a collaborative and inclusive space for people to report bias in algorithms and help solve these issues as a community.<\/p>\n<p id=\"9b3f\">There are many other individuals, researchers, and organizations working on different ways to approach the situation.<\/p>\n<p id=\"183c\"><strong>Policy changes with\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/General_Data_Protection_Regulation\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" data-href=\"https:\/\/en.wikipedia.org\/wiki\/General_Data_Protection_Regulation\" data->GDPR<\/a>\u00a0in Europe<\/strong><\/p>\n<p id=\"d106\">Unfortunately, organizations like AJL aren\u2019t enough to guarantee the necessary change. Change needs backing policy. 
In Europe, the GDPR (the General Data Protection Regulation, which went into effect in May 2018 in the EU) is going to\u00a0<a href=\"https:\/\/blog.acolyer.org\/2017\/01\/31\/european-union-regulations-on-algorithmic-decision-making-and-a-right-to-explanation\/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" data-href=\"https:\/\/blog.acolyer.org\/2017\/01\/31\/european-union-regulations-on-algorithmic-decision-making-and-a-right-to-explanation\/\" data->regulate three key factors involving data bias<\/a>.<\/p>\n<p id=\"6321\">First,\u00a0<strong>profiling<\/strong>, which they define as<\/p>\n<blockquote id=\"77ed\"><p>Any form of automated processing of personal data consisting of the use of personal data to evaluate certain personal aspects relating to a natural person, in particular, to analyze or predict aspects concerning that natural person\u2019s performance at work, economic situation, health, personal preferences, interests, reliability, behavior, location or movements.<\/p><\/blockquote>\n<p id=\"8cae\">This comes together with offering a clear explanation to consumers of how their data will be used, and providing them with the option to opt out.<\/p>\n<p id=\"062e\">Secondly,\u00a0<strong>the right to an explanation.<\/strong>\u00a0When companies use automated decision-making, users will have the right to ask for an explanation and dispute decisions if they were made exclusively by algorithms and data. 
The scope of this action hasn\u2019t been fully defined, but it\u2019s expected to cover credit applications, job searching, and other areas of concern.<\/p>\n<p id=\"9dd2\">Last but definitely not least, there\u2019s a specific\u00a0<strong>bias and discrimination\u00a0<\/strong>section, preventing organizations from using data which might promote bias, such as race, gender, religious or political beliefs, health status, and more, to make automated decisions (with some verified exceptions).<\/p>\n<h2 id=\"33ca\"><strong>What needs to happen\u00a0next<\/strong><\/h2>\n<p id=\"065c\">Unlike with human bias, we can quickly teach algorithms to consider and avoid bias by including it as another indicator. We can also put policy in place to prevent data-driven bias from happening. In my opinion, there are three main areas we need to work on in the near future to make sure bias is diminished in the data space.<\/p>\n<p id=\"7442\"><strong>Education<\/strong><\/p>\n<p id=\"0dbe\">Potentially the most important aspect, and the most accessible one in the short term, is promoting and requiring <a href=\"https:\/\/www.experfy.com\/training\/courses\/big-data-what-every-manager-needs-to-know\">training and education<\/a> for people participating in the creation and maintenance of automated decision-making tools, and other data-driven tools prone to bias.<\/p>\n<p id=\"517d\">In the tech industry we\u2019ve seen a lot of controversy over bias, and have fought it by adding education and training at the HR level, trying to spread the word about the value of diversity and equality on the personal level. 
It\u2019s time to make that training broader, teach everyone involved about the ways the decisions they make while building tools may affect minorities, and accompany that with the relevant technical knowledge to prevent it from happening.<\/p>\n<p id=\"0326\">People who aren\u2019t part of the tech industry should also be aware of this, enough to be able to identify when they might be victims and speak up. Without individuals sharing their stories, and how these methods have changed their lives, the message becomes cold and impersonal, which is exactly what we\u2019re trying to avoid.<\/p>\n<p id=\"ad45\"><strong>Regulation<\/strong><\/p>\n<p id=\"6cfd\">I like GDPR\u2019s case in particular (even though it remains to be seen how well it\u2019s implemented), because it comes from a mandate. This is not a suggestion or an option; it\u00a0<em>needs<\/em>\u00a0to happen. The EU parliament has determined that data security is increasingly pertinent to all its citizens, and has identified that it can also be unfair to some of them. There\u2019s incredible value and validation in this.<\/p>\n<p id=\"768f\">This kind of data regulation, especially around bias and discrimination, is in my opinion key to the healthy growth of the big data industry. Without the public sector\u2019s leadership, the opportunities to dismiss the need to pay specific attention to the people who are being discriminated against are too tempting and affordable.<\/p>\n<p id=\"c84c\"><strong>Transparency<\/strong><\/p>\n<p id=\"8497\">Finally, and this is my personal belief, I think some level of data transparency from the organizations collecting it and developing these tools would help identify and prevent this sort of thing from happening in the future. 
Machines can learn, but human insight needs to be their supervising teacher, and by opening up and sharing non-personal data to be analyzed for bias, organizations can benefit from the power of a diverse global community looking to promote fairness.<\/p>\n<\/section>\n<section>\n<hr \/>\n<p id=\"b898\"><em>Disclaimer: This isn\u2019t meant to be a scientific analysis of existing algorithms or a technical evaluation of the landscape, so much as a humble translation of what\u2019s been going on for people who aren\u2019t always involved in this space.<\/em><\/p>\n<p><u><i>Originally posted at: <a href=\"https:\/\/medium.com\/towards-data-science\/bias-in-big-data-for-the-non-tech-90fc53729025\" rel=\"noopener\">medium<\/a><\/i><\/u><\/p>\n<\/section>\n","protected":false},"excerpt":{"rendered":"<p>Ready to learn more about Big Data &amp; Data Science?\u00a0Browse courses\u00a0developed by industry thought leaders and Experfy in Harvard Innovation Lab. If I had a penny for every time I\u2019ve heard \u201cdata doesn\u2019t lie\u201d\u2026 For those of us who have the ever exciting and growing task of working with Big Data to help solve some<\/p>\n","protected":false},"author":36,"featured_media":3828,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[187],"tags":[122],"ppma_author":[3076],"class_list":["post-1509","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bigdata-cloud","tag-big-data"],"authors":[{"term_id":3076,"user_id":36,"is_guest":0,"slug":"federica-pelzel","display_name":"Federica Pelzel","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","user_url":"","last_name":"Pelzel","first_name":"Federica","job_title":"","description":"Federica Pelzel is a NYC-based government technology and data expert, currently working with Mastercard. 
In the past, she has served as chief of staff in Buenos Aires city government&rsquo;s e-government office, as well as helping build and grow several startups as a product manager."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1509","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/36"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=1509"}],"version-history":[{"count":4,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1509\/revisions"}],"predecessor-version":[{"id":29057,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1509\/revisions\/29057"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/3828"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=1509"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=1509"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=1509"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=1509"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}