{"id":1426,"date":"2019-02-15T10:32:09","date_gmt":"2019-02-15T10:32:09","guid":{"rendered":"http:\/\/kusuaks7\/?p=1031"},"modified":"2023-06-29T10:47:33","modified_gmt":"2023-06-29T10:47:33","slug":"discriminate-for-fairness","status":"publish","type":"post","link":"https:\/\/www.experfy.com\/blog\/ai-ml\/discriminate-for-fairness\/","title":{"rendered":"Discriminate for fairness"},"content":{"rendered":"<div>As machine learning methods come to be more widely used, there is a great deal of hand-wringing about whether they produce\u00a0<a href=\"https:\/\/weaponsofmathdestructionbook.com\/\" rel=\"noopener\">fair results<\/a>.\u00a0\u00a0For example,\u00a0<a href=\"https:\/\/www.propublica.org\/article\/machine-bias-risk-assessments-in-criminal-sentencing\" rel=\"noopener\">Pro Publica<\/a>\u00a0reported that a widely used program intended to assess the likelihood of criminal recidivism, that is whether a person in custody would be likely to commit an additional crime, tended to over-estimate the probability that a black person would commit an additional crime and under-estimate whether a white person would.\u00a0\u00a0Amazon was said to have abandoned a\u00a0<a href=\"https:\/\/www.theguardian.com\/technology\/2018\/oct\/10\/amazon-hiring-ai-gender-bias-recruiting-engine\" rel=\"noopener\">machine learning system<\/a>\u00a0that evaluated resumes for potential hires, because that program under-estimated the likely success of women and therefore, recommended against hiring them.<\/div>\n<div><\/div>\n<div>I don\u2019t want to deny that these processes are biased, but I do want to try to understand why they are biased and what we can do 
about it.\u00a0\u00a0The bias is not an inherent property of the machine learning algorithms, and we would not find its source by investigating the algorithms that go into them.<\/div>\n<div><\/div>\n<div>The usual explanation is that the systems are trained on the \u201cwrong\u201d data and merely perpetuate the biases of the past.\u00a0\u00a0If they were trained on\u00a0<a href=\"https:\/\/www.businessinsider.com\/amazon-ai-biased-against-women-no-surprise-sandra-wachter-2018-10\" rel=\"noopener\">unbiased data<\/a>, the explanation goes, they would achieve less biased results.\u00a0\u00a0Bias in the training data surely plays a role, but I don\u2019t think that it is the primary explanation for the bias.<\/div>\n<div><\/div>\n<div>Instead, it appears that the bias comes substantially from how we approach the notion of fairness itself.\u00a0\u00a0We assess fairness as if it were some property that should emerge automatically, rather than a process that must be designed in.<\/div>\n<h2>What do we mean by fairness?<\/h2>\n<p>&nbsp;<\/p>\n<div>In the Pro Publica\u00a0<a href=\"https:\/\/www.propublica.org\/article\/how-we-analyzed-the-compas-recidivism-algorithm\" rel=\"noopener\">analysis<\/a>\u00a0of recidivism, the unfairness derived largely from the fact that when errors are made, they tend to be in one direction for black defendants and in the other direction for white defendants.\u00a0\u00a0This bias means that black defendants are denied bail when they really do not present a risk, and white defendants are given bail when they really should remain in custody.\u00a0\u00a0That bias seems to be inherently unfair, but the race of the defendant is not even considered explicitly by the program that makes this prediction.<\/div>\n<div><\/div>\n<div>In the case of programs like the Amazon hiring recommendation system, fairness would seem to imply that women and men with similar histories be recommended for hiring at similar rates.\u00a0\u00a0But again, the gender of 
the applicant is not among the factors considered explicitly by the hiring system.<\/div>\n<div>Race and gender are protected factors under US law (e.g.,\u00a0<a href=\"https:\/\/www.eeoc.gov\/laws\/statutes\/titlevii.cfm\" rel=\"noopener\">Title VII of the Civil Rights Act of 1964<\/a>).\u00a0\u00a0The law states that \u201cIt shall be an unlawful employment practice for an employer \u2026 to discriminate against any individual with respect to his compensation, terms, conditions, or privileges of employment, because of such individual\u2019s race, color, religion, sex, or national origin.\u201d<\/div>\n<div><\/div>\n<div>Although the recidivism system does not include race explicitly in its assessment, it does include such factors as whether the defendant has any\u00a0<a href=\"https:\/\/www.documentcloud.org\/documents\/2702103-Sample-Risk-Assessment-COMPAS-CORE.html\" rel=\"noopener\">family members<\/a>\u00a0who have ever been arrested, whether they have financial resources, etc.\u00a0\u00a0As I understand it, practically every black person who might come before the court is likely to have at least one family member who has been arrested, but that is less often true for whites.\u00a0\u00a0Black people are more likely than whites to be\u00a0<a href=\"https:\/\/www.vox.com\/identities\/2018\/5\/14\/17353040\/racial-disparity-marijuana-arrests-new-york-city-nypd\" rel=\"noopener\">arrested<\/a>, and once arrested, they are more likely than whites to be convicted and incarcerated.\u00a0\u00a0Relative to their proportion in the population, they are\u00a0<a href=\"https:\/\/www.naacp.org\/criminal-justice-fact-sheet\/\" rel=\"noopener\">substantially over-represented<\/a>\u00a0in the US prison system compared to whites.\u00a0\u00a0These correlations may be the result of other biases, such as racism in the US, but they are not likely to be the result of any intentional bias being inserted into the recidivism machine learning system.\u00a0\u00a0Black defendants are 
substantially more likely to be evaluated by the recidivism system and were more likely to be included in its training set because of these same factors.\u00a0\u00a0I don\u2019t believe that anyone set out to make any of these systems biased.<\/div>\n<div><\/div>\n<div>The resumes written by men and women are often different.\u00a0\u00a0Women tend to have more interruptions in their work history; they tend to be less assertive about seeking promotions; they use different language than men to talk about their accomplishments.\u00a0\u00a0These tendencies, associated with gender, are available to the system, even without any desire to impose a bias on the results.\u00a0\u00a0Men are more likely to be considered for technical jobs at Amazon because they are more likely to apply for them.\u00a0\u00a0Male resumes are also more likely to be used in the training set, because historically, men have filled a large majority of the technical jobs at Amazon.<\/div>\n<div><\/div>\n<div>One reason to be skeptical that imbalances in the training set are sufficient to explain the bias of these systems is that machine learning systems do not always learn what their designers think that they will learn.\u00a0\u00a0Machine learning works by adjusting internal parameters (for example, the weights of a neural network) to best realize a \u201cmapping\u201d from the inputs on which it is trained to the goal states that are set for it.\u00a0\u00a0If the system is trained to recognize cat photos versus photos of other things, it will adjust its internal parameters to most accurately achieve that result.\u00a0\u00a0The system is shown a lot of labeled pictures, some of which contain cats, and some of which do not.\u00a0\u00a0Modern machine learning systems are quite capable of learning distinctions like this, but there is no guarantee that they learn the same features that a person would learn.<\/div>\n<div><\/div>\n<div>For example, even given many thousands of training examples to classify 
photographs, a deep neural network system can still be \u201cduped\u201d into classifying a\u00a0<a href=\"https:\/\/blog.openai.com\/adversarial-example-research\/\" rel=\"noopener\">photo of a panda as a photo of a gibbon<\/a>, even though both the original and the altered photo look to the human eye very much like a panda and not at all like a gibbon.\u00a0\u00a0All it took to cause this system to misclassify the photo was to add a certain amount of apparently random visual noise to the photograph.\u00a0\u00a0The misclassification of the picture when noise was added implies that the system learned features, in this case pixels, that were disrupted by the noise and not the features that a human would use.<\/div>\n<div><\/div>\n<div>The recidivism and hiring systems, similarly, can learn to make quite accurate predictions without having to consider the same factors that a human might.\u00a0\u00a0People find some features more important than others when classifying pictures.\u00a0\u00a0Computers are free to choose whatever features will allow correct performance, whether a human would find them important or not.<\/div>\n<div><\/div>\n<div>In many cases, the features that a system identifies are also applicable to other examples that it has not seen, but there is often a decrease in accuracy when a well-trained machine learning system is actually deployed by a business and applied to items (e.g., resumes) that were not drawn from the same group as the training set.\u00a0\u00a0The bigger point is that for machine learning systems, the details can be more important than the overall gist, and the details may be associated with the unfairness.<\/div>\n<h2>Simpson\u2019s paradox and unfairness<\/h2>\n<p>&nbsp;<\/p>\n<div>A phenomenon related to this bias is called\u00a0<a href=\"https:\/\/ftp.cs.ucla.edu\/pub\/stat_ser\/r414.pdf\" rel=\"noopener\">Simpson\u2019s paradox<\/a>, and one of the most commonly cited examples of this so-called paradox concerns the appearance of bias in the acceptance rate of men versus 
women to the University of California graduate school.<\/div>\n<div><\/div>\n<div>The admission figures for the Berkeley campus for 1973 showed that 8442 men applied, of whom 44% were accepted, and 4321 women applied, of whom only 35% were accepted.\u00a0\u00a0The difference between 44% and 35% acceptance is substantial and could be a violation of Title VII.<\/div>\n<div><\/div>\n<div>The difference in proportions would seem to indicate that the admission process was unfairly biased toward men.\u00a0\u00a0But when the departments were considered individually, the results looked much different.\u00a0\u00a0Graduate admission decisions are made by the individual departments, such as English or Psychology.\u00a0\u00a0The graduate school may administer the process, but it plays no role in deciding who gets in.\u00a0\u00a0On deeper analysis it was\u00a0<a href=\"http:\/\/www.unc.edu\/~nielsen\/soci708\/cdocs\/Berkeley_admissions_bias.pdf\" rel=\"noopener\">found<\/a>\u00a0(P. J. Bickel, E. A. Hammel, J. W. O&#8217;Connell, 1975) that six of the 85 departments showed a small bias toward admitting women and only four of them showed a small bias toward admitting men.\u00a0\u00a0Although the acceptance rate for women was substantially lower than for men, individual departments were slightly more likely to favor women than men. This is the apparent paradox: departments are not biased against women, but the overall performance of the graduate school seems to be.<\/div>\n<div><\/div>\n<div>Rather, according to Bickel and associates, the apparent mismatch derived from the fact that women applied to different departments on average than the men did.\u00a0\u00a0Women were more likely to apply to departments that had more competition for their available slots, and men were more likely to apply to departments that had relatively more slots per applicant. 
In those days, the \u201chard\u201d sciences attracted more male applicants than female, but they were also better supported with teaching assistantships and so on than the humanities departments that women were more likely to apply to. Men applied on average to departments with high rates of admission and women tended to apply to departments with low rates.\u00a0\u00a0The bias in admissions was apparently not caused by the graduate school, but by the prior histories of the women, which biased them away from the hard sciences and toward the humanities.<\/div>\n<div><\/div>\n<div>A lot has been\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Simpson%27s_paradox\" rel=\"noopener\">written<\/a>\u00a0about\u00a0<a href=\"https:\/\/ftp.cs.ucla.edu\/pub\/stat_ser\/r414.pdf\" rel=\"noopener\">Simpson\u2019s paradox<\/a>\u00a0and even whether it is a paradox at all.\u00a0\u00a0The Berkeley admissions study, as well as the gender bias and recidivism bias, can all be explained by the correlation between a factor of interest (gender or race) and some other variable.\u00a0\u00a0Graduate applications were correlated with patterns of department selection; gender bias in resume analysis is correlated with such factors as work history, language used to describe work, and so on.\u00a0\u00a0Recidivism predictors are correlated with race.\u00a0\u00a0Although these examples all show large discrepancies in the size of the two groups of interest (many more men applied to graduate school, many more of the defendants being considered were black rather than white, and many more of the Amazon applicants were men), these differences will not disappear if all we do is add training examples.<\/div>\n<div><\/div>\n<div>These systems are considered unfair, presumably because we do not think that gender or race should play a causal role in whether people are admitted, hired, or denied bail (e.g., Title VII).\u00a0\u00a0Yet, gender and race are apparently correlated with factors that do affect these 
decisions.\u00a0\u00a0Statisticians call these correlated variables confounding variables.\u00a0\u00a0The way to remove them from the prediction is to treat them separately (<a href=\"http:\/\/bayes.cs.ucla.edu\/BOOK-2K\/ch6-1.pdf\" rel=\"noopener\">hold them fixed<\/a>).\u00a0\u00a0If the prediction of recidivism is still accurate when considering just blacks or just whites, then it may have some value.\u00a0\u00a0If hiring evaluations are made for men and women separately, then there can be no unintentional bias.\u00a0\u00a0Differences between men and women, then, cannot explain or cause the bias because that factor is held constant for any predictions within a gender.\u00a0\u00a0Women do not differ from women in general in gender-related characteristics, and so these characteristics are not able to contribute to a hiring bias toward men.<\/div>\n<div>We detect unfairness by ignoring a characteristic, for example, race or gender, during the training process and then examining it during a subsequent evaluation process.\u00a0\u00a0In machine learning, that is often a recipe for disaster.\u00a0\u00a0Ignoring a feature during training means that the feature is uncontrolled in the result.\u00a0\u00a0As a result, it would be surprising if the computer were able to produce fair results.<\/div>\n<div><\/div>\n<div>Hiring managers may or may not be able to\u00a0<a href=\"https:\/\/www.hbs.edu\/faculty\/Publication%20Files\/12-083.pdf\" rel=\"noopener\">ignore gender<\/a>.\u00a0\u00a0The evidence is pretty clear that they cannot really do it, but US law requires that they do.\u00a0\u00a0In an attempt to make these programs consistent with laws like Title VII, their designers have explicitly avoided including gender or race among the factors that are considered.\u00a0\u00a0In reality, however, gender and race are still functionally present in the factors that correlate with them.\u00a0\u00a0Putting a man\u2019s name on a woman\u2019s resume does not make it into a 
male resume, but including the questions about the number of a defendant\u2019s siblings that have been arrested does provide information about the person\u2019s race.\u00a0\u00a0The system can learn about them.\u00a0\u00a0But what really causes the bias, I think, is that these factors are not included as part of the system\u2019s goals.<\/div>\n<div><\/div>\n<div>If fairness is really a goal of our machine learning system, then it should be included as a criterion by which the success of the system is judged.\u00a0\u00a0Program designers leave these factors out of the evaluation because they mistakenly (in my opinion) believe that the law requires them to leave them out, but machines are unlikely to learn about them unless they are included.\u00a0\u00a0I am not a lawyer, but I believe that the law concerns the outcome of the process, not the means by which that outcome is achieved.\u00a0\u00a0If these factors are left out of the training evaluation, then any resemblance of a machine learning process to a fair one is entirely coincidental.\u00a0\u00a0By explicitly evaluating for fairness, fairness can be achieved. 
That is what I think is missing from these processes.<\/div>\n<div><\/div>\n<div>The goals of machine learning need not be limited to just the accuracy of a judgment.\u00a0\u00a0Other criteria, including fairness, can be part of the goal for which the machine learning process is being optimized.\u00a0\u00a0The same kind of approach of explicitly treating factors that must be treated fairly can be used in other areas where fairness is a concern, including mapping of voting districts (gerrymandering), college admissions, and grant allocations.\u00a0\u00a0Fairness can be achieved by discriminating among the factors that we use to assess fairness and including these factors directly and explicitly in our models.\u00a0\u00a0By discriminating, we are much more likely to achieve fairness than by leaving these factors to chance in a world where factors are not actually independent of one another.<\/div>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Bias in the training data surely plays a role, but I don&rsquo;t think that it is the primary explanation for the bias. 
The usual explanation is that the systems are trained on the &ldquo;wrong&rdquo; data and merely perpetuate the biases of the past.&nbsp;&nbsp;If they were trained on&nbsp;unbiased data, the explanation goes, they would achieve less biased results.&nbsp;It appears that the bias comes substantially from how we approach the notion of fairness itself.&nbsp;&nbsp;We assess fairness as if it were some property that should emerge automatically, rather than a process that must be designed in.&nbsp;<\/p>\n","protected":false},"author":400,"featured_media":3415,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"content-type":"","footnotes":""},"categories":[183],"tags":[97],"ppma_author":[2960],"class_list":["post-1426","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-ml","tag-artificial-intelligence"],"authors":[{"term_id":2960,"user_id":400,"is_guest":0,"slug":"herbert-roitblat","display_name":"Herbert Roitblat","avatar_url":"https:\/\/secure.gravatar.com\/avatar\/?s=96&d=mm&r=g","user_url":"","last_name":"Roitblat","first_name":"Herbert","job_title":"","description":"Herbert Roitblat&nbsp;is Principal Data Scientist at Mimecast. 
Author, contributor, and speaker on electronic discovery, he is a recognized expert in cognitive science, information retrieval, deep learning, electronic discovery, machine learning, neural networks, information governance, natural language processing, and dolphin biosonar."}],"_links":{"self":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1426","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/users\/400"}],"replies":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/comments?post=1426"}],"version-history":[{"count":3,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1426\/revisions"}],"predecessor-version":[{"id":28954,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/posts\/1426\/revisions\/28954"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media\/3415"}],"wp:attachment":[{"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/media?parent=1426"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/categories?post=1426"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/tags?post=1426"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.experfy.com\/blog\/wp-json\/wp\/v2\/ppma_author?post=1426"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}