Are we doomed to have racist and sexist algorithms?

Even when we optimize for accuracy, work from an unbiased data set, and have a performance task with social goods in mind, machine learning algorithms may still perpetuate discrimination.
What else could we do?

- Sequential learning
- More theory
- Causal modeling
- Optimizing for fairness

Of all of these, optimizing for fairness seems like the easiest and best course of action.
In the next section, we will outline how to optimize a model for fairness.
Optimizing for Fairness

Building machine learning algorithms that optimize for non-discrimination can be done in four ways:

- Formalizing a non-discrimination criterion
- Demographic parity
- Equalized odds
- Well-calibrated systems

We will discuss each of these in turn.
Formalizing a non-discrimination criterion is really the umbrella under which the other three approaches fall: each is a particular criterion that attempts to make non-discrimination precise.
However, this list is not exhaustive and there may be better approaches that have not been proposed yet.
Demographic parity proposes that the decision (the target variable) should be independent of protected attributes: race, gender, and so on are irrelevant to the decision.
For a binary decision Y and protected attribute A:

P(Y=1 | A=0) = P(Y=1 | A=1)

The probability of some decision being made (Y=1) should be the same, regardless of the protected attribute (whether A=1 or A=0).
However, demographic parity rules out using the perfect predictor C=Y, where C is the predictor and Y the target variable.
To understand the objection, consider the following case.
Say that we want to predict whether an individual will purchase organic shampoo.
Whether members of certain groups purchase organic shampoo is not independent of their membership of that group.
But, demographic parity would rule out using the perfect predictor.
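To make the objection concrete, here is a minimal sketch in plain Python, using invented numbers for the shampoo example. It shows that even the perfect predictor C=Y violates demographic parity whenever the two groups purchase at different base rates:

```python
# Toy labeled data: pairs (protected attribute A, true outcome Y).
# Assumed base rates (illustrative only): group A=0 buys organic
# shampoo 60% of the time, group A=1 only 20% of the time.
data = [(0, 1)] * 6 + [(0, 0)] * 4 + [(1, 1)] * 2 + [(1, 0)] * 8

def decision_rate(predict, group):
    """Estimate P(C=1 | A=group) for a predictor C = predict(a, y)."""
    decisions = [predict(a, y) for a, y in data if a == group]
    return sum(decisions) / len(decisions)

# The perfect predictor C = Y simply reports the true outcome.
perfect = lambda a, y: y

p0 = decision_rate(perfect, group=0)  # P(C=1 | A=0) = 0.6
p1 = decision_rate(perfect, group=1)  # P(C=1 | A=1) = 0.2

# Demographic parity demands p0 == p1, so the perfect predictor fails it.
print(p0, p1, abs(p0 - p1) < 1e-9)  # 0.6 0.2 False
```

The numbers here are made up, but the point is general: whenever base rates differ across groups, any predictor that tracks the truth will fail this criterion.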
So perhaps this is not the best procedure; maybe the others will give us a better result?

Equalized odds proposes that the predictor and the protected attribute should be independent, conditional on the outcome.
For the predictor R, outcome Y, and protected attribute A, where all three are binary variables:

P(R=1 | A=0, Y=1) = P(R=1 | A=1, Y=1)
The protected attribute (whether A=1 or A=0) should not change your estimate of how likely it is that the relevant predictor holds true of the candidate (R=1); only the outcome of the decision (Y=1) should.
An advantage of this procedure is that this is compatible with the ideal predictor, R=Y.
Consider the following case involving a student getting accepted into Yale, given that they were the valedictorian in their high school.
Equalized odds posits that, among students who got into Yale, knowing whether a student is gay does not change the probability that the student was valedictorian.
Predictor R = whether you were high school valedictorian (1) or not (0)
Outcome Y = getting into Yale (1) or not (0)
Attribute A = being gay (1) or straight (0)

P(R=1 | A=0, Y=1) = P(R=1 | A=1, Y=1)
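As a sketch (plain Python, invented admission records), equalized odds can be checked by comparing P(R=1 | A, Y=1) across groups among admitted students:

```python
# Toy records: (A = gay?, Y = admitted to Yale?, R = valedictorian?).
# The counts are purely illustrative.
records = [
    # Admitted straight students (A=0, Y=1): 3 of 4 were valedictorian.
    (0, 1, 1), (0, 1, 1), (0, 1, 1), (0, 1, 0),
    # Admitted gay students (A=1, Y=1): 3 of 4 were valedictorian.
    (1, 1, 1), (1, 1, 1), (1, 1, 1), (1, 1, 0),
    # Some rejected students from both groups.
    (0, 0, 0), (0, 0, 1), (1, 0, 0), (1, 0, 0),
]

def p_r_given(group, outcome):
    """Estimate P(R=1 | A=group, Y=outcome) from the records."""
    matching = [r for a, y, r in records if a == group and y == outcome]
    return sum(matching) / len(matching)

# Equalized odds (for Y=1) holds iff these two rates are equal.
rate_straight = p_r_given(group=0, outcome=1)  # 0.75
rate_gay = p_r_given(group=1, outcome=1)       # 0.75
print(rate_straight == rate_gay)  # True
```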
Well-calibrated systems propose that the outcome and protected attribute are independent, conditional on the predictor.
For the predictor R, outcome Y, and protected attribute A, where all three are binary variables:

P(Y=1 | A=0, R=1) = P(Y=1 | A=1, R=1)

The probability of some outcome occurring (Y=1) should be unaffected by the protected attribute (whether A=0 or A=1), and should instead be conditional on the relevant predictor (R=1).
This formulation has the advantage that it is group unaware — it holds everyone to the same standard.
Contrasting this with our previous example, knowing that the student is gay does not change the probability of whether the student got into Yale.
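A minimal sketch of the calibration check, using the same made-up Yale setup: here we condition on the predictor rather than the outcome, and compare P(Y=1 | A, R=1) across groups.

```python
# Toy records: (A = gay?, Y = admitted?, R = valedictorian?).
# Counts are illustrative only.
records = [
    # Valedictorians (R=1): 2 of 3 admitted in each group.
    (0, 1, 1), (0, 1, 1), (0, 0, 1),
    (1, 1, 1), (1, 1, 1), (1, 0, 1),
    # Non-valedictorians (R=0).
    (0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0),
]

def p_y_given(group, predictor):
    """Estimate P(Y=1 | A=group, R=predictor) from the records."""
    matching = [y for a, y, r in records if a == group and r == predictor]
    return sum(matching) / len(matching)

# Well-calibration (for R=1) holds iff these are equal across groups.
rate_0 = p_y_given(group=0, predictor=1)  # 2/3
rate_1 = p_y_given(group=1, predictor=1)  # 2/3
print(rate_0 == rate_1)  # True
```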
The distinction between equalized odds and well-calibrated systems is subtle, but important.
In fact, this difference is the basis of the disagreement about the COMPAS software we discussed in the beginning.
So is COMPAS racist?

Equalized odds and well-calibrated systems are mutually incompatible standards.
Sometimes, given certain empirical circumstances, we cannot have a system be both well-calibrated and equalize the odds.
Let’s look at this fact in the context of the debate between ProPublica and Northpointe about whether COMPAS is biased against black defendants.
Y = whether the defendant will reoffend
A = race of the defendant
R = recidivism predictor used by COMPAS

Northpointe's defense: COMPAS is well-calibrated, i.e.,

P(Y=1 | A=0, R=1) = P(Y=1 | A=1, R=1)
The COMPAS system makes roughly similar recidivism predictions for defendants, regardless of their race.
ProPublica's rejoinder: COMPAS has a higher false positive rate for black defendants and a higher false negative rate for white defendants, i.e., it does not satisfy equalized odds:

P(R=1 | A=0, Y=1) ≠ P(R=1 | A=1, Y=1)

(Source: Washington Post)

The race of the defendant makes a difference to whether the individual is placed in the low or the medium/high-risk category.
The defendant's race (whether A=0 or A=1) makes a difference to the probability that COMPAS flags the defendant as a recidivism risk (R=1), over and above whether the defendant actually will or won't reoffend (Y=1).
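The tension can be reproduced with a tiny numerical sketch. The counts below are invented for illustration (they are not COMPAS data): the flag is equally well-calibrated in both groups, yet because the groups' base rates of reoffending differ, the false positive rates come apart.

```python
# Hypothetical per-group tallies, keyed by (A, R):
# value = [number with Y=1, number with Y=0].
# Group 0 has a higher base rate of reoffending than group 1.
counts = {
    (0, 1): [6, 2],  # group 0, flagged high risk
    (0, 0): [1, 3],  # group 0, not flagged
    (1, 1): [3, 1],  # group 1, flagged high risk
    (1, 0): [2, 6],  # group 1, not flagged
}

def calibration(group):
    """P(Y=1 | R=1, A=group): how often a high-risk flag is correct."""
    pos, neg = counts[(group, 1)]
    return pos / (pos + neg)

def false_positive_rate(group):
    """P(R=1 | Y=0, A=group): how often a non-reoffender is flagged."""
    flagged_neg = counts[(group, 1)][1]
    unflagged_neg = counts[(group, 0)][1]
    return flagged_neg / (flagged_neg + unflagged_neg)

# Calibration is identical across groups...
print(calibration(0), calibration(1))  # 0.75 0.75
# ...yet the false positive rates are not, so equalized odds fails.
print(false_positive_rate(0), false_positive_rate(1))  # 0.4 vs ~0.143
```

With unequal base rates and an imperfect predictor, no reshuffling of these counts can make both properties hold at once, which is exactly the bind COMPAS is in.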
Why did this happen?

When certain empirical facts hold, in particular when the base rate of the outcome differs between groups, our ability to have a system that is both well-calibrated and odds-equalizing breaks down.
It seems that what’s generating the problem is something we discussed earlier: background facts created by injustice.
For example, higher rates of being caught re-offending due to higher police scrutiny.
It’s hard to figure out when certain fairness criteria should apply.
If some criterion didn’t come at a cost to the others, then you would worry less about applying one when you’re uncertain.
But, since this isn't the case, we need to understand the impact of failing to meet each criterion.
So which of the criteria we discussed is the best to choose? All of these approaches have promising features, but each has its drawbacks.
So what now?

We cannot section off fairness in one little corner without fighting to change injustices in the world and discrimination that happens outside of machine learning systems.
This doesn't mean we can't do anything! We must set some standards for fairness in certain domains, while at the same time striving to change base rates.
Despite several controversies and its unpopularity amongst some, the COMPAS software continues to be used to this day.
No one who develops an algorithm wants to be accused of, or imprisoned for, unknowingly developing a racist algorithm, but some criteria must be selected on which to base predictions in situations like the one COMPAS tries to tackle.
It may be an algorithm, and it may not be perfect, but it is a start, and one has to start somewhere.
Can Machine Learning Help to Reduce Discrimination?

Machine learning is an extremely powerful tool.
This is increasingly clear as humanity begins to transition from humanist to dataist perspectives, where we begin to trust algorithms and data more than people or our own thoughts (some people have driven into lakes because their GPS told them to!).
This makes it extremely important that we try to make algorithms as unbiased as possible so that they do not unknowingly perpetuate social injustices that are embedded in historical data.
However, there is also a huge potential to use algorithms to make a more just and equal society.
A good example of this is in the hiring process.
Say you are applying for your dream job and are in the final stage of the interview process.
The hiring manager has the power to determine whether you are hired or not.
Would you like an unbiased algorithm to decide whether you are the best person for the job? Would you still prefer this if you knew that the hiring manager was racist? Or sexist?

Perhaps the hiring manager is a very neutral person who bases the decision purely on merit. Even so, everyone has their own proclivities and underlying cognitive biases, which may make them more likely to select the candidate they like the most, as opposed to the person best suited for the job.
If unbiased algorithms can be developed, the hiring process could become faster and less expensive, and their data could lead recruiters to more highly skilled people who are better matches for their companies.
Another potential result: a more diverse workplace.
The software relies on data to surface candidates from a wide variety of places and match their skills to the job requirements, free of human biases.
This may not be a perfect solution; in fact, there is rarely a perfect answer when it comes to justice.
However, the arc of history appears to bend towards justice, so perhaps this will move it another step forward.
Another good example of this is automatic loan underwriting.
Compared with traditional manual underwriting, automated underwriting more accurately predicts whether someone will default on a loan, and its greater accuracy results in higher borrower approval rates, especially for underserved applicants.
The upshot of this is that sometimes machine learning algorithms do a better job than we would at making the most accurate classifications, and sometimes this combats discrimination in domains like hiring and credit approval.
Food for Thought

To end such a long and serious article, I leave you with a quote from Google about discrimination in machine learning to mull over.
“Optimizing for equal opportunity is just one of many tools that can be used to improve machine learning systems — and mathematics alone is unlikely to lead to the best solutions.
Attacking discrimination in machine learning will ultimately require a careful, multidisciplinary approach.
” — Google