Exploring Recruitment Bias using Machine Learning and R

Sambit Das
Apr 10

Hello there,

Thanks to my background in HR consulting, I have had the opportunity to work with numerous HR professionals across the world.
Through my work and studies, I soon realised that the recruitment process is the starting point for any study related to diversity.
The potential sources of human bias in the recruitment process are numerous.
I hope my work helps to establish that it is not a complicated study to undertake after all.

Objective:

Using an experimental dataset for this case study, the key objectives of my work are to investigate only the Shortlisting stage of the recruitment process and:

- Conduct an exploratory data analysis of the recruitment data to determine patterns of gender and ethnicity through the recruitment stages
- Investigate whether gender and ethnicity influence the applicant shortlisting process
- Apply machine learning to predict who will be shortlisted and determine the key drivers
- Recommend updates to the hiring strategy based on the findings

Overview of Dataset:

The experimental dataset contains the following fields:

- Applicant Code: Unique identifier for each application in the system
- Gender: 1 for Male, 2 for Female
- ATSIyn: 1 = Yes if the candidate is an Aboriginal or Torres Strait Islander, 2 = No if the candidate is a general applicant
- Shortlistedyn: 0 if rejected, 1 if shortlisted
- Interviewed: 0 if not interviewed, 1 if interviewed
- FemaleONpanel: 1 for a male-only panel, 2 if a female member was present on the panel
- OfferNY: 1 if an offer was made to the candidate, 0 if not offered
- AcceptNY: 1 if accepted, 0 if declined
- JoinYN: 1 if joined, 0 if not joined

Exploratory Data Analysis to study Gender and Ethnicity patterns:

Female applicants dominate the applicant pool at about 72%.
Applicants who are Aboriginal or Torres Strait Islander make up just under half (43.2%) of the applicant pool.
88 out of a total of 280 applicants were shortlisted for the next stage after resume screening.
55 of the shortlisted applicants were interviewed.
22 of the 55 interviews conducted had a female member on the panel.
28 offers were made from the total 55 interviews conducted.
Offer acceptance rate was at 64%.
18 applicants who accepted the offer joined the company.
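
The stage counts above can be turned into stage-to-stage conversion rates with a few lines of code. This is a minimal Python sketch rather than the author's R code. The `funnel` list and the `stage_conversion` helper are names introduced here for illustration, and the counts are the ones quoted in the text.

```python
# Stage counts as reported in the text (applicants remaining at each stage).
funnel = [
    ("applied", 280),
    ("shortlisted", 88),
    ("interviewed", 55),
    ("offered", 28),
    ("joined", 18),
]


def stage_conversion(stages):
    """Return (from_stage, to_stage, rate) for each consecutive pair of stages."""
    rates = []
    for (prev_name, prev_n), (name, n) in zip(stages, stages[1:]):
        rates.append((prev_name, name, n / prev_n))
    return rates


conversions = stage_conversion(funnel)
for prev_name, name, rate in conversions:
    print(f"{prev_name} -> {name}: {rate:.1%}")
```

The last step (18 of 28, roughly 64%) lines up with the reported offer acceptance rate if every applicant who accepted also joined, which is an assumption here and not something the text states outright.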

Investigate if Gender and Ethnicity influence applicant shortlisting process:

From the exploratory study above, female applicants and Aboriginal or Torres Strait Islander applicants make up 72% and 43% of the applicant pool respectively.
All external conditions kept aside, an unbiased recruitment process should produce a fairly similar representation among shortlisted applicants.
Testing whether it does is the objective of this section.

Analysis of Ethnicity Bias in Shortlisting of applicants:

The representation of Aboriginal or Torres Strait Islander applicants in the applicant pool, as discussed above, was 43.2%.
However, based on the findings below, their representation among shortlisted applicants falls to about 22% (19/88).
This seems to show a leaning towards general applicants in the shortlisting process.
The balloon plot suggests that general applicants are more likely to be shortlisted.
Using a chi-square test, it can be established that there is a substantial gap between the expected and observed numbers of Aboriginal or Torres Strait Islander applicants being shortlisted.
In this case only 19 were shortlisted (observed), while the expected figure was 38.
The possibility that the preference for general applicants in the shortlisting process arose by luck or chance is ruled out by a high chi-squared statistic of 23.184 , p < 0.
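
The expected-versus-observed comparison above can be reproduced by hand. The following is a minimal Python sketch, not the author's R code, of a chi-square test with Yates' continuity correction on a 2x2 table. The function name `chi_square_2x2_yates` is introduced here for illustration. The full table is an assumption reconstructed from the reported figures (43.2% of 280 applicants taken as 121); only the 19, 88 and 280 are stated directly in the text.

```python
import math


def chi_square_2x2_yates(table):
    """Chi-square test with Yates' continuity correction for a 2x2 table.

    table: [[a, b], [c, d]] of observed counts.
    Returns (statistic, p_value, expected_counts).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count for each cell if rows and columns were independent.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = 0.0
    for obs_row, exp_row in zip(table, expected):
        for obs, exp in zip(obs_row, exp_row):
            diff = abs(obs - exp)
            # Yates: shrink each |O - E| by 0.5 (never past zero).
            stat += (diff - min(0.5, diff)) ** 2 / exp
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value, expected


# Rows: ATSI applicants, general applicants. Columns: shortlisted, rejected.
# ASSUMPTION: 121 ATSI applicants in total, reconstructed from 43.2% of 280.
observed = [[19, 102], [69, 90]]
stat, p_value, expected = chi_square_2x2_yates(observed)
print(f"chi-square = {stat:.3f}, expected ATSI shortlisted = {expected[0][0]:.1f}")
```

Under that reconstructed table, the statistic agrees with the 23.184 reported above and the expected count for shortlisted ATSI applicants rounds to 38.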

Analysis of Gender Bias in Shortlisting of applicants:

Female representation within the applicant pool, as discussed above, was 72%.
However, based on the findings below, female representation among shortlisted applicants falls to 56.8% (50/88), with 152 female applicants rejected.
This seems to show a bias towards male applicants in the shortlisting process.
The balloon plot confirms a high rejection rate for female applicants at the interview shortlisting stage.
Using a chi-square test, it can be established that there is a gap between the expected and observed numbers of female applicants being shortlisted.
In this case only 50 were shortlisted (observed), while the expected figure was 63.
The possibility that the preference for male applicants in the shortlisting process arose by luck or chance is ruled out by a high chi-square statistic of 13.905 , p < 0.
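
For a 2x2 table, the same Yates-corrected statistic has a closed form, which makes the gender figures easy to check. This is a minimal Python sketch, not the author's R code. The function name `yates_chi_square_shortcut` is introduced here for illustration, and the male counts are derived by subtraction from the totals quoted in the text (280 applicants, 88 shortlisted, 50 female shortlisted, 152 female rejected).

```python
import math


def yates_chi_square_shortcut(a, b, c, d):
    """Yates-corrected chi-square for the 2x2 table [[a, b], [c, d]].

    Closed form: N * (|ad - bc| - N/2)^2 / (product of row and column totals).
    """
    n = a + b + c + d
    numerator = n * max(0.0, abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


total_applicants, total_shortlisted = 280, 88
female_shortlisted, female_rejected = 50, 152
# Male counts are derived from the totals, not quoted in the text.
male_shortlisted = total_shortlisted - female_shortlisted
male_rejected = (total_applicants - total_shortlisted) - female_rejected

gender_stat = yates_chi_square_shortcut(
    female_shortlisted, female_rejected, male_shortlisted, male_rejected
)
# With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
gender_p = math.erfc(math.sqrt(gender_stat / 2))

# Expected female shortlisted if shortlisting were proportional to pool share.
female_total = female_shortlisted + female_rejected
expected_female = female_total * total_shortlisted / total_applicants
print(f"chi-square = {gender_stat:.3f}, expected female shortlisted = {expected_female:.1f}")
```

With these counts the statistic agrees with the 13.905 reported above, and the expected number of shortlisted female applicants rounds to 63.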

Apply Machine learning to Predict shortlisting of applicants and determine the key drivers:

In the above analysis, gender and ethnicity were evaluated separately.
However, if the majority of Aboriginal or Torres Strait Islander applicants happen to be female, the two effects overlap, and it would not be correct to conclude that gender and ethnicity each attract bias on their own.
For this reason, the analysis in this section takes both variables into account when predicting applicant shortlisting.
Logistic regression is a suitable method for this.
It is a machine learning algorithm that can be used to predict the likelihood of an applicant being shortlisted in the application process.
Here it is applied to the dataset to determine the top drivers of applicant shortlisting.
The results of the machine learning analysis suggest that both Gender and ATSIyn (Aboriginal or Torres Strait Islander) are significant predictors of applicant shortlisting.
In summary, based on the findings, male applicants are 3.3 times more likely to be shortlisted than female applicants.
General category applicants are 4.5 times more likely to be shortlisted than applicants who are Aboriginal or Torres Strait Islander.
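
To show how a logistic regression coefficient becomes a "times more likely" figure, here is a minimal Python sketch, not the author's R code, that fits a one-predictor logistic regression by Newton-Raphson and exponentiates the coefficient into an odds ratio. The function name `fit_logistic_1d` is introduced here for illustration, and the gender counts are derived from the totals quoted earlier. Because this toy model uses gender alone, its odds ratio (about 2.9) is unadjusted and is expected to differ from the 3.3 quoted above, which presumably comes from a model that also includes ATSIyn.

```python
import math


def fit_logistic_1d(groups, iterations=25):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson on grouped binary data.

    groups: list of (x, successes, failures) tuples.
    Returns the fitted (b0, b1).
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # information matrix (negative Hessian)
        for x, successes, failures in groups:
            n = successes + failures
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = successes - n * p
            weight = n * p * (1.0 - p)
            g0 += resid
            g1 += resid * x
            h00 += weight
            h01 += weight * x
            h11 += weight * x * x
        # Solve the 2x2 system (information matrix) * step = gradient.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Recoded for this sketch: x = 1 for male, x = 0 for female
# (the dataset itself codes Gender as 1 = Male, 2 = Female).
# Counts are (shortlisted, rejected), derived from the totals in the text.
gender_groups = [(1, 38, 40), (0, 50, 152)]
b0, b1 = fit_logistic_1d(gender_groups)
odds_ratio = math.exp(b1)
print(f"odds ratio, male vs female (gender only) = {odds_ratio:.2f}")
```

Strictly, the exponentiated coefficient is a ratio of odds rather than of probabilities, which is worth keeping in mind when reading "times more likely".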

Prepare a Hiring strategy to offset the bias:

The data analysis has shown that, based on the synthetic dataset, gender and ethnicity bias is present in the Shortlisting stage of the recruitment process within the company.
In order to offset the bias, the recommended strategy is to introduce a blind review process in which the applicant's name, gender and ethnic background are hidden from the internal or external recruiter who reviews the job application during shortlisting.

References and Links to Code and Dataset:

I was deeply inspired by the work of these 2 students in this field and it was the starting point for my interest to explore further in this area.
The video link is here for you to have a look at.
The dataset and case study used are as illustrated in the book Predictive HR Analytics: Mastering the HR Metric by Martin R Edwards and Kirsten Edwards.
The R code has been developed by me.
The dataset and R code are available on my GitHub account.
Link is provided below: Sambit78/People-Analytics-Project (github.com)

Thank You :)