
What to Consider Before Using Artificial Intelligence for Employee Selection

August 18, 2020

I recently had a chance to speak with a pioneer in the psychological measurement and social science fields, Hannes Rosenbusch. Hannes is a PhD candidate and skilled researcher at Tilburg University in the Netherlands who uses natural language processing (NLP) and machine learning (ML) to conduct novel and innovative research.

During our call, I asked Hannes: What should organizations consider before using natural language processing and machine learning to inform personnel decisions?

I've summarized key considerations below.  

Transparency
I repurposed this example from Bostrom and Yudkowsky (2014). Picture an organization using machine learning to select job candidates. A rejected candidate files a lawsuit against the organization, claiming the model discriminates based on race. “Impossible,” says the organization. “Race isn’t even included in the algorithm.” Yet, analyses show a low hiring rate for racial minorities. How could this have happened? 

Machine learning algorithms can be extremely complex, making it nearly impossible to know how certain features contribute to the model. Hence, the “black box” analogy. If the hypothetical organization above used a neural network, for example, it’s unlikely they’d be able to completely understand how or why a candidate’s race factored into the model. They might not realize the model learned to classify candidates who attended universities with a majority-white student population as “qualified” and candidates from universities with more minority students as “not qualified.” 

I am not saying organizations should avoid using complex models. Rather, I want to stress the importance of model transparency. Data scientists, HR decision makers, and other stakeholders should know exactly what features exist in their models and how those features are used. They should be able to explain why the model came to a result and why we believe the result. “The machine gave this result, so we are going to treat everyone with this result in a particular way” is not a sufficient justification. 
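One simple way to probe a black-box model is permutation importance: shuffle a single feature’s values across candidates and measure how much the model’s accuracy drops. The sketch below is a minimal illustration in pure Python; the toy scoring rule, the feature names (including the proxy feature `university_code`), and all numbers are hypothetical, not a real model or real data.

```python
import random

# Hypothetical candidate data: (years_experience, test_score, university_code).
# university_code stands in for a proxy feature that may correlate with race.
candidates = [
    (5, 80, 1), (2, 60, 0), (7, 90, 1), (3, 55, 0),
    (6, 85, 1), (1, 50, 0), (4, 70, 1), (2, 65, 0),
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]  # 1 = "qualified" per past decisions

def model(row):
    # A toy weighted-sum classifier standing in for a trained black box.
    years, score, uni = row
    return 1 if (0.1 * years + 0.01 * score + 0.5 * uni) > 1.0 else 0

def accuracy(rows, ys):
    return sum(model(r) == y for r, y in zip(rows, ys)) / len(ys)

def permutation_importance(rows, ys, col, n_repeats=100, seed=0):
    """Average drop in accuracy when one feature is shuffled across candidates."""
    rng = random.Random(seed)
    base = accuracy(rows, ys)
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[col] for r in rows]
        rng.shuffle(shuffled)
        permuted = [tuple(s if i == col else v for i, v in enumerate(r))
                    for r, s in zip(rows, shuffled)]
        drops.append(base - accuracy(permuted, ys))
    return sum(drops) / n_repeats

for name, col in [("years_experience", 0), ("test_score", 1), ("university_code", 2)]:
    print(name, round(permutation_importance(candidates, labels, col), 3))
```

If shuffling a proxy feature like `university_code` sharply reduces accuracy, the model is leaning on it, which is exactly the kind of indirect dependence worth investigating before deployment.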

Partiality
One clear value of artificial intelligence in selection is the ability to process data while avoiding some human biases and errors. Take interviews, for example. Research shows that interview ratings are influenced by rater biases, fatigue, motivation, and a host of other issues (Hoyt, 2000). Machines don’t have these problems. They can multitask, they can run data non-stop for hours, and they don’t care whether a candidate was the first or the 100th person to interview. 

However, that doesn’t mean machine learning models are impartial. Models are created by people with biases, so they inherently share some of our partialities. Consequently, human biases show up in our results and impact our decisions. We are obligated, therefore, to investigate our models for biases, figure out how biases entered the models, and then isolate and mitigate those biases as well as possible. 
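A common first check when auditing model outcomes is the four-fifths (80%) rule from the EEOC’s Uniform Guidelines: if any group’s selection rate falls below 80% of the highest group’s rate, that is treated as evidence of adverse impact. The sketch below applies the rule to made-up selection counts; the group labels and numbers are purely illustrative.

```python
# Hypothetical selection outcomes by group: (selected, applicants).
# These counts are illustrative, not real hiring data.
outcomes = {
    "group_a": (45, 100),
    "group_b": (25, 100),
}

rates = {g: sel / n for g, (sel, n) in outcomes.items()}
top_rate = max(rates.values())

for group, rate in rates.items():
    ratio = rate / top_rate  # impact ratio vs. the highest selection rate
    flag = "possible adverse impact" if ratio < 0.8 else "ok"
    print(f"{group}: selection rate {rate:.2f}, impact ratio {ratio:.2f} -> {flag}")
```

A flagged ratio doesn’t prove discrimination on its own, but it tells you where to dig into how bias may have entered the model.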

Reliability, Validity, and Fairness  

The best selection assessments are reliable, valid for their intended purpose, and fair. For this reason, assessment designers go to great lengths to ensure their instruments provide repeatable results, measure what they say they measure, minimize discriminatory effects, and relate to the target job role. Machine learning models are no different from traditional assessments: they ought to receive the same level of scrutiny as any survey, situational judgement test (SJT), or cognitive ability test. Organizations should therefore consider how they plan to test the quality of their models and what standards the models must meet before being deployed. 

There is also the issue of instrument replacement. For example, if you use a rank-style SJT item that has strong reliability and validity evidence, does not lead to adverse impact, and is job related, should it be replaced with a write-in SJT item that requires natural language processing? That depends. Before trading a conventional but high-quality item for a fancy alternative, consider the following: Does the new item predict job performance above and beyond what is already in place? Does it meet standards of reliability, validity, and fairness? Does it improve the overall performance of the assessment? If the answer is a resounding no, then it’s probably not wise to make the change. 
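The “above and beyond” question is usually framed as incremental validity: how much additional variance in job performance the new predictor explains once the existing one is accounted for. Below is a minimal pure-Python sketch using the standard two-predictor R² formula computed from pairwise correlations; the SJT, NLP, and performance scores are entirely made-up validation data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical validation sample (illustrative numbers, not real data):
sjt  = [3.2, 4.1, 2.8, 3.9, 4.5, 3.0, 3.7, 4.2]  # existing rank-style item
nlp  = [2.9, 4.3, 3.1, 3.5, 4.4, 2.7, 3.6, 4.0]  # new NLP-scored item
perf = [3.0, 4.2, 2.9, 3.8, 4.6, 3.1, 3.5, 4.1]  # job performance ratings

r_ys, r_yn, r_sn = pearson(perf, sjt), pearson(perf, nlp), pearson(sjt, nlp)

r2_old = r_ys ** 2  # variance explained by the existing item alone
# Two-predictor OLS R^2 from pairwise correlations (standard formula):
r2_both = (r_ys**2 + r_yn**2 - 2 * r_ys * r_yn * r_sn) / (1 - r_sn**2)

print(f"R^2 (SJT only):  {r2_old:.3f}")
print(f"R^2 (SJT + NLP): {r2_both:.3f}")
print(f"Incremental validity (delta R^2): {r2_both - r2_old:.3f}")
```

If the delta R² is negligible, the fancy alternative isn’t adding predictive value, and the conventional item should probably stay.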

Privacy
This last consideration pertains to ethics and legality. The ethics and legality of artificial intelligence in employee selection deserve their own blog post, but I will touch on them here. Machine learning tools make extracting and evaluating personal information, like social media posts, extremely easy. You can pull thousands of user tweets in seconds, but should you? Is it ethical to download a candidate’s past tweets, use natural language processing to create text-based features, and then use those features to predict assessment scores? Did the candidate give permission to use this data? 

To answer these questions, let’s turn to the American Psychological Association’s Ethical Principles of Psychologists and Code of Conduct (“Code of Conduct,” 2016). The Code of Conduct states that consent is implied when a person applies for a job (Section 9.03, Informed Consent in Assessments). But does that make all information they put out into the world fair game? 

Social media users do not have complete control over who accesses their accounts. Even if a user sets strict privacy restrictions, that does not guarantee their personal information is safe from a motivated search. For users who do not set strict privacy settings, is consent implied because they did not fully conceal their information (Guilfoyle, Bergman, Hartwell, & Powers, 2016)? I suspect not. Still, the issue is debatable, and there is no legal precedent for handling such data. 

In summary, using artificial intelligence to improve selection processes is no easy task. Before going all-in on a fancy new method or model, consider the following:  

  • Transparency – Can you explain: 

    • What features are in the model 

    • Why those features belong in the model 

    • How the results were calculated 

    • Why the results are trustworthy  

  • Partiality – Can you pinpoint: 

    • Human biases in the model 

    • How biases entered into the model 

    • The effect of biases on model results 

    • How to rectify biases   

  • Reliability, Validity, and Fairness – Does the model and method: 

    • Meet standards of reliability, validity, and fairness 

    • Improve an existing system 

    • Measure a construct better than an existing instrument  

  • Privacy – Was the data collected: 

    • With the candidates’ consent 

    • Within legal standards  

Thanks for reading and, as my graduate advisor would say, onward and upward!    



Hannes Rosenbusch is a PhD candidate at Tilburg University. He works on applications of machine learning and optimization algorithms to questions in social science and research methodology.  


American Psychological Association. (2016). Revision of Ethical Standard 3.04 of the “Ethical Principles of Psychologists and Code of Conduct” (2002, as amended 2010). American Psychologist, 71(9), 900. 

Bostrom, N., & Yudkowsky, E. (2014). The ethics of artificial intelligence. In The Cambridge Handbook of Artificial Intelligence (pp. 316-334). Cambridge University Press. 

Guilfoyle, S., Bergman, S. M., Hartwell, C., & Powers, J. (2016). Social media, big data, and employment decisions: Mo’ data, mo’ problems? In Social Media in Employee Selection and Recruitment (pp. 127-155). Springer, Cham. 

Hoyt, W. T. (2000). Rater bias in psychological research: When is it a problem and what can we do about it? Psychological Methods, 5(1), 64. 

Kimberly Silva, PhD is a Research Consultant for PSI Services LLC based in Pittsburgh, PA. Her areas of expertise include machine learning, natural language processing, and employee selection.