For more than half a century, one of the most troubling issues facing professionals engaged in making employment decisions (hiring, promoting, downsizing) has been the differential selection of individuals from various groups. As members of the I/O Psychology community, we seek to create and implement assessment programs that are fair and that give people with an equal ability to do the job an equal opportunity to obtain the job. The determination of adverse impact, an approach to understanding whether selection programs are fair, has been part of our landscape for decades.
While the concept is simple, the assessment of adverse impact is a topic of great debate. Methods for resolving adverse impact are numerous and controversial. We are still seeking to better understand the underlying causes of assessment differences across groups — the factor that precedes any finding of adverse impact.
Adverse Impact: Worse Being Big than Being Bad?
The way we measure adverse impact has changed over time. Early on, we looked at what was known as the 80% or 4/5ths rule: a simple calculation comparing the selection rate of one group with that of another, and then determining whether the ratio of the two rates is within range or out of range. For example, if white candidates are selected at a rate of 20%, the rate of selection for black candidates needs to be at least 16% (80% of 20%) to avoid a determination of adverse impact.
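The 4/5ths calculation above can be sketched in a few lines of Python. This is a minimal illustration, not an official procedure: the function names and the applicant counts are hypothetical, and exact fractions are used only so that a ratio of exactly 80% is not lost to floating-point rounding.

```python
from fractions import Fraction

def selection_rate(selected, applicants):
    """Selection rate as an exact fraction (hypothetical helper)."""
    return Fraction(selected, applicants)

def impact_ratio(focal_selected, focal_applicants,
                 reference_selected, reference_applicants):
    """Ratio of the focal group's selection rate to the reference group's."""
    focal = selection_rate(focal_selected, focal_applicants)
    reference = selection_rate(reference_selected, reference_applicants)
    return focal / reference

def passes_four_fifths(focal_selected, focal_applicants,
                       reference_selected, reference_applicants):
    """True when the impact ratio is at least 4/5 (80%)."""
    ratio = impact_ratio(focal_selected, focal_applicants,
                         reference_selected, reference_applicants)
    return ratio >= Fraction(4, 5)

# Illustrative counts matching the example in the text:
# 20 of 100 reference-group candidates selected (20%),
# 16 of 100 focal-group candidates selected (16%).
print(impact_ratio(16, 100, 20, 100))        # 4/5
print(passes_four_fifths(16, 100, 20, 100))  # True
print(passes_four_fifths(15, 100, 20, 100))  # False
```

At exactly 16% the ratio equals 4/5 and the rule is met; at 15% it falls to 3/4 and the rule flags adverse impact.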
As cases made their way through the courts and data were analyzed using more sophisticated analytics, we began to see statistical tests of the difference between selection rates. Statistical significance has become the standard for evaluation, which brings greater scientific rigor along with a host of analytic complications.
One of the big problems facing employers stems from their ability to attract large numbers of applicants via internet-based testing and other technological advancements. Statistical significance testing is strongly influenced by sample size: with enough applicants, even a small difference in selection rates can reach statistical significance. So, in terms of adverse impact, it may be “worse being big than being bad” (Jacobs, Murphy, and Silva, 2013).
So, if your organization has a large number of candidates in its analysis of adverse impact, you must be aware of the effect of sample size. Additionally, statistical significance testing starts with a “null hypothesis,” which states the expectation of “no difference.”
But that expectation is not always realistic. As decades of research document, known group differences exist for certain types of tests. If those types of tests are indicated for use in a selection program, the "no difference" hypothesis will bias the results toward a finding of adverse impact.
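To make the sample-size point concrete, here is a rough sketch using a two-proportion z-test, one common way to test a difference between selection rates (the article does not specify which test is used, so this choice is an assumption). The applicant counts are invented for illustration. Both scenarios have the same 20% and 18% selection rates, an impact ratio of 90% that satisfies the 4/5ths rule; only the number of applicants changes.

```python
import math

def two_proportion_z_test(selected_a, n_a, selected_b, n_b):
    """Two-sided z-test for a difference between two selection rates.

    Returns (z, p_value). Uses the pooled rate under the null
    hypothesis of "no difference" between the groups.
    """
    rate_a = selected_a / n_a
    rate_b = selected_b / n_b
    pooled = (selected_a + selected_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical small employer: 200 applicants per group.
z_small, p_small = two_proportion_z_test(40, 200, 36, 200)

# Hypothetical large employer: 20,000 applicants per group, same rates.
z_large, p_large = two_proportion_z_test(4000, 20000, 3600, 20000)

print(f"small sample: z = {z_small:.2f}, p = {p_small:.4f}")
print(f"large sample: z = {z_large:.2f}, p = {p_large:.2e}")
```

With 200 applicants per group the difference is nowhere near the conventional 0.05 threshold; with 20,000 per group the identical rates produce a highly significant result.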
So, How Do We Reduce Adverse Impact?
Aside from the measurement issues described above, I/O Psychologists have searched for ways to reduce adverse impact, with some degree of success.
Banding
This approach, known as banding, looks at differences in test scores and reduces them by grouping together scores that are highly similar. Perhaps the easiest way to think about banding in a selection context is to look to something we are all familiar with: school grades. In any class, the instructor can array students from low to high, showing scores ranging from below 50% to nearly 100%. In some settings, those scores are actually recorded, but more often we see people receiving grades of A, B, C, etc. Someone who receives a grade of B scored somewhere between 80 and 89%. In this way, bands are created.
The same process can be used for selecting employees: candidates receive a category score rather than a seemingly more precise score that still includes some degree of error. The width of the band is often linked to the amount of error that is believed to be in a score.
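One way to link band width to score error is sketched below. The article does not give a formula, so this example assumes a commonly described approach based on the standard error of the difference between two scores, computed from the test's standard deviation and reliability. The function names, the 1.96 multiplier, and all scores are illustrative assumptions.

```python
import math

def band_width(score_sd, reliability, z=1.96):
    """Band width from the standard error of the difference (assumed method).

    SEM = SD * sqrt(1 - reliability)
    SED = SEM * sqrt(2)
    width = z * SED
    """
    sem = score_sd * math.sqrt(1 - reliability)
    sed = sem * math.sqrt(2)
    return z * sed

def assign_fixed_bands(scores, width):
    """Map each candidate to a band number (1 = top band).

    Bands are fixed intervals of the given width, measured down
    from the highest observed score.
    """
    top = max(scores.values())
    return {name: int((top - score) // width) + 1
            for name, score in scores.items()}

# Hypothetical test with SD = 10 and reliability = 0.90.
width = band_width(score_sd=10, reliability=0.90)

# Hypothetical candidate scores.
scores = {"A": 95, "B": 92, "C": 88, "D": 85, "E": 78}
bands = assign_fixed_bands(scores, width)

print(f"band width: {width:.2f}")
for name, band in bands.items():
    print(name, scores[name], "band", band)
```

Under these assumptions, candidates A, B, and C fall in the same band and are treated as equivalent on the test, much like students who all receive the same letter grade.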
Multiple Selection Tools
Another method for reducing adverse impact is to include multiple tests or tools in the selection process. Here we look to the job analysis to help uncover the key knowledge, skills, abilities, and personal characteristics that lead to success. As an example, for decades police officers were selected based on tests of cognitive abilities, often referred to as Civil Service tests. Many of these were broad-based knowledge assessments along with a measure of thinking. When these were the sole basis for selecting police officers, we saw predominantly white police forces. In the early 1980s, programs in this arena expanded based on the idea that while you have to be smart to be a police officer, you also need to have good communication skills, possess a level of empathy, and be comfortable working within rules. When assessments expanded to include these concepts, police departments began to diversify.
The message here is that one way to reduce adverse impact is to more completely define the job and include more and different types of assessments.
Adverse impact is complicated, and its reduction takes effort on multiple fronts. It is important that employers keep track of how their selection and promotion systems are operating and how they can be improved, both to enhance validity and to increase diversity.