Any company that decides to start using a pre-hire assessment tool to screen external job candidates should do its due diligence when choosing a new test. The ultimate goal should be finding an assessment that will help the company raise the bar on the talent selected into the organization for the target position. This process includes reviewing any available validation evidence for the test and ensuring a thorough job analysis is completed before implementing it.
However, sometimes companies devise alternative means for vetting a new test that can lead to inaccurate or uninformed interpretations of its effectiveness. One such example is when stakeholders want to see how well the test works at identifying the top current employees. In other words, they want to have some employees take the test to find out if the best ones score the highest, and the worst ones score the lowest. If test results indeed show these patterns, then the test is deemed to be effective. However, if top performers do poorly and/or poor performers do well, then they believe the test will not work effectively at their company, no matter how good the validation evidence may be.
While leaders like to view this as a “litmus test”, there are some very significant risks to making judgments about the test based on this type of data. The risks have to do with:
The size of the sample of employees tested
The criteria used to define the “best” versus “worst” employees
Differences in the quality of data provided by external job candidates versus current employees
The fact that it is unrealistic to expect all “top” employees to get high scores.
Breaking Down Each Risk
Regarding sample size, it can be very tempting to start making inferences about how well a test is working after just a handful of people – say 5 or 10 – are assessed. But the fact is these small numbers simply do not provide a reliable sample that represents the target population of interest, which makes it very difficult to determine how well the test measures the target skills. This is even more of a problem if the few who are tested were “hand-picked” by stakeholders in the company. When this occurs, the sample will likely be composed of employees who are similar on certain characteristics, making it even less representative of the external candidate population.
Regarding how the “best” versus “worst” employees are defined, often these labels are applied without any quantitative data and are based simply on anecdotal comments or personal perceptions. This introduces a lot of subjectivity and bias into how the labels are applied and does not provide a good outcome measure to associate with the test results. Instead, performance ratings should be obtained using rating scales and behavioral examples to minimize subjectivity and other errors. Also, the ratings should be provided by individuals who are familiar with the target role – such as supervisors – and have observed the employees perform the job over a period of time.
The third major risk outlined above is that external job candidates and current employees have very different motivations when taking the assessment. For external candidates, the stakes are much higher because a job is riding on how well they perform on the test, so they will put forth their best effort. Current employees, in contrast, do not have a job at stake in this example, and they often have to make time to take the test around very busy work schedules. In addition, if they are told to take the test as a pilot, with no clear benefit to them other than perhaps receiving a feedback report, they may not feel it is worth the effort to try their best. As a result, the quality of test data from external candidates is typically higher than that from current employees.
The last risk – not recognizing that no test is 100 percent accurate – is very important to highlight, particularly when the unexpectedly poor test results of a single “top” employee can derail the implementation of the assessment. When a highly valued employee does not do well on a test, it can create very visceral, negative responses. No matter how strong the validation evidence for the test may be, the fact that it was not accurate for this particular employee often seems to hold more weight among stakeholders.
In addition to the problem with a small sample size already discussed above, it is important to note that all tests have false positives (high scores for people who are ultimately not successful on the job) and false negatives (low scores for people who are successful on the job). In determining whether a test is really working, the question to ask is: how strong is the relationship between scores on the test and job performance?
Another way to frame this is whether the company has significantly increased the number of successful hires following implementation of the test. So if, for every 50 hires, the number of successful hires jumps from 20 to 30 – a success rate rising from 40 percent to 60 percent – the test has been successful. But that still means the result for any single individual may not be accurate, and the ratio of successful hires will never reach 50 out of 50. In short, there is no such thing as a perfect test.
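The arithmetic above can be sketched in a few lines of Python. The numbers are the illustrative ones from this example (20 versus 30 successful hires out of 50), not real hiring data, and the function names are hypothetical:

```python
def success_rate(successful_hires, total_hires):
    """Fraction of hires who turn out to be successful on the job."""
    return successful_hires / total_hires

# Illustrative numbers from the example: 20 of 50 hires succeed before
# the test is introduced, 30 of 50 succeed after.
before = success_rate(20, 50)
after = success_rate(30, 50)

print(f"Success rate before the test: {before:.0%}")  # 40%
print(f"Success rate after the test:  {after:.0%}")   # 60%
print(f"Improvement: {after - before:.0%}")           # 20%

# Even with this improvement, 20 of the 50 post-test hires scored well
# enough to be hired but did not succeed (false positives). No test
# drives that number to zero -- the ratio never reaches 50 out of 50.
false_positives = 50 - 30
print(f"Unsuccessful hires remaining after the test: {false_positives}")
```

The point of the sketch is simply that a test should be judged on this aggregate shift in the success rate, not on whether any one individual's score turns out to be accurate.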
All of the risks highlighted above make the case for why validation studies should be used to determine how well a test is working. The results of a well-conducted validation study will tell you with much more accuracy how well a test is working than whether a handful of top employees get high scores on the test. Nevertheless, it remains a challenge to convince stakeholders of this, and it is highly recommended that the risks highlighted here be discussed with key stakeholders before any internal employees are tested. The earlier they understand these issues, the less likely that employee test results will erode sponsorship of the new test.