In our last blog post, we reviewed several do’s and don’ts for conducting internal HR analytics. To refresh your memory, HR analytics refers to the process of analyzing data related to your company and employees, and it can be a goldmine of information if it’s tapped correctly. We covered several strategies, tips, and tricks in the previous post, and we’re back with five more guidelines that will help your employee data analysis go much more smoothly.
1. Sampling Biases
Bias in statistics can result from the way the data is collected. If some individuals have a better chance than others of being selected for the sample, for one reason or another, the data is considered to have sampling bias. For example, a health survey conducted at a park would likely over-represent active people and exclude less active people who avoid parks.
Do: Make every effort to avoid all forms of sampling bias. Make sure that everyone has an equal chance to participate, or if the data is collected via some form of observation, make sure the observer is as objective and clear on the process as possible.
Don’t: Use convenience alone to determine who to include in your sample. For example, don’t just survey the first 50 people you see that day. The responses themselves may be accurate, but the way you collected them can still bias the results, because any pattern you find could be due to an outside factor. In this example, perhaps the first 50 people you see in a day are more conscientious and arrive at work earlier than others. If you only include these people in your sample, your results may be very different than if you select participants at random. Don’t insist on including such data just to avoid re-collecting it, and don’t draw any conclusions based on potentially biased data.
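One way to give everyone an equal chance of being selected is to draw a simple random sample from the full employee roster instead of surveying whoever is nearby. The sketch below is only an illustration: the roster, the ID format, and the function name are hypothetical, and a real project might need a stratified design instead.

```python
import random


def draw_random_sample(employee_ids, sample_size, seed=None):
    """Return a simple random sample in which every employee is equally likely to appear."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible and auditable
    unique_ids = sorted(set(employee_ids))  # remove duplicate IDs so nobody is listed twice
    if sample_size > len(unique_ids):
        raise ValueError("sample_size exceeds the number of employees")
    return rng.sample(unique_ids, sample_size)  # sampling without replacement


# Hypothetical roster of 500 employee IDs (assumption for illustration only)
roster = [f"E{n:04d}" for n in range(1, 501)]
survey_group = draw_random_sample(roster, 50, seed=42)
```

Recording the seed alongside the results lets someone else reproduce exactly who was invited to participate.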
2. Faulty Comparisons
The phrase ‘apples to oranges’ applies here: two things that are fundamentally dissimilar can’t be compared to each other in any statistically meaningful way.
Do: Make sure to parse out data groupings ahead of time. If there is more than one type or group of individuals being analyzed, make sure they’re equivalent where it matters before lumping them all together. Determine if it makes sense to analyze sets of data separately (e.g., by position, industry, etc.).
Don’t: Assume that the data you’re looking at is automatically similar in nature. Jobs with similar titles or job descriptions can still differ critically, even if only in one area. Don’t draw comparisons across positions or people if they don’t share the same classification with regard to HR.
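Analyzing groups separately can be as simple as splitting the records by a grouping field before summarizing them. The following is a minimal sketch with made-up records; the field names `position` and `score` are assumptions, not fields from any particular HR system.

```python
from collections import defaultdict
from statistics import mean


def mean_by_group(records, group_key, value_key):
    """Average value_key separately for each distinct value of group_key."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[group_key]].append(record[value_key])
    return {group: mean(values) for group, values in grouped.items()}


# Hypothetical assessment scores (illustrative data only)
records = [
    {"position": "Sales", "score": 70},
    {"position": "Sales", "score": 80},
    {"position": "Engineering", "score": 90},
    {"position": "Engineering", "score": 100},
]

overall_mean = mean(r["score"] for r in records)           # lumps both groups together
by_position = mean_by_group(records, "position", "score")  # one average per position
```

Comparing `overall_mean` with the per-position averages shows how a single pooled number can hide differences between groups.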
3. Data Merging
Oftentimes, HR data isn’t compiled into a workable spreadsheet automatically. Data usually needs to be merged across two or more sources, and often needs to be ‘cleaned’ when doing so.
Do: Ensure that when combining large datasets, any duplicates are removed via a standard process. Using unique identifiers (employee ID numbers are more reliable than names, which can be shared or spelled inconsistently) is a good way to merge data cleanly and ensure that only applicable and relevant cases remain. Make sure your data is cleaned thoroughly, meaning any irrelevant cases such as duplicates, test data, and invalid outliers are removed first.
Don’t: Lazily merge the data and assume the resulting dataset is sound. Don’t allow any duplicates or other irrelevant cases to remain just to avoid re-running an analysis. Don’t draw any conclusions or allow submission of any results that came from improperly merged data. Do not begin an analysis without first cleaning your data and ensuring only the relevant/appropriate cases remain.
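The merge-and-clean process described above might look roughly like the sketch below. It assumes two hypothetical sources (an HRIS export and a survey export) that share an `employee_id` field, and it assumes test accounts can be recognized by a known set of IDs; real data will need its own rules.

```python
def merge_by_id(primary, secondary, id_key="employee_id"):
    """Join two lists of dict records on id_key, keeping only IDs found in both."""

    def first_by_id(rows):
        seen = {}
        for row in rows:
            seen.setdefault(row[id_key], row)  # keep the first copy, drop later duplicates
        return seen

    left, right = first_by_id(primary), first_by_id(secondary)
    # Values from the primary source win if both sources define the same field
    return [{**right[emp_id], **row} for emp_id, row in left.items() if emp_id in right]


# Hypothetical exports (illustrative data only)
hris = [
    {"employee_id": 101, "position": "Sales"},
    {"employee_id": 102, "position": "Engineering"},
    {"employee_id": 101, "position": "Sales"},  # duplicate row
    {"employee_id": 999, "position": "TEST"},   # test account
]
survey = [
    {"employee_id": 101, "engagement": 4},
    {"employee_id": 102, "engagement": 5},
    {"employee_id": 103, "engagement": 3},      # no matching HRIS record
    {"employee_id": 999, "engagement": 1},
]

TEST_IDS = {999}  # assumption: test accounts are identified by known IDs
clean = [r for r in merge_by_id(hris, survey) if r["employee_id"] not in TEST_IDS]
```

Checking the row counts before and after each step is a quick way to confirm that the merge removed only what it was supposed to remove.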
4. Cherry Picking Data
If an HR analytics endeavor is underway with the goal of finding a specific result or trend, cherry picking refers to the act of disregarding any data that is in opposition to the desired result in hopes of confirming the trend.
Do: Include any and all data that is relevant to the study as long as it isn’t dirty (i.e., test data or duplicates). Make sure any result obtained is accurately reported even if it doesn’t match the original goals of the study. If there are legitimate reasons to exclude data, document the exclusion and the reasoning. For example: “Employees who were hired and terminated within 30 days were not included in the organizational culture survey because they did not work for the company long enough to understand the culture.” Hint: “Data excluded because it didn’t help me obtain my desired results” is NOT a legitimate reason for excluding data.
Don’t: Cherry pick data. Disregarding data that is relevant to the study means that at least some part of the sample is being ignored, which can lead to legal problems down the road if a program or change is implemented based on the results of the study.
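Documenting exclusions is easier when the rule and its reason are recorded at the moment the data is filtered. Here is a minimal sketch using the 30-day example from above; the field names `tenure_days` and `terminated` and the helper name are hypothetical.

```python
def apply_exclusion(records, should_exclude, reason, exclusion_log):
    """Drop records matching should_exclude and log the reason and the counts."""
    kept = [r for r in records if not should_exclude(r)]
    exclusion_log.append(
        {"reason": reason, "removed": len(records) - len(kept), "remaining": len(kept)}
    )
    return kept


# Hypothetical survey respondents (illustrative data only)
respondents = [
    {"employee_id": 1, "tenure_days": 400, "terminated": False},
    {"employee_id": 2, "tenure_days": 12, "terminated": True},
    {"employee_id": 3, "tenure_days": 25, "terminated": False},
    {"employee_id": 4, "tenure_days": 20, "terminated": True},
]

exclusion_log = []
analysis_set = apply_exclusion(
    respondents,
    lambda r: r["terminated"] and r["tenure_days"] <= 30,
    "Hired and terminated within 30 days; too little time to understand the culture",
    exclusion_log,
)
```

The log can then be reported alongside the results so readers can see what was left out and why.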
5. Normal Distributions
A normal distribution, also known as a bell curve, describes data that is spread symmetrically around the mean, with most cases falling near the average and progressively fewer at the extremes. In other words, there are equal numbers of scores above and below the average, which is rarely the case with real world data. Outliers may affect the normality of the distribution, and the nature of each outlier must be investigated before it is removed from the dataset. For example, if an individual recorded their height as 50’ 10’’, you know this is impossible and they probably meant 5’ 10’’ or 5’ 1’’, but since you don’t know which, it may be best to drop the value. Whatever the reason for dropping outliers, it should be documented, and results should be presented with a footnote stating whether and why outliers were removed.
Do: Understand that real world data rarely takes the form of a “perfect” normal distribution. Take this into account when choosing how to analyze the data. For example, salary data is often skewed because a few very high salaries pull the average well above what most employees earn. In this instance, the median salary gives a better picture of the typical salary than the mean.
Don’t: Approach every analysis assuming a normal distribution. This can lead to incorrect conclusions being drawn and inaccurate results being generalized to a population inappropriately.
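The salary example can be checked with a few lines of code. The figures below are invented purely to illustrate the effect of a single high salary on the mean versus the median.

```python
from statistics import mean, median

# Hypothetical annual salaries: six typical values and one very high value
salaries = [40_000, 42_000, 45_000, 47_000, 50_000, 52_000, 300_000]

mean_salary = mean(salaries)      # pulled upward by the 300,000 salary
median_salary = median(salaries)  # the middle value, unaffected by the extreme

print(f"Mean:   {mean_salary:,.0f}")
print(f"Median: {median_salary:,.0f}")
```

Here the mean lands well above what six of the seven employees earn, while the median stays at the middle salary of 47,000. A large gap between the two is a quick signal that the data is skewed.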
Remember, the most important piece of the puzzle when it comes to HR analytics is preparation. Diving headfirst into a deeper-level analysis project without adequate preparation will not only waste time and money but also likely fail to yield any truly useful results. Keeping the considerations from both this post and the last in mind will give you a solid foundation on which to start a thorough and valid HR analytics project.
Also contributing to this blog article are: Trevor McGlochlin, Research Analyst at Select International and Alli Besl, Research Consultant at Select International.