Scaled scoring for your certification examination


Josh MacInnes, PhD

February 6, 2024

When designing a certification test, one of the most important considerations is ensuring that those who use the test scores can properly interpret them. Candidates, institutions, employers, governing bodies, and members of the public need to be aware of the score scale (the number of points possible) as well as the minimum score required to earn the credential (the cut score).

Certification testing programs have a choice between providing candidates raw scores or scaled scores. Let’s take a look at the difference between raw scores and scaled scores as well as the benefits of choosing to report scaled scores over raw scores.

What are Raw Scores?

The simplest way to provide scoring feedback is to report raw scores. For instance, if a test publisher creates a test with 100 items, then the raw score scale covers the range of possible scores between 0 and 100. If the cut score on the test is 70 items correct, then candidates who score 70 and above earn the credential while candidates who score 69 or below do not earn the credential. Reporting raw scores is a simple way of providing candidates with feedback that is easy to interpret.

Challenges with Raw Scores

A challenge with reporting raw scores arises when a test has multiple forms. Test publishers work very hard to create forms that are similar in difficulty, but no two forms will be exactly alike. Sometimes different forms of the same test will have different cut scores to account for slight differences in difficulty. For example, let’s say a test publisher creates two forms of a test. Form 1 has a cut score of 70, but Form 2 is slightly easier, so a higher score of 71 is required to pass it.

This creates a problem for the test publisher. On Form 1, a candidate who scores 70 earns the credential while a candidate who scores 70 on Form 2 does not earn the credential. How will the test publisher explain to two candidates who both scored 70 that one candidate earned the credential while the other candidate did not? This situation can lead to a lot of questions for the test publisher.
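To make the problem concrete, here is a minimal Python sketch of the two-form example. The cut scores of 70 and 71 come from the example above; the dictionary and function names are illustrative only.

```python
# Raw cut scores per form, taken from the two-form example above.
RAW_CUT_SCORES = {"Form 1": 70, "Form 2": 71}


def passes(form: str, raw_score: int) -> bool:
    """Return True if the raw score meets or exceeds the cut score of the given form."""
    return raw_score >= RAW_CUT_SCORES[form]


# Two candidates with the same raw score receive different outcomes.
for form in RAW_CUT_SCORES:
    outcome = "pass" if passes(form, 70) else "fail"
    print(f"{form}: raw score 70 -> {outcome}")
```

The same reported number leads to two different decisions, which is exactly the situation the publisher would have to explain.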

Another challenge with raw scores arises when a test publisher offers multiple credentials. For instance, the test in the above example has 100 items, but the same publisher may offer a second credential with an 80-item test and a third credential with a 150-item test. If raw scores are reported for all three credentials, then there are three different score scales with three different cut scores, and each test may also have multiple forms. Consolidating to a single score scale and cut score may be preferable for the test publisher.

These types of situations can create confusion for candidates and other stakeholders. If scaled scores were reported instead of raw scores, all three tests could be reported on the same scale with the same cut score. This eliminates a lot of confusion for candidates and likely reduces the number of questions the test publisher receives.

What are Scaled Scores?

Scaling test scores is the process of converting raw scores to a different scale. Mathematically, it’s similar to converting a temperature from the Fahrenheit scale to the Celsius scale, in that a linear equation can be created for both conversions. The blue line in Figure 1 represents the equation for converting from Fahrenheit to Celsius. Note that a horizontal line crosses the blue line at 0 degrees Celsius, which is the same as 32 degrees Fahrenheit (see the vertical dotted line). Zero degrees Celsius and 32 degrees Fahrenheit have the same meaning: both are the freezing point of water.

Figure 1: Converting Fahrenheit to Celsius
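The temperature conversion behind Figure 1 can be written as a short Python function. This is only a sketch of the standard Fahrenheit-to-Celsius formula; the function name is illustrative.

```python
def fahrenheit_to_celsius(degrees_f: float) -> float:
    """Convert Fahrenheit to Celsius with the standard linear formula."""
    return (degrees_f - 32.0) * 5.0 / 9.0


# The freezing and boiling points of water land on the familiar Celsius values.
print(fahrenheit_to_celsius(32.0))
print(fahrenheit_to_celsius(212.0))
```

The important property is that the conversion changes the numbers but not their meaning, which is what scaled scoring relies on.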

Using the 80-item, 100-item, and 150-item test examples from earlier, a test publisher may choose to convert the scores on each test to be on the same scale. In this example, a scale from 200 to 600 was created where the passing score is always 480. In Figure 2, there are three lines representing the scaled scores on each exam. Like the Fahrenheit and Celsius example, the horizontal line represents the same cut score of 480 required to pass each test, while the dotted line represents the corresponding raw cut score for each test.

Figure 2: Converting Raw to Scaled Scores
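A minimal Python sketch of this kind of conversion follows. It assumes the simplest possible line for each test, one that maps a raw score of 0 to 200 and a perfect raw score to 600. The raw cut score of 70 on the 100-item test comes from the earlier example; the raw cut scores of 56 and 105 for the other two tests are hypothetical values chosen for illustration, and the function name is not from any real scoring system.

```python
SCALE_MIN, SCALE_MAX = 200, 600


def scale_raw_score(raw: int, n_items: int) -> float:
    """Map a raw score in [0, n_items] linearly onto [SCALE_MIN, SCALE_MAX]."""
    if not 0 <= raw <= n_items:
        raise ValueError("raw score must be between 0 and the number of items")
    return SCALE_MIN + (SCALE_MAX - SCALE_MIN) * raw / n_items


# (number of items, raw cut score). Only the 100-item cut of 70 is from the
# article; the 80-item and 150-item cuts are hypothetical.
TESTS = {"100-item": (100, 70), "80-item": (80, 56), "150-item": (150, 105)}

for name, (n_items, raw_cut) in TESTS.items():
    print(f"{name}: raw cut {raw_cut} -> scaled {scale_raw_score(raw_cut, n_items)}")
```

Under these assumptions, all three raw cut scores land on the same scaled cut score of 480, so candidates on any of the three tests see the same passing standard.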

As mentioned earlier, no two test forms are the same, so slightly harder or easier forms sometimes require different raw cut scores to pass. To account for these differences, raw cut scores are converted to the same scaled cut score of 480 for each form. Minor adjustments are made to the linear conversion formula for each form so that the minimum and maximum reported values remain within the scale range of 200 to 600.
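One way such a per-form adjustment could work is sketched below in Python. This is an assumption, not a description of any particular publisher's method: each form gets its own line anchored so that the form's raw cut score maps to 480 and a perfect raw score maps to 600, and any value that would fall outside the scale is clamped to 200 or 600. The raw cut scores of 70 and 71 come from the two-form example earlier; the function name and the rounding to whole numbers are illustrative choices.

```python
SCALE_MIN, SCALE_MAX, SCALED_CUT = 200, 600, 480


def scale_for_form(raw: int, raw_cut: int, n_items: int) -> int:
    """Scale a raw score using a line anchored at the form's raw cut score.

    The line passes through (raw_cut, SCALED_CUT) and (n_items, SCALE_MAX).
    Results are clamped to the reporting range (an assumed adjustment).
    """
    slope = (SCALE_MAX - SCALED_CUT) / (n_items - raw_cut)
    scaled = SCALED_CUT + slope * (raw - raw_cut)
    return round(min(SCALE_MAX, max(SCALE_MIN, scaled)))


# Form 1 (raw cut 70) and Form 2 (raw cut 71) of the same 100-item test.
for form, raw_cut in {"Form 1": 70, "Form 2": 71}.items():
    for raw in (0, 70, 71, 100):
        print(f"{form}: raw {raw} -> scaled {scale_for_form(raw, raw_cut, 100)}")
```

With this sketch, a raw score of 70 reports as 480 on Form 1 but below 480 on Form 2, so the pass/fail decision is visible directly from the scaled score.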

Benefits of Scaled Scores

An advantage of using scaled scores over raw scores is that interpretation of scores is simplified for different forms of a test. Under scaled scoring, all candidates must earn the same minimum scaled score to earn the credential, eliminating confusion for candidates who were administered different test forms with different raw cut scores.

Scaled scoring is also better suited for test publishers when scoring tests using item response theory (IRT), which models the relationship between item performance and candidate ability. Estimates of candidate ability under the IRT model are not provided on a traditional raw score scale, but rather on a logit scale, which usually ranges from approximately -3 to +3. These ability estimates need to be converted to a scale that candidates can understand. To eliminate confusion, scaled scoring is typically preferred over approximating a raw score equivalent.
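As a rough illustration, an ability estimate on the logit scale can be moved onto a 200 to 600 reporting scale with the same kind of linear equation. In the Python sketch below, the cut score on the logit scale (0.5) and the number of scaled points per logit (40) are hypothetical values, and clamping to the scale limits is an assumed way of handling extreme estimates. Rounding rules for reporting are left out.

```python
SCALE_MIN, SCALE_MAX, SCALED_CUT = 200, 600, 480

# Hypothetical values chosen only for illustration.
THETA_CUT = 0.5          # cut score on the logit (ability) scale
POINTS_PER_LOGIT = 40.0  # scaled-score points per one logit


def scale_theta(theta: float) -> float:
    """Linearly map an IRT ability estimate onto the reporting scale."""
    scaled = SCALED_CUT + POINTS_PER_LOGIT * (theta - THETA_CUT)
    # Clamp extreme ability estimates to the limits of the reporting scale.
    return min(SCALE_MAX, max(SCALE_MIN, scaled))


for theta in (-3.0, 0.0, 0.5, 3.0):
    print(f"theta {theta:+.1f} -> scaled {scale_theta(theta):.0f}")
```

A candidate whose ability estimate sits exactly at the cut receives 480, which is easier to communicate than a value such as 0.5 logits.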

In addition to overall scores, content domain scores (also called content area subscores) can be converted to the same scale as the overall test. Domains have fewer items than the overall test, and because of this, small differences in overall form-to-form difficulty are more pronounced at the domain level. Domain-level difficulty differences are not as noticeable when domain scaled scores are reported instead of raw scores.

When to Implement Scaled Scores

A great time for test publishers to make the switch from reporting raw scores to reporting scaled scores is after developing new test specifications. When new test specifications are announced, candidates and other stakeholders often have many questions, especially if there are major changes to the content. Since the test publisher must educate stakeholders about the new test specifications, it tends to be a logical time to also inform stakeholders about changes to the score scale. The scoring switch also helps to distinguish between the current and updated versions of the certification test.
