Norm-referenced and Criterion-referenced
Interpretation of Test Results

Assessment results allow educators to make important decisions about students' knowledge, abilities and future educational potential. There are multiple ways to summarize and interpret assessment results. The conclusions we can reach based on the results from an assessment will depend on how we interpret these results.

Sometimes we want to compare students with their peers or rank-order them. When we use tests to make these types of decisions, we say we are making a norm-referenced, or normative, interpretation of test results.

Some tests are used to evaluate a person's mastery of a subject, usually by using cut scores to define proficiency or pass/fail decisions. When tests are used to decide whether a student meets a certain pre-defined standard or criterion, we say we are making a criterion-referenced interpretation of test results.

In this interactive module we will explore the concepts, appropriate uses, and limitations of both norm-referenced and criterion-referenced interpretations of test results. Throughout this lesson you will be able to change score values and see how your changes affect the interpretation of test results.


Mary Masters Math

Mary is a 9th grade student at ABC High School and she just took the final Math I exam.

The results from this exam will be used to:


How can we use Mary's test score to decide if she can move on to Math II and if she will receive a prize for being a top performer in comparison to her schoolmates? Let's find out!


Some details about our example:


Raw Score

To score the exam, Mr. Smith, the math teacher, counts the number of correct answers, with no penalty for incorrect answers, and then calculates the percent correct. For example, if Mary got XX questions right out of the 50 questions on the test, she would score YY% on the exam. We call the number-correct and percent-correct scores raw scores because they are based solely on the number of correctly answered items on the test, without any further transformation.
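The percent-correct calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name `percent_correct` and the sample value of 40 correct answers are assumptions made for the example, not values from the lesson.

```python
def percent_correct(number_correct: int, total_items: int) -> float:
    """Convert a number-correct raw score into a percent-correct raw score."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    if not 0 <= number_correct <= total_items:
        raise ValueError("number_correct must be between 0 and total_items")
    # No penalty for incorrect answers: only correct responses are counted.
    return 100.0 * number_correct / total_items


# Hypothetical value: 40 correct answers on the 50-question exam.
print(percent_correct(40, 50))  # 80.0
```

Both numbers carry the same information; the percent-correct score simply rescales the number correct by the length of the test.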

Activity 1: Let's assume Mary got XX questions correct on the test. What was her percent correct score? Use the slider to answer and hit submit.

50%


[Graphic 1: Mary's raw score in terms of Number Correct (Points) and Percent Correct (%) - will show once the correct 'percent correct' has been selected above]


Norm-referenced Scores and Interpretations

You know Mary's raw score, but does this score tell you how well she did in comparison to her classmates or to all other 9th grade students in her high school? Does it tell you if she got any of the 9th grade Math prizes?

No!

In order to rank students and know how one student's score compares to those of other students who took the same test, we must have a norm-referenced score. A norm-referenced score typically indicates the examinee's relative position in the group of test takers against which we want to compare the examinee's score. Once we have information about this group, which we call a norm group, we are able to answer the questions above.

[Graphic 2: Mary's raw score compared to her classmates]

Norm Groups

What is a norm group?

A norm group can be any group we wish to make comparisons against. In high-stakes national or state standardized tests, the norm group must be representative of the national or state population of students taking the same test. To be valid, group comparisons should be made between similar students (e.g., the percent of children learning English should be the same in the norm and comparison groups).


Quiz: We know that the results of the Math I exam will be used to give prizes to students in the 95th percentile of ninth grade Math students at ABC High. What is the most appropriate norm group for this kind of decision?

  1. Mary's 9th grade class
  2. All ABC High's 9th grade students in the current school year
  3. All ABC High's students
  4. All current and past ABC High's 9th grade students who have taken this test


Types of Norm-referenced Scores

There are different types of norm-referenced scores. The most common are:

Activity 3: Now let's define Mary's norm group. To what group do you want to compare Mary's score?

[Graphic 3: Selecting Norm Group (class or school)] Show Mary's placement in comparison to the class/school.

{Reactive text: Mary scored XX points or YY% on the exam. Based on the selected norm group [norm_group], she scored at the [ZZ] percentile. This means that, based on her raw score, Mary scored higher than [ZZ]% of the other students in her [class/school] (norm group) who took the test. In other words, [ZZ]% of the students who made up the norm group answered fewer than XX items correctly.}
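As a rough sketch, the percentile described here can be computed as the percent of the norm group that answered fewer items correctly than the examinee. The function name `percentile_rank` and every score below are hypothetical, invented for illustration. Other definitions of percentile rank exist (for example, ones that count half of the tied scores), so this is one reading of the text rather than the only one.

```python
from typing import Sequence


def percentile_rank(score: float, norm_group_scores: Sequence[float]) -> float:
    """Percent of the norm group that scored strictly below `score`."""
    if not norm_group_scores:
        raise ValueError("norm group must not be empty")
    below = sum(1 for s in norm_group_scores if s < score)
    return 100.0 * below / len(norm_group_scores)


# Hypothetical number-correct scores, invented for illustration only.
class_scores = [28, 31, 35, 36, 38, 40, 41, 43, 45, 47]
school_scores = class_scores + [42, 44, 44, 46, 46, 48, 48, 49, 49, 50]

mary = 40  # hypothetical raw score
print(percentile_rank(mary, class_scores))   # 50.0
print(percentile_rank(mary, school_scores))  # 25.0
```

The same raw score lands at a different percentile in each group, which is why the choice of norm group has to match the decision being made.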


Criterion-referenced Scores and Interpretations

You know Mary's raw score and how well she did compared to her classmates and schoolmates, but do you know if she will be able to move on to Math II?

No!

Placement decisions are usually based on a minimum score that the student must obtain to demonstrate proficiency in the subject matter. In order to make pass/fail decisions, we usually make a criterion-referenced interpretation of test scores.


Cut Scores

A cut score is a point on the test score scale used for classifying the test takers into groups on the basis of their scores. In general, criterion-referenced tests involve a cut score, where the examinee passes if their score exceeds the cut score and fails if it does not (often called a mastery test). However, the criterion is not the cut score; the criterion is the domain of subject matter that the test is designed to assess. In some cases, tests may have multiple cut scores representing tiered levels of proficiency, such as basic, proficient, or advanced. Cut scores may also be applied in certification and licensing exams that are used to determine whether examinees are professionally "qualified."

Most criterion-referenced assessments use a cut score which determines success or failure based on a pre-established percent correct. Criterion-referenced scores tell us how well the examinee performed against an objective or standard, as opposed to against other examinees. For example, a driver's test intends to determine whether a person is knowledgeable and capable enough to be allowed on the road, not whether one driver is more accomplished than another.
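A single cut score reduces to one comparison. The sketch below follows the wording above, in which the examinee passes only if the score exceeds the cut score. The function name `passes` and the cut score of 70 are assumptions for illustration; a testing program could just as well treat a score equal to the cut score as passing.

```python
def passes(score: float, cut_score: float) -> bool:
    """Return True only if the score exceeds the cut score."""
    return score > cut_score


# Hypothetical percent-correct scores against a hypothetical cut score of 70.
print(passes(80.0, 70.0))  # True
print(passes(70.0, 70.0))  # False
```

Note that no other examinee's score appears anywhere in the calculation: the decision depends only on the examinee's own score and the pre-established standard.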


Standard Setting

The process used to determine cut scores is formally known as standard setting. For classroom assessments, cut scores might be defined by the teacher; for final exams, they might be defined by the school. For commercial and high-stakes tests, the test developer will form a standard-setting panel by recruiting a group of experts, such as psychometricians (specialists in the science of educational measurement) or teachers from a relevant content area. The panel will then use one or more research-based methods, developed by psychometricians and academics, for setting testing standards and determining cut scores.

Activity 4: Defining a cut score. Choose a cut score and see if Mary passed or failed the exam. You may also go back and change Mary's raw score. Note that defining a cut score does not change Mary's norm-referenced scores.

[Graphic 3: Selecting cut score] - Add Mary's pass/fail information in addition to her placement in comparison to the norm group.

{Reactive text: Mary scored XX points or YY% on the exam. Since Mary scored [above/below] the cut score of [KK], she has demonstrated [sufficient/insufficient] knowledge of 9th grade math curriculum and [passed/failed] the exam.}

Activity 5: Defining additional cut scores. Suppose you want to further categorize students into basic, proficient and advanced based on their scores on the math final exam. Select a cut score for the advanced level and see how Mary's classification changes.

[Graphic 4: Tiered classification graph]

{Reactive text: Mary scored XX points or YY% on the exam. Since Mary scored [above/below] the cut score of [KK], she has demonstrated [basic/proficient/advanced] knowledge of 9th grade math curriculum and [passed/failed] the exam.}
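With more than one cut score, classification becomes a lookup against a sorted list of cut scores. The sketch below is one possible way to do this, assuming that a level is reached only when the score exceeds its cut score. The function name `classify` and the cut scores of 70 and 90 are hypothetical.

```python
from bisect import bisect_left

LEVELS = ["basic", "proficient", "advanced"]


def classify(score, cut_scores, levels=LEVELS):
    """Map a score to a proficiency level using ascending cut scores."""
    cut_scores = list(cut_scores)
    if len(levels) != len(cut_scores) + 1:
        raise ValueError("need exactly one more level than cut scores")
    if cut_scores != sorted(cut_scores):
        raise ValueError("cut_scores must be in ascending order")
    # bisect_left counts the cut scores strictly below the score,
    # i.e. the number of cut scores the examinee has exceeded.
    return levels[bisect_left(cut_scores, score)]


# Hypothetical percent-correct cut scores for proficient and advanced.
cuts = [70, 90]
print(classify(65, cuts))  # basic
print(classify(80, cuts))  # proficient
print(classify(95, cuts))  # advanced
```

Moving the advanced cut score up or down changes which label a given score receives, but it does not change the score itself.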


Combining Norm- and Criterion-referenced Interpretations

We can, and often do, interpret a single test score both in terms of norm- and criterion-referencing. However, these interpretations are distinct and should not be confused.

Activity 6: Now we will combine both types of interpretations of Mary's test score. Try changing Mary's raw score and see how it affects both her percentile rank in the norm group you selected and her proficiency levels based on the cut scores you defined.

[Graphic 5: Norm and Criterion-referenced interpretations]

{Reactive text: Mary scored XX points or YY% on the exam. This corresponds to a standard score of [SS] based on the distribution of scores for her [norm group], and she scored at the [ZZ] percentile. This means that, based on her raw score, Mary scored higher than [ZZ]% of the other students in her [class/school] (norm group) who took the test. In other words, [ZZ]% of the students who made up the norm group answered fewer than XX items correctly.

Mary [did/did not] score at the 95th percentile rank, therefore she [did/did not] win the Math prize.

Additionally, since Mary scored [above/below] the cut score of [KK], she has demonstrated [basic/proficient/advanced] knowledge of 9th grade math curriculum and [passed/failed] the exam. }
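The following sketch puts both interpretations side by side for one raw score. It rests on several assumptions that are not stated in the lesson: the standard score is taken to be a z-score (one common type of standard score), the percentile is the percent of the norm group scoring strictly lower, the prize requires a percentile of at least 95, and passing requires exceeding the cut score. The function name `interpret` and all data are hypothetical.

```python
from statistics import mean, pstdev


def interpret(score, norm_group_scores, cut_score):
    """Return norm- and criterion-referenced readings of one raw score."""
    mu = mean(norm_group_scores)
    sigma = pstdev(norm_group_scores)
    # Assumed standard score: a z-score relative to the norm group.
    z_score = (score - mu) / sigma if sigma > 0 else 0.0
    below = sum(1 for s in norm_group_scores if s < score)
    percentile = 100.0 * below / len(norm_group_scores)
    return {
        "z_score": z_score,               # norm-referenced
        "percentile_rank": percentile,    # norm-referenced
        "prize": percentile >= 95.0,      # norm-referenced decision (assumed rule)
        "passed": score > cut_score,      # criterion-referenced decision
    }


# Hypothetical number-correct scores for the norm group.
norm_group = [28, 31, 35, 36, 38, 40, 41, 43, 45, 47]

low_cut = interpret(40, norm_group, cut_score=35)
high_cut = interpret(40, norm_group, cut_score=45)
print(low_cut["percentile_rank"], low_cut["passed"])    # 50.0 True
print(high_cut["percentile_rank"], high_cut["passed"])  # 50.0 False
```

Changing the cut score flips the pass/fail decision while the percentile and standard score stay the same, which illustrates why the two interpretations should be kept distinct.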



Summary

The distinction between norm-referenced and criterion-referenced score interpretation can be summarized as follows.

In our example, we could say that the focus of a normative score is on how well Mary performed on the Math final exam compared to her peers and the focus of a criterion-referenced score is on what it is that Mary can do and if she met the criteria of mastery of content included in the Math exam.

Uses and limitations of norm-referenced scores:

Uses and limitations of criterion-referenced scores:

"Although loads of educators refer to "criterion-referenced tests" and "norm-referenced tests," there are, technically, no such creatures. Rather, there are criterion- and norm-referenced interpretations of students' test performances. For example, educators in a school district might have built a test to yield criterion-referenced interpretations, used the test for several years and, in the process, gathered substantial data regarding the performances of district students. As a consequence, the district's educators could build normative tables permitting norm-referenced interpretations of the test which, although born to provide criterion-referenced inferences, can still permit meaningful norm-referenced interpretations." Popham, W. James. 2011. Classroom Assessment: What Teachers Need to Know (Sixth Edition). Boston: Pearson Education, Inc. pp. 46-8.