Interpretation of Test Results

Assessment results allow educators to make important decisions about students' knowledge, abilities and future educational potential. There are multiple ways to summarize and interpret assessment results. The conclusions we can reach based on the results from an assessment will depend on how we interpret these results.

Sometimes we want to compare students with their peers or rank-order them. When we use tests to make these types of decisions, we say we are making a **norm-referenced** or **normative** interpretation of test results.

Some tests are used to evaluate a person's mastery of a subject, usually by using cut scores to define proficiency or pass/fail decisions. When tests are used to decide if a student meets a certain pre-defined standard or criterion, we say we are making a **criterion-referenced** interpretation of test results.

In this interactive module we will explore the __concepts__, __appropriate uses__, and __limitations__ of both norm-referenced and criterion-referenced interpretations of test results. Throughout this lesson you will be able to change score values and see how the changes you make affect the interpretation of test results.

Mary is a 9th grade student at ABC High School and she just took the final Math I exam.

The results from this exam will be used to:

- Determine if 9th grade students mastered Math I concepts and can move on to Math II.
- Give prizes to 9th grade students who scored __at the 95th percentile__ in Math I at ABC High.

*How can we use Mary's test score to decide if she can move on to Math II and if she will receive a prize for being a top performer in comparison to her schoolmates? Let's find out!*

Some details about our example:

- The exam contains 50 questions and each question is worth 1 point.
- Each 9th grade class has a total of 20 students.
- The high school, ABC High, has 5 ninth grade classes for a total of __100__ ninth grade students.
- ABC High has records from students who took this exam in the past 10 years.

To score the exam, Mr. Smith, the math teacher, counts the **number of correct** answers, with no penalty for incorrect answers, and then calculates the **percent correct**. For example, if Mary got XX questions right out of the total 50 questions on the test, she would score **YY%** on the exam. We call the __number correct__ and __percent correct__ scores **raw scores** because they are based solely on the number of correctly answered items on the test, without any further manipulation.
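The scoring rule above can be sketched in a few lines of Python. This is only an illustration: the function name `percent_correct` and the example value of 40 correct answers are assumptions made for this sketch and are not Mary's actual results.

```python
def percent_correct(number_correct: int, total_items: int = 50) -> float:
    """Convert a number-correct raw score into a percent-correct raw score.

    Each item is worth 1 point and wrong answers carry no penalty,
    so the score depends only on how many items were answered correctly.
    """
    if not 0 <= number_correct <= total_items:
        raise ValueError("number_correct must be between 0 and total_items")
    return 100 * number_correct / total_items


# Hypothetical example value, chosen only for illustration.
example_number_correct = 40
example_percent = percent_correct(example_number_correct)
print(f"{example_number_correct} of 50 correct is {example_percent:.0f}% correct")
```

Both numbers carry the same information: the percent correct is just the number correct rescaled to a 0 to 100 range.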

Activity 1: Let's assume Mary got XX questions correct on the test. What was her percent correct score? Use the slider to answer and hit submit.

[Graphic 1: Mary's raw score in terms of Number Correct (Points) and Percent Correct (%) - will show once the correct 'percent correct' has been selected above]

You know Mary's raw score, but does this score tell you how well she did in comparison to her classmates or to all other 9th grade students in her high school? Does it tell you if she got any of the 9th grade Math prizes?

**No!**

In order to rank students and know how one student's score compares to other students who took the same test we must have a norm-referenced score. A norm-referenced score typically indicates the examinee's relative position in a group of test takers that we want to compare the examinee's score with. Once we have information about this group, which we call a __norm group__, we are able to answer the questions above.

[Graphic 2: Mary's raw score compared to her classmates]

__What is a norm group?__

A **norm group** can be any group we wish to make comparisons against. In high-stakes national or state standardized tests, the norm group must be __representative__ of the national or state population of students taking the same test. To be valid, group comparisons should be made between similar students (e.g., the percent of children learning English should be the same in the norm and comparison groups).

**Quiz:** We know that the results of the Math I exam will be used to give prizes to students in the 95th percentile of ninth grade Math students at ABC High. What is the most appropriate norm group for this kind of decision?

There are different types of norm-referenced scores. The most common are:

**Percentile ranks:** a simple way of relating a raw score to the performance of a particular norm group. Percentile ranks always run from 1 to 99 and give the percent of the norm group that achieved lower scores. Each percentile table refers to only one specific norm group at the specific time it was tested. If a raw score of 15 corresponds to a percentile rank of 30, it means that 30% of the norm group had raw scores lower than 15.
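As a rough sketch of this definition, the percentile rank can be computed by counting how many scores in the norm group are strictly lower than the score of interest. The function name, the rounding, the clamping to the 1 to 99 range, and the list of 20 scores are all assumptions made for illustration; published percentile tables may handle ties and rounding differently.

```python
def percentile_rank(score, norm_group_scores):
    """Percent of the norm group with scores strictly lower than `score`.

    The result is rounded and clamped to 1..99, following the convention
    that percentile ranks run from 1 to 99.
    """
    lower = sum(1 for s in norm_group_scores if s < score)
    percent_lower = round(100 * lower / len(norm_group_scores))
    return min(99, max(1, percent_lower))


# Hypothetical norm group of 20 raw scores (not real data).
norm_group = [10, 12, 15, 18, 20, 22, 25, 27, 28, 30,
              31, 33, 35, 36, 38, 40, 41, 43, 45, 48]

# 15 of the 20 scores are lower than 40, so the percentile rank is 75.
print(percentile_rank(40, norm_group))
```

The same raw score would get a different percentile rank against a different norm group, which is why a percentile table always has to be read together with the group it describes.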

~~Since percentile ranks indicate various percentages of individuals above and below scores that are normally distributed, they do not represent equal units. Percentile ranks are much more compactly arranged in the middle of the normal distribution since that is where the majority of individuals fall.~~

**Standard score:** a transformation of the raw score, whose score distribution in a specified population has convenient, known values for the mean and standard deviation. Often this term is used to specifically denote scores transformed to have a mean of 0 and a standard deviation of 1 (also known as z-scores). Standard scores permit the direct comparison of examinees by placing their scores on a common scale and, for this reason, are useful for comparisons over time.
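A z-score can be sketched as the distance of a raw score from the norm group mean, measured in standard deviations. The example below uses Python's `statistics` module and treats the norm group as the whole population of interest (hence `pstdev`); that choice, the function name, and the four example scores are assumptions for illustration only.

```python
from statistics import mean, pstdev


def z_score(score, norm_group_scores):
    """Standard score with mean 0 and standard deviation 1 in the norm group."""
    group_mean = mean(norm_group_scores)
    group_sd = pstdev(norm_group_scores)  # population standard deviation
    if group_sd == 0:
        raise ValueError("all norm group scores are identical; z-score is undefined")
    return (score - group_mean) / group_sd


# Hypothetical norm group: mean 35, standard deviation 5.
norm_group = [30, 30, 40, 40]

# A raw score of 45 is two standard deviations above the mean.
print(z_score(45, norm_group))
```

A z-score of 0 means the score equals the norm group mean; negative values are below the mean and positive values are above it.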

Activity 3: Now let's define Mary's norm group. To what group do you want to compare Mary's score?

[Graphic 3: Selecting Norm Group (class or school)] Show Mary's placement in comparison to the class/school.

{__Reactive text:__ Mary scored XX points or YY% on the exam. Based on the selected norm group [norm_group], she scored at the [ZZ] percentile. This means that based on her raw score Mary scored higher than [ZZ]% of the other students in her [class/school] (norm group) who took the test. In other words, [ZZ]% of the students who made up the norm group answered fewer than XX items correctly.}
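To see why the choice of norm group matters, here is a small sketch that ranks the same raw score against a class of 20 students and against a school of 100 students. All scores are made up for illustration, and the `percentile_rank` helper (percent of the group scoring strictly lower, clamped to 1 to 99) is an assumed implementation, not the one used by the interactive graphic.

```python
def percentile_rank(score, norm_group_scores):
    """Percent of the norm group scoring strictly lower, clamped to 1..99."""
    lower = sum(1 for s in norm_group_scores if s < score)
    percent_lower = round(100 * lower / len(norm_group_scores))
    return min(99, max(1, percent_lower))


# Hypothetical norm groups (not real data).
class_scores = list(range(21, 41))                            # 20 students, scores 21..40
school_scores = [s for s in range(1, 51) for _ in range(2)]   # 100 students, each score 1..50 twice

raw_score = 38  # hypothetical raw score
rank_in_class = percentile_rank(raw_score, class_scores)
rank_in_school = percentile_rank(raw_score, school_scores)

print(f"Percentile rank in class:  {rank_in_class}")
print(f"Percentile rank in school: {rank_in_school}")
```

With these made-up numbers the same raw score of 38 ranks at the 85th percentile in the class but at the 74th percentile in the school, so a percentile rank is only meaningful when the norm group is stated.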

You know Mary's raw score and how well she did compared to her classmates and schoolmates, but do you know if she will be able to move on to Math II?

**No!**

Placement decisions are usually based on a minimum score that the student must obtain to demonstrate proficiency in the subject matter. In order to make pass/fail decisions we usually make a **criterion-referenced interpretation of test scores**.

A **cut score** is a point on the test score scale used for classifying the test takers into groups on the basis of their scores. In general, criterion-referenced tests (often called mastery tests) involve a cut score, where the examinee passes if their score exceeds the cut score and fails if it does not. However, __the criterion is not the cut score__; the criterion is the domain of subject matter that the test is designed to assess. In some cases, tests may have multiple cut scores representing tiered levels of proficiency, such as *basic*, *proficient*, or *advanced*. Cut scores may also be applied in certification and licensing exams that are used to determine whether examinees are professionally "*qualified*."

Most criterion-referenced assessments use a **cut score** which determines success or failure based on a pre-established percent correct. Criterion-referenced scores tell us how well the examinee performed against an objective or standard, as opposed to against other examinees. For example, a driver's test intends to determine whether a person is knowledgeable and capable enough to be allowed on the road, not whether one driver is more accomplished than another.
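A pass/fail decision based on a single cut score can be sketched as a simple comparison. The sketch follows the wording used in this lesson, where the examinee passes only if the score exceeds the cut score; some testing programs instead count a score equal to the cut score as passing. The function name and the cut score of 35 points are assumptions for illustration.

```python
def passes(score: float, cut_score: float) -> bool:
    """Return True if the score exceeds the cut score, False otherwise.

    Note that no other examinee's score is involved: the decision
    depends only on this score and the pre-established standard.
    """
    return score > cut_score


# Hypothetical cut score of 35 points on the 50-item exam.
cut_score = 35
for raw_score in (20, 35, 36):
    result = "pass" if passes(raw_score, cut_score) else "fail"
    print(f"raw score {raw_score}: {result}")
```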

The process used to determine cut scores is formally known as **standard setting**. For classroom assessments, cut scores might be defined by the teacher; for final exams, they might be defined by the school. For commercial and high-stakes tests, the test developer will form a standard-setting panel by recruiting a group of experts, such as psychometricians (specialists in the science of educational measurement) or teachers from a relevant content area. The panel will then use one or more research-based methods, developed by psychometricians and academics, for setting testing standards and determining cut scores.

Activity 4: Defining a cut score. Choose a cut score and see if Mary passed or failed the exam. You may also go back and change Mary's raw score. Note that defining a cut score does not change Mary's norm-referenced scores.

[Graphic 3: Selecting cut score] - Add Mary's pass/fail information in addition to her placement in comparison to the norm group.

{__Reactive text:__ Mary scored XX points or YY% on the exam. Since Mary scored [above/below] the cut score of [KK], she has demonstrated [sufficient/insufficient] knowledge of 9th grade math curriculum and [passed/failed] the exam.}

Activity 5: Defining additional cut scores. Suppose you want to further categorize students into basic, proficient and advanced based on their scores on the math final exam. Select a cut score for the advanced level and see how Mary's classification changes.

[Graphic 4: Tiered classification graph]

{__Reactive text:__ Mary scored XX points or YY% on the exam. Since Mary scored [above/below] the cut score of [KK], she has demonstrated [basic/proficient/advanced] knowledge of 9th grade math curriculum and [passed/failed] the exam.}
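With more than one cut score, the same idea extends to tiered levels. The sketch below assumes two cut scores (one for proficient, one for advanced) and, as in the rest of this lesson, requires a score to exceed a cut score to reach the level above it. The function name and the cut scores of 35 and 45 are hypothetical.

```python
def proficiency_level(score: float, proficient_cut: float, advanced_cut: float) -> str:
    """Classify a score as 'basic', 'proficient', or 'advanced' using two cut scores."""
    if proficient_cut >= advanced_cut:
        raise ValueError("the advanced cut score must be higher than the proficient cut score")
    if score > advanced_cut:
        return "advanced"
    if score > proficient_cut:
        return "proficient"
    return "basic"


# Hypothetical cut scores on the 50-point exam.
PROFICIENT_CUT = 35
ADVANCED_CUT = 45

for raw_score in (30, 40, 48):
    level = proficiency_level(raw_score, PROFICIENT_CUT, ADVANCED_CUT)
    print(f"raw score {raw_score}: {level}")
```

Moving either cut score changes which level a given raw score falls into, but it never changes the raw score or its percentile rank.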

We can, and often do, interpret a single test score both in terms of norm- and criterion-referencing. However, these interpretations are distinct and should not be confused.

Activity 6: Now we will combine both types of interpretations of Mary's test score. Try changing Mary's raw score and see how it affects both her percentile rank in the norm group you selected and her proficiency levels based on the cut scores you defined.

[Graphic 5: Norm and Criterion-referenced interpretations]

{__Reactive text:__ Mary scored XX points or YY% on the exam. This corresponds to a standard score of [SS] based on the distribution of scores for her [norm group] and she scored at the [ZZ] percentile. This means that based on her raw score Mary scored higher than [ZZ]% of the other students in her [class/school] (norm group) who took the test. In other words, [ZZ]% of the students who made up the norm group answered fewer than XX items correctly.

Mary [did/did not] score at the 95th percentile rank, therefore she [did/did not] win the Math prize.

Additionally, since Mary scored [above/below] the cut score of [KK], she has demonstrated [basic/proficient/advanced] knowledge of 9th grade math curriculum and [passed/failed] the exam.
}
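The two interpretations can be combined in one small sketch that takes a single raw score and reports both a norm-referenced result (percentile rank and prize) and a criterion-referenced result (pass/fail). Everything here is assumed for illustration: the function names, the made-up school scores, the cut score of 35, and the reading of "at the 95th percentile" as a percentile rank of 95 or higher.

```python
def percentile_rank(score, norm_group_scores):
    """Percent of the norm group scoring strictly lower, clamped to 1..99."""
    lower = sum(1 for s in norm_group_scores if s < score)
    percent_lower = round(100 * lower / len(norm_group_scores))
    return min(99, max(1, percent_lower))


def interpret(score, norm_group_scores, cut_score, prize_percentile=95):
    """Return norm-referenced and criterion-referenced interpretations of one score."""
    rank = percentile_rank(score, norm_group_scores)
    return {
        "percentile_rank": rank,                 # norm-referenced
        "wins_prize": rank >= prize_percentile,  # norm-referenced decision
        "passed": score > cut_score,             # criterion-referenced decision
    }


# Hypothetical school norm group: 100 students, each score 1..50 twice.
school_scores = [s for s in range(1, 51) for _ in range(2)]
CUT_SCORE = 35  # hypothetical cut score

for raw_score in (30, 40, 49):
    print(raw_score, interpret(raw_score, school_scores, CUT_SCORE))
```

With these made-up numbers a raw score of 40 passes the exam without earning a prize, which shows that the two decisions answer different questions even though they start from the same score.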

The distinction between norm-referenced and criterion-referenced score interpretation can be summarized as follows.

**Norm referencing:** when we interpret an individual's score by comparing it to the scores of other individuals, called a norm group.

**Criterion referencing:** when we interpret a person's performance by comparing it to some pre-specified standard or criterion of proficiency.

In our example, we could say that the focus of a normative score is on how well Mary performed on the Math final exam compared to her peers, while the focus of a criterion-referenced score is on what Mary can do and whether she met the criteria for mastery of the content included in the Math exam.

Uses and limitations of **norm-referenced** scores:

- In order to make norm-referenced interpretations, the norm group must be representative of the population of examinees we want to compare to.
- Norm-referenced scores are often used to help identify learning and developmental disabilities, such as autism, dyslexia, or nonverbal learning disability, or to determine eligibility for special-education services.
- Reporting results in terms of percentiles or standard scores does not provide information on whether or not the students were proficient.
- It is possible that all students performed poorly and that none are proficient or that all are proficient.
- Rank ordering the student scores does not necessarily reflect the amount of knowledge and skills measured by the test.

Uses and limitations of **criterion-referenced** scores:

- Every student taking the exam could theoretically fail if they don't meet the expected standard; alternatively, every student could earn the highest possible score.
- It is not only possible, but desirable, for every student to pass the test or earn a perfect score.
- These scores are used to report how well students are doing relative to a pre-determined performance level on a specified set of educational goals or outcomes included in the school, district, or state curriculum.
- Scores may be used as one piece of information to determine how well the student is learning the desired curriculum and how well the school is teaching that curriculum.

"Although loads of educators refer to "criterion-referenced tests" and "norm-referenced tests," there are, technically, no such creatures. Rather, there are criterion- and norm-referenced interpretations of students' test performances. For example, educators in a school district might have built a test to yield criterion-referenced interpretations, used the test for several years and, in the process, gathered substantial data regarding the performances of district students. As a consequence, the district's educators could build normative tables permitting norm-referenced interpretations of the test which, although born to provide criterion-referenced inferences, can still permit meaningful norm-referenced interpretations." Popham, W. James. 2011. *Classroom Assessment: What Teachers Need to Know (Sixth Edition)*. Boston: Pearson Education, Inc. pp. 46-8.