The following test diagnostics can help you design tests of appropriate difficulty level or tests that discriminate between high and low scorers. Test diagnostics can also help you pinpoint potential trouble spots in your instruction or your assessments.

Difficulty Levels

It is relatively easy to determine the difficulty level of individual test items. Divide the number of students who answered an item correctly by the total number of students who answered the item, then multiply that figure by 100 to express it as a percentage. Note that a higher percentage means an easier item. If the purpose of your test is norm-referencing or discrimination, you will seek items with a difficulty level around 50%.
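The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the function name `item_difficulty` and the sample counts are assumptions made for the example.

```python
def item_difficulty(num_correct: int, num_answered: int) -> float:
    """Return the percentage of students who answered an item correctly."""
    if num_answered <= 0:
        raise ValueError("num_answered must be a positive number of students")
    if not 0 <= num_correct <= num_answered:
        raise ValueError("num_correct must be between 0 and num_answered")
    # Correct responses divided by all responses, expressed as a percentage.
    return num_correct / num_answered * 100


# Hypothetical item: 18 of the 36 students who answered it were correct.
print(item_difficulty(18, 36))  # 50.0
```

An item like this hypothetical one sits at the 50% level that norm-referenced tests aim for.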

Item Analysis for Norm-Referencing

Item analysis is an important technique for perfecting norm-referenced tests that discriminate between high and low scorers. You begin by choosing some standard for high and low scorers on a test. For example, high scorers = students with test scores in the top 1/3 of the class, and low scorers = students with test scores in the bottom 1/3 of the class. In the example below, there are 12 students in the high group and 12 students in the low group. On a specific test item, if all 12 students in the high group respond to the item correctly (answer B) and all 12 students in the low group respond to the item incorrectly, you have a perfectly discriminating item.

 A    B    C    D    E    Groups
 0   12    0    0    0    All 12 Students in High Group Respond to Item Correctly
 3    0    3    3    3    All 12 Students in Low Group Respond to Item Incorrectly

You can compute a discrimination index for an individual test item as follows: subtract the number of students in the low group responding to the item correctly from the number of students in the high group responding to the item correctly (12 - 0 = 12). Divide this figure by the number of students in one group (12). The value for the example above is 1.0, which indicates a perfectly discriminating item.

  • Designers of norm-referenced tests typically seek items with a discrimination index in the range of .35 to .70.
  • If most students in both the high and low groups respond to an item correctly, your discrimination index will be low (for example, .14). This is a red flag that the test item is too easy.
  • If more students in the low group respond to an item correctly than students in the high group, your discrimination index will be negative (for example, -.08). This is a red flag that the test item is flawed.
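The discrimination index described above can be sketched as a short Python function. The function name `discrimination_index` is an assumption for illustration, and the second call uses hypothetical counts chosen only to show a negative result.

```python
def discrimination_index(high_correct: int, low_correct: int, group_size: int) -> float:
    """Return (high-group correct - low-group correct) / students in one group."""
    if group_size <= 0:
        raise ValueError("group_size must be a positive number of students")
    return (high_correct - low_correct) / group_size


# The example from the text: all 12 high scorers correct, no low scorers correct.
print(discrimination_index(12, 0, 12))  # 1.0

# Hypothetical counts: more low scorers (6) than high scorers (5) answer correctly.
print(round(discrimination_index(5, 6, 12), 2))  # -0.08
```

The index runs from -1.0 to 1.0, so a negative result like the second one is easy to spot when scanning a full set of items.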

Item Analysis for Diagnosing Instruction and Assessments

If you administer your test items both pre- and post-instruction, you can also use a form of item analysis to diagnose both your instruction and your assessments. Simply compare the percentage of students who responded to an item correctly pre-instruction with the percentage who responded correctly post-instruction.

Students Responding to an Item Correctly
Pre-Instruction    Post-Instruction    Indicator of:
15%                85%                 successful instruction
15%                22%                 defect in test item or in the instruction, need to reteach content**
80%                90%                 defect in test item or instructional complexity too low
75%                29%                 defect in test item, typo, need to check your question

**If a specific test has several items with poor response rates both pre- and post-instruction, you should return to your test blueprint and determine which goals/objectives were not understood by the students. It may be that all of the poorly understood items came from the same goal or objective area, in which case you would need to reteach that portion of your content.
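The pre/post comparison in the table above could be automated along the following lines. This is only a sketch: the table gives example percentages rather than cutoffs, so the `low` and `high` thresholds (30% and 70%) and the function name `diagnose_item` are assumptions you would adjust for your own class.

```python
def diagnose_item(pre_pct: float, post_pct: float,
                  low: float = 30.0, high: float = 70.0) -> str:
    """Match pre/post percent-correct values to the patterns in the table.

    The low/high cutoffs are assumed values, not part of the original table.
    """
    if pre_pct >= high and post_pct >= high:
        return "defect in test item or instructional complexity too low"
    if pre_pct <= low and post_pct <= low:
        return "defect in test item or in the instruction, need to reteach content"
    if pre_pct <= low and post_pct >= high:
        return "successful instruction"
    if post_pct < pre_pct:
        return "defect in test item, typo, need to check your question"
    return "no clear pattern"


# The four rows of the table, as (pre-instruction %, post-instruction %).
for pre, post in [(15, 85), (15, 22), (80, 90), (75, 29)]:
    print(pre, post, "->", diagnose_item(pre, post))
```

Items that fall between the assumed cutoffs return "no clear pattern" and are best reviewed by hand.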