Item Development and Analysis Worksheet
Student Name: Section: PSYC421-
PART 1: Writing Multiple Choice Test Items
Develop one multiple choice question that covers content from each of the four chapters listed below. When writing your sample questions, please keep in mind the specifications regarding item construction discussed in the textbook. Also, remember the importance of carefully crafted distractor options. Finally, please limit the number of response options to 4 (1 correct response and 3 distractors), and avoid the options of “all of the above,” “none of the above,” or the like. Be sure to indicate which of the response options is the correct one.
Chapter 3 Multiple Choice Question (5 points)
It is generally agreed that there are ____ different levels, or scales, of measurement.
- A) Six
- B) Two
- C) One
- D) Four
Correct Answer is D
Chapter 4 Multiple Choice Question (5 points)
A ______ is an informed, scientific concept developed to describe or explain behavior.
- A) Overt Behavior
- B) Construct
- C) Trait
- D) State
Correct Answer is B
Chapter 5 Multiple Choice Question (5 points)
Measurement error, much like error in general, can be categorized as being either systematic or __________.
- A) unique
- B) constant
- C) random
- D) valid
Correct Answer is C
Chapter 6 Multiple Choice Question (5 points)
________ is the degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use.
- A) Incremental Validity
- B) Construct Validity
- C) Homogeneity
- D) Validity Coefficient
Correct Answer is A
PART 2: Item Analysis: Item Difficulty Index (Cohen & Swerdlik, 2017, pg. 248)
A test is only as good as its questions! When researchers, test constructors, and educators create items for ability or achievement tests, we have a responsibility to evaluate the items and make sure that they are useful and high-quality. The process that we use to evaluate test items is known as Item Analysis. When bad items are identified and eliminated from a test, the efficiency, reliability, and validity of the entire test increase! One way that we can distinguish between good and bad items is with the Item Difficulty Index.
Part 2A: Calculating Item Difficulty
Using the data below, calculate the Item Difficulty Index for the first 6 items on Quiz 1 from a recent section of PSYC101. For each item, “1” means the item was answered correctly and “0” means it was answered incorrectly. Type your answers in the spaces provided at the bottom of the table. (2 pts. each)
PSYC101 Quiz 1 Item Distribution and Total Scores | | | | | | |
Examinee | Item 1 | Item 2 | Item 3 | Item 4 | Item 5 | Item 6 |
Andre | 1 | 1 | 1 | 1 | 1 | 1 |
Allison | 0 | 1 | 1 | 1 | 0 | 0 |
Heather | 1 | 1 | 1 | 1 | 0 | 0 |
Corey | 1 | 1 | 0 | 1 | 1 | 0 |
Christina | 0 | 0 | 1 | 0 | 0 | 1 |
Jeffrey | 0 | 1 | 1 | 1 | 0 | 0 |
Shawn | 1 | 1 | 1 | 1 | 0 | 1 |
Dana | 0 | 0 | 1 | 1 | 0 | 1 |
Megan | 1 | 1 | 1 | 1 | 0 | 1 |
David | 0 | 1 | 1 | 1 | 0 | 1 |
Isabel | 0 | 1 | 0 | 1 | 0 | 0 |
Lance | 1 | 1 | 1 | 1 | 0 | 0 |
Aliyah | 0 | 1 | 1 | 1 | 0 | 1 |
Blaire | 0 | 1 | 1 | 1 | 1 | 1 |
Gabriel | 0 | 0 | 1 | 1 | 0 | 0 |
Item Difficulty |
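The item difficulty index for each column of the table above can be checked with a short script. This is a minimal sketch, assuming the index is the proportion of examinees who answered the item correctly (number correct divided by number of examinees); the names `responses` and `item_difficulty` are illustrative and do not come from the worksheet.

```python
# Responses copied from the Quiz 1 table: 1 = correct, 0 = incorrect.
responses = {
    "Andre":     [1, 1, 1, 1, 1, 1],
    "Allison":   [0, 1, 1, 1, 0, 0],
    "Heather":   [1, 1, 1, 1, 0, 0],
    "Corey":     [1, 1, 0, 1, 1, 0],
    "Christina": [0, 0, 1, 0, 0, 1],
    "Jeffrey":   [0, 1, 1, 1, 0, 0],
    "Shawn":     [1, 1, 1, 1, 0, 1],
    "Dana":      [0, 0, 1, 1, 0, 1],
    "Megan":     [1, 1, 1, 1, 0, 1],
    "David":     [0, 1, 1, 1, 0, 1],
    "Isabel":    [0, 1, 0, 1, 0, 0],
    "Lance":     [1, 1, 1, 1, 0, 0],
    "Aliyah":    [0, 1, 1, 1, 0, 1],
    "Blaire":    [0, 1, 1, 1, 1, 1],
    "Gabriel":   [0, 0, 1, 1, 0, 0],
}


def item_difficulty(rows):
    """Return, for each item, the proportion of examinees who got it right."""
    n_examinees = len(rows)
    n_items = len(rows[0])
    return [sum(row[i] for row in rows) / n_examinees for i in range(n_items)]


difficulties = item_difficulty(list(responses.values()))
for number, p in enumerate(difficulties, start=1):
    print(f"Item {number}: p = {p:.2f}")
```

A higher value means an easier item, because a larger share of the 15 examinees answered it correctly.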
Part 2B: Calculating Optimal Item Difficulty (1 pt. each)
- For a test item with two response options (e.g., true/false), what is the probability of selecting the correct answer by chance?
50 %
- Calculate the optimal level of difficulty for test questions with two response options.
.75
- For a test item with three response options, what is the probability of selecting the correct answer by chance?
33.33 %
- Calculate the optimal level of difficulty for test questions with three response options.
.667
- For a test item with four response options, what is the probability of selecting the correct answer by chance?
25 %
- Calculate the optimal level of difficulty for test questions with four response options.
.625
- For a test item with five response options, what is the probability of selecting the correct answer by chance?
20 %
- Calculate the optimal level of difficulty for test questions with five response options.
.60
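The pattern in the answers above can be sketched in a few lines. This assumes the optimal difficulty is the midpoint between the chance success rate (1 divided by the number of options) and a perfect score of 1.00, which agrees with the two-, four-, and five-option answers given here; the function names are illustrative.

```python
def chance_probability(n_options):
    """Probability of guessing correctly when all options are equally likely."""
    return 1 / n_options


def optimal_difficulty(n_options):
    """Midpoint between the chance success rate and 1.00 (assumed rule)."""
    return (chance_probability(n_options) + 1.0) / 2


for n_options in (2, 3, 4, 5):
    chance = chance_probability(n_options)
    optimal = optimal_difficulty(n_options)
    print(f"{n_options} options: chance = {chance:.4f}, optimal = {optimal:.4f}")
```

Carrying the full 1/3 through the three-option case, rather than rounding it to .33 first, gives a value slightly above .665.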
PART 3: Item Analysis: Item Discrimination Index (Cohen & Swerdlik, 2017, pg. 250–253)
Another way that test creators can distinguish between good and bad items is with an analysis called the Discrimination Index. The discrimination index measures how well an individual test item distinguishes between high scorers and low scorers on the test. An item is considered to be “good” if most of the high scorers get it right, and most of the low scorers get it wrong.
Interpreting the Discrimination Index (d)
- The discrimination index can range from -1.0 to 1.0.
- The closer d is to 1.0, the better the item discriminates between high and low scorers.
- The closer d is to 0, the more poorly the item discriminates between high and low scorers.
- An item with a negative discrimination index is considered a “negative discriminator” because more low scorers get the item correct than high scorers.
- A discrimination index of 1.0 means all the high scorers got the item correct and all of the low scorers got it incorrect.
- A discrimination index of -1.0 means all of the low scorers got the item correct and all of the high scorers got it incorrect.
- Items with d’s close to 0 or with negative d’s ought to be eliminated from the test!
Calculating the Item Discrimination Index (d)
Calculate the item discrimination index (d) for the 7 hypothetical test items presented below. Type your answers in the spaces provided at the right of the table (2 pts. each).
Item # | U | L | n | d |
Item 1 | 0 | 30 | 30 | |
Item 2 | 25 | 8 | 30 | |
Item 3 | 23 | 19 | 30 | |
Item 4 | 26 | 3 | 30 | |
Item 5 | 28 | 1 | 30 | |
Item 6 | 19 | 5 | 30 | |
Item 7 | 3 | 26 | 30 | |
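The d column can be checked with the sketch below. The worksheet does not define the column headings, so this assumes U is the number of high scorers who answered the item correctly, L is the number of low scorers who answered it correctly, and n is the number of examinees in each group, giving d = (U - L) / n. The names `items` and `discrimination_index` are illustrative.

```python
# (U, L, n) for each item, copied from the table above.
items = {
    1: (0, 30, 30),
    2: (25, 8, 30),
    3: (23, 19, 30),
    4: (26, 3, 30),
    5: (28, 1, 30),
    6: (19, 5, 30),
    7: (3, 26, 30),
}


def discrimination_index(upper_correct, lower_correct, group_size):
    """Assumed formula: d = (U - L) / n, which ranges from -1.0 to 1.0."""
    return (upper_correct - lower_correct) / group_size


d_values = {item: discrimination_index(*counts) for item, counts in items.items()}
for item, d in d_values.items():
    print(f"Item {item}: d = {d:.2f}")
```

Under this formula, an item that every high scorer gets right and every low scorer gets wrong yields d = 1.0, and the reverse pattern yields d = -1.0, consistent with the interpretation rules listed earlier.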
Based on your calculations above, answer the following questions (2 pts. each).
- Which item discriminates the best? Item 5 (d = .90)
- Which item discriminates most poorly? 1
- Based on your analysis, identify which two items you would choose to eliminate from this test, and explain why you would eliminate each.
Items 1 and 7. Both have negative discrimination indices (d = -1.00 for Item 1 and d = -.77 for Item 7), which means more low scorers than high scorers answered them correctly.
PART 4: Item Characteristic Curves (Cohen & Swerdlik, 2017, pg. 253–255)
Another method that test creators can use to assess the usefulness of test items is the Item Characteristic Curve. Item characteristic curves provide a graphical depiction of examinees’ performance on individual test items. As indicated in the figure below, Total Test Score is plotted on the x-axis of the curve, while the proportion of examinees who got the item correct is plotted on the y-axis.
Using the figure above, provide a written description of how test items A–E discriminate among examinees at various levels of performance. In your responses, discuss why each item would be considered a “good” or a “bad” item. EXAMPLE: “This item discriminates well among high scorers, but doesn’t discriminate well among low scorers. So this item would be considered a good item because it discriminates at the highest levels of performance.” (4 pts. each)
Item A: It is good; it does not discriminate between high and low scorers.
Item B: It is bad; its slope is both too high and too low. High scorers tend to get it wrong and low scorers tend to get it right.
Item C: It is good; low scorers tend to get it wrong and high scorers tend to get it right.
Item D: It is good; low scorers tend to get it wrong and high scorers tend to get it right.
Item E: It is good; low scorers tend to get it wrong and high scorers tend to get it right.
PART 5: Qualitative Item Analysis (Cohen & Swerdlik, 2017, pg. 258–260)
Qualitative item analysis refers to a set of non-statistical procedures used to gather information about the usefulness of test items. These analyses typically involve interviews, panel discussions, questionnaires and other forms of verbal exchange with test-takers to explore how individual test items work.
As an online student, you have a very different test-taking experience than residential students. Based on your readings from Chapter 8, identify 4 topics related to online test taking, and create 4 qualitative questions that you could ask online test-takers to gain an understanding of their experiences with test-taking. Also, as students at a Christian institution of higher education, course assignments/assessments are supposed to give students an opportunity to integrate course content with their Christian worldview. Given the topic of faith and learning, create one qualitative question that you could ask test-takers.
Qualitative Item Analysis | |
Topic (2 pts. each) | Sample Question for Test-Takers (2 pts. each) |
Cultural sensitivity | Do you think people's beliefs were affected by this test? |
Test environment | Where were you when you completed this test? Did the environment interfere with your ability to complete the test? |
Overall test-taker impressions | What was your overall impression of this test? Do you have any suggestions for how the test could be improved? |
Test fairness | Do you believe the test was fair? |
Test-taker guessing | Did you have to guess on any questions that pertained to Scripture? How many items do you think you had to guess on? |