Are faculty predictions or item taxonomies useful for estimating the outcome of multiple-choice examinations?

Cited: 20
Authors
Kibble, Jonathan D. [1 ,2 ]
Johnson, Teresa [1 ]
Affiliations
[1] Univ Cent Florida, Coll Med, Orlando, FL 32827 USA
[2] Mem Univ Newfoundland, Hlth Sci Ctr, St John, NF, Canada
Keywords
Bloom's taxonomy; assessment; evaluation; multiple-choice questions; standard setting; physiology education; medical education; hidden curriculum; STUDENTS; FACT;
DOI
10.1152/advan.00062.2011
CLC Number
G40 [Education];
Discipline Codes
040101; 120403;
Abstract
Kibble JD, Johnson T. Are faculty predictions or item taxonomies useful for estimating the outcome of multiple-choice examinations? Adv Physiol Educ 35: 396-401, 2011; doi:10.1152/advan.00062.2011.-The purpose of this study was to evaluate whether multiple-choice item difficulty could be predicted either by a subjective judgment by the question author or by applying a learning taxonomy to the items. Eight physiology faculty members teaching an upper-level undergraduate human physiology course consented to participate in the study. The faculty members annotated questions before exams with the descriptors "easy," "moderate," or "hard" and classified them according to whether they tested knowledge, comprehension, or application. Overall analysis showed a statistically significant, but relatively low, correlation between the intended item difficulty and actual student scores (rho = -0.19, P < 0.01), indicating that, as intended item difficulty increased, the resulting student scores on items tended to decrease. Although this expected inverse relationship was detected, faculty members were correct only 48% of the time when estimating difficulty. There was also significant individual variation among faculty members in the ability to predict item difficulty (chi(2) = 16.84, P = 0.02). With regard to the cognitive level of items, no significant correlation was found between the item cognitive level and either actual student scores (rho = -0.09, P = 0.14) or item discrimination (rho = 0.05, P = 0.42). Despite the inability of faculty members to accurately predict item difficulty, the examinations were of high quality, as evidenced by reliability coefficients (Cronbach's alpha) of 0.70-0.92, the rejection of only 4 of 300 items in the postexamination review, and a mean item discrimination (point biserial) of 0.37.
In conclusion, the effort of assigning annotations describing intended difficulty and cognitive levels to multiple-choice items is of doubtful value in terms of controlling examination difficulty. However, we also report that the process of annotating questions may enhance examination validity and can reveal aspects of the hidden curriculum.
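The item statistics the abstract relies on (item difficulty, point-biserial discrimination, and Cronbach's alpha reliability) can be sketched as follows. This is a minimal illustration with a synthetic 0/1 score matrix, not data or code from the study; the formulas are the standard classical-test-theory definitions.

```python
import numpy as np

# Synthetic 0/1 score matrix (illustrative only): rows = students, cols = items.
scores = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
], dtype=float)

n_students, n_items = scores.shape

# Item difficulty: proportion of students answering correctly (higher = easier).
difficulty = scores.mean(axis=0)

# Point-biserial discrimination: correlation of each item with the total of the
# remaining items (corrected item-total correlation).
rest_totals = scores.sum(axis=1, keepdims=True) - scores
discrimination = np.array([
    np.corrcoef(scores[:, i], rest_totals[:, i])[0, 1] for i in range(n_items)
])

# Cronbach's alpha: internal-consistency reliability of the whole examination.
item_var = scores.var(axis=0, ddof=1)
total_var = scores.sum(axis=1).var(ddof=1)
alpha = n_items / (n_items - 1) * (1 - item_var.sum() / total_var)
```

With the matrix above, difficulty per item is 5/6, 2/3, 1/2, and 1/3; a real analysis would also flag items with low or negative discrimination for postexamination review, as the study did for 4 of its 300 items.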
Pages: 396-401
Page count: 6