Applying item response theory (IRT) modeling to questionnaire development, evaluation, and refinement

Cited by: 575
Authors
Edelen, Maria Orlando [1]
Reeve, Bryce B. [2]
Affiliations
[1] Brown Univ, Sch Med, Dept Psychiat & Human Behav, Providence, RI 02912 USA
[2] Natl Canc Inst, Bethesda, MD USA
Keywords
IRT; health outcomes; adolescent depression; short form
DOI
10.1007/s11136-007-9198-0
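Chinese Library Classification number
R19 [Health care organization and services (health services administration)]
Subject classification code
Abstract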
Background: Health outcomes researchers are increasingly applying Item Response Theory (IRT) methods to questionnaire development, evaluation, and refinement efforts.
Objective: To provide a brief overview of IRT, to review some of the critical issues associated with IRT applications, and to demonstrate the basic features of IRT with an example.
Methods: Example data come from 6,504 adolescent respondents in the National Longitudinal Study of Adolescent Health public use data set who completed the 19-item Feelings Scale for depression. The sample was split into a development sample and a validation sample. Scale items were calibrated in the development sample with the Graded Response Model, and the results were used to construct a 10-item short form. The short form was evaluated in the validation sample by examining the correspondence between IRT scores from the short form and the original, and by comparing the proportion of respondents identified as depressed according to the original and short form observed cut scores.
Results: The 19 items varied in their discrimination (slope parameter range: .86 to 2.66), and item location parameters reflected a considerable range of depression (-.72 to 3.39). However, the item set is most discriminating at higher levels of depression. In the validation sample, IRT scores generated from the short and long forms were correlated at .96, and the average difference in these scores was -.01. In addition, nearly 90% of the sample was classified identically as at risk or not at risk for depression using observed score cut points from the short and long forms.
Conclusions: When used appropriately, IRT can be a powerful tool for questionnaire development, evaluation, and refinement, resulting in precise, valid, and relatively brief instruments that minimize response burden.
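As a rough illustration of the Graded Response Model named in the Methods, the sketch below computes response-category probabilities for a single polytomous item in the standard logistic form of the model. It is not the authors' calibration code. The function name `grm_category_probs` and the example slope and threshold values are hypothetical and chosen only for illustration; they are not parameter estimates from the article.

```python
import math


def grm_category_probs(theta, slope, thresholds):
    """Category probabilities for one item under the Graded Response Model.

    theta      -- latent trait level (e.g. depression severity)
    slope      -- item discrimination (a)
    thresholds -- ordered category location parameters (b_1 < ... < b_{K-1})
    """
    # Cumulative probabilities P(X >= k | theta) for k = 1..K-1 (logistic form).
    cumulative = [
        1.0 / (1.0 + math.exp(-slope * (theta - b))) for b in thresholds
    ]
    # By definition P(X >= 0) = 1 and P(X >= K) = 0.
    bounds = [1.0] + cumulative + [0.0]
    # Each category probability is the difference of adjacent cumulative curves.
    return [bounds[k] - bounds[k + 1] for k in range(len(bounds) - 1)]


# Hypothetical four-category item (illustrative values, not from the article).
example_thresholds = [-0.5, 1.0, 2.5]
probs_low = grm_category_probs(theta=0.0, slope=1.5, thresholds=example_thresholds)
probs_high = grm_category_probs(theta=2.0, slope=1.5, thresholds=example_thresholds)
```

With ordered thresholds, the category probabilities sum to 1 at every trait level, and a respondent with a higher trait level is more likely to endorse the highest category. A larger slope makes the cumulative curves steeper, which is the sense in which an item is more discriminating.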
Pages: 5-18
Page count: 14