Impact of Cross-Calibration Methods on the Interpretation of a Treatment Comparison Study Using 2 Depression Scales

Cited by: 7
Authors
Fischer, Herbert Felix [1, 2]
Wahl, Inka [3, 4]
Fliege, Herbert [3, 4]
Klapp, Burghard F. [2]
Rose, Matthias [3, 4]
Affiliations
[1] Charite Univ Med Ctr, Inst Social Med Epidemiol & Hlth Econ, D-10117 Berlin, Germany
[2] Charite Univ Med Ctr, Dept Psychosomat Med & Psychotherapy, Clin Internal Med, D-10117 Berlin, Germany
[3] Univ Med Ctr Hamburg Eppendorf, Dept Psychosomat Med & Psychotherapy, Hamburg, Germany
[4] Schon Klin Hamburg Eilbek, Hamburg, Germany
Keywords
item response theory; psychometrics; comparability of measures; depression self-rating scales; equating; ICD-10-SYMPTOM-RATING ISR; VALIDITY; MODEL
DOI
10.1097/MLR.0b013e31822945b4
Chinese Library Classification number
R19 [Health care organization and services (health services administration)]
Abstract
Background: Many questionnaires assessing depressive symptoms are available. Most of these questionnaires are constructed based on classical test theory, making comparisons of individual scores difficult. Item response theory (IRT) allows the comparison of scores from different instruments. In this study, the impact of IRT-based cross-calibration methods on the results of a treatment outcome study was evaluated using 2 instruments.
Methods: Data collected during admission and discharge procedures from 1066 inpatients in 2 psychosomatic clinics using different depression measures were analyzed. To achieve comparability across the applied depression measures, we used an IRT-based conversion table to transform scores from one instrument's scale to the other. Latent trait values were also estimated using different instruments in each clinic. We compared these methods to the traditional approach of using the same instrument in both clinics and examined their effects on the statistical analyses.
Results: There was no substantial change in the interpretation of the study results when different instruments were used. However, F values, P values, and effect sizes in the analysis of variance changed significantly. This might be attributed to differences in the content or measurement properties of the instruments. Interestingly, no difference was observed between use of transformed sum scores and latent trait values.
Conclusions: IRT cross-calibration methods are a convenient way to enhance the comparability of questionnaire data in applied clinical settings but seem not to be able to overcome differences in measurement properties of the instruments. As these differences can lead to biased results, there is a need for further research into more advanced techniques.
Pages: 320-326
Number of pages: 7