Performance of Polytomous IRT Models With Rating Scale Data: An Investigation Over Sample Size, Instrument Length, and Missing Data

Cited by: 34

Authors:

Dai, Shenghai [1]

Vo, Thao Thu [1]

Kehinde, Olasunkanmi James [1]

He, Haixia [2]

Xue, Yu [1]

Demir, Cihan [1]

Wang, Xiaolin [3]

Affiliations:

[1] Washington State Univ, Dept Kinesiol & Educ Psychol, Pullman, WA 99164 USA

[2] Washington State Univ, Dept Teaching & Learning, Pullman, WA 99164 USA

[3] Pearson VUE, Bloomington, MN USA

Source:

FRONTIERS IN EDUCATION | 2021 / Vol. 6

Keywords:

IRT; GRM; GPCM; sample size; instrument length; missing data; ITEM RESPONSE THEORY; PATIENT-REPORTED OUTCOMES; PARTIAL CREDIT MODEL; MAXIMUM-LIKELIHOOD; FIT; PARAMETERS; RECOVERY;

DOI:

10.3389/feduc.2021.721963

Chinese Library Classification (CLC) number:

G40 [Education]

Discipline classification codes:

040101; 120403

Abstract:

The implementation of polytomous item response theory (IRT) models such as the graded response model (GRM) and the generalized partial credit model (GPCM) to inform instrument design and validation has been increasing across social and educational contexts where rating scales are usually used. The performance of such models has not been fully investigated and compared across conditions with common survey-specific characteristics such as short test length, small sample size, and data missingness. The purpose of the current simulation study is to inform the literature and guide the implementation of GRM and GPCM under these conditions. For item parameter estimation, results suggest a sample size of at least 300 and/or an instrument length of at least five items for both models. The performance of GPCM is stable across instrument lengths, while that of GRM improves notably as the instrument length increases. For person parameters, GRM yields more accurate estimates when the proportion of missing data is small, whereas GPCM is favored in the presence of a large amount of missingness. Further, it is not recommended to compare GRM and GPCM based on test information. Relative model fit indices (AIC, BIC, LL) might not be powerful when the sample size is less than 300 and the instrument length is less than five items. A synthesis of the patterns of the results, as well as recommendations for the implementation of polytomous IRT models, is presented and discussed.


Pages: 18
