A comprehensive review of Rasch measurement in language assessment: Recommendations and guidelines for research

Cited by: 86
Authors
Aryadoust, Vahid [1]
Ng, Li Ying [1]
Sayama, Hiroki [2, 3]
Affiliations
[1] Nanyang Technol Univ, Natl Inst Educ, Singapore, Singapore
[2] SUNY Binghamton, Binghamton, NY USA
[3] Waseda Univ, Waseda Innovat Lab, Tokyo, Japan
Keywords
Fit; language assessment; local independence; network analysis; modularity maximization method; Rasch measurement; reliability and separation; unidimensionality; LOCAL ITEM DEPENDENCE; CRITICAL-VALUES; RESPONSE THEORY; MODEL; FIT; UNIDIMENSIONALITY; REPLICATION; QUALITY; TESTS
DOI
10.1177/0265532220927487
Chinese Library Classification number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
Over the past decades, the application of Rasch measurement in language assessment has gradually increased. In the present study, we reviewed and coded 215 papers using Rasch measurement published in 21 applied linguistics journals for multiple features. We found that seven Rasch models and 23 software packages were adopted in these papers, with many-facet Rasch measurement (n = 100) and Facets (n = 113) being the most frequently used Rasch model and software, respectively. Significant differences were detected between the number of papers that applied Rasch measurement to different language skills and components, with writing (n = 63) and grammar (n = 12) being the most and least frequently investigated, respectively. In addition, significant differences were found between the number of papers reporting person separation (n = 73, not reported: n = 142) and item separation (n = 59, not reported: n = 156) and those that did not. An alarming finding was how few papers reported unidimensionality check (n = 57 vs 158) and local independence (n = 19 vs 196). Finally, a multilayer network analysis revealed that research involving Rasch measurement has created two major discrete communities of practice (clusters), which can be characterized by features such as language skills, the Rasch models used, and the reporting of item reliability/separation vs person reliability/separation. Cluster 1 was accordingly labelled the production and performance cluster, whereas cluster 2 was labelled the perception and language elements cluster. Guidelines and recommendations for analyzing unidimensionality, local independence, data-to-model fit, and reliability in Rasch model analysis are proposed.
Pages: 6-40
Number of pages: 35
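
The abstract reports several "reported vs not reported" counts and calls the differences significant, but it does not say which test was used. The sketch below is one plausible way to check such a split, assuming a chi-square goodness-of-fit test against an equal (50/50) split with one degree of freedom. The function name and the `counts` dictionary are hypothetical, introduced only for illustration; the counts themselves are copied from the abstract.

```python
# Illustrative sketch only. Assumption: a chi-square goodness-of-fit test
# against an equal split; the abstract does not name the test the authors used.

# Standard chi-square critical value for df = 1 at alpha = 0.05.
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841


def chi_square_equal_split(reported: int, not_reported: int) -> float:
    """Return the chi-square statistic for two counts under an expected 50/50 split."""
    total = reported + not_reported
    expected = total / 2
    return ((reported - expected) ** 2 + (not_reported - expected) ** 2) / expected


# Counts taken from the abstract; each pair sums to the 215 reviewed papers.
counts = {
    "person separation": (73, 142),
    "item separation": (59, 156),
    "unidimensionality check": (57, 158),
    "local independence": (19, 196),
}

for feature, (reported, not_reported) in counts.items():
    statistic = chi_square_equal_split(reported, not_reported)
    exceeds = statistic > CRITICAL_VALUE_DF1_ALPHA_05
    print(f"{feature}: chi2 = {statistic:.2f}, exceeds critical value: {exceeds}")
```

Under this assumed test, a statistic above the critical value indicates that the reported/not-reported split departs from an even split at the 0.05 level. It says nothing about whether the authors ran the same comparison.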