Using Deep Reinforcement Learning to Decide Test Length

Cited by: 0

Authors:

Zoucha, James [1]

Himelfarb, Igor [2]

Tang, Nai-En [2]

Affiliations:

[1] Univ Northern Colorado, Greeley, CO USA

[2] Natl Board Chiropract Examiners, Greeley, CO USA

Source:

EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT | 2025

Keywords:

deep reinforcement learning; machine learning; psychometrics; COGNITIVE FATIGUE; SHORT FORMS; ITEM; ALGORITHM; PACKAGE; SELECTION; STRATEGY; QUALITY; STRESS; DESIGN;

DOI:

10.1177/00131644251332972

Chinese Library Classification (CLC) number:

G44 [Educational Psychology]

Discipline classification code:

0402 ; 040202 ;

Abstract:

This study explored the application of deep reinforcement learning (DRL) as an innovative approach to optimize test length. The primary focus was to evaluate whether the current length of the National Board of Chiropractic Examiners Part I Exam is justified. By modeling the problem as a combinatorial optimization task within a Markov Decision Process framework, an algorithm capable of constructing test forms from a finite set of items while adhering to critical structural constraints, such as content representation and item difficulty distribution, was used. The findings reveal that although the DRL algorithm was successful in identifying shorter test forms that maintained comparable ability estimation accuracy, the existing test length of 240 items remains advisable as we found shorter test forms did not maintain structural constraints. Furthermore, the study highlighted the inherent adaptability of DRL to continuously learn about a test-taker's latent abilities and dynamically adjust to their response patterns, making it well-suited for personalized testing environments. This dynamic capability supports real-time decision-making in item selection, improving both efficiency and precision in ability estimation. Future research is encouraged to focus on expanding the item bank and leveraging advanced computational resources to enhance the algorithm's search capacity for shorter, structurally compliant test forms.


Pages: 28
