Using Deep Reinforcement Learning to Decide Test Length

Cited by: 0
Authors
Zoucha, James [1]
Himelfarb, Igor [2]
Tang, Nai-En [2]
Affiliations
[1] Univ Northern Colorado, Greeley, CO USA
[2] Natl Board Chiropract Examiners, Greeley, CO USA
Keywords
deep reinforcement learning; machine learning; psychometrics; COGNITIVE FATIGUE; SHORT FORMS; ITEM; ALGORITHM; PACKAGE; SELECTION; STRATEGY; QUALITY; STRESS; DESIGN
DOI
10.1177/00131644251332972
Chinese Library Classification number
G44 [Educational Psychology]
Subject classification code
0402; 040202
Abstract
This study explored deep reinforcement learning (DRL) as an approach to optimizing test length. The primary aim was to evaluate whether the current length of the National Board of Chiropractic Examiners Part I Exam is justified. The problem was modeled as a combinatorial optimization task within a Markov Decision Process framework, and an algorithm was used that constructs test forms from a finite set of items while adhering to critical structural constraints, such as content representation and item difficulty distribution. The findings revealed that, although the DRL algorithm identified shorter test forms that maintained comparable ability estimation accuracy, the existing test length of 240 items remains advisable because the shorter forms did not satisfy the structural constraints. The study also highlighted the adaptability of DRL, which can continuously learn about a test-taker's latent abilities and dynamically adjust to their response patterns, making it well suited to personalized testing environments. This dynamic capability supports real-time decision-making in item selection, improving both efficiency and precision in ability estimation. Future research is encouraged to expand the item bank and to leverage more advanced computational resources to enhance the algorithm's capacity to search for shorter, structurally compliant test forms.
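
The Markov Decision Process framing described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes a 2PL item response model, a reward equal to Fisher information at a fixed ability value, a fixed penalty for forms that miss content minimums, and a random policy standing in for a trained DRL agent. All names (`Item`, `FormAssemblyEnv`, `rollout`, `make_bank`), the domain labels, and the penalty value are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: int
    domain: str   # content area label (hypothetical)
    a: float      # discrimination, assuming a 2PL model
    b: float      # difficulty, assuming a 2PL model


def item_information(item, theta):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-item.a * (theta - item.b)))
    return item.a ** 2 * p * (1.0 - p)


class FormAssemblyEnv:
    """State: items chosen so far. Action: index of one unchosen item."""

    def __init__(self, bank, form_length, min_per_domain, theta=0.0):
        self.bank = bank
        self.form_length = form_length
        self.min_per_domain = min_per_domain
        self.theta = theta
        self.selected = []

    def reset(self):
        self.selected = []
        return tuple(self.selected)

    def legal_actions(self):
        chosen = set(self.selected)
        return [i for i in range(len(self.bank)) if i not in chosen]

    def meets_constraints(self):
        # Content representation: each domain needs a minimum item count.
        counts = {}
        for i in self.selected:
            domain = self.bank[i].domain
            counts[domain] = counts.get(domain, 0) + 1
        return all(counts.get(d, 0) >= n
                   for d, n in self.min_per_domain.items())

    def step(self, action):
        self.selected.append(action)
        done = len(self.selected) == self.form_length
        reward = item_information(self.bank[action], self.theta)
        if done and not self.meets_constraints():
            reward -= 10.0  # arbitrary penalty, chosen for illustration
        return tuple(self.selected), reward, done


def rollout(env, rng):
    """Build one form with a random policy (placeholder for a DRL agent)."""
    env.reset()
    total, done = 0.0, False
    while not done:
        action = rng.choice(env.legal_actions())
        _, reward, done = env.step(action)
        total += reward
    return total, env.meets_constraints()


def make_bank(n_items, rng):
    """Synthetic item bank; parameters are random, not real exam data."""
    domains = ["A", "B", "C"]
    return [Item(i, domains[i % 3], rng.uniform(0.5, 2.0),
                 rng.uniform(-2.0, 2.0)) for i in range(n_items)]
```

Calling `rollout(FormAssemblyEnv(make_bank(30, random.Random(0)), 6, {"A": 1, "B": 1, "C": 1}), random.Random(1))` returns the total reward of one randomly assembled six-item form and whether it met the content minimums. A learned policy would replace the random choice in `rollout`, and shortening `form_length` shows the trade-off the abstract reports: fewer items leave less room to satisfy every structural constraint.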
Pages: 28