Evaluating Content-Related Validity Evidence Using a Text-Based Machine Learning Procedure

Times Cited: 8
Authors
Anderson, Daniel [1 ]
Rowley, Brock [1 ]
Stegenga, Sondra [2 ]
Irvin, P. Shawn [1 ]
Rosenberg, Joshua M. [3 ]
Affiliations
[1] Univ Oregon, Behav Res & Teaching, Eugene, OR 97403 USA
[2] Univ Utah, Special Educ, Salt Lake City, UT 84112 USA
[3] Univ Tennessee, Theory & Practice Teacher Educ, Knoxville, TN 37996 USA
Keywords
machine learning; text-mining; textual congruence; validity;
DOI
10.1111/emip.12314
CLC Number
G40 [Education];
Discipline Codes
040101; 120403;
Abstract
Validity evidence based on test content is critical to meaningful interpretation of test scores. Within high-stakes testing and accountability frameworks, content-related validity evidence is typically gathered via alignment studies, with panels of experts providing qualitative judgments on the degree to which test items align with the representative content standards. Various summary statistics are then calculated (e.g., categorical concurrence, balance of representation) to aid in decision-making. In this paper, we propose an alternative approach for gathering content-related validity evidence that capitalizes on the overlap in vocabulary used in test items and the corresponding content standards, which we define as textual congruence. We use a text-based, machine learning model, specifically topic modeling, to identify clusters of related content within the standards. This model then serves as the basis from which items are evaluated. We illustrate our method by building a model from the Next Generation Science Standards, with textual congruence evaluated against items within the Oregon statewide alternate assessment. We discuss the utility of this approach as a source of triangulating and diagnostic information and show how visualizations can be used to evaluate the overall coverage of the content standards across the test items.
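The abstract defines textual congruence as the overlap in vocabulary between a test item and the content standards, evaluated through a topic model. As a minimal sketch of the underlying idea only (not the authors' actual LDA-based procedure), the vocabulary overlap between an item and a standard can be scored with a Jaccard index; the example texts below are invented for illustration.

```python
import re


def tokens(text):
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z']+", text.lower()))


def textual_congruence(item, standard):
    """Jaccard overlap between the item's and the standard's vocabulary.

    A toy proxy for textual congruence; the paper instead fits a topic
    model to the standards and evaluates items against that model.
    """
    a, b = tokens(item), tokens(standard)
    return len(a & b) / len(a | b) if (a | b) else 0.0


# Hypothetical standard/item pair (not drawn from the NGSS or the
# Oregon alternate assessment).
standard = "Develop a model to describe the cycling of matter in ecosystems."
item = "Which model best describes how matter cycles through an ecosystem?"
print(round(textual_congruence(item, standard), 3))
```

Note that raw token overlap misses morphological variants ("cycles" vs. "cycling"); a topic model, as used in the paper, groups such related vocabulary into shared clusters of content.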
Pages: 53-64
Page Count: 12