Evaluating Content-Related Validity Evidence Using a Text-Based Machine Learning Procedure

Times Cited: 8
Authors
Anderson, Daniel [1 ]
Rowley, Brock [1 ]
Stegenga, Sondra [2 ]
Irvin, P. Shawn [1 ]
Rosenberg, Joshua M. [3 ]
Affiliations
[1] Univ Oregon, Behav Res & Teaching, Eugene, OR 97403 USA
[2] Univ Utah, Special Educ, Salt Lake City, UT 84112 USA
[3] Univ Tennessee, Theory & Practice Teacher Educ, Knoxville, TN 37996 USA
Keywords
machine learning; text-mining; textual congruence; validity;
DOI
10.1111/emip.12314
CLC Number
G40 [Education];
Discipline Codes
040101; 120403;
Abstract
Validity evidence based on test content is critical to meaningful interpretation of test scores. Within high-stakes testing and accountability frameworks, content-related validity evidence is typically gathered via alignment studies, with panels of experts providing qualitative judgments on the degree to which test items align with the representative content standards. Various summary statistics are then calculated (e.g., categorical concurrence, balance of representation) to aid in decision-making. In this paper, we propose an alternative approach for gathering content-related validity evidence that capitalizes on the overlap in vocabulary used in test items and the corresponding content standards, which we define as textual congruence. We use a text-based, machine learning model, specifically topic modeling, to identify clusters of related content within the standards. This model then serves as the basis from which items are evaluated. We illustrate our method by building a model from the Next Generation Science Standards, with textual congruence evaluated against items within the Oregon statewide alternate assessment. We discuss the utility of this approach as a source of triangulating and diagnostic information and show how visualizations can be used to evaluate the overall coverage of the content standards across the test items.
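The abstract defines textual congruence as the overlap in vocabulary between a test item and the content standards, evaluated through a topic model. As a minimal sketch of the underlying idea only (not the authors' actual LDA-based procedure), the vocabulary overlap between an item and a standard can be scored with a Jaccard index; the example texts below are invented for illustration.

```python
import re


def tokens(text):
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z']+", text.lower()))


def textual_congruence(item, standard):
    """Jaccard overlap between the item's and the standard's vocabulary.

    A toy proxy for textual congruence; the paper instead fits a topic
    model to the standards and evaluates items against that model.
    """
    a, b = tokens(item), tokens(standard)
    return len(a & b) / len(a | b) if (a | b) else 0.0


# Hypothetical standard/item pair (not drawn from the NGSS or the
# Oregon alternate assessment).
standard = "Develop a model to describe the cycling of matter in ecosystems."
item = "Which model best describes how matter cycles through an ecosystem?"
print(round(textual_congruence(item, standard), 3))
```

Note that raw token overlap misses morphological variants ("cycles" vs. "cycling"); a topic model, as used in the paper, groups such related vocabulary into shared clusters of content.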
Pages: 53-64
Page Count: 12