Estimating Test-Retest Reliability in the Presence of Self-Selection Bias and Learning/Practice Effects

Times cited: 0
Authors
Belzak, William C. M. [1]
Lockwood, J. R. [1]
Affiliations
[1] Duolingo, 5900 Penn Ave, Pittsburgh, PA 15206 USA
Keywords
test-retest reliability; self-selection bias; entropy balancing; minimum discriminant information adjustment; polynomial regression; Bayesian model averaging
DOI
10.1177/01466216241284585
Chinese Library Classification (CLC) number
O1 [Mathematics]; C [Social Sciences, General]
Discipline classification code
03; 0303; 0701; 070101
Abstract
Test-retest reliability is often estimated using naturally occurring data from test repeaters. In settings such as admissions testing, test takers choose if and when to retake an assessment. This self-selection can bias estimates of test-retest reliability because individuals who choose to retest are typically unrepresentative of the broader testing population and because differences among test takers in learning or practice effects may increase with time between test administrations. We develop a set of methods for estimating test-retest reliability from observational data that can mitigate these sources of bias; the methods include sample weighting, polynomial regression, and Bayesian model averaging. We demonstrate the value of using these methods for reducing bias and improving precision of estimated reliability using empirical and simulated data, both of which are based on more than 40,000 repeaters of a high-stakes English language proficiency test. Finally, these methods generalize to settings in which only a single, error-prone measurement is taken repeatedly over time and where self-selection and/or changes to the underlying construct may be at play.
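The sample-weighting idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single balancing covariate (the first-attempt score), uses exponential-tilting weights in the spirit of entropy balancing, and computes a weighted Pearson correlation as the reliability estimate. The function names `tilt_weights` and `weighted_corr`, the target mean, and the toy scores are all hypothetical and chosen only for illustration; polynomial regression and Bayesian model averaging are not shown.

```python
import math

def tilt_weights(x, target_mean, max_iter=100, tol=1e-10):
    """Weights w_i proportional to exp(lam * x_i), with lam chosen by Newton's
    method so that the weighted mean of x equals target_mean.
    Assumes target_mean lies strictly between min(x) and max(x)."""
    centered = [xi - target_mean for xi in x]
    lam = 0.0
    for _ in range(max_iter):
        z = [lam * c for c in centered]
        zmax = max(z)  # subtract the max for numerical stability
        e = [math.exp(zi - zmax) for zi in z]
        total = sum(e)
        w = [ei / total for ei in e]
        # Gradient of the dual objective: weighted mean minus target
        grad = sum(wi * c for wi, c in zip(w, centered))
        if abs(grad) < tol:
            break
        # Hessian of the dual objective: weighted variance of x
        hess = sum(wi * c * c for wi, c in zip(w, centered)) - grad ** 2
        lam -= grad / hess
    return w

def weighted_corr(a, b, w):
    """Weighted Pearson correlation of a and b under weights w."""
    sw = sum(w)
    ma = sum(wi * ai for wi, ai in zip(w, a)) / sw
    mb = sum(wi * bi for wi, bi in zip(w, b)) / sw
    cov = sum(wi * (ai - ma) * (bi - mb) for wi, ai, bi in zip(w, a, b)) / sw
    va = sum(wi * (ai - ma) ** 2 for wi, ai in zip(w, a)) / sw
    vb = sum(wi * (bi - mb) ** 2 for wi, bi in zip(w, b)) / sw
    return cov / math.sqrt(va * vb)

# Hypothetical toy data: first- and second-attempt scores of six repeaters.
first = [90, 100, 110, 120, 130, 140]
second = [95, 98, 118, 117, 133, 138]
population_mean = 110.0  # assumed first-attempt mean in the full testing population

w = tilt_weights(first, population_mean)
reweighted_mean = sum(wi * xi for wi, xi in zip(w, first))
reliability = weighted_corr(first, second, w)
print(round(reweighted_mean, 6), reliability)
```

In this sketch the weights pull the repeaters' first-attempt mean toward the assumed population mean before the correlation is computed; a fuller treatment would balance several covariates and higher moments at once.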
Pages: 323-340
Page count: 18