Improving the Assessment of Measurement Invariance: Using Regularization to Select Anchor Items and Identify Differential Item Functioning

Cited by: 57
Authors
Belzak, William C. M. [1 ]
Bauer, Daniel J. [1 ]
Affiliations
[1] Univ N Carolina, Dept Psychol & Neurosci, 235 East Cameron Ave, Chapel Hill, NC 27599 USA
Funding
U.S. National Institutes of Health
Keywords
differential item functioning; measurement invariance; item response theory; lasso regularization; likelihood ratio test
DOI
10.1037/met0000253
Chinese Library Classification
B84 [Psychology]
Discipline Codes
04; 0402
Abstract
A common challenge in the behavioral sciences is evaluating measurement invariance, or whether the measurement properties of a scale are consistent for individuals from different groups. Measurement invariance fails when differential item functioning (DIF) exists, that is, when item responses relate to the latent variable differently across groups. To identify DIF in a scale, many data-driven procedures iteratively test for DIF one item at a time while assuming the other items have no DIF. The DIF-free items are used to anchor the scale of the latent variable across groups, thereby identifying the model. A major drawback of these iterative testing procedures is that they can fail to select the correct anchor items and identify true DIF, particularly when DIF is present in many items. We propose an alternative method for selecting anchors and identifying DIF. Namely, we use regularization, a machine learning technique that imposes a penalty function during estimation to remove parameters that have little impact on the fit of the model. We focus here specifically on a lasso penalty for group differences in the item parameters of the two-parameter logistic item response theory model. We compare lasso regularization with the more commonly used likelihood ratio test method in a two-group DIF analysis. Simulation and empirical results show that when large amounts of DIF are present and sample sizes are large, lasso regularization has far better control of Type I error than the likelihood ratio test method, with little decrement in power. This provides strong evidence that lasso regularization is a promising alternative for testing DIF and selecting anchors.

Translational Abstract
Measurement in the psychological sciences is difficult in large part because two individuals with identical values on a construct (e.g., depression) may appear unequal when measured. This can happen when an item (e.g., "cries easily") taps not only that construct but also some other background characteristic of the individual, such as their sex. This is formally referred to as differential item functioning (DIF). If undetected and unaddressed, DIF can distort inferences about individual and group differences. There are many procedures for statistically detecting DIF, most of which are data-driven and use multiple statistical tests to determine where DIF occurs in a scale. Unfortunately, these procedures make assumptions about the untested items that are unlikely to be true. Specifically, when testing for DIF in one item, one or more other items must be assumed to have no DIF. This is paradoxical, in that the same item is tested for DIF in one analysis but assumed to be DIF-free in all the others. We propose a machine learning approach known as lasso regularization as an alternative. Lasso regularization considers DIF in all items simultaneously, rather than one item at a time, and uses penalized estimation to identify items with and without DIF, rather than inference tests with dubious assumptions. Computer simulations and a real-data validation study show that lasso regularization increasingly outperforms a commonly used traditional method of DIF detection (the likelihood ratio test approach) as the number of items with DIF and the sample size grow.
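To make the approach concrete, the following is a minimal sketch of the penalized model, based only on the abstract's description of the method; the DIF-parameter symbols $\gamma_j$ and $\beta_j$ are our own labels, not notation taken from the article. For person $i$ with latent trait $\theta_i$ and group indicator $g_i \in \{0, 1\}$, the two-parameter logistic model with group differences in the item parameters is

\[
P(y_{ij} = 1 \mid \theta_i, g_i) = \frac{1}{1 + \exp\{-[(a_j + \gamma_j g_i)\,\theta_i + b_j + \beta_j g_i]\}},
\]

where $a_j$ and $b_j$ are the reference-group slope and intercept of item $j$, and a nonzero $\gamma_j$ (nonuniform DIF) or $\beta_j$ (uniform DIF) means the item functions differently across groups. Lasso regularization then maximizes a penalized marginal log-likelihood of the form

\[
\ell_p(a, b, \gamma, \beta) = \ell(a, b, \gamma, \beta) - \lambda \sum_{j=1}^{J} \big( |\gamma_j| + |\beta_j| \big),
\]

so that, for a suitably chosen tuning parameter $\lambda$, the DIF parameters of invariant items are shrunk exactly to zero. Those items serve as the anchors, while any item retaining a nonzero $\gamma_j$ or $\beta_j$ is flagged as exhibiting DIF.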
Pages: 673-690 (18 pages)