Cleaning Up Confounding: Accounting for Endogeneity Using Instrumental Variables and Two-Stage Models

Cited by: 0
Authors
Graf-Vlach, Lorenz [1, 2]
Wagner, Stefan [3]
Affiliations
[1] TU Dortmund Univ, Dortmund, Germany
[2] Univ Stuttgart, Inst Software Engn, Stuttgart, Germany
[3] Tech Univ Munich, TUM Sch Computat Informat & Technol, Heilbronn, Germany
Keywords
Regression; endogeneity; confounder; two-stage least squares; 2SLS; instrumental variables; SAMPLE SELECTION BIAS; ECONOMIC-GROWTH; REGRESSION; RELEVANCE; VALIDITY; TESTS; SIZE;
DOI
10.1145/3674730
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Studies in empirical software engineering are often most useful if they make causal claims because this allows practitioners to identify how they can purposefully influence (rather than only predict) outcomes of interest. Unfortunately, many non-experimental studies suffer from potential endogeneity, for example, through omitted confounding variables, which precludes claims of causality. In this conceptual tutorial, we aim to transfer the proven solution of instrumental variables and two-stage models as a means to account for endogeneity from econometrics to the field of empirical software engineering. To this end, we discuss causality and causal inference, provide a definition of endogeneity, explain its causes, and lay out the conceptual idea behind instrumental variable approaches and two-stage models. We also provide an extensive illustration with simulated data and a brief illustration with real data to demonstrate the approach, offering Stata and R code to allow researchers to replicate our analyses and apply the techniques to their own research projects. We close with concrete recommendations and a guide for researchers on how to deal with endogeneity.
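The two-stage idea named in the abstract can be sketched on simulated data. The following is a minimal illustration, not the authors' Stata or R code: the variable names (`z` for the instrument, `u` for the omitted confounder, `x` for the endogenous regressor, `y` for the outcome), the coefficients, and the sample size are all assumptions chosen for the example. It uses only the Python standard library and a single instrument with a single regressor.

```python
import random


def ols(x, y):
    """Simple one-regressor OLS; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def two_stage_least_squares(z, x, y):
    """2SLS with one instrument z for one endogenous regressor x."""
    # Stage 1: regress x on z and keep the fitted values.
    b1, a1 = ols(z, x)
    x_hat = [a1 + b1 * zi for zi in z]
    # Stage 2: regress y on the fitted values from stage 1.
    return ols(x_hat, y)


# Simulated data (all numbers are illustrative assumptions).
rng = random.Random(42)
n = 20000
true_effect = 2.0

z = [rng.gauss(0, 1) for _ in range(n)]  # instrument: affects x, not y directly
u = [rng.gauss(0, 1) for _ in range(n)]  # omitted confounder: affects x and y
x = [0.8 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [true_effect * xi + 3.0 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

naive_slope, _ = ols(x, y)                      # biased: u is omitted
iv_slope, _ = two_stage_least_squares(z, x, y)  # uses only variation from z

print(f"true effect: {true_effect:.2f}")
print(f"naive OLS:   {naive_slope:.2f}")
print(f"2SLS:        {iv_slope:.2f}")
```

In this setup the naive regression overstates the effect because the confounder raises both `x` and `y`, while the two-stage estimate should land close to the simulated true effect. Running the second stage by hand like this yields a usable point estimate but not valid standard errors; a dedicated 2SLS routine is needed for inference.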
Pages: 31