Cleaning Up Confounding: Accounting for Endogeneity Using Instrumental Variables and Two-Stage Models

Authors
Graf-Vlach, Lorenz [1, 2]
Wagner, Stefan [3]
Affiliations
[1] TU Dortmund Univ, Dortmund, Germany
[2] Univ Stuttgart, Inst Software Engn, Stuttgart, Germany
[3] Tech Univ Munich, TUM Sch Computat Informat & Technol, Heilbronn, Germany
Keywords
Regression; endogeneity; confounder; two-stage least squares; 2SLS; instrumental variables; sample selection bias; economic growth; relevance; validity; tests; size
DOI
10.1145/3674730
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Studies in empirical software engineering are often most useful if they make causal claims because this allows practitioners to identify how they can purposefully influence (rather than only predict) outcomes of interest. Unfortunately, many non-experimental studies suffer from potential endogeneity, for example, through omitted confounding variables, which precludes claims of causality. In this conceptual tutorial, we aim to transfer the proven solution of instrumental variables and two-stage models as a means to account for endogeneity from econometrics to the field of empirical software engineering. To this end, we discuss causality and causal inference, provide a definition of endogeneity, explain its causes, and lay out the conceptual idea behind instrumental variable approaches and two-stage models. We also provide an extensive illustration with simulated data and a brief illustration with real data to demonstrate the approach, offering Stata and R code to allow researchers to replicate our analyses and apply the techniques to their own research projects. We close with concrete recommendations and a guide for researchers on how to deal with endogeneity.
Pages: 31