Cleaning Up Confounding: Accounting for Endogeneity Using Instrumental Variables and Two-Stage Models

Cited by: 0
Authors
Graf-Vlach, Lorenz [1,2]
Wagner, Stefan [3 ]
Affiliations
[1] TU Dortmund Univ, Dortmund, Germany
[2] Univ Stuttgart, Inst Software Engn, Stuttgart, Germany
[3] Tech Univ Munich, TUM Sch Computat Informat & Technol, Heilbronn, Germany
Keywords
Regression; endogeneity; confounder; two-stage least squares; 2SLS; instrumental variables; SAMPLE SELECTION BIAS; ECONOMIC-GROWTH; REGRESSION; RELEVANCE; VALIDITY; TESTS; SIZE
DOI
10.1145/3674730
Chinese Library Classification (CLC)
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Studies in empirical software engineering are often most useful if they make causal claims because this allows practitioners to identify how they can purposefully influence (rather than only predict) outcomes of interest. Unfortunately, many non-experimental studies suffer from potential endogeneity, for example, through omitted confounding variables, which precludes claims of causality. In this conceptual tutorial, we aim to transfer the proven solution of instrumental variables and two-stage models as a means to account for endogeneity from econometrics to the field of empirical software engineering. To this end, we discuss causality and causal inference, provide a definition of endogeneity, explain its causes, and lay out the conceptual idea behind instrumental variable approaches and two-stage models. We also provide an extensive illustration with simulated data and a brief illustration with real data to demonstrate the approach, offering Stata and R code to allow researchers to replicate our analyses and apply the techniques to their own research projects. We close with concrete recommendations and a guide for researchers on how to deal with endogeneity.
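The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' Stata or R code: the data-generating process, coefficient values, variable names (`u`, `z`, `x`, `y`), and helper functions below are all assumptions chosen for illustration. The simulation has an unobserved confounder `u` that biases ordinary least squares, and an instrument `z` that affects the outcome only through `x`.

```python
import numpy as np


def simulate(n=20_000, beta=1.0, seed=0):
    """Simulate data with an omitted confounder (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                # unobserved confounder
    z = rng.normal(size=n)                # instrument: independent of u
    x = 0.8 * z + u + rng.normal(size=n)  # endogenous regressor
    y = beta * x + 2.0 * u + rng.normal(size=n)
    return z, x, y


def ols(X, y):
    """Least-squares coefficients for the design matrix X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def two_stage_least_squares(z, x, y):
    """Slope on x estimated by 2SLS with a single instrument z."""
    n = len(y)
    Z = np.column_stack([np.ones(n), z])
    # Stage 1: keep only the variation in x that is explained by z.
    x_hat = Z @ ols(Z, x)
    # Stage 2: regress y on the fitted values from stage 1.
    X_hat = np.column_stack([np.ones(n), x_hat])
    return ols(X_hat, y)[1]


z, x, y = simulate()
beta_ols = ols(np.column_stack([np.ones(len(y)), x]), y)[1]
beta_iv = two_stage_least_squares(z, x, y)
print(f"true effect: 1.0, OLS: {beta_ols:.2f}, 2SLS: {beta_iv:.2f}")
```

In this setup the OLS slope absorbs part of the confounder's effect, while the 2SLS slope should land close to the true value. Running the second stage by hand, as above, gives a usable point estimate but not valid standard errors, so a dedicated IV routine is preferable for inference.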
Pages: 31