Cleaning Up Confounding: Accounting for Endogeneity Using Instrumental Variables and Two-Stage Models

Authors
Graf-Vlach, Lorenz [1, 2]
Wagner, Stefan [3 ]
Affiliations
[1] TU Dortmund Univ, Dortmund, Germany
[2] Univ Stuttgart, Inst Software Engn, Stuttgart, Germany
[3] Tech Univ Munich, TUM Sch Computat Informat & Technol, Heilbronn, Germany
Keywords
Regression; endogeneity; confounder; two-stage least squares; 2SLS; instrumental variables; SAMPLE SELECTION BIAS; ECONOMIC-GROWTH; REGRESSION; RELEVANCE; VALIDITY; TESTS; SIZE;
DOI
10.1145/3674730
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Studies in empirical software engineering are often most useful if they make causal claims because this allows practitioners to identify how they can purposefully influence (rather than only predict) outcomes of interest. Unfortunately, many non-experimental studies suffer from potential endogeneity, for example, through omitted confounding variables, which precludes claims of causality. In this conceptual tutorial, we aim to transfer the proven solution of instrumental variables and two-stage models as a means to account for endogeneity from econometrics to the field of empirical software engineering. To this end, we discuss causality and causal inference, provide a definition of endogeneity, explain its causes, and lay out the conceptual idea behind instrumental variable approaches and two-stage models. We also provide an extensive illustration with simulated data and a brief illustration with real data to demonstrate the approach, offering Stata and R code to allow researchers to replicate our analyses and apply the techniques to their own research projects. We close with concrete recommendations and a guide for researchers on how to deal with endogeneity.
Pages: 31
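The two-stage idea summarized in the abstract can be illustrated with a small simulation. The sketch below is not the authors' Stata or R code; it is an independent, minimal Python illustration under assumed values. The names `simulate`, `ols`, and `two_stage_least_squares`, the coefficients (0.8, 1.5), and the true effect of 2.0 are all hypothetical choices made for this example. It assumes one endogenous regressor `x`, one instrument `z` that affects `y` only through `x`, and an unobserved confounder `u`.

```python
import random


def ols(x, y):
    """Simple one-regressor OLS; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope


def two_stage_least_squares(z, x, y):
    """Manual 2SLS with a single instrument z for the regressor x."""
    # Stage 1: regress the endogenous regressor on the instrument.
    a0, a1 = ols(z, x)
    x_hat = [a0 + a1 * zi for zi in z]
    # Stage 2: regress the outcome on the fitted values from stage 1.
    return ols(x_hat, y)


def simulate(n=20000, beta=2.0, seed=0):
    """Simulated data with an unobserved confounder u (assumed values)."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)    # omitted confounder, drives both x and y
        zi = rng.gauss(0, 1)   # instrument, independent of u
        xi = 0.8 * zi + u + rng.gauss(0, 1)
        yi = beta * xi + 1.5 * u + rng.gauss(0, 1)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


z, x, y = simulate()
_, naive_slope = ols(x, y)                     # biased by the confounder
_, iv_slope = two_stage_least_squares(z, x, y)
print(f"naive OLS slope: {naive_slope:.3f}")
print(f"2SLS slope:      {iv_slope:.3f} (true effect assumed to be 2.0)")
```

In this setup the naive regression overstates the effect because `u` raises both `x` and `y`, while the two-stage estimate uses only the variation in `x` induced by `z`. Running the second stage by hand gives a usable point estimate but incorrect standard errors, so a dedicated IV routine is preferable for inference.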