Testing Causal Theories with Learned Proxies

Cited by: 19
Authors
Knox, Dean [1]
Lucas, Christopher [2, 3]
Cho, Wendy K. Tam [4, 5, 6, 7, 8, 9, 10]
Affiliations
[1] Univ Penn, Wharton Sch, Operat Informat & Decis Dept & Analyt Wharton, Philadelphia, PA 19104 USA
[2] Washington Univ, Dept Polit Sci, St Louis, MO 63110 USA
[3] Washington Univ, Div Computat & Data Sci, St Louis, MO 63110 USA
[4] Univ Illinois, Dept Polit Sci, Champaign, IL USA
[5] Univ Illinois, Dept Stat, Champaign, IL USA
[6] Univ Illinois, Dept Math, Champaign, IL USA
[7] Univ Illinois, Dept Comp Sci, Champaign, IL USA
[8] Univ Illinois, Dept Asian Amer Studies, Champaign, IL USA
[9] Univ Illinois, Coll Law, Champaign, IL USA
[10] Univ Illinois, Natl Ctr Supercomp Applicat, Champaign, IL USA
Keywords
causal inference; machine learning; supervised learning; measurement; proxies; DIRECTED ACYCLIC GRAPHS; MODEL; DEMOCRACY; BIAS; TEXT
DOI
10.1146/annurev-polisci-051120-111443
Chinese Library Classification (CLC) number
D0 [Political Science, Political Theory]
Discipline classification code
0302; 030201
Abstract
Social scientists commonly use computational models to estimate proxies of unobserved concepts, then incorporate these proxies into subsequent tests of their theories. The consequences of this practice, which occurs in over two-thirds of recent computational work in political science, are underappreciated. Imperfect proxies can reflect noise and contamination from other concepts, producing biased point estimates and standard errors. We demonstrate how analysts can use causal diagrams to articulate theoretical concepts and their relationships to estimated proxies, then apply straightforward rules to assess which conclusions are rigorously supportable. We formalize and extend common heuristics for "signing the bias" (a technique for reasoning about unobserved confounding) to scenarios with imperfect proxies. Using these tools, we demonstrate how, in often-encountered research settings, proxy-based analyses allow for valid tests for the existence and direction of theorized effects. We conclude with best-practice recommendations for the rapidly growing literature using learned proxies to test causal theories.
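
The abstract's point that a noisy proxy can bias a point estimate while still supporting a test of an effect's direction can be illustrated with a small simulation. This is only a sketch of the simplest case (classical, independent measurement noise in a linear model), not the authors' method; it does not cover contamination from other concepts. The function names, the effect size `beta`, and the noise level are hypothetical choices made for illustration.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def simulate(beta=2.0, noise_sd=1.0, n=20000, seed=0):
    """Compare the slope on the true concept with the slope on a noisy proxy.

    Assumed (hypothetical) data-generating process:
      concept ~ N(0, 1)                    unobserved in practice
      outcome = beta * concept + N(0, 1)
      proxy   = concept + N(0, noise_sd)   independent measurement noise
    """
    rng = random.Random(seed)
    concept = [rng.gauss(0.0, 1.0) for _ in range(n)]
    outcome = [beta * c + rng.gauss(0.0, 1.0) for c in concept]
    proxy = [c + rng.gauss(0.0, noise_sd) for c in concept]
    return ols_slope(concept, outcome), ols_slope(proxy, outcome)


slope_true, slope_proxy = simulate()
print(f"slope using the true concept: {slope_true:.3f}")
print(f"slope using the noisy proxy:  {slope_proxy:.3f}")
```

Under these assumptions the proxy-based slope shrinks toward zero by roughly the factor var(concept) / (var(concept) + noise_sd^2), so its magnitude is biased but its sign matches the true effect.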
Pages: 419-441
Page count: 23