Adapting Text Embeddings for Causal Inference

Citations: 0
Authors
Veitch, Victor [1]
Sridhar, Dhanya
Blei, David M.
Affiliations
[1] Columbia Univ, Dept Stat, New York, NY 10027 USA
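Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract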
Does adding a theorem to a paper affect its chance of acceptance? Does labeling a post with the author's gender affect the post's popularity? This paper develops a method to estimate such causal effects from observational text data, adjusting for confounding features of the text such as the subject or writing quality. We assume that the text suffices for causal adjustment but that, in practice, it is prohibitively high-dimensional. To address this challenge, we develop causally sufficient embeddings, low-dimensional document representations that preserve sufficient information for causal identification and allow for efficient estimation of causal effects. Causally sufficient embeddings combine two ideas. The first is supervised dimensionality reduction: causal adjustment requires only the aspects of text that are predictive of both the treatment and outcome. The second is efficient language modeling: representations of text are designed to dispose of linguistically irrelevant information, and this information is also causally irrelevant. Our method adapts language models (specifically, word embeddings and topic models) to learn document embeddings that are able to predict both treatment and outcome. We study causally sufficient embeddings with semi-synthetic datasets and find that they improve causal estimation over related embedding methods. We illustrate the methods by answering the two motivating questions: the effect of a theorem on paper acceptance and the effect of a gender label on post popularity. Code and data are available at github.com/vveitch/causaltext-embeddings-tf2.
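The adjustment idea in the abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation: it assumes a low-dimensional embedding `z` is already available (here it is simulated rather than learned from text), uses plain linear outcome models in place of a language model, and the names `simulate` and `plug_in_ate` are hypothetical. It contrasts a naive difference in means with an estimate that adjusts for `z`.

```python
import numpy as np


def simulate(n=20000, true_effect=1.0, seed=0):
    """Simulate data in which the embedding z confounds treatment and outcome.

    Assumption: z stands in for a learned document embedding; nothing here
    is derived from real text.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, 2))
    # Treatment is more likely when the first embedding coordinate is large.
    propensity = 1.0 / (1.0 + np.exp(-1.5 * z[:, 0]))
    t = rng.binomial(1, propensity)
    # The outcome depends on the treatment and on the same embedding.
    y = true_effect * t + 2.0 * z[:, 0] + 0.5 * z[:, 1] + rng.normal(size=n)
    return z, t, y


def plug_in_ate(z, t, y):
    """Plug-in average treatment effect from per-arm linear outcome models."""
    X = np.column_stack([np.ones(len(z)), z])
    # Fit one outcome regression on treated units and one on control units.
    beta1, *_ = np.linalg.lstsq(X[t == 1], y[t == 1], rcond=None)
    beta0, *_ = np.linalg.lstsq(X[t == 0], y[t == 0], rcond=None)
    # Average the predicted treated-minus-control difference over all units.
    return float(np.mean(X @ beta1 - X @ beta0))


z, t, y = simulate()
naive = float(y[t == 1].mean() - y[t == 0].mean())
adjusted = plug_in_ate(z, t, y)
print(f"naive difference in means: {naive:.2f}")
print(f"estimate adjusted for z:   {adjusted:.2f}")
```

In this simulated setting the naive contrast is inflated by confounding, while the adjusted estimate should land near the simulated effect of 1.0. The result relies on `z` capturing everything that drives both treatment and outcome, which is the role the abstract assigns to causally sufficient embeddings.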
Pages: 919-928
Number of pages: 10