Deep mendelian randomization: Investigating the causal knowledge of genomic deep learning models

Cited by: 3
Authors
Malina, Stephen [1, 2]
Cizin, Daniel [1, 3]
Knowles, David A. [1, 4, 5, 6]
Affiliations
[1] Columbia Univ, Dept Comp Sci, New York, NY 10027 USA
[2] Dyno Therapeut, Watertown, MA 02472 USA
[3] Weill Cornell Med, Triinst PhD Program Computat Biol & Med, New York, NY USA
[4] New York Genome Ctr, New York, NY USA
[5] Columbia Univ, Dept Syst Biol, New York, NY USA
[6] Columbia Univ, Data Sci Inst, New York, NY USA
Keywords
DNA; INSTRUMENTS
DOI
10.1371/journal.pcbi.1009880
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Multi-task deep learning (DL) models can accurately predict diverse genomic marks from sequence, but whether these models learn the causal relationships between genomic marks is unknown. Here, we describe Deep Mendelian Randomization (DeepMR), a method for estimating causal relationships between genomic marks learned by genomic DL models. By combining Mendelian randomization with in silico mutagenesis, DeepMR obtains local (locus-specific) and global estimates of (an assumed) linear causal relationship between marks. In a simulation designed to test recovery of pairwise causal relations between transcription factors (TFs), DeepMR gives accurate and unbiased estimates of the 'true' global causal effect, but its coverage decays in the presence of sequence-dependent confounding. We then apply DeepMR to examine the global relationships learned by a state-of-the-art DL model, BPNet, between TFs involved in reprogramming. DeepMR's causal effect estimates validate previously hypothesized relationships between TFs and suggest new relationships for future investigation.

Author summary
Chromatin marks such as transcription factor (TF) binding, accessibility, and histone modifications play a critical role in controlling cell behavior and identity. In recent years, multi-task deep learning (DL) models have achieved remarkable success at predicting these and other chromatin marks. However, it is unclear to what extent these models learn meaningful mechanistic, even causal, relationships between these variables. Our work aims to fill this gap by combining in silico mutagenesis, deep learning uncertainty estimation, and causal inference (specifically Mendelian randomization, MR) into a framework we call DeepMR. We describe DeepMR, apply it to a simulation intended to test its ability to recover causal relationships between features from a learned model, and then use it to examine the relationships learned by a state-of-the-art DL model, BPNet. Our results suggest that DeepMR can estimate causal relationships under its stated assumptions and provide further evidence for previously hypothesized relationships between TFs identified by BPNet.
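
As a rough illustration of the idea in the abstract, the sketch below treats in silico mutations as instruments and estimates an assumed linear effect of one mark (the exposure) on another (the outcome). It is not the authors' implementation: the inverse-variance weighted (IVW) estimator is a standard Mendelian randomization estimator used here as a stand-in, and the names `ivw_estimate` and `toy_mutation_effects`, the simulated effect sizes, and the noise level are all hypothetical. In practice the per-mutation effects would come from a trained model's predictions on reference and mutated sequences rather than from a random number generator.

```python
import numpy as np


def ivw_estimate(beta_x, beta_y, se_y):
    """Inverse-variance weighted estimate of a linear causal effect.

    beta_x: per-mutation effect on the exposure mark
    beta_y: per-mutation effect on the outcome mark
    se_y:   standard error of each outcome effect
    Returns (estimate, standard error) for a regression through the origin.
    """
    beta_x = np.asarray(beta_x, dtype=float)
    beta_y = np.asarray(beta_y, dtype=float)
    weights = 1.0 / np.asarray(se_y, dtype=float) ** 2
    denom = np.sum(weights * beta_x ** 2)
    estimate = np.sum(weights * beta_x * beta_y) / denom
    std_err = np.sqrt(1.0 / denom)
    return estimate, std_err


def toy_mutation_effects(n_mutations, true_effect, rng):
    """Simulate mutation effects (hypothetical stand-in for model output)."""
    beta_x = rng.normal(0.0, 1.0, size=n_mutations)
    se_y = np.full(n_mutations, 0.1)
    # Outcome effect is the assumed linear causal effect plus noise.
    beta_y = true_effect * beta_x + rng.normal(0.0, se_y)
    return beta_x, beta_y, se_y


rng = np.random.default_rng(0)
beta_x, beta_y, se_y = toy_mutation_effects(200, 0.5, rng)
estimate, std_err = ivw_estimate(beta_x, beta_y, se_y)
print(f"estimated effect: {estimate:.3f} (SE {std_err:.3f})")
```

Running the estimator once per locus would give local estimates in the sense described above; pooling them into a global estimate would need an additional aggregation step that this sketch does not attempt.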
Pages: 14
Related papers
44 in total
  • [1] A review of uncertainty quantification in deep learning: Techniques, applications and challenges
    Abdar, Moloud
    Pourpanah, Farhad
    Hussain, Sadiq
    Rezazadegan, Dana
    Liu, Li
    Ghavamzadeh, Mohammad
    Fieguth, Paul
    Cao, Xiaochun
    Khosravi, Abbas
    Acharya, U. Rajendra
    Makarenkov, Vladimir
    Nahavandi, Saeid
    [J]. INFORMATION FUSION, 2021, 76 : 243 - 297
  • [2] Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning
    Alipanahi, Babak
    Delong, Andrew
    Weirauch, Matthew T.
    Frey, Brendan J.
    [J]. NATURE BIOTECHNOLOGY, 2015, 33 (08) : 831 - +
  • [3] 2-STAGE LEAST-SQUARES ESTIMATION OF AVERAGE CAUSAL EFFECTS IN MODELS WITH VARIABLE TREATMENT INTENSITY
    ANGRIST, JD
    IMBENS, GW
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1995, 90 (430) : 431 - 442
  • [4] Base-resolution models of transcription-factor binding reveal soft motif syntax
    Avsec, Ziga
    Weilert, Melanie
    Shrikumar, Avanti
    Krueger, Sabrina
    Alexandari, Amr
    Dalal, Khyati
    Fropf, Robin
    McAnany, Charles
    Gagneur, Julien
    Kundaje, Anshul
    Zeitlinger, Julia
    [J]. NATURE GENETICS, 2021, 53 (03) : 354 - +
  • [5] ISOTONIC REGRESSION PROBLEM AND ITS DUAL
    BARLOW, RE
    BRUNK, HD
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1972, 67 (337) : 140 - &
  • [6] High-resolution profiling of histone methylations in the human genome
    Barski, Artem
    Cuddapah, Suresh
    Cui, Kairong
    Roh, Tae-Young
    Schones, Dustin E.
    Wang, Zhibin
    Wei, Gang
    Chepelev, Iouri
    Zhao, Keji
    [J]. CELL, 2007, 129 (04) : 823 - 837
  • [7] Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression
    Bowden, Jack
    Smith, George Davey
    Burgess, Stephen
    [J]. INTERNATIONAL JOURNAL OF EPIDEMIOLOGY, 2015, 44 (02) : 512 - 525
  • [8] Brown B.C., 2020, bioRxiv
  • [9] Buenrostro Jason D, 2015, Curr Protoc Mol Biol, V109, DOI 10.1002/0471142727.mb2129s109
  • [10] A review of instrumental variable estimators for Mendelian randomization
    Burgess, Stephen
    Small, Dylan S.
    Thompson, Simon G.
    [J]. STATISTICAL METHODS IN MEDICAL RESEARCH, 2017, 26 (05) : 2333 - 2355