Deciphering regulatory DNA sequences and noncoding genetic variants using neural network models of massively parallel reporter assays

Cited by: 48
Authors
Movva, Rajiv [1, 2]
Greenside, Peyton [3]
Marinov, Georgi K. [2]
Nair, Surag [4]
Shrikumar, Avanti [4]
Kundaje, Anshul [2, 4]
Affiliations
[1] Harker Sch, San Jose, CA 95129 USA
[2] Stanford Univ, Dept Genet, Stanford, CA 94305 USA
[3] Stanford Univ, Biomed Informat Training Program, Stanford, CA 94305 USA
[4] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
Source
PLOS ONE | 2019 / Vol. 14 / Issue 06
Keywords
POLYUNSATURATED FATTY-ACIDS; ENHANCER ACTIVITY MAPS; TRANSCRIPTION FACTORS; SYSTEMATIC DISSECTION; FUNCTIONAL DISSECTION; ELEMENTS;
DOI
10.1371/journal.pone.0218073
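Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code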
07 ; 0710 ; 09 ;
Abstract
The relationship between noncoding DNA sequence and gene expression is not well understood. Massively parallel reporter assays (MPRAs), which quantify the regulatory activity of large libraries of DNA sequences in parallel, are a powerful approach to characterize this relationship. We present MPRA-DragoNN, a convolutional neural network (CNN)-based framework to predict and interpret the regulatory activity of DNA sequences as measured by MPRAs. While our method is generally applicable to a variety of MPRA designs, here we trained our model on the Sharpr-MPRA dataset, which measures the activity of ~500,000 constructs tiling 15,720 regulatory regions in human K562 and HepG2 cell lines. MPRA-DragoNN predictions were moderately correlated (Spearman ρ = 0.28) with measured activity and were within range of replicate concordance of the assay. State-of-the-art model interpretation methods revealed high-resolution predictive regulatory sequence features that overlapped transcription factor (TF) binding motifs. We used the model to investigate the cell type and chromatin state preferences of predictive TF motifs. We explored the ability of our model to predict the allelic effects of regulatory variants in an independent MPRA experiment and to fine-map putative functional SNPs in loci associated with lipid traits. Our results suggest that interpretable deep learning models trained on MPRA data have the potential to reveal meaningful patterns in regulatory DNA sequences and prioritize regulatory genetic variants, especially as larger, higher-quality datasets are produced.
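The abstract reports agreement between predicted and measured activity as a Spearman rank correlation. As a minimal sketch of how that statistic is computed (not the authors' evaluation code), the Python below ranks both lists and takes the Pearson correlation of the ranks. The function names and the toy `measured` and `predicted` values are illustrative assumptions, not data from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy values standing in for measured and predicted activity.
measured = [0.1, 0.9, 0.4, 0.7, 0.2]
predicted = [0.3, 0.5, 0.6, 0.4, 0.1]
rho = spearman_rho(predicted, measured)
```

Because the statistic depends only on rank order, it rewards a model for ordering constructs correctly even when the predicted magnitudes are off, which suits noisy assay readouts.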
Pages: 20