The evolution, evolvability and engineering of gene regulatory DNA

Cited by: 148
Authors
Vaishnav, Eeshit Dhaval [1 ,2 ]
de Boer, Carl G. [3 ,4 ]
Molinet, Jennifer [5 ,6 ]
Yassour, Moran [4 ,7 ,8 ]
Fan, Lin [2 ]
Adiconis, Xian [4 ,9 ]
Thompson, Dawn A. [2 ]
Levin, Joshua Z. [4 ,9 ]
Cubillos, Francisco A. [5 ,6 ]
Regev, Aviv [4 ,10 ,11 ]
Affiliations
[1] MIT, Cambridge, MA 02139 USA
[2] Broad Inst MIT & Harvard, Cambridge, MA 02142 USA
[3] Univ British Columbia, Sch Biomed Engn, Vancouver, BC, Canada
[4] Broad Inst MIT & Harvard, Klarman Cell Observ Broad, Cambridge, MA 02142 USA
[5] Univ Santiago Chile, Fac Quim & Biol, Dept Biol, Santiago, Chile
[6] Millennium Inst Integrat Biol iBio, ANID Millennium Sci Initiat Program, Santiago, Chile
[7] Hebrew Univ Jerusalem, Fac Med, Jerusalem, Israel
[8] Hebrew Univ Jerusalem, Rachel & Selim Benin Sch Comp Sci & Engn, Jerusalem, Israel
[9] Broad Inst MIT & Harvard, Stanley Ctr Psychiat Res, Cambridge, MA USA
[10] MIT, Dept Biol, Cambridge, MA 02139 USA
[11] Genentech Inc, San Francisco, CA 94080 USA
Keywords
FITNESS LANDSCAPES; EXPRESSION LEVELS; MESSENGER-RNA; TRANSCRIPTION; SELECTION; YEAST; ADAPTATION; ALIGNMENT; ELEMENTS; REVEALS;
DOI
10.1038/s41586-022-04506-6
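Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract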
Mutations in non-coding regulatory DNA sequences can alter gene expression, organismal phenotype and fitness (1,3). Constructing complete fitness landscapes, in which DNA sequences are mapped to fitness, is a long-standing goal in biology, but has remained elusive because it is challenging to generalize reliably to vast sequence spaces (4-6). Here we build sequence-to-expression models that capture fitness landscapes and use them to decipher principles of regulatory evolution. Using millions of randomly sampled promoter DNA sequences and their measured expression levels in the yeast Saccharomyces cerevisiae, we learn deep neural network models that generalize with excellent prediction performance, and enable sequence design for expression engineering. Using our models, we study expression divergence under genetic drift and strong-selection weak-mutation regimes to find that regulatory evolution is rapid and subject to diminishing returns epistasis; that conflicting expression objectives in different environments constrain expression adaptation; and that stabilizing selection on gene expression leads to the moderation of regulatory complexity. We present an approach for using such models to detect signatures of selection on expression from natural variation in regulatory sequences and use it to discover an instance of convergent regulatory evolution. We assess mutational robustness, finding that regulatory mutation effect sizes follow a power law, characterize regulatory evolvability, visualize promoter fitness landscapes, discover evolvability archetypes and illustrate the mutational robustness of natural regulatory sequence populations. Our work provides a general framework for designing regulatory sequences and addressing fundamental questions in regulatory evolution.
Pages: 455 / +
Page count: 29
Related papers
114 entries in total
[1]  
Abadi M, 2016, PROCEEDINGS OF OSDI'16: 12TH USENIX SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION, P265
[2]   The deadenylase components Not2p, Not3p, and Not5p promote mRNA decapping [J].
Alhusaini, Najwa ;
Coller, Jeff .
RNA, 2016, 22 (05) :709-721
[3]   Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning [J].
Alipanahi, Babak ;
Delong, Andrew ;
Weirauch, Matthew T. ;
Frey, Brendan J. .
NATURE BIOTECHNOLOGY, 2015, 33 (08) :831-+
[4]   powerlaw: A Python Package for Analysis of Heavy-Tailed Distributions [J].
Alstott, Jeff ;
Bullmore, Edward T. ;
Plenz, Dietmar .
PLOS ONE, 2014, 9 (01)
[5]  
[Anonymous], 2017, Convolutional Kitchen Sinks for Transcription Factor Binding Site Prediction. (Iid)
[6]   Effective gene expression prediction from sequence by integrating long-range interactions [J].
Avsec, Ziga ;
Agarwal, Vikram ;
Visentin, Daniel ;
Ledsam, Joseph R. ;
Grabska-Barwinska, Agnieszka ;
Taylor, Kyle R. ;
Assael, Yannis ;
Jumper, John ;
Kohli, Pushmeet ;
Kelley, David R. .
NATURE METHODS, 2021, 18 (10) :1196-+
[7]   Base-resolution models of transcription-factor binding reveal soft motif syntax [J].
Avsec, Ziga ;
Weilert, Melanie ;
Shrikumar, Avanti ;
Krueger, Sabrina ;
Alexandari, Amr ;
Dalal, Khyati ;
Fropf, Robin ;
McAnany, Charles ;
Gagneur, Julien ;
Kundaje, Anshul ;
Zeitlinger, Julia .
NATURE GENETICS, 2021, 53 (03) :354-+
[8]   The Kipoi repository accelerates community exchange and reuse of predictive models for genomics [J].
Avsec, Ziga ;
Kreuzhuber, Roman ;
Israeli, Johnny ;
Xu, Nancy ;
Cheng, Jun ;
Shrikumar, Avanti ;
Banerjee, Abhimanyu ;
Kim, Daniel S. ;
Beier, Thorsten ;
Urban, Lara ;
Kundaje, Anshul ;
Stegle, Oliver ;
Gagneur, Julien .
NATURE BIOTECHNOLOGY, 2019, 37 (06) :592-600
[9]   DREME: motif discovery in transcription factor ChIP-seq data [J].
Bailey, Timothy L. .
BIOINFORMATICS, 2011, 27 (12) :1653-1659
[10]   Divergent MLS1 Promoters Lie on a Fitness Plateau for Gene Expression [J].
Bergen, Andrew C. ;
Olsen, Gerilyn M. ;
Fay, Justin C. .
MOLECULAR BIOLOGY AND EVOLUTION, 2016, 33 (05) :1270-1279