Hold out the genome: a roadmap to solving the cis-regulatory code

Cited by: 25
Authors
de Boer, Carl G. [1]
Taipale, Jussi [2, 3, 4]
Affiliations
[1] Univ British Columbia, Sch Biomed Engn, Vancouver, BC, Canada
[2] Univ Helsinki, Fac Med, Appl Tumor Genom Res Program, Helsinki, Finland
[3] Karolinska Inst, Dept Med Biochem & Biophys, Stockholm, Sweden
[4] Univ Cambridge, Dept Biochem, Cambridge, England
Keywords
ENHANCER ACTIVITY MAPS; TRANSCRIPTION FACTORS; SHADOW ENHANCERS; GENE; SEQUENCE; BINDING; EVOLUTION; EXPRESSION; ELEMENTS; MODEL
DOI
10.1038/s41586-023-06661-w
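Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract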
Gene expression is regulated by transcription factors that work together to read cis-regulatory DNA sequences. The 'cis-regulatory code' - how cells interpret DNA sequences to determine when, where and how much genes should be expressed - has proven to be exceedingly complex. Recently, advances in the scale and resolution of functional genomics assays and machine learning have enabled substantial progress towards deciphering this code. However, the cis-regulatory code will probably never be solved if models are trained only on genomic sequences; regions of homology can easily lead to overestimation of predictive performance, and our genome is too short and has insufficient sequence diversity to learn all relevant parameters. Fortunately, randomly synthesized DNA sequences enable testing a far larger sequence space than exists in our genomes, and designed DNA sequences enable targeted queries to maximally improve the models. As the same biochemical principles are used to interpret DNA regardless of its source, models trained on these synthetic data can predict genomic activity, often better than genome-trained models. Here we provide an outlook on the field, and propose a roadmap towards solving the cis-regulatory code by a combination of machine learning and massively parallel assays using synthetic DNA.
Pages: 41-50
Number of pages: 10