Learning Retention Mechanisms and Evolutionary Parameters of Duplicate Genes from Their Expression Data

Cited by: 6
Authors
DeGiorgio, Michael [1, 2]
Assis, Raquel [1, 2]
Affiliations
[1] Florida Atlantic Univ, Dept Comp & Elect Engn & Comp Sci, Boca Raton, FL 33431 USA
[2] Florida Atlantic Univ, Inst Human Hlth & Dis Intervent, Boca Raton, FL 33431 USA
Funding
National Institutes of Health (USA); National Science Foundation (USA)
Keywords
gene duplication; neofunctionalization; subfunctionalization; Ornstein-Uhlenbeck; neural network; GENOME DUPLICATION; POSITIVE SELECTION; TRANSITION-STATE; SMALL-SCALE; DROSOPHILA; RATES; NEOFUNCTIONALIZATION; DIVERGENCE; PROTEINS; SUBFUNCTIONALIZATION;
DOI
10.1093/molbev/msaa267
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010 ; 081704 ;
Abstract
Learning about the roles that duplicate genes play in the origins of novel phenotypes requires an understanding of how their functions evolve. A previous method for achieving this goal, CDROM, employs gene expression distances as proxies for functional divergence and then classifies the evolutionary mechanisms retaining duplicate genes from comparisons of these distances in a decision tree framework. However, CDROM does not account for stochastic shifts in gene expression or leverage advances in contemporary statistical learning for performing classification, nor is it capable of predicting the parameters driving duplicate gene evolution. Thus, here we develop CLOUD, a multi-layer neural network built on a model of gene expression evolution that can both classify duplicate gene retention mechanisms and predict their underlying evolutionary parameters. We show that not only is the CLOUD classifier substantially more powerful and accurate than CDROM, but that it also yields accurate parameter predictions, enabling a better understanding of the specific forces driving the evolution and long-term retention of duplicate genes. Further, application of the CLOUD classifier and predictor to empirical data from Drosophila recapitulates many previous findings about gene duplication in this lineage, showing that new functions often emerge rapidly and asymmetrically in younger duplicate gene copies, and that functional divergence is driven by strong natural selection. Hence, CLOUD represents a major advancement in classifying retention mechanisms and predicting evolutionary parameters of duplicate genes, thereby highlighting the utility of incorporating sophisticated statistical learning techniques to address long-standing questions about evolution after gene duplication.
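As a rough illustration of the distance-comparison idea the abstract attributes to CDROM, the sketch below classifies one duplicate pair by comparing Euclidean expression distances against a cutoff. The abstract states only that expression distances are compared in a decision tree framework; the specific rules, the class labels "conservation" and "specialization", the function names, and the single `cutoff` parameter are all assumptions made for illustration, not the published method. It does not attempt to reproduce CLOUD, which the abstract describes as a neural network built on a model of gene expression evolution.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def classify_duplicate(parent, child, ancestral, cutoff):
    """Assign a retention mechanism to one duplicate pair (hypothetical rules).

    parent, child: expression profiles of the two duplicate copies.
    ancestral: profile of the single-copy ortholog, used as a proxy
        for the pre-duplication state (an assumption of this sketch).
    cutoff: distance above which two profiles count as diverged
        (an assumed free parameter).
    """
    # Combined expression of both copies, compared against the ancestor.
    combined = [p + c for p, c in zip(parent, child)]

    d_parent = euclidean(parent, ancestral)
    d_child = euclidean(child, ancestral)
    d_combined = euclidean(combined, ancestral)

    if d_parent <= cutoff and d_child <= cutoff:
        return "conservation"
    if d_parent > cutoff and d_child <= cutoff:
        return "neofunctionalization (parent)"
    if d_parent <= cutoff and d_child > cutoff:
        return "neofunctionalization (child)"
    # Both copies diverged: check whether together they recover the ancestor.
    if d_combined <= cutoff:
        return "subfunctionalization"
    return "specialization"
```

For example, with a two-tissue ancestral profile `[1, 1]`, a parent copy `[1, 0]`, and a child copy `[0, 1]`, each copy alone has diverged from the ancestor but their sum matches it, so this sketch labels the pair as subfunctionalization. A hard cutoff of this kind ignores stochastic shifts in expression, which is the limitation the abstract raises.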
Pages: 1209-1224
Page count: 16