Automatic inference of demographic parameters using generative adversarial networks

Cited by: 36
Authors
Wang, Zhanpeng [1]
Wang, Jiaping [1]
Kourakos, Michael [2]
Hoang, Nhung [2]
Lee, Hyong Hark [2]
Mathieson, Iain [3]
Mathieson, Sara [1]
Affiliations
[1] Haverford Coll, Dept Comp Sci, Haverford, PA 19041 USA
[2] Swarthmore Coll, Dept Comp Sci, Swarthmore, PA 19081 USA
[3] Univ Penn, Dept Genet, Philadelphia, PA 19104 USA
Funding
US National Institutes of Health (NIH)
Keywords
demographic inference; evolutionary modelling; generative adversarial network; simulated data; NATURAL-SELECTION; RECOMBINATION; LANDSCAPE; SAMPLES; MODEL; SITE;
DOI
10.1111/1755-0998.13386
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Population genetics relies heavily on simulated data for validation, inference and intuition. In particular, since the evolutionary 'ground truth' for real data is always limited, simulated data are crucial for training supervised machine learning methods. Simulation software can accurately model evolutionary processes but requires many hand-selected input parameters. As a result, simulated data often fail to mirror the properties of real genetic data, which limits the scope of methods that rely on them. Here, we develop a novel approach to estimating parameters in population genetic models that automatically adapts to data from any population. Our method, pg-gan, is based on a generative adversarial network that gradually learns to generate realistic synthetic data. We demonstrate that our method is able to recover input parameters in a simulated isolation-with-migration model. We then apply our method to human data from the 1000 Genomes Project and show that we can accurately recapitulate the features of real data.
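The idea in the abstract, a simulator whose parameters are tuned until a discriminator can no longer tell simulated data from real data, can be illustrated with a toy sketch. This is not the pg-gan implementation: the one-parameter Gaussian "simulator", the logistic-regression discriminator, the accept/reject proposal scheme, and every function name below are assumptions made for illustration only.

```python
import math
import random


def simulate(theta, n, seed):
    """Toy 'simulator' (hypothetical): n draws from Normal(theta, 1).
    Stands in for a population-genetic simulator with one unknown parameter."""
    rng = random.Random(seed)
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def train_discriminator(real, fake, steps=25, lr=0.1):
    """Fit D(x) = sigmoid(w0 + w1 * x) by gradient descent on the logistic
    loss, with label 1 for real data and label 0 for simulated data."""
    w0, w1 = 0.0, 0.0
    data = [(x, 1.0) for x in real] + [(x, 0.0) for x in fake]
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in data:
            p = math.exp(log_sigmoid(w0 + w1 * x))
            g0 += p - y
            g1 += (p - y) * x
        w0 -= lr * g0 / len(data)
        w1 -= lr * g1 / len(data)
    return w0, w1


def generator_loss(weights, fake):
    """Mean of -log D(x) over simulated data: low when D takes it for real."""
    w0, w1 = weights
    return -sum(log_sigmoid(w0 + w1 * x) for x in fake) / len(fake)


def fit_parameter(real, theta=0.0, iterations=60, n=200, seed=0):
    """Alternate discriminator training with accept/reject parameter proposals."""
    rng = random.Random(seed)
    for t in range(iterations):
        # Same seed for current and proposed simulations (common random
        # numbers), so the comparison reflects the parameter change only.
        sim_seed = rng.randrange(2**32)
        fake = simulate(theta, n, sim_seed)
        weights = train_discriminator(real, fake)
        step = 0.05 + 0.5 * (1.0 - t / iterations)  # shrinking proposal width
        candidate = theta + rng.gauss(0.0, step)
        candidate_fake = simulate(candidate, n, sim_seed)
        # Keep the proposal only if it fools the current discriminator better.
        if generator_loss(weights, candidate_fake) < generator_loss(weights, fake):
            theta = candidate
    return theta


if __name__ == "__main__":
    observed = simulate(2.0, 200, seed=1)  # pretend 'real' data, true theta = 2
    print(round(fit_parameter(observed), 2))
```

In this sketch the estimate drifts from its starting value of 0 toward the parameter that generated the observed data, because each retrained discriminator rewards proposals that make the simulated sample look more like the real one. A realistic setting would replace the toy simulator with a population-genetic simulator and the logistic regression with a neural network.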
Pages: 2689-2705
Number of pages: 17