Faster model-based estimation of ancestry proportions

Cited by: 0
Authors
Santander, Cindy G. [1]
Martinez, Alba Refoyo [2]
Meisner, Jonas [3,4]
Affiliations
[1] University of Copenhagen, Department of Biology, Copenhagen, Denmark
[2] University of Copenhagen, Center for Health Data Science, Copenhagen, Denmark
[3] Copenhagen University Hospital, Mental Health Center Copenhagen, Copenhagen, Denmark
[4] University of Copenhagen, Novo Nordisk Foundation Center for Basic Metabolic Research, Copenhagen, Denmark
Source
PEER COMMUNITY JOURNAL | 2024 / Vol. 4
Keywords
POPULATION-STRUCTURE; ADMIXTURE
DOI
10.24072/pcjournal.503
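Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences (general)]
Subject classification codes
07; 0710; 09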
Abstract
Ancestry estimation from genotype data in unrelated individuals has become an essential tool in population and medical genetics, both to understand demographic population histories and to model or correct for population structure. The ADMIXTURE software is a widely used model-based approach to account for population stratification; however, it struggles with convergence issues and does not scale to modern human datasets or to the large number of variants in whole-genome sequencing data. Likelihood-free approaches optimize a least-squares objective and have gained popularity in recent years due to their scalability. However, this comes at the cost of accuracy in the ancestry estimates in more complex admixture scenarios. We present a new model-based approach, fastmixture, which adopts aspects of likelihood-free approaches for parameter initialization, followed by a mini-batch expectation-maximization (EM) procedure that maximizes the standard likelihood. In a simulation study, we demonstrate that the model-based approaches of fastmixture and ADMIXTURE are significantly more accurate than recent and likelihood-free approaches. We further show that fastmixture runs approximately 30 times faster than ADMIXTURE on both simulated and empirical data from the 1000 Genomes Project, so that our model-based approach scales to much larger sample sizes than previously possible.
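The abstract describes fastmixture as running EM on the standard admixture likelihood. As a hedged illustration of what that likelihood is and what a single EM update does, the sketch below implements plain full-data EM for the binomial admixture model, in which genotype g_ij ~ Binomial(2, p_ij) with p_ij = sum_k q_ik f_kj. It is not fastmixture's implementation: it starts from random values instead of a likelihood-free initialization, uses every SNP in every step instead of mini-batches, and has no acceleration. All function names (loglik, em_step, fit, simulate_genotypes) and the EPS bound are hypothetical choices for this example.

```python
import math
import random

# Toy binomial admixture model (an illustrative sketch, not fastmixture's code).
#   G[i][j] in {0, 1, 2}: copies of the counted allele in individual i at SNP j
#   Q[i][k]: ancestry proportion of individual i from population k (rows sum to 1)
#   F[k][j]: frequency of the counted allele at SNP j in population k
#   p_ij = sum_k Q[i][k] * F[k][j] is the individual-specific allele frequency
EPS = 1e-6  # hypothetical bound keeping F inside [EPS, 1 - EPS] so logs stay finite


def normalize(w):
    s = sum(w)
    return [x / s for x in w]


def allele_freq(Q, F, i, j):
    return sum(Q[i][k] * F[k][j] for k in range(len(F)))


def loglik(G, Q, F):
    """Binomial log-likelihood, dropping the constant log C(2, g) terms."""
    ll = 0.0
    for i, row in enumerate(G):
        for j, g in enumerate(row):
            p = allele_freq(Q, F, i, j)
            ll += g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return ll


def em_step(G, Q, F):
    """One full-data EM update of Q and F for the binomial admixture model."""
    N, M, K = len(G), len(G[0]), len(F)
    q_num = [[0.0] * K for _ in range(N)]
    a_sum = [[0.0] * M for _ in range(K)]  # expected counted alleles from pop k
    b_sum = [[0.0] * M for _ in range(K)]  # expected other alleles from pop k
    for i in range(N):
        for j in range(M):
            g = G[i][j]
            p = allele_freq(Q, F, i, j)
            for k in range(K):
                # E-step: split the observed allele counts across populations
                a = g * Q[i][k] * F[k][j] / p
                b = (2 - g) * Q[i][k] * (1.0 - F[k][j]) / (1.0 - p)
                q_num[i][k] += a + b
                a_sum[k][j] += a
                b_sum[k][j] += b
    # M-step: each SNP contributes 2 alleles per individual, so Q rows sum to 1
    new_Q = [[x / (2 * M) for x in row] for row in q_num]
    new_F = [[min(max(a_sum[k][j] / (a_sum[k][j] + b_sum[k][j]), EPS), 1 - EPS)
              for j in range(M)] for k in range(K)]
    return new_Q, new_F


def simulate_genotypes(N, M, K, seed=1):
    """Draw G from the same model with random 'true' Q and F (toy data only)."""
    rng = random.Random(seed)
    F = [[rng.uniform(0.05, 0.95) for _ in range(M)] for _ in range(K)]
    Q = [normalize([rng.random() for _ in range(K)]) for _ in range(N)]
    return [[sum(rng.random() < allele_freq(Q, F, i, j) for _ in range(2))
             for j in range(M)] for i in range(N)]


def fit(G, K, n_iter=50, seed=0):
    """Random start, then n_iter EM steps; returns Q, F and the log-likelihood trace."""
    rng = random.Random(seed)
    N, M = len(G), len(G[0])
    Q = [normalize([rng.random() + 0.1 for _ in range(K)]) for _ in range(N)]
    F = [[rng.uniform(0.2, 0.8) for _ in range(M)] for _ in range(K)]
    trace = [loglik(G, Q, F)]
    for _ in range(n_iter):
        Q, F = em_step(G, Q, F)
        trace.append(loglik(G, Q, F))
    return Q, F, trace


if __name__ == "__main__":
    G = simulate_genotypes(N=20, M=200, K=2)
    Q, F, trace = fit(G, K=2, n_iter=50)
    print(f"log-likelihood: {trace[0]:.1f} -> {trace[-1]:.1f}")
```

Because each EM step cannot decrease the likelihood, the trace should be non-decreasing. Note also that the F update for SNP j only needs column j of G, which is one reason processing SNPs in batches is a natural way to scale this kind of update; how fastmixture actually forms and schedules its mini-batches is not shown here.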
Pages: 15
Related papers
29 in total
  • [1] Fast model-based estimation of ancestry in unrelated individuals
    Alexander, David H.
    Novembre, John
    Lange, Kenneth
    [J]. GENOME RESEARCH, 2009, 19 (09) : 1655 - 1664
  • [2] A global reference for human genetic variation
    Auton, A.
    [J]. NATURE, 2015, 526 : 68 - 74, DOI 10.1038/nature15393
  • [3] Efficient ancestry and mutation simulation with msprime 1.0
    Baumdicker, Franz
    Bisschop, Gertjan
    Goldstein, Daniel
    Gower, Graham
    Ragsdale, Aaron P.
    Tsambos, Georgia
    Zhu, Sha
    Eldon, Bjarki
    Ellerman, E. Castedo
    Galloway, Jared G.
    Gladstein, Ariella L.
    Gorjanc, Gregor
    Guo, Bing
    Jeffery, Ben
    Kretzschumar, Warren W.
    Lohse, Konrad
    Matschiner, Michael
    Nelson, Dominic
    Pope, Nathaniel S.
    Quinto-Cortes, Consuelo D.
    Rodrigues, Murillo F.
    Saunack, Kumar
    Sellinger, Thibaut
    Thornton, Kevin
    van Kemenade, Hugo
    Wohns, Anthony W.
    Wong, Yan
    Gravel, Simon
    Kern, Andrew D.
    Koskela, Jere
    Ralph, Peter L.
    Kelleher, Jerome
    [J]. GENETICS, 2022, 220 (03)
  • [4] Ancestry-specific recent effective population size in the Americas
    Browning, Sharon R.
    Browning, Brian L.
    Daviglus, Martha L.
    Durazo-Arvizu, Ramon A.
    Schneiderman, Neil
    Kaplan, Robert C.
    Laurie, Cathy C.
    [J]. PLOS GENETICS, 2018, 14 (05)
  • [5] A Likelihood-Free Estimator of Population Structure Bridging Admixture Models and Principal Components Analysis
    Cabreros, Irineo
    Storey, John D.
    [J]. GENETICS, 2019, 212 (04) : 1009 - 1029
  • [6] Second-generation PLINK: rising to the challenge of larger and richer datasets
    Chang, Christopher C.
    Chow, Carson C.
    Tellier, Laurent C. A. M.
    Vattikuti, Shashaank
    Purcell, Shaun M.
    Lee, James J.
    [J]. GIGASCIENCE, 2015, 4
  • [7] Inferring population structure in biobank-scale genomic data
    Chiu, Alec M.
    Molloy, Erin K.
    Tan, Zilong
    Talwalkar, Ameet
    Sankararaman, Sriram
    [J]. AMERICAN JOURNAL OF HUMAN GENETICS, 2022, 109 (04) : 727 - 737
  • [8] Analysis of Population Structure: A Unifying Framework and Novel Methods Based on Sparse Factor Analysis
    Engelhardt, Barbara E.
    Stephens, Matthew
    [J]. PLOS GENETICS, 2010, 6 (09)
  • [9] A second generation human haplotype map of over 3.1 million SNPs
    Frazer, Kelly A.
    Ballinger, Dennis G.
    Cox, David R.
    Hinds, David A.
    Stuve, Laura L.
    Gibbs, Richard A.
    Belmont, John W.
    Boudreau, Andrew
    Hardenbol, Paul
    Leal, Suzanne M.
    Pasternak, Shiran
    Wheeler, David A.
    Willis, Thomas D.
    Yu, Fuli
    Yang, Huanming
    Zeng, Changqing
    Gao, Yang
    Hu, Haoran
    Hu, Weitao
    Li, Chaohua
    Lin, Wei
    Liu, Siqi
    Pan, Hao
    Tang, Xiaoli
    Wang, Jian
    Wang, Wei
    Yu, Jun
    Zhang, Bo
    Zhang, Qingrun
    Zhao, Hongbin
    Zhao, Hui
    Zhou, Jun
    Gabriel, Stacey B.
    Barry, Rachel
    Blumenstiel, Brendan
    Camargo, Amy
    Defelice, Matthew
    Faggart, Maura
    Goyette, Mary
    Gupta, Supriya
    Moore, Jamie
    Nguyen, Huy
    Onofrio, Robert C.
    Parkin, Melissa
    Roy, Jessica
    Stahl, Erich
    Winchester, Ellen
    Ziaugra, Liuda
    Altshuler, David
    Shen, Yan
    [J]. NATURE, 2007, 449 (7164) : 851 - 861
  • [10] Fast and Efficient Estimation of Individual Ancestry Coefficients
    Frichot, Eric
    Mathieu, Francois
    Trouillon, Theo
    Bouchard, Guillaume
    Francois, Olivier
    [J]. GENETICS, 2014, 196 (04) : 973 - 983