Faster model-based estimation of ancestry proportions

Times cited: 0
Authors
Santander, Cindy G. [1]
Martinez, Alba Refoyo [2]
Meisner, Jonas [3, 4]
Affiliations
[1] Univ Copenhagen, Dept Biol, Copenhagen, Denmark
[2] Univ Copenhagen, Ctr Hlth Data Sci, Copenhagen, Denmark
[3] Copenhagen Univ Hosp, Mental Hlth Ctr Copenhagen, Copenhagen, Denmark
[4] Univ Copenhagen, Novo Nordisk Fdn, Ctr Basic Metab Res, Copenhagen, Denmark
Source
PEER COMMUNITY JOURNAL | 2024 / Volume 4
Keywords
POPULATION-STRUCTURE; ADMIXTURE
DOI
10.24072/pcjournal.503
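Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract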
Ancestry estimation from genotype data in unrelated individuals has become an essential tool in population and medical genetics, both to understand demographic population histories and to model or correct for population structure. The ADMIXTURE software is a widely used model-based approach to account for population stratification. However, it struggles with convergence issues and does not scale to modern human datasets or to the large number of variants in whole-genome sequencing data. Likelihood-free approaches optimize a least-squares objective and have gained popularity in recent years due to their scalability, but this comes at the cost of accuracy in the ancestry estimates in more complex admixture scenarios. We present a new model-based approach, fastmixture, which adopts aspects of likelihood-free approaches for parameter initialization, followed by a mini-batch expectation-maximization procedure to model the standard likelihood. In a simulation study, we demonstrate that the model-based approaches of fastmixture and ADMIXTURE are significantly more accurate than recent and likelihood-free approaches. We further show that fastmixture runs approximately 30× faster than ADMIXTURE on both simulated data and empirical data from the 1000 Genomes Project, such that our model-based approach scales to much larger sample sizes than previously possible.
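To make the idea of a mini-batch expectation-maximization procedure on the standard admixture likelihood more concrete, the following is a minimal NumPy sketch. It is not the fastmixture implementation. It assumes the usual binomial admixture model, in which genotype g_ij in {0, 1, 2} has expected allele frequency h_ij = sum_k q_ik p_jk, and it applies the classical EM updates to random subsets of variants. The function names (`log_likelihood`, `em_step`, `minibatch_em`), the random initialization, and all parameter defaults are illustrative assumptions. In particular, the abstract describes an initialization borrowed from likelihood-free approaches, which this sketch replaces with random draws.

```python
import numpy as np


def log_likelihood(G, Q, P):
    """Binomial admixture log-likelihood (up to a constant).

    G: (N, M) genotypes in {0, 1, 2}; Q: (N, K) ancestry proportions;
    P: (M, K) ancestral allele frequencies.
    """
    H = Q @ P.T  # expected individual allele frequencies, shape (N, M)
    return float(np.sum(G * np.log(H) + (2 - G) * np.log1p(-H)))


def em_step(G, Q, P, eps=1e-5):
    """One classical EM update of Q and P using the variants in G."""
    M = G.shape[1]
    H = Q @ P.T
    A = G / H              # weights for the counted allele
    B = (2 - G) / (1 - H)  # weights for the other allele
    # Q update: average posterior ancestry assignment over 2*M alleles.
    Q_new = Q * (A @ P + B @ (1 - P)) / (2 * M)
    # P update: expected counted-allele share within each ancestry.
    num = P * (A.T @ Q)
    den = num + (1 - P) * (B.T @ Q)
    P_new = np.clip(num / den, eps, 1 - eps)  # keep frequencies off 0 and 1
    return Q_new, P_new


def minibatch_em(G, K, n_batches=4, n_epochs=20, seed=0):
    """Illustrative mini-batch EM: each update sees one random subset of variants."""
    rng = np.random.default_rng(seed)
    N, M = G.shape
    # Assumption: random initialization, used here only to keep the sketch short.
    Q = rng.dirichlet(np.ones(K), size=N)
    P = rng.uniform(0.1, 0.9, size=(M, K))
    for _ in range(n_epochs):
        for batch in np.array_split(rng.permutation(M), n_batches):
            Q, P[batch] = em_step(G[:, batch], Q, P[batch])
    return Q, P
```

Calling `minibatch_em(G, K=3)` on an integer genotype matrix returns the estimated `Q` and `P`. In this sketch each mini-batch update refreshes Q from only a subset of variants, so the full-data likelihood is not guaranteed to increase at every step; a practical scheme would typically finish with full-batch updates.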
Pages: 15