Inferring Population Structure and Admixture Proportions in Low-Depth NGS Data

Cited by: 379
Authors
Meisner, Jonas [1]
Albrechtsen, Anders [1]
Affiliations
[1] Univ Copenhagen, Dept Biol, Bioinformat Ctr, Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark
Keywords
Population structure; PCA; admixture; ancestry; next-generation sequencing; genotype likelihoods; low depth; INDIVIDUAL ADMIXTURE; MATRIX; STRATIFICATION; GENOTYPE; COMPONENTS; ALGORITHMS; ANCESTRY;
DOI
10.1534/genetics.118.301336
Chinese Library Classification number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
We here present two methods for inferring population structure and admixture proportions in low-depth next-generation sequencing (NGS) data. Inference of population structure is essential in both population genetics and association studies, and is often performed using principal component analysis (PCA) or clustering-based approaches. NGS methods provide large amounts of genetic data but are associated with statistical uncertainty, especially for low-depth sequencing data. Models can account for this uncertainty by working directly on genotype likelihoods of the unobserved genotypes. We propose a method for inferring population structure through PCA in an iterative heuristic approach of estimating individual allele frequencies, where we demonstrate improved accuracy in samples with low and variable sequencing depth for both simulated and real datasets. We also use the estimated individual allele frequencies in a fast non-negative matrix factorization method to estimate admixture proportions. Both methods have been implemented in the PCAngsd framework available at http://www.popgen.dk/software/.
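To make concrete what "working directly on genotype likelihoods" can mean, the sketch below computes posterior expected genotypes (dosages) from genotype likelihoods and allele frequencies. It is an illustration under stated assumptions, not the PCAngsd implementation: the function name `expected_genotypes` and the array layout are hypothetical, a Hardy-Weinberg prior is assumed, and allele frequencies are assumed to lie strictly between 0 and 1.

```python
import numpy as np

def expected_genotypes(gl, freq):
    """Posterior expected genotype per individual and site.

    gl   : array of shape (n_ind, n_sites, 3), likelihoods P(data | G = g)
           for g = 0, 1, 2 copies of the allele (layout is an assumption).
    freq : array of shape (n_sites,) or (n_ind, n_sites), allele frequencies
           used in the prior (population-level or individual-specific).
    """
    gl = np.asarray(gl, dtype=float)
    f = np.broadcast_to(np.asarray(freq, dtype=float), gl.shape[:2])
    # Hardy-Weinberg prior on the genotype given the allele frequency (assumed).
    prior = np.stack([(1 - f) ** 2, 2 * f * (1 - f), f ** 2], axis=-1)
    # Bayes' rule: posterior is proportional to likelihood times prior.
    post = gl * prior
    post /= post.sum(axis=-1, keepdims=True)
    # Expected genotype = sum over g of g * P(G = g | data).
    return post @ np.array([0.0, 1.0, 2.0])
```

With uninformative likelihoods the result falls back to the prior mean 2f, while a confidently observed genotype dominates the prior. Passing a per-individual frequency matrix instead of a per-site vector is how individual allele frequencies, as described in the abstract, could enter such a prior.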
Pages: 719-731
Number of pages: 13