Haplotype and population structure inference using neural networks in whole-genome sequencing data

Cited by: 12
Authors
Meisner, Jonas [1]
Albrechtsen, Anders [1]
Affiliation
[1] Univ Copenhagen, Bioinformat Ctr, Dept Biol, DK-2200 Copenhagen, Denmark
Keywords
INDIVIDUAL ADMIXTURE;
DOI
10.1101/gr.276813.122
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Accurate inference of population structure is important in many studies of population genetics. Here we present HaploNet, a method for performing dimensionality reduction and clustering of genetic data. The method clusters phased haplotypes locally along the genome using neural networks and applies to whole-genome sequencing or dense genotype data. By using Gaussian mixtures in a variational autoencoder framework, we learn a low-dimensional latent space in which we cluster haplotypes along the genome in a highly scalable manner. By exploiting the generative properties of our framework, we show that the haplotype clusters in the latent space can be used to infer global population structure from haplotype information. From the fitted neural networks and their latent haplotype clusters, we can perform principal component analysis and estimate ancestry proportions in a maximum likelihood framework. Using sequencing data from simulations and from closely related human populations, we show that our approach distinguishes closely related populations better than standard admixture and principal component analysis software. We further show that HaploNet is fast and highly scalable by applying it to genotype array data from the UK Biobank.
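The abstract states that ancestry proportions are estimated by maximum likelihood from the latent haplotype clusters but does not spell out the estimator. The Python sketch below shows one standard way to set one up: an ADMIXTURE-style expectation-maximization (EM) algorithm over haplotype cluster labels. It is not HaploNet's implementation. It assumes each haplotype has already been assigned a single cluster per window (for example, its most probable Gaussian-mixture component), whereas a fuller treatment would use the network's soft cluster likelihoods. The functions `em_admixture` and `simulate`, the per-window frequency table `F`, and the toy simulation settings are all hypothetical.

```python
import math
import random


def em_admixture(obs, K, C, n_iter=100, seed=0):
    """Hypothetical ADMIXTURE-style EM over hard haplotype cluster labels.

    obs[i][h][w] is the cluster (0..C-1) assigned to haplotype h of
    individual i in window w. Assumed model:
        P(cluster c) = sum_k Q[i][k] * F[w][k][c]
    Returns ancestry proportions Q, per-window cluster frequencies F and the
    log-likelihood evaluated at the start of each iteration.
    """
    rng = random.Random(seed)
    N, H, W = len(obs), len(obs[0]), len(obs[0][0])
    # Random starting Q; uniform starting F so the first M-step derives F from Q.
    Q = []
    for _ in range(N):
        v = [rng.random() + 1e-3 for _ in range(K)]
        Q.append([x / sum(v) for x in v])
    F = [[[1.0 / C] * C for _ in range(K)] for _ in range(W)]
    trace = []
    for _ in range(n_iter):
        q_num = [[0.0] * K for _ in range(N)]
        f_num = [[[0.0] * C for _ in range(K)] for _ in range(W)]
        loglik = 0.0
        for i in range(N):
            for h in range(H):
                for w in range(W):
                    c = obs[i][h][w]
                    # E-step: posterior over the ancestry that produced cluster c.
                    p = [Q[i][k] * F[w][k][c] for k in range(K)]
                    s = sum(p)
                    loglik += math.log(s)
                    for k in range(K):
                        q_num[i][k] += p[k] / s
                        f_num[w][k][c] += p[k] / s
        trace.append(loglik)
        # M-step: Q is each individual's mean posterior; F is renormalized per (w, k).
        Q = [[x / (H * W) for x in row] for row in q_num]
        F = [[[x / sum(row) for x in row] for row in win] for win in f_num]
    return Q, F, trace


def simulate(n_per_pop=10, W=100, C=4, seed=1):
    """Toy data: two unadmixed populations favoring different clusters per window."""
    rng = random.Random(seed)
    obs, labels = [], []
    for pop in (0, 1):
        for _ in range(n_per_pop):
            haps = []
            for _ in range(2):  # two haplotypes per individual
                row = []
                for w in range(W):
                    fav = (w + pop) % C
                    weights = [0.85 if c == fav else 0.15 / (C - 1) for c in range(C)]
                    row.append(rng.choices(range(C), weights=weights)[0])
                haps.append(row)
            obs.append(haps)
            labels.append(pop)
    return obs, labels


obs, labels = simulate()
Q, F, trace = em_admixture(obs, K=2, C=4)
for pop in (0, 1):
    q0 = [q[0] for q, lab in zip(Q, labels) if lab == pop]
    print(f"population {pop}: mean Q[:, 0] = {sum(q0) / len(q0):.3f}")
```

Ancestry components are only defined up to relabeling, so on this toy data the printed means should place one population near 1 and the other near 0, in either order. Starting `F` from uniform frequencies means the first M-step builds every window's frequencies from the same random `Q`, which keeps component labels consistent across windows; drawing independent random starting frequencies per window can instead leave EM stuck in a local optimum where the labels disagree between windows.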
Pages: 1542-1552
Page count: 11