Haplotype and population structure inference using neural networks in whole-genome sequencing data

Cited by: 12
Authors
Meisner, Jonas [1]
Albrechtsen, Anders [1]
Affiliations
[1] Univ Copenhagen, Bioinformat Ctr, Dept Biol, DK-2200 Copenhagen, Denmark
Keywords
INDIVIDUAL ADMIXTURE
DOI
10.1101/gr.276813.122
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Accurate inference of population structure is important in many studies of population genetics. Here we present HaploNet, a method for performing dimensionality reduction and clustering of genetic data. The method is based on local clustering of phased haplotypes using neural networks from whole-genome sequencing or dense genotype data. By using Gaussian mixtures in a variational autoencoder framework, we are able to learn a low-dimensional latent space in which we cluster haplotypes along the genome in a highly scalable manner. We show that we can use haplotype clusters in the latent space to infer global population structure using haplotype information by exploiting the generative properties of our framework. Based on fitted neural networks and their latent haplotype clusters, we can perform principal component analysis and estimate ancestry proportions based on a maximum likelihood framework. Using sequencing data from simulations and closely related human populations, we show that our approach is better at distinguishing closely related populations than standard admixture and principal component analysis software. We further show that HaploNet is fast and highly scalable by applying it to genotype array data of the UK Biobank.
Pages: 1542-1552
Number of pages: 11