Blockwise HMM computation for large-scale population genomic inference

Cited by: 10
Authors
Paul, Joshua S. [1]
Song, Yun S. [1,2]
Affiliations
[1] Univ Calif Berkeley, Div Comp Sci, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
Funding
U.S. National Science Foundation;
Keywords
GENE CONVERSION RATES; RECOMBINATION RATES; SAMPLING DISTRIBUTIONS; LINKAGE DISEQUILIBRIUM; COALESCENT HISTORIES; POLYMORPHISM DATA; GENOTYPE DATA; ASSOCIATION; IMPUTATION; HOTSPOTS;
DOI
10.1093/bioinformatics/bts314
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: A promising class of methods for large-scale population genomic inference uses the conditional sampling distribution (CSD), which approximates the probability of sampling an individual with a particular DNA sequence, given that a collection of sequences from the population has already been observed. The CSD has a wide range of applications, including imputing missing sequence data, estimating recombination rates, inferring human colonization history and identifying tracts of distinct ancestry in admixed populations. Most CSDs in common use are based on hidden Markov models (HMMs). Although computationally efficient in principle, methods built on the common implementation of the relevant HMM techniques remain intractable for large genomic datasets.
Results: To address this issue, a set of algorithmic improvements for performing the exact HMM computation is introduced here, exploiting the particular structure of the CSD and typical characteristics of genomic data. It is empirically demonstrated that these improvements yield a speedup of several orders of magnitude for large datasets and that the speedup continues to increase with the number of sequences. The optimized algorithms can be adopted in methods for various applications, including the ones mentioned above, and make previously impracticable analyses possible.
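To make the idea of an HMM-based CSD concrete, the following is a minimal sketch of a generic copying-model forward recursion. It is not the blockwise algorithm of the paper. The function names, the uniform-switch transition model, and the parameters `switch_prob` and `mismatch_prob` are illustrative assumptions. The sketch only shows how the structure of such a transition matrix lets each site be processed in time linear, rather than quadratic, in the number of observed sequences while giving the same exact result.

```python
def _emit(panel, new_hap, k, site, mismatch_prob):
    # Assumed emission model: the copied allele is observed correctly
    # with probability 1 - mismatch_prob, otherwise with mismatch_prob.
    if panel[k][site] == new_hap[site]:
        return 1.0 - mismatch_prob
    return mismatch_prob


def csd_forward_naive(new_hap, panel, switch_prob, mismatch_prob):
    """Forward algorithm with an explicit n-by-n transition matrix: O(n^2) per site."""
    n = len(panel)
    # Assumed transition model: keep copying the same haplotype, or with
    # probability switch_prob jump to a haplotype chosen uniformly at random.
    trans = [[(1.0 - switch_prob) * (j == k) + switch_prob / n
              for k in range(n)] for j in range(n)]
    f = [_emit(panel, new_hap, k, 0, mismatch_prob) / n for k in range(n)]
    for site in range(1, len(new_hap)):
        f = [_emit(panel, new_hap, k, site, mismatch_prob)
             * sum(f[j] * trans[j][k] for j in range(n))
             for k in range(n)]
    return sum(f)


def csd_forward(new_hap, panel, switch_prob, mismatch_prob):
    """Same quantity in O(n) per site, using the uniform-switch structure."""
    n = len(panel)
    f = [_emit(panel, new_hap, k, 0, mismatch_prob) / n for k in range(n)]
    for site in range(1, len(new_hap)):
        total = sum(f)  # shared by every target state, so computed once
        f = [_emit(panel, new_hap, k, site, mismatch_prob)
             * ((1.0 - switch_prob) * f[k] + switch_prob * total / n)
             for k in range(n)]
    return sum(f)


# Hypothetical toy data: three observed haplotypes over four biallelic sites.
panel = [[0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
new_hap = [0, 0, 1, 0]
prob_fast = csd_forward(new_hap, panel, switch_prob=0.1, mismatch_prob=0.01)
prob_naive = csd_forward_naive(new_hap, panel, switch_prob=0.1, mismatch_prob=0.01)
```

Both functions compute the same probability. The second one avoids the inner sum over source states because, under the assumed transition model, the jump term is identical for every target state.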
Pages: 2008-2015
Page count: 8