Compressed computations using wavelets for hidden Markov models with continuous observations

Cited by: 3
Authors
Bello, Luca [1 ]
Wiedenhoeft, John [2 ]
Schliep, Alexander [1 ,3 ]
Affiliations
[1] Univ Gothenburg, Comp Sci & Engn, Chalmers, Gothenburg, Sweden
[2] Univ Med Ctr Gottingen, Sci Core Facil Med Biometry & Stat Bioinformat, Gottingen, Germany
[3] BTU Cottbus-Senftenberg, Fac Hlth Sci, Cottbus, Germany
Keywords
Lifting scheme; Segmentation
DOI
10.1371/journal.pone.0286074
Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract
Compression as an accelerant of computation is increasingly recognized as an important component in engineering fast real-world machine learning methods for big data; c.f., its impact on genome-scale approximate string matching. Previous work showed that compression can accelerate algorithms for Hidden Markov Models (HMM) with discrete observations, both for the classical frequentist HMM algorithms-Forward Filtering, Backward Smoothing and Viterbi-and Gibbs sampling for Bayesian HMM. For Bayesian HMM with continuous-valued observations, compression was shown to greatly accelerate computations for specific types of data. For instance, data from large-scale experiments interrogating structural genetic variation can be assumed to be piece-wise constant with noise, or, equivalently, data generated by HMM with dominant self-transition probabilities. Here we extend the compressive computation approach to the classical frequentist HMM algorithms on continuous-valued observations, providing the first compressive approach for this problem. In a large-scale simulation study, we demonstrate empirically that in many settings compressed HMM algorithms very clearly outperform the classical algorithms with no, or only an insignificant effect, on the computed probabilities and infered state paths of maximal likelihood. This provides an efficient approach to big data computations with HMM. An open-source implementation of the method is available from .
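The abstract's core idea can be illustrated with a toy sketch: when observations are piecewise constant (equivalently, generated under dominant self-transitions), a run of w identical observations can be processed in one compressed step instead of w individual forward updates. The sketch below uses a simple matrix-power trick under that assumption; all names are hypothetical, and the paper's actual method uses wavelet-based compression with numerically safer scaled recursions, not this unscaled toy.

```python
import numpy as np

def forward_naive(pi, A, emis_seq):
    """Plain forward pass: one O(K^2) update per observation.

    pi: (K,) state distribution before the first observation
    A:  (K, K) transition matrix, A[i, j] = P(state j | state i)
    emis_seq: per-observation (K,) vectors of emission likelihoods
    """
    alpha = pi.copy()
    for b in emis_seq:
        alpha = b * (A.T @ alpha)      # transition, then emit
    return alpha.sum()                 # total sequence likelihood

def forward_blocked(pi, A, blocks):
    """Forward pass over run-length-encoded observations.

    Each block of w identical observations costs O(K^3 log w) via
    repeated squaring instead of O(K^2 w) naive updates: a win for
    long blocks, i.e. piecewise-constant data. Unscaled, so long
    sequences would underflow; real implementations rescale or work
    in log space.
    """
    alpha = pi.copy()
    for b, w in blocks:                # b: (K,) likelihoods, w: run length
        M = np.diag(b) @ A.T           # one combined transition+emission step
        alpha = np.linalg.matrix_power(M, w) @ alpha
    return alpha.sum()

# Two-state example with dominant self-transitions.
pi = np.array([0.6, 0.4])
A = np.array([[0.95, 0.05],
              [0.10, 0.90]])
b1 = np.array([0.8, 0.1])              # emission likelihoods, first value
b2 = np.array([0.2, 0.7])              # emission likelihoods, second value
blocks = [(b1, 5), (b2, 3)]
seq = [b1] * 5 + [b2] * 3

assert np.isclose(forward_naive(pi, A, seq), forward_blocked(pi, A, blocks))
```

Both routines compute the same likelihood; only the per-block version exploits the repetition, which is exactly the structure that compression makes explicit.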
Pages: 15