Basecalling using hidden Markov models

Cited by: 12
Authors
Boufounos, P
El-Difrawy, S
Ehrlich, D
Affiliations
[1] MIT, Elect Res Lab, Cambridge, MA 02139 USA
[2] Whitehead Inst Biomed Res, Cambridge Ctr 9, Cambridge, MA 02142 USA
Source
JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS | 2004 / Vol. 341 / Issue 1-2
Keywords
hidden Markov models; basecalling; DNA sequencing; PHRED
DOI
10.1016/j.jfranklin.2003.12.008
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper we propose hidden Markov models to model electropherograms from DNA sequencing equipment and perform basecalling. We model the state emission densities using artificial neural networks, and modify the Baum-Welch reestimation procedure to perform training. Moreover, we develop a method that exploits consensus sequences to label training data, thus minimizing the need for hand labeling. We propose the same method for locating an electropherogram in a longer DNA sequence. We also perform a careful study of the basecalling errors and propose alternative HMM topologies that might further improve performance. Our results demonstrate the potential of these models. Based on these results, we conclude by suggesting further research directions. © 2003 The Franklin Institute. Published by Elsevier Ltd. All rights reserved.
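To make the idea of HMM decoding in the abstract concrete, the following is a minimal sketch of Viterbi decoding on a toy model. It is not the authors' method: the paper models emission densities with neural networks and trains with a modified Baum-Welch procedure, whereas this sketch assumes a hand-set discrete emission table, one state per base, and no training at all. The function name `viterbi`, the probabilities, and the idea of reducing each trace frame to its strongest dye channel are all illustrative assumptions.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most likely state path for a sequence of discrete observations."""
    # best[t][s] is the log-probability of the best path ending in state s at frame t.
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []  # back[t][s] is the predecessor of s on that best path
    for o in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][o]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: one hidden state per base, "sticky" transitions,
# and observations that name the strongest dye channel in each frame.
states = "ACGT"
log_start = {s: math.log(0.25) for s in states}
log_trans = {p: {s: math.log(0.7 if p == s else 0.1) for s in states} for p in states}
log_emit = {s: {o: math.log(0.7 if s == o else 0.1) for o in states} for s in states}

obs = "AAACAAGGG"  # a single noisy 'C' frame inside a run of 'A' frames
calls = viterbi(obs, states, log_start, log_trans, log_emit)
print("".join(calls))
```

The sticky transitions let the decoder treat the isolated `C` frame as noise rather than as a separate base. A realistic basecaller would need a richer topology, for example several states per peak, so that repeated bases are not merged into one run.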
Pages: 23-36
Page count: 14