Effective hidden Markov models for detecting splicing junction sites in DNA sequences

Cited by: 30
Authors
Yin, MM [1]
Wang, JTL [1]
Affiliation
[1] New Jersey Inst Technol, Dept Comp & Informat Sci, Newark, NJ 07102 USA
Keywords
hidden Markov models; bioinformatics; computational biology; splicing junction; gene finding
DOI
10.1016/S0020-0255(01)00160-8
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Identification or prediction of coding sequences within genomic DNA has been a major rate-limiting step in the pursuit of genes. Currently available programs are far from powerful enough to elucidate gene structure completely. In this paper, we develop effective hidden Markov models (HMMs) to represent the consensus and degeneracy features of splicing junction sites in eukaryotic genes. Our HMM system, based on the developed HMMs, is fully trained using an expectation maximization (EM) algorithm, and its performance is evaluated using a 10-way cross-validation method. Experimental results show that the proposed HMM system correctly detects 92% of the true donor sites and 91.5% of the true acceptor sites in a test data set containing real vertebrate gene sequences. These results suggest that our approach provides a useful tool for discovering splicing junction sites in eukaryotic genes. (C) 2001 Elsevier Science Inc. All rights reserved.
Pages: 139-163
Page count: 25