An approach of encoding for prediction of splice sites using SVM

Cited by: 44
Authors
Huang, J. [1]
Li, T. [1]
Chen, K. [1]
Wu, J. [1]
Affiliation
[1] Tongji Univ, Dept Chem, Shanghai 200092, Peoples R China
Funding
National Research Foundation, Singapore
Keywords
splice sites; coding sequence; support vector machines
DOI
10.1016/j.biochi.2006.03.006
Chinese Library Classification (CLC) numbers
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
In splice site prediction, accuracy remains below 90% even though the sequences adjacent to splice sites are highly conserved. Efforts to improve prediction accuracy have focused mainly on the performance of the algorithms used, and few have addressed a more fundamental issue: nucleotide encoding. In this paper, a predictor based on support vector machines (SVM) is constructed to distinguish true from false splice sites in higher eukaryotes. Four encoding schemes were used to generate the SVM input: mono-nucleotide (MN) encoding, MN with frequency difference between true and false sites (FDTF) encoding, pair-wise nucleotide (PN) encoding, and PN with FDTF encoding. The results showed that PN with FDTF encoding gave the most reliable recognition of splice sites: prediction accuracies for true and false donor sites were 96.3% and 93.7%, respectively, and for true and false acceptor sites were 94.0% and 93.2%, respectively. © 2006 Elsevier SAS. All rights reserved.
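The four encodings named in the abstract can be illustrated with a minimal Python sketch. The exact definitions below are assumptions, not taken from the paper: MN is treated as a 4-value one-hot block per nucleotide position, PN as a 16-value one-hot block per adjacent nucleotide pair, and FDTF as replacing each "hot" 1 with the difference between that symbol's frequency at that position in true training sites and in false training sites. The function names (`alphabet`, `one_hot_encode`, `fdtf_table`, `fdtf_encode`) and the toy sequences are hypothetical, and all sequences are assumed to be equal-length windows containing only A, C, G and T.

```python
from typing import Dict, List, Sequence

NUCLEOTIDES = "ACGT"


def alphabet(k: int) -> List[str]:
    """All k-mers over ACGT in lexicographic order: 4 for k=1 (MN), 16 for k=2 (PN)."""
    symbols = [""]
    for _ in range(k):
        symbols = [s + n for s in symbols for n in NUCLEOTIDES]
    return symbols


def one_hot_encode(seq: str, k: int) -> List[float]:
    """MN (k=1) or PN (k=2) encoding: one one-hot block per window position."""
    symbols = alphabet(k)
    vec: List[float] = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        vec.extend(1.0 if kmer == s else 0.0 for s in symbols)
    return vec


def fdtf_table(true_seqs: Sequence[str], false_seqs: Sequence[str],
               k: int) -> List[Dict[str, float]]:
    """Assumed FDTF weights: f_true(i, s) - f_false(i, s) per position i and k-mer s."""
    symbols = alphabet(k)
    n_pos = len(true_seqs[0]) - k + 1

    def frequencies(seqs: Sequence[str]) -> List[Dict[str, float]]:
        counts = [dict.fromkeys(symbols, 0) for _ in range(n_pos)]
        for seq in seqs:
            for i in range(n_pos):
                # Assumes only A/C/G/T; an 'N' would raise KeyError here.
                counts[i][seq[i:i + k]] += 1
        return [{s: c / len(seqs) for s, c in row.items()} for row in counts]

    f_true, f_false = frequencies(true_seqs), frequencies(false_seqs)
    return [{s: f_true[i][s] - f_false[i][s] for s in symbols}
            for i in range(n_pos)]


def fdtf_encode(seq: str, table: List[Dict[str, float]], k: int) -> List[float]:
    """MN/PN with FDTF: like one_hot_encode, but the 'hot' entry holds the FDTF weight."""
    symbols = alphabet(k)
    vec: List[float] = []
    for i, row in enumerate(table):
        kmer = seq[i:i + k]
        vec.extend(row[s] if kmer == s else 0.0 for s in symbols)
    return vec


# Hypothetical toy windows (equal length) standing in for training data.
true_sites = ["AGGTAA", "CGGTGA"]
false_sites = ["TGGTCC", "AGGTTT"]

mn = one_hot_encode("AGGTAA", 1)               # 6 positions x 4 = 24 features
pn = one_hot_encode("AGGTAA", 2)               # 5 pairs x 16 = 80 features
pn_table = fdtf_table(true_sites, false_sites, 2)
pn_fdtf = fdtf_encode("AGGTAA", pn_table, 2)   # 80 features, FDTF-weighted
```

Each resulting vector would then serve as the input for one candidate site to an SVM implementation such as LIBSVM or scikit-learn's `sklearn.svm.SVC`; training, kernel choice and the per-class accuracy evaluation reported in the abstract are not shown, since the record does not specify them. Weighting the hot entry by the frequency difference, rather than using a plain 1, is one plausible way to let positions that discriminate true from false sites contribute more strongly to the SVM's input.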
Pages: 923-929
Page count: 7