A novel splice site prediction method using support vector machine

Cited by: 0
Affiliations
[1] Cognitive Science Department and Fujian Key Laboratory of the Brain-like Intelligent Systems, Xiamen University
[2] Shenzhen Key Lab. for High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Source
Wei, Y. (yj.wei@siat.ac.cn) | Binary Information Press, P.O. Box 162, Bethel, CT 06801-0162, United States | Issue: 09
Keywords
Distribution of tri-nucleotides; Markov model; Splice site; Support vector machine
DOI
10.12733/jcis6763
Abstract
We present a novel classification method for splice site prediction using a support vector machine (SVM). The method first represents input sequences with sequence-based features: the distribution of tri-nucleotides and the conserved features surrounding the splice sites, as characterized by a Markov model. An F-score-based feature selection method then selects the most informative features to improve performance. Finally, an SVM classifies the splice sites using the selected features. Experimental results show that this method improves splice site prediction accuracy and outperforms existing methods such as MM1-SVM, Reduced MM1-SVM and several others. © 2013 Binary Information Press.
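The feature pipeline described in the abstract can be sketched as follows. This is a minimal, standard-library-only Python illustration under stated assumptions, not the authors' implementation. The tri-nucleotide distribution is taken to be the relative frequency of overlapping 3-mers. The Markov features are assumed to be per-position log-odds from a first-order, position-specific Markov model trained separately on true and decoy sites (in the spirit of MM1-SVM); the paper's exact encoding may differ. The F-score follows the widely used Chen and Lin definition. All function names and the toy sequences are hypothetical. The final SVM step is omitted: the selected columns would be passed to any SVM implementation.

```python
import math
from itertools import product

BASES = "ACGT"
# All 64 tri-nucleotides in lexicographic order (AAA, AAC, ..., TTT).
TRINUCS = ["".join(p) for p in product(BASES, repeat=3)]


def trinuc_distribution(seq):
    """Relative frequency of each overlapping tri-nucleotide in seq (64 values)."""
    counts = dict.fromkeys(TRINUCS, 0)
    total = 0
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:  # windows containing N or other symbols are skipped
            counts[tri] += 1
            total += 1
    return [counts[t] / total if total else 0.0 for t in TRINUCS]


def fit_mm1(seqs, pseudo=1.0):
    """Hypothetical helper: position-specific first-order Markov model.

    Entry i-1 of the result maps (prev, cur) to P(cur at i | prev at i-1),
    estimated with pseudocounts. Assumes aligned windows of equal length.
    """
    model = []
    for i in range(1, len(seqs[0])):
        counts = {(a, b): pseudo for a in BASES for b in BASES}
        for s in seqs:
            if (s[i - 1], s[i]) in counts:
                counts[(s[i - 1], s[i])] += 1
        table = {}
        for a in BASES:
            row_total = sum(counts[(a, b)] for b in BASES)
            for b in BASES:
                table[(a, b)] = counts[(a, b)] / row_total
        model.append(table)
    return model


def mm1_features(seq, pos_model, neg_model):
    """Per-position log-odds log(P+(x_i | x_{i-1}) / P-(x_i | x_{i-1}))."""
    feats = []
    for i in range(1, len(seq)):
        key = (seq[i - 1], seq[i])
        if key in pos_model[i - 1]:
            feats.append(math.log(pos_model[i - 1][key] / neg_model[i - 1][key]))
        else:
            feats.append(0.0)  # ambiguous base: neutral score
    return feats


def f_score(pos_vals, neg_vals):
    """F-score of one feature (between-class over within-class spread).

    Needs at least 2 samples per class.
    """
    n_pos, n_neg = len(pos_vals), len(neg_vals)
    m_pos = sum(pos_vals) / n_pos
    m_neg = sum(neg_vals) / n_neg
    m_all = (sum(pos_vals) + sum(neg_vals)) / (n_pos + n_neg)
    between = (m_pos - m_all) ** 2 + (m_neg - m_all) ** 2
    within = (sum((x - m_pos) ** 2 for x in pos_vals) / (n_pos - 1)
              + sum((x - m_neg) ** 2 for x in neg_vals) / (n_neg - 1))
    if within == 0:
        return math.inf if between > 0 else 0.0
    return between / within


def select_features(pos_X, neg_X, k):
    """Column indices of the k features with the highest F-scores."""
    scores = [f_score([row[j] for row in pos_X], [row[j] for row in neg_X])
              for j in range(len(pos_X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


if __name__ == "__main__":
    # Made-up 9-nt donor-like windows (true vs. decoy), for illustration only.
    pos = ["CAGGTAAGT", "AAGGTAAGA", "CAGGTGAGT"]
    neg = ["CTGGTCCTA", "TAGGTTTCA", "GCGGTACCG"]
    pos_mm, neg_mm = fit_mm1(pos), fit_mm1(neg)

    def encode(s):
        # 64 tri-nucleotide frequencies followed by 8 Markov log-odds values.
        return trinuc_distribution(s) + mm1_features(s, pos_mm, neg_mm)

    X_pos = [encode(s) for s in pos]
    X_neg = [encode(s) for s in neg]
    print(select_features(X_pos, X_neg, k=10))
```

For brevity the toy example fits the Markov models and scores features on the same sequences; in a real evaluation both would be estimated on training folds only, so that the held-out sites do not leak into the features.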
Pages: 8053-8060
Page count: 7
References (27 in total)
  • [1] Chen T.M., Lu C.C., Li W.H., Prediction of splice sites with dependency graphs and their expanded Bayesian networks, Bioinformatics, 21, 4, pp. 471-482, (2005)
  • [2] Degroeve S., Saeys Y., de Baets B., Rouze P., van de Peer Y., SpliceMachine: Predicting splice sites from high-dimensional local context representations, Bioinformatics, 21, 8, pp. 1332-1338, (2005)
  • [3] Sonnenburg S., Schweikert G., Philips P., Behr J., Ratsch G., Accurate splice site prediction using support vector machines, BMC Bioinformatics, 8, Suppl. 10, (2007)
  • [4] Mathe C., Sagot M.F., Schiex T., Rouze P., Current methods of gene prediction, their strengths and weaknesses, Nucleic Acids Res, 30, pp. 4103-4117, (2002)
  • [5] Brent M.R., Guigo R., Recent advances in gene structure prediction, Curr Opin Struct Biol, 14, pp. 264-272, (2004)
  • [6] Lim L.P., Burge C.B., A computational analysis of sequence features involved in recognition of short introns, Proc Natl Acad Sci USA, 98, 20, pp. 11193-11198, (2001)
  • [7] Staden R., McLachlan A.D., Codon preference and its use in identifying protein coding regions in long DNA sequences, Nucleic Acids Res, 10, 1, pp. 141-156, (1982)
  • [8] Nikolaou C., Almirantis Y., Measuring the coding potential of genomic sequences through a combination of triplet occurrence patterns and RNY preference, J Mol Evol, 59, 3, pp. 309-316, (2004)
  • [9] Parmley J.L., Hurst L.D., Exonic splicing regulatory elements skew synonymous codon usage near intron-exon boundaries in mammals, Mol Biol Evol, 24, 8, pp. 1600-1603, (2007)
  • [10] Staden R., Computer methods to locate signals in nucleic acid sequences, Nucleic Acids Res, 12, pp. 505-519, (1984)