MiPred: classification of real and pseudo microRNA precursors using random forest prediction model with combined features

Cited by: 368
Authors
Jiang, Peng [1]
Wu, Haonan [1]
Wang, Wenkai [1]
Ma, Wei [1]
Sun, Xiao [1]
Lu, Zuhong [1]
Affiliation
[1] Southeast University, Department of Biological Science and Medical Engineering, State Key Laboratory of Bioelectronics, Nanjing 210096, People's Republic of China
Funding
National Natural Science Foundation of China
DOI
10.1093/nar/gkm368
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
To distinguish real pre-miRNAs from other hairpin sequences with similar stem-loops (pseudo pre-miRNAs), a hybrid feature set is used, consisting of local contiguous structure-sequence composition, the minimum free energy (MFE) of the secondary structure, and the P-value of a randomization test. In addition, a novel machine-learning algorithm, random forest (RF), is introduced. The results suggest that the method predicts with 98.21% specificity and 95.09% sensitivity. Compared with the previous Triplet-SVM-classifier, the RF method is nearly 10% higher in total accuracy. Further analysis indicates that the improvement is due to both the combined features and the RF algorithm. The MiPred web server is available at http://www.bioinf.seu.edu.cn/miRNA/. Given a sequence, MiPred first decides whether it is a pre-miRNA-like hairpin sequence. If it is, the RF classifier then predicts whether it is a real pre-miRNA or a pseudo one.
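The "local contiguous structure-sequence composition" mentioned in the abstract can be illustrated with a small sketch. It assumes the triplet encoding associated with the Triplet-SVM-classifier: for each interior nucleotide, the paired/unpaired state of it and its two neighbours is combined with the nucleotide itself, giving 4 x 8 = 32 frequencies. The function name `triplet_features`, the dot-bracket input, and the normalization are illustrative assumptions, not the MiPred implementation. The MFE and P-value features would come from an RNA folding tool and are not computed here.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
# Eight structure patterns over two states: "(" = paired, "." = unpaired.
STRUCT_TRIPLETS = ["".join(p) for p in product("(.", repeat=3)]
# 32 feature names such as "A(((" or "G.(." (hypothetical naming scheme).
FEATURE_NAMES = [n + s for n in NUCLEOTIDES for s in STRUCT_TRIPLETS]


def triplet_features(sequence, structure):
    """Return normalized frequencies of the 32 structure-sequence triplets.

    `structure` is a dot-bracket string of the same length as `sequence`.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    seq = sequence.upper().replace("T", "U")
    # Assumption: both bracket directions simply mean "paired".
    struct = structure.replace(")", "(")
    counts = dict.fromkeys(FEATURE_NAMES, 0)
    for i in range(1, len(seq) - 1):
        key = seq[i] + struct[i - 1:i + 2]
        if key in counts:  # skip ambiguous bases such as N
            counts[key] += 1
    total = sum(counts.values())
    return [counts[name] / total if total else 0.0 for name in FEATURE_NAMES]


# Toy hairpin, not real data: 3-bp stem with a 3-nt loop.
vector = triplet_features("GGGAAACCC", "(((...)))")
```

In a full pipeline under these assumptions, this 32-element vector would be concatenated with the MFE and the randomization-test P-value before being passed to a random forest classifier.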
Pages: W339-W344
Page count: 6