Machine learning approach for ab initio prediction of microRNA precursors

Cited by: 0
Authors
Jiang, Peng [1]
Wang, Wenkai [1]
Sang, Fei [1]
Tong, Jing [1]
Lu, Zuhong [1]
Affiliations
[1] Southeast Univ, State Key Lab Bioelect, Dept Biol Sci & Med Engn, Nanjing 210096, Peoples R China
Source
PROGRESS ON POST-GENOME TECHNOLOGIES | 2007
Keywords
real/pseudo pre-miRNAs; classification; random forest;
DOI
Not available
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Although methods based on comparative genomics provide important techniques for predicting new miRNAs, they cannot identify novel miRNAs that have no known close homologs. Almost all pre-miRNAs fold into characteristic stem-loop hairpin structures, so these hairpins give key clues for the ab initio prediction of pre-miRNAs. However, a large number of pre-miRNA-like hairpins can be folded from many genomes, and it is challenging to distinguish real pre-miRNAs from other hairpin sequences with similar stem-loops (pseudo pre-miRNAs). In this paper, we propose a novel machine learning method based on random forest to make this distinction. Coupled with a hybrid feature set consisting of local contiguous structure-sequence composition, the minimum free energy (MFE) of the secondary structure, and the p-value of a randomization test, the prediction model achieves 98.21% specificity and 95.09% sensitivity.
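
The hybrid feature described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions, because the abstract does not specify them: the structure-sequence composition is encoded here as 32 triplet frequencies (the paired/unpaired state of three adjacent positions plus the middle nucleotide), the randomization test uses simple mononucleotide shuffling, and `mfe_fn` is a hypothetical caller-supplied function that returns the folding free energy of a sequence (for example, a wrapper around an RNA folding tool). All function names are illustrative.

```python
import random
from itertools import product

NUCLEOTIDES = "ACGU"
# Assumed encoding: 4 middle nucleotides x 8 paired/unpaired patterns = 32 keys.
TRIPLET_KEYS = [n + "".join(s) for n in NUCLEOTIDES
                for s in product("(.", repeat=3)]


def triplet_features(seq, structure):
    """Normalized frequencies of the 32 structure-sequence triplets."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    seq = seq.upper().replace("T", "U")
    paired = structure.replace(")", "(")  # treat both brackets as "paired"
    counts = dict.fromkeys(TRIPLET_KEYS, 0)
    total = 0
    for i in range(1, len(seq) - 1):
        key = seq[i] + paired[i - 1:i + 2]
        if key in counts:  # skips ambiguous bases such as N
            counts[key] += 1
            total += 1
    return [counts[k] / total if total else 0.0 for k in TRIPLET_KEYS]


def randomization_p_value(seq, mfe_fn, n_shuffles=1000, rng=None):
    """Fraction of shuffled sequences folding at least as stably as seq.

    mfe_fn is a hypothetical callable mapping a sequence to its MFE.
    The (hits + 1) / (n + 1) estimator is an assumption of this sketch.
    """
    rng = rng or random.Random(0)
    observed = mfe_fn(seq)
    bases = list(seq)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(bases)  # mononucleotide shuffle (assumed scheme)
        if mfe_fn("".join(bases)) <= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)


def hybrid_feature_vector(seq, structure, mfe, p_value):
    """Concatenate triplet composition, MFE, and p-value (34 values)."""
    return triplet_features(seq, structure) + [mfe, p_value]


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec
```

Under these assumptions, each hairpin yields a 34-value vector that could be passed to any random forest implementation, with real pre-miRNAs labeled 1 and pseudo pre-miRNAs labeled 0. `sensitivity_specificity` then gives the two rates reported in the abstract.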
Pages: 190-193
Page count: 4