miRFam: an effective automatic miRNA classification method based on n-grams and a multiclass SVM

Cited by: 26
Authors
Ding, Jiandong [1, 2]
Zhou, Shuigeng [1, 2]
Guan, Jihong [3]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai 200433, Peoples R China
[2] Shanghai Key Lab Intelligent Informat Proc, Shanghai 200433, Peoples R China
[3] Tongji Univ, Dept Comp Sci & Technol, Shanghai 200433, Peoples R China
Keywords
SECONDARY STRUCTURE; MICRORNAS; IDENTIFICATION; CONSERVATION; PREDICTION; FAMILIES; RFAM;
DOI
10.1186/1471-2105-12-216
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: MicroRNAs (miRNAs) are ~22 nt long integral elements responsible for post-transcriptional control of gene expression. After the identification of thousands of miRNAs, the challenge is now to explore their specific biological functions. To this end, it would be very helpful to organize these miRNAs according to their homologous relationships. Given an established miRNA family system (e.g. the miRBase family organization), this paper addresses the problem of automatically and accurately classifying newly found miRNAs into their corresponding families using supervised learning techniques. Concretely, we propose an effective method, miRFam, which uses only the primary sequence information of pre-miRNAs or mature miRNAs and a multiclass SVM to automatically classify miRNA genes. Results: An existing miRNA family system prepared by miRBase was downloaded. We first employed n-grams to extract features from known precursor sequences, and then trained a multiclass SVM classifier to classify new miRNAs (i.e. those whose families are unknown). Compared with miRBase's sequence alignment and manual modification, our study shows that the application of machine learning techniques to miRNA family classification is a general and more effective approach. When the testing dataset contains more than 300 families (each of which holds no fewer than 5 members), the classification accuracy is around 98%. Even with the entire miRBase 15 (1056 families, more than 650 of which hold fewer than 5 samples), the accuracy reaches 90%. Conclusions: Based on the experimental results, we argue that miRFam is suitable for use as an automated method of family classification, and that it is an important supplementary tool to the existing alignment-based small non-coding RNA (sncRNA) classification methods, since it requires only primary sequence information.
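The n-gram feature extraction step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `ngram_features`, the `ACGU` alphabet, the maximum n-gram length of 3, and the per-length frequency normalization are all assumptions made for illustration. The abstract does not specify these details.

```python
from collections import Counter
from itertools import product

# Assumed alphabet for this sketch: the four RNA nucleotides.
ALPHABET = "ACGU"


def ngram_features(seq, n_max=3):
    """Return a fixed-length list of n-gram frequencies for n = 1..n_max.

    Hypothetical helper: counts every overlapping n-gram in the sequence
    and divides by the number of n-gram positions, so that each block of
    features (one block per n) sums to 1 for sequences over ALPHABET.
    """
    # Treat DNA-style input as RNA so that T and U are counted together.
    seq = seq.upper().replace("T", "U")
    features = []
    for n in range(1, n_max + 1):
        positions = len(seq) - n + 1
        counts = Counter(seq[i:i + n] for i in range(positions))
        total = max(positions, 1)  # avoid division by zero on short input
        # Enumerate n-grams in a fixed order so vectors are comparable.
        for gram in product(ALPHABET, repeat=n):
            features.append(counts["".join(gram)] / total)
    return features


if __name__ == "__main__":
    vec = ngram_features("UGAGGUAGUAGGUUGUAUAGUU")
    print(len(vec))  # 4 + 16 + 64 features for n_max=3
```

Vectors of this kind, one per known precursor sequence and labeled with the family name, could then be passed to any multiclass SVM implementation for training. That step is omitted here because the abstract does not describe the classifier configuration.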
Pages: 11