BRAda: A Robust Method for Identification of Pre-microRNAs by Combining Adaboost Framework with BP and RF

Times cited: 0
Authors
Zhang, Ningyi [1 ]
Zhang, Ying [2 ]
Zhao, Tianyi [1 ]
Ren, Jun [3 ]
Cheng, Yangmei [4 ]
Hu, Yang [3 ]
Affiliations
[1] Harbin Inst Technol, Dept Comp Sci & Technol, Harbin 150001, Heilongjiang, Peoples R China
[2] Heilongjiang Prov Land Reclamat Headquarters Gen, Dept Pharm, Harbin 150088, Heilongjiang, Peoples R China
[3] Harbin Inst Technol, Sch Life Sci & Technol, Harbin 150001, Heilongjiang, Peoples R China
[4] First Peoples Hosp, Anqing Ultrasound Dept, 42 Xiaosu Rd, Anqing City, Anhui, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
Biological process; BRAda; BP neural network; genes; Pre-miRNA identification; random forest; CLASSIFICATION; PREDICTION; PRECURSORS; GENE; REAL;
DOI
10.2174/1570178614666170221144619
Chinese Library Classification (CLC) number
O62 [Organic chemistry]
Subject classification codes
070303; 081704
Abstract
Background: MicroRNAs (miRNAs) are a class of short (approximately 21 nt) non-coding RNAs that act as important regulators of biological processes in cells. Identifying and discovering pre-miRNAs helps in understanding regulatory processes, the functions of miRNAs and other genes, and biological evolution.

Methods: Machine learning has become a powerful technique for distinguishing real pre-miRNAs from other hairpin-like sequences (pseudo pre-miRNAs). However, most commonly used classifiers do not perform well on independent testing data sets. To overcome this, we propose BRAda, a novel algorithm that integrates a BP (back-propagation) neural network and a random forest classifier trained on two balanced training sets. By assigning weights to these classifiers, which use the proposed 98-dimensional features, we obtained a strong classifier with high accuracy and stability. Two independent testing sets (updated human and non-human pre-miRNAs) were then used to evaluate the prediction performance of this classifier.

Results: BRAda significantly outperformed the other methods in identifying both human and non-human pre-miRNAs.

Conclusion: The novel algorithm integrates a BP neural network and a random forest classifier trained on two balanced training sets. Compared with other state-of-the-art machine learning methods, BRAda achieved near-perfect performance in validation (ACC over 99%). Moreover, although the algorithm was trained on human gene sets, its prediction performance on the non-human testing sets was also excellent (average ACC over 97%). This indicates that the method is both stable and robust. These experiments and validation show that the method is an effective tool for pre-miRNA identification.
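The abstract describes BRAda only at a high level. Base classifiers (a BP neural network and a random forest, trained on two balanced training sets) are combined by assigning them weights within an AdaBoost framework. The exact weighting scheme is not given here.

The following is therefore a minimal sketch that assumes standard discrete AdaBoost. It runs over a fixed pool of already-trained classifiers that output labels in {-1, +1}. The real base learners are replaced by hypothetical threshold rules (`rule_a`, `rule_b`, `rule_c`) on a toy three-feature data set, not BP networks or random forests over the 98-dimensional features. In practice one might wrap, for example, scikit-learn's `MLPClassifier` and `RandomForestClassifier` so that they return -1/+1. The `accuracy` helper computes ACC as the fraction of correctly classified samples.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A base classifier: maps a feature vector to a label in {-1, +1}.
Classifier = Callable[[Sequence[float]], int]


def adaboost_ensemble(
    classifiers: List[Classifier],
    X: List[Sequence[float]],
    y: List[int],
) -> List[Tuple[Classifier, float]]:
    """Discrete AdaBoost over a fixed pool of pre-trained classifiers (assumed scheme).

    Each round picks the classifier with the lowest weighted error under the
    current sample weights and gives it weight alpha = 0.5 * ln((1 - err) / err).
    It then re-weights the samples so that misclassified ones count more.
    """
    n = len(X)
    w = [1.0 / n] * n          # start with uniform sample weights
    eps = 1e-10                # keeps log() finite if a classifier is perfect
    ensemble: List[Tuple[Classifier, float]] = []
    for _ in range(len(classifiers)):
        best, best_err, best_pred = None, float("inf"), None
        for clf in classifiers:
            pred = [clf(x) for x in X]
            err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
            if err < best_err:
                best, best_err, best_pred = clf, err, pred
        if best_err >= 0.5:    # no better than chance: stop boosting
            break
        err = min(max(best_err, eps), 1 - eps)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((best, alpha))
        # t * p is +1 for a correct prediction and -1 for a wrong one.
        w = [wi * math.exp(-alpha * t * p) for wi, p, t in zip(w, best_pred, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble: List[Tuple[Classifier, float]], x: Sequence[float]) -> int:
    """Weighted vote of the base classifiers."""
    score = sum(alpha * clf(x) for clf, alpha in ensemble)
    return 1 if score >= 0 else -1


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """ACC = (TP + TN) / (TP + TN + FP + FN)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# --- Toy demonstration: hypothetical rules and data, not the paper's ---
def rule_a(x: Sequence[float]) -> int:
    return 1 if x[0] > 0 else -1


def rule_b(x: Sequence[float]) -> int:
    return 1 if x[1] > 0 else -1


def rule_c(x: Sequence[float]) -> int:
    return 1 if x[2] > 0 else -1


X_toy = [[-1, 1, 1], [1, -1, 1], [-1, -1, 1]]
y_toy = [1, 1, -1]

ensemble = adaboost_ensemble([rule_a, rule_b, rule_c], X_toy, y_toy)
preds = [predict(ensemble, x) for x in X_toy]
print("ACC =", accuracy(y_toy, preds))  # prints "ACC = 1.0"
```

Each toy rule misclassifies exactly one of the three samples. Because every round up-weights the samples that the previously chosen rule got wrong, the weighted vote ends up classifying all three correctly.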
Pages: 690-695
Number of pages: 6