A novel classification method for class-imbalanced data and its application in microRNA recognition

Cited by: 1
Authors
Geng X. [1]
Zhu Y.-Q. [1]
Yang Z. [2]
Affiliations
[1] School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu
[2] School of Management, Jiangsu University, Zhenjiang, Jiangsu
Keywords
Adaboost algorithm; Class imbalance; Ensemble learning; Non-coding RNA
DOI
10.7546/ijba.2018.22.2.133-146
Abstract
Non-coding RNA gene mining, and microRNA mining in particular, faces many challenges in the classification of imbalanced data. A novel classification method based on the Adaboost algorithm is proposed to handle the imbalance between positive and negative cases. Unstable-Adaboost is improved with respect to the initial weight assignment, the base classifier selection, the weight adjustment mechanism, and other aspects. Furthermore, the Stable-Adaboost algorithm is proposed, which adjusts the weights of the sample set to rapidly reach a more balanced training set. By optimizing the weight adjustment mechanism for incorrectly classified samples, Stable-Adaboost also keeps the subsequent training sets in a balanced state and stabilizes classification performance. Experimental results show the superiority of Unstable-Adaboost and Stable-Adaboost in imbalanced classification. © 2018 by the authors.
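The abstract does not give the exact update rules of Unstable-Adaboost or Stable-Adaboost, so the following is only a minimal sketch of the general idea it describes: start boosting from class-balanced sample weights, then apply a weight update after each round. The update shown is the textbook Adaboost rule, not the authors' modified mechanism, and the function names `balanced_initial_weights` and `adaboost_update` as well as the toy labels are hypothetical.

```python
import math


def balanced_initial_weights(labels):
    """Assumed scheme: each class gets half of the total weight,
    split evenly among that class's samples (labels are +1 / -1)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return [0.5 / n_pos if y == 1 else 0.5 / n_neg for y in labels]


def adaboost_update(weights, labels, preds):
    """One round of the standard Adaboost reweighting.

    Returns (alpha, new_weights), where alpha is the base classifier's
    vote and new_weights are renormalized to sum to 1.
    """
    err = sum(w for w, y, p in zip(weights, labels, preds) if y != p)
    err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
    alpha = 0.5 * math.log((1 - err) / err)
    raw = [w * math.exp(alpha if y != p else -alpha)
           for w, y, p in zip(weights, labels, preds)]
    total = sum(raw)
    return alpha, [w / total for w in raw]


# Toy imbalanced set: 2 positives (minority) and 8 negatives.
labels = [1, -1, -1, -1, -1, -1, -1, -1, -1, 1]
init = balanced_initial_weights(labels)

# Hypothetical base classifier output that misses the last positive.
preds = [1, -1, -1, -1, -1, -1, -1, -1, -1, -1]
alpha, updated = adaboost_update(init, labels, preds)
```

With balanced initial weights, a classifier that ignores the minority class already has a weighted error of 0.5 and earns no vote, which is one reason balancing the starting distribution matters. After the standard update, the misclassified samples carry half of the total weight in the next round.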
Pages: 133-146
Page count: 13