Robust multiclass classification for learning from imbalanced biomedical data

Cited by: 0
Authors
Phoungphol, Piyaphol [1]
Zhang, Yanqing [1]
Zhao, Yichuan [2]
Affiliations
[1] Department of Computer Science, Georgia State University, Atlanta
[2] Department of Mathematics and Statistics, Georgia State University, Atlanta, GA 30302-3994, USA
Keywords
Biomedical data; Imbalanced data; Multiclass classification; Ramp loss; Support Vector Machine (SVM)
DOI
10.1109/TST.2012.6374363
Abstract
Imbalanced data is a common and serious problem in many biomedical classification tasks. It biases the training of classifiers and lowers prediction accuracy on the minority classes. The problem has attracted considerable research interest over the past decade, but most of that work concentrates on 2-class problems. In this paper, we study a new method of formulating a multiclass Support Vector Machine (SVM) problem for imbalanced biomedical data to improve classification performance. The proposed method applies a cost-sensitive approach and a ramp loss function to the Crammer and Singer multiclass SVM formulation. Experimental results on multiple biomedical datasets show that the proposed solution can effectively address the problem when the datasets are noisy and highly imbalanced. © 1996-2012 Tsinghua University Press.
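The abstract names three ingredients: the Crammer and Singer multiclass margin, a ramp (clipped hinge) loss, and cost-sensitive weighting. The sketch below shows one way these pieces can fit together when evaluating a loss; it is not the paper's formulation, which the abstract does not spell out. The inverse-frequency class weights, the clipping parameter `s`, and all function names are assumptions made for illustration.

```python
import numpy as np

def class_weights(y, n_classes):
    """Assumed cost scheme: inverse class frequency, so rare classes cost more."""
    counts = np.bincount(y, minlength=n_classes).astype(float)
    return len(y) / (n_classes * np.maximum(counts, 1.0))

def cs_margin(scores, y):
    """Crammer-Singer margin: true-class score minus the best competing score."""
    scores = np.asarray(scores, dtype=float)
    idx = np.arange(scores.shape[0])
    true_score = scores[idx, y]
    masked = scores.copy()
    masked[idx, y] = -np.inf  # exclude the true class from the max
    return true_score - masked.max(axis=1)

def weighted_ramp_loss(scores, y, weights, s=-1.0):
    """Mean cost-weighted ramp loss.

    The ramp loss is the hinge loss max(0, 1 - m) clipped at 1 - s (s < 1),
    so a badly misclassified (possibly noisy) sample contributes a bounded amount.
    """
    m = cs_margin(scores, y)
    hinge = np.maximum(0.0, 1.0 - m)
    ramp = np.minimum(hinge, 1.0 - s)
    return float(np.mean(weights[y] * ramp))

# Toy example: three majority-class samples, one minority-class sample.
y = np.array([0, 0, 0, 1])
scores = np.array([[2.0, 0.0], [0.5, 0.0], [-10.0, 0.0], [0.0, 3.0]])
w = class_weights(y, n_classes=2)
loss = weighted_ramp_loss(scores, y, w)
```

In the toy example the third sample has a margin of -10; the plain hinge loss would charge it 11, while the ramp loss caps its contribution at 2, which is the robustness-to-noise property the abstract refers to. Because the ramp loss is non-convex, training with it requires a different optimizer than a standard SVM; this sketch only evaluates the loss.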
Pages: 619-628
Number of pages: 9