Mining and integrating reliable decision rules for imbalanced cancer gene expression data sets

Cited by: 0
Authors
Yu, Hualong [1]
Ni, Jun [2]
Dan, Yuanyuan [3]
Xu, Sen [4]
Affiliations
[1] School of Computer Science and Engineering, Jiangsu University of Science and Technology
[2] Department of Radiology, Carver College of Medicine, University of Iowa, Iowa City
[3] School of Biology and Chemical Engineering, Jiangsu University of Science and Technology
[4] School of Information Engineering, Yancheng Institute of Technology
Keywords
cancer gene expression data; class imbalance; decision rule; ensemble learning; majority voting; paired differential expression genes
DOI
10.1109/TST.2012.6374368
Abstract
Many skewed (class-imbalanced) cancer gene expression datasets have appeared in the post-genomic era. Extracting differentially expressed genes or constructing decision rules from these skewed datasets with traditional algorithms seriously underestimates performance on the minority class, which can lead to inaccurate diagnoses in clinical trials. This paper presents a skewed gene selection algorithm that introduces a weighted metric into the gene selection procedure. The extracted genes are paired to form decision rules that distinguish the two classes, and these rules are then integrated into an ensemble learning framework that recognizes test examples by majority voting, thus avoiding tedious data normalization and classifier construction. On four benchmark imbalanced cancer gene expression datasets, mining and integrating a few reliable decision rules yielded classification performance higher than, or at least comparable to, that of many traditional class imbalance learning algorithms. © 1996-2012 Tsinghua University Press.
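The abstract only outlines the pipeline (weighted gene selection, gene pairs as decision rules, majority voting), so the Python sketch below is an illustration under stated assumptions, not the authors' algorithm. It assumes (a) that a gene pair (i, j) forms the rule "x[i] < x[j] votes for the minority class", in the spirit of top-scoring-pair classifiers, which compare two genes within the same sample; (b) a class-balanced score that averages within each class, used here as a stand-in for the paper's unspecified weighted metric; and (c) the hypothetical names `pair_score`, `mine_rules`, `predict` and the parameter `k`. The toy data are invented.

```python
from itertools import combinations

# Toy, invented data: 2 minority (label 1) vs 6 majority (label 0) samples, 4 genes.
X = [
    [1.0, 5.0, 2.0, 6.0], [0.5, 4.0, 1.5, 7.0],                        # minority class
    [5.0, 1.0, 6.0, 2.0], [4.0, 0.5, 7.0, 1.5], [6.0, 2.0, 5.0, 1.0],
    [5.5, 1.5, 6.5, 0.5], [4.5, 1.0, 5.5, 2.5], [6.5, 0.8, 4.8, 1.2],  # majority class
]
y = [1, 1, 0, 0, 0, 0, 0, 0]


def pair_score(X, y, i, j):
    """Class-balanced score of the hypothetical rule 'x[i] < x[j]'.

    Frequencies are computed within each class, so the 2 minority samples
    weigh as much in total as the 6 majority samples (an assumed stand-in
    for the paper's unspecified weighted metric).
    """
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    p_pos = sum(x[i] < x[j] for x in pos) / len(pos)
    p_neg = sum(x[i] < x[j] for x in neg) / len(neg)
    return p_pos - p_neg  # > 0: rule favors class 1; < 0: reversed rule does


def mine_rules(X, y, k=3):
    """Return the k highest-scoring gene pairs (a, b), each read as
    'x[a] < x[b] votes for class 1'."""
    scored = []
    for i, j in combinations(range(len(X[0])), 2):
        s = pair_score(X, y, i, j)
        # Flip the pair when the reversed comparison is the informative one.
        scored.append((abs(s), (i, j) if s >= 0 else (j, i)))
    scored.sort(key=lambda t: t[0], reverse=True)  # stable: ties keep pair order
    return [pair for _, pair in scored[:k]]


def predict(rules, x):
    """Majority vote of the paired-gene rules; use an odd number of rules
    so that two-class votes cannot tie."""
    votes = sum(x[a] < x[b] for a, b in rules)
    return 1 if 2 * votes > len(rules) else 0


rules = mine_rules(X, y, k=3)
print(rules)                                 # [(0, 1), (0, 3), (2, 1)]
print(predict(rules, [0.8, 4.5, 1.8, 6.5]))  # 1
print(predict(rules, [5.0, 1.2, 6.0, 1.8]))  # 0
```

Because each rule compares two genes within one sample, multiplying all of a sample's values by a positive constant leaves every vote unchanged, which illustrates why pair-based rules can sidestep cross-sample normalization.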
Pages: 666-673
Number of pages: 8