A class imbalance-aware Relief algorithm for the classification of tumors using microarray gene expression data

Cited by: 21
Authors
He, Yuanyu [1]
Zhou, Junhai [1]
Lin, Yaping [1]
Zhu, Tuanfei [1]
Affiliations
[1] Hunan University, College of Information Science and Engineering, Changsha, Hunan, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Microarray gene expression data; Relief; Feature selection; Imbalanced data classification
DOI
10.1016/j.compbiolchem.2019.03.017
Chinese Library Classification (CLC)
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
DNA microarray data have been widely used in cancer research because they help to distinguish successfully between tumor classes. However, typical gene expression data are both high-dimensional and class-imbalanced, which makes it difficult for traditional machine learning methods to build a robust classifier that performs well on both the minority and majority classes. Relief, one of the most successful feature weighting techniques, is considered particularly well suited to high-dimensional problems. Unfortunately, almost no Relief-based method takes class imbalance into account. This study shows that existing Relief-based algorithms may underestimate features that discriminate minority classes and ignore the distribution of minority class samples. As a result, they can introduce an additional bias towards classifying samples into the majority classes. To address this, a new method, named imRelief, is proposed for efficiently handling high-dimensional imbalanced gene expression data. imRelief corrects the bias towards the majority classes and accounts for the scattered distribution of minority class samples when estimating feature weights. In this way, imRelief rewards features that separate the minority classes well from the other classes. Experiments on four microarray gene expression data sets demonstrate the effectiveness of imRelief in both feature weighting and feature subset selection.
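For orientation, the baseline that the abstract builds on can be sketched in a few lines. The block below is a minimal illustration of classic Relief-style feature weighting, not the authors' imRelief, whose imbalance corrections are not specified in this record. The function name `relief_weights`, the range-normalised Manhattan distance, and the choice to visit every sample once instead of sampling at random are all assumptions made for the example.

```python
def relief_weights(X, y):
    """Classic Relief-style feature weights (illustrative; NOT imRelief).

    X is a list of samples (each a list of floats) and y the class labels.
    Every class is assumed to contain at least two samples.
    """
    n_samples, n_features = len(X), len(X[0])

    # Range of each feature, used to scale differences into [0, 1].
    spans = []
    for j in range(n_features):
        column = [row[j] for row in X]
        spans.append((max(column) - min(column)) or 1.0)

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def distance(a, b):
        return sum(diff(a, b, j) for j in range(n_features))

    def nearest(i, same_class):
        # Nearest hit (same class) or nearest miss (any other class).
        candidates = [
            k for k in range(n_samples)
            if k != i and (y[k] == y[i]) == same_class
        ]
        return min(candidates, key=lambda k: distance(X[i], X[k]))

    weights = [0.0] * n_features
    for i in range(n_samples):
        hit = nearest(i, same_class=True)
        miss = nearest(i, same_class=False)
        for j in range(n_features):
            # Reward features that differ on the miss, penalise those
            # that differ on the hit.
            step = diff(X[i], X[miss], j) - diff(X[i], X[hit], j)
            weights[j] += step / n_samples
    return weights


# Toy, made-up data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.1], [0.1, 0.9], [0.2, 0.5], [0.9, 0.2], [1.0, 0.8]]
y = [0, 0, 0, 1, 1]
print(relief_weights(X, y))
```

Because every sample contributes one update, a majority class drives most of the weight in this baseline, which is the kind of bias the abstract says imRelief is designed to correct.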
Pages: 121-127
Page count: 7