Feature selection for high-dimensional class-imbalanced data sets using Support Vector Machines

Cited by: 223

Authors:

Maldonado, Sebastian [1]

Weber, Richard [2]

Famili, Fazel [3]

Affiliations:

[1] Univ Los Andes, Santiago, Chile

[2] Univ Chile, Dept Ind Engn, Santiago, Chile

[3] Natl Res Council Canada, Ottawa, ON, Canada

Source:

INFORMATION SCIENCES | 2014 / Vol. 286

Keywords:

Feature selection; Imbalanced data set; Dimensionality reduction; Support Vector Machine; Data mining; GENE SELECTION; CLASSIFICATION; CARCINOMAS; SURVIVAL;

DOI:

10.1016/j.ins.2014.07.015

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Feature selection and classification of imbalanced data sets are two of the most interesting machine learning challenges, attracting growing attention from both industry and academia. Feature selection addresses the dimensionality reduction problem by determining a subset of available features to build a good model for classification or prediction, while the class-imbalance problem arises when the class distribution is too skewed. Both issues have been studied independently in the literature, and a plethora of methods to address high dimensionality as well as class imbalance have been proposed. The aim of this work is to explore both issues simultaneously, proposing a family of methods that select those attributes that are relevant for the identification of the target class in binary classification. We propose a backward elimination approach based on successive holdout steps, whose contribution measure is based on a balanced loss function obtained on an independent subset. Our experiments are based on six highly imbalanced microarray data sets, comparing our methods with well-known feature selection techniques, and obtaining a better prediction with consistently fewer relevant features. (C) 2014 Elsevier Inc. All rights reserved.
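
The backward elimination idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the balanced loss or the holdout procedure, so the sketch assumes the loss is the mean of the per-class error rates, draws a fresh stratified holdout split at every elimination step, and retrains a nearest-centroid classifier as a stand-in for the SVM. All function names and the toy data are hypothetical.

```python
import random

def balanced_error(y_true, y_pred):
    """Assumed balanced loss: mean of the per-class error rates (labels +1 / -1)."""
    rates = []
    for cls in (1, -1):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        if idx:  # skip a class that is absent from this subset
            rates.append(sum(y_pred[i] != cls for i in idx) / len(idx))
    return sum(rates) / len(rates)

def centroid_predict(X_train, y_train, X_test, feats):
    """Stand-in classifier (NOT an SVM): nearest class centroid on `feats`."""
    cents = {}
    for cls in (1, -1):
        rows = [x for x, y in zip(X_train, y_train) if y == cls]
        cents[cls] = [sum(r[f] for r in rows) / len(rows) for f in feats]
    preds = []
    for x in X_test:
        dist = {c: sum((x[f] - m) ** 2 for f, m in zip(feats, cents[c]))
                for c in cents}
        preds.append(min(dist, key=dist.get))
    return preds

def stratified_split(y, rng, frac=0.5):
    """Split indices into train/holdout, keeping both classes in the training part."""
    train, hold = [], []
    for cls in (1, -1):
        idx = [i for i, v in enumerate(y) if v == cls]
        rng.shuffle(idx)
        k = max(1, int(len(idx) * frac))
        train += idx[:k]
        hold += idx[k:]
    return train, hold

def backward_elimination(X, y, n_keep, seed=0):
    """Drop one feature per step: the one whose removal gives the lowest
    balanced error on an independent holdout subset."""
    rng = random.Random(seed)
    feats = list(range(len(X[0])))
    while len(feats) > n_keep:
        tr, ho = stratified_split(y, rng)  # new holdout step each iteration
        X_tr, y_tr = [X[i] for i in tr], [y[i] for i in tr]
        X_ho, y_ho = [X[i] for i in ho], [y[i] for i in ho]

        def loss_without(f):
            rest = [g for g in feats if g != f]
            return balanced_error(y_ho, centroid_predict(X_tr, y_tr, X_ho, rest))

        feats.remove(min(feats, key=loss_without))
    return feats

# Toy imbalanced data (6 positives, 30 negatives): feature 0 carries the
# class signal, features 1 and 2 are constant and therefore uninformative.
y = [1 if i < 6 else -1 for i in range(36)]
X = [[(10.0 if label == 1 else 0.0) + (i % 3 - 1), 0.0, 0.0]
     for i, label in enumerate(y)]

selected = backward_elimination(X, y, n_keep=1)
print(selected)  # [0]
```

The balanced loss is what keeps the minority class visible: predicting the majority class for every sample in a 1-to-9 data set is 90% accurate, yet `balanced_error([1] + [-1] * 9, [-1] * 10)` evaluates to 0.5. In practice the stand-in classifier would be replaced by an SVM.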

Pages: 228-246

Number of pages: 19

