An Efficient Cost-Sensitive Feature Selection Using Chaos Genetic Algorithm for Class Imbalance Problem

Cited by: 18
Authors
Bian, Jing [1 ,2 ]
Peng, Xin-guang [1 ]
Wang, Ying [1 ]
Zhang, Hai [3 ]
Affiliations
[1] Taiyuan Univ Technol, Coll Comp Sci & Technol, Yingze St 79, Taiyuan 030024, Peoples R China
[2] Shanxi Med Coll Continuing Educ, Ctr Informat & Network, Shuangtasi St 22, Taiyuan 030012, Peoples R China
[3] Shanxi Branch Agr Bank China, Technol & Prod Management, Nanneihuan St 33, Taiyuan 030024, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
OPTIMIZATION; ACQUISITION; DEFECT; SMOTE;
DOI
10.1155/2016/8752181
Chinese Library Classification (CLC) number
T [Industrial Technology];
Subject classification code
08;
Abstract
In the era of big data, feature selection is an essential process in machine learning. Although the class imbalance problem has recently attracted a great deal of attention, little effort has been devoted to developing feature selection techniques for imbalanced data. In addition, most applications involving feature selection focus on classification accuracy rather than cost, although costs are important. To cope with imbalance problems, we developed a cost-sensitive feature selection algorithm, referred to as CSFSG, that adds a cost-based evaluation function to filter feature selection using a chaos genetic algorithm. The evaluation function considers both feature-acquisition costs (test costs) and misclassification costs in the field of network security, thereby weakening the influence of the many instances from the majority classes in large-scale datasets. The CSFSG algorithm reduces the total cost of feature selection and trades off the two factors. The behavior of the CSFSG algorithm is tested on a large-scale network security dataset using two kinds of classifiers: C4.5 and k-nearest neighbor (KNN). The experimental results show that the approach is efficient and can improve classification accuracy and decrease classification time. In addition, the results of our method are more promising than those of other cost-sensitive feature selection algorithms.
Pages: 9