Test-cost-sensitive attribute reduction

被引:282
作者
Min, Fan [1 ]
He, Huaping [2 ]
Qian, Yuhua [3 ]
Zhu, William [1 ]
机构
[1] Zhangzhou Normal Univ, Lab Granular Comp, Zhangzhou 363000, Peoples R China
[2] Sichuan Univ Sci & Engn, Sch Comp Sci, Zigong 643000, Peoples R China
[3] Minist Educ, Key Lab Computat Intelligence & Chinese Informat, Taiyuan 030006, Shanxi, Peoples R China
基金
中国国家自然科学基金;
关键词
Cost-sensitive learning; Attribute reduction; Test cost; Heuristic algorithm; DECISION SYSTEMS; ROUGH; INDUCTION; NETWORKS; SEARCH;
D O I
10.1016/j.ins.2011.07.010
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
In many data mining and machine learning applications, there are two objectives in the task of classification: one is decreasing the test cost, the other is improving the classification accuracy. Most existing research work focuses on the latter, with attribute reduction serving as an optional pre-processing stage to remove redundant attributes. In this paper, we point out that when tests must be undertaken in parallel, attribute reduction is mandatory in dealing with the former objective. With this in mind, we posit the minimal test cost reduct problem which constitutes a new, but more general, difficulty than the classical reduct problem. We also define three metrics to evaluate the performance of reduction algorithms from a statistical viewpoint. A framework for a heuristic algorithm is proposed to deal with the new problem: specifically, an information gain-based lambda-weighted reduction algorithm is designed, where weights are decided by test costs and a non-positive exponent lambda, which is the only parameter set by the user. The algorithm is tested with three representative test cost distributions on four UCI (University of California - Irvine) datasets. Experimental results show that there is a trade-off while setting lambda, and a competition approach can improve the quality of the result significantly. This study suggests potential application areas and new research trends concerning attribute reduction. (C) 2011 Elsevier Inc. All rights reserved.
引用
收藏
页码:4928 / 4942
页数:15
相关论文
共 53 条
[1]  
[Anonymous], P 16 INT C MACH LEAR
[2]  
[Anonymous], GRANULAR COMPUTING
[3]  
[Anonymous], 1992, DISCERNIBILITY MATRI, DOI DOI 10.1007/978-94-015-7975-9_21
[4]  
[Anonymous], ISMIS 94
[5]  
[Anonymous], INT WORKSH ROUGH SET
[6]  
[Anonymous], 2014, C4. 5: programs for machine learning
[7]  
[Anonymous], CONTROL DECISION
[8]  
[Anonymous], LNCS
[9]  
[Anonymous], CHINESE J COMPUTERS
[10]  
[Anonymous], KNOWL INF SYST