On acquiring classification knowledge from noisy data based on rough set

Cited by: 26
Authors
Wang, FH [1]
Affiliation
[1] Ming Chuan Univ, Dept Comp Sci & Informat Engn, Taoyuan 333, Taiwan
Keywords
classification; rough set; noisy information system; lower approximation; information granule; randomization analysis
DOI
10.1016/j.eswa.2005.01.005
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Induction of classification rules based on rough set theory has been an active research area in machine learning. However, pure rough set theory is not well suited to analyzing noisy information systems. This paper adopts a generalization of the rough set model based on a fuzzy lower approximation with respect to information granules. Building on the fuzzy lower approximation, a concept of tolerant approximation is introduced to address the problem of discovering effective rules from noisy data. An efficient rule induction algorithm based on the tolerant lower approximation is proposed, and two heuristics are investigated for their inductive effectiveness. Empirical experiments are conducted with the algorithms on five real-life data sets that are widely recognized in the machine learning community. The Tree classification algorithm from the IBM Intelligent Miner is also evaluated as a basis for comparison. Effectiveness measures include prediction accuracy, cost ratio, and the rule validation rate based on randomization analysis. The empirical evidence shows that the proposed algorithm is effective for rule induction in noisy environments. (c) 2005 Elsevier Ltd. All rights reserved.
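As a rough illustration of the ideas named in the abstract, the sketch below computes a classical rough set lower approximation over information granules and then relaxes it with an inclusion-degree threshold. The record does not give the paper's definitions of the fuzzy or tolerant lower approximation, so the threshold rule, the function names (`granules`, `inclusion_degree`, `lower_approximation`), and the toy data are all assumptions made for illustration, not the author's algorithm.

```python
from collections import defaultdict


def granules(rows, attrs):
    """Group object indices into information granules: sets of objects
    that agree on every attribute in attrs."""
    groups = defaultdict(set)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in attrs)].add(i)
    return list(groups.values())


def inclusion_degree(granule, concept):
    """Fraction of the granule's objects that belong to the concept."""
    return len(granule & concept) / len(granule)


def lower_approximation(rows, attrs, concept, threshold=1.0):
    """Union of granules whose inclusion degree in the concept is at least
    threshold. threshold=1.0 gives the classical lower approximation; a
    lower value is a hypothetical tolerance setting for noisy data."""
    result = set()
    for g in granules(rows, attrs):
        if inclusion_degree(g, concept) >= threshold:
            result |= g
    return result


# Toy data invented for illustration; object 3 plays the role of a noisy record.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
]
concept = {i for i, r in enumerate(rows) if r["play"] == "yes"}

crisp = lower_approximation(rows, ["outlook", "windy"], concept)
tolerant = lower_approximation(rows, ["outlook", "windy"], concept, threshold=0.75)
print(sorted(crisp))     # [5]
print(sorted(tolerant))  # [0, 1, 2, 3, 5]
```

With the strict threshold, the single conflicting record excludes the whole sunny/no-wind granule, so no rule can be drawn from it. Lowering the threshold to 0.75 admits that granule, which is the kind of noise tolerance the abstract motivates.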
Pages: 49-64
Number of pages: 16