A discretization algorithm based on Class-Attribute Contingency Coefficient

Cited by: 140
Authors
Tsai, Cheng-Jung [1]
Lee, Chien-I. [2]
Yang, Wei-Pang [3]
Affiliations
[1] Natl Chiao Tung Univ, Dept Comp Sci, Hsinchu, Taiwan
[2] Natl Univ Tainan, Dept Informat & Learning Technol, Tainan, Taiwan
[3] Natl DongHwa Univ, Dept Informat Management, Hualien, Taiwan
Keywords
data mining; classification; decision tree; discretization; Contingency Coefficient
DOI
10.1016/j.ins.2007.09.004
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Discretization algorithms play an important role in data mining and knowledge discovery. They not only produce a concise summary of continuous attributes that helps experts understand the data more easily, but also make learning faster and more accurate. In this paper, we propose a static, global, incremental, supervised, top-down discretization algorithm based on the Class-Attribute Contingency Coefficient. An empirical evaluation of seven discretization algorithms on 13 real datasets and four artificial datasets showed that the proposed algorithm generated a better discretization scheme, one that improved classification accuracy. Our approach also achieved promising results in the execution time of discretization, the number of generated rules, and the training time of C5.0. © 2007 Elsevier Inc. All rights reserved.
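To make the statistic named in the abstract concrete, here is a minimal sketch of Pearson's contingency coefficient, C = sqrt(chi2 / (chi2 + N)), computed on a class-by-interval count table for one continuous attribute. The function names, the interval convention (a value equal to a cut point falls in the lower interval), and the toy data are assumptions made for illustration. This is the textbook coefficient, not the paper's CACC criterion or its cut-point search, which the abstract does not spell out.

```python
import math
from bisect import bisect_left


def quanta_matrix(values, labels, cut_points):
    """Count samples per (class, interval) for a given set of cut points.

    Hypothetical helper: rows follow the sorted class labels, columns
    follow the intervals induced by the sorted cut points.
    """
    cuts = sorted(cut_points)
    classes = sorted(set(labels))
    row_of = {c: i for i, c in enumerate(classes)}
    table = [[0] * (len(cuts) + 1) for _ in classes]
    for v, c in zip(values, labels):
        # bisect_left puts a value equal to a cut into the lower interval.
        table[row_of[c]][bisect_left(cuts, v)] += 1
    return table


def contingency_coefficient(table):
    """Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + N))."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n == 0:
        raise ValueError("table contains no samples")
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip empty rows or columns
                chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (chi2 + n))


# Toy data: a cut at 2.5 separates the two classes perfectly.
table = quanta_matrix([1, 2, 3, 4], ["a", "a", "b", "b"], [2.5])
score = contingency_coefficient(table)
```

A higher coefficient indicates a stronger dependence between class and interval. A top-down discretizer could in principle use a score of this kind to compare candidate cut points, although the scoring and stopping rules used in the paper may differ.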
Pages: 714-731
Number of pages: 18