On mining instance-centric classification rules

Times cited: 18
Authors
Wang, Jianyong [1]
Karypis, George [2]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[2] Univ Minnesota, Dept Comp Sci & Engn, Minneapolis, MN 55455 USA
Funding
U.S. National Science Foundation; National Natural Science Foundation of China
Keywords
data mining; classification rule; instance-centric; classifier
DOI
10.1109/TKDE.2006.179
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many studies have shown that rule-based classifiers perform well in classifying categorical and sparse high-dimensional databases. However, many rule-based classifiers share a fundamental limitation. They find rules by using various heuristic methods to prune the search space, and then select rules according to the sequential database-covering paradigm. As a result, the final rule set they use may not contain the globally best rules for some instances in the training database. Moreover, these algorithms do not fully exploit more effective search-space pruning methods that would allow them to scale to large databases. In this paper, we present a new classifier, HARMONY, which directly mines the final set of classification rules. HARMONY uses an instance-centric rule-generation approach. This approach ensures that, for each training instance, one of the highest-confidence rules covering that instance is included in the final rule set, which helps improve the overall accuracy of the classifier. By introducing several novel search strategies and pruning methods into the rule-discovery process, HARMONY also achieves high efficiency and good scalability. A thorough performance study on several large text and categorical databases shows that HARMONY outperforms many well-known classifiers in both accuracy and computational efficiency, and that it scales well with database size.
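The central idea in the abstract is to keep, for every training instance, one of the highest-confidence rules that covers it. A brute-force sketch can illustrate that idea. This is not HARMONY's algorithm. The sketch enumerates every itemset up to a hypothetical size limit `max_len` instead of using the paper's search strategies and pruning methods. It also assumes that a rule X -> c counts for an instance only if X is contained in the instance and c is that instance's own class label. Support is counted as the number of instances that contain X and carry label c. Confidence is that count divided by the number of instances that contain X. The function names, `min_sup`, `max_len`, and the toy data are all illustrative.

```python
from itertools import combinations


def mine_instance_centric_rules(instances, min_sup=2, max_len=3):
    """For each training instance, pick one highest-confidence rule X -> c,
    where X is a subset of the instance and c is the instance's own label.

    instances: list of (items, class_label); items is a set of strings.
    Returns {instance_index: (itemset, label, confidence, support)}.
    Instances with no rule reaching min_sup are absent from the result.
    Brute force over all itemsets of size <= max_len: toy data only.
    """
    cover_count = {}  # itemset -> number of instances containing it
    class_count = {}  # (itemset, label) -> instances containing it with that label
    for items, label in instances:
        for k in range(1, max_len + 1):
            for combo in combinations(sorted(items), k):
                key = frozenset(combo)
                cover_count[key] = cover_count.get(key, 0) + 1
                class_count[(key, label)] = class_count.get((key, label), 0) + 1

    best = {}
    for i, (items, label) in enumerate(instances):
        for k in range(1, max_len + 1):
            for combo in combinations(sorted(items), k):
                key = frozenset(combo)
                sup = class_count[(key, label)]  # support of rule key -> label
                if sup < min_sup:
                    continue
                conf = sup / cover_count[key]
                current = best.get(i)
                # Keep the highest-confidence rule; break ties by higher support,
                # then by enumeration order (shorter itemsets are seen first).
                if current is None or (conf, sup) > (current[2], current[3]):
                    best[i] = (key, label, conf, sup)
    return best


def final_rule_set(best):
    """Distinct rules that were selected for at least one instance."""
    return {(itemset, label, conf) for itemset, label, conf, _sup in best.values()}


# Hypothetical toy training set: (items, class label).
toy = [
    (frozenset({"a", "b"}), "pos"),
    (frozenset({"a", "b", "c"}), "pos"),
    (frozenset({"a", "c"}), "neg"),
    (frozenset({"a", "c", "d"}), "neg"),
    (frozenset({"b", "d"}), "neg"),
]

best = mine_instance_centric_rules(toy, min_sup=2, max_len=3)
for i, (itemset, label, conf, sup) in sorted(best.items()):
    print(f"instance {i}: {sorted(itemset)} -> {label} (conf={conf:.2f}, sup={sup})")
```

Each instance's rule is chosen independently of the others. No instance is removed from consideration once some rule covers it, and this is the contrast with the sequential covering paradigm that the abstract draws. In the toy data, instances 0 and 1 share the rule {a, b} -> pos. Instance 2 keeps {c} -> neg at confidence 2/3.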
Pages: 1497-1511
Number of pages: 15