A decision-tree-based symbolic rule induction system for text categorization

Cited by: 47
Authors
Johnson, DE [1]
Oles, FJ [1]
Zhang, T [1]
Goetz, T [1]
Affiliations
[1] IBM Corp, Div Res, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
DOI
10.1147/sj.413.0428
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
We present a decision-tree-based symbolic rule induction system for categorizing text documents automatically. Our method for rule induction involves the novel combination of (1) a fast decision tree induction algorithm especially suited to text data and (2) a new method for converting a decision tree to a rule set that is simplified, but still logically equivalent to, the original tree. We report experimental results on the use of this system on some practical problems.
Pages: 428-437
Page count: 10