Evolving text classification rules with genetic programming

Cited by: 5
Authors
Hirsch, L [1]
Saeedi, M
Hirsch, R
Affiliations
[1] Royal Holloway Univ London, Sch Management, Egham TW20 0EX, Surrey, England
[2] UCL, Dept Comp Sci, London, England
DOI
10.1080/08839510590967307
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We describe a novel method for using genetic programming to create compact classification rules using combinations of N-grams (character strings). Genetic programs acquire fitness by producing rules that are effective classifiers in terms of precision and recall when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from a classification task using the Reuters-21578 dataset. We also suggest that the rules may have a number of other uses beyond classification and provide a basis for text mining applications.
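As a rough illustration of the idea in the abstract, the following Python sketch scores a hand-written rule built from character N-grams against a few labelled documents. It is not the authors' implementation: the Boolean combinators (`all_of`, `any_of`, `negate`), the use of F1 to combine precision and recall into a single fitness value, and the toy documents are all assumptions made for this example, since the abstract does not specify the function set, the terminals, or the fitness formula.

```python
from typing import Callable, List, Set, Tuple

# A rule maps a document to True (in category) or False (not in category).
Rule = Callable[[str], bool]


def char_ngrams(text: str, n: int) -> Set[str]:
    """Return the set of character n-grams of length n found in the text."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def has(ngram: str) -> Rule:
    """Terminal (hypothetical): true when the document contains the n-gram."""
    return lambda doc: ngram in doc.lower()


def all_of(*rules: Rule) -> Rule:
    """Function node (hypothetical): logical AND of sub-rules."""
    return lambda doc: all(rule(doc) for rule in rules)


def any_of(*rules: Rule) -> Rule:
    """Function node (hypothetical): logical OR of sub-rules."""
    return lambda doc: any(rule(doc) for rule in rules)


def negate(rule: Rule) -> Rule:
    """Function node (hypothetical): logical NOT of a sub-rule."""
    return lambda doc: not rule(doc)


def precision_recall(rule: Rule, docs: List[Tuple[str, bool]]) -> Tuple[float, float]:
    """Evaluate a rule against labelled training documents."""
    tp = fp = fn = 0
    for text, in_category in docs:
        predicted = rule(text)
        if predicted and in_category:
            tp += 1
        elif predicted and not in_category:
            fp += 1
        elif not predicted and in_category:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def fitness(rule: Rule, docs: List[Tuple[str, bool]]) -> float:
    """Assumed fitness: F1, the harmonic mean of precision and recall."""
    p, r = precision_recall(rule, docs)
    return 2 * p * r / (p + r) if p + r else 0.0


# Toy training set (invented for illustration); True marks the target category.
docs = [
    ("Crude oil prices rose sharply", True),
    ("OPEC cuts oil output", True),
    ("Wheat harvest exceeds forecast", False),
    ("Soybean oil exports fall", False),
]

# A candidate rule of the kind a genetic program might evolve.
candidate = all_of(has("oil"), negate(has("soy")))
print(fitness(has("oil"), docs), fitness(candidate, docs))
```

In a genetic programming run, rules like `candidate` would be generated and recombined automatically, with a score such as this one deciding which rules survive into the next generation.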
Pages: 659-676
Page count: 18