A rough set-based case-based reasoner for text categorization

Cited by: 34
Authors
Li, Y
Shiu, SCK [1]
Pal, SK
Liu, JNK
Affiliations
[1] Hong Kong Polytech Univ, Dept Comp, Kowloon, Hong Kong, Peoples R China
[2] Indian Stat Inst, Machine Intelligence Unit, Kolkata 700035, W Bengal, India
Keywords
text categorization (TC); case-based reasoning (CBR); rough set; case coverage; case reachability;
DOI
10.1016/j.ijar.2005.06.019
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a novel rough set-based case-based reasoner for use in text categorization (TC). The reasoner has four main components: a feature term extractor, a document representor, a case selector, and a case retriever. It first reduces the number of feature terms in the documents using the rough set technique. It then reduces the number of documents using a new document selection approach based on the case-based reasoning (CBR) concepts of coverage and reachability. As a result, both the number of feature terms and the number of documents are reduced with only minimal loss of information. Finally, this smaller set of documents with fewer feature terms is used in TC. The proposed reasoner was tested on the Reuters-21578 text dataset. The experimental results demonstrate its effectiveness and efficiency: it significantly reduced the numbers of feature terms and documents, which is important for improving the efficiency of TC, while preserving and even improving classification accuracy. (C) 2005 Elsevier Inc. All rights reserved.
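The abstract names coverage and reachability without defining them. As a rough illustration only, the sketch below follows the usual CBR reading of these terms: the coverage of a case is the set of documents it can solve, and the reachability of a document is the set of cases that can solve it. It assumes a hypothetical "solves" criterion (same category and cosine similarity at or above a `threshold`) and uses a generic condensation strategy (keep pivotal documents, then greedily cover the rest). This is not the selection algorithm of the paper, and `coverage_sets`, `select_cases`, and `threshold` are illustrative names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def coverage_sets(docs, labels, threshold):
    """Return cov, where cov[i] is the set of documents that case i can solve.

    Hypothetical criterion (an assumption, not from the paper): case i solves
    document j if both carry the same category label and
    cosine(docs[i], docs[j]) >= threshold. Every case solves itself.
    """
    n = len(docs)
    return [
        {j for j in range(n)
         if j == i or (labels[i] == labels[j]
                       and cosine(docs[i], docs[j]) >= threshold)}
        for i in range(n)
    ]


def select_cases(docs, labels, threshold=0.9):
    """Pick a subset of case indices whose coverage spans every document."""
    n = len(docs)
    cov = coverage_sets(docs, labels, threshold)
    # Reachability of document j: all cases whose coverage contains j.
    reach = [{i for i in range(n) if j in cov[i]} for j in range(n)]
    # A document reachable only by itself is pivotal and must be kept.
    selected = [j for j in range(n) if reach[j] == {j}]
    uncovered = set(range(n))
    for i in selected:
        uncovered -= cov[i]
    # Greedy set cover: repeatedly add the case that covers the most
    # still-uncovered documents (ties go to the lowest index).
    while uncovered:
        best = max(range(n), key=lambda i: (len(cov[i] & uncovered), -i))
        selected.append(best)
        uncovered -= cov[best]
    return sorted(selected)


# Toy term-weight vectors for two categories, "a" and "b".
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 1.0]]
labels = ["a", "a", "b", "b", "a"]
print(select_cases(docs, labels, threshold=0.9))  # [0, 2, 4]
```

With this toy data, documents 0 and 1 cover each other, as do documents 2 and 3, while document 4 is reachable only from itself. The selection therefore keeps one representative of each pair plus the pivotal document. Raising `threshold` shrinks every coverage set and so keeps more documents.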
Pages: 229-255
Number of pages: 27