Rough set based hybrid algorithm for text classification

Cited by: 47
Authors
Miao, Duoqian [1]
Duan, Qiguo [1]
Zhang, Hongyu [1]
Jiao, Na [1]
Affiliations
[1] Tongji Univ, Dept Comp Sci & Technol, Shanghai 201804, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Text classification; Variable precision rough set (VPRS); k-nearest neighbor (kNN); Rocchio algorithm;
DOI
10.1016/j.eswa.2008.12.026
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Automatic classification of text documents, one of the essential techniques for Web mining, has always been a hot topic due to the explosive growth of digital documents available online. In the text classification community, k-nearest neighbor (kNN) is a simple yet effective classifier. However, as a lazy learning method without premodelling, kNN incurs a high cost when classifying new documents if the training set is large. The Rocchio algorithm is another well-known and widely used technique for text classification. One drawback of the Rocchio classifier is that it restricts the hypothesis space to the set of linearly separable hyperplane regions. When the data do not fit its underlying assumption well, the Rocchio classifier suffers. In this paper, a hybrid algorithm based on variable precision rough set is proposed to combine the strengths of both kNN and Rocchio techniques and overcome their weaknesses. An experimental evaluation of different methods is carried out on two common text corpora, i.e., the Reuters-21578 collection and the 20-newsgroup collection. The experimental results indicate that the novel algorithm achieves significant performance improvement. (C) 2008 Elsevier Ltd. All rights reserved.
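For readers unfamiliar with the two baseline classifiers named in the abstract, the following is a minimal sketch of each, not the paper's hybrid VPRS method, which the abstract does not specify. It assumes documents are already encoded as term-count vectors and uses cosine similarity; the function names, the toy terms, and the labels are all hypothetical. The sketch illustrates the trade-off the abstract describes: Rocchio compares a query against one centroid per class, while kNN compares it against every training document.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rocchio_fit(X, y):
    """Build one centroid (mean vector) per class; this is the whole 'model'."""
    groups = defaultdict(list)
    for vec, label in zip(X, y):
        groups[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }


def rocchio_predict(centroids, vec):
    """Assign the class whose centroid is most similar to the query."""
    return max(centroids, key=lambda label: cosine(centroids[label], vec))


def knn_predict(X, y, vec, k=3):
    """Lazy learner: scan all training vectors, then vote among the k most similar."""
    ranked = sorted(zip(X, y), key=lambda pair: cosine(pair[0], vec), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data (assumed): columns are counts of the terms "oil", "price", "game", "team".
X_train = [[3, 2, 0, 0], [2, 3, 0, 1], [0, 0, 3, 2], [0, 1, 2, 3]]
y_train = ["economy", "economy", "sports", "sports"]

centroids = rocchio_fit(X_train, y_train)
query = [1, 2, 0, 0]
print(rocchio_predict(centroids, query))
print(knn_predict(X_train, y_train, query, k=3))
```

In this sketch, Rocchio prediction costs one similarity computation per class, whereas kNN costs one per training document, which is why kNN becomes expensive on large training sets.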
Pages: 9168-9174
Page count: 7