Enhancement of DTP feature selection method for text categorization

Cited by: 0
Authors
Moyotl-Hernández, E [1]
Jiménez-Salazar, H [1]
Affiliations
[1] Univ Autonoma Puebla, Fac Ciencias Comp, Puebla 72570, Mexico
Source
COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING | 2005 / Vol. 3406
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper studies the structure of the vectors obtained by applying term selection methods to high-dimensional text collections. We found that the distance to transition point (DTP) method omits commonly occurring terms, which are poor discriminators between documents but convey important information about a collection. Experimental results on the Reuters-21578 collection with the k-NN classifier show that feature selection by DTP combined with common terms slightly outperforms simple document frequency.
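The selection idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the commonly cited transition point formula TP = (sqrt(8 * I1 + 1) - 1) / 2, where I1 is the number of terms that occur exactly once, and it assumes that "common terms" means the terms with the highest document frequency. The function names and the parameters `n_dtp` and `n_common` are hypothetical.

```python
import math
from collections import Counter


def transition_point(freqs):
    """Transition point computed from the number of terms occurring exactly once.

    Assumed formula (not given in the abstract): TP = (sqrt(8 * I1 + 1) - 1) / 2.
    """
    i1 = sum(1 for f in freqs.values() if f == 1)
    return (math.sqrt(8 * i1 + 1) - 1) / 2


def select_terms(docs, n_dtp, n_common):
    """Return (tp, dtp_terms, common_terms) for a list of tokenized documents.

    dtp_terms: the n_dtp terms whose collection frequency is closest to TP.
    common_terms: the n_common terms with the highest document frequency
    that DTP did not already pick (an assumed reading of "common terms").
    """
    # Collection frequency: total occurrences of each term.
    freqs = Counter(tok for doc in docs for tok in doc)
    # Document frequency: number of documents containing each term.
    doc_freqs = Counter(tok for doc in docs for tok in set(doc))

    tp = transition_point(freqs)

    # Smaller distance to TP ranks first; ties are broken alphabetically.
    by_distance = sorted(freqs, key=lambda t: (abs(tp - freqs[t]), t))
    dtp_terms = by_distance[:n_dtp]

    # Highest document frequency first; ties are broken alphabetically.
    by_doc_freq = sorted(doc_freqs, key=lambda t: (-doc_freqs[t], t))
    common_terms = [t for t in by_doc_freq if t not in dtp_terms][:n_common]

    return tp, dtp_terms, common_terms


if __name__ == "__main__":
    toy_docs = [
        ["oil", "price", "rises"],
        ["oil", "price", "falls"],
        ["oil", "market", "news"],
        ["oil", "price"],
    ]
    tp, dtp_terms, common_terms = select_terms(toy_docs, n_dtp=2, n_common=1)
    print(tp, dtp_terms, common_terms)
```

On this toy collection the most frequent term, "oil", lies far from the transition point and is skipped by the DTP ranking, so it only enters the feature set through the separate common-terms step.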
Pages: 719 - 722
Page count: 4
Related papers
50 items in total
  • [41] Measures of rule quality for feature selection in Text Categorization
    Montañés, E
    Fernández, J
    Díaz, I
    Combarro, EF
    Ranilla, J
    ADVANCES IN INTELLIGENT DATA ANALYSIS V, 2003, 2810 : 589 - 598
  • [42] An Improved Strategy of the Feature Selection Algorithm for the Text Categorization
    Yang, Jieming
    Lu, Yixin
    Liu, Zhiying
    2019 20TH IEEE/ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING (SNPD), 2019, : 3 - 7
  • [43] Five new feature selection metrics in text categorization
    Song, Fengxi
    Zhang, David
    Xu, Yong
    Wang, Jizhong
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2007, 21 (06) : 1085 - 1101
  • [44] An extensive empirical study of feature selection for text categorization
    Qiu, Li-Qing
    Zhao, Ru-Yi
    Zhou, Gang
    Yi, Sheng-Wei
    7TH IEEE/ACIS INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE IN CONJUNCTION WITH 2ND IEEE/ACIS INTERNATIONAL WORKSHOP ON E-ACTIVITY, PROCEEDINGS, 2008, : 312 - 315
  • [45] Incorporating Game Theory in Feature Selection for Text Categorization
    Azam, Nouman
    Yao, JingTao
    ROUGH SETS, FUZZY SETS, DATA MINING AND GRANULAR COMPUTING, RSFDGRC 2011, 2011, 6743 : 215 - 222
  • [46] AN EFFICIENT FEATURE SELECTION METHOD USING NAMED ENTITY RECOGNITION FOR CHINESE TEXT CATEGORIZATION
    Liu, Bin
    Li, Chunping
    PROCEEDINGS OF 2009 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-6, 2009, : 3527 - +
  • [47] Introducing a family of linear measures for feature selection in text categorization
    Combarro, EF
    Montañés, E
    Díaz, I
    Ranilla, J
    Mones, R
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005, 17 (09) : 1223 - 1232
  • [48] Feature selection with a measure of deviations from Poisson in text categorization
    Ogura, Hiroshi
    Amano, Hiromi
    Kondo, Masato
    EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (03) : 6826 - 6832
  • [49] A WordNet-based approach to feature selection in text categorization
    Zhang, K
    Sun, J
    Wang, B
    INTELLIGENT INFORMATION PROCESSING II, 2005, 163 : 475 - 484
  • [50] Exploring Feature Selection and Support Vector Machine in Text Categorization
    Abdul-Rahman, Shuzlina
    Mutalib, Sofianita
    Khanafi, Nur Amira
    Ali, Azliza Mohd
    2013 IEEE 16TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ENGINEERING (CSE 2013), 2013, : 1101 - 1104