Feature Selection Method Based on Crossed Centroid for Text Categorization

Cited by: 0
Authors
Yang, Jieming [1 ]
Liu, Zhiying [1 ]
Qu, Zhaoyang [1 ]
Wang, Jing [1 ]
Affiliation
[1] Northeast Dianli University, School of Information Engineering, Jilin, Jilin, People's Republic of China
Source
2014 15TH IEEE/ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING (SNPD) | 2014
Keywords
feature selection; text categorization; across centroid; high dimension; ALGORITHM;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The most important characteristic of text categorization is its high dimensionality, even for moderately sized datasets. Feature selection, which reduces dimensionality without sacrificing categorization performance and helps avoid over-fitting, is a commonly used approach to dimensionality reduction. In this paper, we propose a new feature selection method that evaluates the deviation from the centroid at both the inter-category and intra-category levels. We compared the proposed method with four well-known feature selection algorithms using support vector machines on three benchmark datasets (20 Newsgroups, Reuters-21578, and WebKB). The experimental results show that the proposed method can significantly improve the performance of the classifier.
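The abstract does not state the scoring formula, so the following is only a minimal sketch of the general idea of ranking terms by centroid deviation. It assumes a Fisher-style ratio (spread of category centroids around the global centroid, divided by spread of documents around their own category centroid), which may differ from the authors' actual criterion. The function names `centroid_deviation_scores` and `select_top_k`, and the smoothing constant `eps`, are hypothetical.

```python
def centroid_deviation_scores(docs, labels, eps=1e-12):
    """Score each term by inter-category vs. intra-category centroid deviation.

    docs   : list of equal-length term-weight vectors (one per document)
    labels : category label for each document
    NOTE: this ratio is an assumed stand-in, not the formula from the paper.
    """
    n_terms = len(docs[0])

    # Group document vectors by category.
    by_category = {}
    for vec, label in zip(docs, labels):
        by_category.setdefault(label, []).append(vec)

    scores = []
    for t in range(n_terms):
        # Global centroid value for term t over all documents.
        global_centroid = sum(vec[t] for vec in docs) / len(docs)
        inter = 0.0  # deviation of category centroids from the global centroid
        intra = 0.0  # deviation of documents from their own category centroid
        for vecs in by_category.values():
            centroid = sum(vec[t] for vec in vecs) / len(vecs)
            inter += len(vecs) * (centroid - global_centroid) ** 2
            intra += sum((vec[t] - centroid) ** 2 for vec in vecs)
        # eps avoids division by zero when a term is constant within categories.
        scores.append(inter / (intra + eps))
    return scores


def select_top_k(scores, k):
    """Return the indices of the k highest-scoring terms."""
    ranked = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)
    return ranked[:k]


# Toy example: terms 0 and 1 separate the two categories, term 2 does not.
docs = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]]
labels = ["a", "a", "b", "b"]
scores = centroid_deviation_scores(docs, labels)
top_terms = select_top_k(scores, 2)
```

In this toy data the third term has identical category centroids, so its inter-category deviation is zero and it is ranked last. The selected term indices would then be used to restrict the document vectors before training a classifier such as a support vector machine.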
Pages: 11-15
Page count: 5
Related papers
50 in total
  • [31] A new Centroid-Based Classification model for text categorization
    Liu, Chuan
    Wang, Wenyong
    Tu, Guanghui
    Xiang, Yu
    Wang, Siyang
    Lv, Fengmao
    KNOWLEDGE-BASED SYSTEMS, 2017, 136 : 15 - 26
  • [32] AN EFFICIENT FEATURE SELECTION METHOD USING NAMED ENTITY RECOGNITION FOR CHINESE TEXT CATEGORIZATION
    Liu, Bin
    Li, Chunping
    PROCEEDINGS OF 2009 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-6, 2009, : 3527 - +
  • [33] Feature Selection based on Supervised Topic Modeling for Boosting-Based Multi-Label Text Categorization
    Al-Salemi, Bassam
    Ayob, Masri
    Noah, Shahrul Azman Mohd
    Ab Aziz, Mohd Juzaiddin
    PROCEEDINGS OF THE 2017 6TH INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING AND INFORMATICS (ICEEI'17), 2017,
  • [34] A novel feature selection algorithm for text categorization
    Shang, Wenqian
    Huang, Houkuan
    Zhu, Haibin
    Lin, Yongmin
    Qu, Youli
    Wang, Zhihai
    EXPERT SYSTEMS WITH APPLICATIONS, 2007, 33 (01) : 1 - 5
  • [35] GU metric - A new feature selection algorithm for text categorization
    Uchyigit, Gulden
    Clark, Keith
    ICEIS 2007: PROCEEDINGS OF THE NINTH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS: ARTIFICIAL INTELLIGENCE AND DECISION SUPPORT SYSTEMS, 2007, : 399 - 402
  • [36] An extended document frequency metric for feature selection in text categorization
    Xu, Yan
    Wang, Bin
    Li, JinTao
    Jing, Hongfang
    INFORMATION RETRIEVAL TECHNOLOGY, 2008, 4993 : 71 - +
  • [37] COMPARATIVE STUDY OF FEATURE SELECTION APPROACHES FOR URDU TEXT CATEGORIZATION
    Zia, Tehseen
    Akhter, Muhammad Pervez
    Abbas, Qaiser
    MALAYSIAN JOURNAL OF COMPUTER SCIENCE, 2015, 28 (02) : 93 - 109
  • [38] Study and Analyze on Feature Selection in Text Categorization for Engineering Domain
    Wu Junyun
    EMERGING MATERIALS AND MECHANICS APPLICATIONS, 2012, 487 : 383 - 386
  • [39] Toward Optimal Feature Selection in Naive Bayes for Text Categorization
    Tang, Bo
    Kay, Steven
    He, Haibo
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (09) : 2508 - 2521
  • [40] Introducing a family of linear measures for feature selection in text categorization
    Combarro, EF
    Montañés, E
    Díaz, I
    Ranilla, J
    Mones, R
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005, 17 (09) : 1223 - 1232