Category Discrimination Based Feature Selection Algorithm in Chinese Text Classification

Cited by: 1
Authors
Yi, Junkai [1]
Yang, Guang [1]
Wan, Jing [1]
Affiliation
[1] Beijing Univ Chem Technol, Coll Informat Sci & Technol, Beijing 100029, Peoples R China
Keywords
text classification; text categorization; feature selection; tf-idf; category discrimination; categorization;
DOI
Not available
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Improving classification precision is a major issue in Chinese text classification. The tf-idf algorithm is a classic and widely used feature selection algorithm based on the vector space model (VSM). However, the traditional tf-idf algorithm ignores how a feature term is distributed within a category and across categories, which leads to many unreasonable selection results. This paper improves the traditional tf-idf algorithm by introducing the concept of Category Discrimination. We evaluate the improved algorithm experimentally and compare it with other algorithms. The experimental results show that the improved tf-idf algorithm consistently achieves higher precision and recall than the traditional tf-idf algorithm, and that it outperforms the other algorithms overall. It is therefore a more effective feature selection algorithm for text classification.
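The abstract does not give the authors' Category Discrimination formula. The sketch below is therefore only illustrative. It shows classic tf-idf feature scoring and one place where a category-distribution factor could be multiplied in. It makes several assumptions. Documents are already word-segmented, since Chinese text needs segmentation first. tf is the raw count divided by document length, and idf is log(N / df). The names `select_features` and `discrimination` are hypothetical. The `discrimination` factor is an illustrative stand-in and not the paper's definition: for each category, it takes the fraction of that category's documents that contain the term, then returns the largest fraction divided by the sum of all the fractions. A term spread evenly over K categories therefore scores 1/K, and a term confined to one category scores 1.

```python
import math
from collections import Counter, defaultdict


def select_features(docs, labels, k):
    """Rank terms by classic tf-idf times a HYPOTHETICAL category factor.

    docs:   list of token lists (already word-segmented).
    labels: category label for each document.
    k:      number of feature terms to keep.
    """
    n = len(docs)
    df = Counter()                  # number of documents containing each term
    cat_df = defaultdict(Counter)   # per-category document frequency
    cat_size = Counter(labels)      # number of documents in each category
    for doc, label in zip(docs, labels):
        terms = set(doc)
        df.update(terms)
        cat_df[label].update(terms)

    # Classic tf-idf: tf = count / doc length, idf = log(N / df).
    # Each term keeps its maximum weight over all documents.
    tfidf = defaultdict(float)
    for doc in docs:
        for term, count in Counter(doc).items():
            weight = (count / len(doc)) * math.log(n / df[term])
            tfidf[term] = max(tfidf[term], weight)

    def discrimination(term):
        # HYPOTHETICAL stand-in, not the authors' formula: the share of the
        # term's per-category coverage held by its strongest category (1/K..1).
        cover = [cat_df[c][term] / cat_size[c] for c in cat_size]
        return max(cover) / sum(cover)

    scores = {t: tfidf[t] * discrimination(t) for t in df}
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


docs = [["股票", "上涨", "市场"], ["股票", "下跌", "市场"],
        ["比赛", "球队", "市场"], ["球队", "进球"]]
labels = ["finance", "finance", "sports", "sports"]
top = select_features(docs, labels, 5)
```

In this toy corpus, 市场 ("market") occurs in both categories. The hypothetical factor scales its tf-idf by 2/3, so 市场 drops out of the top five terms. Every term confined to a single category keeps a factor of 1.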
Pages: 1145-1159
Number of pages: 15