Category Discrimination Based Feature Selection Algorithm in Chinese Text Classification

Times cited: 1
Authors
Yi, Junkai [1]
Yang, Guang [1]
Wan, Jing [1]
Affiliation
[1] Beijing Univ Chem Technol, Coll Informat Sci & Technol, Beijing 100029, Peoples R China
Keywords
text classification; text categorization; feature selection; tf-idf; category discrimination; CATEGORIZATION
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Improving classification precision is a major issue in Chinese text classification. The tf-idf algorithm is a classic, widely used feature selection algorithm based on the vector space model (VSM). However, traditional tf-idf ignores how a feature term is distributed within a category and across categories, which leads to many unreasonable selection results. This paper improves the traditional tf-idf algorithm by introducing the concept of Category Discrimination. We evaluate the algorithm experimentally and compare it with other algorithms. The experimental results show that the improved tf-idf algorithm consistently achieves higher precision and recall than traditional tf-idf and outperforms the other algorithms overall. It is therefore a more effective feature selection algorithm for text classification.
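The limitation the abstract describes can be made concrete with a small sketch. It assumes one common tf-idf variant, raw term frequency multiplied by idf = log(N / df), and uses a hypothetical four-document labeled corpus. The names `CORPUS`, `idf`, `tf_idf` and `category_spread` are illustrative only. This is not the authors' Category Discrimination method, whose formula the abstract does not give. The sketch only shows that idf depends on document counts alone. As a result, a term confined to one category and a term spread across categories can receive the same weight.

```python
import math
from collections import Counter

# Hypothetical toy corpus for illustration: (category label, tokens).
CORPUS = [
    ("sports", ["goal", "match", "today"]),
    ("sports", ["goal", "team"]),
    ("finance", ["stock", "market", "today"]),
    ("finance", ["stock", "bank"]),
]


def idf(corpus, term):
    """Classic idf = log(N / df), where df counts documents containing term.

    Category labels are never consulted. This is the blind spot the
    abstract attributes to traditional tf-idf.
    """
    n_docs = len(corpus)
    df = sum(1 for _, tokens in corpus if term in tokens)
    return math.log(n_docs / df)


def tf_idf(tokens, term, corpus):
    """Raw term frequency in one document times corpus-level idf (assumed variant)."""
    return tokens.count(term) * idf(corpus, term)


def category_spread(corpus, term):
    """Number of documents per category that contain term."""
    return dict(Counter(label for label, tokens in corpus if term in tokens))


if __name__ == "__main__":
    # "goal" occurs only in sports documents, while "today" occurs in both
    # categories. Both have df = 2, so they get identical idf values.
    print(idf(CORPUS, "goal") == idf(CORPUS, "today"))  # True
    # In the first document each term occurs once, so the tf-idf weights tie.
    doc0 = CORPUS[0][1]
    print(tf_idf(doc0, "goal", CORPUS) == tf_idf(doc0, "today", CORPUS))  # True
    print(category_spread(CORPUS, "goal"))   # {'sports': 2}
    print(category_spread(CORPUS, "today"))  # {'sports': 1, 'finance': 1}
```

A category-aware weighting would need information like the per-category counts that `category_spread` returns. The abstract does not state how the paper's Category Discrimination measure uses such information.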
Pages: 1145-1159
Page count: 15