Application of an Improved CHI Feature Selection Algorithm

Cited by: 9
Authors
Cai, Liang-jing [1]
Lv, Shu [1]
Shi, Kai-bo [2]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Math Sci, Chengdu 611731, Sichuan, Peoples R China
[2] Chengdu Univ, Sch Elect Informat & Elect Engn, Chengdu 610106, Sichuan, Peoples R China
DOI
10.1155/2021/9963382
Chinese Library Classification number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
Text classification is a key task in machine learning and is widely applied in information filtering, sentiment analysis, and text review. Improving classification accuracy is important and has been a main research goal in this field in recent years. Feature selection plays an important role in text classification: it eliminates irrelevant features, reduces dimensionality, and improves classification accuracy. This paper therefore studies the CHI feature selection algorithm. Its main work and contributions are as follows. First, the paper analyzes the flaws of the CHI algorithm, identifies the introduction of new parameters as the direction for improving it, and accordingly proposes a new algorithm based on variance and the coefficient of variation. Second, experiments verify the effectiveness of the new algorithm. In terms of language, the experiments cover two text classification systems, Chinese and English. In terms of classifiers, two algorithms are used: the KNN classifier and the Naive Bayes classifier. In terms of data, two distribution types are used: balanced datasets and unbalanced datasets. Finally, the paper conducts three comparative experiments and analyzes the results of each. In all three, the results obtained with the improved algorithm are significantly better than those obtained with the original CHI algorithm.
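The abstract does not give the CHI formula or the improved algorithm itself. As a point of reference, the sketch below implements the standard CHI (chi-square) statistic for a term t and class c, χ²(t, c) = N(AD − BC)² / ((A + C)(B + D)(A + B)(C + D)), where N is the number of documents, A counts documents of class c that contain t, B documents of other classes that contain t, C documents of class c without t, and D documents of other classes without t. Terms are then ranked by their maximum score over classes, which is one common aggregation. The helper `class_spread` computes the variance and coefficient of variation of a term's per-class document frequency; it is a hypothetical illustration of the kind of statistics the abstract names, not the authors' formula, and it deliberately does not combine them with CHI. All function names are illustrative assumptions.

```python
from collections import Counter
from statistics import mean, pstdev, pvariance
from typing import Dict, List, Tuple


def chi_square_scores(docs: List[List[str]], labels: List[str]) -> Dict[Tuple[str, str], float]:
    """Standard CHI statistic chi2(t, c) for every (term, class) pair."""
    n = len(docs)
    class_counts = Counter(labels)      # documents per class
    term_docs = Counter()               # documents containing each term
    term_class_docs = Counter()         # documents of class c containing term t
    for tokens, label in zip(docs, labels):
        for term in set(tokens):        # document frequency, not raw term counts
            term_docs[term] += 1
            term_class_docs[(term, label)] += 1

    scores = {}
    for term, df in term_docs.items():
        for cls, n_c in class_counts.items():
            a = term_class_docs[(term, cls)]  # t present, class c
            b = df - a                        # t present, other classes
            c = n_c - a                       # t absent, class c
            d = n - df - c                    # t absent, other classes
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            scores[(term, cls)] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores: Dict[Tuple[str, str], float], k: int) -> List[str]:
    """Rank terms by their maximum CHI score over classes and keep the top k."""
    best: Dict[str, float] = {}
    for (term, _cls), s in scores.items():
        best[term] = max(best.get(term, 0.0), s)
    ranked = sorted(best, key=lambda t: (-best[t], t))  # ties broken alphabetically
    return ranked[:k]


def class_spread(term: str, docs: List[List[str]], labels: List[str]) -> Tuple[float, float]:
    """Variance and coefficient of variation of a term's per-class document frequency.

    Hypothetical illustration only: the abstract names these statistics but not
    how they are combined with CHI, so no weighting is applied here.
    """
    per_class = Counter(label for tokens, label in zip(docs, labels) if term in tokens)
    freqs = [per_class[cls] for cls in sorted(set(labels))]
    m = mean(freqs)
    return pvariance(freqs), (pstdev(freqs) / m if m else 0.0)
```

For example, with the four toy documents `[["good", "movie"], ["good", "film"], ["bad", "movie"], ["bad", "plot"]]` labelled `pos`, `pos`, `neg`, `neg`, `chi_square_scores` gives 4.0 for (`good`, `pos`), because `good` occurs only in positive documents, and 0.0 for (`movie`, `pos`), because `movie` is spread evenly over both classes. Correspondingly, `class_spread` reports zero variance for `movie` and a coefficient of variation of 1.0 for `good`.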
Pages: 8