Improved feature selection approach TFIDF in text mining

Cited by: 0
Authors
Jing, LP [1 ]
Huang, HK [1 ]
Shi, HB [1 ]
Affiliation
[1] No Jiaotong Univ, Sch Comp & Informat Technol, Beijing 100044, Peoples R China
Source
2002 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-4, PROCEEDINGS | 2002
Keywords
text mining; TFIDF; evaluation function; VSM; feature selection;
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
This paper describes a feature selection method, TFIDF. We use it to process the source data and build a vector space model (VSM) that provides a convenient data structure for text categorization. We measure the precision of the method from the categorization results. Based on these empirical results, we analyze its advantages and disadvantages and present a new TFIDF-based feature selection approach to improve its accuracy.
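For orientation, the following is a minimal sketch of textbook TFIDF weighting followed by a simple top-k feature selection. It is not the improved approach proposed in the paper, which this record does not describe. The function names `tfidf_vectors` and `select_features`, the weighting formula tf * log(N / df), and the "keep the k terms with the highest maximum weight" rule are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight) for tokenized documents.

    Assumed weighting: raw term frequency times log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def select_features(vectors, k):
    """Keep the k terms whose maximum TF-IDF weight over all documents is highest.

    This selection rule is a hypothetical choice; ties are broken alphabetically.
    """
    best = {}
    for vec in vectors:
        for term, weight in vec.items():
            best[term] = max(best.get(term, 0.0), weight)
    return sorted(best, key=lambda t: (-best[t], t))[:k]


docs = [
    ["text", "mining", "text"],
    ["data", "mining"],
    ["text", "data", "feature"],
]
vectors = tfidf_vectors(docs)
selected = select_features(vectors, 2)
```

Restricting each vector in `vectors` to the terms in `selected` gives a reduced vector space model that a text classifier could take as input. A term that occurs in every document receives weight 0 under this formula, so it is never preferred by the selection step.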
Pages: 944-946
Number of pages: 3