Improvement of Text Feature Selection Method based on TFIDF
Cited by: 17
Authors:
Qu, Shouning [1]
Wang, Sujuan [1]
Zou, Yan [1]
Affiliations:
[1] Univ Jinan, Sch Informat Sci & Engn, Jinan 250022, Shandong, Peoples R China
Source:
2008 INTERNATIONAL SEMINAR ON FUTURE INFORMATION TECHNOLOGY AND MANAGEMENT ENGINEERING, PROCEEDINGS | 2008
Keywords:
DOI: 10.1109/FITME.2008.25
Chinese Library Classification: F [Economics]
Discipline classification code: 02
Abstract:
TFIDF is a common method for selecting text features, but it has several drawbacks. First, it undervalues terms that appear frequently in the documents of one class but rarely in the documents of other classes, even though such terms are characteristic of that class. Second, TFIDF neglects the relationship between a feature and the class. This paper proposes an improved TFIDF strategy and combines it with a simple distance vector text classification method for comparison against traditional TFIDF. The improved strategy achieved a very good classification result, and the experiment demonstrated its feasibility.
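
The first drawback named in the abstract can be illustrated with a minimal sketch. It assumes the textbook weighting tf × log(N / df); the abstract does not say which TFIDF variant the paper uses. The toy corpus, the class labels, and the function names `tf_idf` and `class_document_frequency` are hypothetical. The sketch does not implement the paper's improved strategy, which the abstract does not specify.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def class_document_frequency(docs, labels, term):
    """Count, per class label, the documents that contain `term`."""
    return Counter(label for doc, label in zip(docs, labels) if term in doc)


# Hypothetical toy corpus (assumption, not data from the paper).
labels = ["sports", "sports", "sports", "finance"]
docs = [
    ["goal", "match", "goal", "team"],
    ["goal", "team", "coach"],
    ["goal", "referee", "match"],
    ["stock", "market", "bank"],
]

weights = tf_idf(docs)
goal_by_class = class_document_frequency(docs, labels, "goal")

# Print the weights of the third document, lowest first.
for term, weight in sorted(weights[2].items(), key=lambda item: item[1]):
    print(f"{term}: {weight:.3f}")
print(dict(goal_by_class))
```

In the third document, "goal" receives the lowest weight of the three terms, although it occurs in every sports document and in no finance document. Plain TFIDF only sees that "goal" is common across the corpus; it has no access to the class labels, which is the second drawback the abstract mentions.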