An improved term weighting scheme for text classification

Cited by: 13
Authors
Tang, Zhong [1]
Li, Wenqiang [1]
Li, Yan [1]
Affiliations
[1] Sichuan Univ, Sch Mech Engn, Sichuan Prov Key Lab Innovat Methodol & Creat Des, Chengdu, Sichuan, Peoples R China
Source
Funding
National Natural Science Foundation of China;
Keywords
feature selection; term weighting; text classification; text representation; TF-IEF; FEATURE-SELECTION METHOD; REPRESENTATION; IMPACT;
DOI
10.1002/cpe.5604
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Subject classification codes
081202; 0835;
Abstract
Text representation is a necessary first step in text classification (TC), and achieving high TC performance requires that the representation be built with an information-rich term weighting scheme. Term frequency-inverse document frequency (TF-IDF) is the most widely used term weighting scheme to date, but it suffers from two deficiencies. First, its global weighting factor, IDF, approaches infinity if a term does not occur in any text. Second, IDF equals zero if a term appears in every text. To offset these drawbacks, we first conduct an in-depth analysis of current term weighting schemes and then propose an improved scheme, term frequency-inverse exponential frequency (TF-IEF), along with several variants. The proposed method replaces IDF with a new global weighting factor, IEF, to characterize the log-like global weighting factor IDF in the corpus, which can greatly reduce the effect of features (terms) with a high local weighting factor TF in term weighting. As a result, more representative features can be generated. We carried out a series of experiments on two commonly used data sets (corpora), using Naive Bayes and support vector machine classifiers, to validate the proposed schemes. The experimental results show that the proposed term weighting schemes perform better than the compared schemes.
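The two IDF edge cases named in the abstract can be illustrated with a minimal Python sketch. It assumes the textbook definition IDF = log(N / df), where N is the number of texts and df is the number of texts containing the term; the toy corpus and the helper names `idf`, `tf_idf`, and `doc_freq` are hypothetical and chosen only for illustration. This record does not state the IEF formula, so the sketch covers standard IDF only and does not attempt to reproduce the authors' TF-IEF scheme.

```python
import math


def idf(num_docs: int, df: int) -> float:
    """Textbook IDF, log(N / df). Assumed definition, not taken from the paper."""
    if df == 0:
        # log(N / 0) is undefined; the weight diverges to infinity.
        return math.inf
    return math.log(num_docs / df)


def tf_idf(term_count: int, num_docs: int, df: int) -> float:
    """Raw term count times IDF (one common TF-IDF variant)."""
    return term_count * idf(num_docs, df)


def doc_freq(term: str, docs: list) -> int:
    """Number of texts that contain the term at least once."""
    return sum(1 for doc in docs if term in doc)


# Hypothetical toy corpus of three tokenized texts.
corpus = [
    ["term", "weighting", "text"],
    ["text", "classification"],
    ["text", "representation", "term"],
]
n = len(corpus)

idf_text = idf(n, doc_freq("text", corpus))       # term in every text
idf_term = idf(n, doc_freq("term", corpus))       # term in 2 of 3 texts
idf_missing = idf(n, doc_freq("kernel", corpus))  # term in no text

print(idf_text, idf_term, idf_missing)
```

Under this definition, a term that occurs in every text receives a weight of zero regardless of how often it occurs, and a term absent from the corpus has no finite weight. Smoothed IDF variants are a common workaround for both cases.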
Pages: 19
Related papers
50 in total
  • [41] Several alternative term weighting methods for text representation and classification
    Tang, Zhong
    Li, Wenqiang
    Li, Yan
    Zhao, Wu
    Li, Song
    KNOWLEDGE-BASED SYSTEMS, 2020, 207
  • [42] Text classification using scores based k-NN approach and term to category relevance weighting scheme
    Ben Afia, Ahmed
    Amiri, Hamid
    INTERNATIONAL JOURNAL OF SIGNAL AND IMAGING SYSTEMS ENGINEERING, 2016, 9 (4-5) : 283 - 290
  • [43] A Supervised Term Weighting Scheme for Multi-class Text Categorization
    Gu, Yiwei
    Gu, Xiaodong
    INTELLIGENT COMPUTING METHODOLOGIES, ICIC 2017, PT III, 2017, 10363 : 436 - 447
  • [44] A Novel scheme for Term weighting in Text Categorization : Positive Impact factor
    Emmanuel, M.
    Khatri, Saurabh M.
    Babu, Ramesh D. R.
    2013 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC 2013), 2013, : 2292 - 2297
  • [45] A new term-weighting scheme for naive Bayes text categorization
    Mendoza, Marcelo
    INTERNATIONAL JOURNAL OF WEB INFORMATION SYSTEMS, 2012, 8 (01) : 55 - +
  • [46] Hadoop MapReduce Implementation of A Novel scheme for Term weighting in Text Categorization
    Dalavi, Manesh
    Cheke, Shailesh
    2014 INTERNATIONAL CONFERENCE ON CONTROL, INSTRUMENTATION, COMMUNICATION AND COMPUTATIONAL TECHNOLOGIES (ICCICCT), 2014, : 994 - 999
  • [47] A Term Weighting Scheme Based on the Measure of Relevance and Distinction for Text Categorization
    Yang, Jieming
    Wang, Jing
    Liu, Zhiying
    Qu, Zhaoyang
    2015 16TH IEEE/ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING (SNPD), 2015, : 63 - 68
  • [48] An improved global feature selection scheme for text classification
    Uysal, Alper Kursat
    EXPERT SYSTEMS WITH APPLICATIONS, 2016, 43 : 82 - 92
  • [49] Model-induced term-weighting schemes for text classification
    Kim, Hyun Kyung
    Kim, Minyoung
    APPLIED INTELLIGENCE, 2016, 45 (01) : 30 - 43
  • [50] Grammatical Dependency-Based Relations for Term Weighting in Text Classification
    Dat Huynh
    Dat Tran
    Ma, Wanli
    Sharma, Dharmendra
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT I: 15TH PACIFIC-ASIA CONFERENCE, PAKDD 2011, 2011, 6634 : 476 - 487