Model-induced term-weighting schemes for text classification

Cited by: 14
Authors
Kim, Hyun Kyung [1]
Kim, Minyoung [1]
Affiliations
[1] Seoul Natl Univ Sci & Technol, Dept Elect & IT Media Engn, Seoul 139743, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Document/text classification; Feature/term weighting; Feature selection; Supervised learning; SENTIMENT ANALYSIS; CATEGORIZATION;
DOI
10.1007/s10489-015-0745-z
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The bag-of-words representation of text data is very popular for document classification. In the recent literature, it has been shown that properly weighting the term feature vector can improve the classification performance significantly beyond the original term-frequency based features. In this paper we demystify the success of the recent term-weighting strategies as well as provide possibly more reasonable modifications. We then propose novel term-weighting schemes that can be induced from the well-known document probabilistic models such as the Naive Bayes and the multinomial term model. Interestingly, some of the intuition-based term-weighting schemes coincide exactly with the proposed derivations. Our term-weighting schemes are tested on large-scale text classification problems/datasets where we demonstrate improved prediction performance over existing approaches.
Pages: 30 - 43
Page count: 14
Related papers
50 in total
  • [21] A Study of Applying Different Term Weighting Schemes on Arabic Text Classification
    Guru, D. S.
    Ali, Mostafa
    Suhil, Mahamad
    Hazman, Maryam
    DATA ANALYTICS AND LEARNING, 2019, 43 : 293 - 305
  • [22] INVESTIGATING TERM WEIGHTING SCHEMES ON THE CLASSIFICATION PERFORMANCE FOR THE IMBALANCED TEXT DATA
    Al Manei, Afra
    Al Hasani, Iman
    Wesonga, Ronald
    ADVANCES AND APPLICATIONS IN STATISTICS, 2022, 78 : 63 - 82
  • [23] Evolved term-weighting schemes in Information Retrieval: an analysis of the solution space
    Ronan Cummins
    Colm O’Riordan
    Artificial Intelligence Review, 2006, 26 : 35 - 47
  • [24] An axiomatic comparison of learned term-weighting schemes in information retrieval: clarifications and extensions
    Cummins, Ronan
    O'Riordan, Colm
    ARTIFICIAL INTELLIGENCE REVIEW, 2007, 28 (01) : 51 - 68
  • [25] Evolving general term-weighting schemes for information retrieval: Tests on larger collections
    Cummins, R
    O'Riordan, C
    ARTIFICIAL INTELLIGENCE REVIEW, 2005, 24 (3-4) : 277 - 299
  • [26] Evolving General Term-Weighting Schemes for Information Retrieval: Tests on Larger Collections
    Ronan Cummins
    Colm O’Riordan
    Artificial Intelligence Review, 2005, 24 : 277 - 299
  • [27] An axiomatic comparison of learned term-weighting schemes in information retrieval: clarifications and extensions
    Ronan Cummins
    Colm O’Riordan
    Artificial Intelligence Review, 2007, 28 : 51 - 68
  • [28] A probabilistic model derived term weighting scheme for text classification
    Feng, Guozhong
    Li, Shaoting
    Sun, Tieli
    Zhang, Bangzuo
    PATTERN RECOGNITION LETTERS, 2018, 110 : 23 - 29
  • [29] Comparative study of term-weighting schemes for environmental big data using machine learning
    Kim, JungJin
    Kim, Han-Ul
    Adamowski, Jan
    Hatami, Shadi
    Jeong, Hanseok
    ENVIRONMENTAL MODELLING & SOFTWARE, 2022, 157
  • [30] Comparison of term weighting schemes for document classification
    Jeong, Ho Young
    Shin, Sang Min
    Choi, Yong-Seok
    KOREAN JOURNAL OF APPLIED STATISTICS, 2019, 32 (02) : 265 - 276