Model-induced term-weighting schemes for text classification

Times cited: 14
Authors
Kim, Hyun Kyung [1]
Kim, Minyoung [1]
Affiliation
[1] Seoul Natl Univ Sci & Technol, Dept Elect & IT Media Engn, Seoul 139743, South Korea
Funding
National Research Foundation of Singapore
Keywords
Document/text classification; Feature/term weighting; Feature selection; Supervised learning; Sentiment analysis; Categorization
DOI
10.1007/s10489-015-0745-z
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
The bag-of-words representation of text data is very popular for document classification. Recent literature has shown that properly weighting the term feature vector can improve classification performance significantly beyond that of the original term-frequency-based features. In this paper we demystify the success of recent term-weighting strategies and suggest possibly more reasonable modifications. We then propose novel term-weighting schemes that can be induced from well-known probabilistic document models such as Naive Bayes and the multinomial term model. Interestingly, some of the intuition-based term-weighting schemes coincide exactly with the proposed derivations. Our term-weighting schemes are tested on large-scale text classification problems and datasets, where we demonstrate improved prediction performance over existing approaches.
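The abstract's central idea is to scale each bag-of-words term count by a weight derived from a probabilistic model of the labeled documents. The minimal Python sketch below illustrates that idea. It assumes binary labels (1 = positive, 0 = negative), whitespace tokenization, and a common Naive-Bayes-style log-count ratio with Laplace smoothing (`alpha`). This is an illustrative stand-in, not necessarily the scheme derived in the paper, and all function names are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(doc):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return Counter(doc.lower().split())


def nb_log_ratio_weights(docs, labels, alpha=1.0):
    """Assumed NB-style weight per term: r_t = log((p_t / |p|_1) / (q_t / |q|_1)),
    where p and q are Laplace-smoothed term counts summed over the
    positive (label 1) and negative (label 0) documents."""
    vocab = sorted({t for d in docs for t in bag_of_words(d)})
    pos, neg = Counter(), Counter()
    for d, y in zip(docs, labels):
        (pos if y == 1 else neg).update(bag_of_words(d))
    p = {t: pos[t] + alpha for t in vocab}
    q = {t: neg[t] + alpha for t in vocab}
    p_total, q_total = sum(p.values()), sum(q.values())
    return {t: math.log((p[t] / p_total) / (q[t] / q_total)) for t in vocab}


def weighted_features(doc, weights):
    # Term frequency times learned weight; unseen terms are dropped.
    tf = bag_of_words(doc)
    return {t: c * weights[t] for t, c in tf.items() if t in weights}


# Toy usage example (made-up data, for illustration only).
docs = ["good great movie", "bad awful movie", "great fun", "awful boring"]
labels = [1, 0, 1, 0]
w = nb_log_ratio_weights(docs, labels)
features = weighted_features("great great movie", w)
```

Terms that occur equally often in both classes (here, "movie") get a weight near zero. Class-indicative terms get large positive or negative weights. The weighted vectors would then typically be fed to a linear classifier such as a logistic regression or an SVM.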
Pages: 30-43
Number of pages: 14