Model-induced term-weighting schemes for text classification

Cited by: 14
Authors
Kim, Hyun Kyung [1]
Kim, Minyoung [1]
Affiliations
[1] Seoul Natl Univ Sci & Technol, Dept Elect & IT Media Engn, Seoul 139743, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Document/text classification; Feature/term weighting; Feature selection; Supervised learning; Sentiment analysis; Categorization
DOI
10.1007/s10489-015-0745-z
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The bag-of-words representation of text data is very popular for document classification. In the recent literature, it has been shown that properly weighting the term feature vector can improve the classification performance significantly beyond the original term-frequency based features. In this paper we demystify the success of the recent term-weighting strategies as well as provide possibly more reasonable modifications. We then propose novel term-weighting schemes that can be induced from the well-known document probabilistic models such as the Naive Bayes and the multinomial term model. Interestingly, some of the intuition-based term-weighting schemes coincide exactly with the proposed derivations. Our term-weighting schemes are tested on large-scale text classification problems/datasets where we demonstrate improved prediction performance over existing approaches.
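As a rough illustration of what a term weight derived from a Naive Bayes style model can look like, the sketch below computes the well-known smoothed log-count ratio for a binary task and uses it to rescale term frequencies. This is an assumption made for illustration only and is not necessarily one of the schemes proposed in the paper; the function names, the smoothing constant `alpha`, and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def nb_log_count_ratio(docs, labels, alpha=1.0):
    """Return {term: weight} for a binary task (labels are 0 or 1).

    The weight of a term is the log of the ratio between its smoothed
    relative frequency in class 1 and in class 0 (illustrative only).
    """
    vocab = sorted({t for tokens in docs for t in tokens})
    pos, neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        (pos if y == 1 else neg).update(tokens)
    # Smoothed totals so that each class defines a proper distribution.
    pos_total = sum(pos[t] + alpha for t in vocab)
    neg_total = sum(neg[t] + alpha for t in vocab)
    return {
        t: math.log(((pos[t] + alpha) / pos_total) / ((neg[t] + alpha) / neg_total))
        for t in vocab
    }


def weight_document(tokens, weights):
    """Scale raw term frequencies by the per-term weights."""
    tf = Counter(tokens)
    return {t: tf[t] * weights.get(t, 0.0) for t in tf}


# Hypothetical toy corpus: two positive and two negative documents.
docs = [["good", "movie"], ["good", "plot"], ["bad", "movie"], ["bad", "plot"]]
labels = [1, 1, 0, 0]
weights = nb_log_count_ratio(docs, labels)
features = weight_document(["good", "good", "movie"], weights)
```

In this toy corpus, terms that occur equally often in both classes ("movie", "plot") receive a weight of zero, while class-specific terms receive positive or negative weights, so the weighted vector emphasizes discriminative terms over raw frequency.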
Pages: 30-43
Number of pages: 14
Related papers
  • [1] Model-induced term-weighting schemes for text classification
    Hyun Kyung Kim
    Minyoung Kim
    Applied Intelligence, 2016, 45 : 30 - 43
  • [2] A generic multi-level framework for building term-weighting schemes in text classification
    Tang, Zhong
    COMPUTER JOURNAL, 2024, 67 (11): : 3042 - 3055
  • [3] Term-weighting learning via genetic programming for text classification
    Escalante, Hugo Jair
    García-Limón, Mauricio A.
    Morales-Reyes, Alicia
    Graff, Mario
    Montes-y-Gómez, Manuel
    Morales, Eduardo F.
    Martínez-Carranza, José
    Knowledge-Based Systems, 2015, 83 : 176 - 189
  • [5] A survey of term weighting schemes for text classification
    Alsaeedi, Abdullah
    INTERNATIONAL JOURNAL OF DATA MINING MODELLING AND MANAGEMENT, 2020, 12 (02) : 237 - 254
  • [7] Combining supervised term-weighting metrics for SVM text classification with extended term representation
    Haddoud, Mounia
    Mokhtari, Aicha
    Lecroq, Thierry
    Abdeddaim, Said
    KNOWLEDGE AND INFORMATION SYSTEMS, 2016, 49 (03) : 909 - 931
  • [8] TERM-WEIGHTING APPROACHES IN AUTOMATIC TEXT RETRIEVAL
    SALTON, G
    BUCKLEY, C
    INFORMATION PROCESSING & MANAGEMENT, 1988, 24 (05) : 513 - 523
  • [9] A Novel Term-weighting Approach in Text Classification over Skewed Data Sets
    Sun, Tieli
    Zhang, Yujie
    Yang, Fengqin
    Yang, Xiquan
    Jiang, Yingjie
    Wang, Zibing
    Li, Kuiwu
    INFORMATION-AN INTERNATIONAL INTERDISCIPLINARY JOURNAL, 2010, 13 (03): : 621 - 633
  • [10] A Comparative Study on Term Weighting Schemes for Text Classification
    Mazyad, Ahmad
    Teytaud, Fabien
    Fonlupt, Cyril
    MACHINE LEARNING, OPTIMIZATION, AND BIG DATA, MOD 2017, 2018, 10710 : 100 - 108