Combining supervised term-weighting metrics for SVM text classification with extended term representation

Cited by: 65
Authors
Haddoud, Mounia [1, 2]
Mokhtari, Aicha [1]
Lecroq, Thierry [2]
Abdeddaim, Said [2]
Affiliations
[1] USTHB, RIIMA, BP 32, Algiers 16111, Algeria
[2] Univ Rouen, LITIS, F-76821 Mont St Aignan, France
Keywords
Text classification; Term weighting; Text representation; Support vector machines; Classifier combination; SCHEME
DOI
10.1007/s10115-016-0924-1
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The accuracy of a text classification method based on an SVM learner depends on the weighting metric used to assign a weight to each term. Weighting metrics can be classified as supervised or unsupervised according to whether they use prior information on the number of documents belonging to each category. A supervised metric should be highly informative about the relation of a document term to a category, and discriminative in separating the positive documents from the negative documents for this category. In this paper, we propose 80 metrics never before used for the term-weighting problem and compare them to 16 functions from the literature. A large number of these metrics were initially proposed for other data mining problems: feature selection, classification rules and term collocations. While many previous works have shown the merits of using a particular metric, our experience suggests that the results obtained by such metrics can be highly dependent on the label distribution of the corpus and on the performance measure used (microaveraged or macroaveraged F1-score). The solution that we propose consists of combining the metrics in order to improve the classification. More precisely, we show that an SVM classifier that combines the outputs of SVM classifiers using different metrics performs well in all situations. The second main contribution of this paper is an extended term representation for the vector space model that significantly improves the prediction of the text classifier.
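The distinction the abstract draws between supervised and unsupervised weighting can be illustrated with a minimal sketch. It assumes a toy tokenized corpus and uses three well-known functions from the term-weighting literature (idf, relevance frequency, chi-square); the abstract does not say which specific metrics the paper evaluates, so these are illustrative choices, and all function names here are hypothetical. The SVM combination step is not shown.

```python
import math
from collections import Counter


def contingency(docs, labels, term, category):
    """Count documents by (contains term?, belongs to category?)."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        positive = label == category
        if has_term and positive:
            a += 1  # positive documents containing the term
        elif positive:
            b += 1  # positive documents without the term
        elif has_term:
            c += 1  # negative documents containing the term
        else:
            d += 1  # negative documents without the term
    return a, b, c, d


def idf(a, b, c, d):
    """Unsupervised: uses only the document frequency a + c."""
    n = a + b + c + d
    return math.log2(n / (a + c)) if a + c else 0.0


def rf(a, b, c, d):
    """Supervised: relevance frequency, log2(2 + a / max(1, c))."""
    return math.log2(2 + a / max(1, c))


def chi2(a, b, c, d):
    """Supervised: chi-square statistic of the 2x2 contingency table."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def weight_vector(tokens, docs, labels, category, metric):
    """Weight each term of one document as tf * metric(term, category)."""
    tf = Counter(tokens)
    return {t: f * metric(*contingency(docs, labels, t, category))
            for t, f in tf.items()}


# Toy corpus (an assumption made for illustration only).
docs = [["goal", "match", "team"], ["team", "goal"],
        ["vote", "party"], ["party", "team", "vote"]]
labels = ["sport", "sport", "politics", "politics"]

weights = weight_vector(["goal", "goal", "team"], docs, labels, "sport", rf)
```

In a combination scheme like the one the abstract describes, each metric would yield its own document vectors and its own SVM, with a further classifier trained on their outputs; that part depends on an SVM library and is left out of this sketch.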
Pages: 909-931
Number of pages: 23