Supervised term-category feature weighting for improved text classification

Cited by: 12
Authors
Attieh, Joseph [1]
Tekli, Joe [1,2]
Affiliations
[1] Lebanese American University (LAU), Electrical & Computer Engineering Department, Byblos 36, Lebanon
[2] Université de Pau et des Pays de l'Adour (UPPA), LIUPPA Lab, SPIDER Research Team, F-64600 Anglet, Aquitaine, France
Keywords
Text classification; Document and text processing; Feature Engineering; Supervised term weighting; Inverse Category Frequency; TF-IDF; Text representation; SCHEMES; MODEL;
DOI
10.1016/j.knosys.2022.110215
CLC number (Chinese Library Classification)
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text classification is a central task in Natural Language Processing (NLP) that aims at categorizing text documents into predefined classes or categories. It requires appropriate features to describe the contents and meaning of text documents and to map them to their target categories. Existing text feature representations rely on a weighted representation of the document terms. Hence, choosing a suitable term weighting method is of major importance and can help increase the effectiveness of the classification task. In this study, we provide a novel text classification framework for Category-based Feature Engineering titled CFE. It consists of a supervised weighting scheme based on a variant of the TF-ICF (Term Frequency-Inverse Category Frequency) model, embedded into three new lean classification approaches: (i) IterativeAdditive (flat), (ii) GradientDescentANN (1-layered), and (iii) FeedForwardANN (2-layered). The IterativeAdditive approach augments each document representation with a set of synthetic features inferred from TF-ICF category representations. It builds a term-category TF-ICF matrix using an iterative and additive algorithm that produces category vector representations and updates them until convergence. GradientDescentANN replaces this iterative additive process by computing the term-category matrix with a gradient descent ANN model; training the ANN with the gradient descent algorithm updates the term-category matrix until convergence. FeedForwardANN uses a feed-forward ANN model to transform document representations into the category vector space. The transformed document vectors are then compared with the target category vectors and associated with the most similar categories. We have implemented CFE, including its three classification approaches, and conducted a large battery of tests to evaluate their performance.
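The abstract does not spell out which TF-ICF variant CFE uses. As a rough illustration only, the sketch below assumes the textbook formulation: each term's frequency within a category is weighted by icf(t) = log(|C| / cf(t)), where |C| is the number of categories and cf(t) is the number of categories whose training documents contain t. It then scores a document by cosine similarity against each category vector and picks the most similar category, loosely mirroring the comparison step described for FeedForwardANN. Treating the per-category similarities as the "synthetic features" that IterativeAdditive appends is our assumption, not something the abstract states. The function names (`tf_icf_matrix`, `doc_vector`, `cosine`, `category_features`, `predict`) and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tf_icf_matrix(docs, labels):
    """Return (category_vectors, icf) from tokenized, labeled training documents.

    Assumption: textbook ICF, icf(t) = log(|C| / cf(t)); CFE's own variant may differ.
    """
    tf = defaultdict(Counter)  # category -> summed term counts of its documents
    for tokens, label in zip(docs, labels):
        tf[label].update(tokens)
    n_categories = len(tf)
    cf = Counter()  # term -> number of categories whose documents contain it
    for counts in tf.values():
        cf.update(counts.keys())
    icf = {t: math.log(n_categories / c) for t, c in cf.items()}
    category_vectors = {
        cat: {t: n * icf[t] for t, n in counts.items()} for cat, counts in tf.items()
    }
    return category_vectors, icf


def doc_vector(tokens, icf):
    """Weight a document's raw term counts by ICF; unseen terms get weight 0."""
    counts = Counter(tokens)
    return {t: n * icf.get(t, 0.0) for t, n in counts.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def category_features(tokens, category_vectors, icf):
    """Hypothetical 'synthetic features': similarity of the document to each category."""
    d = doc_vector(tokens, icf)
    return {cat: cosine(d, vec) for cat, vec in category_vectors.items()}


def predict(tokens, category_vectors, icf):
    """Assign the document to its most similar category vector."""
    features = category_features(tokens, category_vectors, icf)
    return max(features, key=features.get)


# Hypothetical toy corpus: "the" occurs in every category, so its ICF is 0.
docs = [
    ["the", "goal", "match", "team"],
    ["the", "team", "score", "league"],
    ["the", "stock", "market", "price"],
    ["the", "price", "trade", "market"],
]
labels = ["sports", "sports", "finance", "finance"]

category_vectors, icf = tf_icf_matrix(docs, labels)
print(predict(["match", "score"], category_vectors, icf))  # sports
print(predict(["market", "trade"], category_vectors, icf))  # finance
```

The zero weight for "the" shows why ICF is useful here: a term spread across every category carries no evidence for any one of them.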
Experimental results on five benchmark datasets show that our lean approaches mostly improve text classification accuracy while requiring significantly less computation time than their deep model alternatives.
(c) 2022 Elsevier B.V. All rights reserved.
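The IterativeAdditive approach is described as updating a term-category matrix with additive steps until convergence, but the abstract does not give the update rule. To show only what such a loop with a convergence test can look like, the sketch below substitutes a generic technique, the multi-class perceptron: whenever a training document is misclassified, its term counts are added to the true category's row and subtracted from the predicted category's row, and training stops after the first full pass with no errors. This is not the authors' algorithm; `train_additive`, `score`, `predict`, `max_epochs`, and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict


def score(x, row):
    """Dot product of a sparse term-count vector with one category row."""
    return sum(n * row.get(t, 0.0) for t, n in x.items())


def train_additive(docs, labels, max_epochs=20):
    """Multi-class perceptron (a stand-in, not CFE's rule): additive row updates
    repeated until one full pass over the data produces no errors."""
    categories = sorted(set(labels))
    matrix = {c: defaultdict(float) for c in categories}  # category -> term weights
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for tokens, gold in zip(docs, labels):
            x = Counter(tokens)
            pred = max(categories, key=lambda c: score(x, matrix[c]))
            if pred != gold:
                errors += 1
                for t, n in x.items():
                    matrix[gold][t] += n  # pull the true category toward the document
                    matrix[pred][t] -= n  # push the wrong category away from it
        if errors == 0:  # convergence: a clean epoch
            return matrix, epoch
    return matrix, max_epochs  # cap reached; the matrix may not have converged


def predict(tokens, matrix):
    """Pick the category whose row scores highest (ties go to the first sorted name)."""
    x = Counter(tokens)
    return max(sorted(matrix), key=lambda c: score(x, matrix[c]))


# Hypothetical toy corpus.
docs = [
    ["the", "goal", "match", "team"],
    ["the", "team", "score", "league"],
    ["the", "stock", "market", "price"],
    ["the", "price", "trade", "market"],
]
labels = ["sports", "sports", "finance", "finance"]

matrix, epochs = train_additive(docs, labels)
print(epochs)  # 2
print(predict(["goal", "team"], matrix))  # sports
```

A perceptron is only guaranteed to converge on linearly separable data, which is why `max_epochs` caps the loop; the abstract gives no indication of how CFE handles the non-convergent case.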
Pages: 17