Supervised term-category feature weighting for improved text classification

Times cited: 12
Authors
Attieh, Joseph [1]
Tekli, Joe [1, 2]
Affiliations
[1] Lebanese American University (LAU), Electrical and Computer Engineering Department, Byblos 36, Lebanon
[2] University of Pau and Pays de l'Adour (UPPA), LIUPPA Lab, SPIDER Research Team, F-64600 Anglet, Aquitaine, France
Keywords
Text classification; Document and text processing; Feature engineering; Supervised term weighting; Inverse category frequency; TF-IDF; Text representation; Schemes; Model
DOI
10.1016/j.knosys.2022.110215
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text classification is a central task in Natural Language Processing (NLP) that aims at categorizing text documents into predefined classes or categories. It requires appropriate features to describe the contents and meaning of text documents and to map them to their target categories. Existing text feature representations rely on a weighted representation of the document terms. Hence, choosing a suitable term weighting method is of major importance and can help increase the effectiveness of the classification task. In this study, we provide a novel text classification framework for Category-based Feature Engineering, titled CFE. It consists of a supervised weighting scheme based on a variant of the TF-ICF (Term Frequency-Inverse Category Frequency) model, embedded into three new lean classification approaches: (i) IterativeAdditive (flat), (ii) GradientDescentANN (1-layered), and (iii) FeedForwardANN (2-layered). The IterativeAdditive approach augments each document representation with a set of synthetic features inferred from TF-ICF category representations. It builds a term-category TF-ICF matrix using an iterative, additive algorithm that produces category vector representations and updates them until convergence. GradientDescentANN replaces this iterative additive process by computing the term-category matrix with an ANN model trained by gradient descent, which updates the matrix until convergence. FeedForwardANN uses a feed-forward ANN model to transform document representations into the category vector space. The transformed document vectors are then compared with the target category vectors and associated with the most similar categories. We have implemented CFE, including its three classification approaches, and conducted an extensive set of experiments to evaluate their performance.
Experimental results on five benchmark datasets show that our lean approaches mostly improve text classification accuracy while requiring significantly less computation time than their deep model alternatives.
© 2022 Elsevier B.V. All rights reserved.
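To make the category-based weighting idea concrete, here is a minimal sketch of a plain TF-ICF scheme. It assumes the common definition icf(t) = log(|C| / cf(t)), where |C| is the number of categories and cf(t) is the number of categories whose training documents contain term t. The abstract says CFE uses a *variant* of TF-ICF whose exact form is not given here, so this is not the authors' formula. The sketch builds category vectors in a single additive pass and assigns a document to the most similar category vector. It does not reproduce the iterative convergence loop, the gradient-descent ANN, or the feed-forward transformation. Cosine similarity is an assumed choice of similarity measure. The function names (`icf_weights`, `tf_icf`, `category_vectors`, `classify`) and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def icf_weights(docs, labels):
    """Assumed standard ICF: icf(t) = log(|C| / cf(t)), with cf(t) = number of
    categories whose training documents contain term t (not the paper's variant)."""
    term_categories = defaultdict(set)
    for tokens, label in zip(docs, labels):
        for term in tokens:
            term_categories[term].add(label)
    n_categories = len(set(labels))
    return {t: math.log(n_categories / len(cats)) for t, cats in term_categories.items()}


def tf_icf(tokens, icf):
    """Raw term frequency times ICF; terms unseen in training get weight 0."""
    return {t: count * icf.get(t, 0.0) for t, count in Counter(tokens).items()}


def category_vectors(docs, labels, icf):
    """Single-pass additive category vectors: the sum of the TF-ICF vectors of each
    category's training documents (no iteration to convergence)."""
    vectors = defaultdict(lambda: defaultdict(float))
    for tokens, label in zip(docs, labels):
        for term, weight in tf_icf(tokens, icf).items():
            vectors[label][term] += weight
    return {label: dict(vec) for label, vec in vectors.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(tokens, cat_vecs, icf):
    """Assign the category whose vector is most similar to the document's TF-ICF vector."""
    doc_vec = tf_icf(tokens, icf)
    return max(cat_vecs, key=lambda c: cosine(doc_vec, cat_vecs[c]))


# Hypothetical toy training data: two categories, pre-tokenized documents.
train_docs = [
    ["the", "goal", "match", "team"],
    ["match", "team", "score"],
    ["the", "stock", "market", "price"],
    ["market", "price", "trade"],
]
train_labels = ["sports", "sports", "finance", "finance"]

icf = icf_weights(train_docs, train_labels)
cat_vecs = category_vectors(train_docs, train_labels, icf)
print(classify(["team", "score", "the"], cat_vecs, icf))  # sports
```

In this sketch, a term that occurs in every category ("the" in the toy data) receives icf = log(1) = 0. It therefore contributes nothing to the category vectors or to the similarity scores. This is the supervised effect that category-level weighting is meant to capture, and it is unavailable to document-level IDF alone.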
Pages: 17