Supervised term-category feature weighting for improved text classification

被引:12
|
作者
Attieh, Joseph [1 ]
Tekli, Joe [1 ,2 ]
机构
[1] Lebanese Amer Univ LAU, Elect & Comp Engn Dept, Byblos 36, Lebanon
[2] Univ Pay & Pays Adour UPPA, LIUPPA Lab, SPIDER Res Team, F-64600 Anglet, Aquitaine, France
关键词
Text classification; Document and text processing; Feature Engineering; Supervised term weighting; Inverse Category Frequency; TF-IDF; Text representation; SCHEMES; MODEL;
D O I
10.1016/j.knosys.2022.110215
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Text classification is a central task in Natural Language Processing (NLP) that aims at categorizing text documents into predefined classes or categories. It requires appropriate features to describe the contents and meaning of text documents, and map them with their target categories. Existing text feature representations rely on a weighted representation of the document terms. Hence, choosing a suitable method for term weighting is of major importance and can help increase the effectiveness of the classification task. In this study, we provide a novel text classification framework for Category -based Feature Engineering titled CFE. It consists of a supervised weighting scheme defined based on a variant of the TF-ICF (Term Frequency-Inverse Category Frequency) model, embedded into three new lean classification approaches: (i) IterativeAdditive (flat), (ii) GradientDescentANN (1-layered), and (iii) FeedForwardANN (2-layered). The IterativeAdditive approach augments each document representation with a set of synthetic features inferred from TF-ICF category representations. It builds a term-category TF-ICF matrix using an iterative and additive algorithm that produces category vector representations and updates until reaching convergence. GradientDescentANN replaces the iterative additive process mentioned previously by computing the term-category matrix using a gradient descent ANN model. Training the ANN using the gradient descent algorithm allows updating the term-category matrix until reaching convergence. FeedForwardANN uses a feed-forward ANN model to transform document representations into the category vector space. The transformed document vectors are then compared with the target category vectors, and are associated with the most similar categories. We have implemented CFE including its three classification approaches, and we have conducted a large battery of tests to evaluate their performance. Experimental results on five benchmark datasets show that our lean approaches mostly improve text classification accuracy while requiring significantly less computation time compared with their deep model alternatives.(c) 2022 Elsevier B.V. All rights reserved.
引用
收藏
页数:17
相关论文
共 50 条
  • [31] Effective Text Classification by a Supervised Feature Selection Approach
    Basu, Tanmay
    Murthy, C. A.
    12TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2012), 2012, : 918 - 925
  • [32] Similarity Based Feature Weighting for Inter Domain Classification of Text
    Brindha, G. R.
    Santhi, B.
    JOURNAL OF MECHANICS OF CONTINUA AND MATHEMATICAL SCIENCES, 2018, 13 (04): : 155 - 172
  • [33] DEEP FEATURE WEIGHTING IN NAIVE BAYES FOR CHINESE TEXT CLASSIFICATION
    Jiang, Qiaowei
    Wang, Wen
    Han, Xu
    Zhang, Shasha
    Wang, Xinyan
    Wang, Cong
    PROCEEDINGS OF 2016 4TH IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENCE SYSTEMS (IEEE CCIS 2016), 2016, : 160 - 164
  • [34] An Optimal Weighting Method in Supervised Learning of Linguistic Model for Text Classification
    Mikawa, Kenta
    Ishida, Takashi
    Goto, Masayuki
    INDUSTRIAL ENGINEERING AND MANAGEMENT SYSTEMS, 2012, 11 (01): : 87 - 93
  • [35] A Term Weighting Scheme Approach for Vietnamese Text Classification
    Vu Thanh Nguyen
    Nguyen Tri Hai
    Nguyen Hoang Nghia
    Tuan Dinh Le
    FUTURE DATA AND SECURITY ENGINEERING, FDSE 2015, 2015, 9446 : 46 - 53
  • [36] A Comparative Study on Term Weighting Schemes for Text Classification
    Mazyad, Ahmad
    Teytaud, Fabien
    Fonlupt, Cyril
    MACHINE LEARNING, OPTIMIZATION, AND BIG DATA, MOD 2017, 2018, 10710 : 100 - 108
  • [37] A Study on Text Classification: Term Weighting Algorithm Analysis
    Tseng, Kuan-Hua
    Lin, Chun-Hung Richard
    Liu, Jain-Shing
    Huang, Chih-Ming Andrew
    Wang, Yue-Han
    JOURNAL OF INTERNET TECHNOLOGY, 2021, 22 (02): : 311 - 325
  • [38] Text classification using scores based k-NN approach and term to category relevance weighting scheme
    Ben Afia, Ahmed
    Amiri, Hamid
    INTERNATIONAL JOURNAL OF SIGNAL AND IMAGING SYSTEMS ENGINEERING, 2016, 9 (4-5) : 283 - 290
  • [39] A Novel Term Weighting Scheme for Imbalanced Text Classification
    Tantisripreecha, Tanapon
    Soonthornphisaj, Nuanwan
    Informatica (Slovenia), 2022, 46 (02): : 259 - 268
  • [40] A Novel Term Weighting Scheme for Imbalanced Text Classification
    Tantisripreecha, Tanapon
    Soonthornphisaj, Nuanwan
    INFORMATICA-AN INTERNATIONAL JOURNAL OF COMPUTING AND INFORMATICS, 2022, 46 (02): : 259 - 268