Two-stage Feature Selection Method for Text Classification

Cited by: 1
Authors
Li Xi [1 ]
Dai Hang [1 ]
Wang Mingwen [2 ]
Affiliations
[1] Jiangxi Sci & Technol Normal Univ, Sch Math & Comp Sci, Nanchang, Peoples R China
[2] Jiangxi Normal Univ, Sch Comp & Informat Engn, Nanchang, Peoples R China
Source
MINES 2009: FIRST INTERNATIONAL CONFERENCE ON MULTIMEDIA INFORMATION NETWORKING AND SECURITY, VOL 1, PROCEEDINGS | 2009
Keywords
Text Classification; Feature Selection; TF-IDCFC; RLS; LARS; RLS-MARS; CATEGORIZATION;
DOI
10.1109/MINES.2009.127
Chinese Library Classification
TP3 [Computing Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
Dimension reduction is the process of reducing the number of random features under consideration, and it can be divided into feature selection and feature extraction. This paper proposes a two-stage feature selection method based on the Regularized Least Squares-Multi Angle Regression and Shrinkage (RLS-MARS) model. In the first stage, a new weighting method, Term Frequency Inverse Document and Category Frequency Collection normalization (TF-IDCFC), is applied to measure the features and to select the important ones by using category information as a factor. In the second stage, the RLS-MARS model is used to select the relevant information; here, Regularized Least Squares (RLS) combined with Least Angle Regression and Shrinkage (LARS) can be viewed as an efficient approach. Experiments on the Fudan University Chinese Text Classification Corpus and on 20 Newsgroups demonstrate the effectiveness of the new feature selection method for text classification with two classical algorithms, KNN and SVMLight.
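The two-stage idea in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the record does not give the TF-IDCFC formula, so stage 1 uses a hypothetical category-aware TF-IDF-style score as a stand-in, and stage 2 uses plain forward stagewise regression (a simple relative of LARS) rather than the RLS-MARS model. All function names and the toy corpus are invented for the example.

```python
import math
from collections import Counter, defaultdict


def stage1_scores(docs, labels):
    """Score each term with a category-aware TF-IDF-style weight.

    Assumption: this is a stand-in for TF-IDCFC, whose exact formula
    is not given in the abstract.
    """
    n_docs, n_cats = len(docs), len(set(labels))
    tf, df = Counter(), Counter()
    cats_with_term = defaultdict(set)
    for tokens, label in zip(docs, labels):
        tf.update(tokens)
        for term in set(tokens):
            df[term] += 1
            cats_with_term[term].add(label)
    # Terms that are rare across documents AND across categories score higher.
    return {
        term: tf[term]
        * math.log(1 + n_docs / df[term])
        * math.log(1 + n_cats / len(cats_with_term[term]))
        for term in tf
    }


def top_k(scores, k):
    """Keep the k best-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def stage2_forward_stagewise(X, y, steps=200, eps=0.125):
    """Select columns of X by forward stagewise regression on centered data.

    Assumption: used here instead of RLS-MARS / LARS for brevity.
    Returns the indices of columns with a non-zero coefficient.
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    residual = [v - y_mean for v in y]
    coef = [0.0] * p
    for _ in range(steps):
        # Correlation of every column with the current residual.
        corr = [sum(Xc[i][j] * residual[i] for i in range(n)) for j in range(p)]
        best = max(range(p), key=lambda j: abs(corr[j]))
        if abs(corr[best]) < 1e-9:
            break  # nothing left to explain
        delta = eps if corr[best] > 0 else -eps
        coef[best] += delta
        for i in range(n):
            residual[i] -= delta * Xc[i][best]
    return [j for j in range(p) if abs(coef[j]) > 1e-9]


# Toy corpus (invented): tokenized documents with category labels.
docs = [
    ["goal", "team", "match", "the"],
    ["team", "score", "the"],
    ["stock", "market", "the"],
    ["market", "price", "stock", "the"],
]
labels = ["sport", "sport", "finance", "finance"]

# Stage 1: rank terms and keep a candidate set.
candidates = top_k(stage1_scores(docs, labels), k=4)

# Stage 2: regress a +1/-1 category target on candidate term counts.
X = [[tokens.count(term) for term in candidates] for tokens in docs]
y = [1.0 if label == "sport" else -1.0 for label in labels]
selected = [candidates[j] for j in stage2_forward_stagewise(X, y)]
print(candidates, selected)
```

In this toy setup, "the" occurs once in every document, so its centered column is all zeros and stage 2 never selects it even though it survives stage 1. Redundant candidates that carry the same information are likewise not all picked, which is the kind of pruning a second, regression-based stage is meant to provide.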
Pages: 234 / +
Page count: 2
Related papers
50 in total
  • [1] Two-Stage Feature Selection for Text Classification
    Ozgur, Levent
    Gungor, Tunga
    INFORMATION SCIENCES AND SYSTEMS 2015, 2016, 363 : 329 - 337
  • [2] On Two-Stage Feature Selection Methods for Text Classification
    Uysal, Alper Kursat
    IEEE ACCESS, 2018, 6 : 43233 - 43251
  • [3] A Two-stage Text Feature Selection Algorithm for Improving Text Classification
    Ashokkumar, P.
    Shankar, Siva G.
    Srivastava, Gautam
    Maddikunta, Praveen Kumar Reddy
    Gadekallu, Thippa Reddy
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2021, 20 (03)
  • [4] A two-stage feature selection method for text categorization
    Meng, Jiana
    Lin, Hongfei
    Yu, Yuhai
    COMPUTERS & MATHEMATICS WITH APPLICATIONS, 2011, 62 (07) : 2793 - 2800
  • [5] Revisiting two-stage feature selection based on coverage policies for text classification
    Mendez-Molina, Arquimides
    Li Ona-Garcia, Ana
    Ariel Carrasco-Ochoa, Jesus
    Martinez-Trinidad, Jose Fco.
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2018, 34 (05) : 2949 - 2957
  • [6] A two-stage Markov blanket based feature selection algorithm for text classification
    Javed, Kashif
    Maruf, Sameen
    Babri, Haroon A.
    NEUROCOMPUTING, 2015, 157 : 91 - 104
  • [7] Improving Farsi Multiclass Text Classification Using a Thesaurus and Two-Stage Feature Selection
    Maghsoodi, Nooshin
    Homayounpour, Mohammad Mehdi
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2011, 62 (10) : 2055 - 2066
  • [8] Adaptive Two-Stage Feature Selection for Sentiment Classification
    Chi, Xu
    Cambria, Erik
    Siew, Tan Puay
    2017 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2017, : 1238 - 1243
  • [9] A two-stage feature selection method with its application
    Zhao, Xuehua
    Li, Daoliang
    Yang, Bo
    Chen, Huiling
    Yang, Xinbin
    Yu, Chenglong
    Liu, Shuangyin
    COMPUTERS & ELECTRICAL ENGINEERING, 2015, 47 : 114 - 125
  • [10] Two-stage classification with automatic feature selection for an industrial application
    Hader, S
    Hamprecht, FA
    Classification - the Ubiquitous Challenge, 2005, : 137 - 144