Effective Text Classification by a Supervised Feature Selection Approach

Cited by: 34
Authors
Basu, Tanmay [1]
Murthy, C. A. [1]
Affiliation
[1] Indian Stat Inst, Machine Intelligence Unit, Kolkata 700108, India
Source
12TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2012) | 2012
Keywords
Feature Selection; Text Classification
DOI
10.1109/ICDMW.2012.45
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The high dimensionality of data is a great challenge for effective text classification. Each document in a corpus contains much irrelevant and noisy information, which eventually reduces the efficiency of text classification. Automatic feature selection methods are therefore extremely important for handling the high dimensionality of the data. Feature selection in text classification focuses on identifying relevant information without affecting the accuracy of the classifier. Several feature selection methods have been proposed to improve classification accuracy by reducing the original feature space. To improve the performance of text classification, a new supervised feature selection approach is proposed that defines a similarity between a term and a class. Each term receives a score based on its similarity with all the classes, and the terms are then ranked by that score. Experimental results are presented on several TREC and Reuters data sets using the kNN classifier. The performance of the classifiers is compared using precision, recall, F-measure and classification accuracy. The proposed term selection approach is compared with document frequency thresholding, information gain, mutual information and the χ² statistic. The empirical studies show that the proposed method performs significantly better than the other methods.
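The abstract does not give the proposed term-class similarity, so it cannot be reproduced here. As a rough illustration of the general "score every term against every class, then rank" pipeline, the sketch below uses one of the baselines the abstract names, the χ² statistic. The function names `chi_square` and `rank_terms`, the max-over-classes aggregation, and the toy corpus are all assumptions made for illustration, not the authors' method or data.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def rank_terms(docs, labels):
    """Rank terms by their maximum chi-square score over all classes.

    Taking the maximum over classes is one common aggregation choice
    (an assumption here); averaging over classes is another.
    """
    doc_sets = [set(doc.split()) for doc in docs]
    classes = sorted(set(labels))
    vocab = sorted(set().union(*doc_sets))
    class_size = Counter(labels)
    n_docs = len(docs)

    scores = {}
    for term in vocab:
        doc_freq = sum(term in s for s in doc_sets)
        best = 0.0
        for cls in classes:
            a = sum(term in s for s, y in zip(doc_sets, labels) if y == cls)
            b = doc_freq - a
            c = class_size[cls] - a
            d = n_docs - class_size[cls] - b
            best = max(best, chi_square(a, b, c, d))
        scores[term] = best

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))
    return ranked, scores


# Hypothetical toy corpus, not taken from TREC or Reuters.
docs = ["oil price rises", "oil export falls",
        "wheat crop grows", "wheat price falls"]
labels = ["crude", "crude", "grain", "grain"]
ranked, scores = rank_terms(docs, labels)
print(ranked[:2])
```

On this toy corpus the class-specific terms "oil" and "wheat" rank first, while "price" and "falls", which occur equally in both classes, score zero. A feature selector would keep the top-ranked terms and discard the rest before training a classifier such as kNN.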
Pages: 918 - 925
Number of pages: 8
Related papers
50 in total
  • [1] Effective feature selection technique for text classification
    Seetha, Hari
    Murty, M. Narasimha
    Saravanan, R.
    INTERNATIONAL JOURNAL OF DATA MINING MODELLING AND MANAGEMENT, 2015, 7 (03) : 165 - 184
  • [2] Supervised Hebb rule based feature selection for text classification
    Heyong, Wang
    Ming, Hong
    INFORMATION PROCESSING & MANAGEMENT, 2019, 56 (01) : 167 - 191
  • [3] A new approach to feature selection in text classification
    Wang, Y
    Wang, XJ
    PROCEEDINGS OF 2005 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-9, 2005, : 3814 - 3819
  • [4] Feature Selection in Text Classification
    Sahin, Durmus Ozkan
    Ates, Nurullah
    Kilic, Erdal
    2016 24TH SIGNAL PROCESSING AND COMMUNICATION APPLICATION CONFERENCE (SIU), 2016, : 1777 - 1780
  • [5] RLS-MARS - an Effective Feature Selection Tool for Text Classification
    Li Xi
    Dai Hang
    Wang Mingwen
    2012 FOURTH INTERNATIONAL CONFERENCE ON MULTIMEDIA INFORMATION NETWORKING AND SECURITY (MINES 2012), 2012, : 254 - 257
  • [6] A modified multi objective heuristic for effective feature selection in text classification
    Thiyagarajan, D.
    Shanthi, N.
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2019, 22 : 10625 - 10635
  • [7] Hybrid ACO and TOFA Feature Selection Approach for Text Classification
    Alghamdi, Hanan S.
    Tang, H. Lilian
    Alshomrani, Saleh
    2012 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2012,
  • [8] A New Big Data Feature Selection Approach for Text Classification
    Amazal, Houda
    Kissi, Mohamed
    SCIENTIFIC PROGRAMMING, 2021, 2021
  • [9] A Comprehensive Empirical Comparison of Modern Supervised Classification and Feature Selection Methods for Text Categorization
    Aphinyanaphongs, Yindalon
    Fu, Lawrence D.
    Li, Zhiguo
    Peskin, Eric R.
    Efstathiadis, Efstratios
    Aliferis, Constantin F.
    Statnikov, Alexander
    JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 2014, 65 (10) : 1964 - 1987
  • [10] Dynamic feature selection in text classification
    Doan, Son
    Horiguchi, Susumu
    INTELLIGENT CONTROL AND AUTOMATION, 2006, 344 : 664 - 675