Distance Variance Score: An Efficient Feature Selection Method in Text Classification

被引:16
|
作者
Wang, Heyong [1 ]
Hong, Ming [1 ]
机构
[1] S China Univ Technol, Dept E Business, Guangzhou 510006, Guangdong, Peoples R China
关键词
LAPLACIAN SCORE;
D O I
10.1155/2015/695720
中图分类号
T [工业技术];
学科分类号
08 ;
摘要
With the rapid development of web applications such as social network, a large amount of electric text data is accumulated and available on the Internet, which causes increasing interests in text mining. Text classification is one of the most important subfields of text mining. In fact, text documents are often represented as a high-dimensional sparse document term matrix (DTM) before classification. Feature selection is essential and vital for text classification due to high dimensionality and sparsity of DTM. An efficient feature selection method is capable of both reducing dimensions of DTM and selecting discriminative features for text classification. Laplacian Score (LS) is one of the unsupervised feature selection methods and it has been successfully used in areas such as face recognition. However, LS is unable to select discriminative features for text classification and to effectively reduce the sparsity of DTM. To improve it, this paper proposes an unsupervised feature selection method named Distance Variance Score (DVS). DVS uses feature distance contribution (a ratio) to rank the importance of features for text documents so as to select discriminative features. Experimental results indicate that DVS is able to select discriminative features and reduce the sparsity of DTM. Thus, it is much more efficient than LS.
引用
收藏
页数:10
相关论文
共 50 条
  • [1] Efficient Method for Feature Selection in Text Classification
    Sun, Jian
    Zhang, Xiang
    Liao, Dan
    Chang, Victor
    2017 INTERNATIONAL CONFERENCE ON ENGINEERING AND TECHNOLOGY (ICET), 2017,
  • [2] A Variance-mean Based Feature Selection in Text Classification
    Yin, Shen
    Jiang, Zongli
    PROCEEDINGS OF THE FIRST INTERNATIONAL WORKSHOP ON EDUCATION TECHNOLOGY AND COMPUTER SCIENCE, VOL III, 2009, : 519 - 522
  • [3] A new feature selection method for text classification
    Uchyigit, Gulden
    Clark, Keith
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2007, 21 (02) : 423 - 438
  • [4] Text feature selection method for hierarchical classification
    Zhu, Cui-Ling
    Ma, Jun
    Zhang, Dong-Mei
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2011, 24 (01): : 103 - 110
  • [5] Feature Selection Method of Text Tendency Classification
    Li, Yanling
    Dai, Guanzhong
    Li, Gang
    FIFTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 2, PROCEEDINGS, 2008, : 34 - +
  • [6] An enhanced feature selection method for text classification
    Kang, Jinbeom
    Lee, Eunshil
    Hong, Kwanghee
    Park, Jeahyun
    Kim, Taehwan
    Park, Juyoung
    Choi, Joongmin
    Yang, Jaeyoung
    PROCEEDINGS OF THE SECOND IASTED INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE, 2006, : 36 - 41
  • [7] A New Filter Feature Selection Method for Text Classification
    Cekik, Rasim
    IEEE ACCESS, 2024, 12 : 139316 - 139335
  • [8] A parallel feature selection method study for text classification
    Li, Zhao
    Lu, Wei
    Sun, Zhanquan
    Xing, Weiwei
    NEURAL COMPUTING & APPLICATIONS, 2017, 28 : S513 - S524
  • [9] Statera: A Balanced Feature Selection Method for Text Classification
    Gama Bispo, Braian Varjao
    Rios, Tatiane Nogueira
    2018 7TH BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 2018, : 260 - 265
  • [10] A Hybrid Feature Selection Method For Vietnamese Text Classification
    Nguyen Tri Hai
    Tuan Dinh Le
    Nguyen Hoang Nghia
    Vu Thanh Nguyen
    2015 SEVENTH INTERNATIONAL CONFERENCE ON KNOWLEDGE AND SYSTEMS ENGINEERING (KSE), 2015, : 91 - 96