A Clustering Based Feature Selection Method Using Feature Information Distance for Text Data

Times cited: 1
Authors
Chao, Shilong [1]
Cai, Jie [1]
Yang, Sheng [1]
Wang, Shulin [1]
Affiliation
[1] Hunan University, College of Computer Science and Electronic Engineering, Changsha, Hunan, People's Republic of China
Source
Intelligent Computing Theories and Application, ICIC 2016, Part I | 2016 / Vol. 9771
Keywords
Text classification; Feature selection; Cluster; Diversity; Efficient feature selection; Mutual information; Algorithm; Relevance
DOI
10.1007/978-3-319-42291-6_12
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection is a key step in text classification. This paper proposes a new feature selection method based on feature clustering with an information distance measure, which is used to build a feature cluster space. First, the K-medoids clustering algorithm groups the features into k clusters. Second, from each cluster, the feature with the largest mutual information with the class is selected, and these features form a candidate subset. Finally, the mRMR algorithm selects the target number of features from this subset. The method fully accounts for the diversity between features and, unlike the incremental search algorithm mRMR applied on its own, avoids falling prematurely into a local optimum. Experimental results show that the features selected by the proposed algorithm yield better classification accuracy.
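For reference, the quantities named in the abstract can be written out as below. The record does not give the paper's exact information distance, so the variation of information used here (raw form d and normalized form D) is an assumption; the mRMR criterion is shown in its common difference (MID) form. C is the class variable, G_j is the j-th feature cluster, R is the set of cluster representatives, and S is the set of features already selected.

```latex
\begin{aligned}
I(X;Y) &= H(X) + H(Y) - H(X,Y) \\
d(X,Y) &= H(X,Y) - I(X;Y) = H(X \mid Y) + H(Y \mid X) \\
D(X,Y) &= \frac{d(X,Y)}{H(X,Y)} = 1 - \frac{I(X;Y)}{H(X,Y)} \in [0, 1] \\
r_j &= \arg\max_{f \in G_j} I(f;C), \qquad R = \{ r_1, \dots, r_k \} \\
f^{*} &= \arg\max_{f \in R \setminus S} \Big[ I(f;C) - \frac{1}{|S|} \sum_{s \in S} I(f;s) \Big]
\end{aligned}
```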
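The three steps in the abstract can be sketched end to end in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: features are assumed to be discrete columns (for example, binary term occurrence per document), the information distance is assumed to be the normalized variation of information 1 - I(X;Y)/H(X,Y), K-medoids uses the simple alternating (Voronoi-iteration) update rather than full PAM, and mRMR uses the difference (MID) form. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(xs):
    """Shannon entropy in bits of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def mutual_info(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two aligned discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def info_distance(xs, ys):
    """ASSUMED distance: normalized variation of information, 1 - I(X;Y)/H(X,Y)."""
    h_joint = entropy(list(zip(xs, ys)))
    if h_joint == 0:  # both features constant: treat them as identical
        return 0.0
    return 1.0 - mutual_info(xs, ys) / h_joint


def k_medoids(dist, k, max_iter=100, seed=0):
    """Alternating K-medoids on a precomputed distance matrix; returns clusters of indices."""
    n = len(dist)
    medoids = sorted(random.Random(seed).sample(range(n), k))
    for _ in range(max_iter):
        clusters = {m: [] for m in medoids}
        for i in range(n):
            # Each medoid stays in its own cluster; other points join the nearest medoid.
            nearest = i if i in clusters else min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # New medoid: the member with the smallest total distance to its cluster.
        new_medoids = sorted(
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return list(clusters.values())


def cluster_representatives(features, labels, clusters):
    """From each cluster, keep the feature with the largest mutual information with the class."""
    return [max(members, key=lambda f: mutual_info(features[f], labels)) for members in clusters]


def mrmr(features, labels, candidates, m):
    """Greedy mRMR (MID form): maximize relevance minus mean redundancy with selected features."""
    relevance = {f: mutual_info(features[f], labels) for f in candidates}
    selected = [max(candidates, key=lambda f: relevance[f])]
    while len(selected) < min(m, len(candidates)):
        remaining = [f for f in candidates if f not in selected]
        selected.append(max(
            remaining,
            key=lambda f: relevance[f]
            - sum(mutual_info(features[f], features[s]) for s in selected) / len(selected),
        ))
    return selected


def cluster_mrmr_select(features, labels, k, m, seed=0):
    """Cluster features by information distance, take one representative per cluster, run mRMR."""
    n = len(features)
    dist = [[info_distance(features[i], features[j]) for j in range(n)] for i in range(n)]
    clusters = k_medoids(dist, k, seed=seed)
    reps = cluster_representatives(features, labels, clusters)
    return mrmr(features, labels, reps, m)


if __name__ == "__main__":
    # Toy data: 6 documents, binary term-occurrence features, two classes.
    labels = [0, 0, 0, 1, 1, 1]
    features = [
        [1, 1, 1, 0, 0, 0],  # matches the class exactly
        [1, 1, 1, 0, 0, 0],  # duplicate of feature 0
        [1, 1, 0, 0, 0, 0],  # partly informative
        [0, 1, 0, 1, 0, 1],  # unrelated to the class
    ]
    print(cluster_mrmr_select(features, labels, k=2, m=2))
```

Running mRMR only over one representative per cluster is how this sketch reflects the abstract's point about diversity: near-duplicate features (distance close to 0) tend to fall into the same cluster, so at most one of them reaches the greedy incremental search.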
Pages: 122-132
Page count: 11