A new feature selection scheme using a data distribution factor for unsupervised nominal data

Cited by: 20
Authors
Chow, Tommy W. S. [1]
Wang, Piyang [1]
Ma, Eden W. M. [1]
Affiliation
[1] City Univ Hong Kong, Dept Elect Engn, Kowloon, Hong Kong, Peoples R China
Source
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS | 2008, Vol. 38, No. 2
Keywords
clustering; feature ranking; unsupervised feature selection
DOI
10.1109/TSMCB.2007.914707
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
An efficient unsupervised feature selection method is proposed that handles nominal data without data transformation. The method introduces a new data distribution factor to select appropriate clusters, and it combines compactness and separation with a newly introduced concept, the singleton item. It considers all features globally and is computationally inexpensive. The method is evaluated on eight datasets from the University of California Irvine (UCI) machine learning repository and on a high-dimensional cDNA dataset. The results show that the proposed method is efficient and delivers reliable results.
Pages: 499-509
Page count: 11