Adaptive dimension reduction for clustering high dimensional data

被引:105
作者
Ding, C [1 ]
He, XF [1 ]
Zha, HY [1 ]
Simon, HD [1 ]
机构
[1] Univ Calif Berkeley, Lawrence Berkeley Lab, NERSC Div, Berkeley, CA 94720 USA
来源
2002 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS | 2002年
关键词
D O I
10.1109/ICDM.2002.1183897
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
It is well-known that for high dimensional data clustering, standard algorithms such as EM and the K-means are often trapped in local minimum. Many initialization methods were proposed to tackle this problem, but with only limited success. In this paper we propose a new approach to resolve this problem by repeated dimension reductions such that K-means or EM are performed only in very low dimensions. Cluster membership is utilized as a bridge between the reduced dimensional sub-space and the original space, providing flexibility and ease of implementation. Clustering analysis performed on highly overlapped Gaussians, DNA gene expression profiles and internet newsgroups demonstrate the effectiveness of the proposed algorithm.
引用
收藏
页码:147 / 154
页数:8
相关论文
共 30 条
[1]  
Aggarwal CC, 1999, SIGMOD RECORD, VOL 28, NO 2 - JUNE 1999, P61, DOI 10.1145/304181.304188
[2]  
Agrawal R., 1998, AUTOMATIC SUBSPACE C, P94, DOI DOI 10.1145/276304.276314
[3]   Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling [J].
Alizadeh, AA ;
Eisen, MB ;
Davis, RE ;
Ma, C ;
Lossos, IS ;
Rosenwald, A ;
Boldrick, JG ;
Sabet, H ;
Tran, T ;
Yu, X ;
Powell, JI ;
Yang, LM ;
Marti, GE ;
Moore, T ;
Hudson, J ;
Lu, LS ;
Lewis, DB ;
Tibshirani, R ;
Sherlock, G ;
Chan, WC ;
Greiner, TC ;
Weisenburger, DD ;
Armitage, JO ;
Warnke, R ;
Levy, R ;
Wilson, W ;
Grever, MR ;
Byrd, JC ;
Botstein, D ;
Brown, PO ;
Staudt, LM .
NATURE, 2000, 403 (6769) :503-511
[4]  
[Anonymous], P 14 C UNC ART INT
[5]   Using linear algebra for intelligent information retrieval [J].
Berry, MW ;
Dumais, ST ;
OBrien, GW .
SIAM REVIEW, 1995, 37 (04) :573-595
[6]  
Bradley P. S., 1998, Proceedings Fourth International Conference on Knowledge Discovery and Data Mining, P9
[7]  
Dasgupta S., 2000, P 16 C UNC ART INT U
[8]  
DEERWESTER S, 1990, J AM SOC INFORM SCI, V41, P391, DOI 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO
[9]  
2-9
[10]   MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP ;
LAIRD, NM ;
RUBIN, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38