From cluster ensemble to structure ensemble

Cited by: 31
Authors
Yu, Zhiwen [1, 2]
You, Jane [2 ]
Wong, Hau-San [3 ]
Han, Guoqiang [1 ]
Affiliations
[1] S China Univ Technol, Sch Comp Sci & Engn, Guangzhou, Guangdong, Peoples R China
[2] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Hong Kong, Peoples R China
[3] City Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cluster ensemble; Structure ensemble; Classifier ensembles; Microarray data; Reliability; Stability; Consensus; Cancer
DOI
10.1016/j.ins.2012.02.019
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper investigates the problem of integrating multiple structures, extracted from different sets of data points, into a single unified structure. We first propose a new generalized concept, the structure ensemble, for the fusion of multiple structures. Unlike traditional cluster ensemble approaches, whose main objective is to align the individual labels obtained from different clustering solutions, the structure ensemble approach focuses on how to unify the structures obtained from different data sources. Based on this framework, we design a new structure ensemble approach, the probabilistic bagging based structure ensemble approach (BSEA), which integrates the bagging technique, the force based self-organizing map (FBSOM) and the normalized cut algorithm. BSEA views the structures obtained from the different datasets generated by bagging as nodes in a graph, and adopts graph theory to find the most representative structure. FBSOM, a generalized form of SOM, is proposed to serve as the basic clustering algorithm in the structure ensemble framework. Finally, a new external index called the correlation index (CI), which considers the correlation of both the similarity and the dissimilarity between the predicted solution and the true solution, is proposed to evaluate the performance of BSEA. The experiments show that (i) BSEA outperforms most of the state-of-the-art clustering approaches, and (ii) BSEA performs well on datasets from the UCI repository and on real cancer gene expression profiles. © 2012 Elsevier Inc. All rights reserved.
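The pipeline outlined in the abstract (bagging, a base clustering step, then selection of the most representative structure) can be illustrated with a minimal sketch. It is not the authors' BSEA: plain k-means stands in for FBSOM, and a simple medoid choice (the structure with the smallest total distance to all the others) stands in for the graph-based normalized cut step. All function names, the structure distance, and the toy data are assumptions made for illustration only.

```python
import math
import random


def bootstrap_sample(points, rng):
    """Bagging step: draw len(points) items with replacement."""
    return [rng.choice(points) for _ in range(len(points))]


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd k-means (a stand-in for FBSOM, NOT the paper's method).

    Returns the k centroids, treated here as the extracted 'structure'.
    """
    # Initialise from distinct points so centroids do not start identical.
    centroids = rng.sample(sorted(set(points)), k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Recompute each centroid; keep the old one if its group is empty.
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def structure_distance(a, b):
    """Hypothetical dissimilarity: symmetric mean nearest-centroid distance."""
    d_ab = sum(min(math.dist(x, y) for y in b) for x in a) / len(a)
    d_ba = sum(min(math.dist(x, y) for x in a) for y in b) / len(b)
    return 0.5 * (d_ab + d_ba)


def most_representative(structures):
    """Index of the medoid structure (a stand-in for the normalized cut step)."""
    totals = [
        sum(structure_distance(s, t) for t in structures) for s in structures
    ]
    return min(range(len(structures)), key=totals.__getitem__)


def demo(seed=0, n_bags=10):
    """Run the sketch on two synthetic 2-D blobs (toy data, not from the paper)."""
    rng = random.Random(seed)
    blob_a = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(30)]
    blob_b = [(rng.gauss(5, 0.3), rng.gauss(5, 0.3)) for _ in range(30)]
    data = blob_a + blob_b
    structures = [
        kmeans(bootstrap_sample(data, rng), 2, rng) for _ in range(n_bags)
    ]
    return structures, most_representative(structures)


if __name__ == "__main__":
    structures, best = demo()
    print("representative structure:", structures[best])
```

Each bootstrap sample yields one structure (here, a pair of centroids), and the structures are then compared with each other rather than label by label, which is the distinction the abstract draws between structure ensembles and cluster ensembles.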
Pages: 81-99
Page count: 19