Weighted Cluster Ensemble Based on Partition Relevance Analysis With Reduction Step

Cited by: 6
Authors
Ilc, Nejc [1]
Affiliations
[1] Univ Ljubljana, Fac Comp & Informat Sci, Ljubljana 1000, Slovenia
Source
IEEE Access | 2020 / Volume 8
Keywords
Clustering algorithms; Partitioning algorithms; Feature extraction; Indexes; Gene expression; Robustness; Cluster analysis; cluster validity index; dimensionality reduction; ensemble learning; feature extraction; feature selection; gene expression; weighted ensemble; COMBINING MULTIPLE CLUSTERINGS; DIMENSION ESTIMATION; VALIDITY INDEX; VALIDATION; FRAMEWORK; SELECTION; AGGREGATION
DOI
10.1109/ACCESS.2020.3003046
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Over the last decade, the advent of the cluster ensemble framework has enabled more accurate and robust data analysis than traditional single clustering algorithms. The improved clustering of microarray data has had a particularly strong impact in the fields of genomics and medicine. However, when we bring several ensemble members together to form a consensus, low-quality data partitions can seriously compromise the final solution. One way to overcome this problem is the weighted cluster ensemble approach based on Partition Relevance Analysis (PRA), which uses internal cluster validity indices to evaluate and weight the ensemble members before the fusion. Unfortunately, the selection of appropriate validation indices for given data is far from trivial. In this paper, we propose an additional step in PRA that reduces the size of the committee of cluster validation indices. It does so by eliminating redundant and noisy indices using data dimensionality reduction methods. Our extension works in an unsupervised way, minimizing the amount of user intervention and required expert knowledge. We adapted three conventional consensus functions based on the principle of evidence accumulation to work with PRA weights. We demonstrate the advantages of the proposed reduction step of PRA based on extensive experiments with 25 gene expression and 15 non-genetic real-world datasets, where we compared 15 consensus functions. The source code is available at https://github.com/nejci/PRAr.
Pages: 113720-113736
Page count: 17