Clustering ensemble selection considering quality and diversity

Cited by: 73
Authors
Abbasi, Sadr-olah [1]
Nejatian, Samad [2,3]
Parvin, Hamid [4,5]
Rezaie, Vahideh [3,6]
Bagherifard, Karamolah [1,3]
Affiliations
[1] Islamic Azad Univ, Yasooj Branch, Dept Comp Engn, Yasuj, Iran
[2] Islamic Azad Univ, Yasooj Branch, Dept Elect Engn, Yasuj, Iran
[3] Islamic Azad Univ, Yasooj Branch, Young Researchers & Elite Club, Yasuj, Iran
[4] Islamic Azad Univ, Nourabad Mamasani Branch, Dept Comp Engn, Nourabad Mamasani, Iran
[5] Islamic Azad Univ, Nourabad Mamasani Branch, Young Researchers & Elite Club, Nourabad Mamasani, Iran
[6] Islamic Azad Univ, Yasooj Branch, Dept Math, Yasuj, Iran
Keywords
Clustering ensemble; Stability measure; Improved stability; Evidence accumulation; Extended EAC; Co-association matrix; Cluster evaluation; Combining multiple clusterings; Validation; Framework; Consensus
DOI
10.1007/s10462-018-9642-2
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
It is highly likely that a partition judged poor by a stability measure still contains one or more high-quality clusters, which are then discarded along with it. Inspired by the evaluation of partitions, researchers have therefore defined measures for evaluating individual clusters. Many stability measures, such as Normalized Mutual Information (NMI), have been proposed to validate a partition, and the existing cluster-level measures are also based on NMI. This paper discusses a drawback of this commonly used approach and proposes a criterion, called Edited Normalized Mutual Information (ENMI), to assess the association between a cluster and a partition; ENMI compensates for the drawback of the standard NMI measure. The paper also proposes a clustering ensemble method that aggregates a subset of the primary clusters. The method uses the average ENMI as a fitness measure, and the clusters whose fitness satisfies a predefined threshold are selected to participate in the final ensemble. Three classes of consensus functions are then used to combine the chosen clusters. The first class comprises co-association-based consensus functions: because the Evidence Accumulation Clustering (EAC) method cannot derive the co-association matrix from a subset of clusters, Extended EAC (EEAC) is used to construct the co-association matrix from the chosen clusters. The second class is based on hypergraph partitioning algorithms. The third class treats the chosen clusters as a new feature space and applies a simple clustering algorithm to extract the consensus partition. Empirical studies show that the proposed method outperforms other well-known ensemble methods.
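To make the selection pipeline more concrete, here is a minimal Python sketch. It computes the standard NMI between two labelings (mutual information normalized by the geometric mean of the two entropies), selects primary clusters whose fitness meets a threshold, and builds a co-association matrix and a binary feature space from the selected subset. Two pieces are assumptions and not the paper's definitions, since the abstract gives neither formula. First, a cluster's fitness is taken as the average NMI between the cluster's binary in/out labeling and each ensemble partition. This is a stand-in for the commonly used NMI-based approach, not ENMI. Second, the co-association entry for points i and j is normalized as n_ij / max(n_i, n_j), where n_i counts the selected clusters containing point i. This is our assumed reading of an EEAC-style normalization. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Set


def nmi(a: Sequence[int], b: Sequence[int]) -> float:
    """Normalized mutual information I(a;b) / sqrt(H(a) * H(b))."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    mi = sum(
        (nab / n) * math.log(nab * n / (count_a[x] * count_b[y]))
        for (x, y), nab in joint.items()
    )
    h_a = -sum((c / n) * math.log(c / n) for c in count_a.values())
    h_b = -sum((c / n) * math.log(c / n) for c in count_b.values())
    if h_a == 0.0 or h_b == 0.0:
        return 0.0  # convention used here: a one-cluster labeling scores 0
    return mi / math.sqrt(h_a * h_b)


def cluster_fitness(cluster: Set[int], partitions: List[List[int]]) -> float:
    """Stand-in fitness (NOT the paper's ENMI): average NMI between the
    cluster's binary in/out labeling and each partition in the ensemble."""
    n = len(partitions[0])
    indicator = [1 if i in cluster else 0 for i in range(n)]
    return sum(nmi(indicator, p) for p in partitions) / len(partitions)


def select_clusters(partitions: List[List[int]], threshold: float) -> List[Set[int]]:
    """Keep every primary cluster whose fitness meets the threshold."""
    selected = []
    for p in partitions:
        for label in sorted(set(p)):
            cluster = {i for i, lab in enumerate(p) if lab == label}
            if cluster_fitness(cluster, partitions) >= threshold:
                selected.append(cluster)
    return selected


def coassociation(clusters: List[Set[int]], n: int) -> List[List[float]]:
    """Co-association built from a subset of clusters, normalized by
    max(n_i, n_j) (assumed EEAC-style normalization)."""
    covered = [0] * n  # n_i: number of selected clusters containing point i
    together = [[0] * n for _ in range(n)]  # n_ij: clusters containing both
    for c in clusters:
        for i in c:
            covered[i] += 1
            for j in c:
                together[i][j] += 1
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = max(covered[i], covered[j])
            if denom:  # denom is 0 only when neither point is covered
                matrix[i][j] = together[i][j] / denom
    return matrix


def membership_features(clusters: List[Set[int]], n: int) -> List[List[int]]:
    """Binary feature space: one column per selected cluster."""
    return [[1 if i in c else 0 for c in clusters] for i in range(n)]


if __name__ == "__main__":
    ensemble = [[0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]]
    chosen = select_clusters(ensemble, threshold=0.5)
    print(chosen)
    print(coassociation(chosen, 6)[0])
    print(membership_features(chosen, 6))
```

Standard EAC divides each pair's co-occurrence count by the total number of partitions, which assumes that every partition covers every point. Once only a subset of clusters is kept, some points may be covered rarely or not at all, so some per-point normalization is needed. The resulting matrix, or the binary features from `membership_features`, would then be passed to a consensus step such as a hierarchical clustering of 1 - M. That step is omitted here.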
Pages: 1311-1340
Number of pages: 30