Unsupervised clustering and feature weighting based on Generalized Dirichlet mixture modeling

Cited by: 13
Authors
Ben Ismail, Mohamed Maher [1]
Frigui, Hichem [2]
Affiliations
[1] King Saud Univ, Dept Comp Sci, Coll Comp & Informat Sci, Riyadh 11548, Saudi Arabia
[2] Univ Louisville, CECS Dept, Louisville, KY 40292 USA
Keywords
Unsupervised learning; Mixture model; Feature weighting; Generalized Dirichlet mixture; Possibilistic approach; Image collection categorization; FEATURE-SELECTION; FUZZY; ALGORITHM
DOI
10.1016/j.ins.2014.02.146
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
We propose a possibilistic approach for Generalized Dirichlet mixture parameter estimation, data clustering, and feature weighting. The proposed algorithm, called Robust and Unsupervised Learning of Finite Generalized Dirichlet Mixture Models (RULe_GDM), exploits a property of the Generalized Dirichlet distribution: the data can be transformed so that the features become independent and each follows a Beta distribution. RULe_GDM then learns optimal relevance weights for each feature within each cluster. Together, the transformation and the per-cluster feature weights make RULe_GDM suitable for noisy and high-dimensional feature spaces. In addition, RULe_GDM associates two types of membership with each data sample. The first is the posterior probability, which indicates how well a sample fits each estimated distribution. The second represents the degree of typicality and is used to identify and discard noise points and outliers. RULe_GDM minimizes a single objective function that jointly learns the two membership functions, the distribution parameters, and the relevance weights for each feature within each distribution. We also extend our algorithm to find the optimal number of clusters in an unsupervised and efficient way by exploiting properties of the possibilistic membership function. The performance of RULe_GDM is illustrated and compared to that of similar algorithms. We use synthetic data to illustrate its robustness to noisy and high-dimensional features. We also compare our approach to other relevant algorithms on several standard data sets. (C) 2014 Elsevier Inc. All rights reserved.
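The property the abstract relies on is the standard stick-breaking characterization of the Generalized Dirichlet (GD) distribution: if X = (X_1, ..., X_D) ~ GD(alpha, beta), then W_1 = X_1 and W_l = X_l / (1 - sum_{j<l} X_j) are independent, with W_l ~ Beta(alpha_l, beta_l). The stdlib-only Python sketch below illustrates that transformation and the per-feature Beta log-likelihood terms that a feature-weighted model could operate on. It is not the authors' RULe_GDM implementation: the function names and example parameters are hypothetical, and it omits the relevance weights, the possibilistic (typicality) memberships, and the estimation of the number of clusters.

```python
import math
import random


def sample_gd(alpha, beta, rng):
    """Draw one GD vector X by stick-breaking (hypothetical helper):
    W_l ~ Beta(alpha_l, beta_l) independently, X_l = W_l * prod_{j<l} (1 - W_j)."""
    x, remaining = [], 1.0
    for a, b in zip(alpha, beta):
        w = rng.betavariate(a, b)
        x.append(w * remaining)
        remaining *= 1.0 - w
    return x


def gd_to_beta(x):
    """Map a GD vector X to W with W_1 = X_1 and W_l = X_l / (1 - sum_{j<l} X_j).
    Under the GD model the W_l are independent and W_l ~ Beta(alpha_l, beta_l)."""
    w, cumsum = [], 0.0
    for xl in x:
        w.append(xl / (1.0 - cumsum))
        cumsum += xl
    return w


def beta_logpdf(w, a, b):
    """Log-density of Beta(a, b) evaluated at w in the open interval (0, 1)."""
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(w) + (b - 1.0) * math.log(1.0 - w))


def per_feature_loglik(w, alpha, beta):
    """One log-likelihood term per transformed feature. Because the W_l are
    independent, their sum is the joint log-density of W."""
    return [beta_logpdf(wl, a, b) for wl, a, b in zip(w, alpha, beta)]


if __name__ == "__main__":
    # Hypothetical parameters for a single 3-dimensional GD component.
    rng = random.Random(0)
    alpha, beta = [2.0, 3.0, 1.5], [4.0, 2.0, 2.5]
    x = sample_gd(alpha, beta, rng)
    w = gd_to_beta(x)
    print("X =", x)
    print("W =", w)
    print("per-feature log-likelihoods:", per_feature_loglik(w, alpha, beta))
```

Because the joint log-likelihood of W splits into a sum of one-dimensional Beta terms, each feature's contribution can be inspected or weighted separately within a cluster, which is the structure that per-feature relevance weighting builds on.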
Pages: 35-54
Number of pages: 20