Unsupervised clustering and feature weighting based on Generalized Dirichlet mixture modeling

Cited by: 13
Authors
Ben Ismail, Mohamed Maher [1 ]
Frigui, Hichem [2 ]
Affiliations
[1] King Saud Univ, Dept Comp Sci, Coll Comp & Informat Sci, Riyadh 11548, Saudi Arabia
[2] Univ Louisville, CECS Dept, Louisville, KY 40292 USA
Keywords
Unsupervised learning; Mixture model; Feature weighting; Generalized Dirichlet mixture; Possibilistic approach; Image collection categorization; FEATURE-SELECTION; FUZZY; ALGORITHM;
DOI
10.1016/j.ins.2014.02.146
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
We propose a possibilistic approach for Generalized Dirichlet mixture parameter estimation, data clustering, and feature weighting. The proposed algorithm, called Robust and Unsupervised Learning of Finite Generalized Dirichlet Mixture Models (RULe_GDM), exploits a property of the Generalized Dirichlet distributions that transforms the data to make the features independent and follow Beta distributions. Then, it learns optimal relevance weights for each feature within each cluster. This property makes RULe_GDM suitable for noisy and high-dimensional feature spaces. In addition, RULe_GDM associates two types of memberships with each data sample. The first one is the posterior probability and indicates how well a sample fits each estimated distribution. The second membership represents the degree of typicality and is used to identify and discard noise points and outliers. RULe_GDM minimizes one objective function which combines learning the two membership functions, distribution parameters, and the relevance weights for each feature within each distribution. We also extend our algorithm to find the optimal number of clusters in an unsupervised and efficient way by exploiting some properties of the possibilistic membership function. The performance of RULe_GDM is illustrated and compared to similar algorithms. We use synthetic data to illustrate its robustness to noisy and high dimensional features. We also compare our approach to other relevant algorithms using several standard data sets. (C) 2014 Elsevier Inc. All rights reserved.
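The transformation mentioned in the abstract can be illustrated with the standard stick-breaking view of the Generalized Dirichlet distribution: if X = (X_1, ..., X_d) is Generalized Dirichlet, then W_1 = X_1 and W_l = X_l / (1 - X_1 - ... - X_{l-1}) are mutually independent Beta variables. The sketch below is not the authors' implementation of RULe_GDM; the function names and the parameter values are hypothetical and chosen only for illustration.

```python
import random


def sample_generalized_dirichlet(alpha, beta, rng):
    """Draw one Generalized Dirichlet vector by stick-breaking.

    Each w_l ~ Beta(alpha_l, beta_l) independently, and
    x_l = w_l * prod_{j<l} (1 - w_j). Returns (x, w).
    """
    w = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
    x, remaining = [], 1.0
    for w_l in w:
        x.append(w_l * remaining)   # take a fraction of what is left of the stick
        remaining *= 1.0 - w_l      # shrink the remaining stick
    return x, w


def gd_to_independent_beta(x):
    """Map a GD vector x to independent Beta features.

    w_1 = x_1 and w_l = x_l / (1 - x_1 - ... - x_{l-1}).
    Assumes the partial sums of x stay strictly below 1.
    """
    w, remaining = [], 1.0
    for x_l in x:
        w.append(x_l / remaining)
        remaining -= x_l
    return w


# Illustrative (assumed) shape parameters for a 3-dimensional example.
rng = random.Random(0)
alpha = [2.0, 3.0, 1.5]
beta = [4.0, 2.0, 3.0]

x, w_true = sample_generalized_dirichlet(alpha, beta, rng)
w_recovered = gd_to_independent_beta(x)
```

Because each transformed feature follows its own Beta distribution, per-feature quantities such as the relevance weights described in the abstract can be handled one dimension at a time within each cluster.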
Pages: 35-54
Page count: 20
Related papers
50 in total
  • [21] Entropy-Based Variational Learning of Finite Generalized Inverted Dirichlet Mixture Model
    Ahmadzadeh, Mohammad Sadegh
    Manouchehri, Narges
    Ennajari, Hafsa
    Bouguila, Nizar
    Fan, Wentao
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2021, 2021, 12672 : 130 - 143
  • [22] Unsupervised Learning of Image Segmentation Based on Differentiable Feature Clustering
    Kim, Wonjik
    Kanezaki, Asako
    Tanaka, Masayuki
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 8055 - 8068
  • [23] A New Unsupervised Binning Approach for Metagenomic Sequences Based on N-grams and Automatic Feature Weighting
    Liao, Ruiqi
    Zhang, Ruichang
    Guan, Jihong
    Zhou, Shuigeng
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2014, 11 (01) : 42 - 54
  • [24] Feature selection in robust clustering based on Laplace mixture
    Cord, A
    Ambroise, C
    Cocquerez, JP
    PATTERN RECOGNITION LETTERS, 2006, 27 (06) : 627 - 635
  • [25] CWC: A clustering-based feature weighting approach for text classification
    Zhu, Lin
    Guan, Jihong
    Zhou, Shuigeng
    MODELING DECISIONS FOR ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2007, 4617 : 204 - +
  • [26] Unsupervised feature selection method based on iterative similarity graph factorization and clustering by modularity
    Oliveira, Marcos de S.
    Queiroz, Sergio R. de M.
    de Carvalho, Francisco de A. T.
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 208
  • [27] Unsupervised feature selection for balanced clustering
    Zhou, Peng
    Chen, Jiangyong
    Fan, Mingyu
    Du, Liang
    Shen, Yi-Dong
    Li, Xuejun
    KNOWLEDGE-BASED SYSTEMS, 2020, 193
  • [28] Feature-Weighting and Clustering Random Forest
    Liu, Zhenyu
    Wen, Tao
    Sun, Wei
    Zhang, Qilong
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2021, 14 (01) : 257 - 265
  • [29] Feature Weighting for Clustering by Particle Swarm Optimization
    Swetha, K. P.
    Devi, V. Susheela
    2012 SIXTH INTERNATIONAL CONFERENCE ON GENETIC AND EVOLUTIONARY COMPUTING (ICGEC), 2012, : 441 - 444
  • [30] On multivariate binary data clustering and feature weighting
    Bouguila, Nizar
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2010, 54 (01) : 120 - 134