Unsupervised clustering and feature weighting based on Generalized Dirichlet mixture modeling

Cited by: 13
Authors
Ben Ismail, Mohamed Maher [1]
Frigui, Hichem [2]
Affiliations
[1] King Saud Univ, Dept Comp Sci, Coll Comp & Informat Sci, Riyadh 11548, Saudi Arabia
[2] Univ Louisville, CECS Dept, Louisville, KY 40292 USA
Keywords
Unsupervised learning; Mixture model; Feature weighting; Generalized Dirichlet mixture; Possibilistic approach; Image collection categorization; FEATURE-SELECTION; FUZZY; ALGORITHM
DOI
10.1016/j.ins.2014.02.146
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We propose a possibilistic approach for Generalized Dirichlet mixture parameter estimation, data clustering, and feature weighting. The proposed algorithm, called Robust and Unsupervised Learning of Finite Generalized Dirichlet Mixture Models (RULe_GDM), exploits a property of Generalized Dirichlet distributions: the data can be transformed so that the features are independent and follow Beta distributions. It then learns optimal relevance weights for each feature within each cluster. This property makes RULe_GDM suitable for noisy and high-dimensional feature spaces. In addition, RULe_GDM associates two types of membership with each data sample. The first is the posterior probability, which indicates how well a sample fits each estimated distribution. The second represents the degree of typicality and is used to identify and discard noise points and outliers. RULe_GDM minimizes a single objective function that combines learning the two membership functions, the distribution parameters, and the relevance weights for each feature within each distribution. We also extend the algorithm to find the optimal number of clusters in an unsupervised and efficient way by exploiting properties of the possibilistic membership function. We use synthetic data to illustrate the robustness of RULe_GDM to noisy and high-dimensional features, and we compare our approach to other relevant algorithms on several standard data sets. (C) 2014 Elsevier Inc. All rights reserved.
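The transformation property mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' RULe_GDM implementation; it only shows the standard result that a Generalized Dirichlet vector, written in the Connor-Mosimann (stick-breaking) parameterization, maps back to independent Beta variables. The function names `sample_generalized_dirichlet` and `to_independent_betas`, and the parameter values, are hypothetical and chosen for illustration.

```python
import numpy as np


def sample_generalized_dirichlet(alpha, beta, n, rng):
    """Draw n Generalized Dirichlet samples by stick-breaking.

    Assumes the Connor-Mosimann parameterization: z_l ~ Beta(alpha_l, beta_l)
    independently, and x_l = z_l * prod_{k<l} (1 - z_k).
    Returns both x and the latent Beta draws z.
    """
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    z = rng.beta(alpha, beta, size=(n, alpha.size))
    # Fraction of the "stick" still unassigned before each coordinate.
    remaining = np.cumprod(1.0 - z, axis=1)
    remaining = np.hstack([np.ones((n, 1)), remaining[:, :-1]])
    return z * remaining, z


def to_independent_betas(x):
    """Map x to w with w_1 = x_1 and w_l = x_l / (1 - x_1 - ... - x_{l-1})."""
    x = np.asarray(x, dtype=float)
    preceding_sum = np.cumsum(x, axis=1) - x  # sum of coordinates before l
    return x / (1.0 - preceding_sum)


rng = np.random.default_rng(0)
x, z = sample_generalized_dirichlet([2.0, 3.0, 4.0], [5.0, 4.0, 3.0], 1000, rng)
w = to_independent_betas(x)  # recovers the independent Beta draws
```

Because each transformed feature is an independent Beta variable, a per-feature likelihood can be evaluated and weighted separately, which is the setting in which per-cluster feature relevance weights become meaningful.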
Pages: 35-54
Number of pages: 20