Semi-supervised projected model-based clustering

Cited by: 0
Authors
Luis Guerra
Concha Bielza
Víctor Robles
Pedro Larrañaga
Affiliations
[1] Universidad Politécnica de Madrid, Computational Intelligence Group, Departamento de Inteligencia Artificial, Facultad de Informática
[2] Universidad Politécnica de Madrid, Departamento de Arquitectura y Tecnología de Sistemas Informáticos, Facultad de Informática
Source
Data Mining and Knowledge Discovery | 2014 / Vol. 28
Keywords
Clustering; Subspaces; Semi-supervised; Model-based; Partially labeled data
DOI
Not available
Abstract
We present an adaptation of model-based clustering for partially labeled data that is capable of finding hidden cluster labels. All clusters, both those originally known and those discovered, are represented using localized feature subset selections (subspaces), which yields clusters that global feature subset selection cannot discover. The semi-supervised projected model-based clustering algorithm (SeSProC) also includes a novel model selection approach that uses a greedy forward search to estimate the final number of clusters. The quality of SeSProC is assessed on synthetic data, demonstrating its effectiveness under different data conditions, not only at classifying instances with known labels but also at discovering completely hidden clusters in different subspaces. SeSProC also outperforms three related baseline algorithms in most scenarios on synthetic and real data sets.
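The general idea of semi-supervised model-based clustering with a greedy forward search over the number of clusters can be sketched as follows. This is a minimal illustration under stated assumptions, not the SeSProC algorithm itself: it fits a diagonal-covariance Gaussian mixture with EM, clamps labeled instances to their known component, scores each model with BIC (an assumed criterion), and seeds every extra component at the unlabeled point farthest from the current means (an assumed heuristic). The per-cluster subspace (feature relevance) estimation described in the abstract is omitted entirely, and the names `fit_ss_gmm` and `greedy_forward_search` are hypothetical.

```python
import numpy as np


def fit_ss_gmm(X, y, k, n_iter=200):
    """Semi-supervised EM for a k-component diagonal Gaussian mixture.

    y[i] is a class index (0..C-1) for labeled rows and -1 for unlabeled rows.
    Components C..k-1 are extra clusters that only unlabeled rows can join.
    Returns (bic, hard_assignments).
    """
    n, d = X.shape
    labeled = y >= 0
    n_known = int(y.max()) + 1
    # Known components start at their labeled means (assumed heuristic);
    # each extra one starts at the unlabeled point farthest from all means.
    mu = np.array([X[y == c].mean(axis=0) for c in range(n_known)])
    while len(mu) < k:
        dist = ((X[:, None, :] - mu) ** 2).sum(axis=2).min(axis=1)
        dist[labeled] = -1.0
        mu = np.vstack([mu, X[dist.argmax()]])
    var = np.tile(X.var(axis=0), (k, 1))
    pi = np.full(k, 1.0 / k)
    onehot = np.eye(k)[y[labeled]]

    def log_joint():
        # log(pi_c) + log N(x | mu_c, diag(var_c)) for every row and component
        return (np.log(pi)
                - 0.5 * np.log(2 * np.pi * var).sum(axis=1)
                - 0.5 * (((X[:, None, :] - mu) ** 2) / var).sum(axis=2))

    for _ in range(n_iter):
        # E-step: responsibilities, with labeled rows clamped to their class
        log_p = log_joint()
        R = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1)[:, None])
        R[labeled] = onehot
        # M-step: weighted updates (variances floored to avoid collapse)
        nk = R.sum(axis=0) + 1e-9
        pi = nk / nk.sum()
        mu = (R.T @ X) / nk[:, None]
        var = np.maximum((R.T @ X ** 2) / nk[:, None] - mu ** 2, 1e-3)

    log_p = log_joint()
    lp_lab = log_p[labeled]
    loglik = (lp_lab[np.arange(len(lp_lab)), y[labeled]].sum()
              + np.logaddexp.reduce(log_p[~labeled], axis=1).sum())
    n_params = (k - 1) + 2 * k * d
    bic = -2.0 * loglik + n_params * np.log(n)
    z = log_p.argmax(axis=1)
    z[labeled] = y[labeled]
    return bic, z


def greedy_forward_search(X, y, max_extra=5):
    """Start from the known classes and add one cluster at a time
    for as long as the BIC keeps improving."""
    k = int(y.max()) + 1
    best_bic, best_z = fit_ss_gmm(X, y, k)
    for _ in range(max_extra):
        bic, z = fit_ss_gmm(X, y, k + 1)
        if bic >= best_bic:
            break
        k, best_bic, best_z = k + 1, bic, z
    return k, best_z
```

Calling `greedy_forward_search(X, y)` with an integer label array `y` (using -1 for unlabeled rows) returns the selected number of clusters and a hard assignment per row. Component indices at or above the number of known classes correspond to clusters that had no labeled instances.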
Pages: 882-917
Page count: 35