DUSC: Dimensionality unbiased subspace clustering

Cited by: 36
Authors
Assent, Ira [1]
Krieger, Ralph [1]
Mueller, Emmanuel [1]
Seidl, Thomas [1]
Affiliation
[1] RWTH Aachen University, Data Management and Data Exploration Group, Aachen, Germany
Source
ICDM 2007: PROCEEDINGS OF THE SEVENTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING | 2007
DOI
10.1109/ICDM.2007.49
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
To gain insight into today's large data resources, data mining provides automatic aggregation techniques. Clustering aims at grouping data such that objects within groups are similar while objects in different groups are dissimilar. In scenarios with many attributes or with noise, clusters are often hidden in subspaces of the data and do not show up in the full-dimensional space. For these applications, subspace clustering methods aim at detecting clusters in any subspace. Existing subspace clustering approaches fall prey to an effect we call dimensionality bias. As the dimensionality of subspaces varies, approaches that do not take this effect into account fail to separate clusters from noise. We give a formal definition of dimensionality bias and analyze its consequences for subspace clustering. A dimensionality unbiased subspace clustering (DUSC) definition based on statistical foundations is proposed. In thorough experiments on synthetic and real-world data, we show that our approach outperforms existing subspace clustering algorithms.
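The dimensionality bias described in the abstract can be illustrated with a toy experiment. The sketch below is not DUSC's formal definition, which this record does not give. It assumes that "density" means the number of neighbors within a fixed Euclidean radius eps and that noise is uniform in the unit hypercube. All function names (`neighbor_count`, `uniform_points`, `mean_density`, `expected_uniform_neighbors`) are hypothetical. Under these assumptions, uniform noise in a 1-dimensional subspace looks far denser than uniform noise in a 4-dimensional one, so any fixed density threshold favors low-dimensional subspaces. Dividing by the expected density under uniformity, approximated here without boundary correction, puts all dimensionalities on roughly the same scale.

```python
import math
import random


def neighbor_count(points, center, eps):
    """Count points (excluding `center` itself) within Euclidean distance eps."""
    return sum(1 for p in points if p is not center and math.dist(p, center) <= eps)


def uniform_points(n, d, seed=0):
    """Draw n points uniformly from the unit hypercube [0, 1]^d (toy noise model)."""
    rng = random.Random(seed)
    return [tuple(rng.random() for _ in range(d)) for _ in range(n)]


def mean_density(points, eps):
    """Average eps-neighborhood size over all points."""
    return sum(neighbor_count(points, c, eps) for c in points) / len(points)


def expected_uniform_neighbors(n, d, eps):
    """Approximate expected eps-neighbors under uniform data, ignoring boundary
    effects: (n - 1) times the volume of a d-dimensional ball of radius eps."""
    ball_volume = math.pi ** (d / 2) / math.gamma(d / 2 + 1) * eps ** d
    return (n - 1) * ball_volume


if __name__ == "__main__":
    n, eps = 300, 0.1
    for d in range(1, 5):
        pts = uniform_points(n, d, seed=d)
        raw = mean_density(pts, eps)
        expected = expected_uniform_neighbors(n, d, eps)
        # A fixed threshold on `raw` favors low-dimensional subspaces;
        # dividing by `expected` puts all dimensionalities on one scale.
        print(f"d={d}: raw={raw:.2f} expected={expected:.2f} ratio={raw / expected:.2f}")
```

The ratio for uniform noise stays near or slightly below 1 in every dimensionality because points near the cube boundary have truncated neighborhoods. A genuine cluster would stand out as a ratio well above 1, independent of how many attributes the subspace has.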
Pages: 409-414
Number of pages: 6
References
15 entries in total
  • [1] Agrawal R., 1998, Proc. of ACM SIGMOD, P94
  • [2] Assent I, 2006, SSTDM ICDM
  • [3] Beyer K, 1999, LECT NOTES COMPUT SC, V1540, P217
  • [4] Dehnad K., 2012, Density estimation for statistics and data analysis, V29, P495, DOI [10.1201/9781315140919, 10.1080/00401706.1987.10488295]
  • [5] Ester M., 1996, Presented at the proceedings of the second international conference on knowledge discovery and data mining KDD, V2, P226, DOI 10.5555/3001460.3001507
  • [6] Hinneburg A., 1998, Proceedings Fourth International Conference on Knowledge Discovery and Data Mining, P58
  • [7] Kailing K, 2004, SIAM PROC S, P246
  • [8] Kailing K, 2003, LECT NOTES ARTIF INT, V2838, P241
  • [9] Keogh E. J., 2006, VLDB, P882
  • [10] Kriegel HP, Kröger P, Renz M, Wurst S, 2005, A generic framework for efficient subspace clustering of high-dimensional data, Fifth IEEE International Conference on Data Mining, Proceedings, P250-257