Visual analytics for the clustering capability of data

Cited by: 0
Authors
LU ZhiMao [1, 2]
LIU Chen [1]
ZHANG Qi [1]
ZHANG ChunXiang [3]
FAN DongMei [1]
YANG Peng [1]
Affiliations
[1] Pattern Recognition and Natural Computation Laboratory, Harbin Engineering University
[2] School of Computer Science and Technology, Dalian University of Technology
[3] School of Software, Harbin University of Science and Technology
Funding
National Natural Science Foundation of China
Keywords
data mining; clustering analysis; visual analysis; minimum distance spectrum; nearest neighbor spectrum; outliers
DOI
Not available
Chinese Library Classification number
TP391.41; TP18 [Artificial intelligence theory]
Discipline classification codes
080203; 081104; 0812; 0835; 1405
Abstract
Clustering analysis is an unsupervised method for finding hidden structures in datasets and has been widely used in various fields. However, it is difficult for users to understand, evaluate, and explain clustering results in spaces with more than three dimensions. Although high-dimensional visualization techniques can express clustering results well, they still have significant limitations. In this paper, a visual cluster analysis method based on the minimum distance spectrum (MinDS) is proposed, aimed at reducing the problems of clustering multidimensional datasets. First, the concept of MinDS is defined based on the distance between high-dimensional data points. MinDS can map any dataset from a high-dimensional space to a lower-dimensional one to determine whether the dataset is separable. Next, a clustering method that can automatically determine the number of categories is designed based on MinDS. This method can cluster not only datasets with clear boundaries but also datasets with fuzzy boundaries, using an edge corrosion strategy based on the energy of each data point. In addition, strategies for removing noise and identifying outliers are designed to clean datasets according to the characteristics of MinDS. The experimental results validate the feasibility and effectiveness of the proposed schemes and show that the proposed approach is simple, stable, and efficient, and can achieve multidimensional visual cluster analysis of complex datasets.
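The abstract does not give the formal definition of MinDS, so the following is only an illustrative sketch of one plausible reading, not the authors' method. It assumes the spectrum is built by a Prim-style traversal: starting from an arbitrary point, repeatedly visit the unvisited point nearest to the visited set and record that distance. Under this assumption, a high peak in the one-dimensional spectrum suggests a gap between separable groups. The function names, the `start` argument, and the fixed `threshold` are hypothetical; the paper's automatic selection of the number of categories, edge corrosion, and noise handling are not reproduced here.

```python
import math


def min_distance_spectrum(points, start=0):
    """Assumed Prim-style construction (not taken from the paper).

    Returns the visiting order and, for each newly visited point,
    its distance to the nearest already-visited point.
    """
    n = len(points)
    if n == 0:
        return [], []
    # best[i]: distance from point i to the nearest visited point so far
    best = [math.dist(points[start], p) for p in points]
    visited = [False] * n
    visited[start] = True
    order, spectrum = [start], []
    for _ in range(n - 1):
        # Pick the unvisited point closest to the visited set
        nxt = min((i for i in range(n) if not visited[i]), key=lambda i: best[i])
        visited[nxt] = True
        order.append(nxt)
        spectrum.append(best[nxt])
        # Update nearest-visited distances for the remaining points
        for i in range(n):
            if not visited[i]:
                best[i] = min(best[i], math.dist(points[nxt], points[i]))
    return order, spectrum


def split_at_peaks(order, spectrum, threshold):
    """Cut the visiting order wherever the spectrum exceeds a
    hypothetical, user-chosen threshold."""
    if not order:
        return []
    clusters, current = [], [order[0]]
    for idx, gap in zip(order[1:], spectrum):
        if gap > threshold:
            clusters.append(current)
            current = []
        current.append(idx)
    clusters.append(current)
    return clusters
```

For example, calling `min_distance_spectrum` on a few four-dimensional points that form two well-separated groups should give a spectrum with a single large value at the transition between the groups, and `split_at_peaks` with a threshold below that value would then return two index lists. Plotting the spectrum against the visiting order gives a one-dimensional view of separability regardless of the original dimension.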
Pages: 131-144
Page count: 14