An efficient semi-supervised graph based clustering

Cited by: 7
Author
Viet-Vu Vu [1]
Affiliation
[1] Vietnam Natl Univ, Informat Technol Inst, 144 Xuan Thuy St, Hanoi, Vietnam
Keywords
Semi-supervised clustering; seed; k-nearest neighbors graph; algorithm; selection; neighbors
DOI
10.3233/IDA-163296
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Clustering is one of the most important tools in data mining and knowledge discovery. In recent years, semi-supervised clustering, which integrates side information (seeds or constraints) into the clustering process, has become a well-known strategy for improving clustering results. In this article, a new semi-supervised graph-based clustering method (SSGC) is presented. Using a k-nearest-neighbors graph and a local density measure for the similarity between vertices, SSGC integrates the seeds into the process of building clusters and can therefore improve clustering quality. Moreover, SSGC can handle noise and data of varying density, and it uses only one parameter (i.e., the number of nearest neighbors). Experiments conducted on real data sets from the UCI repository show that our method produces good clustering results compared with related techniques such as semi-supervised density-based clustering (SSDBSCAN). In addition, the computational cost of SSGC is much lower than that of SSDBSCAN.
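The abstract does not spell out SSGC's local density measure, so the following is only a rough illustration under stated assumptions. It builds a k-nearest-neighbors graph and weights each edge by the number of nearest neighbors the two endpoints share (the classic Jarvis-Patrick shared-neighbor similarity), one common way to encode local density with k as the single parameter. Whether SSGC uses this exact weighting, keeps only mutual neighbors, or uses Euclidean distance is an assumption; the helper names `knn_sets` and `snn_graph` are hypothetical, and the seed-integration step is not shown.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Sequence[float]


def knn_sets(points: List[Point], k: int) -> List[Set[int]]:
    """Return, for each point, the indices of its k nearest neighbors (itself excluded)."""
    neighbors: List[Set[int]] = []
    for i, p in enumerate(points):
        # Brute force: sort every other point by Euclidean distance to p, ties broken by index.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (math.dist(p, points[j]), j),
        )
        neighbors.append(set(others[:k]))
    return neighbors


def snn_graph(points: List[Point], k: int) -> List[Tuple[int, int, int]]:
    """Return edges (i, j, weight) between mutual k-NN pairs.

    Assumption: the weight is the shared-neighbor count |kNN(i) & kNN(j)|,
    used here as a stand-in for SSGC's (unspecified) local density measure.
    """
    nn = knn_sets(points, k)
    edges: List[Tuple[int, int, int]] = []
    for i in range(len(points)):
        for j in nn[i]:
            # Keep each pair once (i < j) and only if the neighbor relation is mutual.
            if i < j and i in nn[j]:
                edges.append((i, j, len(nn[i] & nn[j])))
    return edges


if __name__ == "__main__":
    # Two well-separated groups of three points each.
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(sorted(snn_graph(pts, k=2)))
```

With k = 2, every edge in this toy example stays inside one of the two groups, so the graph's connected components already separate them; in a seeded method, labeled points would then guide how such components are formed. The brute-force neighbor search costs O(n^2 log n), so a spatial index would be preferable for larger data.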
Pages: 297-307
Number of pages: 11