Two-phase clustering process for outliers detection

Cited by: 230
Authors
Jiang, MF [1]
Tseng, SS [1]
Su, CM [1]
Affiliations
[1] Natl Chiao Tung Univ, Dept Comp & Informat Sci, Hsinchu 30050, Taiwan
Keywords
outliers; k-means clustering; two-phase clustering; MST
DOI
10.1016/S0167-8655(00)00131-8
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, a two-phase clustering algorithm for outlier detection is proposed. In Phase 1, we first modify the traditional k-means algorithm with the heuristic "if a new input pattern is far enough away from all cluster centers, then assign it as a new cluster center". As a result, the data points in the same cluster are most likely either all outliers or all non-outliers. In Phase 2, we then construct a minimum spanning tree (MST) and remove its longest edge. The small clusters, that is, the trees with fewer nodes, are selected and regarded as outliers. The experimental results show that our process works well. (C) 2001 Elsevier Science B.V. All rights reserved.
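
The two phases described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The function names, the fixed `threshold` parameter, the single pass over the data, and the running-mean centre update are all assumptions. The MST is built over the Phase 1 cluster centres. The sketch measures the size of each subtree by the number of data points it holds, whereas the abstract speaks of the number of nodes.

```python
import math

def phase1_centers(points, threshold):
    """One pass over the data: a point farther than `threshold` (assumed
    parameter) from every existing centre starts a new cluster."""
    centers = []   # running centroid of each cluster
    members = []   # indices of the points assigned to each cluster
    for i, p in enumerate(points):
        if centers:
            j = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            if math.dist(p, centers[j]) <= threshold:
                members[j].append(i)
                n = len(members[j])
                # incremental mean update of the nearest centre
                centers[j] = tuple(c + (x - c) / n for c, x in zip(centers[j], p))
                continue
        centers.append(tuple(p))
        members.append([i])
    return centers, members

def mst_edges(centers):
    """Prim's algorithm over the cluster centres; returns (length, u, v) edges."""
    in_tree = {0}
    edges = []
    while len(in_tree) < len(centers):
        d, u, v = min(
            (math.dist(centers[a], centers[b]), a, b)
            for a in in_tree
            for b in range(len(centers))
            if b not in in_tree
        )
        edges.append((d, u, v))
        in_tree.add(v)
    return edges

def phase2_outliers(centers, members):
    """Cut the longest MST edge and flag the points of the smaller subtree."""
    edges = mst_edges(centers)
    if not edges:
        return []  # a single cluster: nothing to cut
    longest = max(edges)
    adj = {c: [] for c in range(len(centers))}
    for e in edges:
        if e is not longest:
            adj[e[1]].append(e[2])
            adj[e[2]].append(e[1])
    # collect the subtree on one side of the removed edge
    side = {longest[1]}
    stack = [longest[1]]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in side:
                side.add(v)
                stack.append(v)
    other = set(range(len(centers))) - side
    def size(comp):
        return sum(len(members[c]) for c in comp)
    small = side if size(side) < size(other) else other
    return sorted(i for c in small for i in members[c])
```

For example, with `points = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (5, 5), (5.1, 5), (5, 5.1), (20, 20)]` and `threshold=1.0`, Phase 1 yields three clusters and Phase 2 flags index 7, the isolated point at (20, 20).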
Pages: 691-700
Number of pages: 10