Automatic PAM clustering algorithm for outlier detection

Cited by: 9
Authors
Lei, Dajiang [1]
Zhu, Qingsheng [1]
Chen, Jun [1]
Lin, Hai [1]
Yang, Peng [1]
Affiliation
[1] College of Computer, Chongqing University, Chongqing
Keywords
Cluster validation; Outlier detection; PAM clustering algorithm; Subtractive clustering
DOI
10.4304/jsw.7.5.1045-1051
Abstract
In this paper, we propose an automatic PAM (Partitioning Around Medoids) clustering algorithm for outlier detection. The proposed methodology comprises two phases: clustering and outlier scoring. In the clustering phase, we automatically determine the number of clusters by combining the PAM clustering algorithm with a specific cluster validation metric, which is vital for finding a clustering solution that best fits the given data set, especially for the PAM clustering algorithm. In the outlier scoring phase, we assign each data instance an outlying score according to the cluster structure. Experiments on different datasets show that the proposed algorithm achieves a higher detection rate together with a lower false alarm rate than state-of-the-art outlier detection techniques, and that it can be an effective solution for detecting outliers. © 2012 ACADEMY PUBLISHER.
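The two phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the validation metric or the scoring rule, so the sketch assumes the mean silhouette width as the cluster validation metric and, as the outlying score, the distance to the nearest medoid of a cluster with at least `min_size` members. All function names, the `min_size` parameter, and the toy data are hypothetical.

```python
import math


def dist(a, b):
    return math.dist(a, b)  # Euclidean distance (Python 3.8+)


def total_cost(points, medoids):
    # Sum of distances from every point to its nearest medoid.
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k):
    """Swap-based PAM: start from the first k points, swap while cost drops."""
    medoids = list(range(k))
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best - 1e-12:
                    medoids, best, improved = trial, cost, True
    labels = [min(range(k), key=lambda j: dist(p, points[medoids[j]]))
              for p in points]
    return medoids, labels


def silhouette(points, labels, k):
    """Mean silhouette width (assumed metric); singletons contribute 0."""
    total = 0.0
    for i, p in enumerate(points):
        means = {}  # mean distance from p to each cluster, excluding p itself
        for c in range(k):
            ds = [dist(p, q) for j, q in enumerate(points)
                  if labels[j] == c and j != i]
            if ds:
                means[c] = sum(ds) / len(ds)
        if labels[i] not in means or len(means) < 2:
            continue
        a = means[labels[i]]
        b = min(v for c, v in means.items() if c != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def auto_pam(points, k_max):
    """Phase 1: run PAM for k = 2..k_max, keep the best-validated solution."""
    best = None
    for k in range(2, k_max + 1):
        medoids, labels = pam(points, k)
        score = silhouette(points, labels, k)
        if best is None or score > best[0]:
            best = (score, k, medoids, labels)
    return best[1], best[2], best[3]


def outlier_scores(points, medoids, labels, min_size=2):
    """Phase 2 (assumed rule): distance to the nearest medoid of a cluster
    with at least min_size members, so tiny clusters do not hide outliers."""
    sizes = [labels.count(c) for c in range(len(medoids))]
    big = [m for c, m in enumerate(medoids) if sizes[c] >= min_size] or medoids
    return [min(dist(p, points[m]) for m in big) for p in points]


# Toy data: two tight groups plus one isolated point (index 10).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
          (5, 30)]
k, medoids, labels = auto_pam(points, k_max=3)
scores = outlier_scores(points, medoids, labels)
print(k, scores.index(max(scores)))
```

The exhaustive swap search recomputes the full cost for every candidate swap, which is acceptable for a toy data set but would need a distance cache or a faster PAM variant on larger data.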
Pages: 1045-1051
Number of pages: 6