A concept lattice based outlier mining method in low-dimensional subspaces

Cited by: 48
Authors
Zhang, Jifu [1 ]
Jiang, Yiyong [1 ]
Chang, Kai H. [2 ]
Zhang, Sulan [1 ]
Cai, Jianghui [1 ]
Hu, Lihua [1 ]
Affiliations
[1] Taiyuan Univ Sci & Technol, Sch Comp Sci & Technol, Taiyuan 030024, Peoples R China
[2] Auburn Univ, Dept Comp Sci & Software Engn, Auburn, AL 36849 USA
Keywords
Outliers; Concept lattice; Sparsity coefficient; Density coefficient; Intent reduction; Algorithms
DOI
10.1016/j.patrec.2009.07.016
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Traditional outlier mining methods identify outliers from a global point of view, which usually makes it difficult to find deviating data points in low-dimensional subspaces. The concept lattice, owing to its straightforwardness, conciseness, and completeness in knowledge expression, has become an effective tool for data analysis and knowledge discovery. In this paper, a concept lattice based outlier mining algorithm (CLOM) for low-dimensional subspaces is proposed, which treats the intent of every concept lattice node as a subspace. First, sparsity and density coefficients, which measure outliers in low-dimensional subspaces, are defined and discussed. Second, the intent of a concept lattice node is regarded as a subspace, and sparsity subspaces are identified based on a predefined sparsity coefficient threshold. At this stage, the intent of any ancestor node of a sparsity subspace is tested against a predefined density coefficient threshold to determine whether it is a density subspace. If it is a density subspace, then the objects in the extent of the node whose intent is a sparsity subspace are defined as outliers. Experimental results on a star spectral database show that CLOM is effective in mining outliers in low-dimensional subspaces. The accuracy of the results is also greatly improved. © 2009 Elsevier B.V. All rights reserved.
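The two-stage procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the formulas for the sparsity and density coefficients, so the sketch substitutes a plain support ratio (extent size divided by the number of objects) for both. It also assumes that an "ancestor" is a more general concept, that is, one whose non-empty intent is a proper subset of the sparse node's intent. The names concepts, clom_sketch, sparse_max, dense_min, and the toy data are all hypothetical, and the lattice is enumerated by brute force, which is only practical for very small contexts.

```python
from itertools import combinations


def concepts(context):
    """Enumerate all formal concepts (extent, intent) of a binary context.

    `context` maps each object to a frozenset of its attributes.
    Brute force: every concept intent is the intersection of the
    attribute sets of some group of objects (exponential in size).
    """
    objects = list(context)
    all_attrs = frozenset().union(*context.values())
    intents = {all_attrs}  # intent of the concept with the smallest extent
    for r in range(1, len(objects) + 1):
        for group in combinations(objects, r):
            intents.add(frozenset.intersection(*(context[o] for o in group)))
    result = []
    for intent in intents:
        # The extent is every object that has all attributes of the intent.
        extent = frozenset(o for o in objects if intent <= context[o])
        result.append((extent, intent))
    return result


def clom_sketch(context, sparse_max, dense_min):
    """Flag objects in sparse subspaces that have a dense ancestor subspace.

    ASSUMPTION: the support ratio |extent| / N stands in for both the
    sparsity and the density coefficient; the paper defines its own.
    """
    n = len(context)
    lattice = concepts(context)
    outliers = set()
    for extent, intent in lattice:
        if not extent or not intent:
            continue  # skip the empty extent and the zero-dimensional subspace
        if len(extent) / n > sparse_max:
            continue  # not a sparsity subspace under the stand-in measure
        for anc_extent, anc_intent in lattice:
            # ASSUMPTION: an ancestor has a strictly smaller, non-empty intent.
            is_ancestor = bool(anc_intent) and anc_intent < intent
            if is_ancestor and len(anc_extent) / n >= dense_min:
                outliers |= extent
                break
    return outliers


# Hypothetical toy context with discretized attribute values.
toy = {
    "o1": frozenset({"x=low", "y=low"}),
    "o2": frozenset({"x=low", "y=low"}),
    "o3": frozenset({"x=low", "y=low"}),
    "o4": frozenset({"x=low", "y=low"}),
    "o5": frozenset({"x=low", "y=high"}),
}
found = clom_sketch(toy, sparse_max=0.25, dense_min=0.8)
print(found)
```

In this toy context, the subspace {x=low, y=high} holds one of five objects, while its ancestor subspace {x=low} holds all five, so o5 is the only object flagged under the stand-in thresholds.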
Pages: 1434-1439
Page count: 6