Frequent Pattern Outlier Detection Without Exhaustive Mining

Cited by: 12
Authors
Giacometti, Arnaud [1]
Soulet, Arnaud [1]
Affiliation
[1] Univ Francois Rabelais Tours, LI EA 6300, 3 Pl Jean Jaures, F-41029 Blois, France
Source
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2016, PT II | 2016 / Vol. 9652
DOI
10.1007/978-3-319-31750-2_16
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Outlier detection consists of detecting anomalous observations in data. During the past decade, pattern-based outlier detection methods have proposed mining all frequent patterns in order to compute the outlier factor of each transaction. This approach remains too expensive despite recent progress in the pattern mining field. In this paper, we provide exact and approximate methods for calculating the frequent pattern outlier factor (FPOF) without extracting any pattern or by extracting only a small sample. We propose an algorithm that returns the exact FPOF without mining any pattern. Surprisingly, it runs in time polynomial in the size of the dataset. We also present an approximate method in which the end user controls the maximum error on the estimated FPOF. Experiments show the value of both methods on very large datasets where exhaustive mining fails to provide the exact solution. For the same budget in time or number of patterns, our approximate method is more accurate than the baseline approach.
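The abstract does not give the algorithms themselves. The Python sketch below shows one way both claims can hold, under assumptions that are ours rather than the paper's: the unnormalized FPOF of a transaction t is the sum of the supports of the patterns it contains (the usual normalizing constant is identical for every transaction, so it does not change the ranking); the minimum support is a single occurrence, so every non-empty subset of t counts as frequent; and the approximate variant draws patterns with probability proportional to their support. Under these assumptions the sum collapses to the sum over t' in D of (2^|t ∩ t'| - 1), which needs only pairwise intersections and therefore runs in polynomial time. All function names are hypothetical.

```python
import math
import random
from itertools import combinations


def exact_fpof_scores(dataset):
    """Unnormalized FPOF of each transaction, ASSUMING an absolute support
    threshold of 1 (every non-empty subset of a transaction is frequent).

    Identity: sum over non-empty X subset of t of supp(X)
            = sum over t' in D of (2^|t & t'| - 1),
    so no pattern is enumerated; cost is O(|D|^2) set intersections.
    """
    sets = [frozenset(t) for t in dataset]
    return [sum(2 ** len(t & u) - 1 for u in sets) for t in sets]


def brute_force_scores(dataset):
    """Reference version that enumerates every pattern (exponential time)."""
    sets = [frozenset(t) for t in dataset]
    scores = []
    for t in sets:
        total = 0
        for size in range(1, len(t) + 1):
            for pattern in combinations(sorted(t), size):
                p = frozenset(pattern)
                total += sum(1 for u in sets if p <= u)  # support of p
        scores.append(total)
    return scores


def sample_pattern(sets, weights, rng):
    """Draw a non-empty pattern with probability proportional to its support:
    pick t' with weight 2^|t'| - 1, then a uniform non-empty subset of t'."""
    u = rng.choices(sets, weights=weights, k=1)[0]
    items = sorted(u)  # fixed order keeps runs reproducible for a given seed
    while True:
        subset = frozenset(i for i in items if rng.random() < 0.5)
        if subset:  # rejection step: discard the empty set
            return subset


def approx_fpof_scores(dataset, epsilon, delta, seed=0):
    """Estimate score(t) / Z, where Z is the sum of supports of all patterns.
    The sample size comes from a Hoeffding bound plus a union bound over the
    transactions, so every estimate is within epsilon with prob >= 1 - delta."""
    rng = random.Random(seed)
    sets = [frozenset(t) for t in dataset]
    weights = [2 ** len(u) - 1 for u in sets]  # these also sum to Z
    k = math.ceil(math.log(2 * len(sets) / delta) / (2 * epsilon ** 2))
    sample = [sample_pattern(sets, weights, rng) for _ in range(k)]
    # Fraction of sampled patterns contained in t is unbiased for score(t) / Z.
    return [sum(1 for p in sample if p <= t) / k for t in sets]


if __name__ == "__main__":
    data = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"d"}]
    print(exact_fpof_scores(data))  # [7, 7, 13, 1]
    print(approx_fpof_scores(data, epsilon=0.05, delta=0.01))
```

Transactions with the lowest scores are the most outlying; in the toy data, {"d"} shares no pattern with the rest. Dividing the exact scores by Z (14 here) puts them on the same scale as the sampled estimates. The abstract does not state which error bound the paper uses for its approximate method; the Hoeffding bound here is simply a standard choice that lets the user fix epsilon and delta in advance.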
Pages: 196-207
Page count: 12