MiFI-Outlier: Minimal infrequent itemset-based outlier detection approach on uncertain data stream

Cited by: 12
Authors
Cai, Saihua [1]
Li, Sicong [1]
Yuan, Gang [1]
Hao, Shangbo [1]
Sun, Ruizhi [1, 2]
Affiliations
[1] China Agr Univ, Coll Informat & Elect Engn, Beijing 100083, Peoples R China
[2] Minist Agr, Sci Res Base Integrated Technol Precis Agr Anim H, Beijing 100083, Peoples R China
Keywords
Outlier detection; Minimal infrequent itemset mining; Uncertain data stream; Deviation indices; Data mining; CONCEPT DRIFT DETECTION; FREQUENT PATTERNS; EFFICIENT
DOI
10.1016/j.knosys.2019.105268
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Numerous outlier detection approaches have been proposed for static datasets over the past twenty years, and they have achieved good results. In real life, uncertain data streams are increasingly common, but most existing outlier detection approaches are not suitable for the uncertain data stream environment. In addition, many outlier detection approaches do not consider how frequently each element appears, so the detected outliers do not coincide with the definition of an outlier. Itemset-based outlier detection approaches provide a good solution to this problem and have attracted growing attention in recent years. In this paper, a novel two-step minimal infrequent itemset-based outlier detection approach called MiFI-Outlier is proposed to effectively detect outliers in uncertain data streams. In the itemset mining phase, a matrix-based method called MiFI-UDSM is proposed to mine the minimal infrequent itemsets (MiFIs) from an uncertain data stream, and then an improved approach called MiFI-UDSM* is proposed to mine these minimal infrequent itemsets more effectively using the ideas of "item cap" and "support cap". In the outlier detection phase, based on the mined MiFIs, three deviation indices, namely the minimal infrequent itemset deviation index (MiFIDI), the similarity deviation index (SDI) and the transaction deviation index (TDI), are defined to measure the degree of deviation of each transaction, and MiFI-Outlier then uses them to identify the outliers in the uncertain data stream. Several experimental studies are conducted on public and synthetic datasets, and the results show that the proposed approaches outperform the compared approaches in both the infrequent itemset mining phase and the outlier detection phase. (C) 2019 Elsevier B.V. All rights reserved.
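To make the notion of a minimal infrequent itemset on uncertain data concrete, the following is a minimal sketch and not the paper's method. It assumes the commonly used expected-support model, in which each item in a transaction carries an existence probability and the support of an itemset is the sum, over transactions, of the product of its items' probabilities; the abstract does not state which definition the paper uses. The function names, the sample data and the threshold `min_sup` are hypothetical. The search is a brute-force level-wise scan over a fixed batch of transactions, not the matrix-based MiFI-UDSM or MiFI-UDSM*, and it does not model the stream or the deviation indices.

```python
from itertools import combinations
from math import prod


def expected_support(itemset, transactions):
    """Expected support: sum over transactions of the product of the
    existence probabilities of the itemset's items (assumed model)."""
    return sum(
        prod(t[i] for i in itemset)
        for t in transactions
        if all(i in t for i in itemset)
    )


def minimal_infrequent_itemsets(transactions, min_sup):
    """Return itemsets that are infrequent while all their proper subsets
    are frequent. Brute-force level-wise search, for illustration only."""
    items = sorted({i for t in transactions for i in t})
    frequent = set()          # frequent itemsets found so far
    minimal_infrequent = []
    for k in range(1, len(items) + 1):
        found_frequent = False
        for cand in combinations(items, k):
            # Expected support never grows when items are added, so checking
            # the (k-1)-subsets is enough to cover every proper subset.
            if k > 1 and any(
                frozenset(s) not in frequent
                for s in combinations(cand, k - 1)
            ):
                continue
            if expected_support(cand, transactions) >= min_sup:
                frequent.add(frozenset(cand))
                found_frequent = True
            else:
                minimal_infrequent.append(cand)
        if not found_frequent:
            break  # no frequent k-itemset, so no larger candidate exists
    return minimal_infrequent


# Hypothetical uncertain transactions: item -> existence probability.
transactions = [
    {"a": 0.9, "b": 0.8, "c": 0.1},
    {"a": 0.8, "b": 0.7},
    {"a": 0.7, "c": 0.2},
]
mifis = minimal_infrequent_itemsets(transactions, min_sup=1.3)
print(mifis)
```

With this sample data, `a` and `b` are individually frequent, while `c` alone and the pair `(a, b)` fall below the threshold, so both are reported as minimal infrequent itemsets.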
Pages: 22