MiFI-Outlier: Minimal infrequent itemset-based outlier detection approach on uncertain data stream

Cited by: 12
Authors
Cai, Saihua [1]
Li, Sicong [1]
Yuan, Gang [1]
Hao, Shangbo [1]
Sun, Ruizhi [1, 2]
Affiliations
[1] China Agr Univ, Coll Informat & Elect Engn, Beijing 100083, Peoples R China
[2] Minist Agr, Sci Res Base Integrated Technol Precis Agr Anim H, Beijing 100083, Peoples R China
Keywords
Outlier detection; Minimal infrequent itemset mining; Uncertain data stream; Deviation indices; Data mining; Concept drift detection; Frequent patterns; Efficient
DOI
10.1016/j.knosys.2019.105268
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Many outlier detection approaches have been proposed for static datasets over the past twenty years, and they have achieved good results. In real-world applications, uncertain data streams are increasingly common, but most existing outlier detection approaches are not suited to the uncertain data stream environment. In addition, many outlier detection approaches do not consider how frequently each element appears, so the outliers they detect do not coincide with the definition of an outlier. Itemset-based outlier detection approaches offer a good solution to this problem and have attracted growing attention in recent years. In this paper, a novel two-step minimal infrequent itemset-based outlier detection approach called MiFI-Outlier is proposed to effectively detect outliers in uncertain data streams. In the itemset mining phase, a matrix-based method called MiFI-UDSM is proposed to mine the minimal infrequent itemsets (MiFIs) from an uncertain data stream, and an improved approach called MiFI-UDSM* is then proposed to mine these itemsets more effectively using the ideas of "item cap" and "support cap". In the outlier detection phase, based on the mined MiFIs, three deviation indices, namely the minimal infrequent itemset deviation index (MiFIDI), the similarity deviation index (SDI) and the transaction deviation index (TDI), are defined to measure the deviation degree of each transaction, and MiFI-Outlier is then used to identify the outliers in the uncertain data stream. Several experimental studies are conducted on public and synthetic datasets, and the results show that the proposed approaches achieve superior performance in both the infrequent itemset mining phase and the outlier detection phase. © 2019 Elsevier B.V. All rights reserved.
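The notion of a minimal infrequent itemset over uncertain transactions can be illustrated with a small sketch. This is not the paper's matrix-based MiFI-UDSM or MiFI-UDSM* method. It is a plain level-wise search under stated assumptions: each transaction maps items to existence probabilities, items are independent, and an itemset is infrequent when its expected support falls below a threshold. The names `expected_support`, `minimal_infrequent_itemsets`, `min_esup` and `max_len` are hypothetical.

```python
from itertools import combinations
from math import prod


def expected_support(itemset, transactions):
    """Expected support: sum over transactions of the product of the
    item probabilities (assumes independent items; a missing item has
    probability 0)."""
    return sum(prod(t.get(item, 0.0) for item in itemset) for t in transactions)


def minimal_infrequent_itemsets(transactions, min_esup, max_len=3):
    """Return itemsets that are infrequent while all their proper subsets
    are frequent, searching level by level up to max_len items."""
    items = sorted({item for t in transactions for item in t})
    minimal = []
    for k in range(1, max_len + 1):
        for cand in combinations(items, k):
            # A candidate containing an already-found minimal infrequent
            # itemset has an infrequent proper subset, so it is not minimal.
            if any(set(m) <= set(cand) for m in minimal):
                continue
            if expected_support(cand, transactions) < min_esup:
                minimal.append(cand)
    return minimal


# Hypothetical window of three uncertain transactions.
window = [
    {"a": 0.9, "b": 0.8, "c": 0.1},
    {"a": 0.8, "b": 0.7},
    {"a": 0.9, "c": 0.2},
]
print(minimal_infrequent_itemsets(window, min_esup=1.3))
# prints [('c',), ('a', 'b')]
```

Here `c` is infrequent on its own, while `a` and `b` are each frequent but rarely occur together with high probability, so `('a', 'b')` is also minimal. The three deviation indices are not defined in this abstract, so the sketch stops at the mining step.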
Pages: 22