Minimal weighted infrequent itemset mining-based outlier detection approach on uncertain data stream

Cited by: 8
Authors
Cai, Saihua [1]
Sun, Ruizhi [1, 2]
Hao, Shangbo [1]
Li, Sicong [1]
Yuan, Gang [1]
Affiliations
[1] China Agr Univ, Coll Informat & Elect Engn, 17 Tsinghua East Rd, Beijing 100083, Peoples R China
[2] Minist Agr, Key Lab Agr Informat Acquisit Technol, Beijing 100083, Peoples R China
Keywords
Minimal infrequent itemset mining; Outlier detection; Uncertain weighted data stream; Deviation index; FREQUENT ITEMSETS; EFFICIENT; PATTERNS; ALGORITHMS; DATABASES
DOI
10.1007/s00521-018-3876-4
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Outliers are a critical factor affecting the accuracy of data-based prediction and other data-driven processing; they must therefore be detected effectively and as early as possible to improve the credibility of the data. In recent years, a large number of outlier detection approaches have been proposed for static and precise data; however, this prior work did not consider the uncertainty and weight information of each item. Moreover, traditional outlier detection approaches use only the deviation degree of each data element as the criterion for determining outliers, so the detected outliers do not fit the definition of an outlier (i.e., data that rarely appears and differs from most of the other data). To address these problems, this paper proposes a minimal weighted infrequent itemset mining-based outlier detection approach for uncertain data streams, called MWIFIM-OD-UDS. It detects implicit outliers, characterized by itemsets that occur rarely, are uncertain, and carry a certain weight, while taking the characteristics of the data stream into account. Specifically, a matrix structure-based approach called MWIFIM-UDS is first proposed to mine the minimal weighted infrequent itemsets (MWiFIs) from an uncertain data stream; the MWIFIM-OD-UDS method is then built on the mined MWiFIs and the designed deviation indexes. Experimental results show that the proposed MWIFIM-OD-UDS method outperforms the frequent itemset mining-based outlier detection methods FindFPOF and LFP in terms of runtime and detection accuracy.
Pages: 6619-6639
Number of pages: 21
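The notion of a minimal weighted infrequent itemset mentioned in the abstract can be illustrated with a small brute-force sketch. This is not the authors' matrix-based MWIFIM-UDS algorithm, and the abstract does not give the exact support formula or the deviation indexes, so the following are assumptions made only for illustration: each transaction maps items to existence probabilities, the expected support of an itemset in a transaction is the product of its item probabilities, and that sum is scaled by the mean item weight. The names `expected_weighted_support`, `minimal_infrequent_itemsets`, `window`, `weights`, and `min_support` are hypothetical.

```python
from itertools import combinations
from math import prod


def expected_weighted_support(itemset, window, weights):
    """Assumed measure: mean item weight times the summed expected support.

    An item missing from a transaction is treated as having probability 0.
    """
    mean_weight = sum(weights[i] for i in itemset) / len(itemset)
    expected = sum(prod(t.get(i, 0.0) for i in itemset) for t in window)
    return mean_weight * expected


def minimal_infrequent_itemsets(window, weights, min_support):
    """Brute force: keep itemsets that are infrequent while every
    proper non-empty subset is frequent. Only suitable for tiny inputs."""
    items = sorted({i for t in window for i in t})

    def infrequent(itemset):
        return expected_weighted_support(itemset, window, weights) < min_support

    result = []
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            if not infrequent(itemset):
                continue
            proper_subsets = (
                s for r in range(1, k) for s in combinations(itemset, r)
            )
            if all(not infrequent(s) for s in proper_subsets):
                result.append(itemset)
    return result


# Hypothetical window of uncertain transactions (item -> existence probability).
window = [
    {"a": 0.9, "b": 0.1},
    {"a": 0.9},
    {"b": 0.9, "c": 0.9},
    {"b": 0.8, "c": 0.9},
]
weights = {"a": 1.0, "b": 0.8, "c": 1.0}  # hypothetical item weights

mwifis = minimal_infrequent_itemsets(window, weights, min_support=1.0)
print(mwifis)
```

In this toy window every single item is frequent, while the pairs containing `a` are not, so those pairs are the minimal infrequent itemsets. A transaction containing such an itemset would be a candidate for closer inspection; how the paper turns this into deviation indexes is not stated in the abstract and is not modeled here.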