Minimal weighted infrequent itemset mining-based outlier detection approach on uncertain data stream

Citations: 0
Authors
Saihua Cai
Ruizhi Sun
Shangbo Hao
Sicong Li
Gang Yuan
Affiliations
[1] China Agricultural University, College of Information and Electrical Engineering
[2] Ministry of Agriculture, Key Laboratory of Agricultural Information Acquisition Technology
Source
Neural Computing and Applications | 2020 / Vol. 32
Keywords
Minimal infrequent itemset mining; Outlier detection; Uncertain weighted data stream; Deviation index
DOI
Not available
Abstract
Outliers are a critical factor that affects the accuracy of data-based predictions and some other data-based processing; thus, outliers must be effectively detected as soon as possible to improve the credibility of the data. In recent years, massive outlier detection approaches have been proposed for static data and precise data; however, the uncertainty and weight information of each item was not considered in this prior work. Moreover, traditional outlier detection approaches only take the deviation degree of each data element as the standard for determining outliers; therefore, the detected outliers do not fit the definition of an outlier (i.e., rarely appearing and different from most of the other data). Aimed at these problems, a minimal weighted infrequent itemset mining-based outlier detection approach that can be applied to an uncertain data stream, called MWIFIM–OD–UDS, is proposed in this paper to effectively detect implicit outliers, which have a rarely occurring frequency, uncertainty and a certain weight of the itemset, while the characteristics of the data stream are considered. In particular, a matrix structure-based approach that is called MWIFIM–UDS is proposed to mine the minimal weighted infrequent itemsets (MWiFIs) from an uncertain data stream, and then, the MWIFIM–OD–UDS method is proposed based on the mined MWiFIs and the designed deviation indexes. Experimental results show that the proposed MWIFIM–OD–UDS method outperforms the frequent itemset mining-based outlier detection methods, FindFPOF and LFP, in terms of its runtime and detection accuracy.
Pages: 6619-6639
Page count: 20