Efficient strategies for incremental mining of frequent closed itemsets over data streams

被引:12
作者
Liu, Junqiang [1 ,2 ]
Ye, Zhousheng [1 ]
Yang, Xiangcai [1 ]
Wang, Xueling [1 ]
Shen, Linjie [2 ]
Jiang, Xiaoning [1 ,2 ]
机构
[1] Zhejiang Gongshang Univ, Sch Informat & Elect Engn, Hangzhou 310018, Peoples R China
[2] Zhejiang Gongshang Univ, Sussex Artificial Intelligence Inst, Hangzhou 310018, Peoples R China
基金
中国国家自然科学基金;
关键词
Data streams; Closed itemsets; Frequent itemsets; Data mining; Knowledge discovery; HIGH-UTILITY ITEMSETS; ALGORITHM; PATTERNS;
D O I
10.1016/j.eswa.2021.116220
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Mining frequent closed itemsets over data streams is an important data mining problem. Mining data streams is more challenging than mining static data because of the nature of data streams, including high arrival rate, massive volume of incoming data, and concept drift. The existing algorithms for mining frequent closed itemsets over data streams suffer from scalability and efficiency bottlenecks. This paper proposes a novel algorithm for mining frequent closed itemsets over data streams both for the sliding window model and for the landmark model. An indexed prefix closed itemset tree is proposed for compressing all closed itemsets and for quick searching of closed itemsets, and novel search strategies are proposed to prune the search space in updating the set of closed itemsets. The proposed algorithm outperforms the state-of-the-art intersection-based algorithms, CICLAD, ConPatSet, and CloStream, by several times to 2 orders of magnitude in efficiency, and also outperforms the state-of-the-art pattern enumeration algorithm, Moment, by up to 2 orders of magnitude over data streams with large windows and sparse data streams. The proposed algorithm is also superior in scalability.
引用
收藏
页数:13
相关论文
共 36 条
[1]  
Agrawal R., 1994, 20 INT C VERY LARGE, P487
[2]  
[Anonymous], 2000, ACM SIGMOD workshop on research issues in data mining and knowledge discovery
[3]   Erasable pattern mining based on tree structures with damped window over data streams [J].
Baek, Yoonji ;
Yun, Unil ;
Kim, Heonho ;
Nam, Hyoju ;
Lee, Gangin ;
Yoon, Eunchul ;
Vo, Bay ;
Lin, Jerry Chun-Wei .
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2020, 94 (94)
[4]   An efficient pattern growth approach for mining fault tolerant frequent itemsets [J].
Bashir, Shariq .
EXPERT SYSTEMS WITH APPLICATIONS, 2020, 143
[5]   A novel approach for mining maximal frequent patterns [J].
Bay Vo ;
Sang Pham ;
Tuong Le ;
Deng, Zhi-Hong .
EXPERT SYSTEMS WITH APPLICATIONS, 2017, 73 :178-186
[6]   Catch the moment: maintaining closed frequent itemsets over a data stream sliding window [J].
Chi, Yun ;
Wang, Haixun ;
Yu, Philip S. ;
Muntz, Richard R. .
KNOWLEDGE AND INFORMATION SYSTEMS, 2006, 10 (03) :265-294
[7]   Mining top-k high-utility itemsets from a data stream under sliding window model [J].
Dawar, Siddharth ;
Sharma, Veronica ;
Goyal, Vikram .
APPLIED INTELLIGENCE, 2017, 47 (04) :1240-1255
[8]   Mining high occupancy itemsets [J].
Deng, Zhi-Hong .
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2020, 102 :222-229
[9]   PrePost+: An efficient N-lists-based algorithm for mining frequent itemsets via Children-Parent Equivalence pruning [J].
Deng, Zhi-Hong ;
Lv, Sheng-Long .
EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (13) :5424-5432
[10]  
Fan MH, 2016, INT CONF CLOUD COMPU, P434, DOI 10.1109/CCIS.2016.7790298