Mining frequent items in the time fading model

被引:17
作者
Cafaro, Massimo [1 ]
Pulimeno, Marco [1 ]
Epicoco, Italo [1 ]
Aloisio, Giovanni [1 ]
机构
[1] Univ Salento, Lecce, Italy
关键词
Frequent items; Time fading model; Cash register model; Sketches; FINDING FREQUENT; ELEMENTS; PARALLEL; STREAMS;
D O I
10.1016/j.ins.2016.07.077
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We present FDCMSS, a new sketch-based algorithm for mining frequent items in data streams. The algorithm cleverly combines key ideas borrowed from forward decay, the Count-Min and the Space Saving algorithms. It works in the time fading model, mining data streams according to the cash register model. We formally prove its correctness and show, through extensive experimental results, that our algorithm outperforms lambda-HCount, a recently developed algorithm, with regard to speed, space used, precision attained and error committed on both synthetic and real datasets. (C) 2016 Elsevier Inc. All rights reserved.
引用
收藏
页码:221 / 238
页数:18
相关论文
共 35 条
[1]  
[Anonymous], 2006, LECT NOTES COMPUTER
[2]  
[Anonymous], 2009, PVLDB
[3]  
[Anonymous], 2016, FREQUENT ITEMSET MIN
[4]  
Beyer K, 1999, SIGMOD RECORD, VOL 28, NO 2 - JUNE 1999, P359, DOI 10.1145/304181.304214
[5]  
Brin S., 1997, SIGMOD Record, V26, P255, DOI [10.1145/253262.253327, 10.1145/253262.253325]
[6]   A parallel space saving algorithm for frequent items and the Hurwitz zeta distribution [J].
Cafaro, Massimo ;
Pulimeno, Marco ;
Tempesta, Piergiulio .
INFORMATION SCIENCES, 2016, 329 :1-19
[7]   Finding frequent items in parallel [J].
Cafaro, Massimo ;
Tempesta, Piergiulio .
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2011, 23 (15) :1774-1788
[8]  
Charikar M, 2002, LECT NOTES COMPUT SC, V2380, P693
[9]   Mining frequent items in data stream using time fading model [J].
Chen, Ling ;
Mei, Qingling .
INFORMATION SCIENCES, 2014, 257 :54-69
[10]   An improved data stream summary: the count-min sketch and its applications [J].
Cormode, G ;
Muthukrishnan, S .
JOURNAL OF ALGORITHMS-COGNITION INFORMATICS AND LOGIC, 2005, 55 (01) :58-75