CMSS: Sketching based reliable tracking of large network flows

被引:9
作者
Cafaro, Massimo [1 ]
Epicoco, Italo [2 ]
Pulimeno, Marco [2 ]
机构
[1] Univ Salento, Dept Engn Innovat, Lecce, Italy
[2] Univ Salento, Lecce, Italy
来源
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2019年 / 101卷
关键词
Frequent items; Sketches; TOP-K ELEMENTS; FINDING FREQUENT; MINING FREQUENT; PARALLEL; STREAMS; MODEL;
D O I
10.1016/j.future.2019.07.031
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Reliably tracking large network flows in order to determine so-called elephant flows, also known as heavy hitters or frequent items, is a common data mining task. Indeed, this kind of information is crucial in many different contexts and applications. Since storing all of the traffic is impossible, owing to the fact that flows arrive as an unbounded, infinite length stream, many different algorithms have been designed for traffic measurement using only a limited amount of memory. COUNTMAX is a recently published algorithm that solves this problem approximately by combining ideas from the COUNTMIN and MISRAGRIES algorithms. In this paper we introduce CMSS which cleverly combines ideas from the COUNTMIN and SPACE SAVING algorithms. We compare CMSS and COUNTMAX on both synthetic and real datasets. The experimental results show that both algorithms achieve 100% recall, and can retrieve the frequent items without false negatives even with a limited amount of budgeted memory. Regarding the precision, CMSS proves to be robust and independent from all of the parameters, whilst COUNTMAX is severely affected by false positives. Finally, both algorithms provide, in practice, the same frequency estimation quality. It follows that CMSS outperforms COUNTMAX with regard to overall accuracy whilst providing the same frequency estimation quality. (C) 2019 Elsevier B.V. All rights reserved.
引用
收藏
页码:770 / 784
页数:15
相关论文
共 68 条
[11]   Mining frequent items in unstructured P2P networks [J].
Cafaro, Massimo ;
Epicoco, Italo ;
Pulimeno, Marco .
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2019, 95 :1-16
[12]   Parallel mining of time-faded heavy hitters [J].
Cafaro, Massimo ;
Pulimeno, Marco ;
Epicoco, Italo .
EXPERT SYSTEMS WITH APPLICATIONS, 2018, 96 :115-128
[13]   Parallel space saving on multi- and many-core processors [J].
Cafaro, Massimo ;
Pulimeno, Marco ;
Epicoco, Italo ;
Aloisio, Giovanni .
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2018, 30 (07)
[14]   On Frequency Estimation and Detection of Frequent Items in Time Faded Streams [J].
Cafaro, Massimo ;
Epicoco, Italo ;
Pulimeno, Marco ;
Aloisio, Giovanni .
IEEE ACCESS, 2017, 5 :24078-24093
[15]   CUDA Based Parallel Implementations of Space-Saving on a GPU [J].
Cafaro, Massimo ;
Epicoco, Italo ;
Aloisio, Giovanni ;
Pulimeno, Marco .
2017 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS), 2017, :707-714
[16]   Mining frequent items in the time fading model [J].
Cafaro, Massimo ;
Pulimeno, Marco ;
Epicoco, Italo ;
Aloisio, Giovanni .
INFORMATION SCIENCES, 2016, 370 :221-238
[17]   A parallel space saving algorithm for frequent items and the Hurwitz zeta distribution [J].
Cafaro, Massimo ;
Pulimeno, Marco ;
Tempesta, Piergiulio .
INFORMATION SCIENCES, 2016, 329 :1-19
[18]   Finding frequent items in parallel [J].
Cafaro, Massimo ;
Tempesta, Piergiulio .
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2011, 23 (15) :1774-1788
[19]  
CAIDA, 2019, The CAIDA UCSD Anonymized Internet traces sampler
[20]  
Cao P., 2004, P 23 ANN ACM S PRINC, P206, DOI [10.1145/1011767.1011798, DOI 10.1145/1011767.1011798]