Top-k frequent items and item frequency tracking over sliding windows of any size

被引:15
|
作者
Song, Chunyao [1 ]
Liu, Xuanming [2 ]
Ge, Tingjian [2 ]
Ge, Yao [1 ]
机构
[1] Nankai Univ, Coll Comp Sci, Tianjin, Peoples R China
[2] Univ Massachusetts Lowell, Lowell, MA USA
基金
美国国家科学基金会;
关键词
Data stream; Top-k frequent items; Item frequency tracking; Sliding window; MINING FREQUENT;
D O I
10.1016/j.ins.2018.09.066
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Many big data applications today require querying highly dynamic and large-scale data streams to find the top-k most frequent items in the most recent window of a specified size at a specific time. This is a challenging problem. We propose a novel approach called Floating Top-k. Our algorithm does not need to explicitly maintain any item counts over time or deal with count updates upon item entry and expiration. Succinctly, we use only a small-size data structure to retrieve the top-k items dynamically in a window of any specified size within an upper bound. We prove that the space and time costs of Floating Top-k grow only logarithmically with the window size rather than the linear growth of previous methods. Our comprehensive experiments using three real-world datasets show that Floating Top-k not only provides accuracy guarantees but also has a memory footprint two to three orders of magnitude smaller and is one to two orders of magnitude faster than previous approaches. Hence, Floating Top-k is both effective and scalable, significantly outperforming competing methods. In addition, we devise a concise and efficient solution called Progressive Trend Model to address a related problem of tracking the frequency of selected items, improving upon previous methods by 20 to 30 times in model conciseness while maintaining the same accuracy and efficiency. (C) 2018 Elsevier Inc. All rights reserved.
引用
收藏
页码:100 / 120
页数:21
相关论文
empty
未找到相关数据