Top-k frequent items and item frequency tracking over sliding windows of any size

被引:15
|
作者
Song, Chunyao [1 ]
Liu, Xuanming [2 ]
Ge, Tingjian [2 ]
Ge, Yao [1 ]
机构
[1] Nankai Univ, Coll Comp Sci, Tianjin, Peoples R China
[2] Univ Massachusetts Lowell, Lowell, MA USA
基金
美国国家科学基金会;
关键词
Data stream; Top-k frequent items; Item frequency tracking; Sliding window; MINING FREQUENT;
D O I
10.1016/j.ins.2018.09.066
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Many big data applications today require querying highly dynamic and large-scale data streams to find the top-k most frequent items in the most recent window of a specified size at a specific time. This is a challenging problem. We propose a novel approach called Floating Top-k. Our algorithm does not need to explicitly maintain any item counts over time or deal with count updates upon item entry and expiration. Succinctly, we use only a small-size data structure to retrieve the top-k items dynamically in a window of any specified size within an upper bound. We prove that the space and time costs of Floating Top-k grow only logarithmically with the window size rather than the linear growth of previous methods. Our comprehensive experiments using three real-world datasets show that Floating Top-k not only provides accuracy guarantees but also has a memory footprint two to three orders of magnitude smaller and is one to two orders of magnitude faster than previous approaches. Hence, Floating Top-k is both effective and scalable, significantly outperforming competing methods. In addition, we devise a concise and efficient solution called Progressive Trend Model to address a related problem of tracking the frequency of selected items, improving upon previous methods by 20 to 30 times in model conciseness while maintaining the same accuracy and efficiency. (C) 2018 Elsevier Inc. All rights reserved.
引用
收藏
页码:100 / 120
页数:21
相关论文
共 46 条
  • [31] Top-k spatial-keyword publish/subscribe over sliding window
    Wang, Xiang
    Zhang, Wenjie
    Zhang, Ying
    Lin, Xuemin
    Huang, Zengfeng
    VLDB JOURNAL, 2017, 26 (03): : 301 - 326
  • [32] Location-Based Top-k Term Querying over Sliding Window
    Xu, Ying
    Chen, Lisi
    Yao, Bin
    Shang, Shuo
    Zhu, Shunzhi
    Zheng, Kai
    Li, Fang
    WEB INFORMATION SYSTEMS ENGINEERING, WISE 2017, PT I, 2017, 10569 : 299 - 314
  • [33] Spatio-temporal top-k term search over sliding window
    Chen, Lisi
    Shang, Shuo
    Yao, Bin
    Zheng, Kai
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2019, 22 (05): : 1953 - 1970
  • [34] Efficient Computation of Top-k Frequent Terms over Spatio-temporal Ranges
    Ahmed, Pritom
    Hasan, Mahbub
    Kashyap, Abhijith
    Hristidis, Vagelis
    Tsotras, Vassilis J.
    SIGMOD'17: PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2017, : 1227 - 1241
  • [35] A Differentially Private Scheme for Top-k Frequent Itemsets Mining Over Data Streams
    Liang W.-J.
    Chen H.
    Zhao S.-Y.
    Li C.-P.
    Jisuanji Xuebao/Chinese Journal of Computers, 2021, 44 (04): : 741 - 760
  • [36] Geo-Social Keyword Top-k Data Monitoring over Sliding Window
    Nishio, Shunya
    Amagata, Daichi
    Hara, Takahiro
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, DEXA 2017, PT I, 2017, 10438 : 409 - 424
  • [37] Sliding window top-k dominating query processing over distributed data streams
    Amagata, Daichi
    Hara, Takahiro
    Nishio, Shojiro
    DISTRIBUTED AND PARALLEL DATABASES, 2016, 34 (04) : 535 - 566
  • [38] SKYPE: Top-k Spatial-keyword Publish/Subscribe Over Sliding Window
    Wang, Xiang
    Zhang, Ying
    Zhang, Wenjie
    Lin, Xuemin
    Huang, Zengfeng
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2016, 9 (07): : 588 - 599
  • [39] Sliding window top-k dominating query processing over distributed data streams
    Daichi Amagata
    Takahiro Hara
    Shojiro Nishio
    Distributed and Parallel Databases, 2016, 34 : 535 - 566
  • [40] Mining Top-k Frequent-regular Itemsets from Data Streams Based on Sliding Window Technique
    Mesama, Tashinee
    Amphawan, Komate
    2018 5TH INTERNATIONAL CONFERENCE ON ADVANCED INFORMATICS: CONCEPTS, THEORY AND APPLICATIONS (ICAICTA 2018), 2018, : 224 - 230