Top-k frequent items and item frequency tracking over sliding windows of any size

被引:15
|
作者
Song, Chunyao [1 ]
Liu, Xuanming [2 ]
Ge, Tingjian [2 ]
Ge, Yao [1 ]
机构
[1] Nankai Univ, Coll Comp Sci, Tianjin, Peoples R China
[2] Univ Massachusetts Lowell, Lowell, MA USA
基金
美国国家科学基金会;
关键词
Data stream; Top-k frequent items; Item frequency tracking; Sliding window; MINING FREQUENT;
D O I
10.1016/j.ins.2018.09.066
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Many big data applications today require querying highly dynamic and large-scale data streams to find the top-k most frequent items in the most recent window of a specified size at a specific time. This is a challenging problem. We propose a novel approach called Floating Top-k. Our algorithm does not need to explicitly maintain any item counts over time or deal with count updates upon item entry and expiration. Succinctly, we use only a small-size data structure to retrieve the top-k items dynamically in a window of any specified size within an upper bound. We prove that the space and time costs of Floating Top-k grow only logarithmically with the window size rather than the linear growth of previous methods. Our comprehensive experiments using three real-world datasets show that Floating Top-k not only provides accuracy guarantees but also has a memory footprint two to three orders of magnitude smaller and is one to two orders of magnitude faster than previous approaches. Hence, Floating Top-k is both effective and scalable, significantly outperforming competing methods. In addition, we devise a concise and efficient solution called Progressive Trend Model to address a related problem of tracking the frequency of selected items, improving upon previous methods by 20 to 30 times in model conciseness while maintaining the same accuracy and efficiency. (C) 2018 Elsevier Inc. All rights reserved.
引用
收藏
页码:100 / 120
页数:21
相关论文
共 46 条
  • [1] Top-k Frequent Items and Item Frequency Tracking over Sliding Windows of Any Sizes
    Song, Chunyao
    Liu, Xuanming
    Ge, Tingjian
    2017 IEEE 33RD INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2017), 2017, : 199 - 202
  • [2] A Generic Framework for Top-k Pairs and Top-k Objects Queries over Sliding Windows
    Shen, Zhitao
    Cheema, Muhammad Aamir
    Lin, Xuemin
    Zhang, Wenjie
    Wang, Haixun
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2014, 26 (06) : 1349 - 1366
  • [3] Efficiently Monitoring Top-k Pairs over Sliding Windows
    Shen, Zhitao
    Cheema, Muhammad Aamir
    Lin, Xuemin
    Zhang, Wenjie
    Wang, Haixun
    2012 IEEE 28TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2012, : 798 - 809
  • [4] Efficient Identification of TOP-K Heavy Hitters over Sliding Windows
    Haina Tang
    Yulei Wu
    Tong Li
    Chunjing Han
    Jingguo Ge
    Xiangpeng Zhao
    Mobile Networks and Applications, 2019, 24 : 1732 - 1741
  • [5] Efficient Identification of TOP-K Heavy Hitters over Sliding Windows
    Tang, Haina
    Wu, Yulei
    Li, Tong
    Han, Chunjing
    Ge, Jingguo
    Zhao, Xiangpeng
    MOBILE NETWORKS & APPLICATIONS, 2019, 24 (05): : 1732 - 1741
  • [6] Mining top-k frequent patterns over data streams sliding window
    Chen, Hui
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2014, 42 (01) : 111 - 131
  • [7] Mining top-k frequent patterns over data streams sliding window
    Hui Chen
    Journal of Intelligent Information Systems, 2014, 42 : 111 - 131
  • [8] Achieving Top-K-fairness for Finding Global Top-K Frequent Items
    Zhao, Yikai
    Zhou, Wei
    Han, Wenchen
    Zhong, Zheng
    Zhang, Yinda
    Zheng, Xiuqi
    Yang, Tong
    Cui, Bin
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2025, 37 (04) : 1508 - 1526
  • [9] Finding Top-k Most Frequent Items in Distributed Streams in the Time-Sliding Window Model
    Anceaume, Emmanuelle
    Busnel, Yann
    Cazacu, Vasile
    2018 48TH ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS WORKSHOPS (DSN-W), 2018, : 61 - 62
  • [10] Finding frequent items in sliding windows with multinomially-distributed item frequencies
    Golab, L
    DeHaan, D
    López-Ortiz, A
    Demaine, ED
    16TH INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, PROCEEDINGS, 2004, : 425 - 426