Top-k frequent items and item frequency tracking over sliding windows of any size

被引:15
|
作者
Song, Chunyao [1 ]
Liu, Xuanming [2 ]
Ge, Tingjian [2 ]
Ge, Yao [1 ]
机构
[1] Nankai Univ, Coll Comp Sci, Tianjin, Peoples R China
[2] Univ Massachusetts Lowell, Lowell, MA USA
基金
美国国家科学基金会;
关键词
Data stream; Top-k frequent items; Item frequency tracking; Sliding window; MINING FREQUENT;
D O I
10.1016/j.ins.2018.09.066
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Many big data applications today require querying highly dynamic and large-scale data streams to find the top-k most frequent items in the most recent window of a specified size at a specific time. This is a challenging problem. We propose a novel approach called Floating Top-k. Our algorithm does not need to explicitly maintain any item counts over time or deal with count updates upon item entry and expiration. Succinctly, we use only a small-size data structure to retrieve the top-k items dynamically in a window of any specified size within an upper bound. We prove that the space and time costs of Floating Top-k grow only logarithmically with the window size rather than the linear growth of previous methods. Our comprehensive experiments using three real-world datasets show that Floating Top-k not only provides accuracy guarantees but also has a memory footprint two to three orders of magnitude smaller and is one to two orders of magnitude faster than previous approaches. Hence, Floating Top-k is both effective and scalable, significantly outperforming competing methods. In addition, we devise a concise and efficient solution called Progressive Trend Model to address a related problem of tracking the frequency of selected items, improving upon previous methods by 20 to 30 times in model conciseness while maintaining the same accuracy and efficiency. (C) 2018 Elsevier Inc. All rights reserved.
引用
收藏
页码:100 / 120
页数:21
相关论文
共 46 条
  • [41] Mining Frequent Item Sets in Asynchronous Transactional Data Streams over Time Sensitive Sliding Windows Model
    Javaid, Qaisar
    Memon, Farida
    Talpur, Shahnawaz
    Arif, Muhammad
    Awan, Muhammad Daud
    MEHRAN UNIVERSITY RESEARCH JOURNAL OF ENGINEERING AND TECHNOLOGY, 2016, 35 (04) : 625 - 644
  • [42] Efficient top-k recently-frequent term querying over spatio-temporal textual streams
    Dam, Thu-Lan
    Chester, Sean
    Norvag, Kjetil
    Duong, Quang-Huy
    INFORMATION SYSTEMS, 2021, 97
  • [43] IMSPMIS-Stream: incremental mining of top-k short sequential pattern over multiple item set streams
    Hao, Xiaobing
    Han, Gaowei
    Chen, Yuping
    Wang, Peilong
    Ren, Jiadong
    Journal of Computational Information Systems, 2014, 10 (06): : 2305 - 2312
  • [44] A sliding window method for finding top-k path traversal patterns over streaming Web click-sequences
    Li, Hua-Fu
    EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (03) : 4382 - 4386
  • [45] ETKDS: An efficient algorithm of Top-K high utility itemsets mining over data streams under sliding window model
    Cheng, Haodong
    Han, Meng
    Zhang, Ni
    Wang, Le
    Li, Xiaojuan
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2021, 41 (02) : 3317 - 3338
  • [46] ETKDS: An efficient algorithm of Top-K high utility itemsets mining over data streams under sliding window model
    Cheng, Haodong
    Han, Meng
    Zhang, Ni
    Wang, Le
    Li, Xiaojuan
    Journal of Intelligent and Fuzzy Systems, 2021, 41 (02): : 3317 - 3338