Incremental k-Nearest Neighbors Using Reservoir Sampling for Data Streams

被引:3
作者
Bahri, Maroua [1 ]
Bifet, Albert [1 ,2 ]
机构
[1] IP Paris, LTCI, Telecom Paris, Paris, France
[2] Univ Waikato, Hamilton, New Zealand
来源
DISCOVERY SCIENCE (DS 2021) | 2021年 / 12986卷
关键词
Data stream classification; K-nearest neighbors; Reservoir sampling; Sliding window;
D O I
10.1007/978-3-030-88942-5_10
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The online and potentially infinite nature of data streams leads to the inability to store the flow in its entirety and thus restricts the storage to a part of - and/or synopsis information from - the stream. To process these evolving data, we need efficient and accurate methodologies and systems, such as window models (e.g., sliding windows) and summarization techniques (e.g., sampling, sketching, dimensionality reduction). In this paper, we propose, RW-kNN, a k-Nearest Neighbors (kNN) algorithm that employs a practical way to store information about past instances using the biased reservoir sampling to sample the input instances along with a sliding window to maintain the most recent instances from the stream. We evaluate our proposal on a diverse set of synthetic and real datasets and compare against state-of-the-art algorithms in a traditional test-then-train evaluation. Results show how our proposed RW-kNN approach produces high-predictive performance for both real and synthetic datasets while using a feasible amount of resources.
引用
收藏
页码:122 / 137
页数:16
相关论文
共 32 条
  • [1] Aggarwal C.C., 2007, Data streams: models and algorithms, P169, DOI [10.1007/978-0-387-47534-9_9, DOI 10.1007/978-0-387-47534-9_9]
  • [2] DATABASE MINING - A PERFORMANCE PERSPECTIVE
    AGRAWAL, R
    IMIELINSKI, T
    SWAMI, A
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 1993, 5 (06) : 914 - 925
  • [3] Anguita Davide, 2012, P INT WORKSH AMB ASS, P216
  • [4] [Anonymous], 2006, PROC 32 INT C VERY L
  • [5] Bahri M., 2020, EUROPEAN C ARTIFICIA
  • [6] Bahri M, 2020, PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, P4796
  • [7] Bahri M, 2018, IEEE INT CONF BIG DA, P604, DOI 10.1109/BigData.2018.8622178
  • [8] Bifet A, 2010, J MACH LEARN RES, V11, P1601
  • [9] Bifet A, 2007, PROCEEDINGS OF THE SEVENTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING, P443
  • [10] Bifet Albert., 2013, Proceedings of the 28th annual ACM symposium on applied computing, P801