Hercules Against Data Series Similarity Search

Cited by: 8
Authors
Echihabi, Karima [1]
Fatourou, Panagiota [2, 3]
Zoumpatianos, Kostas [4]
Palpanas, Themis [2, 5]
Benbrahim, Houda [6, 7]
Affiliations
[1] Mohammed VI Polytech Univ, Ben Guerir, Morocco
[2] Univ Paris Cite, Paris, France
[3] FORTH, Heraklion, Greece
[4] Snowflake Inc, Bozeman, MT USA
[5] IUF, Paris, France
[6] IRDA, Rabat IT Ctr, Rabat, Morocco
[7] ENSIAS, Rabat, Morocco
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2022 / Vol. 15 / No. 10
Keywords
LERNAEAN HYDRA
DOI
10.14778/3547305.3547308
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We propose Hercules, a parallel tree-based technique for exact similarity search on massive disk-based data series collections. We present novel index construction and query answering algorithms that leverage different summarization techniques, carefully schedule costly operations, optimize memory and disk accesses, and exploit the multi-threading and SIMD capabilities of modern hardware to perform CPU-intensive calculations. We demonstrate the superiority and robustness of Hercules with an extensive experimental evaluation against state-of-the-art techniques, using many synthetic and real datasets, and query workloads of varying difficulty. The results show that Hercules performs up to one order of magnitude faster than the best competitor (which is not always the same technique). Moreover, Hercules is the only index that outperforms the optimized scan in all scenarios, including the hard query workloads on disk-based datasets.
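Hercules' own algorithms (tree construction, operation scheduling, SIMD and multi-threaded distance computation) are not reproduced here. As a rough, single-threaded illustration of the general principle that summarization-based exact search relies on, the following sketch summarizes each series with Piecewise Aggregate Approximation (PAA), computes a lower bound on the Euclidean distance from the summaries alone, and only computes real distances for candidates that the bound cannot rule out. PAA is a stand-in chosen for brevity, not a claim about which summaries Hercules uses, and the helper names (`paa`, `paa_lower_bound`, `early_abandon_sq_dist`, `exact_1nn`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def paa(series: Sequence[float], segments: int) -> List[float]:
    """Piecewise Aggregate Approximation: the mean of each equal-length segment."""
    n = len(series)
    if segments <= 0 or n % segments != 0:
        raise ValueError("series length must be a positive multiple of segments")
    w = n // segments
    return [sum(series[i * w:(i + 1) * w]) / w for i in range(segments)]


def paa_lower_bound(q_paa: Sequence[float], c_paa: Sequence[float], n: int) -> float:
    """Lower bound on the Euclidean distance between two length-n series,
    computed from their PAA summaries alone."""
    w = n / len(q_paa)
    return math.sqrt(w * sum((a - b) ** 2 for a, b in zip(q_paa, c_paa)))


def early_abandon_sq_dist(q: Sequence[float], c: Sequence[float], best_sq: float) -> float:
    """Squared Euclidean distance, abandoned (returns inf) once it reaches best_sq."""
    total = 0.0
    for a, b in zip(q, c):
        total += (a - b) ** 2
        if total >= best_sq:
            return math.inf
    return total


def exact_1nn(query: Sequence[float], collection: Sequence[Sequence[float]],
              segments: int = 4) -> Tuple[int, float]:
    """Exact 1-NN under Euclidean distance with summary-based pruning.
    Returns (index of the nearest series, its distance)."""
    n = len(query)
    q_paa = paa(query, segments)
    # A real index would precompute the summaries at build time; here they
    # are computed on the fly and paired with their candidate index.
    bounds = sorted(
        (paa_lower_bound(q_paa, paa(s, segments), n), i) for i, s in enumerate(collection)
    )
    best_idx, best_sq = -1, math.inf
    for lb, i in bounds:
        if lb * lb > best_sq:
            break  # bounds are sorted, so every remaining candidate is pruned too
        d_sq = early_abandon_sq_dist(query, collection[i], best_sq)
        if d_sq < best_sq:
            best_idx, best_sq = i, d_sq
    return best_idx, math.sqrt(best_sq)
```

Visiting candidates in ascending lower-bound order lets the loop stop at the first bound that already exceeds the best distance found so far; because the bound never overestimates the true distance, the answer stays exact. A tree index applies the same test to whole subtrees at once, and a system like the one described above would additionally overlap this work with disk I/O and spread it across threads and SIMD lanes, none of which this sketch models.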
Pages: 2005-2018
Number of pages: 14
Related papers
50 in total
  • [1] Time Series Similarity Search Methods for Sensor Data
    Jawale, Anupama
    Magar, Ganesh
    AUTOMATIC CONTROL AND COMPUTER SCIENCES, 2022, 56 (02): 120 - 129
  • [3] Deep Learning Embeddings for Data Series Similarity Search
    Wang, Qitong
    Palpanas, Themis
    KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021: 1708 - 1716
  • [4] Data Series Progressive Similarity Search with Probabilistic Quality Guarantees
    Gogolou, Anna
    Tsandilas, Theophanis
    Echihabi, Karima
    Bezerianos, Anastasia
    Palpanas, Themis
    SIGMOD'20: PROCEEDINGS OF THE 2020 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2020: 1857 - 1873
  • [5] SEAnet: A Deep Learning Architecture for Data Series Similarity Search
    Wang, Qitong
    Palpanas, Themis
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (12): 12972 - 12986
  • [6] Odyssey: A Journey in the Land of Distributed Data Series Similarity Search
    Chatzakis, Manos
    Fatourou, Panagiota
    Kosmas, Eleftherios
    Palpanas, Themis
    Peng, Botao
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2023, 16 (05): 1140 - 1153
  • [7] Similarity Search in Time Series Data Using Time Weighted Slopes
    Toshniwal, Durga
    Joshi, R. C.
    INFORMATICA-JOURNAL OF COMPUTING AND INFORMATICS, 2005, 29 (01): 79 - 88
  • [8] Similarity search over time-series data using wavelets
    Popivanov, I
    Miller, RJ
    18TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2002: 212 - 221
  • [9] A novel method for similarity search over electric time series data
    Li, QD
    Chi, ZX
    Wang, ZC
    2003 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-5, PROCEEDINGS, 2003: 2315 - 2318
  • [10] Similarity search and pattern discovery in hydrological time series data mining
    Ouyang, Rulin
    Ren, Liliang
    Cheng, Weiming
    Zhou, Chenghu
    HYDROLOGICAL PROCESSES, 2010, 24 (09): 1198 - 1210